Hi! I'm one of the contributors to the Open WebUI project, thanks for the mention!
Thanks for contributing to such an awesome project!
This guy just knows it all! I've been watching all of his videos for almost two years and finally networking and hypervisors are getting easy. I just needed this detailed discussion about personalised AI. ❤
Thanks for covering this so in-depth, Tim. I've been working on a similar project based on a Dell R730 with 2x Tesla P40 cards... much more power hungry than your setup, but it will spit out answers from 70B models and complex Stable Diffusion workloads about as fast as the small models generate on a 3090. I was running into some issues getting things integrated in the same stack, so it's great to see how you've put this all together... I have some rebuilding to do.
What CPUs are you using with yours? Do you think I'll have any issues with the dual 2660 v4s, or should I go with ones that have a higher single-core clock speed? I'll be using a single RTX 3060 for now since I already have it. Small models, of course.
Well there's not a chance that I'll recreate this beast of a configuration but it was a fun watch! Thank you
Stoked that you put this together. Thanks Tim!
Thanks for another great video!
For anyone else trying to get this working without an NVIDIA card and getting "Found no NVIDIA driver on your system" errors: you can add --cpu to the CLI_ARGS in the compose file (CLI_ARGS=--cpu, yes, it's slow) and, obviously, comment out all of the nvidia/deploy blocks.
I also had to add json to the formats: in searxng/settings.yml to avoid a 403 error w/ web search
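In case it helps, here's a rough sketch of what the --cpu tweak looks like in the compose file. The service name and layout below are examples based on an AbdBarho-style stable-diffusion-webui-docker setup, not necessarily Tim's exact file:

```yaml
# sketch only: enable CPU mode and drop the GPU reservation for the Stable Diffusion service
stable-diffusion-webui:            # example service name
  environment:
    - CLI_ARGS=--cpu               # yes, it's slow
  # deploy:                        # comment out the whole NVIDIA reservation block
  #   resources:
  #     reservations:
  #       devices:
  #         - driver: nvidia
  #           count: 1
  #           capabilities: [gpu]
```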
Thank you!
Wow, Tim. I'm waiting for the CPU and CPU cooler, the last pieces of my new AI machine, to arrive. After watching this video I feel like my eyes popped open and "whoa, I know AI stack kung fu!" Thanks for doing this.
I was your 1000th like. Great video
Thank you!
Great video and awesome guide! I actually got it all working except Whisper, which I wasn't all that worried about, and Home Assistant, which I don't use. But having ollama, webui and stable diffusion running on my own PC with a GPU is a game changer! Thank you!
nice work!!!
Oh baby! Time to put some of my Tesla GPUs to work 🙂
Heck yes!
Would definitely be interested in a similar video using Tesla GPUs, especially if you have multiple generations' worth of Tesla GPUs and can "benchmark" the best bang-for-buck option in the lineup: fast enough to be acceptable, but not so expensive that it's out of the price range for some homelabbers, or that they'd be better off just buying a newer consumer-grade GPU.
Looking forward to a vid on this!
Would love to see how the Tesla GPUs handle Text2Image with Stable Diffusion using Open WebUI! Running an LLM and image generation is taxing on a 6 GB card... in theory a 24 GB card would be fantastic, but I'd love to see some data on it.
I'm always surprised when everything just works... Kudos & thanks for the support notes... Great job
What a great video! Thanks for sharing your local, private, self-hosted AI stack! I appreciate the effort and expertise you brought to creating a tutorial on replicating this setup. I'm excited to try it out and learn from your experience.
Thank you!
Thank you for the video, a very nice guide.
One potential thing to play around with is quantization of the models: you can find one that is less quantized but still fits in memory. For example, with gemma2 27b the default tag gets you `q4`, which takes 16 GB, but you could grab `27b-instruct-q6_K`, which takes ~21 GB and perhaps gives slightly better results. Of course, you then have less space left to host the models for the other services like Stable Diffusion or Whisper. You need to click on Tags on the ollama website when picking the model size to see the full list (example commands below).
Another nice potential addition to the stack could be `openedai-speech` to handle text-to-speech. It can be integrated with Open WebUI. Not a must-have, but it complements the stack nicely IMHO.
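If anyone wants to try the quantization tip above, pulling a specific quant is just a different tag on the pull command. The tags below are examples; check the model's Tags page on the ollama site for the current list:

```bash
# default tag resolves to a q4 build (~16 GB for gemma2 27b)
ollama pull gemma2:27b

# less-quantized q6_K build (~21 GB), tag copied from the Tags page
ollama pull gemma2:27b-instruct-q6_K
ollama run gemma2:27b-instruct-q6_K
```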
Thank you so much for all of the work you put in to making this amazing walkthrough and all of the documentation to go with it!!!
You *could* run all the AI stuff in an LXC. You can pass the GPUs through to the LXC. The way I figured out how to do it is by mapping the devices in the LXC config and then installing (the same!!) drivers on the host and in the LXC. You can actually share the GPUs this way if you wanted.
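Rough sketch of the kind of mapping I mean, for anyone curious. The device major numbers and paths vary per system, so check `ls -l /dev/nvidia*` on your host before copying anything:

```
# /etc/pve/lxc/<id>.conf (example values only)
lxc.cgroup2.devices.allow: c 195:* rwm   # /dev/nvidia0, /dev/nvidiactl
lxc.cgroup2.devices.allow: c 509:* rwm   # /dev/nvidia-uvm* (major number differs per host)
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

Then, if you use the NVIDIA .run installer inside the LXC, run it with --no-kernel-module so it only installs the userspace libraries and matches the host's kernel module.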
I wouldn't advise LXC to anyone. I used it for a long time and, when it works, it works well. Overall it's a pain compared to Docker/Podman.
Awesome, long videos like this are very helpful for better understanding. I find it very engaging. Thanks brother ❤
Thank you! I appreciate you noticing! Glad it helped!
Dad: we have ai at home
Ai at home:
Thank you for posting this! I'm going to give it a try this week :D
Let me know how it goes! Full documentation too in the description!
Thank you so much for this!!
You're welcome!
I hear Raid Owl challenged you to a mini rack challenge. I would love to see a portable build designed to be an extension of my home lab wherever I happen to be.
I didn't think he was going to go that hard...
This is everything I want to do! Many thanks!!! 🙏
Thank you so much for this video!
At 31:37, this is how mine runs because I have a small GPU.
Awesome video. I am trying to run Ollama in Kubernetes, but now I think it will be easier to run it as Docker Swarm.
I was going to go that route too but when you consider models are multiple gigs I didn't want to deal with Kubernetes storage for that.
Surprisingly, I have Ollama running (with Dolphin-phi) on my 3 node, CPU only, Mini PC (16 core, 32GB RAM) K3s nodes :)
I am so invested in the Kubernetes space that it made more sense for me, with the trade-off that my models run SLOW. I am OK with this though. I also love Open WebUI; it works fantastically in a K8s environment.
Cool to see the full setup though, thanks for sharing!
That sounds awesome! I did debate making this VM a Kubernetes node and using node selectors to pin workloads to the node with the GPU, but I also didn't want to have to deal with storage since the models are so huge.
@@TechnoTimTinkers That was my plan as well (once I get some hardware for it), but doing Docker stacks was definitely a great alternative, and I totally agree with your point that having a central “AI” box is super useful.
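For anyone who does go the Kubernetes route, the node-pinning Tim described is roughly a nodeSelector plus a GPU resource request. The names below are made up, and it assumes the NVIDIA device plugin is running on the GPU node:

```yaml
# label the GPU node first, e.g.:  kubectl label node gpu-node-01 accelerator=nvidia
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        accelerator: nvidia        # pins the pod to the labeled GPU node
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1    # requires the NVIDIA device plugin on that node
```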
This is too cool! Thanks for doing this!!!
Thanks mate! Well explained. Deploying right away.
I got an MLLSE RTX 3060 with 12GB of VRAM on AliExpress for the sole purpose of running AI models and it's been great. Very cheap card, but solid CUDA compute capability, and 12GB of VRAM is more than enough for Llama 3 (7B), and with quantized models I can really take advantage of it. Power consumption is ridiculous though. If you use AI a lot, figure out a way to heat your home with compute lol.
If money wasn't an issue I wouldn't even go for a 3090/4090 with 24GB of VRAM, but one of the newer workstation cards. Wendell at Level1 did a good video going over the AI performance of the Ada-generation RTX 4000 SFF, and it was a little slower but used only a fraction of the power. Two of those get you 40GB of VRAM at 140W max, which is nuts.
The 3060 12GB is definitely the value play! (Links in description.) The MSI is the one I recommend to anyone who wants to run a stack like this (and more). I have 24GB, but it's a weird size: not big enough for the large models, but too big for the small ones.
Or just wait for the RTX A6000 (not the latest Ada one); it's the same architecture as the RTX 3000 series, but gives you 48GB of ECC VRAM on its own, and it's fairly power efficient too (no overhead of two GPUs waiting on their VRAM buffers).
If you want a cheaper card that works well for this and has the VRAM of an RTX 3090, there's the Tesla K80 24GB model; they go for $45 on eBay.
How well? Compared to the 3090 using 12GB?
Hahaha, what luck: I'd just seen your comment on "Self-Hosted AI That's Actually Useful" saying this video was coming soon, headed over to this channel, and saw it was uploaded 1 minute ago.
Great timing!
Lovely video, really looking forward to getting stuck into it, but did you run into any issues? I've been following along from your web article and the default compose file, as I don't use Traefik, but web search just complains about limiter.toml missing. I tried adding the default file from their site with no change, and when trying to do a web search it just gives me 'Expecting Value'. The compose file has SearXNG exposed on 8081, so I changed the SearXNG URL from 8080 to 8081, but that gives a 403 error. The URL references the server IP address for ease.
That, and aistack_stable-diffusion-download_1 keeps stopping with no obvious errors.
I'm very new to this, so any help is appreciated, thanks.
I had to drop the Whisper section from your compose file and replace it with the one from the repo's compose file. Without any other changes, Whisper no longer threw a 500 Internal Error and translate worked fine.
Yeah I run 6x RTX A6000 (Ampere generation, same as the 30 series consumer cards) in 2 GPU nodes in my homelab. I don't train, but I do have a bunch of agents, and some automations that run a lot, so parallel compute of AI models is important enough to have spent a mid-sized car's worth on GPUs. EDIT: I'd also suggest gemma2:27b from ollama, it's a great model, better than llama3:8b in my testing (and in some, better than llama3:70b ... i can run both).
Nice! SO jealous of the A6000!
Note: A decent or better GPU is required on the host machine; otherwise, the CPU will be constantly overworked.
For sure. Listed a few budget friendly ones in the description. I really hope Intel steps it up soon. Would be awesome to have decent AI on an Intel chip and just use system RAM.
Stable Diffusion 3 so far kinda sucks for things like body parts. StabilityAI is supposed to be releasing an updated model in the "coming weeks", they said, but it will most likely be longer than that from what it sounds like. SDXL is a lot better for generating human body parts, or at least it's easier to get them right, in my experience. At this point, I don't see any reason to use SD3 over SDXL.
Thanks for the tutorial. I'm still newish to Docker, so your examples have helped me understand a lot better. Currently I just run Ollama, SearXNG, and SD WebUI Forge in a Python venv on my main desktop with a 3090 Ti, since my Proxmox server doesn't have a GPU yet. I've been waiting for the 50-series cards to come out before I get another GPU.
I found out about SearXNG from your other channel's video. Makes me wonder if paying for a private search engine, Kagi, is still worth it or not.
Thanks for the tips. I am new to SD so I will take your word for it. If you have something running with Proxmox in containers, that's good enough! Docker is really great though for dependencies and even updating the image: just a quick docker pull and you get the latest version. SearXNG is pretty cool too. I need to explore it some more.
I just loaded this up and was getting a 403 Forbidden error when trying web search. I needed to modify SearXNG's settings.yml file to add "- json" under "- html" in the search > formats area.
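For reference, the relevant bit of searxng/settings.yml ends up looking something like this (the rest of the file stays untouched):

```yaml
search:
  formats:
    - html
    - json    # needed so Open WebUI's web search can parse results instead of getting a 403
```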
Maybe a silly tip, but I struggled to make things work and the behavior was inconsistent. Setting the VM's processor type to host solved my problems.
Ah, yeah, that's a good call. I always do that so that VMs just inherit their capabilities from the host!
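For anyone else hitting this, that's the VM's Hardware > Processors > Type setting in the Proxmox UI, or roughly this from the host shell (the VM ID below is a placeholder):

```bash
qm set 100 --cpu host   # 100 is an example VM ID; takes effect after the VM is restarted
```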
Great video! How do you update the stack? git pull and docker compose pull or?
That's right, just pull! Soon I hope to have some CI with Docker stacks so I don't have to SSH in anymore.
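In case anyone wants the exact commands, the update flow is roughly this from the stack's directory (assuming Docker Compose v2 and that you cloned the repo):

```bash
git pull                 # grab any compose/config changes
docker compose pull      # pull newer images
docker compose up -d     # recreate only the containers that changed
```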
I use Podman with whisper/piper/openwakeword in Home Assistant on an old Dell workstation I picked up on eBay for £180 (Xeon and 1070 Ti).
Using the distil medium en model for Whisper, voice control takes around 2 seconds.
The biggest issue, in my opinion, is the wake-word satellites you would need in a medium-sized home. I wonder if you have any experience with this, Tim?
Great video and very detailed article! I’ve been trying to set up the same stack but on an AMD RX 6750 XT, running ROCm 6.2, with Ubuntu 22.04 (and even tried 24.04).
I’ve successfully set up Docker and Portainer, configured my environment, and managed to get ROCm installed as per the AMD documentation. However, I’m running into issues with getting the AI stack (specifically Stable Diffusion WebUI) to work smoothly on the AMD hardware. I’ve tried using Dockerfiles from both the AUTOMATIC1111 and AbdBarho repositories, but adapting them for ROCm and AMD GPUs is where things get tricky.
It feels like I’m just a Docker image or software update away from getting this working properly. Any tips on how to resolve this or anyone who’s managed to get a similar setup working on AMD hardware?
Thanks for all the hard work you put into this content. It’s been super helpful so far!
I did a similar setup with Debian and ROCm containers for an AMD APU.
That’s nice. Could you share your setup?
We need more info on the Traefik setup, like the setup from the beginning.
I have a whole video on it here on the main channel ua-cam.com/video/n1vOfdz5Nm8/v-deo.html
Can you list some business cases for this? Great setup, and I'm sure it's cool for tinkering around, but is it a financially viable thing to, say, offer services privately?
Hey Tim, can you build a project based on Google Coral and Docker?
Funny, I just set up a Docker Compose file for Ollama and SillyTavern last week. I want to add MemGPT in there as well. Maybe you could expand this with MemGPT?
Could you suggest and recommend AI setups (not gaming!) to run SD and ComfyUI, or even for "building" LLMs, for 3 different budgets? Low, medium, and high, as in $1-2k, $2-4k, and $4-10k? That would actually be a great video...
If I could find someone to sponsor the builds 😂.
Hello Tim, I’m new to this, not computer savvy, where do you suggest I start to learn? Thanks
Why was a VM chosen instead of CT?
Great question! Primarily because I want complete isolation and I don't want to run Docker inside of a CT
This is very interesting, I'm going to play with it. Let's see how the Mac behaves with those prompts.
Ever been to a festival at Harmony Park in Minnesota?
An overview of what a local AI can actually do would be great lol. Because I don't want to jump into it without knowing the final goals 😂
ua-cam.com/video/GrLpdfhTwLg/v-deo.html
@@TechnoTimTinkers thanks man!
What is the power consumption of this thing?
For the life of me I can't get the NVIDIA drivers to take. I've tried both methods a couple of times (still very novice at all this, so I'm probably missing something obvious). Great walkthrough though!
I have a 4090 for local LAN stream gaming and have considered using it as an AI workhorse, but the energy it takes to run it on an always-on server is keeping me from doing so.
It really only uses power when you are processing, otherwise it's idle. Gaming probably uses more if you consider how long you play games vs. an AI task that lasts 5 seconds.
@@TechnoTimTinkers What's the idle power? I have been running my 3090 with ollama and docker on my gaming machine. But the idle power really worries me. The annual costs would be a lot.
@@sassan428 27:25 7W!
@@TechnoTimTinkers True. But when I'm not gaming it's turned off completely. Maybe I'll give it a shot and use my Kill A Watt to see what it idles at with the 7950X3D. Running it 24/7 seems kinda like a waste :) Thanks for the video! ✌
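If it helps, you can also sanity-check idle draw without a Kill A Watt as long as the NVIDIA driver is installed; something like:

```bash
# one-shot reading of GPU power draw
nvidia-smi --query-gpu=power.draw --format=csv
# or watch it over time alongside utilization
watch -n 2 nvidia-smi --query-gpu=power.draw,utilization.gpu --format=csv
```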
I think you need more GPU to run the large models, or fast NVMe? There will be more content along these lines - great to see you jumping in and escalating the open source AI violence.
You do, that's right! Right now you'd need (2) 3090s/4090s to do any kind of training or for larger models. That would give you 48GB, which could fit the 40 GB models.
Cool and all, but it still looks like something I can't do with the hardware available to me. My best GPU is in my gaming PC, a 6700 XT... a 3-year-old card. My server downstairs is from 11 years ago. It works great for the tasks I currently use it for.
Have you been able to get RAG working? Mine can't find the doc in the context.
What about VirGL? Would it still work with docker?
With the current prices for graphics cards, it's probably cheaper to pay the subscriptions.
Yaaaasss!! 👏👏👏
I'm having trouble with Whisper. Is anyone else getting a 502 Bad Gateway error? docker logs whisper shows "CRIT supervisor is running as root". That's the only error I'm getting.
Podman > Docker. There’s zero reason to give the container root privileges.
Were you able to achieve this with one GPU? I'm looking at duplicating your setup, but wanted to verify that you're using one GPU. Nm, once I turned the volume up, I got my answer. Had the volume down low as I'm watching this video at 4am trying not to wake up my wife. Lol
Great!
Could this run with multiple GPUs? Like 4x P40s?
Is it possible to run this on Windows, with a self-signed certificate, for local use only?
Sure, that should work.
Did he just call 256GB RAM regular??
For a server, yes.
Side quest unlocked!
I have been debating renaming this channel to "Techno Tim Side Quests" 😂
@@TechnoTimTinkers Bonus points: "Techno Tim Side Quests" becomes your third channel and all of the content is generated from your local LLM hahaha
Edit: YOU ALREADY HAVE THREE CHANNELS. Just subbed to Techno Tim Talks. Please let me know if you have more, I want the full collection
Do it!@@TechnoTimTinkers
Nice new name @@TechnoTimTinkers
If you don't have a beefy Nvidia GPU (3090+), you WILL be frustrated trying to run any AI locally. If you have an AMD GPU, well, it's possible, but it's a massive pain in the ass and the results will always be slower than with an Nvidia GPU. Nvidia has the AI space cornered thanks to its CUDA ecosystem.
Currently testing with Ollama on my Proxmox-box with a 10400F, it's a tad slow :P
I have my value pick in the description. A "cheap" GPU would speed it up tremendously.
I bet this guy does a killer Christopher Walken
gtx 3090?
Took me forever to find out that I had to enable web search on a chat by chat basis lol
Yeah, I think you can set it as the default, however that will slow down all of your non-web searches.
Every time you called the RTX 3090 a "GTX 3090" it just hurt a bit inside. But otherwise cool ideas, thank you.
Sorry! I have used GTX cards for 10 years and RTX for 2 🤐
Bro if you're hacked 😂 it's not gonna be pretty. Awesome video otherwise
The ollama/ollama:rocm Docker container works great on my RX 6700 XT with any model that fits within VRAM, at least the ones I've tested so far.
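For anyone else on AMD, the run command is roughly what the Ollama docs describe for ROCm (paths assume a standard Linux setup with the amdgpu driver loaded):

```bash
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```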
@TechnoTimTinkers I'm getting a "Permission denied" error with Tailscale - please help me fix it.
@TechnoTimTinkers Please help with a TrueNAS SCALE networking problem: ClusterIP vs. LoadBalancer and getting a fixed IP for apps in k8s. A guide would be great, please!
Hey, I don't use k8s in SCALE.
@@TechnoTimTinkers But the apps use Kubernetes with their own IP configuration, and given what you said recently about speed, I don't know how you handled the network administration. I need help.