Hi! I'm one of the contributors to the Open WebUI project, thanks for the mention!
Thanks for contributing to such an awesome project!
This guy just knows it all! I've been watching all of his videos for almost two years and finally networking and hypervisors are getting easy. I just needed this detailed discussion about personalised AI. ❤
Thanks for covering this so in-depth, Tim. I've been working on a similar project based on a Dell R730 with 2x Tesla P40 cards... much more power hungry than your setup, but it will spit out answers from 70B models and complex Stable Diffusion workloads about as fast as the small models generate on a 3090. I was running into some issues getting things integrated in the same stack, so it's great to see how you've put this all together... I have some rebuilding to do.
What CPUs are you using with yours? Do you think I'll have any issues with the dual 2660 v4s, or should I go with ones that have a higher single-core clock speed? I'll be using a single RTX 3060 for now since I already have it. Small models, of course.
Well there's not a chance that I'll recreate this beast of a configuration but it was a fun watch! Thank you
Stoked that you put this together. Thanks Tim!
Thanks for another great video!
For anyone else trying to get this working without an NVIDIA card and getting "Found no NVIDIA driver on your system" errors: you can add --cpu to the CLI_ARGS in the compose file (CLI_ARGS=--cpu, yes, it's slow) and, obviously, comment out all of the nvidia/deploy blocks.
I also had to add json to the formats: in searxng/settings.yml to avoid a 403 error w/ web search
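In case it helps, here's a rough sketch of what the --cpu tweak looks like in the compose file. The service name and layout below are examples based on an AbdBarho-style stable-diffusion-webui-docker setup, not necessarily Tim's exact file:

```yaml
# sketch only: enable CPU mode and drop the GPU reservation for the Stable Diffusion service
stable-diffusion-webui:            # example service name
  environment:
    - CLI_ARGS=--cpu               # yes, it's slow
  # deploy:                        # comment out the whole NVIDIA reservation block
  #   resources:
  #     reservations:
  #       devices:
  #         - driver: nvidia
  #           count: 1
  #           capabilities: [gpu]
```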
Thank you!
Wow, Tim. I'm waiting for the CPU and CPU cooler, the last pieces of my new AI machine, to arrive. After watching this video I feel like my eyes popped open and "whoa, I know AI stack kung fu!" Thanks for doing this.
I was your 1000th like. Great video
Thank you!
Great video and awesome guide! I actually got it all working except Whisper, which I wasn't all that worried about, and Home Assistant, which I don't use. But having ollama, webui and stable diffusion running on my own PC with a GPU is a game changer! Thank you!
nice work!!!
Oh baby! Time to put some of my Tesla GPUs to work 🙂
Heck yes!
Would definitely be interested in a similar video using Tesla GPUs, especially if you have multiple generations' worth of Tesla GPUs and can "benchmark" the best bang-for-buck option in the lineup: fast enough to be acceptable, but not so expensive that it's out of the price range for some homelabbers, or that they'd be better off just buying a newer consumer-grade GPU.
Looking forward to a vid on this!
Would love to see how the Tesla GPUs handle Text2Image with Stable Diffusion using Open WebUI! Running an LLM and image generation is taxing on a 6 GB card... in theory a 24 GB card would be fantastic, but I'd love to see some data on it.
I'm always surprised when everything just works... Kudos & thanks for the support notes... Great job
What a great video! Thanks for sharing your local, private, self-hosted AI stack! I appreciate the effort and expertise you brought to creating a tutorial on replicating this setup. I'm excited to try it out and learn from your experience.
Thank you!
Thank you for the video, a very nice guide.
One potential thing to play around with is quantization of the models: you can find one that is less quantized but still fits in memory. For example, with gemma2 27b the default tag gets you `q4`, which takes 16 GB, but you could grab `27b-instruct-q6_K`, which takes ~21 GB and perhaps gives slightly better results. Of course, you then have less space left to host the models for the other services like Stable Diffusion or Whisper. You need to click on Tags on the ollama website when picking the model size to see the full list (example commands below).
Another nice potential addition to the stack could be `openedai-speech` to handle text-to-speech. It can be integrated with Open WebUI. Not a must-have, but it complements the stack nicely IMHO.
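If anyone wants to try the quantization tip above, pulling a specific quant is just a different tag on the pull command. The tags below are examples; check the model's Tags page on the ollama site for the current list:

```bash
# default tag resolves to a q4 build (~16 GB for gemma2 27b)
ollama pull gemma2:27b

# less-quantized q6_K build (~21 GB), tag copied from the Tags page
ollama pull gemma2:27b-instruct-q6_K
ollama run gemma2:27b-instruct-q6_K
```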
Thank you so much for all of the work you put in to making this amazing walkthrough and all of the documentation to go with it!!!
You *could* run all the AI stuff in an LXC. You can pass the GPUs through to the LXC. The way I figured out how to do it is by mapping the devices in the LXC config and then installing (the same!!) drivers on the host and in the LXC. You can actually share the GPUs this way if you wanted.
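Rough sketch of the kind of mapping I mean, for anyone curious. The device major numbers and paths vary per system, so check `ls -l /dev/nvidia*` on your host before copying anything:

```
# /etc/pve/lxc/<id>.conf (example values only)
lxc.cgroup2.devices.allow: c 195:* rwm   # /dev/nvidia0, /dev/nvidiactl
lxc.cgroup2.devices.allow: c 509:* rwm   # /dev/nvidia-uvm* (major number differs per host)
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

Then, if you use the NVIDIA .run installer inside the LXC, run it with --no-kernel-module so it only installs the userspace libraries and matches the host's kernel module.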
I wouldn't advise LXC to anyone. I used it for a long time and, when it works, it works well. Overall it's a pain compared to Docker/Podman.
Awesome, long videos like this are very helpful for better understanding. I find it very engaging. Thanks brother ❤
Thank you! I appreciate you noticing! Glad it helped!
Dad: we have ai at home
Ai at home:
Thank you for posting this! I'm going to give it a try this week :D
Let me know how it goes! Full documentation too in the description!
Thank you so much for this!!
You're welcome!
I hear Raid Owl challenged you to a mini rack challenge. I would love to see a portable build designed to be an extension of my home lab wherever I happen to be.
I didn't think he was going to go that hard...
This is everything I want to do! Many thanks!!! 🙏
Thank you so much for this video!
At 31:37, this is how mine runs because I have a small GPU.
Awesome video. I am trying to run Ollama in Kubernetes, but now I think it will be easier to run it as Docker Swarm.
I was going to go that route too but when you consider models are multiple gigs I didn't want to deal with Kubernetes storage for that.
Surprisingly, I have Ollama running (with Dolphin-phi) on my 3 node, CPU only, Mini PC (16 core, 32GB RAM) K3s nodes :)
I am so invested in the Kubernetes space that it made more sense for me, with the trade-off that my models run SLOW. I am OK with this though. I also love Open WebUI; it works fantastically in a K8s environment.
Cool to see the full setup though, thanks for sharing!
That sounds awesome! I did debate making this VM a Kubernetes node and using node selectors to pin workloads to the node with the GPU, but I also didn't want to have to deal with storage since the models are so huge.
@@TechnoTimTinkers That was my plan as well (once I get some hardware for it), but doing Docker stacks was definitely a great alternative, and I totally agree with your point that having a central “AI” box is super useful.
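For anyone who does go the Kubernetes route, the node-pinning Tim described is roughly a nodeSelector plus a GPU resource request. The names below are made up, and it assumes the NVIDIA device plugin is running on the GPU node:

```yaml
# label the GPU node first, e.g.:  kubectl label node gpu-node-01 accelerator=nvidia
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        accelerator: nvidia        # pins the pod to the labeled GPU node
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1    # requires the NVIDIA device plugin on that node
```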
This is too cool! Thanks for doing this!!!
Thanks mate! Well explained. Deploying right away.
I got an MLLSE RTX 3060 with 12GB of VRAM on AliExpress for the sole purpose of running AI models and it's been great. Very cheap card, but solid CUDA compute capability, and 12GB of VRAM is more than enough for Llama 3 (7B), and with quantized models I can really take advantage of it. Power consumption is ridiculous though. If you use AI a lot, figure out a way to heat your home with compute lol.
If money wasn't an issue I wouldn't even go for a 3090/4090 with 24GB of VRAM, but one of the newer workstation cards. Wendell at Level1 did a good video going over the AI performance of the Ada-generation RTX 4000 SFF, and it was a little slower but used only a fraction of the power. Two of those get you 40GB of VRAM at 140W max, which is nuts.
The 3060 12GB is definitely the value play! (Links in description.) The MSI is the one I recommend to anyone who wants to run a stack like this (and more). I have 24GB, but it's a weird size: not big enough for the large models, but too big for the small ones.
Or just wait for the RTX A6000 (not the latest Ada one); it's the same architecture as the RTX 3000 series, but gives you 48GB of ECC VRAM on its own, and it's fairly power efficient too (no overhead of two GPUs waiting on their VRAM buffers).
If you want a cheaper card that works well for this and has the VRAM of an RTX 3090, there's the Tesla K80 24GB model; they go for $45 on eBay.
How well? Compared to the 3090 using 12GB?
Hahaha, what luck: I'd just seen your comment on "Self-Hosted AI That's Actually Useful" saying this video was coming soon, headed over to this channel, and saw it was uploaded 1 minute ago.
Great timing!
Lovely video, really looking forward to getting stuck into it, but did you run into any issues? I've been following along from your web article and the default compose file, as I don't use Traefik, but web search just complains about limiter.toml missing. I tried adding the default file from their site with no change, and when trying to do a web search it just gives me 'Expecting Value'. The compose file has SearXNG exposed on 8081, so I changed the SearXNG URL from 8080 to 8081, but that gives a 403 error. The URL references the server IP address for ease.
That, and aistack_stable-diffusion-download_1 keeps stopping with no obvious errors.
I'm very new to this, so any help is appreciated, thanks.
I had to drop the Whisper section from your compose file and replace it with the one from the repo's compose file. Without any other changes, Whisper no longer threw a 500 Internal Error and translate worked fine.
Yeah I run 6x RTX A6000 (Ampere generation, same as the 30 series consumer cards) in 2 GPU nodes in my homelab. I don't train, but I do have a bunch of agents, and some automations that run a lot, so parallel compute of AI models is important enough to have spent a mid-sized car's worth on GPUs. EDIT: I'd also suggest gemma2:27b from ollama, it's a great model, better than llama3:8b in my testing (and in some, better than llama3:70b ... i can run both).
Nice! SO jealous of the A6000!
Note: A decent or better GPU is required on the host machine; otherwise, the CPU will be constantly overworked.
For sure. Listed a few budget friendly ones in the description. I really hope Intel steps it up soon. Would be awesome to have decent AI on an Intel chip and just use system RAM.
Stable Diffusion 3 so far kinda sucks for things like body parts. StabilityAI is supposed to be releasing an updated model in the "coming weeks", they said, but it will most likely be longer than that from what it sounds like. SDXL is a lot better for generating human body parts, or at least it's easier to get them right, in my experience. At this point, I don't see any reason to use SD3 over SDXL.
Thanks for the tutorial. I'm still newish to Docker, so your examples have helped me understand a lot better. Currently I just run Ollama, SearXNG, and SD WebUI Forge in a Python venv on my main desktop with a 3090 Ti, since my Proxmox server doesn't have a GPU yet. I've been waiting for the 50-series cards to come out before I get another GPU.
I found out about SearXNG from your other channel's video. Makes me wonder if paying for a private search engine, Kagi, is still worth it or not.
Thanks for the tips. I am new to SD so I will take your word for it. If you have something running with Proxmox in containers, that's good enough! Docker is really great though for dependencies and even updating the image: just a quick docker pull and you get the latest version. SearXNG is pretty cool too. I need to explore it some more.
I just loaded this up and was getting a 403 Forbidden error when trying web search. I needed to modify SearXNG's settings.yml file to add "- json" under "- html" in the search > formats area.
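For reference, the relevant bit of searxng/settings.yml ends up looking something like this (the rest of the file stays untouched):

```yaml
search:
  formats:
    - html
    - json    # needed so Open WebUI's web search can parse results instead of getting a 403
```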
Maybe a silly tip, but I struggled to make things work and the behavior was inconsistent. Setting the VM's processor type to host solved my problems.
Ah, yeah, that's a good call. I always do that so that VMs just inherit their capabilities from the host!
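For anyone else hitting this, that's the VM's Hardware > Processors > Type setting in the Proxmox UI, or roughly this from the host shell (the VM ID below is a placeholder):

```bash
qm set 100 --cpu host   # 100 is an example VM ID; takes effect after the VM is restarted
```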
Great video! How do you update the stack? git pull and docker compose pull or?
That's right, just pull! Soon I hope to have some CI with Docker stacks so I don't have to SSH in anymore.
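In case anyone wants the exact commands, the update flow is roughly this from the stack's directory (assuming Docker Compose v2 and that you cloned the repo):

```bash
git pull                 # grab any compose/config changes
docker compose pull      # pull newer images
docker compose up -d     # recreate only the containers that changed
```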
I use Podman with whisper/piper/openwakeword in Home Assistant on an old Dell workstation I picked up on eBay for £180 (Xeon and 1070 Ti).
Using the distil medium en model for Whisper, voice control takes around 2 seconds.
The biggest issue, in my opinion, is the wake-word satellites you would need in a medium-sized home. I wonder if you have any experience with this, Tim?
Great video and very detailed article! I’ve been trying to set up the same stack but on an AMD RX 6750 XT, running ROCm 6.2, with Ubuntu 22.04 (and even tried 24.04).
I’ve successfully set up Docker and Portainer, configured my environment, and managed to get ROCm installed as per the AMD documentation. However, I’m running into issues with getting the AI stack (specifically Stable Diffusion WebUI) to work smoothly on the AMD hardware. I’ve tried using Dockerfiles from both the AUTOMATIC1111 and AbdBarho repositories, but adapting them for ROCm and AMD GPUs is where things get tricky.
It feels like I’m just a Docker image or software update away from getting this working properly. Any tips on how to resolve this or anyone who’s managed to get a similar setup working on AMD hardware?
Thanks for all the hard work you put into this content. It’s been super helpful so far!
I did a similar setup with Debian and ROCm containers for an AMD APU.
That’s nice. Could you share your setup?
We need more info on the Traefik setup, like the setup from the beginning.
I have a whole video on it here on the main channel ua-cam.com/video/n1vOfdz5Nm8/v-deo.html
Can you list some business cases for this? Great setup, and I'm sure it's cool for tinkering around, but is it a financially viable thing to, say, offer services privately?
Hey Tim, can you build a project based on Google Coral and Docker?
Funny, I just set up a Docker Compose file for Ollama and SillyTavern last week. I want to add MemGPT in there as well. Maybe you could expand this with MemGPT?
Could you suggest and recommend AI setups (not gaming!) to run SD and ComfyUI, or even for "building" LLMs, for 3 different budgets? Low, medium, and high, as in $1-2k, $2-4k, and $4-10k? That would actually be a great video...
If I could find someone to sponsor the builds 😂.
Hello Tim, I’m new to this, not computer savvy, where do you suggest I start to learn? Thanks
Why was a VM chosen instead of CT?
Great question! Primarily because I want complete isolation and I don't want to run Docker inside of a CT
This is very interesting, I'm going to play with it. Let's see how the Mac behaves with those prompts.
Ever been to a festival at Harmony Park in Minnesota?
An overview of what a local AI can actually do would be great lol. Because I don't want to jump into it without knowing the final goals 😂
ua-cam.com/video/GrLpdfhTwLg/v-deo.html
@@TechnoTimTinkers thanks man!
What is the power consumption of this thing?
For the life of me I can't get the NVIDIA drivers to take. I've tried both methods a couple of times (still very novice at all this, so I'm probably missing something obvious). Great walkthrough though!
I have a 4090 for local LAN stream gaming and have considered using it as an AI workhorse, but the energy it takes to run it on an always-on server is keeping me from doing so.
It really only uses power when you are processing, otherwise it's idle. Gaming probably uses more if you consider how long you play games vs. an AI task that lasts 5 seconds.
@@TechnoTimTinkers What's the idle power? I have been running my 3090 with ollama and docker on my gaming machine. But the idle power really worries me. The annual costs would be a lot.
@@sassan428 27:25 7W!
@@TechnoTimTinkers True. But when I'm not gaming it's turned off completely. Maybe I'll give it a shot and use my Kill A Watt to see what it idles at with the 7950X3D. Running it 24/7 seems kinda like a waste :) Thanks for the video! ✌
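If it helps, you can also sanity-check idle draw without a Kill A Watt as long as the NVIDIA driver is installed; something like:

```bash
# one-shot reading of GPU power draw
nvidia-smi --query-gpu=power.draw --format=csv
# or watch it over time alongside utilization
watch -n 2 nvidia-smi --query-gpu=power.draw,utilization.gpu --format=csv
```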
I think you need more GPU to run the large models, or fast NVMe? There will be more content along these lines - great to see you jumping in and escalating the open source AI violence.
You do, that's right! Right now you'd need (2) 3090s/4090s to do any kind of training or for larger models. That would give you 48GB, which could fit the 40 GB models.
Cool and all, but it still looks like something I can't do with the hardware available to me. My best GPU is in my gaming PC, a 6700 XT... a 3-year-old card. My server downstairs is from 11 years ago. It works great for the tasks I currently use it for.
Have you been able to get RAG working? Mine can't find the doc in the context.
What about VirGL? Would it still work with docker?
With the current prices for graphics cards, it's probably cheaper to pay the subscriptions.
Yaaaasss!! 👏👏👏
I'm having trouble with Whisper. Is anyone else getting a 502 Bad Gateway error? docker logs whisper shows "CRIT supervisor is running as root". That's the only error I'm getting.
Podman > Docker. There’s zero reason to give the container root privileges.
Were you able to achieve this with one GPU? I'm looking at duplicating your setup, but wanted to verify that you're using one GPU. Nm, once I turned the volume up, I got my answer. Had the volume down low as I'm watching this video at 4am trying not to wake up my wife. Lol
Great!
Could this run with multiple GPUs? Like 4x P40s?
Is it possible to run this on Windows, with a self-signed certificate, for local use only?
Sure, that should work.
Did he just call 256GB RAM regular??
For a server, yes.
Side quest unlocked!
I have been debating renaming this channel to "Techno Tim Side Quests" 😂
@@TechnoTimTinkers Bonus points: "Techno Tim Side Quests" becomes your third channel and all of the content is generated from your local LLM hahaha
Edit: YOU ALREADY HAVE THREE CHANNELS. Just subbed to Techno Tim Talks. Please let me know if you have more, I want the full collection
Do it!@@TechnoTimTinkers
Nice new name @@TechnoTimTinkers
If you don't have a beefy Nvidia GPU (3090+), you WILL be frustrated trying to run any AI locally. If you have an AMD GPU, well, it's possible, but it's a massive pain in the ass and the results will always be slower than with an Nvidia GPU. Nvidia has the AI space cornered thanks to its CUDA ecosystem.
Currently testing with Ollama on my Proxmox-box with a 10400F, it's a tad slow :P
I have my value pick in the description. A "cheap" GPU would speed it up tremendously.
I bet this guy does a killer Christopher Walken
gtx 3090?
Took me forever to find out that I had to enable web search on a chat by chat basis lol
Yeah, I think you can set it as the default, however that will slow down all of your non-web searches.
Every time you called the RTX 3090 a "GTX 3090" it just hurt a bit inside. But otherwise cool ideas, thank you.
Sorry! I have used GTX cards for 10 years and RTX for 2 🤐
Bro if you're hacked 😂 it's not gonna be pretty. Awesome video otherwise
The ollama/ollama:rocm Docker container works great on my RX 6700 XT with any model that fits within VRAM, at least the ones I've tested so far.
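For anyone else on AMD, the run command is roughly what the Ollama docs describe for ROCm (paths assume a standard Linux setup with the amdgpu driver loaded):

```bash
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```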
@TechnoTimTinkers I'm getting a "Permission denied" error with Tailscale - please help me fix it.
@TechnoTimTinkers Please help with a TrueNAS SCALE networking problem: ClusterIP vs. LoadBalancer and getting a fixed IP for apps in k8s. A guide would be great, please!
Hey, I don't use k8s in SCALE.
@@TechnoTimTinkers But the apps use Kubernetes with their own IP configuration, and given what you said recently about speed, I don't know how you handled the network administration. I need help.