After seeing this video I had to download and try this model myself (also running Open WebUI in Dockge, with Ollama in a separate LXC container on Proxmox and a 20GB Nvidia RTX 4000 Ada passed through). I was blown away by how accurately the pictures were recognized! Even the numbers shown on my electricity meter's display were identified correctly. Wow ... that is and will be fun to use more over the weekend ;-) Keep up the good work with these videos!
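For anyone wanting to copy that split setup, here is a minimal compose sketch for the Open WebUI side pointing at an Ollama instance running elsewhere; the address and port mapping are assumptions to adjust for your own LXC and network:

    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        ports:
          - "3000:8080"            # web UI exposed on host port 3000
        environment:
          # assumed address of the Ollama LXC; replace with your container's IP
          - OLLAMA_BASE_URL=http://192.168.1.50:11434
        volumes:
          - open-webui:/app/backend/data
        restart: unless-stopped

    volumes:
      open-webui: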
Wait, so your model was able to see the numbers on an LCD? I need to figure out what is going on with mine; I have 2 meter displays I need to log.
@DigitalSpaceport Yeah. No idea what I did differently or specifically 🤷 Looking at some logs might be a good idea, though I have no clue where they are or how verbose they might be.
Next time, try asking a new question in a new chat. Ollama uses a context size of 2k by default, and you are most probably exhausting it too quickly with pictures. The GPU's VRAM is also too low to accommodate a higher context size without flash attention or smaller quants than the default 4-bit you downloaded.
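If anyone wants to try that, one way is a custom Modelfile plus Ollama's flash attention switch; a sketch, where the model tag and the 8k size are assumptions to adjust to whatever your VRAM can hold:

    # Modelfile: same weights, larger context window
    FROM llama3.2-vision:11b
    PARAMETER num_ctx 8192

    # build the variant, then run the server with flash attention enabled
    ollama create llama3.2-vision-8k -f Modelfile
    OLLAMA_FLASH_ATTENTION=1 ollama serve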
Wow, cool. Thanks for the in-depth tests, helps a lot.
Hi. This is an awesome video showcasing Ollama on a 12GB GPU. I am currently using a 12GB 6750 XT. I still find the speed very usable with models in the 18-24 GB range.
Oh hey, a data point for AMD! Nice. Can I ask what tokens/s you hit on the 6750 XT? Any issues with Ollama, or does it "just work" out of the box?
@@DigitalSpaceport I had to add a couple of lines to the ollama.service file because the 6750 XT is not officially supported by ROCm, but other than that it works great. I have not measured the token rate; I will get back to you when I do. But I can say that with a 10600K and 32GB of DDR4-3600 it generates responses at a very comfortable reading pace, even when offloading a decent percentage to the CPU.
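For anyone else on an unsupported RDNA2 card, those "couple of lines" are typically an environment override in a systemd drop-in; the GFX version below is the commonly cited workaround for the 6700/6750 XT family, but treat it as an assumption and verify the gfx target for your card:

    # /etc/systemd/system/ollama.service.d/rocm.conf (sketch)
    [Service]
    Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

    # then reload and restart: systemctl daemon-reload && systemctl restart ollama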
Do AMD RX cards have good compatibility now? I am planning on an RX 7900 GRE for games and AI.
Or should I make the sacrifice and go for a 3060 16GB?
@@spagget The 7900 GRE is ROCm supported. You will have no issues with Ollama; it will work out of the box. Just install Ollama and go.
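On Linux, "install and go" usually comes down to the official install script; a sketch below, with the model tag an assumption, and it is worth checking the install output to confirm the ROCm/AMD build was picked up:

    curl -fsSL https://ollama.com/install.sh | sh
    ollama run llama3.2-vision:11b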
@@docrx1857 Thank you. Nvidia is pricey for me, and I want to try out AI stuff before I quit my gaming life.
Great video! Love anything AI related.
Sounds like maybe you'll be doing a compilation video here soon, but if not, or if it's going to be a while, maybe you should add the guide videos to a playlist. You have so much great content out there that it's hard to figure out which ones to watch if you're starting from scratch.
I hear this feedback, and it's tough since the things that are critical change fairly fast. I like the idea of segmenting the playlists by skill set and content type. Then during the intro I can point new folks to that playlist and keep those videos updated. Thanks. And yes, there is a new software guide video coming soon that I am working on right now.
The terminology 'in this picture' might mean it is looking for photographs within the image. Using the phrase 'what is shown in this image' would be more open-ended. It might classify 'picture' the same as 'painting'. Example: asking 'what is in this painting?' while showing the image of the cat and slippers. IDK, just a guess.
I like the way you are "testing" various combos. I'm an old guy progressively having hand issues after years of physical work/abuse, so I'm really interested in using "AI" as a solution for disabilities, as well as a Blue Iris/Home Assistant tie-in. I'm "researching" voice-to-text (conversational) as well as image recognition servers. It would be interesting to see speech-to-text asking/inputting the questions. I have a 3060 12GB and a 4000A to play with, so if you have the time/desire, I would be interested in seeing a dual-GPU setup with those two cards (so I don't have to). Also curious how they would perform in x8 (electrical) slots, and whether multiple models (voice, etc.) can run simultaneously.
They will perform just as well for inference in an x8 as in an x16; it's a low-bandwidth workload. For training that wouldn't hold true, however. Agreed, I need to do the voice video. It's pretty awesome and I use it often on my cellphone.
I'm a disabled vet myself. I just started working on an agentic framework I quit on back in 2004, but now it's being refactored for vets and the disabled. The problem is I'm on a fixed income and the software is failing from cascading failures caused by heat on my laptop. I wish I had the money for new hardware. I have all the modules working long enough to run the first couple of tests, but not long enough to put all the pieces together. All the pieces of the puzzle are available, but hardware will determine whether you get a working product or not. #1 lesson? All the Llamas are neutered and lobotomized and thus a waste of time. Quants only make it worse: cascading failures and hallucinations. open-interpreter for tool use, agent-zero for memory, and the OpenAI API/GPT-4o for best results until a decent local LLM comes out.
30:07 If you have the RAM you can always throw up a RAM disk and swap models out of CPU RAM and into VRAM much quicker than off a drive. A more advanced setup would use Memcached or Redis, but for something quick and dirty, RAM disk all day.
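A quick sketch of that quick-and-dirty approach with Ollama; the paths and size are assumptions, and note that a tmpfs is wiped on reboot, so keep the originals on disk:

    # create the RAM disk and stage the model files on it
    mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk
    cp -r /usr/share/ollama/.ollama/models /mnt/ramdisk/models

    # point Ollama at the RAM disk copy
    OLLAMA_MODELS=/mnt/ramdisk/models ollama serve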
Dude, genius! I didn't think about this. I personally have a server that has 192-ish GB and might use this method lol.
Redis/valkey sounds like a great option for this!
@@DigitalSpaceport Yup. I use a similar approach for swapping datasets in and out of VRAM during fine-tuning, and have even put my whole RAG in VRAM via lsync (it works, but no way I would put it in production professionally), and that definitely helped speed things up quite a lot.
Did you give it multiple images and try to retrieve the correct one with your query? That would be an interesting experiment. I wonder how many images it can handle at most. Thanks for your series, btw.
Thank you for your video. Could you please tell me if you have tested this configuration on the Llama-3.1-8B or Llama-3.2-3B text models? It would be interesting to know the performance figures (tokens/sec) from your tests 🤔.
Thanks, nearly my setup! Did you go with PCI passthrough to a VM or to an LXC? The card is pretty good for daily tasks and fairly low power consumption. Also, 3.2 Vision is really good at the moment for what I use it for; mine takes about 170W at full load though 😅
So in this demo I went with the VM and passthrough since it "just works" with no cgroups funkiness, but in a stable system I always go with LXC. Plus you can use the card for other tasks, but if it crashes out of VRAM while running a lot of tasks it doesn't recover gracefully. I need to figure that out, but yeah, 3.2 Vision is wild stuff.
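For reference, the cgroups side of LXC GPU passthrough usually comes down to a few lines in the container config. A sketch for an Nvidia card on Proxmox, where the device major numbers (195, and especially the nvidia-uvm one) are assumptions you should confirm with ls -l /dev/nvidia* on the host:

    # /etc/pve/lxc/<ctid>.conf additions (sketch)
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 508:* rwm
    lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file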
Interesting build. Funny you made this video not too long after I recycled a bunch of them. It would be nice if people found more uses for stuff older than 8th gen. These older machines are still perfectly usable.
I'm testing out a Maxwell card this weekend, an M2000. I bet it's going to surprise me!
@DigitalSpaceport It would be interesting to see a functional ultra-budget build. Curious how much cheaper than this setup you could get. The Dell T3600 with the 625W PSU is really cheap now.
The power pin for the GPU tends to dictate things, I have found, and is a must to get enough VRAM cheaply. A strong contender that is even cheaper could be based on an HP workstation-class box, but wow, I do not like their BIOS at all. I have a note that says so taped to my second monitor in case I forget, but it could bring costs down. I think 7th-gen Intel is the desirable cutoff, as that iGPU handles the vast majority of the offload needed to make a decent AIO media center box as well. Does a T3600 have a 6-pin power connector?
@@DigitalSpaceport The T3600 has two 6-pin connectors if it is the 635W config. The 425W config doesn't support powered GPUs, though. There can also be some clearance issues depending on the GPU design. Looks like they bring the same price as the 3620, though, so it might not be worth pursuing.
Can you do an AMD test with a 7900 variant? I feel that's more affordable and realistic when it comes down to the $-to-VRAM ratio.
The number of requests I am getting to test AMD GPUs does have me strongly considering buying a used one to find out. I had a friend who was going to lend me one, but then they sold it. Possibly testing this out soon.
I completely understand. I would love to see an AMD build so that we don’t have to offer our kidneys to the Nvidia gods.
@@alcohonis 7900 XTX, accept no substitute. OK, maybe a W7900. Or an Instinct.
Can you use multiple 3060s for this? I mean, does it support memory pooling, or is the model limited by the VRAM capacity of a single GPU? Sorry if this is a dumb question, but this is not my field (and in 3D rendering with CUDA and OptiX you can't pool memory on consumer-grade cards).
I would guess, based on the fact that if you ask multiple things the LLM processes them all at once, that the vision side works the same way: it doesn't read left to right or right to left, but processes the entire sentence all at once. 29:14
Okay, but check this out. It says -80 at first, but that screen would look like that if read RTL. The '-' is a lowercase watt. It's 08 watts on the screen. I'm testing the big one today, so I will investigate it further.
18:02 I thought it might be referring to the F-connector and not registering the white Cat-6 cable at all. Maybe try again using a Cat-6 in a contrasting color...
Good point! I am also now convinced it is reading RTL and not LTR on LCD screens, which is weird.
I have an old Dell i7-4760 that I could try pairing with a 3060 12GB. I have run Llama 3 on just an i5-13600K and it was usable, but a little slow.
Was it the new llama3.2-vision 11b? What tokens/s did you get?
What was the inference speed for text gen? Can you ask it to write a 500-word story and check the llama stats?
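If you want those numbers straight from Ollama, the --verbose flag on ollama run prints timing stats after each reply; the model tag here is an assumption:

    # after each response this prints total duration, prompt eval rate,
    # and eval rate (generation speed in tokens/s)
    ollama run llama3.2-vision:11b --verbose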
Can you look into running multiple Mac Mini M4s in a cluster? Using exo, for example?
That is an expensive ask, and unfortunately this YT channel earns, let's say, not even remotely close to enough to have an R&D budget, which would be in the 10K range for a quad setup. I have a hard time even getting people to subscribe, much less sign up for a membership or anything.
Interesting
LoL, now you're speaking my language! Until 48GB VRAM cards under $1000 become a thing, anyway 😀
Yeah, this 3060 is pretty sweet. I wish there were a cheap 12 or 16GB VRAM slot-powered card, but maybe in a few years. 20 t/s is totally passable, and the base model this is strapped to is pretty decent.
@@DigitalSpaceport Yes, and affordable too. Sad there is no 16GB version for just a little more. The price gap between 12GB and 24GB is just insane if the card is only used for AI.
Could you please test this build with the localGPT Vision GitHub repo? It has several vision models to test with, and seeing how each model performs on RAG with a build like this could be really interesting, because this kind of RAG is really different: instead of image-to-text-to-vector, this system goes straight image-to-vector. A different architecture.
I'm looking at this now, and I like the idea of fewer steps in RAG. Img2txt getting the boot would be awesome.
@DigitalSpaceport Awesome, glad to know you are into the concept of "image to vector" instead of "image to text to vector". I believe that in the future, having a model that can handle both without losing speed on consumer hardware would be game-changing, since both architectures have their pros and cons. Thanks for your videos, mate.
Yeah, I do like the concept, and having been a long-time user of unpaper/tesseract, those are indeed extra steps that would be ideal to avoid.
I would love to see the same test on fp16 or fp32. Not sure if it's going to give more accurate responses.
I do plan to test the 90b-instruct-q8_0, which is 95GB (4x 3090 is going to be close), and the 11b-instruct-fp16 is only 21GB, so I might give that a roll also. I think the Meta Llama series of models caps out at fp16, or am I overlooking something?
I wonder how the Q8 version would do in these tests. *Should* be better.
I do plan on testing the q8 in the 90B, so we should get a decent hi-lo gradient. If the difference is significant I will revisit for sure.
I don't understand what you all use these models for. Can someone maybe explain to me what the benefit is?
It's a shame Pixtral is not on Ollama; it's also a bigger model.
I agree, but I think there is a way to make it work with Ollama's new Hugging Face model support. You would need to manually kick that off, but I think it could work.
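The manual kick-off being referenced is presumably Ollama's run-from-Hugging-Face syntax for GGUF repos; a sketch, where the repo path is a placeholder and whether Pixtral's vision architecture is actually supported this way is not guaranteed:

    ollama run hf.co/<username>/<gguf-repo>:Q4_K_M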
I understand the appeal of the 3060, but why does everyone ignore the 4060 Ti 16GB?
I'm not ignoring it myself; at MSRP it's a rather good card. I just can't afford to buy one of everything, so that's why it's not reviewed here.
Hi, will you try llamafile on a Threadripper CPU, not a GPU? They say it's really fast. i5/i7 10th gen or Ryzen 5/6 5th gen: better price per watt.
Nobody should give their AI data or research to big providers. Keep your data local.
Fully agree! The data they collect on us, on top of the paid-for services, is absolutely silly.
Fully local, yes, but be careful with APIs too; some models still send data.
These "vision" models are so bad and unreliable for anything. They need to be way more specialized and fed far more samples to be of any value. Spatial relationships are completely wrong. Blob classification/recognition is weak. I don't see any use for this beyond very, very basic tasks. I don't even know if any of this can be put into production due to the unreliability.
I am about to start testing out the big one here and hope for a lot of improvement. I just want to be able to read a very clear LCD, which seems like it should be a small hurdle.
Yeah, $350 + GPU lol. Stupid clickbait.
No. It is $350 including the 3060 12GB GPU, and I don't clickbait like that. I do clickbait, of course, but not with an outright lie like you are stating.
❤❤have a happy weekend brother and followers !!!
❤❤❤first comment .
Love your videos, man.
Love from Sweden.
Thank you for your video. I will share it with other people and other work organizations and put you on our list as a preferred content provider for those who want to do it themselves. Again, thank you for your video. It is so easy to follow, and you're very detailed in the explanation of both the application deployment and the hardware configuration.
Oh thank you I appreciate that a lot.