After seeing this video I had to download and try this model myself (also running Open WebUI in Dockge, with Ollama in a separate LXC container on Proxmox and a 20GB Nvidia RTX 4000 Ada passed through). I was blown away by how accurately the pictures were recognized! Even the numbers shown on my electricity meter's display were identified correctly. Wow ... that is and will be fun to use more over the weekend ;-) Keep up the good work with these videos!
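For anyone wanting to copy that split setup, here is a minimal compose sketch for the Open WebUI side pointing at an Ollama instance running elsewhere; the address and port mapping are assumptions to adjust for your own LXC and network:

    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        ports:
          - "3000:8080"            # web UI exposed on host port 3000
        environment:
          # assumed address of the Ollama LXC; replace with your container's IP
          - OLLAMA_BASE_URL=http://192.168.1.50:11434
        volumes:
          - open-webui:/app/backend/data
        restart: unless-stopped

    volumes:
      open-webui: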
Wait, so your model was able to see the numbers on an LCD? I need to figure out what is going on with mine; I have 2 meter displays I need to log.
@DigitalSpaceport Yeah. No idea what I did differently or specifically 🤷 Looking at some logs might be a good idea, though I have no clue where they are or how verbose they might be.
Next time, try asking a new question in a new chat. Ollama uses a context size of 2k by default, and you are most probably exhausting it too quickly with pictures. The GPU's VRAM is also too low to accommodate a higher context size without flash attention or smaller quants than the default 4-bit you downloaded.
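If anyone wants to try that, one way is a custom Modelfile plus Ollama's flash attention switch; a sketch, where the model tag and the 8k size are assumptions to adjust to whatever your VRAM can hold:

    # Modelfile: same weights, larger context window
    FROM llama3.2-vision:11b
    PARAMETER num_ctx 8192

    # build the variant, then run the server with flash attention enabled
    ollama create llama3.2-vision-8k -f Modelfile
    OLLAMA_FLASH_ATTENTION=1 ollama serve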
Wow, cool. Thanks for the in-depth tests, helps a lot.
Hi. This is an awesome video showcasing Ollama on a 12GB GPU. I am currently using a 12GB 6750 XT. I still find the speed very usable with models in the 18-24 GB range.
Oh hey, a data point for AMD! Nice. Can I ask what tokens/s you hit on the 6750 XT? Any issues with Ollama, or does it "just work" out of the box?
@@DigitalSpaceport I had to add a couple of lines to the ollama.service file because the 6750 XT is not officially supported by ROCm, but other than that it works great. I have not measured the token rate; I will get back to you when I do. But I can say that with a 10600K and 32GB of DDR4-3600 it generates responses at a very comfortable reading pace, even when offloading a decent percentage to the CPU.
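For anyone else on an unsupported RDNA2 card, those "couple of lines" are typically an environment override in a systemd drop-in; the GFX version below is the commonly cited workaround for the 6700/6750 XT family, but treat it as an assumption and verify the gfx target for your card:

    # /etc/systemd/system/ollama.service.d/rocm.conf (sketch)
    [Service]
    Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

    # then reload and restart: systemctl daemon-reload && systemctl restart ollama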
Do AMD RX cards have good compatibility now? I am planning on an RX 7900 GRE for games and AI.
Or should I make the sacrifice and go for a 3060 16GB?
@@spagget The 7900 GRE is ROCm supported. You will have no issues with Ollama; it will work out of the box. Just install Ollama and go.
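On Linux, "install and go" usually comes down to the official install script; a sketch below, with the model tag an assumption, and it is worth checking the install output to confirm the ROCm/AMD build was picked up:

    curl -fsSL https://ollama.com/install.sh | sh
    ollama run llama3.2-vision:11b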
@@docrx1857 Thank you. Nvidia is pricey for me, and I want to try out AI stuff before I quit my gaming life.
Great video! Love anything AI related.
Sounds like maybe you'll be doing a compilation video here soon, but if not, or if it's going to be a while, maybe you should add the guide videos to a playlist. You have so much great content out there that it's hard to figure out which ones to watch if you're starting from scratch.
I hear this feedback, and it's tough since the things that are critical change fairly fast. I like the idea of segmenting the playlists by skill set and content type. Then during the intro I can point new folks to that playlist and keep those videos updated. Thanks. And yes, there is a new software guide video coming soon that I am working on right now.
The terminology 'in this picture' might mean it is looking for photographs within the image. Using the phrase 'what is shown in this image' would be more open-ended. It might classify 'picture' the same as 'painting'. Example: asking 'what is in this painting?' while showing the image of the cat and slippers. IDK, just a guess.
I like the way you are "testing" various combos. I'm an old guy progressively having hand issues after years of physical work/abuse, so I'm really interested in using "AI" as a solution for disabilities, as well as a Blue Iris/Home Assistant tie-in. I'm "researching" voice-to-text (conversational) as well as image recognition servers. It would be interesting to see speech-to-text asking/inputting the questions. I have a 3060 12GB and a 4000A to play with, so if you have the time/desire, I would be interested in seeing a dual-GPU setup with those two cards (so I don't have to). Also curious how they would perform in x8 (electrical) slots, and whether multiple models (voice, etc.) can run simultaneously.
They will perform just as well for inference in an x8 as in an x16; it's a low-bandwidth workload. For training that wouldn't hold true, however. Agreed, I need to do the voice video. It's pretty awesome and I use it often on my cellphone.
I'm a disabled vet myself. I just started working on an agentic framework I quit on back in 2004, but now it's being refactored for vets and the disabled. The problem is I'm on a fixed income and the software is failing from cascading failures caused by heat on my laptop. I wish I had the money for new hardware. I have all the modules working long enough to run the first couple of tests, but not long enough to put all the pieces together. All the pieces of the puzzle are available, but hardware will determine whether you get a working product or not. #1 lesson? All the Llamas are neutered and lobotomized and thus a waste of time. Quants only make it worse: cascading failures and hallucinations. open-interpreter for tool use, agent-zero for memory, and the OpenAI API/GPT-4o for best results until a decent local LLM comes out.
30:07 If you have the RAM you can always throw up a RAM disk and swap models out of CPU RAM and into VRAM much quicker than off a drive. A more advanced setup would use Memcached or Redis, but for something quick and dirty, RAM disk all day.
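A quick sketch of that quick-and-dirty approach with Ollama; the paths and size are assumptions, and note that a tmpfs is wiped on reboot, so keep the originals on disk:

    # create the RAM disk and stage the model files on it
    mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk
    cp -r /usr/share/ollama/.ollama/models /mnt/ramdisk/models

    # point Ollama at the RAM disk copy
    OLLAMA_MODELS=/mnt/ramdisk/models ollama serve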
Dude, genius! I didn't think about this. I personally have a server that has 192-ish GB and might use this method lol.
Redis/valkey sounds like a great option for this!
@@DigitalSpaceport Yup. I use a similar approach for swapping datasets in and out of VRAM during fine-tuning, and have even put my whole RAG in VRAM via lsync (it works, but no way I would put it in production professionally), and that definitely helped speed things up quite a lot.
Did you give it multiple images and try to retrieve the correct one with your query? That would be an interesting experiment. I wonder how many images it can handle at most. Thanks for your series, btw.
Thank you for your video. Could you please tell me if you have tested this configuration on the Llama-3.1-8B or Llama-3.2-3B text models? It would be interesting to know the performance figures (tokens/sec) from your tests 🤔.
Thanks, nearly my setup! Did you go with PCI passthrough to a VM or to an LXC? The card is pretty good for daily tasks and fairly low power consumption. Also, 3.2 Vision is really good at the moment for what I use it for; mine takes about 170W at full load though 😅
So in this demo I went with the VM and passthrough since it "just works" with no cgroups funkiness, but in a stable system I always go with LXC. Plus you can use the card for other tasks, but if it crashes out of VRAM while running a lot of tasks it doesn't recover gracefully. I need to figure that out, but yeah, 3.2 Vision is wild stuff.
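For reference, the cgroups side of LXC GPU passthrough usually comes down to a few lines in the container config. A sketch for an Nvidia card on Proxmox, where the device major numbers (195, and especially the nvidia-uvm one) are assumptions you should confirm with ls -l /dev/nvidia* on the host:

    # /etc/pve/lxc/<ctid>.conf additions (sketch)
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 508:* rwm
    lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file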
Interesting build. Funny you made this video not too long after I recycled a bunch of them. It would be nice if people found more uses for stuff older than 8th gen. These older machines are still perfectly usable.
I'm testing out a Maxwell card this weekend, an M2000. I bet it's going to surprise me!
@DigitalSpaceport It would be interesting to see a functional ultra-budget build. Curious how much cheaper than this setup you could get. The Dell T3600 with the 625W PSU is really cheap now.
The power pin for the GPU tends to dictate things, I have found, and is a must to get enough VRAM cheaply. A strong contender that is even cheaper could be based on an HP workstation-class box, but wow, I do not like their BIOS at all. I have a note that says so taped to my second monitor in case I forget, but it could bring costs down. I think 7th-gen Intel is the desirable cutoff, as that iGPU handles the vast majority of the offload needed to make a decent AIO media center box as well. Does a T3600 have a 6-pin power connector?
@@DigitalSpaceport The T3600 has two 6-pin connectors if it is the 635W config. The 425W config doesn't support powered GPUs, though. There can also be some clearance issues depending on the GPU design. Looks like they bring the same price as the 3620, though, so it might not be worth pursuing.
Can you do an AMD test with a 7900 variant? I feel that's more affordable and realistic when it comes down to the $-to-VRAM ratio.
The number of requests I am getting to test AMD GPUs does have me strongly considering buying a used one to find out. I had a friend who was going to lend me one, but then they sold it. Possibly testing this out soon.
I completely understand. I would love to see an AMD build so that we don’t have to offer our kidneys to the Nvidia gods.
@@alcohonis 7900 XTX, accept no substitute. OK, maybe a W7900. Or an Instinct.
Can you use multiple 3060s for this? I mean, does it support memory pooling, or is the model limited by the VRAM capacity of a single GPU? Sorry if this is a dumb question, but this is not my field (and in 3D rendering with CUDA and OptiX you can't pool memory on consumer-grade cards).
I would guess, based on the fact that if you ask multiple things the LLM processes them all at once, that the vision side works the same way: it doesn't read left to right or right to left, but processes the entire sentence all at once. 29:14
Okay, but check this out. It says -80 at first, but that screen would look like that if read RTL. The '-' is a lowercase watt. It's 08 watts on the screen. I'm testing the big one today, so I will investigate it further.
18:02 I thought it might be referring to the F-connector and not registering the white Cat-6 cable at all. Maybe try again using a Cat-6 in a contrasting color...
Good point! I am also now convinced it is reading RTL and not LTR on LCD screens, which is weird.
I have an old Dell i7-4760 that I could try pairing with a 3060 12GB. I have run Llama 3 on just an i5-13600K and it was usable, but a little slow.
Was it the new llama3.2-vision 11b? What tokens/s did you get?
What was the inference speed for text gen? Can you ask it to write a 500-word story and check the llama stats?
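If you want those numbers straight from Ollama, the --verbose flag on ollama run prints timing stats after each reply; the model tag here is an assumption:

    # after each response this prints total duration, prompt eval rate,
    # and eval rate (generation speed in tokens/s)
    ollama run llama3.2-vision:11b --verbose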
Can you look into running multiple Mac Mini M4s in a cluster? Using exo, for example?
That is an expensive ask, and unfortunately this YT channel earns, let's say, not even remotely close to enough to have an R&D budget, which would be in the 10K range for a quad setup. I have a hard time even getting people to subscribe, much less sign up for a membership or anything.
Interesting
LoL, now you're speaking my language! Until 48GB VRAM cards under $1000 become a thing, anyway 😀
Yeah, this 3060 is pretty sweet. I wish there were a cheap 12 or 16GB VRAM slot-powered card, but maybe in a few years. 20 t/s is totally passable, and the base model this is strapped to is pretty decent.
@@DigitalSpaceport Yes, and affordable too. Sad there is no 16GB version for just a little more. The price gap between 12GB and 24GB is just insane if the card is only used for AI.
Could you please test this build with the localGPT Vision GitHub repo? It has several vision models to test with, and seeing how each model performs on RAG with a build like this could be really interesting, because this kind of RAG is really different: instead of image-to-text-to-vector, this system goes straight image-to-vector. A different architecture.
I'm looking at this now, and I like the idea of fewer steps in RAG. Img2txt getting the boot would be awesome.
@DigitalSpaceport Awesome, glad to know you are into the concept of "image to vector" instead of "image to text to vector". I believe that in the future, having a model that can handle both without losing speed on consumer hardware would be game-changing, since both architectures have their pros and cons. Thanks for your videos, mate.
Yeah, I do like the concept, and having been a long-time user of unpaper/tesseract, those are indeed extra steps that would be ideal to avoid.
I would love to see the same test on fp16 or fp32. Not sure if it's going to give more accurate responses.
I do plan to test the 90b-instruct-q8_0, which is 95GB (4x 3090 is going to be close), and the 11b-instruct-fp16 is only 21GB, so I might give that a roll also. I think the Meta Llama series of models caps out at fp16, or am I overlooking something?
I wonder how the Q8 version would do in these tests. *Should* be better.
I do plan on testing the q8 in the 90B, so we should get a decent hi-lo gradient. If the difference is significant I will revisit for sure.
I don't understand what you all use these models for. Can someone maybe explain to me what the benefit is?
It's a shame Pixtral is not on Ollama; it's also a bigger model.
I agree, but I think there is a way to make it work with Ollama's new Hugging Face model support. You would need to manually kick that off, but I think it could work.
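The manual kick-off being referenced is presumably Ollama's run-from-Hugging-Face syntax for GGUF repos; a sketch, where the repo path is a placeholder and whether Pixtral's vision architecture is actually supported this way is not guaranteed:

    ollama run hf.co/<username>/<gguf-repo>:Q4_K_M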
I understand the appeal of the 3060, but why does everyone ignore the 4060 Ti 16GB?
I'm not ignoring it myself; at MSRP it's a rather good card. I just can't afford to buy one of everything, so that's why it's not reviewed here.
Hi, will you try llamafile on a Threadripper CPU, not a GPU? They say it's really fast. i5/i7 10th gen or Ryzen 5/6 5th gen: better price per watt.
Nobody should give their AI data or research to big providers. Keep your data local.
Fully agree! The data they collect on us, on top of the paid-for services, is absolutely silly.
Fully local, yes, but be careful with APIs too; some models still send data.
These "vision" models are so bad and unreliable for anything. They need to be way more specialized and fed far more samples to be of any value. Spatial relationships are completely wrong. Blob classification/recognition is weak. I don't see any use for this beyond very, very basic tasks. I don't even know if any of this can be put into production due to the unreliability.
I am about to start testing out the big one here and hope for a lot of improvement. I just want to be able to read a very clear LCD, which seems like it should be a small hurdle.
Yeah, $350 + GPU lol. Stupid clickbait.
No. It is $350 including the 3060 12GB GPU, and I don't clickbait like that. I do clickbait, of course, but not with an outright lie like you are stating.
❤❤have a happy weekend brother and followers !!!
❤❤❤first comment .
Love your videos, man.
Love from Sweden.
Thank you for your video. I will share it with other people and other work organizations and put you on our list as a preferred content provider for those who want to do it themselves. Again, thank you for your video. It is so easy to follow, and you're very detailed in the explanation of both the application deployment and the hardware configuration.
Oh thank you I appreciate that a lot.