Looking at the price: 2x P2000 will cost you around $200 (+shipping) while a new RTX 3060 12GB will cost you $284 from Amazon (+shipping), so for around $84 more, why should someone buy the two P2000 cards? I'm pretty sure the RTX 3060 will beat out the dual P2000 setup.
I guess bc it’s 4 GB more of VRAM so you should be able to use slightly larger models. That being said, I think I’d go with the 4060 as well.
I think I mentioned that in the video, but yes, the 3060 12GB is an all-around better card vs 2x P2000. The script was already out the door when I pivoted to testing other cards, so that point likely got muddled. That's what happens when I write a script before testing.... but the M2000 will be the stand-in for the cheapest current rig I could figure out. It's for sure worth it to go with the $350 rig and 3060 12GB if someone can.
@@DigitalSpaceport damn, did you see Intel's new Battlemage GPU? It drops in stores in a couple of weeks. The Arc B580 has 12GB of VRAM at $250! It improves efficiency on that front, using tricks like transforming the vector engines from two slices into a single structure, supporting native SIMD16 instructions, and beefing up the capabilities of the Xe core's ray tracing and XMX AI instructions. I don't know where the previous A770 16GB graphics card stands, but it may get a price drop soon as a result. It's already $259 at Micro Center.
Yes, I wish it was a 16GB card, but I will prolly snag one to test. I hope they have fixed their idle wattage issues also; my A750 is a power eater!
@@DigitalSpaceport REALLY WANT to see you test 2x Arc B580 for 24GB of vram
Now that Intel Battlemage is out, I bet they will be more price competitive with dedicated AI cores.
Just watched the GN breakdown and looks like an interesting option and a good price point
10gb and 12gb VRAM is not worth it. Get a $100 M40 24gb gpu if you want cheap AI. They're slow but work fine. VRAM is king imho. Lots of stuff is made for 24gb VRAM.
I have been playing with Ollama on an AMD Ryzen 5900hx with 32gb of DDR4-3200 RAM and I ran the same models (with my RAM already over 65% taken using other stuff)
And got 8-9 tokens/s with minicpm-v:8b and have been happy with the 17-19 token/s I can get with llama3.2:3b
I was excited that you went from K2200 to M2000 to P2000. If you had stopped at the K2200 I would have been really disappointed.
The K2200 is disappointing, but I was surprised at the M2000. I meandered a bit in some explanations since this all popped up and evolved outside of my bullet points, but yeah, if I get curious I will track it down if I can. I feel like I chased performance pretty well on this one, but I still want to know the why behind the K2200 to M2000 differences. I need to learn more.
@@DigitalSpaceport The M2000 does have better FP32 performance and about 25% faster memory performance. There is also the CUDA compute capability 5.0 versus 5.2 difference. I haven't seen anything explaining what instruction-level differences there are between the two. It would be cool to really pin down all the causes of the performance difference though.
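For anyone chasing the same question, a minimal PyTorch sketch (assuming a CUDA-enabled PyTorch install) that prints what each card actually reports, which is handy when comparing a K2200 against an M2000:

```python
# Print the name, compute capability, and memory of every visible NVIDIA GPU.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, compute capability {props.major}.{props.minor}, "
          f"{props.total_memory / 2**30:.1f} GiB")
```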
There is a tool for benching models if you have their shape that I've yet to dig into, but it looks good for comparisons like this. It may be a real rabbit hole, but I'm interested in the raw perf numbers. Maybe of interest: github.com/stas00/ml-engineering/tree/master/compute/accelerator/benchmarks
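Not the linked benchmark suite, but a rough sketch of the same idea: time a large FP16 matmul and estimate throughput so two cards can be compared on raw GEMM performance (assumes PyTorch with CUDA; the numbers are only directional):

```python
# Rough FP16 matmul throughput estimate; larger n stresses the card harder.
import time
import torch

def fp16_tflops(n=4096, iters=50):
    a = torch.randn(n, n, dtype=torch.float16, device="cuda")
    b = torch.randn(n, n, dtype=torch.float16, device="cuda")
    for _ in range(5):                # warm up so clocks ramp
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    flops = 2 * n**3 * iters          # multiply-adds per matmul
    return flops / elapsed / 1e12     # TFLOPS

print(f"~{fp16_tflops():.1f} TFLOPS FP16 matmul")
```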
Great video! That was fun
It was a hard pivot mid video mentally for me to buy into, but rolling the dice worked. It came out decent. Thanks!
I tested minicpm-v:8b on a GTX 1070 at ~37 t/s, and on an RTX 3090 at ~92 t/s, using this prompt: "Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option." It used ~5.5GB of VRAM at default values. Tested with an image and the prompt "explain the meme" I got ~34 t/s (GTX 1070) and ~97 t/s (RTX 3090); the image was resized to 1344x1344.
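If anyone wants to reproduce numbers like these, here is a minimal sketch against a local Ollama instance (assumes the default port 11434 and the requests package; the prompt is just the one from the comment above) that derives t/s from the API's own timing fields:

```python
# Ask Ollama for a completion and compute tokens/s from eval_count / eval_duration.
import requests

def tokens_per_second(model, prompt, host="http://localhost:11434"):
    r = requests.post(f"{host}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=600)
    r.raise_for_status()
    data = r.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)  # duration is in ns

print(f'{tokens_per_second("minicpm-v:8b", "Help me study vocabulary: write a sentence for me to fill in the blank."):.1f} t/s')
```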
Thanks for the videos. I am looking to build a home AI server for ~$1000 or less. Would love to see a video on what you could build for around that price range.
Good news, I'm working on that video already and I think it's a price that gets a very capable setup. Out in days.
@@DigitalSpaceport Looking forward to it. I am a Software developer by trade and have been working to learn more about the hardware side of things. Thanks again for the Videos. You have gained a subscriber.
I have two titan xp's languishing. They may have a new purpose now.
What about AMD gpus? Haven't they made progress for AI and cuda alternatives?
Yes, I have read they are doing better on the software front, but they still have some stability issues. I do plan to snag some AMD cards for testing when I can, I just don't have the money to buy one of everything really. It will happen.
20-80 watts? This means live 24/7 classification of persons on your Ring is not only technically feasible but also financially acceptable.
My 16GB 4060 TI clocks in around 31 tps on this model (single card used). I've seen these for around $400 USD, so price/performance ratio is on par, but overall system price is higher. And you get 16GB of VRAM, which is going to be the limiting factor with the cheaper cards even if the performance is OK for you.
Hey can you see if your 4060ti's can fit the new llama 3.3 in and at what context? It is a great model, excited for you to try it.
@@DigitalSpaceport Just started playing with it - default settings I'm getting about 6 tps. I'll try and up the context, but for some reason I'm getting flaky malfunctions with multiple models lately when playing with the settings. I hope that settles down with some updates. Also my models never unload, which is minor-level annoying. (Yes, I think I have the flags set correctly...)
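A minimal sketch for the context and unloading bits, assuming stock Ollama behavior: num_ctx raises the context window per request, and keep_alive controls how long the model stays loaded (0 unloads it right after the reply). The model name is just an example:

```python
# One request with a larger context window that unloads the model afterwards.
import requests

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.3",
    "prompt": "Summarize the plot of Hamlet in two sentences.",
    "stream": False,
    "options": {"num_ctx": 16384},  # raise the context window for this call
    "keep_alive": 0,                # unload the model immediately after responding
})
print(resp.json()["response"])
```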
this guy is built different
I would be interested to see how a Tesla P4 or two does, especially as they are around $100, when compared to a 3060.
There is the 3DFX card?
Hi! I really enjoyed your video. I'm trying to do some experimental work (research) with local AI models (I'm a teacher). What is your opinion on using Xeon processors (like the ones sold on AliExpress) plus a graphics card like the ones you presented? Is the Xeon processor necessary, or can I choose any other processor (like a Ryzen plus an NVIDIA card)? Greetings from Mexico.
My dual P40 + A2000 use 550w at idle lol Keeps me warm
Its "Free" heat if a workloads running 😉
19:53 - so would you recommend 3060's over 1080Ti's, or what kind of price would make 11GB Pascals an interesting value?
Stay away from Pascal; most models use FP16, and 90% of Pascal's compute power is in FP32 rather than FP16.
I do like the 3060's 12GB of VRAM. That extra 1GB really does matter. I'd sell the 1080 Ti while you can and move on up.
@@DigitalSpaceport Those that can (i.e. have another GPU for the desktop) get a modest benefit by setting the NVIDIA GPU to TCC mode instead of WDDM mode. You get to use 95+% of the VRAM for compute instead of 80+%, because of OS-reserved memory. It can be the difference between 16k context and 32k, or a Q4 and a Q5 quant.
Hey now, that's news to me 😀 I'm looking into this ASAP, thx for sharing!
@@DigitalSpaceport Once you set that GPU to TCC mode, it can't display an image until you set it back to WDDM (a reboot resets it to WDDM unless you make the change persistent).
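For reference, a minimal sketch of the switch itself (Windows only, needs an admin shell; GPU index 1 is an assumption, check nvidia-smi -L first, and double-check the -dm flag against nvidia-smi -h on your driver version):

```python
# Flip a secondary NVIDIA GPU between the TCC and WDDM driver models.
import subprocess

def set_driver_model(gpu_index: int, mode: str) -> None:
    # mode should be "TCC" or "WDDM"; the change may require a reboot to apply.
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-dm", mode], check=True)

set_driver_model(1, "TCC")
```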
Great stuff. Tnx
Stoked I found your channel! I'm considering using Exo to distribute an LLM across my family's fleet of gaming PCs, however I'm not sure about the overall power draw. Thoughts?
I am having a good time with Ryzen mini PCs. The 5th and 6th gen ones are CHEAP, you can add an M.2-to-PCIe adapter for an eGPU, and you can max out the RAM allocated to the iGPU in the BIOS.
Does the m.2 to pcie adapter need an external power supply? I might buy one here for my unraid nas. It could use a proper cuda card.
Have a ryzen 7 (5800H) apu with 64GB of RAM (48 dedicated to the GPU) and it works surprisingly well.
Recently bought a HP Victus E16 motherboard (only) with the same APU plus a 3060 on the board (really it's 1/2 a 3060 - has 6GB of VRAM) that I have just gotten powered up and am hoping will be interesting - or at least cost effective for a £140 outlay (as i already have the RAM, SSD etc)
Could I use a 4x x4 bifurcated pcie slot adapter and squeeze 5 gpus in the pc?
Sorry, a really basic question from me; puns unintended. What are you using to collect reliable stats on power consumption (watts)? We have Threadrippers and we're considering a couple of 4090s, but one question relates to having good metrics on power usage at idle and at peak. Then we can begin to track and compare power costs. What have you found that works? Thanks in advance. Sunny
In the videos I'm peeking at a Kill A Watt. If you're gathering metrics you can use NVIDIA tooling to push that out to InfluxDB. I forget the name of it but it's fairly searchable. That would be worth checking around GitHub for.
For most people, I believe if you're just trying to track GPU wattage, you can create a script or job to poll nvidia-smi, and set the power limits of your RTX 3090s down to an acceptable wattage with some performance loss until you hit an efficient rate. Something like nvidia-smi -pm 1 and nvidia-smi -pl 250.
I set my RTX 3060s to a 100W max for all 4 cards. It's a decrease from their usual spikes of around 145W during inference, with around 10% speed loss, but 45W of peak inference savings, and they never got past 70C.
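If you want something more than spot-checking nvidia-smi by hand, here is a minimal sketch that polls it and appends to a CSV you can graph later (the query fields are standard nvidia-smi ones; the output path and interval are just examples):

```python
# Log per-GPU power draw, utilization, and temperature every few seconds.
import csv
import subprocess
import time

def sample():
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,power.draw,utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return [line.split(", ") for line in out.strip().splitlines()]

with open("gpu_power_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        now = time.time()
        for idx, watts, util, temp in sample():
            writer.writerow([now, idx, watts, util, temp])
        f.flush()
        time.sleep(5)
```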
Does the Dell 7050 have power connectors to support a 3060? Also, what would the difference be in power consumption? Just curious, thanks!
No, unfortunately the 7050 doesn't. The idle wattages are nearly identical, however; the peak during use is higher on a 3060, but the work is done faster. I've seen the 3060 in the 3620 peak at 130 watts while the 7050 only hit near 100 watts.
I'm running the Ollama UI on Proxmox with a 1070 - it's not bad. The 1070s are in the low $100 USD range. But you will probably do much better with a 3060 12GB / 4060 Ti 16GB.
If anybody is wondering - the 1070 runs at 36 tokens per second. The wattage pulled while idle = 36W (Intel 13500).
Oh yeah, I did test a 1070 Ti out in an older video, which unfortunately had bad audio. It's a card a lot of people have sitting around that can still perform really decently in a power-pin-capable setup. ua-cam.com/video/Aocrvfo5N_s/v-deo.htmlsi=YhmtIDi5C0JGyRL9&t=569
How hard is it to get Invoke AI to use dual GPUs? Could you use an RTX 4060 8GB and an RTX 3060 12GB to get 20GB of VRAM, or would it be better to use two 4060s?
It'd be cool to look at a K5200 8GB card. I'm seeing those used at like $70
I feel like Kepler, especially after this video, is a bridge too far on the performance side at this point. It's also at the bottom of the supported list for llama.cpp/Ollama, so I can't see it hanging on much longer on the software support side.
P102-100 10GB mining cards, I think you can still get for sub-$45? Two of these together can probably push an IQ3 QwQ 32B with a decent amount of context in llama.cpp, and might be around $90-140 total for the GPUs.
Plus basically any host, since I believe they run PCIe 3.0 at 4 lanes each. They hit a decent inference level, being Pascal, around GTX 1080 inference speeds.
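A hedged sketch of what that two-card split could look like with llama-cpp-python (the GGUF filename is a placeholder, and the even tensor split and 8k context are guesses to tune against actual VRAM headroom):

```python
# Split a quantized 32B model across two 10GB cards and run one prompt.
from llama_cpp import Llama

llm = Llama(
    model_path="QwQ-32B-IQ3_XS.gguf",  # placeholder path to a local GGUF file
    n_gpu_layers=-1,                   # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],           # share layers evenly across both cards
    n_ctx=8192,                        # "a decent amount of context"
)
out = llm("Explain PCIe lane bifurcation in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```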
A really basic question - can I mix and match an Intel CPU with NVIDIA GPUs, or an AMD CPU with the new Intel GPUs?
If I only need a language model when I'm using my gaming/main PC, is there a point in having a dedicated LLM server? Is VRAM the end all be all? I have a 10GB 3080.
What about AMD GPUs and APUs?
Can I make a video request?
Try the Tesla P4 GPUs.
Okay I have one of those here. Gotta toss a fan on it but good call.
@DigitalSpaceport I 3d printed the fan housing for mine
I had some printed but failed to find a good and cheap fan option for them. Did you happen to get fans that are not coil whine prone?
@@DigitalSpaceport My fan is very loud and noisy but it does not bother me as it's in a room that is not occupied.
Love this video script
What cards are most efficient in terms of tokens per watt in your experience?
I think base idle has to be considered also, so Intel's cards are out on that alone. The 3000 series and 4000 series all have great idles that scale, oddly, with the amount of VRAM, it's looking like in my analysis. I strongly recommend a 24GB card if a person can afford it, as the experience is unmatched, and specifically a 3090 unless you want image generation at max speeds. Inference is close to the same as the 4090. That said, the 3060 12GB is very fast, and I recommend avoiding all 8GB cards unless you already have them. The 16GB 4060 is likely to be a strong contender as well.
IMO Apple Silicon Macs are the best for power efficiency. Not the best for capital cost or outright speed though.
Guys, what can I use the AI for if I run it locally? I don't see any use case?
Can you mix and match different cards?
You can, to gain VRAM for model storage, but your performance is always that of the slowest single card. So if you mixed a K2200 and a P2000, the t/s would be that of the K2200.
@DigitalSpaceport Thx. I have a 1080 Ti and two 1030s, so it would be better to ignore the two small ones and just use the 1080 Ti?
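One hedged way to do exactly that without pulling the cards: hide the small ones with CUDA_VISIBLE_DEVICES before starting the runtime (device index 0 for the 1080 Ti is an assumption, check nvidia-smi -L):

```python
# Launch Ollama with only the 1080 Ti visible to CUDA.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")  # expose only the first GPU
subprocess.run(["ollama", "serve"], env=env)
```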
What's a proxmox server?
A hypervisor server - for running virtual machines, as opposed to a desktop OS running directly on the physical machine. Proxmox is good for sharing one machine among many tasks.
Could you share that cat meme? 😅
Breaking software or fiber drops?
@@DigitalSpaceport Breaking software please 😂
ua-cam.com/users/postUgkx_9IiU9QQk6J0EHlQ9yOmz4FO0da1Zv1-?si=eu8llgtJqqiIMyTg 🙀
@@DigitalSpaceport thanks for that. I really appreciate it ❤️
Now try TPUs on a 4x4 carrier card
Thank you for this. I've been looking for ideas for a viable $200 - $400 ultra budget rig to get my feet wet. This is right in that range. lol
Thanks again for the "Poof" of concept.....
I'm running mine with an RTX 4060 8GB
I need to get a 16GB one of those in the mix for testing!
This video is pretty pointless because 8GB of VRAM is nothing at all when it comes to running AI. Sure, if you build your PC from outdated and nearly unusable parts, you can make it cheap.
What I'd like to see is a video showing how to cheaply make a PC using 2x M10 or 2x M40 Tesla GPUs.
Small models are pretty good now; however, P40s would be a safer longevity bet as they are supported on CUDA 12.
There are some uses for an 8GB Pascal GPU if you've got one: smaller 7-8B models that can still hit 20+ t/s, small fine-tunes, roleplay, vision support models, SDXL generators.
So cheap and low power but fucking slow :( I have 4060 and it is fucking fast
4 x Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (1 socket)
RAM usage 71.93% (11.22 GiB of 15.60 GiB) DDR3
proxmox-ve: 8.3.0 (running kernel: 6.8.12-4-pve)
NVIDIA GeForce GTX 960, PCIe Gen 1 @ 16x, 4 GiB
Prompt: "write python code to access this LLM" - response tokens/s: 24.43
Prompt: "create the snake game to run in python" - response tokens/s: 21.38
This is way faster than the P2000, with just one GTX 960 card