AI Server Hardware Tips, Tricks and Takeaways - Watch before you SHOP BF/CM

  • Published Nov 29, 2024

COMMENTS • 53

  • @DigitalSpaceport
    @DigitalSpaceport  17 hours ago +1

    Writeup - digitalspaceport.com/homelab-ai-server-rig-tips-tricks-gotchas-and-takeaways/

  • @MeidanYona
    @MeidanYona 14 hours ago +1

    This is very helpful! I buy most of my hardware from Facebook Marketplace, and I often have to wait long spans between getting components, so knowing what to watch out for is very important.
    Thanks a lot for this!

  • @LucasAlves-bs7pf
    @LucasAlves-bs7pf 12 hours ago

    Great video! The most eye-opening takeaway: having two GPUs doesn’t mean double the speed.

    • @DigitalSpaceport
      @DigitalSpaceport  9 hours ago

      Hands down the #1 question in videos. Not with llama.cpp yet, but hopefully soon. Bigger models, and running models on separate GPUs at the same time, are the current reasons, and running bigger models like Nemotron is a big quality step. Or use vLLM, which isn't as end-user friendly as Ollama/OWUI.
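
A minimal sketch of the "bigger model across more cards" path with vLLM tensor parallelism, assuming a Python environment with vllm installed; the model ID and sampling settings are placeholder assumptions, not the setup from the video:

```python
# Hedged sketch: one model sharded across two GPUs with vLLM tensor parallelism.
# The model ID below is a hypothetical placeholder; substitute whatever fits your cards.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-model",     # placeholder HF model ID that needs more than one card
    tensor_parallel_size=2,          # shard the weights across 2 GPUs
    gpu_memory_utilization=0.90,     # leave headroom for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["Why doesn't a second GPU double my tokens per second?"], params)
print(out[0].outputs[0].text)
```

The second card mostly buys capacity (bigger models, more concurrent sessions) rather than double single-stream speed.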

  • @UnkyjoesPlayhouse
    @UnkyjoesPlayhouse 7 hours ago +1

    Dude, what is up with your camera? Feels like I am drunk or on a boat :) Another great video :)

  • @danielstrzelczyk4177
    @danielstrzelczyk4177 15 hours ago +1

    You inspired me to experiment with my own AI server based on a 3090/4090. I made slightly different choices: ASRock WRX80D8-2T + Threadripper Pro 3945WX. As you mentioned, CPU clock speed matters, and I got a brand-new motherboard + CPU for around 900 USD. I also want to try OCuLink ports (ASRock has 2 of them) instead of risers. There are 2 advantages: OCuLink offers flexible cabling and works on a separate power supply, so you are no longer dependent on a single expensive PSU. So far I see 2 problems: the Intel X710 10GbE ports cause some errors under Ubuntu 24.04, and the Noctua NH-U14S is too big to close a Lian Li O11 XL, so I have to turn to an open-air case. Can't wait to see your future projects.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago

      On the Intel, if that's the fiber X710, do you have approved optics?

    • @MetaTaco317
      @MetaTaco317 1 hour ago

      @@danielstrzelczyk4177 I've been wondering if OCuLink would find its way into these types of builds. Wasn't aware the ASRock mobo had 2 ports like that. Have to check that out.

  • @christender3614
    @christender3614 18 hours ago +2

    Been waiting for that one and happy to write the first comment!

  • @coffeewmike
    @coffeewmike 5 hours ago

    I am doing a build that is about 60% aligned with yours. Total investment to date is $7,200. My suggestion, if you have a commercial-use goal, is to invest in server-grade parts.

  • @hassanullah1997
    @hassanullah1997 3 hours ago +1

    Any advice on a potential local server for a small startup looking to support 50-100 concurrent users doing basic inference/embeddings with small-to-medium-sized models, e.g. 13B?
    Would a single RTX 3090 suffice for this?

    • @DigitalSpaceport
      @DigitalSpaceport  2 hours ago

      This is my guess, so don't hold me to it. I would start by figuring out exactly which model or models you want to run concurrently. You would want to set the timeout on those to be pretty long, greater than 1 hour, to avoid something like people coming back and all warming it up at the same time. I think you would be better off with three 3060 12GBs if they would support the models that you intend to use. If you are looking for any flexibility, the safest advice is to start with a good base system and add 3090s as needed. If there is a big impact from undersizing, just go 3090s. Make sure to get a CPU that has a good, fast single-thread speed. Adjust your batch size as needed, but the frequency of your users' interactions needs to be observed in nvtop or other more LLM-specific performance monitoring tools.
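
One way to sanity-check that sizing before buying more cards is a crude load test against whatever OpenAI-compatible endpoint you end up serving (vLLM, Ollama, etc.), watching nvtop while it runs. The URL, model tag, and concurrency level below are assumptions to adapt:

```python
# Rough concurrency smoke test against a local OpenAI-compatible endpoint.
# URL, model tag, and worker count are placeholder assumptions, not recommendations.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/chat/completions"  # vLLM default; adjust for your server
MODEL = "example-13b"                               # hypothetical model tag
MESSAGES = [{"role": "user", "content": "Summarize retrieval-augmented generation in two sentences."}]

def one_request(_):
    start = time.time()
    resp = requests.post(URL, json={"model": MODEL, "messages": MESSAGES, "max_tokens": 128}, timeout=300)
    resp.raise_for_status()
    return time.time() - start

with ThreadPoolExecutor(max_workers=50) as pool:    # simulate ~50 concurrent users
    latencies = sorted(pool.map(one_request, range(50)))

print(f"p50 {latencies[25]:.1f}s, p95 {latencies[47]:.1f}s, max {latencies[-1]:.1f}s")
```

If p95 latency stays acceptable while VRAM and GPU utilization have headroom, a single card is probably enough for that model.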

  • @thanadeehong921
    @thanadeehong921 11 hours ago +1

    I set up a motherboard and EPYC CPU just like yours.
    May I ask: if you could do it all over again, would you change any of the setup?

    • @DigitalSpaceport
      @DigitalSpaceport  9 hours ago

      I'm wanting to get a 7F72, but they are expensive and I would need a pair. If I was scratch-building, I would likely have used an air cooler for the CPU also. Maybe the H12SSL-i would be the board I'd go with, since the MZ32-AR0 has gone up in price a good bit.

  • @TheYoutubes-f1s
    @TheYoutubes-f1s 5 hours ago

    Nice video! What do you think of the ASRock ROMED8-2T motherboard?

  • @dorinxtg
    @dorinxtg 14 hours ago +3

    I didn't understand why you didn't mention any of the Radeon 7xxx cards, or ROCm.

    • @ringpolitiet
      @ringpolitiet 10 hours ago +2

      You want CUDA for this.

    • @christender3614
      @christender3614 9 hours ago

      It's preferable. AFAIK, Ollama isn't yet optimized to work with ROCm. Would've been interesting though, like "how far do you get with AMD?" AMD is so much more affordable per GB, especially when you look at used stuff. Maybe that's something for a future video, @DigitalSpaceport?

    • @christender3614
      @christender3614 9 hours ago +1

      My comment vanished. Could you make a video on AMD GPUs? Some people say they aren't that bad for AI.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +2

      I see two comments here and do plan to test AMD and Intel soon.

    • @slowskis
      @slowskis 6 hours ago

      @@DigitalSpaceport I have a bunch of A770 16GB cards along with ASRock H510 BTC Pro+ motherboards sitting around. Was thinking of trying to make a 12-card cluster connected by 10Gb network cards, with a 10900K for the CPU and the 3 systems linked to each other. Any problems you can think of that I am missing? 4 GPUs per motherboard with 2 10Gb cards each. The biggest problem I can think of would be the single 32GB RAM stick that the CPU is using.

  • @hotsauce246
    @hotsauce246 16 hours ago

    Hello there. Regarding RAM speed, were you partially offloading the models in GGUF format? I am currently loading the EXL2 model completely into VRAM.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago

      No, the model was fully loaded into VRAM. This video tested multiple facets of CPU impact fairly decently: ua-cam.com/video/qfqHAAjdTzk/v-deo.html
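
For contrast with the fully-in-VRAM case above, partial GGUF offload is usually just a layer count. A minimal llama-cpp-python sketch, where the model path and layer split are assumed placeholders:

```python
# Partial GGUF offload sketch with llama-cpp-python; path and layer count are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-70b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=40,   # layers kept in VRAM; the rest stream from system RAM each token
    n_ctx=8192,
)

print(llm("Explain GGUF layer offload in one sentence.", max_tokens=64)["choices"][0]["text"])
```

The more layers stay on the CPU side, the more RAM speed and channel count show up in tokens per second; with everything in VRAM they barely matter.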

  • @Boyracer73
    @Boyracer73 11 hours ago

    This is relevant to my interests 🤔

  • @claybford
    @claybford 10 hours ago +1

    Any tips/experience using NVLink with dual 3090s?

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      It's not needed unless you are training, but I need to test on my A5000s that have NVLink so I'm not just being a parrot on that. I did try it out but messed something up, IIRC, and got frustrated. Will give it another shot soonish.

    • @claybford
      @claybford 7 hours ago

      @DigitalSpaceport Cool, thanks! I'm putting together my new 2x3090 desktop/workstation, and I grabbed the bridge, so I'll be trying it out soon as well.

  • @Keeeeeeeeeeev
    @Keeeeeeeeeeev 9 hours ago

    More than DDR4/DDR5 and MT/s, probably the interesting takeaway would be single vs. dual vs. quad vs. 8-channel performance.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 9 hours ago +1

      ...maybe even more so, cache speeds and quantity...
      What are your thoughts?

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      For sure you want to watch this video! It's the most in-depth test on CPU impacts around, and I've got a pretty crazy 7995WX in it with 8 channels filled. ua-cam.com/video/qfqHAAjdTzk/v-deo.html

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 7 hours ago

      @@DigitalSpaceport I missed that. Thanks, watching right now.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 7 hours ago

      Same thoughts... faster cache and higher amounts would be my bet, both on CPU and GPU.
      If I'm not getting something wrong, the fastest GPUs running LLMs (both older and newer models) seem to be those with more cache, higher memory bandwidth, and bigger memory bus sizes.
      Of course TFLOPS do count, but to a lesser extent.
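
The channel-count question largely reduces to aggregate memory bandwidth, which is why it dominates once any part of a model runs from system RAM. A back-of-envelope sketch, with purely illustrative numbers rather than measurements:

```python
# Back-of-envelope: peak DRAM bandwidth vs. channel count, and a crude token/s ceiling
# for CPU-side inference. All figures are illustrative assumptions, not benchmarks.
def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth = channels x transfers/s x bytes per 64-bit transfer."""
    return channels * mt_per_s * bytes_per_transfer / 1000

MODEL_GB = 40  # very roughly a 70B model at Q4
for channels in (2, 4, 8):
    bw = peak_bandwidth_gb_s(channels, 3200)  # DDR4-3200 example
    # Each generated token has to stream the active weights once, so bandwidth caps speed.
    print(f"{channels}-channel DDR4-3200: ~{bw:.0f} GB/s -> at most ~{bw / MODEL_GB:.1f} tok/s from RAM")
```

The same logic is why GPU memory bandwidth and bus width tend to track LLM inference speed better than TFLOPS.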

  • @mrrift1
    @mrrift1 7 hours ago

    What are your thoughts on getting 4 to 8 4060 Ti cards with 16GB VRAM?

    • @DigitalSpaceport
      @DigitalSpaceport  7 hours ago

      64GB of VRAM is a very solid amount that will run vision models and Nemotron easily at Q4, and it's not a bad card at all for inference.
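
A rough way to check what fits in that pooled 64GB, with hand-wavy assumptions for quantization overhead and KV cache:

```python
# Rough "will it fit" estimate for a pooled-VRAM rig; every constant here is an assumption.
def est_vram_gb(params_billion: float, bits_per_weight: float = 4.5, overhead_gb: float = 4.0) -> float:
    """Weights at ~Q4 plus a flat allowance for KV cache and CUDA context."""
    return params_billion * bits_per_weight / 8 + overhead_gb

POOL_GB = 4 * 16  # four 4060 Ti 16GB cards
for label, size_b in [("70B (e.g. Nemotron)", 70), ("32B", 32), ("13B", 13)]:
    need = est_vram_gb(size_b)
    verdict = "fits" if need < POOL_GB else "does not fit"
    print(f"{label}: ~{need:.0f} GB needed vs {POOL_GB} GB pooled -> {verdict}")
```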

  • @christender3614
    @christender3614 9 hours ago

    The most difficult decision is how much money to spend on a first buy. I'm kinda reluctant to get a 3090 config not knowing if I'll be totally into local AI.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      A 3060 12GB is a good starter then. If you want to go heavy on image/video gen, 24GB is desirable. To really get the benefits, though, local AI is best left running 24/7 in a setup, with integrations abounding in so many homeserver apps now.

    • @VastCNC
      @VastCNC 58 minutes ago

      Maybe rent a VM with your target config for a little while before you start building?

  • @canoozie
    @canoozie 9 hours ago

    My RTX A6000s idle at 23W, so yeah, always-on is expensive depending on your GPU config. I have 3x in each system, 2 systems in my lab.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      Mmmmmm, 48GB of VRAM each. So nice!!!

    • @canoozie
      @canoozie 8 hours ago

      @@DigitalSpaceport Yes, they're nice. I was looking for a trio of A100s over a year ago and couldn't find them, so instead, I bought 6 A6000s because at least I could find them.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      If you think about it... I average 10-12W per 3090 24GB, and the 23W per A6000 48GB seems to scale. Maybe idle is tied to VRAM amount also?

    • @canoozie
      @canoozie 8 hours ago

      @@DigitalSpaceport That could be, but usually power scales with the number of modules, not size. But then again, maybe you're right, because I looked at an 8x A100-SXM rig a while back; it idled each GPU at 48-50W and had 80GB per GPU.

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      @canoozie My 3060 12GB idles at 5-6W, hmm. Interesting. Also, now I'm browsing eBay for A100s. SXM over PCIe, right? I'm probably not this crazy.
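
Turning those idle wattages into an always-on cost is simple arithmetic; the per-card figures come from this thread, and the electricity rate is an assumed $0.15/kWh:

```python
# Always-on idle cost per card. Wattages are the figures quoted in this thread;
# the electricity rate is an assumption to adjust for your utility.
IDLE_WATTS = {"RTX 3060 12GB": 6, "RTX 3090 24GB": 11, "RTX A6000 48GB": 23}
RATE_PER_KWH = 0.15

for gpu, watts in IDLE_WATTS.items():
    kwh_per_year = watts * 24 * 365 / 1000
    print(f"{gpu}: {watts} W idle ~ {kwh_per_year:.0f} kWh/yr ~ ${kwh_per_year * RATE_PER_KWH:.0f}/yr")
```

On those numbers, idle draw does come out roughly proportional to VRAM capacity, as speculated above.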

  • @Keeeeeeeeeeev
    @Keeeeeeeeeeev 8 hours ago

    Can you mix AMD and NVIDIA GPUs together for inference?

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago +1

      Great question. Will test when I get an amd card.

  • @FabianOlesen
    @FabianOlesen 2 hours ago

    I want to suggest a slightly lower tier: 2080 Tis that have been modified with 22GB of memory, running a 2x system.

  • @marianofernandez3600
    @marianofernandez3600 11 hours ago

    What about CPU cache?

    • @DigitalSpaceport
      @DigitalSpaceport  8 hours ago

      Doesn't seem to impact inference speed, interestingly, but you would really need an engineering flamegraph to profile it. Not a top factor for sure.