Accelerating AI: Adding a 2nd Nvidia P40 GPU to Dell R730 for Dual LLM Virtualization!

  • Published 26 Sep 2024
  • Embark on the next phase of our AI journey as we supercharge our Dell R730 server with dual GPU capabilities! In this video, I'll guide you through the process of seamlessly adding a second Nvidia P40 GPU to our setup, laying the groundwork for dual LLM virtualization.
    With each step carefully explained, from hardware installation to configuration adjustments, we're primed for a future of enhanced AI performance and expanded capabilities. Stay tuned as we unlock new horizons in LLM virtualization, harnessing the power of multiple GPUs for unprecedented AI innovation.
    Don't miss this moment in our AI adventure - join me as we accelerate into the future of AI with dual GPU LLM virtualization! #NvidiaP40 #DualGPUSetup #LLMVirtualization #AIInnovation #DellR730

COMMENTS • 33

  • @wlgt3257
    @wlgt3257 5 months ago

    Nice! Doing my install right now! Thanks again for taking the time to make this video.

    • @MukulTripathi
      @MukulTripathi  5 months ago +1

      Awesome! I'll drop the Ubuntu 24.04 server headless version here soon.

  • @Nonagon-bc2wl
    @Nonagon-bc2wl 4 months ago +2

    Man, did you install a GPU without shutting down the power?! 😅

    • @MukulTripathi
      @MukulTripathi  4 months ago +1

      Haha, I can totally see why it looks like the servers were on, since some lights were glowing in there. On these Dell servers, as long as the power cable is plugged in, a management interface called iDRAC stays up even when the server itself is turned off, and it lets you connect to the server remotely. So the servers were off; the lights were glowing simply because the power cables were plugged in.
      In an ideal scenario I would unplug the power cable so that no lights are on at all. Realistically, I have never run into an issue with iDRAC being on while I'm swapping GPUs; it makes no difference. Still, I would suggest unplugging the power cable so that you aren't touching any live electronic components.
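
      A quick way to confirm the host itself is off while iDRAC is still reachable is to query iDRAC's Redfish API. A minimal sketch in Python, assuming your iDRAC firmware is recent enough to expose Redfish; the address and credentials below are placeholders:

      ```python
      import requests

      # Placeholder iDRAC address and credentials - replace with your own.
      IDRAC = "https://192.168.1.120"
      AUTH = ("root", "calvin")  # Dell's factory-default login

      # Standard Redfish endpoint for the embedded system. iDRAC uses a
      # self-signed certificate, so verification is disabled for this
      # local query (expect an InsecureRequestWarning).
      resp = requests.get(
          f"{IDRAC}/redfish/v1/Systems/System.Embedded.1",
          auth=AUTH,
          verify=False,
      )
      resp.raise_for_status()

      # "Off" means the host is powered down even though iDRAC answers.
      print("PowerState:", resp.json()["PowerState"])
      ```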

    • @Nonagon-bc2wl
      @Nonagon-bc2wl 3 months ago

      Now I have a Dell R730 with a Tesla P40 and a Tesla P100; I'm waiting for the power cables and two 1100W power supplies to get it functioning. I used your video as a reference, thanks. I will do some 7b/13b LLM fine-tuning with this machine, hoping that 40 GB of VRAM will be enough.

    • @MukulTripathi
      @MukulTripathi  3 months ago

      I'm glad it helped. I made more videos on OpenVoice V2 on this server. I think the P100 supports FP16, so you should be good there.
      You should watch the new OpenVoice video too. I'm currently making "Jarvis" and it's almost ready.

    • @callmebigpapa
      @callmebigpapa 2 months ago

      @@Nonagon-bc2wl Aren't those inference cards, not tuning cards?

    • @Nonagon-bc2wl
      @Nonagon-bc2wl 2 months ago

      @@callmebigpapa The P100 is designed to perform well in tuning, the P40 for inference. But they are quite old cards; I have an RTX 3080 in my work PC that is more powerful than the two combined, but I can't do tuning on it due to limited VRAM. I'm just experimenting, and I managed to get 40 GB of VRAM for 350 US dollars. As of today the only downside is that tuning is very slow, but I can wait; it's just for learning purposes.

  • @gamer4et
    @gamer4et 4 months ago +1

    Looks awesome!
    Is Dell's riser power supply enough for the Tesla P40?

    • @MukulTripathi
      @MukulTripathi  4 months ago

      It is, actually. It's 225 watts from the power cable, and then some more (75 watts) from the riser slot, if I'm not wrong. The P40 is perfectly built for that slot!
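
      For anyone who wants to verify what the card actually draws against its enforced power limit, a minimal sketch using the NVML Python bindings (assumes the nvidia-ml-py package is installed):

      ```python
      import pynvml

      pynvml.nvmlInit()
      handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

      # NVML reports power in milliwatts.
      draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
      limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000

      print(f"{pynvml.nvmlDeviceGetName(handle)}: drawing {draw_w:.1f} W "
            f"of a {limit_w:.0f} W limit")
      pynvml.nvmlShutdown()
      ```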

    • @gamer4et
      @gamer4et 4 months ago

      @@MukulTripathi Thank you for the info!

  • @liamgibbins
    @liamgibbins 2 months ago +1

    What wattage does that server pull?

    • @callmebigpapa
      @callmebigpapa 2 months ago +1

      My R730 (256 GB RAM, 2 SSDs) with one P40 idles at 140 W on Ubuntu 22.04. Running Oobabooga, A1111, and a few others, if you were wondering.

  • @tsclly2377
    @tsclly2377 3 months ago

    Where are the SLI cables? I think you forgot that essential component... and don't forget the code required (I don't think NVLink code is going to be the same).

    • @MukulTripathi
      @MukulTripathi  3 months ago

      Haha, true. They are expensive and hard to find, so I didn't use them! I am, however, investing in a multi-3090 GPU rig.

  • @mnm972
    @mnm972 4 months ago

    Hi,
    Can you please post the link to where you bought the cable for the P40?
    Also, did you run any game benchmarks on the cards?
    Thanks!

    • @MukulTripathi
      @MukulTripathi  1 month ago

      This is the one I bought, but I can't recommend it. I bought a bunch and a few didn't work.
      GinTai 8(pin) to 8(pin) Power Cable Replacement for DELL R730 and Nvidia K80/M40/M60/P40/P100 PCIE GPU a.co/d/3NdU0rf

  • @jacksonpham2974
    @jacksonpham2974 2 months ago

    Hi,
    Are you running ESXi 8 U3? And is it possible to share the GPU with Windows 10 VMs?

    • @MukulTripathi
      @MukulTripathi  2 months ago

      Works on both ESXi 7 and 8. On Windows I don't know if you can do GPU passthrough, but you can install Ollama directly on Windows.
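
      If you do run Ollama directly on Windows (or in a VM with the GPU passed through), you can sanity-check it over its local HTTP API. A minimal sketch, assuming Ollama is listening on its default port and a model such as llama3 has already been pulled:

      ```python
      import requests

      # Ollama serves a local HTTP API on port 11434 by default.
      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={
              "model": "llama3",  # any model you have pulled locally
              "prompt": "Say hello in one sentence.",
              "stream": False,    # return one JSON object instead of a stream
          },
      )
      resp.raise_for_status()
      print(resp.json()["response"])
      ```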

    • @jacksonpham2974
      @jacksonpham2974 2 months ago

      @@MukulTripathi I already put the P40 inside the R730 and I'm just running ESXi 7, but the fans seem to go crazy (quite a loud sound) even though I'm not running any load yet. Do you think it is normal for it to be this loud with the P40?

    • @jacksonpham2974
      @jacksonpham2974 2 months ago

      @@MukulTripathi Did you have any problem with loud fan noise, before and after putting the P40 in the server?

    • @MukulTripathi
      @MukulTripathi  2 months ago +1

      So they spin up fast initially and then they slow down. Make sure that your cover is closed; otherwise the system detects that the airflow isn't optimized and runs the fans at full speed.
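
      If the fans stay loud, it can help to watch the fan RPMs that iDRAC reports as the chassis warms up. A minimal sketch that shells out to ipmitool over the network (the iDRAC address and credentials are placeholders; ipmitool must be installed and IPMI-over-LAN enabled in iDRAC):

      ```python
      import subprocess

      # Placeholder iDRAC address and credentials - replace with your own.
      CMD = [
          "ipmitool", "-I", "lanplus",
          "-H", "192.168.1.120",  # iDRAC IP
          "-U", "root",
          "-P", "calvin",
          "sdr", "type", "fan",   # list all fan sensors with current readings
      ]

      # Each output line looks like: "Fan1 RPM | 30h | ok | 7.1 | 4560 RPM"
      result = subprocess.run(CMD, capture_output=True, text=True, check=True)
      print(result.stdout)
      ```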

  • @dleer_defi
    @dleer_defi 2 months ago

    Awesome! I just picked up an R730xd for a local LLM server and I am trying to decide on GPUs. I am thinking dual RTX A6000s are the best fit for me.
    How do you like the performance of your P40s?

  • @6GaliX
    @6GaliX 4 months ago

    What LLM models are you planning to run?

    • @MukulTripathi
      @MukulTripathi  4 months ago +1

      48 GB of VRAM gives me the flexibility to run the llama3 70b model. That's what I'm most excited about. Watch my other videos to see everything I run!

    • @lknhd
      @lknhd 26 days ago

      @@MukulTripathi If I understand correctly, llama3 70b needs ~130 GB of VRAM to run

    • @MukulTripathi
      @MukulTripathi  26 days ago

      The Q4-quantized version of the 70b llama3.1 runs in 40-44 GB of VRAM. Please see the quantization video that I made.

    • @lknhd
      @lknhd 26 days ago

      @@MukulTripathi How do you quantize a model in Ollama, though? Sorry, I just started playing with LLMs this week.

    • @MukulTripathi
      @MukulTripathi  26 days ago

      @@lknhd Even though there are ways to quantize the models yourself, you don't have to. If you go to the Ollama library online, there are pre-quantized models there. The default model on the Ollama website is q4 quantized.
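
      As a rough sanity check on those numbers, you can estimate a model's memory footprint from its parameter count and bits per weight. A minimal back-of-the-envelope sketch (the bits-per-weight figures are typical values, and runtime overhead such as the KV cache adds a few GB on top):

      ```python
      def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
          """Memory needed for the weights alone, in GB."""
          return params_billions * 1e9 * bits_per_weight / 8 / 1e9

      # FP16 (16 bits/weight): ~140 GB of weights, hence the ~130+ GB figure above.
      print(f"70B @ FP16: ~{weight_memory_gb(70, 16):.0f} GB")

      # q4_K_M-style quants average roughly 4.5 bits/weight: ~39 GB of weights,
      # which with runtime overhead lands in the 40-44 GB range quoted above.
      print(f"70B @ Q4:   ~{weight_memory_gb(70, 4.5):.0f} GB")
      ```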

  • @iHateMyEyeBrow
    @iHateMyEyeBrow 4 months ago

    Are you using NVLink for the LLM use case here with the GPUs? Also, which Ubuntu version are you running? Interested to know; looks like a cool project.

    • @MukulTripathi
      @MukulTripathi  4 months ago

      No, I am not using NVLink. It definitely is an interesting and relatively cheap project for the amount of VRAM that we get out of it!
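
      Runners like Ollama split a large model's layers across the cards over plain PCIe, so NVLink isn't required. To confirm both P40s and their combined VRAM are visible, a minimal sketch (again assuming the nvidia-ml-py package is installed):

      ```python
      import pynvml

      pynvml.nvmlInit()
      total_gib = 0.0
      for i in range(pynvml.nvmlDeviceGetCount()):
          handle = pynvml.nvmlDeviceGetHandleByIndex(i)
          mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
          total_gib += mem.total / 2**30
          print(f"GPU {i}: {pynvml.nvmlDeviceGetName(handle)}, "
                f"{mem.total / 2**30:.0f} GiB")

      # Two P40s should report roughly 48 GiB combined.
      print(f"Total VRAM: {total_gib:.0f} GiB")
      pynvml.nvmlShutdown()
      ```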