Can you run an AI LLM on an old server?

  • Published 26 Sep 2024
  • Old servers are cheap, especially the memory, meaning you can get a lot of RAM at low cost. Does that make it viable to run AI language models on a budget?
    Ollama:
    github.com/jmo...
    Issue relating to AVX:
    github.com/jmo...
    PDF for the server I have:
    www.bargainhar...

COMMENTS • 19

  • @NetrunnerAT
    @NetrunnerAT 3 months ago +2

    An old PC with 128-512GB of DDR3 RAM + a lot of PCIe slots + some cheap Nvidia P-series cards can lift a heavy workload without issue. Also fits in a tight 1000€ budget.

    • @chris_php
      @chris_php  3 months ago

      Yeah, you can do some good work with a setup like that; it's also decent for training your own models on your own data.

  • @geekdomo
    @geekdomo 7 months ago +2

    Please consider a better microphone. It's really difficult to understand you with the mic/noise gate cutting out so often. Thanks for the video.

    • @chris_php
      @chris_php  7 months ago +1

      Thanks, I've been making improvements to my mic in my more recent videos.

  • @porter__8205
    @porter__8205 9 months ago +2

    Will do once I get more RAM xD

  • @solotechoregon
    @solotechoregon 5 months ago +2

    I have plenty of old servers, but the requirements have other hidden dependencies, such as AVX2 or better. While AVX2 arrived in Intel's CPUs in 2013 with Haswell, the CPU in your server could predate AVX2.

    • @chris_php
      @chris_php  5 months ago

      Yes, the CPU predates any AVX, which contributes to its very slow speed, and is why I had to disable AVX entirely to get it to even run.
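
      An easy way to check this up front is to query the CPU feature flags. Below is a minimal Go sketch using the golang.org/x/sys/cpu package (not part of Ollama itself); on Linux, grep avx /proc/cpuinfo tells you the same thing.

      ```go
      // avxcheck.go - report which AVX extensions the host CPU exposes.
      // Sketch only; golang.org/x/sys/cpu is an extra module to fetch.
      package main

      import (
          "fmt"

          "golang.org/x/sys/cpu"
      )

      func main() {
          fmt.Println("AVX: ", cpu.X86.HasAVX)
          fmt.Println("AVX2:", cpu.X86.HasAVX2)
          // Both false means the CPU predates AVX, like the server in the video,
          // and the AVX build options have to be switched off for it to run at all.
      }
      ```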

  • @tsclly2377
    @tsclly2377 3 months ago +1

    HP ML350p with 256GB RAM, dual 1200W power supplies, two dedicated x16 slots and one x16/x8 slot for the NVLinked pair. The video cards are up to you as far as speed and power consumption go, but A5000s should do for the linked pair, or A6000(s) in the dedicated x16 slots. Three video cards will leave you with nothing free for PCIe accelerator cards and high-bandwidth connections, like to your mass storage. Oh, and expect your internet provider to complain if your LLM has access to the web; you may have to upgrade to a business plan. These programs are hard on SSDs, so choose accordingly or your consumer NVMe will be written to death in months; at the very least have a big backup platter (HDD) and back up often. Petabyte-write-rated drives are what you want.

  • @ewasteredux
    @ewasteredux 9 months ago +1

    Hi Chris! Great video. Do you know what the minimum specs are for setting up an older system with no GPU to run ollama locally at a reasonable speed? I know there are many variables here, including the size of the LLM, but if we choose a small-to-medium model and assume that RAM is not an issue, how many cores and what clock speed would give a generation speed that doesn't require you to take a nap while waiting for it to finish? I have also recently purchased an old Nvidia GRID K1 (16GB VRAM) extremely cheaply and could not get it to run with ollama. Currently my workstation is a Dell T5600 with two E5-2690 CPUs and 128GB of DDR3 RAM. I could not use the GRID K1 with this unit, as these older Dell workstations will not even POST with one installed. I am not planning on leaving the system on 24x7, so I am not counting energy costs as substantial for short runs. FYI, the RAM was gifted to me and the T5600 was under $100 US, so I really could not afford not to try...

    • @chris_php
      @chris_php  9 months ago +2

      Hello, that's a good deal you got on that system, and its speed might already be good since those CPUs have the AVX instruction set, which greatly increases ollama's speed. Generally, the more cores the better, since it's a large model to work through. The speed of the DDR3 RAM will be important and might be a bottleneck, but this system might be fine running a 7B or 13B model at a good speed since the CPU has AVX, so there's no need to disable it in generate_linux.go.
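
      As a rough illustration of why a 7B or 13B model fits easily in 128GB, here is a back-of-the-envelope sizing sketch; the bits-per-weight and overhead figures are assumptions for a typical 4-bit quantisation, not measurements:

      ```go
      // modelsize.go - back-of-the-envelope RAM estimate for a quantised LLM.
      package main

      import "fmt"

      // approxGB assumes size ≈ parameters * bits-per-weight / 8, plus ~20%
      // overhead for the KV cache and runtime buffers (an assumption).
      func approxGB(params, bitsPerWeight float64) float64 {
          return params * bitsPerWeight / 8 / 1e9 * 1.2
      }

      func main() {
          fmt.Printf("7B  @ ~4.5 bits/weight: ~%.1f GB\n", approxGB(7e9, 4.5))
          fmt.Printf("13B @ ~4.5 bits/weight: ~%.1f GB\n", approxGB(13e9, 4.5))
          fmt.Printf("13B @ fp16:             ~%.1f GB\n", approxGB(13e9, 16))
      }
      ```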

  • @IpfxTwin
    @IpfxTwin 7 months ago +1

    I have my eye on a ProLiant Gen8 server with 393 gigs of RAM (dual socket / 12 threads each). I know more RAM would handle more parameters, but would more RAM speed up simpler models?

    • @chris_php
      @chris_php  7 months ago +1

      RAM speed is important, since the whole LLM needs to be read through for every token, so the smaller the model, the quicker you will get a response. If the LLM is 40GB in size and you can read 40GB a second in memory bandwidth, that means you'd get roughly 1 token every second.
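
      That rule of thumb is easy to turn into a quick calculator: divide usable memory bandwidth by model size, ignoring compute and cache effects, and you get an upper bound on tokens per second. A small sketch with illustrative numbers:

      ```go
      // tokenrate.go - upper bound on token rate from memory bandwidth alone.
      package main

      import "fmt"

      // tokensPerSecond assumes the whole model is streamed from RAM once per token.
      func tokensPerSecond(bandwidthGBs, modelGB float64) float64 {
          return bandwidthGBs / modelGB
      }

      func main() {
          // ~40 GB/s of bandwidth against a 40 GB model: about 1 token/s.
          fmt.Printf("40 GB model @ 40 GB/s: ~%.1f tok/s\n", tokensPerSecond(40, 40))
          // The same machine with a ~4 GB quantised 7B model: ~10 tok/s ceiling.
          fmt.Printf(" 4 GB model @ 40 GB/s: ~%.1f tok/s\n", tokensPerSecond(40, 4))
      }
      ```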

  • @perrymitchell7118
    @perrymitchell7118 6 months ago +1

    What would you recommend to run a chatbot trained on website data locally? Thanks for the video.

    • @chris_php
      @chris_php  6 months ago

      If it's a small model like a 3B, you don't need much RAM, around 8GB, so it can even run on lower-end graphics cards like a 2060 and get quick responses.
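
      For a setup like that, one common route is to pull a small model into ollama and call its local HTTP API from your own code. A minimal Go sketch, assuming ollama's default endpoint on localhost:11434; the model tag and prompt are only placeholders:

      ```go
      // ask.go - send one prompt to a locally running ollama instance.
      package main

      import (
          "bytes"
          "encoding/json"
          "fmt"
          "net/http"
      )

      func main() {
          // "llama3.2:3b" is just an example of a ~3B model; it must already be
          // pulled (ollama pull) before this request will succeed.
          body, _ := json.Marshal(map[string]any{
              "model":  "llama3.2:3b",
              "prompt": "Answer from the website data: what are your opening hours?",
              "stream": false, // ask for one complete JSON reply instead of a stream
          })

          resp, err := http.Post("http://localhost:11434/api/generate",
              "application/json", bytes.NewReader(body))
          if err != nil {
              panic(err)
          }
          defer resp.Body.Close()

          var out struct {
              Response string `json:"response"`
          }
          if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
              panic(err)
          }
          fmt.Println(out.Response)
      }
      ```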

  • @SB-qm5wg
    @SB-qm5wg 5 months ago +1

    So yeah, but hella slowly

  • @dectoasd3644
    @dectoasd3644 8 months ago

    This generation of server is e-waste, and anything with dual CPUs of this generation is worse still. The minimum would be an E5 v3/v4 with quad-channel memory, not the cheap AliExpress remade boards.
    A 20B Q5 model on an E5-2660 v3 is usable, but the cash would be better spent on a P40.