Which nVidia GPU is BEST for Local Generative AI and LLMs in 2024?

  • Published Aug 25, 2024
  • Struggling to choose the right Nvidia GPU for your local AI and LLM projects? We put the latest RTX 40 SUPER Series to the test against their predecessors! Discover which card reigns supreme in terms of performance per dollar for running everything from Stable Diffusion to custom language models. Whether you're a hobbyist or a serious developer, this video unveils the best value for your AI journey in 2024.
    Tell us what you think in the comments!
    This video contains affiliate links, meaning if you click and make a purchase, I may earn a commission at no extra cost to you. Thank you for supporting my channel!
    My 4090 machine:
    amzn.to/3QMvE4s - MSI 4090 Suprim Liquid X 24G (best Linux compatibility)
    amzn.to/3V5R0My - Corsair 1500i PSU
    amzn.to/4dIwybZ - 12VHPWR cables that DON'T MELT!
    Tech I use to produce my videos:
    amzn.to/4bN5eaR - Samsung T7 2TB SSD USB-C
    amzn.to/4dJFHky - SanDisk 32GB USB-C flash drive
    amzn.to/44LHZeG - Blue XLR Microphone
    amzn.to/3ULTT3N - Focusrite Scarlett Solo USB-C to XLR interface

COMMENTS • 206

  • @jmirodg7094
    @jmirodg7094 6 months ago +82

    The main limiting factor is the amount of VRAM; speed only matters once you can actually run your model!

    • @aifluxchannel
      @aifluxchannel  6 months ago +4

      @jmirodg asking the real questions ;)

    • @tingsimon8483
      @tingsimon8483 3 months ago +4

      Why isn't Nvidia RTX VRAM as large as Apple M3 Max unified memory? What's the difference? Does that mean Apple is more cost-efficient?

    • @almaka17
      @almaka17 2 months ago +1

      This is kinda stupid, because even if there isn't enough VRAM you can still run a model; it'll just be painfully slow. Having a ton of regular RAM can help, because it'll be used as a kind of slower VRAM.

    • @Larimuss
      @Larimuss 21 days ago +1

      Yup, and Nvidia is determined to squeeze customers with overpriced cards carrying tiny amounts of VRAM, so you end up buying their even more overpriced 90-class cards.

  • @joshbarron7406
    @joshbarron7406 6 months ago +38

    I've had my 4080 for over a year now. What I'm able to do on it has grown tremendously with the improvements in quantization.

    • @glenyoung1809
      @glenyoung1809 6 months ago +17

      Quantized models are key to opening the consumer market up and to use on mobile and edge devices.
      VRAM is the key, but keep an eye on the rapidly developing Mamba architecture, which scales linearly with sequence length at inference time, unlike the current Transformer architecture, which is quadratic and therefore far more resource-demanding for inference work.

    • @aifluxchannel
      @aifluxchannel  6 months ago +11

      Mamba is incredibly exciting - can't wait until I finish my video explaining the full future implications of the architecture :)

    • @kenadams894
      @kenadams894 4 months ago

      @@aifluxchannel What great and accurate models can I run on my 4080 Super?

    • @metacob
      @metacob 4 months ago

      @@kenadams894 I get around 60 tokens/s for Llama3-8B on my 4080

    • @1hecuteangel
      @1hecuteangel 3 months ago +1

      What CPU do you use with it?

  • @datpspguy
    @datpspguy 4 months ago +18

    I have a server in my apartment with a 32-core EPYC processor, and in one of the slots I have a 3090 + Ollama; it does its job pretty well for quantized models. For anything massive I would just lease a cloud GPU, but the progress being made so far for local LLMs is really amazing. I've noticed that GPT-4 seems to be getting a bit more "lazy" and concise, forcing me to use the API + AutoGen agents for more thoughtful answers.
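
    A minimal sketch of the kind of local setup described above, assuming Ollama is serving its default HTTP API on localhost:11434 and that a quantized model has already been pulled (the model name and prompt are illustrative):

    ```python
    import requests

    # Ask a locally served, quantized model for a completion via Ollama's
    # /api/generate endpoint (assumes `ollama serve` is running and the model
    # has been pulled, e.g. `ollama pull llama3:8b`).
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",   # illustrative model name
            "prompt": "Summarize why VRAM matters for local LLM inference.",
            "stream": False,        # return one JSON object instead of a stream
        },
        timeout=300,
    )
    response.raise_for_status()
    print(response.json()["response"])  # the generated text
    ```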

  • @glenyoung1809
    @glenyoung1809 6 months ago +24

    The advantage of a personal system for AI work is privacy and security of your information and data.
    Doing anything in a cloud-based service automatically means going over the Internet and onto a remote server.
    Don't expect privacy no matter what the service provider says.
    Does anyone really expect OpenAI or MS to pass up the opportunity to log and record every transaction with GPT-4?
    User queries will provide petabytes of new data for training and tracking purposes; why else do you think it's so cheap to use them, considering how expensive data center resources are to provide to millions of people for inference?

    • @aifluxchannel
      @aifluxchannel  6 months ago +7

      Security is paramount - I also find it's sometimes faster to use local assistants for certain tasks. ChatGPT is quickly losing its advantage as the obvious choice over local LLMs.

    • @glenyoung1809
      @glenyoung1809 6 months ago +9

      @@aifluxchannel Mistral AI let the cat out of the bag with their smaller 8x7B MoE models, which are very competitive with GPT-3.5 and closing in on GPT-4-level performance, all at the cost of slightly lower accuracy but an immense gain in portability.
      The problem with the LLM subscription setup OpenAI is using is that there will be a scaling limit where inference costs as a whole exceed any kind of profit margin. The rumour is that a single inference query to GPT-4 runs on a full 8-GPU Nvidia cluster, with the attendant power, cooling, network and compute requirements.
      They're already struggling with 'laziness' from their model because they needed to trim resources to accommodate massive demand.
      Now with multi-modality showing up in these super-scale LLMs it's only going to get worse, and raising prices will only dissuade wider adoption because of competition with other AI services.
      In the case of GPT models, big and proprietary isn't better!
      I think localized, open-source models will be the solution to widespread adoption of LLM applications.

    • @TheGalacticIndian
      @TheGalacticIndian 6 months ago +3

      It's like saying that the advantage of having a personal smartphone is the privacy and security of the data in it, since using phone booths exposes us to eavesdropping. Good joke😉
      Consider this: LLMs have a remarkable, almost unimaginable ability to compress data. They can encode countless images, books or videos in a few gigabytes and generate new things based on this training data. They also have an excellent memory, so they are able to reproduce a given work of art almost word for word or pixel for pixel.
      Compressing your personal data, attaching it to any file and sending it the moment you happen to have an Internet connection is a walk in the park for them.
      And who knows whether our electrical grid in the background doesn't also support some hidden connection to the Internet via powerline networking. Then your data can be transmitted even without an Ethernet or Wi-Fi connection..👀
      AI is the ultimate spying tool until mind reading is invented.

    • @glenyoung1809
      @glenyoung1809 6 months ago +3

      @@TheGalacticIndian Your argument is one of the reasons why I don't feed any personal or sensitive data into any AI models, even if they are locally run. I've heard cases where some AI hobbyists have trained local models on all their personal data in hopes of creating a personalized 'digital twin' which they can use as a true avatar.
      This includes the latest fad of feeding these models your voice, your personal appearance, etc.
      What a great and convenient way to make identity theft even easier - like that poor sod who was fooled into transferring $26 million of company funds to a scammer after a deepfake of his boss appeared on a Zoom call.

    • @Wobbothe3rd
      @Wobbothe3rd 6 months ago

      Nvidia has invested a lot into confidential computing even for the cloud.

  • @Gigachadder_
    @Gigachadder_ 4 months ago +13

    I'm buying a 4070.
    BTW, I just started getting into LLMs and AI - can you give some tips and channels for beginners? It would be appreciated.

  • @YoItsOO
    @YoItsOO 3 months ago +6

    Do you think a 4070 Ti Super with 16GB of VRAM (vs. 12GB) is worth getting? I currently have a 3060 Ti with 8GB of VRAM and render times are alright, but with lots of prompts/realism it takes forever. Thanks in advance

  • @glenyoung1809
    @glenyoung1809 6 months ago +12

    I think the 4070 Ti Super is the sweet spot, 16GB AD103 (same die as the 4080) with adequate bus-width and it doesn’t completely bankrupt your budget.

    • @fontenbleau
      @fontenbleau 6 months ago +2

      Combined with an old card (yes, a 2nd PSU is needed) it can match a 4090 in total compute and VRAM, very cheaply.
      It's also smaller and less power-hungry than a 4080.

    • @glenyoung1809
      @glenyoung1809 6 months ago +4

      @@fontenbleau I was thinking the same thing, or getting a 2nd 4070 Ti Super - both of them together still cost less than the current market price of a 4090.
      The only issue is that current LLMs aren't "sharded" to operate efficiently across multiple GPUs over a PCIe bus.
      I wish Nvidia hadn't removed NVLink from their cards, which allowed SLI-style configurations.

    • @fontenbleau
      @fontenbleau 6 months ago +1

      @@glenyoung1809 I want to test that myself; I see that quite a few tools support multiple GPUs at the software level. The only problem with the new cards is that the driver isn't available in many Linux distros yet - you need the 550 driver. On an Ubuntu-type distro I managed to install it; on other distros I'll try the one provided by Nvidia. The Nouveau driver doesn't support the new models yet.

    • @user-bd8jb7ln5g
      @user-bd8jb7ln5g 6 months ago +9

      @@glenyoung1809 It's "funny" how Nvidia removed NVLink just as it was becoming useful

    • @truehighs7845
      @truehighs7845 6 months ago

      @@fontenbleau I got 2x A4000 essentially for the VRAM and the low power consumption, because I know what it means to have GPUs running full blast - it's an arm and a leg in electricity, so you may as well pay for cloud. I can run Solar and Mixtral, I am happy.

  • @southofgrace
    @southofgrace 1 month ago +2

    Good vid and great info for someone just looking to learn enough to set up an AI tool for business use. I researched GPUs quite a bit and landed on a 4070 thanks to you. I want to build a model on my own data that can spit out info from under 1000 PDFs. I don't want to learn AI engineering to do it, so I'm just piecing things together. I found that a lot of devs focus on image generation, or the AI model is hooked into something online. Happy that more and more offline options are out there now.

  • @GerryPrompt
    @GerryPrompt 6 months ago +13

    Think I'll just stick with my current 3090 and maybe pick up a second 😊

    • @aifluxchannel
      @aifluxchannel  6 months ago +2

      Great choice! I also strongly dislike the new 12VHPWR connectors that are now standard on all Nvidia GPUs

  • @Gabriel50797
    @Gabriel50797 7 days ago +1

    The 4070 and 4070 Super don't have 16GB of VRAM, they have 12GB... (10:00 mark)

  • @werthersoriginal
    @werthersoriginal 2 months ago +10

    If I don't see 24GB of VRAM I don't give a shaaaaaaaait. For now, the 3090 24GB is floating around eBay for $650+, and if they announce a new GPU this year you can expect that thing to drop to $450 or so.

    • @subhojitdhar3354
      @subhojitdhar3354 27 days ago +3

      Wow, an RTX 3090 for 650 dollars - kindly enlighten me with the name of the seller offering it at such a dirt-cheap price.

    • @werthersoriginal
      @werthersoriginal 27 days ago

      @@subhojitdhar3354
      Sold Jul 28, 2024 $610.00
      Sold Jul 27, 2024 $689.99
      Sold Jul 27, 2024 $630.00
      Sold Jul 26, 2024 $655.00
      Sold Jul 25, 2024 $699.99
      Sold Jul 24, 2024 $695.00
      Sold Jul 24, 2024 $530.00
      Sold Jul 23, 2024 $610.00
      Sold Jul 21, 2024 $675.00
      Wow, should I keep going or have you been "enlightened"?

    • @werthersoriginal
      @werthersoriginal 23 days ago

      @@subhojitdhar3354 Literally just bought one off of eBay for $630 yesterday. So you're right, it wasn't $650

  • @dholzric1
    @dholzric1 6 months ago +8

    I just picked up some P40s and they seem to work well for inference. I think I'm getting around 2/3 the performance of my 3090s (I have only tested a few models). For the price they seem hard to beat as a budget GPU if you can get the cooling worked out.

    • @TheGalacticIndian
      @TheGalacticIndian 6 months ago +2

      What consumer motherboard do you use for the P40? Or a dedicated server?

    • @dholzric1
      @dholzric1 6 months ago +3

      I'm using a Dell R720 I got from Facebook Marketplace for $100. I picked up RAM cheap and a bunch of new-old-stock 200GB SAS SSDs for about $10 each. The only issue is the Xeon doesn't support AVX2, so the main version of LM Studio won't work. The AVX version works well in Windows 10, but not in Windows 11 (not sure why).

    • @aifluxchannel
      @aifluxchannel  6 months ago +2

      @SoulEngineer on Twitter also has a 3x Nvidia P40 setup. They're slow, but it's a great way to stack VRAM!

    • @testales
      @testales 5 months ago +1

      I find 2/3 hard to believe, given that this card doesn't even have half the CUDA cores a 3090 has and it's also an older generation. How many tokens per second do you get when loading, say, a 38GB Mixtral?

  • @robxmccarthy
    @robxmccarthy 6 months ago +9

    For local use it's all about maximizing VRAM. More VRAM = smarter models. Speed is secondary.

    • @aifluxchannel
      @aifluxchannel  6 months ago +1

      Agree, but usability is also a huge factor. Show someone a model that takes 15s to get them an answer and a model that's 60% as capable but can get it to them in 1s, and you get interesting results.

    • @testales
      @testales 5 months ago

      If you want to chat with the model, it's not. If speed were totally secondary, you could just run it on a CPU and be happy with something like 1 T/s.

  • @JuanSanchez-rb4qu
    @JuanSanchez-rb4qu 1 month ago +2

    I don't see anything here about memory, which is what would really make a difference if these things had 24GB of VRAM - but the 4070 Super only has the same 12GB as the 3080, and the 4080 Super only gives you 16GB.

  • @user-uh8po2sx6y
    @user-uh8po2sx6y 4 months ago +5

    Is there a big difference in performance and speed in AI tasks like Stable Diffusion between the RTX 4080 Super and the RTX 4090? Which one should I buy, given that I seldom play games, or should I wait for the 5090 at the end of the year? I am not a video editor and don't hold any job related to design or editing, just a casual home user.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Agreed!

    • @3dChris
      @3dChris 1 month ago +1

      @@aifluxchannel What are you agreeing with? He asked a question.

    • @akhathos1618
      @akhathos1618 1 month ago

      @@aifluxchannel

  • @lobyapatty
    @lobyapatty 2 months ago +3

    Any experience with AMD cards? I hear it's getting easier, with more support.
    Buying used: I can get a 3090 or a 7900 XT, both for £500. Which one do you recommend?

    • @mohamedayman7396
      @mohamedayman7396 2 months ago +1

      Get the 7900 XT unless you plan on developing AI (LLMs, generative models) or machine learning models. Basically, if you don't work with CUDA, go with AMD; they're getting better, and a big AI-focused upgrade is probably coming. Plus you get a faster card in gaming and, I think, more VRAM - and even if not, you still get the newer AMD architecture and a newer card.

  • @andre-le-bone-aparte
    @andre-le-bone-aparte 6 months ago +3

    Just found your channel. Excellent content! - Another sub for you, sir!

    • @aifluxchannel
      @aifluxchannel  6 months ago +2

      Awesome, thank you! We'll do our best to keep it coming!

  • @Juan-ws9sy
    @Juan-ws9sy 5 months ago +2

    OK, great intro - exactly the video I've been looking for, thanks!

    • @aifluxchannel
      @aifluxchannel  5 months ago

      Glad I could help!

    • @digitalizeddeath
      @digitalizeddeath 5 months ago

      @@aifluxchannel What game are you playing in the video?

  • @TheMetadude
    @TheMetadude 3 months ago +2

    I expected benchmarks or tests to show which Nvidia GPUs to use, not a read-out of info from Nvidia's website! Disappointed.

  • @handpowers
    @handpowers 2 months ago +1

    I have a GTX 1650 laptop with a 12th-gen i5, and it can run local ChatGPT-like AI, text-to-speech and text-to-image. But it struggles with text-to-video AI. Just to give you an idea, guys.

  • @testales
    @testales 5 months ago +3

    I just built a dual RTX 4090 system; we'll see how far this gets me. I had checked my options on RunPod before, and I refuse to pay something like 4x the actual value for one of the "professional" GPUs like the RTX 6000 Ada. One of those cards makes my whole new system look cheap. Fun fact: two RTX 4090s are faster than two RTX 6000 Adas. There are also actual dedicated AI chips on the horizon, and if that takes too long I can still upgrade to an RTX 5090. As for other GPUs, I don't see any point in considering all this "Super" nonsense that Nvidia has introduced. If you are on a tight budget, get a used 3090, as it also has 24GB of VRAM just like the 4090. If you want to push it and run the largest open LLMs, you can build a 4x 3090 system; that gives you the most bang for the buck, but it's quite a complex build obviously.

    • @aifluxchannel
      @aifluxchannel  5 months ago

      Very exciting - I'm also holding out for the RTX 5090. The biggest benefit (grift) of the workstation GPUs is that they can share / pool memory peer-to-peer.

    • @testales
      @testales 5 months ago +1

      @@aifluxchannel That seems to be a pure software issue for most use cases. Most LLM software allows distributing the data over multiple GPUs, but only one at a time will actually do work. At least on RunPod the expensive RTX 6000 Adas showed the same behavior: only one was active at a time. Though I don't know what happens if you put an NVLink bridge on them, or whether they had already done that at RunPod.

  • @Plagueheart
    @Plagueheart 5 months ago +2

    I am just waiting for AI to improve hardware blueprints to the point where it designs computer parts for enthusiasts.

    • @aifluxchannel
      @aifluxchannel  5 months ago +2

      Nvidia has actually used AI for GPU conductor tracing for almost twenty years now. Granted, it's gotten much, MUCH better in the past two to three years, but we're very close to seeing this. They've also had AI-augmented systems to visualize 3D X-rays of their GPU dies since 2013.

  • @gallamegamerZ
    @gallamegamerZ 4 months ago +3

    In terms of VRAM for the money, AMD absolutely crushes Nvidia in consumer GPUs. So it would be cool if you did a video comparing ROCm (AMD) GPUs with Nvidia at similar price points.

    • @wattzombiegames
      @wattzombiegames 4 months ago

      There's little or no generative AI programmed to work with AMD; AMD never gave priority to generative AI and focused on games. Most of it is optimized for Nvidia, and the company itself gives it priority - although it's unfortunate that they leave only the 4090 with that much VRAM. The launch of the 4070 Ti Super is aimed at generative AI; that's why they raised it to 16GB. A future standard.

  • @Kalmaos
    @Kalmaos 4 months ago +3

    Do you think a 3090 with 24GB, a Ryzen 5 5600X and 64GB of RAM is going to be able to run Llama 3 70B?

    • @KyuubiYoru
      @KyuubiYoru 4 months ago +2

      Yes, if you use GGUF quantization and split the model between VRAM and RAM. But it will be slow.
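
      A minimal sketch of that kind of VRAM/RAM split, assuming the llama-cpp-python bindings and a local GGUF file (the path and layer count are illustrative):

      ```python
      from llama_cpp import Llama

      # Load a GGUF-quantized model, offloading only part of it to the GPU.
      # n_gpu_layers controls how many transformer layers live in VRAM;
      # the rest stay in system RAM and run on the CPU (slower, but it fits).
      llm = Llama(
          model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # illustrative path
          n_gpu_layers=40,  # raise until VRAM is nearly full; -1 offloads everything
          n_ctx=4096,       # context window
      )

      out = llm("Explain the trade-off of splitting a model between VRAM and RAM.",
                max_tokens=128)
      print(out["choices"][0]["text"])
      ```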

    • @Kalmaos
      @Kalmaos 4 months ago

      @@KyuubiYoru Oh, thanks. As of now I have a 3050, a Ryzen 5 5600X and 64GB of RAM, and I'm able to run Llama 3 8B and other models without a problem. But I understand that 70B is a whole new level.

  • @thefastmeow
    @thefastmeow 5 months ago +3

    What GPU would you recommend for a laptop, though? It seems like the GPUs they put in laptops are slightly different.

    • @aifluxchannel
      @aifluxchannel  5 months ago +1

      The RTX 4070 in laptops is actually quite good

    • @kenadams894
      @kenadams894 4 months ago

      @@aifluxchannel What great and accurate models can I run on my 4080 Super? I want to create my own Jarvis.

  • @SeanietheSpaceman
    @SeanietheSpaceman 2 months ago +2

    Sorry, just to understand this: a 70B-parameter model could work on a 3070 with 8GB of VRAM? Or are you saying 16GB minimum? Sorry if I am misunderstanding you. I'm interested in running Codestral but currently have a 3070 Ti 8GB. My next upgrade will be either a 3090 or a 4070 Ti Super.

    • @aifluxchannel
      @aifluxchannel  2 months ago +3

      That would require something called "quantization", a clever way to compress the weights these models use so they take up less memory, at some cost in accuracy.
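
      A minimal sketch of what quantized loading looks like in practice, assuming the Hugging Face transformers and bitsandbytes libraries (the model name is illustrative):

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      # 4-bit quantization: weights are stored in 4-bit precision instead of 16-bit,
      # roughly quartering VRAM use at a small cost in output quality.
      quant_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_compute_dtype=torch.float16,  # compute still happens in fp16
      )

      model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model name
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=quant_config,
          device_map="auto",  # place layers on the available GPU(s) automatically
      )

      inputs = tokenizer("VRAM matters because", return_tensors="pt").to(model.device)
      print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
      ```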

  • @samalmas4588
    @samalmas4588 29 days ago +1

    You made a mistake - you meant the 4070 Ti Super; there are no 16GB 4070 Super models. And it costs around 850-900, which is significantly more than a 3090 rocking 24GB, and with NVLink the 3090 is a beast!

  • @simpernchong
    @simpernchong 4 місяці тому +1

    Thanks for the info

  • @user-bd8jb7ln5g
    @user-bd8jb7ln5g 6 months ago +3

    What's the approximate relative training and inference speed of the 3090 vs the 4090? I know that inference speed mainly depends on memory bandwidth; what does training speed depend on?

    • @aifluxchannel
      @aifluxchannel  6 months ago +4

      Training is largely about PCIe bandwidth and how quickly the GPUs themselves can exchange finished computation.

    • @PeakCivilization
      @PeakCivilization 6 months ago +5

      Just replaced a 3090 with a 4090: at iso-power, you get about +85% on faceswap training.
      My 3090 was capped at 85% power for efficiency (its sweet spot). The 4090's sweet spot is around 65%.
      Also, the 4090 being a single-sided board, the VRAM runs much, much cooler. Training at around 60°C GPU, 70°C VRAM for the FE at 65%.
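
      A small sketch of how you might inspect that power "sweet spot" programmatically, assuming the nvidia-ml-py (pynvml) bindings; the 65% figure is just the commenter's number, not a recommendation:

      ```python
      import pynvml

      pynvml.nvmlInit()
      handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

      default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)  # milliwatts
      current_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)
      target_mw = int(default_mw * 0.65)  # e.g. the 65% "sweet spot" mentioned above

      print(f"default limit: {default_mw / 1000:.0f} W, current: {current_mw / 1000:.0f} W")
      print(f"suggested cap: {target_mw / 1000:.0f} W (apply with `nvidia-smi -pl`, needs root)")

      pynvml.nvmlShutdown()
      ```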

    • @fontenbleau
      @fontenbleau 6 months ago +4

      Recently I found a little model on Hugging Face that one guy trained using only his 3090, and he wrote that it took a full 24 hours of non-stop work. I'll try to find the name of the model - I don't remember on which OS, SSD or browser I discovered it. But running a card like that for that long is basically heating the room with an oven; temperatures on the 3090 are not great.
      I was never into mining, but the search algorithm itself offered me the equipment to buy, and I can say that the people doing that stuff have literally reinvented the oven with those DIY boxes with ventilation. It reminded me of folklore fairy tales about a "magic oven" that grants any wish - and in some sense current tech does look like magic: a prompt is like a spell to make something from nothing.

    • @Wobbothe3rd
      @Wobbothe3rd 6 months ago +3

      16-bit TFLOPS, unless the model supports FP8 (and then Ada/Blackwell automatically wins)

  • @dishcleaner2
    @dishcleaner2 3 months ago +1

    Sitting here with a 4090... I want more VRAM for less than $10k.

  • @redampsmining
    @redampsmining 3 months ago +3

    Always good results with 3060s and 3080s here.

    • @aifluxchannel
      @aifluxchannel  3 months ago +4

      What are you running with these GPUs? Glad to hear more from people using smaller GPUs!

  • @lain2600
    @lain2600 1 month ago +1

    Why didn't you compare the A100? Or something better, like the NVIDIA GH200?

    • @aifluxchannel
      @aifluxchannel  1 month ago +3

      Most consumers can't even think of affording GPUs like that.

  • @u13e12
    @u13e12 3 months ago +1

    What are your thoughts on getting a GPU server and throwing in 3 Tesla P40s?

    • @aifluxchannel
      @aifluxchannel  3 months ago

      Actually making a video about this! This is @SoulEngineer's setup from Twitter.

  • @EzaneeGires
    @EzaneeGires 1 month ago

    Hello, what are your thoughts on connecting a GPU to a mini PC or handheld via OCuLink (aka SFF-8611/8612)? Assuming Gen 4 PCIe.

  • @Larimuss
    @Larimuss 21 days ago

    So I should upgrade my 4070 Ti to get an extra 4GB of VRAM. I've come to expect this from Nvidia.

  • @user-su1zh7fx3x
    @user-su1zh7fx3x 5 months ago +3

    What game is that in the beginning?

    • @FZFALCON
      @FZFALCON 4 місяці тому

      Super MarioMan

  • @the_cluster
    @the_cluster 6 months ago +6

    I think the best hardware for local LLM inference is a Mac with an M1/M2/M3 processor. Why? Because they have a unified memory architecture and a huge amount of that memory. My Mac Studio with an M2 Ultra and 128GB RAM can run a big LLM that requires 60+ GB of GPU memory. Yes, the inference speed is slower than on a 4090/4080, but it's still acceptable, and the models generate tokens faster than I can read the text off the screen. At the same time, the level of comfort is incomparably higher than with a regular PC and discrete GPUs - because the Mac practically does not heat up and its fans are never audible, unlike high-end Nvidia GPUs, which, when working for a long time, try to melt everything around them, themselves included.

    • @aifluxchannel
      @aifluxchannel  6 months ago

      Apple Silicon is incredible, but until we have an M3 Ultra, GPUs are still top of the pile IMO

    • @TazzSmk
      @TazzSmk 5 months ago +1

      Depends on the models and such; my old GTX 1080 Ti (11GB VRAM) still beats an Apple M2 Mac Studio in things like UVR or SD

  • @tchristell
    @tchristell 6 months ago +2

    I'm currently running two 3090s without NVLink. What gains would I see by adding it? I'm doing half finetuning and half inference.

    • @aifluxchannel
      @aifluxchannel  6 months ago +3

      It SUBSTANTIALLY increases inference speed (i.e. 11 t/s to around 20 t/s) on 70B models for me. The software has to support it, but it's trivially easy to enable P2P transfers if you have access to the source (e.g. there's a small patch I wrote to enable NVLink in llama.cpp in my comment history). You can then watch the nvidia-smi stats and see the transfers occurring over NVLink instead of PCIe.
      Source: I have 2x 3090s with NVLink and have enabled llama.cpp to support it.
      Also, if it's the 4-slot (3090) bridge it should only be about $70. If it's the 3-slot (Quadro) bridge, that one will run over $200. Both do the same thing; it just depends on the motherboard slot spacing you have. If you are shopping from scratch, buy a mobo with 4-slot spacing!
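
      A quick way to sanity-check that a bridge is actually active is `nvidia-smi nvlink --status`; the same check can be sketched with the nvidia-ml-py (pynvml) bindings (link counts vary by GPU, so this just probes each one):

      ```python
      import pynvml

      pynvml.nvmlInit()
      for gpu in range(pynvml.nvmlDeviceGetCount()):
          handle = pynvml.nvmlDeviceGetHandleByIndex(gpu)
          name = pynvml.nvmlDeviceGetName(handle)
          active = []
          for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
              try:
                  if pynvml.nvmlDeviceGetNvLinkState(handle, link):
                      active.append(link)
              except pynvml.NVMLError:
                  break  # this GPU exposes fewer links than the maximum
          print(f"GPU {gpu} ({name}): active NVLink links: {active or 'none'}")
      pynvml.nvmlShutdown()
      ```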

    • @tchristell
      @tchristell 6 months ago

      Thanks for the reply. Unfortunately I already have the hardware (2 EVGA 3090s and an ASRock X670E Taichi mobo), so I guess I'm stuck with the Quadro bridge.

    • @123isperfect6
      @123isperfect6 6 months ago +1

      FYI, if you don't have rear-blower-style cards, the 3-slot bridge results in a hotter GPU in the top position. Make sure you set up really good airflow in your case to compensate for that.

    • @keylanoslokj1806
      @keylanoslokj1806 6 months ago

      @@aifluxchannel How do you activate NVLink?

    • @Beauty.and.FashionPhotographer
      @Beauty.and.FashionPhotographer 1 month ago

      @@aifluxchannel Would such multiple slots, like the 4-slot mobo spacing you mentioned (where would I get one?), also work with a Mac M2? Only a few people seem to get that working in general - generative AI or LLMs on Macs - for example a Mac M2 running Windows (via Boot Camp) and using an external eGPU with one or more RTX Nvidia cards. I also see that the 3090 with 24GB is the most cost-effective versus the 4090, which is only a little faster but carries a massively more expensive price tag; this is to run Auto1111 or ComfyUI or similar. It would be amazing if you dedicated a video to Mac owners who are willing to run Windows via Boot Camp, showing the Thunderbolt 4 eGPU docks and RTX Nvidia graphics cards that make the most sense for the buck.

  • @saintsscholars8231
    @saintsscholars8231 6 months ago +3

    It would be good to know where Apple Silicon machines stand when comparing what's achievable with NVIDIA.
    Buying a graphics card might be great if you already have a quality PC and that's the only upfront cost.
    I'd be interested in a breakdown of buying a Mac Studio versus building a high-spec PC.

    • @aifluxchannel
      @aifluxchannel  6 months ago +4

      We can definitely make that happen! Thanks for your feedback

    • @saintsscholars8231
      @saintsscholars8231 6 months ago

      @@aifluxchannel Thanks for your excellent content, btw. Do you have a Discord?

  • @dleer_defi
    @dleer_defi 1 month ago

    What about A4000 cards for a PowerEdge server?

  • @TechPill_
    @TechPill_ 5 months ago +1

    I can't get a PC, as portability is a major concern for me. Should I get an RTX 4060 or an RTX 4070 in a laptop? I need a laptop ASAP as my old machine is down, and both have 8GB of VRAM, so it's a confusing decision.

    • @aifluxchannel
      @aifluxchannel  5 months ago

      The 4070 is likely the better choice

    • @TechPill_
      @TechPill_ 5 months ago

      @@aifluxchannel But is it worth 181 dollars for that upgrade?

  • @nzt29
    @nzt29 6 months ago +1

    Do you develop on Linux via something like WSL? Or do you have a dedicated Linux boot? Are there any reasons why I might not want to continue using WSL?

    • @aifluxchannel
      @aifluxchannel  6 months ago +4

      WSL is just a headache - why not just use Linux?

    • @nzt29
      @nzt29 6 months ago

      @@aifluxchannel Fair point. When I first left my Mac for Windows I should have read up more.

    • @oseaghin
      @oseaghin 6 months ago

      @@aifluxchannel There are plenty of good reasons. I mean, there must have been a good reason for Microsoft to dump resources into WSL development and support, right? First and foremost, the only alternative to using WSL is having another box. Many don't want to and/or don't have the means to do that.
      This holy war goes back decades, but Linux - its strengths notwithstanding - sucks balls for everyday use. It's just a fact of life.

    • @MrAusdrifter
      @MrAusdrifter 4 months ago

      If in doubt, dual-boot for a while and see how you go

  • @rohitrai8814
    @rohitrai8814 2 months ago

    OK. So which one is the best, though?

  • @creed4788
    @creed4788 3 months ago +2

    The most important thing is the amount of VRAM, not the speed...

  • @Whysicist
    @Whysicist 4 months ago

    M2 Ultra with 192GB of unified memory… M3 Ultra soon and M4 Ultra by year-end?

  • @SomeStuffShin
    @SomeStuffShin 3 months ago +1

    Did you try using Clore AI to rent a server?

    • @aifluxchannel
      @aifluxchannel  3 months ago

      Clore AI is literally fake

    • @SomeStuffShin
      @SomeStuffShin 3 months ago

      @@aifluxchannel Can you please explain why?

  • @relexelumna5360
    @relexelumna5360 4 months ago +1

    Honest, good review. I painfully regret buying the regular RTX 4070 instead of the RX 7800 XT. Then again, the RX wins in gaming, but in Stable Diffusion I'm not sure whether the 7800 XT wins because of its 16GB of VRAM compared to the RTX's 12GB. The RX uses GDDR6 while the RTX uses GDDR6X. What do you think - will I still regret it, especially for SDXL + Auto1111 in 2024?

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Glad this helped!

    • @relexelumna5360
      @relexelumna5360 4 months ago +2

      @@aifluxchannel Are you replying to me with YouTube bot auto-reply software, bro? 😁 If yes, then it's not really you.

    • @BakaTimeproject
      @BakaTimeproject 4 months ago +1

      The 4070 with 12GB is a good option, but the best is the 4070 Ti Super with 16GB; more VRAM improves things a lot. Your purchase is fine. AMD cards are only for games, no matter that they are cheaper and have more VRAM - there is almost no generative AI programmed for AMD, so what good is so much VRAM if there are only 1 or 2 projects for AMD?

    • @relexelumna5360
      @relexelumna5360 4 months ago

      @@BakaTimeproject I think that clarifies my doubts. Thanks a lot. At the time I had a limited $600 budget, so I couldn't go for the 4070 Ti. I was skeptical about the 7800 XT despite its 16GB of VRAM, but the RTX excels in both gaming and content creation - Stable Diffusion, AI, 3D - while using less electricity, whereas AMD focuses on gaming and less on other tasks. I'm more focused on content creation, AI and of course gaming. Right now I'm happy with the RTX's 12GB of VRAM, but I don't know how many years it will last.

    • @BakaTimeproject
      @BakaTimeproject 4 months ago

      @@relexelumna5360 From my point of view, it will depend on how comfortable you are with the card. Considering the advancements in generative AI, there is still a lot of life in cards even with 12GB. The models are being optimized - LCM is a clear example of this. Within a few months it started being used for image generation and real-time image work; working in real time wouldn't have been possible without that adaptation. LCM allows image generation while cutting times by around 60 percent. It's a matter of advancing a bit more in other generative AI projects. I hope that someday AMD sees the wasted potential of its cards and deigns to prepare them for generative AI. AMD itself needs to show ways to use its cards for AI, or it will stagnate and be nothing more than another floundering Intel.

  • @Gamer4Eire
    @Gamer4Eire 5 months ago +1

    TensorRT is a game changer

    • @aifluxchannel
      @aifluxchannel  5 months ago

      100% - the performance gains with TensorRT are awesome

  • @pepaw
    @pepaw 6 місяців тому +1

    great general vid

    • @aifluxchannel
      @aifluxchannel  6 місяців тому

      Thanks! Let us know if we missed anything.

  • @digitalizeddeath
    @digitalizeddeath 5 місяців тому

    What game is he playing?

  • @Beauty.and.FashionPhotographer
    @Beauty.and.FashionPhotographer 1 month ago

    The 4090 is from Sept 2022... a two-year-old card... is that really the fastest one that exists for AI today?

  • @christinemurray1444
    @christinemurray1444 5 months ago +1

    At those prices it's hard to tell which single GPU I'd buy to use at home. Maybe it wouldn't make sense to get anything above the 4070 over renting. Prices are overdue for a drop, and I believe it's going to happen as the market for people running generative models locally gets saturated and older cards are good enough for an increasing number of use cases.

    • @aifluxchannel
      @aifluxchannel  5 months ago +1

      This is a great point - the real question is whether you're just experimenting or actually finetuning, etc. Finetuning even on big GPUs takes hours or days - at that point buying is a clear choice.

  • @Spractral
    @Spractral 5 місяців тому +6

    Honestly you didn't really answer the question at all.

    • @aifluxchannel
      @aifluxchannel  5 місяців тому +1

      Thanks for the feedback.

    • @Spractral
      @Spractral 5 місяців тому

      @@aifluxchannel yessir

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology 6 months ago +1

    I have 2x RTX 6000 Ada. I can run 70B models easily. I did load Falcon 180B - it works, but it only does about a word a second... slow…

    • @aifluxchannel
      @aifluxchannel  6 months ago

      The RTX 6000 Ada are great GPUs. In hindsight, I probably would've preferred that setup over 4x RTX 4090s. Granted, I managed to sell my RTX A5000s for the equivalent price of 4090s a few weeks ago.

    • @Canna_Science_and_Technology
      @Canna_Science_and_Technology 6 months ago

      @@aifluxchannel The 4090s are too big for my rack-mounted server. I have room for 2 more cards that I'll add next year - unless everything changes by then, which is not out of the question.

    • @truehighs7845
      @truehighs7845 6 months ago

      @@aifluxchannel I have 2x A4000; I can run Mixtral, DeepSeek and Solar, but did you manage to use both linked? It seems I can only use one at a time. What framework do you use for inference/finetuning?

    • @keylanoslokj1806
      @keylanoslokj1806 6 months ago

      @@aifluxchannel Aren't 6000 Adas like $10,000 each? It's a ridiculous cost that only big companies could handle.

    • @testales
      @testales 5 months ago

      @@keylanoslokj1806 Absolutely, that's what happens when there is no competition - the manufacturer can charge whatever they want. I had a few sleepless nights considering going for this deluxe setup too. But then Groq came along and demonstrated what can be done with dedicated AI chips. So I'll see how far dual 4090s get me until a viable option appears that isn't priced like a new car.

  • @jamesgodfrey1322
    @jamesgodfrey1322 6 months ago +2

    I'm interested in an AMD AI round-up.

    • @aifluxchannel
      @aifluxchannel  6 months ago +1

      We can look into that - it'll likely be wrapped into a TinyBox update

    • @jamesgodfrey1322
      @jamesgodfrey1322 6 months ago

      @@aifluxchannel Thank you. My AI journey is just starting. I am dyslexic, and I started using free online AI to do a lot of day-to-day writing, so I have a lot of homework to do and a lot to learn. I'm thinking about setting up a stand-alone AI on my home PC to help me write. I am a PC gamer with a good system: a 570 motherboard, 32GB of memory (3600), a 5800X3D CPU, a 7900 XTX GPU and SSD storage. So I started going down the YouTube rabbit hole, and I liked and found your content useful. I look forward to your posts.

  • @marknakasone85
    @marknakasone85 21 день тому

    DGX H200 🎉

  • @MaJetiGizzle
    @MaJetiGizzle 5 місяців тому +1

    It was pounds (£), not euros (€).

  • @danieldelewis2448
    @danieldelewis2448 2 months ago

    Would I be accurate in stating that AI will be able to take any compatible hardware and give it a massive boost in rendering capabilities, as well as computing (not only new hardware developed specifically for it, but older hardware capable of running it)?

  • @ResIpsa-pk4ih
    @ResIpsa-pk4ih 6 місяців тому +4

    Great content but for the love of god please pronounce inference (and other terms) correctly. Small mistakes like that end up making you sound like you don’t know what you’re talking about.

    • @aifluxchannel
      @aifluxchannel  6 місяців тому +2

      I made this video later into the night after a long day of work - thanks for the feedback.

  • @ydgameplaysyt
    @ydgameplaysyt 3 months ago

    3090 or 4080 Super for Llama 3 8B?

    • @ydgameplaysyt
      @ydgameplaysyt 3 months ago +1

      4x 3090 or 4x 4080 Super, I mean

    • @aifluxchannel
      @aifluxchannel  3 months ago +1

      The 3090 every single day over a 4080 Super

    • @PreparelikeJoseph
      @PreparelikeJoseph 3 months ago

      @@aifluxchannel I'm looking at a PC with something called an ASUS Dual RTX 4070. I want to run the best LLM possible without spending 2k.

    • @JozefKolacek
      @JozefKolacek 3 months ago

      Come on, I run Llama 3 8B on a cheap GTX 1660 Super and it flies.

    • @samalmas4588
      @samalmas4588 29 days ago

      The 3090, especially if you have NVLink - you'll get a beast of a dual-GPU AI LLM system.

  • @jaredcluff5105
    @jaredcluff5105 6 months ago +3

    I managed to get an M1 Ultra Mac Studio with 128GB of RAM and can run inference on almost anything. CodeLlama 70B at 6-10 t/s - plenty fast for a large model, and much faster for smaller models. It can load multiple models in RAM simultaneously. In my opinion nothing compares.
    $2600 from Micro Center back in Sept of 2023.

    • @aifluxchannel
      @aifluxchannel  6 months ago +1

      I think this will continue to be a strong trend in the space, especially with larger models being finetuned with MLX!
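
      For reference, running a model on Apple Silicon with MLX looks roughly like the sketch below, assuming the mlx-lm package and an illustrative 4-bit model name from the mlx-community hub:

      ```python
      from mlx_lm import load, generate

      # Load a 4-bit MLX-converted model into unified memory (illustrative repo name)
      model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

      # Generate on the Apple GPU; there is no discrete VRAM limit,
      # just the machine's total unified memory.
      text = generate(
          model,
          tokenizer,
          prompt="Why does unified memory help with large local models?",
          max_tokens=100,
      )
      print(text)
      ```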

    • @Wobbothe3rd
      @Wobbothe3rd 6 months ago

      Too few TFLOPS; tensor cores are much, much faster.

    • @InnocentiusLacrimosa
      @InnocentiusLacrimosa 6 months ago +1

      How are Macs supporting ML without CUDA? Don't they have the same problem AMD has breaking into that sector?

    • @charlesselrachski34
      @charlesselrachski34 4 months ago

      An impossible-to-fix malware hole in the M1, and "nothing compares"?

    • @BakaTimeproject
      @BakaTimeproject 4 months ago

      @@InnocentiusLacrimosa They only use the M1 Ultra chip; remember that the Mac uses a single processor that carries its "video card" inside. It's not that you need Nvidia directly for generative AI - what you need are programmers willing to write generative AI for Mac and AMD. That is the current problem. In short, they use generative AI programmed for the Mac.

  • @777sumitx
    @777sumitx 5 months ago +1

    What would you recommend between the RTX 4060 Ti 16GB and the RTX 4070 Super 12GB?

    • @bigsmilefriend
      @bigsmilefriend 5 months ago

      If you want to game as well, the 4070 is the best choice. But if you want something cheaper and only for AI, the 4060 Ti 16GB is better.

    • @777sumitx
      @777sumitx 5 months ago

      @@bigsmilefriend Secretly added a 4070 Ti Super to the cart, let's see 🙈❤️

    • @harryrg1250
      @harryrg1250 4 months ago

      4060 Ti 16GB

    • @BakaTimeproject
      @BakaTimeproject 4 months ago

      The 4070 Super at minimum, and optimally the 4070 Ti Super 16GB, improves things a lot. The 4060 and 4060 Ti have fewer CUDA cores, and that affects the transformers used by generative AI. The 4070 Ti Super is aimed at generative AI for the general public; that's why they raised it to 16GB.

  • @ps3301
    @ps3301 5 months ago +1

    People need 2 GPUs in a PC: one for playing games and one for running AI workloads concurrently.

  • @gileneusz
    @gileneusz 6 months ago +1

    16 gigs is too small to run any SOTA LLM; you need at least 48GB, so you'd have to chain 3x 4070s to make it work...

    • @aifluxchannel
      @aifluxchannel  6 months ago +5

      By the time you've spent $$ on a motherboard capable of hosting more than 2 GPUs at full PCIe 4.0 bandwidth, I'd question whether 3x 4070s are a wise investment.
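
      For a rough sense of where a figure like 48GB comes from, a back-of-the-envelope sketch (the 70B parameter count and the 20% overhead factor are illustrative):

      ```python
      def est_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
          """Rough VRAM estimate: weights only, plus ~20% for KV cache and activations."""
          weight_gb = params_billion * bits_per_weight / 8  # GB needed for the weights alone
          return weight_gb * overhead

      # A 70B model at different quantization levels:
      for bits in (16, 8, 4):
          print(f"70B @ {bits}-bit ≈ {est_vram_gb(70, bits):.0f} GB")
      # ~168 GB at fp16, ~84 GB at 8-bit, ~42 GB at 4-bit -> hence ~48GB as a floor for a quantized 70B model
      ```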

    • @gileneusz
      @gileneusz 6 months ago

      @@aifluxchannel true

    • @keylanoslokj1806
      @keylanoslokj1806 6 months ago

      @@aifluxchannel Are you talking about special motherboards for servers?

  • @pierruno
    @pierruno 6 months ago +1

    I bought a 4090 from MSI.

    • @aifluxchannel
      @aifluxchannel  6 months ago

      The only 4090 worth buying! I have four of them - perfect Linux support and, more importantly, Linux fan control support. I highly recommend NOT buying any of the Gigabyte 4090s. Nothing but trouble.

    • @pierruno
      @pierruno 6 months ago

      @@aifluxchannel I bought a Gaming Trio X. Idk if it's good

    • @keylanoslokj1806
      @keylanoslokj1806 6 місяців тому

      Which one

  • @russ2636
    @russ2636 5 місяців тому +1

    'Promosm' 👍

  • @TazzSmk
    @TazzSmk 5 months ago +2

    The 4070 Super has only 12GB of VRAM, so it's definitely NOT an interesting option

    • @aifluxchannel
      @aifluxchannel  5 months ago +1

      Agree to disagree - different people have different requirements