Debug with Lewis
Can DeepSeek R1 Run Locally on an NVIDIA RTX 5090!?
I try DeepSeek R1 locally on the NVIDIA RTX 5090. We try model sizes with different parameter counts, both unquantized and at Q4. I am pretty impressed with the results!
LINKS
---
MY 12K+ DISCORD 💬
discord.gg/GkrFX4zT2C
CONNECT WITH ME ON SOCIAL
📸 Instagram:
lewismenelaws
🎚TikTok:
tiktok.com/@lewismenelaws
🐣 Twitter:
LewisMenelaws
My gear 💻
liinks.co/lewismenelaws
-----
Views: 19,458

Videos

12 Non-Developer Tools That Boost Your Productivity
9K views · 3 months ago
Here are 12 tools that ARE NOT developer-related that devs can use to improve their productivity. In this video, I go over what I use to help me organize my videos and projects!
Is OpenAI's Realtime API REALLY Worth the Hype?
7K views · 3 months ago
OpenAI has released their realtime API for developers. Is this real-time ChatGPT experience worth it? Let's find out.
This CHANGED the Way I Use Databases (Atlas)
2.8K views · 8 months ago
Atlas is a tool written in Go that helps you manage your databases by writing your schema as code. It is similar to existing database migration tools like Alembic and Django's migrations. The package is still a bit early and YMMV, but it has been a huge help to my productivity. So in this video, I discuss Atlas and how it works, for those who may or may not be familiar with databases. Link...
Why I Stopped Using LangChain
4.4K views · 9 months ago
Here are some reasons why you shouldn't use LangChain in your next AI project. Personally, I have given this framework a try at least 3 or 4 times now, and it's been an absolute struggle every time. Some of the main reasons include tough documentation, hard-to-understand syntax, and strange abstractions. In this video, I will show you some examples of LangChain doing this and provide some alternatives if ...

COMMENTS

  • @jasonshen · 19 hours ago

    The 24GB is the bottleneck; you don't need all those CUDA cores.

  • @nigmaxus · 2 days ago

    DeepSeek is meh at best and overhyped.

  • @Anktual · 3 days ago

    I tried many models with Ollama. They're just dumb tbh.

  • @pakalupapito3202 · 3 days ago

    Now I can use 64 GB of my RAM.

  • @josejosee132 · 4 days ago

    I can run DeepSeek R1 70B parameters at 8-bit on my MacBook Pro Max with 128GB of memory...

  • @yvesinformel221 · 5 days ago

    I tried the 1.5B and 7B on my laptop and it is not worth it, too small to be useful.

  • @x0vg5hs1 · 6 days ago

    Is it possible to take, let's say, an Ampere GPU that uses GDDR6 and give it GDDR5 VRAM? Let's say 16x4GB modules on a 3090? Assuming you have the firmware devkit and source, or is it arch-limited?

  • @damienthorn1340 · 6 days ago

    Currently an XTX outperforms a 4090 using DeepSeek by about 13%. Pointless getting a 5090 for something like this.

    • @osman2k · 5 days ago

      Radeon cards? Do I need any additional software installed to make them work? Thanks.

    • @damienthorn1340 · 5 days ago

      @@osman2k Nope. No extra software needed, other than the latest drivers of course. It just works.

    • @osman2k · 5 days ago

      @@damienthorn1340 that's great, thank you.

  • @ZeddMalum · 7 days ago

    Is the 7B version the one on the website and app? Edit: never mind, it's not. Can someone tell me the name of the website/app version so I can see its API requirements?

    • @EnriqueVivancoH · 6 days ago

      No, it is not the original DSR1 7B. He used the Qwen version distilled from R1. He's not using Ollama, he's using another app.

  • @alessandrovitali7104 · 7 days ago

    You should try the 32B Q8 version with offloading into RAM.

  • @fontenbleau · 7 days ago

    You'd need a card with 502GB of VRAM, and that's only for the Q5 GGUF, which is not great quality. For the best, Q8, better to have 720+ GB of VRAM (or RAM; the volume needed is the same, a GPU adds nothing except speed). That's for the full-size model of 671 billion parameters.
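
    Those figures are consistent with rough weights-only arithmetic: a model needs about (parameter count × bits per weight ÷ 8) bytes for its weights, plus runtime overhead. A minimal Python sketch; the ~20% overhead factor for the KV cache and buffers is an assumption for illustration, not a measured value:

        def model_vram_gb(params_billions, bits_per_weight, overhead=1.2):
            # Weights-only footprint in GB, padded by an assumed ~20% for
            # KV cache, activations, and runtime buffers.
            weight_gb = params_billions * bits_per_weight / 8
            return weight_gb * overhead

        print(model_vram_gb(671, 5))  # Q5 -> ~503 GB, matching the ~502GB figure
        print(model_vram_gb(671, 8))  # Q8 -> ~805 GB, in "720+ GB" territory

    By the same arithmetic, Q4 of the full model would still be roughly 400GB.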

  • @olebogengthothela1191 · 7 days ago

    Wtf? DeepSeek R1 7B got the strawberry question correct with the correct reasoning. On my PHONE. Something's wrong with your model.

    • @olebogengthothela1191 · 7 days ago

      YES. LOCALLY. No internet. No remote desktop. Compiled Ollama using Go, then downloaded the model in Termux.

  • @calimark7448 · 7 days ago

    I can run 70B on my old M1 Mac Studio with 64 gigs of RAM...

  • @proxyjan · 7 days ago

    I guess to get some new graphics card, I should become a YouTuber

  • @zasuvkazasuvkovic498 · 7 days ago

    1:03 Is any data sent back to the DSR1 creators? Are users real users, or just beta testers for DSR1? 🤔

  • @margovincent22x · 8 days ago

    Can we use 2x 5090 and combine the VRAM? Without NVLink?

    • @fontenbleau · 7 days ago

      No, it doesn't combine without a link. I've tried with a 4070 & 2070; at best you can use one card on one task and the other on another in ComfyUI, which can free some VRAM on the main card.

  • @Singlton · 8 days ago

    What is the name of the GUI tool you are using?

  • @MaduraPriyan · 8 days ago

    I ran 7B on a GTX 1070 Ti. Speed is pretty good.

  • @paulroberts7429 · 8 days ago

    Thanks to DeepSeek we now know Nvidia is using AI to squeeze these chips; these cards are a rehash with GDDR7.

  • @Matrriosh · 8 days ago

    I just installed the full 32B model on my Sapphire RX 7900 XT 20GB Nitro+ and it runs.

  • @souljeah · 8 days ago

    What are your PC specs?

  • @BrentLeVasseur · 8 days ago

    Honestly I'm fed up with these 5090 videos. The only fricken people in the world that can actually get their hands on these cards are YouTube reviewers! I think I might start my own channel, just so I can get a GPU. 😂

    • @scorpizy · 6 days ago

      Well, then you might want to skip videos that have 5090 in the title. I think 🤔 even basic AI would be able to come up with that solution 😅

    • @BrentLeVasseur · 6 days ago

      @ Yup! Exactly. I am doing that now.

  • @Metarig · 8 days ago

    You can run 32B on a 3090 without any issues, and it runs smoothly.

    • @lunch7553 · 4 days ago

      Same, it only uses around 20GB.

  • @Vimblini · 8 days ago

    RTX 5090 and DeepSeek in the same title is bound to go viral.

  • @Phil-D83 · 8 days ago

    Intel B580 24GB with ZLUDA

  • @tringuyen7519 · 9 days ago

    So you bought a 5090 from the scalpers just to run DeepSeek distilled models locally, not gaming? Seriously?

    • @demolicous · 9 days ago

      Why is that an issue?

    • @03chrisv · 9 days ago

      5090s are better suited to AI workloads than to gaming. Like, what game even needs anything close to 32GB of VRAM? 😂 Most games use between 8GB and 12GB, with only a very select few that even use 16GB, which usually involves full path tracing. The 5090 is literally using a binned GB202 die used in AI workstations.

    • @Returntonature145 · 7 days ago

      @@03chrisv Anything beyond an RTX 3090 is not needed for gaming, I think. But for AI you need multi-GPU and at least 32GB of VRAM.

    • @fontenbleau · 7 days ago

      This card is mostly better at ComfyUI; video generation in Hunyuan all the way. There will be no more VRAM on these in any future, for temperature reasons (more dense = more heat); server LRDIMMs, the dense kind of RAM, can already heat up to 90°C.

  • @alby13 · 9 days ago

    I want you to run more AI models locally on your PC.

  • @AnirbanKar4294 · 9 days ago

    It's not about the quantization; it's about the actual model size in parameters. The 671B, even run at Q4, is still much better than all these distill versions, because its base model was DeepSeek V3, which is a very good model. And I know it's not for a home lab, at least for now, but there are ways to run it at 1.58-bit with Unsloth's method, which requires 131GB of VRAM instead of 741GB.
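
    As a sanity check on those numbers, weights-only arithmetic lands close to the quoted 131GB (assuming runtime overhead beyond the bare weights accounts for the difference):

        # Weights-only estimate for the full 671B model at Unsloth's 1.58 bits/weight:
        print(671e9 * 1.58 / 8 / 1e9)  # ~132.5 GB, close to the quoted 131GB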

  • @val_bld · 9 days ago

    I just ran DeepSeek on my mid-2017 MacBook Pro with the worst Intel CPU.

  • @margovincent22x · 9 days ago

    Please run 70B. Also, can we use two GPUs for more speed and accuracy?

    • @tobiasstoll4715 · 7 days ago

      FP16 should give you nice accuracy... you would need 5x 5090s to run 70B locally just for inference, and around 10x 5090s if you want to train it... with FP32 you would need to double the number of 5090s... just my 2 cents.

    • @margovincent22x · 4 days ago

      So is it possible, given there is no NVLink for the 5090?
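
    The card counts in this thread match rough weights-only arithmetic: at FP16 each weight takes 2 bytes, and a 5090 has 32GB of VRAM. A minimal sketch:

        import math

        weights_gb = 70e9 * 2 / 1e9         # 70B params at FP16 (2 bytes each) = 140 GB
        cards = math.ceil(weights_gb / 32)  # 32 GB of VRAM per RTX 5090
        print(cards)                        # 5 cards just to hold the weights for inference
        # Training adds gradients and optimizer state on top, hence the ~10-card
        # estimate; FP32 doubles the bytes per weight, doubling the count again.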

  • @krinodagamer6313 · 9 days ago

    I got it running on a TITAN X Pascal. Of course it will run; I even run it in my application.

  • @MrAbdoabd · 9 days ago

    Which model does the DeepSeek chat website use itself?

  • @dave24-73 · 9 days ago

    Yes, it can even run locally without a GPU. Clearly performance is affected, but it can run.

  • @anshulsingh8326 · 9 days ago

    FP16 vs Q8, is there a difference in output?

  • @agush22 · 9 days ago

    Why is 14B using so much of your VRAM? I can run it on a 16GB card with a couple gigs of slack.

  • @larrowvolru7204 · 9 days ago

    Hmm... maybe two Radeon RX 9070 XTs could run it even better.

  • @CO8848_2 · 10 days ago

    I can run 7B on my 3070 pretty well, so why pay more?

  • @alkeryn1700 · 10 days ago

    No, you can't. The distills are not "versions" of the model.

    • @amihartz · 9 days ago

      They are hybrids of R1 and other models (either Llama or Qwen depending on the one you download), their weights containing information from both models they were created from. I don't think it is unreasonable to say something like DeepSeek R1 Qwen Distill is a "version of R1," and equally I would not think it is very unreasonable to say it is a "version of Qwen," both statements are true since it's a hybrid of the two. It is being oddly nitpicky to try and fight against this.

    • @alkeryn1700 · 9 days ago

      @@amihartz Sure, but it cannot be compared to the real R1; they are not the same model.

    • @jeffwads · 8 days ago

      You are correct, but 99.9% just can't grasp that the distilled models are Qwen or Llama. Heck, it even states the arch in this video as such and people still think it's R1. Notice the other one in this thread yapping about it being a hybrid, etc. Sigh.

    • @amihartz · 8 days ago

      @ They are objectively not Qwen or Llama, this is easy to prove just by doing the "diff" command between the models, you will see they are different. The models are R1 Qwen Distill and R1 Llama Distill, not Qwen or Llama, nor are they R1. You are spreading provably false misinformation.

    • @alkeryn1700 · 8 days ago

      @ They are Qwen- and Llama-based. Yes, the weights have been changed, but it does not matter; if you do a distance analysis, they are very, very close.

  • @shock-blitz77 · 10 days ago

    No, DeepSeek cannot run on an RTX 5090, but it can on a Raspberry Pi.

  • @AndrewTSq · 10 days ago

    Are you really using the correct DeepSeek R1?? I use the one from Ollama, and it had no problems answering the questions with the 7B model. Also, the 32B model is only 20GB.

    • @amihartz · 9 days ago

      He might've downloaded the unquantized version.

    • @AndrewTSq · 9 days ago

      @@amihartz Ahh yes, I did not think about that :)
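
    For anyone wanting to rerun this comparison, here is a minimal sketch using the Ollama Python client; it assumes a local Ollama server is running and that the tag has already been pulled with "ollama pull deepseek-r1:7b":

        import ollama  # pip install ollama

        # Ask the distilled 7B model the same kind of question tested in the video.
        response = ollama.chat(
            model="deepseek-r1:7b",
            messages=[{"role": "user", "content": "How many r's are in the word 'strawberry'?"}],
        )
        print(response["message"]["content"])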

  • @themohammadsayem · 10 days ago

    I have a very old laptop, and running the 7B model makes it go bonkers. I am looking to shift to a Mac mini M4. For running the 14B model, will 16GB be enough, or should I go for 24/32?

    • @prof2k · 10 days ago

      The more the better, honestly. But 16 does me really well. Just can't go any higher than the base sizes.

    • @agush22 · 9 days ago

      Macs have unified memory, so the VRAM is also your system RAM. 14B is around 11GB; you would only have 5GB left for macOS and whatever else you are working on.

    • @themohammadsayem · 9 days ago

      @@prof2k Which parameter size are you running right now?

    • @themohammadsayem · 9 days ago

      @@agush22 So 24/32GB would be better for running 14B?

    • @prof2k · 8 days ago

      @themohammadsayem Honestly it varies, and quality doesn't necessarily go up linearly with size. The best local experience has come from Gemma 2B. I've tried bigger models, but for conversation Gemma has been better. It's nowhere near Claude though, and this makes me think about what I'm missing, because I can't run models greater than 9GB.

  • @kakashi7119 · 10 days ago

    Make a video on how to quantize any DeepSeek model.

  • @taqinv2 · 10 days ago

    I run that 33B model on an RTX 4070 Super; it really has amazing performance.

  • @akam9919 · 10 days ago

    The fact that the AI reports having a whole internal debate about how many R's are in strawberry. It's six btw.
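
    For the record, the ground truth the models keep deliberating over is a one-liner in Python:

        print("strawberry".count("r"))  # 3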

  • @craftmyne · 10 days ago

    I got 32B running on my M2. Granted, it's slow as balls, but if I close almost everything it'll run; 14B is almost usable, and anything lower runs like the wind. Looking at your memory usage is bizarre; maybe I don't have context windows set up, but my 7700 XT can also run 14B but not 32B, and my Mac has 24GB of RAM letting it pull 32B. Nvm, I have quantised versions of the models.

    • @randomseer · 10 days ago

      Even a phone can run 1.5-7B.

    • @AndrewTSq · 10 days ago

      Same here, but my Ollama DeepSeek did not have any problems with the questions either. So weird that his could not even answer the strawberry question correctly :)

  • @Dom-zy1qy · 11 days ago

    No joke, in this first release batch of 50-series cards, I think NVIDIA unironically shipped out more review samples to YouTubers than they did to retailers. Maybe not too surprising, I guess. If stock is low, may as well build hype instead of selling a few hundred additional GPUs.

    • @gabrielesilinic · 5 days ago

      The 50 series, as far as I can understand, is shit except for very specific AI tasks that make it not feel like shit. I will not buy it even though I have the money. Conclusion: it's probably cheap to make.

    • @Avantime · 5 days ago

      @@gabrielesilinic It's more about big tech AI demand. They're (or were) in an arms race to secure top-end Blackwell GPUs, so obviously Nvidia could mark them up at insane profit, like the manufacturer playing the scalper. Blackwell consumer GPUs are there to crush anything AMD could come up with at CES 2025, and by the bewildered look of AMD's board partners at CES, with AMD not launching their cards, I'm pretty sure Nvidia felt mission accomplished with the 5090 and 5080.

      Of course having it up for review and potentially curb-stomping AMD's next Radeon cards is a great thing, but a paper launch is intentional, because big tech pays far more for Blackwell AI, and AMD is unable to take advantage of Nvidia's paper launches with their own, relatively uncompetitive products. Previous Radeon cards weren't exactly value leaders, especially when Nvidia offers much better ray tracing and a more polished DLSS/frame-gen experience, making Nvidia cards worth a little bit more. And used Turing/Ampere cards are relatively decent value. In addition, far fewer retailers stock Radeon cards, especially outside the US, and sometimes they mark them up due to their lack of volume sales, so much of the value proposition gets wiped out by higher retailer margins.

      That leaves AMD only with volume and availability vs. Nvidia's paper launches. But AMD doesn't have much leeway with the fabs, and CPUs are far more profitable for AMD. So Radeon gets second billing in production, as AMD would rather not spend too much on volume, because Jensen might suddenly, without warning, turn on the tap and flood the market with Blackwell. Because what Jensen wants more than selling GPUs to big tech at scalper prices is to completely crush AMD at every turn.

    • @graphguy · 2 days ago

      $2,000 for one card? That is absurd.

    • @gabrielesilinic · 2 days ago

      @@Avantime What I was saying is that in the GeForce 50 series, in terms of raw performance, there is a lot of smoke but not much meat. The issue is that they technically suck for performance; half the frames they make are generated, and this doesn't make them good GPUs.

    • @Avantime · 1 day ago

      @ It's still a ~30% rasterization uplift, and there's no competition from AMD. They may suck for you, but AMD isn't doing any better, and that's what matters for Jensen Huang. Nvidia doesn't really care about the consumer GPU market right now, and they can afford not to because neither AMD nor Intel is providing the competition the market needs.

  • @gaswegn · 11 days ago

    The issue with trusting AI is that we taught it how to process data and trained it to give outputs, but we don't know the processes between the two; it's considered "the black box" on some channels. It's interesting that DeepSeek does the thought-process thing before giving you the real output. It's aimed at transparency, to give you insight into the black box, but now there's the question of how the output of the thought process was generated. Still the unknown black-box issue, but a clever idea.

    • @BineroBE · 10 days ago

      Eh, the thought box is not what it actually "thinks". It just answers the prompt a first time, and then summarises that into the "real" answer. There is no thinking going on; we know _how_ it works. We just don't really understand why it works so well.

    • @randomseer · 10 days ago

      It's still a black box; it's generating its reasoning the same way it generates the final output.

    • @gaswegn · 9 days ago

      @randomseer Well, before, you weren't sure what biases were in play for your final output; now you get a false window into how it came up with what it said. But even that solution has a new black box: we still don't fully understand its interpretation and biases, because the output of its DeepThink process is still unexplainable.

  • @gaswegn · 11 days ago

    It's so annoying that Nvidia crashed because of DeepSeek R1, a highly hallucinating copy of a copy (trained on GPT's outputs), benchmarked alongside GPT. Why sell Nvidia? You can run it on M2s? Cool, that means you can run it better on 5090s. Nvidia is down 600 billion for what?

    • @Dom-zy1qy · 11 days ago

      It'll probably trickle back up. It's investor panic by people who aren't really informed about technology and the implications of certain things. I do think NVIDIA is pretty risky though. I think if it takes AI too long to become profitable, people will pull out.

    • @randomseer · 10 days ago

      It's not about Nvidia consumer GPUs; it's about the fact that it was trained with a lot fewer Nvidia GPUs than people expected. The primary reason Nvidia is valued is for the GPUs they use for training.

    • @geezher · 9 days ago

      @@randomseer This. If companies are telling investors they need, say, 2 million GPUs to run ChatGPT, but an alternative comes along that shows you only need 1/10th of those, well, then the demand for said GPUs might be a lot less than the 2 million... That, and the fact that smaller models that are just as accurate can run on competitors' products (say AMD GPUs or Apple M-series stuff), means that demand for Nvidia might actually be even lower. Lower-than-forecasted demand combined with alternatives might mean that the moat Nvidia has is non-existent. Those could be the reasons Nvidia dropped.

  • @Waszzup · 11 days ago

    How has nobody found this??

  • @miguelito5602 · 11 days ago

    Well technically that's not the real model 🤓☝

    • @saiuan4562 · 11 days ago

      Okay, dip****

    • @DebugWithLewis · 11 days ago

      🤬🤬🤬🤬

    • @amihartz · 9 days ago

      I don't know why people say this, all the models are "real" models, they're just different. It would make more sense to say that it is not the "original" model, because the Distill models were produced by taking things like Llama or Qwen and readjusting their weights based on synthetic data generated from R1, so the weights are a hybrid of the two models (either a hybrid of Qwen+R1 or Llama+R1 depending on which you download), but they are still "real" models, just not the original R1. I don't know what it would even mean to have a "fake" model.

    • @poisonza · 9 days ago

      ??? So when you train on the output of the o1 model, suddenly the model becomes o1?? Naw, it's just Qwen2 finetuned via GRPO.

    • @amihartz · 9 days ago

      @ You literally are changing the weights of the model, it is no longer the same model. To claim that a modified qwen2 is literally identical to qwen2 is easily falsified just by running the "diff" command on the two model files. They are different models. If you adjusted qwen2's weights based on the output of o1, it would neither be qwen2 nor o1, but would be a new model that is hybrid between them and would take on characteristics of both, as this literally causes the model to acquire information and properties from o1.