DeepSeek's GPU optimization tricks | Lex Fridman Podcast

  • Published 10 Feb 2025

COMMENTS • 260

  • @LexClips
    @LexClips  7 days ago +9

    Lex Fridman Podcast full episode: ua-cam.com/video/_1f-o0nqpEI/v-deo.html
    Thank you for listening ❤ Check out our sponsors: lexfridman.com/sponsors/cv8472-sa
    See below for guest bio, links, and to give feedback, submit questions, contact Lex, etc.
    *GUEST BIO:*
    Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware. Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.
    *CONTACT LEX:*
    *Feedback* - give feedback to Lex: lexfridman.com/survey
    *AMA* - submit questions, videos or call-in: lexfridman.com/ama
    *Hiring* - join our team: lexfridman.com/hiring
    *Other* - other ways to get in touch: lexfridman.com/contact
    *EPISODE LINKS:*
    Dylan's X: x.com/dylan522p
    SemiAnalysis: semianalysis.com/
    Nathan's X: x.com/natolambert
    Nathan's Blog: www.interconnects.ai/
    Nathan's Podcast: www.interconnects.ai/podcast
    Nathan's Website: www.natolambert.com/
    Nathan's YouTube: youtube.com/@natolambert
    Nathan's Book: rlhfbook.com/
    *SPONSORS:*
    To support this podcast, check out our sponsors & get discounts:
    *Invideo AI:* AI video generator.
    Go to lexfridman.com/s/invideoai-cv8472-sa
    *GitHub:* Developer platform and AI code editor.
    Go to lexfridman.com/s/github-cv8472-sa
    *Shopify:* Sell stuff online.
    Go to lexfridman.com/s/shopify-cv8472-sa
    *NetSuite:* Business management software.
    Go to lexfridman.com/s/netsuite-cv8472-sa
    *AG1:* All-in-one daily nutrition drinks.
    Go to lexfridman.com/s/ag1-cv8472-sa
    *PODCAST LINKS:*
    - Podcast Website: lexfridman.com/podcast
    - Apple Podcasts: apple.co/2lwqZIr
    - Spotify: spoti.fi/2nEwCF8
    - RSS: lexfridman.com/feed/podcast/
    - Podcast Playlist: ua-cam.com/play/PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4.html
    - Clips Channel: ua-cam.com/users/lexclips
    *SOCIAL LINKS:*
    - X: x.com/lexfridman
    - Instagram: instagram.com/lexfridman
    - TikTok: tiktok.com/@lexfridman
    - LinkedIn: linkedin.com/in/lexfridman
    - Facebook: facebook.com/lexfridman
    - Patreon: patreon.com/lexfridman
    - Telegram: t.me/lexfridman
    - Reddit: reddit.com/r/lexfridman

  • @negladiator
    @negladiator 4 days ago +65

    DeepSeek’s success is like an underdog F1 team winning races against giants, not because they had more money or better cars, but because they engineered their way to victory with extreme optimizations. What counts as extreme? Bypassing the default engine computer settings (Nvidia's CUDA library and the NCCL GPU comms library) and writing their own software to precisely control fuel injection, turbo boost, and power delivery for each track. Heck, web developers are always looking under the hood of popular libraries for optimizations. But that is cheap compared to optimizing for a training YOLO run.

  • @bigBlueXlot
    @bigBlueXlot 6 days ago +208

    I shouldn’t have eaten so much paste in grade school

  • @johnnyday23
    @johnnyday23 6 days ago +341

    nodding and pretending I understand what they're saying...

  • @Fordance100
    @Fordance100 4 days ago +13

    For R1, the main thing was GRPO. R1-Zero has very little supervision and just works. It's really a breakthrough, and it has been reproduced by several groups now.
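    For readers unfamiliar with the term: GRPO is Group Relative Policy Optimization, the RL method described in the DeepSeek papers. Below is a minimal, illustrative sketch of its core idea (advantages normalized within a group of sampled responses, so no learned value network is needed). The variable names and the simplified per-response form are assumptions for illustration, not DeepSeek's actual code.

    ```python
    import torch

    def grpo_loss(logprobs, old_logprobs, rewards, clip_eps=0.2):
        """Sketch of a GRPO-style objective for one prompt.

        logprobs / old_logprobs: (G,) log-probabilities of G sampled responses
        under the current policy and the sampling policy; rewards: (G,) scalar
        rewards (e.g. 1.0 if the final answer is correct, else 0.0).
        """
        # Group-relative advantage: normalize rewards within the sampled group,
        # which replaces a learned critic/value function.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

        # PPO-style clipped policy-gradient term.
        ratio = torch.exp(logprobs - old_logprobs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()
    ```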

  • @Phil-D83
    @Phil-D83 5 days ago +19

    They have some very smart people at DeepSeek. If you read and understand some of the technical papers, articles, etc. (well, mostly understand them), you are blown away.

    • @bobbysuazjFhvcfgh
      @bobbysuazjFhvcfgh 9 hours ago

      Seems like there is only one person at DeepSeek who is heavier than all the other AI startups 😂

  • @alextrebek5237
    @alextrebek5237 4 days ago +17

    1. They wrote the equivalent of machine code (assembly) for the GPU (PTX)
    2. Used a bunch of smaller expert models (Mixture of Experts / MoE)
    3. For MoE, they used a generalized algorithm to determine which models ("experts") would be relied on, directed GPU compute time there, and load-balanced across them; manually hand-picking the algorithm is bad ("The Bitter Lesson," Rich Sutton, 2019), so use approaches that scale so as to avoid local maxima (a rough sketch follows below)
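    As a concrete illustration of point 3, here is a generic top-k expert router with a standard load-balancing auxiliary loss (Switch-Transformer style). This is a hedged sketch: all names and shapes are assumptions for illustration, and DeepSeek-V3 itself describes an auxiliary-loss-free balancing scheme, so this shows the generic technique, not their exact method.

    ```python
    import torch
    import torch.nn.functional as F

    def route_tokens(hidden, gate_weight, top_k=2):
        """Generic top-k MoE routing with a load-balancing auxiliary loss.

        hidden: (n_tokens, d_model) token activations
        gate_weight: (d_model, n_experts) router projection
        """
        logits = hidden @ gate_weight                     # (n_tokens, n_experts)
        probs = F.softmax(logits, dim=-1)
        weights, experts = probs.topk(top_k, dim=-1)      # each token picks its top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)

        # Auxiliary loss that pushes the router to spread tokens evenly,
        # so a few "hot" experts don't become a compute bottleneck.
        n_experts = probs.shape[-1]
        tokens_per_expert = F.one_hot(experts[:, 0], n_experts).float().mean(dim=0)
        router_prob_per_expert = probs.mean(dim=0)
        balance_loss = n_experts * (tokens_per_expert * router_prob_per_expert).sum()

        return experts, weights, balance_loss
    ```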

  • @mandarine1007
    @mandarine1007 5 days ago +16

    Excellent to have an impartial conversation about this! Open Source is an excellent path forward. Can’t wait for Digits 😮

    • @al3nmicl
      @al3nmicl 4 days ago

      If the 5090 launch was any indication, digits will be unobtainable for the average customer.

  • @TemplemountKing
    @TemplemountKing 6 days ago +74

    To make an analogy: you need to serve 10 different dishes in your restaurant, but your kitchen has only two stoves, so you have to decide what to cook based on what customers ordered. The old way was to build ten stoves, each with its own specialty dish, but the Chinese can't afford 10 stoves, so they chose to use two and shuffle the dishes.

    • @TemplemountKing
      @TemplemountKing 6 days ago +3

      I think, if I understand correctly, the guy is saying this kind of trick or over-engineering has historically not done so well in the deep learning field; brute force has been the best option.

    • @jxmai7687
      @jxmai7687 6 days ago +3

      You need a very skilled Chinese chef. 😂

    • @TemplemountKing
      @TemplemountKing 6 days ago +4

      To be closer to the so-called mixture-of-experts idea, you'd have 10 chefs who each learn one dish and rotate on those two stoves; the challenge is making sure each chef gets a chance to practice his dish.

    • @corgirun7892
      @corgirun7892 6 days ago

      It's not about being unable to afford it, it's about not being able to get it.

    • @SawyerFM
      @SawyerFM 5 days ago +5

      It’s cooking two dishes in one wok at the same time by continuously flipping the wok.
      While rice is in wok, noodles are in the air, then flip…

  • @jwjohnson7909
    @jwjohnson7909 6 days ago +71

    I fed the transcript to ChatGPT to explain this conversation to me in plain language. 😂

    • @jacobs8531
      @jacobs8531 5 days ago +11

      Should have fed it to Deepseek :)

    • @antwango
      @antwango 5 days ago +1

      @@jacobs8531 Beat me to it!!! XD
      I'm actually impressed with DeepSeek, no lie. I've had the ChatGPT app downloaded for about a year and have been using it on and off, but the PR and recent news about DeepSeek has me playing around with it and trying to catch it out. I even asked it a vague, casual question I was thinking of, not using perfect grammar, about a quote I vaguely remember from someone sometime, and it got it right! Genius!

    • @jrok96
      @jrok96 5 days ago

      Same answer. It trains off sensei GPT

    • @nightshadegatito
      @nightshadegatito 5 days ago

      I will never use chatgpt or openai, traitors. Deepseek for life or until it don’t work no more.

    • @高高阳-x8k
      @高高阳-x8k 5 days ago +1

      How do you extract the transcript from the video?

  • @alan83251
    @alan83251 6 days ago +60

    The low-level load balancing work is going to be extremely important for the Chinese devs if/when they move to their home-grown AI chips once they get their own non-ASML EUV lithography figured out. It would be really cool if some super-general version was possible that would allow one to mix GPUs like Nvidia and AMD together and the load balancing software would just tie it all together. Then there'd be no more vendor lock-in. Not sure if possible, but would be cool!

    • @hallockstuart7899
      @hallockstuart7899 6 days ago +1

      I think this exists already it's called OpenCL. Check it out.

    • @seseseye
      @seseseye 6 days ago +11

      The work has already begun and is expected to be resolved within the coming year. The core issue China aims to address is opposing the United States' exclusive monopoly.

    • @AP-ei4jt
      @AP-ei4jt 6 days ago +11

      Nvidia's goal is to make CUDA as bloated as possible in order to jack up demand for their chips. It's not that hard to optimize the process with low level coding if you care to look.

    • @TemplemountKing
      @TemplemountKing 6 days ago +4

      Engineering work involves many, many layers of detailed work. Each element might be simple, but with a large language model, having to dive down to the microcode level is really challenging, because people who know the high-level libraries and think about training strategies usually don't know the low level. It also takes a long time for things to mature; to figure the whole thing out in only half a year, you need a good engineering team.

    • @muhammadsyahrani8858
      @muhammadsyahrani8858 6 days ago +1

      China already did that with their own supercomputers 10 years ago

  • @vrealzhou
    @vrealzhou 6 days ago +39

    I've read some Chinese media saying DeepSeek R1 can now write the low-level code itself to adapt to GPUs from other brands such as AMD or Huawei. Actually, it's already running on Huawei's GPUs now. The claim is that this sacrificed 5% of performance but reduced reasoning cost by 70%.

    • @boonchweeng6567
      @boonchweeng6567 6 days ago +4

      Precisely. R1 is supposed to be good at coding, and there is no reason why it can't help in coming up with PTX for GPUs

    • @Gilberthasit
      @Gilberthasit 6 days ago +2

      None of this matters, it's old technology

    • @soheiladam7510
      @soheiladam7510 5 days ago

      ​@@Gilberthasit you'll know why it matters you fool.

    • @Glomly
      @Glomly 4 days ago +3

      @@Gilberthasit what exactly?

    • @MeditationMindless
      @MeditationMindless 4 days ago

      So are we bullish on amd?

  • @wasimjaved7104
    @wasimjaved7104 5 days ago +17

    Is it very difficult to accept that DeepSeek's developers have really developed something innovative, rather than just branding them as 'lucky'?

    • @iliak3937
      @iliak3937 5 days ago +5

      Also, their "luck" is based on "stolen" OpenAI models, as we know 😂 PS: and of course the best Nvidia chips, which they bought illegally…

    • @wasimjaved7104
      @wasimjaved7104 5 days ago

      @iliak3937 seems like you have already made up your mind based on some stereotype. If you listen to the whole podcast, not just this 10-minute clip, you will find both Nathan and Dylan clarifying the dubious claim Sam Altman made, without providing any proof, that DeepSeek stole their OpenAI model. DeepSeek's code base is truly open and public; you can download it on your computer and check it against the GPT API. Other than Mr. Sam's childish claim there hasn't been any substantial evidence so far. The Financial Times mentioned that DeepSeek wrote in their paper that they used distillation (which is a data filtration process) to optimise input data. You can hear Nathan explain in the main podcast that it is standard practice in model training. So there is no reason to mock DeepSeek.

    • @elvispontes4165
      @elvispontes4165 3 days ago +8

      Some are still coping... But of course the majority with some brains knows that it was a great feat, given all the efforts to handicap their chip industry.

    • @slimshadybball
      @slimshadybball 19 hours ago

      they weren't branding them as lucky were they now?

  • @minimal2224
    @minimal2224 5 days ago +9

    Guy on the left has a big freakin BRAIN

  • @Gastao2000
    @Gastao2000 14 hours ago

    Thanks for sharing, guys! But I believe there are two things here:
    1) AI processes today (both training and inference) are extremely brute-force processes. The inspiration is the brain, but the subparts of the model have very low specialization compared to our brain, and we are going through a problem similar to the one "evolution" faced until we got to where we are today.
    2) Developers (especially younger ones) got spoiled with cloud/virtual computing and don't pay as much attention to the quality of the code and how to get the best out of the hardware. What you just explained was standard procedure when people used to code on punch cards (and I never used them). Complexity abstraction in programming has allowed way more people to code, but it doesn't mean that they code well, because a lot don't have a basic understanding of how code runs on the computer.
    These two things combined brought us to the other side of the pendulum… Therefore I do believe there will still be a lot of improvements related to architecture (not only the network architecture, but the overall system architecture)

  • @codyfan1097
    @codyfan1097 6 days ago +9

    Great guest choice

  • @saurabhahlawat425
    @saurabhahlawat425 3 days ago

    Best Lex guests in AI so far. Need to watch the whole episode.

  • @KamiKomplex504
    @KamiKomplex504 3 days ago +1

    Well said overall. This "bitter lesson" rings familiar. Reminds me of the no early optimizations rule and other things that keep coming up. I do really feel this is true, people aren't truly innovating, they are hunting for the quick and easy wins. People have stopped looking to report 2x gains and settle for 10% because it is not nothing and everyone has settled into the assumption we are on the right track. My hot take is analog processing and compute in memory is the future and 10 years from now people will wonder why we stayed on transistors and massive gpus without stopping to investigate memristors more.

  • @alexanderjager2697
    @alexanderjager2697 4 days ago

    Simplicity wins because (most of the time) complicated things make training slow and scaling difficult. The only exception to this (and DeepSeek showed it) is when "complicated things" (i.e., custom PTX code and GPU kernels) save you memory/compute/communication during training, enabling you to scale up data/inference/...

  • @johnashton4086
    @johnashton4086 5 days ago +5

    The equivalent of compiler lectures from the 1970s. Examining sand grains....

  • @analogdesigner-Jay
    @analogdesigner-Jay 1 day ago

    Excellent conversation, smart people and Lex is so cool!

  • @OgOssman
    @OgOssman 4 days ago +10

    TLDR: Lex listens to someone speak, then completely ignores it and brings up something he thinks is important, and the guests have to just nod and pretend he is smart....

    • @paaabl0.
      @paaabl0. 2 days ago

      Classic MAGA, it's all just about pretending

  • @jimstone6570
    @jimstone6570 6 days ago +29

    Yeah but can they change a flat tire

    • @DJNOZ805
      @DJNOZ805 6 days ago +18

      Probably not, they will just design a tire that never goes flat to solve that problem tho

    • @hmind9836
      @hmind9836 6 days ago +1

      Can you? - GPT 6.0 in Asimov mode

    • @PurpleHeat
      @PurpleHeat 5 days ago

      Can you?

    • @jimstone6570
      @jimstone6570 5 days ago

      It's a joke as obviously they have high intelligence in compute but yes I can

    • @ndmitri1
      @ndmitri1 5 days ago

      But can you create a tire that doesn’t go flat?

  • @drew3331
    @drew3331 4 days ago +1

    As the owner of the microwave gang subreddit i am happy to help

  • @cosmicstruggle2042
    @cosmicstruggle2042 6 days ago +9

    I have no idea what they're talking about I agree though.

  • @devoptimist
    @devoptimist 6 days ago +10

    Yes, tech can be very tricky stuff. Totally get it. Hey I just wonder if these guys ever tried turning it off and turning it on again? Just a thought

    • @josephposenecker9741
      @josephposenecker9741 6 days ago +1

      I find sometimes you can lightly tap it or give it a little shake and it works better.

    • @devoptimist
      @devoptimist 6 days ago

      @@josephposenecker9741 Yes, I think the tiny AI experts living inside the motherboards get stuck trying to crawl through the wires sometimes, and this can free them up

  • @sbqb21
    @sbqb21 2 days ago +1

    This channel has inspired me to learn coding thank you.❤

  • @Moped_Mike
    @Moped_Mike 4 days ago +1

    Please put Guest name in title!!!!

  • @tommyleite-x3o
    @tommyleite-x3o 5 days ago +12

    I think this would be a great lesson for the US. Instead of just having all the best brains using the best technology available, set aside a few of those brains and start a project where they are only allowed to use less-than-ideal technology. It could be a college project funded by the government, where students work on it with less-than-ideal technologies.

  • @fangzhengchen7620
    @fangzhengchen7620 5 days ago +1

    Just found this channel. Damn, just what I need all the time.

  • @Michaelno
    @Michaelno 7 hours ago

    So it does things faster. But does it do things better?

  • @nagabandarupalli8880
    @nagabandarupalli8880 6 days ago +8

    Unless the innovation comes from the US, it's just luck. Such a novelty. Isn't that how all research works: you try 1000 things and 1 will work?

    • @toma9596
      @toma9596 4 days ago +1

      That's crazy to say

    • @oceanmangg
      @oceanmangg 1 day ago

      American tech giants seething and so are the people who bought nvidia stocks

    • @nagabandarupalli8880
      @nagabandarupalli8880 1 day ago

      @toma9596 It's not. I have huge respect for Lex Fridman. I enjoy his podcasts. But the way he was trying to diminish their achievements, while the other guys were trying to admire what DeepSeek achieved, was quite visible.

  • @madhusudanrao1865
    @madhusudanrao1865 6 days ago +1

    I came across and left, to learn it on Deepseek

  • @andrewp5171
    @andrewp5171 6 days ago +4

    Right?

  • @sub-vibes
    @sub-vibes 6 days ago +1

    _"With peace and love, of course..."_

  • @benjaminbertrand1259
    @benjaminbertrand1259 5 days ago +1

    WOT?
    Sips tea, sits back and folds arms.

  • @ashutoshpadhi2782
    @ashutoshpadhi2782 4 days ago +1

    I felt so proud that I could understand all of this.

  • @Karol-g9d
    @Karol-g9d 5 days ago

    Spikes are usually a hint people send.

  • @m1nhhoang
    @m1nhhoang 6 days ago +6

    Hmmm...does that mean that the Nvidia chip was not optimized by Nvidia itself?

    • @betterserenity
      @betterserenity 6 days ago +6

      Basically yes

    • @l3eatalphal3eatalpha
      @l3eatalphal3eatalpha 5 days ago +5

      Not so much the chip itself but the layers of access/SDK which are by definition generalised. And also - just like gfx drivers in the past - optimisation is a continuous and incremental process.

    • @pieterrossouw8596
      @pieterrossouw8596 3 days ago +2

      Nvidia optimised for a more generalised use case where there are many parameters they can't assume, so they must pick reasonable trade-offs. What DeepSeek could exploit is that theirs is not a generalised use case: they could pick great parameters because they knew exactly the model architecture, exactly the training cluster dimensions, and where the limits are.
      A Toyota Corolla is an amazing generalised solution to get the average driver where they want to go on average roads and unpredictable environments.
      The Red Bull RB19 F1 car was an amazing solution for getting Max Verstappen to win a championship within tight design constraints, known tracks, predictable conditions etc.
      Both are supremely hard to solve, but CUDA is like the Corolla and Deepseek needed a RB19 for what they wanted to do.

  • @IceColdProfessional
    @IceColdProfessional 5 days ago

    I listened to the whole thing. Understood maybe 15% of it.

  • @DavidRothLovesTech
    @DavidRothLovesTech 3 days ago

    Fascinating discussion! I truly appreciated the depth of knowledge, wealth of experience, and genuine enthusiasm the guests brought to the conversation.

  • @cazzone
    @cazzone 2 days ago

    The interviewer at the end sounded like a cowboy in a western movie

  • @ArnoScholtz
    @ArnoScholtz 4 days ago +2

    Now I know what my girlfriend feels like when I talk tech.

    • @bidyo1365
      @bidyo1365 1 day ago

      Nice videos especially the old-looking camera and the settings of the Drift Trike video is nostalgic...
      in 2014 I was still a kid, playing and very excited...
      but yeah it made me stop and reanalyze a little bit of things in my life... like, create one game then earn a lot of money, after that just enjoy the world, fkkk! ! 😆🤩

  • @Rustylad12
    @Rustylad12 1 day ago

    Richard Sutton's "Bitter Lesson" essay should be read by every AI researcher before they start their journey.

  • @hectron-gon-code
    @hectron-gon-code 6 days ago +2

    We as humans are trying to emulate the functionality of the brain. I wonder if a robot will look at itself and try to recreate its traces and vias.

  • @Karol-g9d
    @Karol-g9d 5 days ago

    nvidia or cpu does this ? If in fullscreen mode if compiler is not skipped . Many if . Mosyt do not check all the if

  • @get_downed_boi6270
    @get_downed_boi6270 4 days ago

    As someone trying to train a model for financial analysis, this interview was super useful….
    Ai and training models really is the wild west

  • @DopeAsPho
    @DopeAsPho 3 days ago

    Try creating a console with a loophole for both ends. Loophole console.[]

  • @johnjay6370
    @johnjay6370 5 days ago

    Interesting, I remember reading articles about how developers would optimize and write very sloppy code for video games during the 90s to get every ounce of performance out of the system. When i hear "sloppy code", I think of innovation and creativity but some cases, it can be someone being "LAZY". If you can get a 10%+ performance gain with "Sloppy Code" there might be a case to use "Sloppy Code" this is a race we have entered...

  • @WeylandLabs
    @WeylandLabs 6 days ago +7

    I can't take the word "RIGHT" from this man anymore! Take the transcript, upload it to your LLM, remove the word "right," and replay it with text-to-speech. It's so much better... 💯

    • @chrismikeryan
      @chrismikeryan 6 days ago +1

      I didn't notice till you said this. Damn it.

  • @Penaming
    @Penaming 6 days ago +3

    DeepSeek hired gold and silver olympiad medalists in math, physics, etc. to build their team. Good luck.

    • @rozburg
      @rozburg 6 days ago +1

      There comes a point where the model becomes better than every math genius in the world.

  • @tonyh7158
    @tonyh7158 5 days ago +4

    So, does DeepSeek really have lots and lots of H100s like Elon Musk claims? If they really have so many H100s, why would they do the extra hard work of programming in assembly language?

    • @dianasong4594
      @dianasong4594 5 days ago +1

      They don't. DeepSeek focuses on "A" of AI.

    • @soheiladam7510
      @soheiladam7510 5 days ago +3

      Elon doesn't know what he's talking about.

    • @Glomly
      @Glomly 4 days ago

      Why would you want to walk if you have a car?

  • @bojangles2492
    @bojangles2492 4 days ago +1

    Ah yes, the multilayer perceptron, the feed forward network and the attention mechanism. All part of my daily lexicon 🤯
    Microwave goes MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM BEEP.

  • @Nick3DvB
    @Nick3DvB 6 days ago +3

    i know some of these words

  • @jhayes42
    @jhayes42 6 days ago +1

    Right? Train a person to talk right? Am I right?

  • @soheiladam7510
    @soheiladam7510 5 days ago +1

    Did that guy say luck 🤦 What an embarrassment; there is nothing lucky about what DeepSeek's team has achieved.
    It's skill and hard work.

  • @gareththomas3234
    @gareththomas3234 5 days ago

    GPU assembly code

  • @karanchandra2491
    @karanchandra2491 2 days ago +1

    I think one day DeepSeek will implement and improve to an extent that it can decide for itself

  • @LilBigDude28
    @LilBigDude28 6 days ago +5

    11:50 I imagine those spikes in loss could be due to the model trying to "test out" new ideas. When humans learn language, there are times when the language starts to be used "creatively". Almost in the same sense as how new slang is born. Which in a broader sense could be a byproduct of learning to generalize and improvise/novelize.

    • @hmind9836
      @hmind9836 6 days ago +3

      Hmmm... I think you might be anthropomorphizing the math/algorithm behind these transformer models, but that might actually be a good thing. In the future, it’s possible that most of the "low-level" work could be handled by A.I. models-which might be concerning in some cases, as we could lose control of the software we're running. We might end up focusing primarily on high-level thinking, relying on our intuition to explore new ideas (something current models aren’t really good at). Imagine saying, "Hey GPT-8.0, I was wondering if the loss spikes during transformer model training might be due to..." and then the model replying, "Well, the idea of testing out new ideas might hold true for humans, but the gradient descent mechanism doesn’t really incentivize the discovery of new high-level concepts and ideas. It’s more likely that those loss spikes are related to, well, blablabla..."

    • @circuitbreaker8314
      @circuitbreaker8314 5 days ago

      @@hmind9836That is assuming that 'it knows ' about a context that it has never seen before.

  • @killerbee2218
    @killerbee2218 5 days ago

    Can someone explain it in star wars

  • @sportsonwheelss
    @sportsonwheelss 4 days ago

    That title tells you the level of coping this country has.

  • @reversetransistor4129
    @reversetransistor4129 4 days ago

    Good that I didn't do what I wanted last year, even at the low level, I'm flaked

  • @FamilyYoutubeTV-x6d
    @FamilyYoutubeTV-x6d 6 days ago +3

    None of this is especially technical or hard to understand. It's just programming at a lower layer than PyTorch and a much higher layer than machine code. If you do not understand these things, then why are you watching? Go study the fundamentals first.

  • @ProjectTurtleTech
    @ProjectTurtleTech 6 days ago +1

    So they didn't have to worry about backward/cross compatibility, and optimized for their hardware.

  • @billalbaugh
    @billalbaugh 4 days ago +1

    If I hear the word “right” again, I’m gonna scream.

  • @ImranSahir1
    @ImranSahir1 5 days ago

    I think I understood it like 98.9% of this; is there anyone who can help me understand the rest of 1.1%, please? Thanxxxx.

  • @mychannel-bu6jx
    @mychannel-bu6jx 3 days ago

    I understood about .03% of this.

  • @kaleiohu
    @kaleiohu 5 days ago +1

    I could be wrong but it seems DeepSeek didn't really need the best programmers, just better programmers than Nvidia.

  • @Karol-g9d
    @Karol-g9d 5 days ago

    voice mode only ai is where things are great

  • @13thbiosphere
    @13thbiosphere 6 days ago +1

    You understand about 30% of what

  • @josephposenecker9741
    @josephposenecker9741 6 days ago +1

    Did you know the earth was actually always round, not just when Christopher Columbus sailed to America? Blew my mind.

  • @80-80.
    @80-80. 4 days ago +1

    I have a dream 😴

  • @COLLAPSE.of.US.ECONOMY
    @COLLAPSE.of.US.ECONOMY 5 days ago

    I don't know how many thousands of advanced Nvidia chips ChatGPT used, or how many billions of dollars were spent training the algorithms. I asked ChatGPT to solve a very basic mathematical question:
    A heavy smoker can make 1 new cigarette from 3 cigarette butts. He has 11 cigarette butts. So, how many new cigarettes can he make?
    I was very disappointed at the wrong answer provided by ChatGPT 😢😮😮
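    For reference, the intended answer to this classic puzzle is 5, assuming he smokes each new cigarette and recycles its butt. A quick simulation (illustrative only, not from the comment):

    ```python
    def cigarettes_made(butts, butts_per_cigarette=3):
        """Count how many cigarettes can be rolled if every smoked cigarette
        leaves behind one new butt that can be recycled."""
        total = 0
        while butts >= butts_per_cigarette:
            new = butts // butts_per_cigarette          # roll new cigarettes from butts
            butts = butts % butts_per_cigarette + new   # leftovers + butts from smoking them
            total += new
        return total

    print(cigarettes_made(11))  # -> 5
    ```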

  • @vicentevaldez1696
    @vicentevaldez1696 4 days ago +6

    Americans are extremely good at blah blah 😂😂😂😂
    The Chinese optimized DeepSeek down to the hardware level... you need extreme technical ability to do that... it's like creating a web page using assembly language 😮

  • @boonchweeng6567
    @boonchweeng6567 6 days ago +1

    Rather than being concerned about the number of Nvidia GPUs that would be required, the breakthrough is in the development of robots and self-driving cars, because R1 can make dedicated models so cheap and efficient

  • @shoe_Bin
    @shoe_Bin 4 days ago

    There comes a point in everyone's life where your brain's ability to comprehend lags behind your interest in what is being said. This is one of those times…

  • @Iriejwnkxjdneoldnf
    @Iriejwnkxjdneoldnf 6 days ago +2

    Simple math

  • @crashingtiger
    @crashingtiger 2 days ago

    Only honest AI matters, all others are GIGO.

  • @michaelnip9464
    @michaelnip9464 5 days ago

    Another podcast explained that using PTX, the lower-level machine language, DeepSeek can use non-NVIDIA chips to train their AI model. This means they can use Huawei chips to train their LLM immediately. Huawei chips may be inferior in performance but abundant in supply. This podcast also discussed data center scale and power infrastructure for training LLMs. I think the playing field may be equal when all the factors are taken into consideration.

  • @JohnKuhles1966
    @JohnKuhles1966 5 days ago

    3 Ultra Nerds in 1 room production :P

  • @pauldannelachica2388
    @pauldannelachica2388 5 days ago

    ❤❤❤❤❤❤

  • @HumanPP
    @HumanPP 2 days ago +1

    Mistral AI + DeepSeek =🧠❤💪

  • @EngineeringAdjacent
    @EngineeringAdjacent 6 days ago +1

    Can anyone tell me if I'm understanding correctly that the real significance of DeepSeek is taking a big step toward synthetic data?

    • @TemplemountKing
      @TemplemountKing 6 days ago +1

      The big model costs a lot to run, but you can use it to output correct questions and answers and use those to train a smaller model, and that smaller model will become really good. That's very significant, because you can now run a very small model on your laptop, and it allows a lot more players to get in. (A rough sketch of this idea follows after this thread.)

    • @kazedcat
      @kazedcat 6 days ago +2

      DeepSeek is not one big innovation but multiple of them working together to make a bigger splash.
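    A rough sketch of the distillation idea described above, where a small "student" model learns from a large "teacher." The comment describes fine-tuning a small model on a big model's generated answers; the closely related classic variant shown here matches the teacher's output distribution directly. The tensors and names below are illustrative assumptions, not any particular model's API.

    ```python
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """Soft-label knowledge distillation: push the student's next-token
        distribution toward the teacher's, softened by a temperature."""
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_logprobs = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(student_logprobs, teacher_probs, reduction="batchmean") * (t * t)

    # Toy usage with random "logits" standing in for real model outputs.
    teacher_logits = torch.randn(8, 32000)   # (batch, vocab) from the big model
    student_logits = torch.randn(8, 32000)   # (batch, vocab) from the small model
    print(distillation_loss(student_logits, teacher_logits))
    ```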

  • @chetan_naik
    @chetan_naik 6 days ago +4

    The natural-selection pressure of a lack of short trees forced the giraffe to evolve an elongated neck.
    Similarly, the selection pressure of US trade policy restricting high-end GPUs is pushing the Chinese to evolve better software to compensate for the lack of high-end GPUs.
    Americans are fighting the losing battle of trying to stop evolution.

    • @Glomly
      @Glomly 4 days ago

      It's easier to make someone do the work and steal it. China stole from US, US will steal from China

  • @munashegudza7628
    @munashegudza7628 5 days ago

    What did i just understand 😂😂😂

  • @vaishnoo1168
    @vaishnoo1168 5 days ago +2

    I'm gonna share this to appear pretentious

  • @koliux1
    @koliux1 3 days ago

    Trust me bros it works.... experts ... no f.... about ... yolo run ...

  • @TheArfdog
    @TheArfdog 4 days ago

    Wait wait, it still uses GPUs?!? No way. I thought they figured out how to run on 256kb of RAM.

  • @zappulla4092
    @zappulla4092 5 days ago +2

    Stop saying right.

  • @pizzyfpv
    @pizzyfpv 20 hours ago

    Yeah, too bad DeepSeek really sucks if you actually take the time to use it. I was trying to have it do some simple things and it can't even do them.

  • @tammoprien8021
    @tammoprien8021 5 days ago

    Right

  • @Andrew-rc3vh
    @Andrew-rc3vh 5 days ago +1

    Don't these experts you interview ever use diagrams? Every engineer I have come across does.

    • @goldnarms435
      @goldnarms435 5 days ago

      Diagrams available in the research papers.

  • @ifayezali
    @ifayezali 3 days ago

    Look at his head, it looks like a quantum computer

  • @MollieInya
    @MollieInya 6 days ago +1

    Unpopular data viewpoint- there is no bad data. Bad sources maybe. Bad data is like bad press. It still has value.

    • @leonlysak4927
      @leonlysak4927 6 days ago +8

      You haven't written a single piece of software in your life. Don't have such strong opinions on things you know nothing about

    • @evr0.904
      @evr0.904 6 days ago

      ​@@leonlysak4927Don't you know ChatGPT has made everyone an expert on everything.

    • @dtrcs9518
      @dtrcs9518 5 days ago +1

      There's absolutely bad data wtf

  • @goodneighborsnetwork
    @goodneighborsnetwork 5 days ago +2

    The true sign of understanding is being able to explain complex concepts so that a 5th grader could understand them. Grade = F

  • @damonkatos4271
    @damonkatos4271 3 days ago

    If you understand all of this, raise your hand.🙋‍♂️

  • @ThreepwoodFan
    @ThreepwoodFan 2 days ago

    I wanna visit r/microwavegang now lmao

  • @TheBhushanJPawar
    @TheBhushanJPawar 2 days ago

    They are from a different planet 😅

  • @johnkost2514
    @johnkost2514 6 days ago +1

    So basically multi-model-sharding across the GPU fabric ..

  • @ah244895
    @ah244895 2 days ago

    Can't the American models now take the lessons learned by the Chinese and integrate them into their models with all the hardware at their disposal and leap forward?