DeepSeek R1 Coldstart: How to TRAIN a 1.5B Model to REASON

  • Published 30 Jan 2025

COMMENTS • 90

  • @HaraldEngels
    @HaraldEngels 3 days ago +61

    I have been using DeepSeek releases for over 9 months. The results have been great all along but keep getting better and better. I run all the Qwen-based DeepSeek R1 models locally on my Linux PC, and they are all great. The 1.5B model works fantastically when you use the q16 variant. It is a real killer. Inference is not very fast since I am running all models (from 1.5B up to 32B) on my Ryzen 5 8600G CPU WITHOUT a dedicated GPU. The CPU uses up to 40 GB of my 64 GB of RAM for the 32B model. With good prompting the results are fantastic and save me hours of work every day. The dynamic memory allocation of the 8600G is great and lets me run powerful LLMs on a small budget. My PC cost me $900.
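
    For anyone wanting to reproduce a CPU-only setup like this, here is a minimal inference sketch using llama-cpp-python; the GGUF filename, context size, and thread count below are assumptions to adjust for your own download and hardware.

        from llama_cpp import Llama

        # Assumed local GGUF path -- substitute whichever quantized
        # DeepSeek-R1 distill you actually downloaded.
        llm = Llama(
            model_path="DeepSeek-R1-Distill-Qwen-1.5B.gguf",
            n_ctx=4096,    # context window
            n_threads=8,   # CPU threads; no GPU required
        )

        out = llm("Explain step by step: what is 17 * 24?", max_tokens=512)
        print(out["choices"][0]["text"])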

    • @Aurelnpounengong
      @Aurelnpounengong 3 days ago +4

      Wait, you're able to run a 32B model on just your CPU? I have an RTX 4060 Ti with 16 GB of VRAM and I'm scared to download a 32B model 😅

    • @rhadiem
      @rhadiem 3 days ago +2

      @@Aurelnpounengong The Ryzen 5 8600G has a GPU integrated on the processor and can use system memory as VRAM (40 GB out of the 64 GB of system memory), though much more slowly. He provided the details so you can research the parts you don't understand.

    • @gracegoce5295
      @gracegoce5295 3 days ago +2

      Really? All of this cost you $900? With 64 GB of RAM?

    • @Aurelnpounengong
      @Aurelnpounengong 3 days ago

      @@rhadiem Ahhh, I see, I did not know it used system memory as VRAM. I also have 64 GB of DDR4 memory; do you think I'll be able to run a 32B model on my graphics card with some layers offloaded to system memory?

    • @trevoC132
      @trevoC132 3 days ago +1

      @@Aurelnpounengong It will run, just slowly. I can run a 32B on my 4090, but anything larger has to swap in and out of memory, which is painful.

  • @agenticmark
    @agenticmark 3 days ago +14

    I've also had luck getting the model to reflect by:
    - reversing the calculation (math)
    - writing the documentation while it codes
    - writing a tutorial while it codes
    (see the sketch just below)
    This is one of the best videos I have seen in some time, Chris!
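
    For the "reverse the calculation" trick, here is an illustrative two-pass prompt sketch; the prompt wording and the worked numbers are invented for this example, not taken from the video.

        # Hypothetical two-pass prompting: solve, then verify by reversing.
        question = "What is 247 * 13?"
        solve_prompt = f"{question} Show your reasoning step by step."

        # After the model answers (say it claims 3211), ask it to invert
        # the operation and check its own result.
        claimed = 3211
        verify_prompt = (
            f"You answered {claimed}. Verify this by reversing the "
            f"calculation: divide {claimed} by 13 and check that you get "
            f"247 back. If not, correct your answer."
        )
        print(solve_prompt)
        print(verify_prompt)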

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Awesome, so glad you’ve seen similar results

  • @aaronabuusama
    @aaronabuusama 3 days ago +19

    It would be awesome if you did a tutorial on fine-tuning a reasoning model with tool-calling abilities

    • @chrishayuk
      @chrishayuk  3 days ago +20

      That is a really good shout, I will do that

    • @zacharielaik8652
      @zacharielaik8652 3 days ago +3

      Yes, that would be awesome!

    • @punchster289
      @punchster289 3 days ago +1

      Yes! I want to train a model to use Z3 when doing logical reasoning. It's a very powerful solver.
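
      For context, here is a minimal z3py example of the kind of solver call such a model might learn to emit; the specific constraints are just an illustration.

          from z3 import Int, Solver, sat

          # Two linear constraints; Z3 does the logical heavy lifting.
          x, y = Int("x"), Int("y")
          s = Solver()
          s.add(x + y == 10, x - y == 2)

          if s.check() == sat:
              print(s.model())  # [x = 6, y = 4]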

  • @greghampikian9286
    @greghampikian9286 3 days ago +4

    Thanks for answering all the basic questions I had. Great teaching style, even for the non-programmer.

    • @chrishayuk
      @chrishayuk  3 days ago

      Glad it was useful, I had a lot of fun making this video

  • @PhilWeinmeister
    @PhilWeinmeister 2 days ago +1

    I may no longer be at IBM, but I was curious to hear your thoughts on DeepSeek. Very insightful video, thanks!

  •  2 days ago +1

    Excellent! Bravo! I am spending hours analyzing how DeepSeek R1 32B works with my 4090. I am getting amazing results every day...

  • @OpenAITutor
    @OpenAITutor 1 day ago

    Hey Chris, Great video. Really enjoy the way you teach. Keep up the good work. Can't wait for your next video on RLHF.

  • @d.d.z.
    @d.d.z. 3 days ago +3

    Keep doing helpful videos Chris 😊

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Always, glad it was useful, I was particularly happy with this one

  • @PunitPandey
    @PunitPandey 3 days ago +2

    Great video. Looking forward to the RL video.

  • @wwkk4964
    @wwkk4964 3 days ago +1

    Brilliant work! Yes, I do remember you mentioning that o1 was MCTS and R1 was not. I agreed with you that R1 surely was not; it will be exciting to see whether o1 or o3 used similar techniques or used MCTS!

    • @chrishayuk
      @chrishayuk  3 days ago +2

      I'm 100% convinced that o1 is using search (specifically MCTS) at inference time, and I'm 100% convinced that R1 will do the same in a future release when they figure it out. But the results they've gotten without it are pretty incredible.

    • @wwkk4964
      @wwkk4964 3 days ago +1

      @chrishayuk It still blows my mind every time I think about it! That one can converge through search or through learning to these endpoints, so long as one is bootstrapped with some notion of correctness! Your demo was incredible work. Thanks again.

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Thank you. Yeah, I came up with the concept of getting the compiler to do the calculation and the AI to do the explanation a while back; I think I did a video on this in June 2024. So it seemed a natural fit when I saw the long chain-of-thought coldstart piece from DeepSeek. It felt like a good merge. I was also blown away by how good the results were.
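
      A minimal sketch of that idea, assuming plain Python stands in for the "compiler" and the record keys are placeholders (this is not the video's actual script): the ground-truth answer is computed programmatically, and the long chain of thought is templated around it to build coldstart JSONL data.

          import json
          import random

          def make_coldstart_record():
              # The "compiler" (plain Python here) guarantees a correct answer...
              a, b = random.randint(10, 99), random.randint(10, 99)
              answer = a * b
              # ...and the chain of thought is templated around that answer.
              tens, units = b // 10 * 10, b % 10
              cot = (
                  f"<think>I need to multiply {a} by {b}. "
                  f"{a} * {b} = {a} * {tens} + {a} * {units} "
                  f"= {a * tens} + {a * units} = {answer}.</think>"
              )
              return {"prompt": f"What is {a} * {b}?",
                      "completion": f"{cot}\nThe answer is {answer}."}

          with open("coldstart.jsonl", "a") as f:
              f.write(json.dumps(make_coldstart_record()) + "\n")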

  • @phillipneal8194
    @phillipneal8194 1 day ago

    Thank you for a great presentation, especially for your explanation and examples of the "cold start" part. The "Incentivizing" paper and the technical report are heavy going, especially the reinforcement learning algorithm. When will you have a video out explaining the RL algorithm?

  • @geocorpsys
    @geocorpsys 2 days ago

    Thank you Chris. I am hoping I will be able to replicate this on my old Windows laptop. I want to be able to train a base model from scratch like you did here.

  • @blue-y3r
    @blue-y3r 3 days ago +1

    In your newly trained Qwen model, what is the verifier step doing, since there is no math compiler in Qwen?

    • @chrishayuk
      @chrishayuk  3 days ago +4

      I'm not verifying yet; I'll do that in the RL stage in the next video. For now I'm just generating long and accurate chains of thought for coldstart training.
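
      To make the distinction concrete, here is an illustrative sketch of the kind of answer verifier an RL stage could use when the ground truth is computable; the answer format (last number in the output) is an assumption.

          import re

          def verify_answer(model_output: str, expected: int) -> bool:
              """Reward-signal sketch: True if the model's final number
              matches the programmatically computed answer."""
              numbers = re.findall(r"-?\d+", model_output)
              return bool(numbers) and int(numbers[-1]) == expected

          # e.g. check a claimed result against the computed 247 * 13
          print(verify_answer("...so 247 * 13 = 3211", 247 * 13))  # True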

  • @kishoretvk
    @kishoretvk 3 days ago +1

    Hello Chris Hay!
    This is crazy; you made an amazing tutorial. That's mind-blowing. While OpenAI is closed, the open-source community is actually building this openly for the community. Companies like DeepSeek are validation and inspiration, but the community is doing its own discovery. You are very inspiring as well.
    Thanks again for a wonderful video.

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Thank you, I appreciate it, I was pretty pleased with this one, glad it’s useful

    • @kishoretvk
      @kishoretvk 3 days ago +1

      @@chrishayuk We might not need MoE now, as we only need cold-start data for different tasks:
      1. function calling
      2. coding
      3. summarization
      4. role play
      5. NLQ and others
      We can do this on Colab since it's 1.5B; it's going to be crazy.

    • @chrishayuk
      @chrishayuk  3 days ago +1

      It's cool, right?

  • @mrd6869
    @mrd6869 3 days ago +1

    One cool addition: I use the TwinMind AI on-screen assistant to explain exactly what you're doing as I watch the video (it reads the transcript, I'm guessing).
    Anyway, it makes understanding the topic far easier.

    • @chrishayuk
      @chrishayuk  3 days ago

      oooh, that sounds pretty sweet

  • @sumitmamoria
    @sumitmamoria 2 days ago

    Good work. One tiny suggestion: maybe try using word wrap for long lines, for better readability when watching the video.

  • @SDGwynn
    @SDGwynn 2 days ago

    Very much appreciate your videos. Thank you. I noticed your training data jsonl format is different from your validation and test jsonl format. Could you please explain?
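
    The video's exact record keys aren't reproduced here, but for readers wondering what such a difference can look like, fine-tuning tools commonly accept more than one record shape; the two shapes below are hypothetical illustrations only.

        import json

        # Hypothetical shape A: one pre-templated "text" field,
        # sometimes used for the training split.
        train_rec = {"text": "Q: What is 6 * 7? A: <think>6 * 7 = 42</think> 42"}

        # Hypothetical shape B: separate prompt/completion fields,
        # sometimes used for validation and test splits.
        valid_rec = {"prompt": "What is 6 * 7?",
                     "completion": "<think>6 * 7 = 42</think> 42"}

        for path, rec in [("train.jsonl", train_rec), ("valid.jsonl", valid_rec)]:
            with open(path, "a") as f:
                f.write(json.dumps(rec) + "\n")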

  • @usget
    @usget 3 days ago +4

    Can a reasoning model figure out that it doesn't know something, and ask for inputs? Or could it be trained to ask?

  • @rodnet2703
    @rodnet2703 2 days ago

    Thanks for the info! I followed your instructions and it's training the model, but it's pretty slow on my M1 Mac. Is there similar software for Linux so that I can coldstart-train the model on a VPS?

  • @AndyHuangCA
    @AndyHuangCA 3 days ago +1

    Given that the intention is not so much to train new knowledge as to synthesize chain-of-thought capabilities into existing models, how well would it work if we used R1 to generate a bunch of non-math question/thinking/answer input-output pairs as the cold-start seed?

    • @chrishayuk
      @chrishayuk  3 days ago +1

      That's pretty much what happens in the RL stage... but I also think you can use verifiers to do this well.

    • @AndyHuangCA
      @AndyHuangCA 3 days ago

      @@chrishayuk Thanks! I was playing around with Granite 3.1 MoE 3B and found it to be insanely fast even on CPU only. I'd be really curious to see how much "intelligence" we can extract from smaller MoE models like that by synthesizing chain of thought. I'll have to find some time to play around and see what could be extracted. I'm thinking a semi-capable thinking model, with MCP (thanks to your MCP-CLI project), that requires no GPU will be a very powerful local assistant!

  • @seanplynch
    @seanplynch 3 days ago +1

    Fantastic, well done

  • @johntdavies
    @johntdavies 1 day ago

    I'm sure you're aware of the Qwen-maths models, but using these reasoning techniques it would be interesting to see whether a small (Qwen2.5-1.5B) model could be trained to reason about geometry or integration the same way a mathematician would: simply apply the rules they know and see what fits.
    I think the only limitation here is the size of the context. I put DeepSeek-R1-7B (Q4) on my phone and it was good but limited. I increased the context to 8192 and wow, it solved things that o1 struggled with and failed at.

  • @blue-y3r
    @blue-y3r 3 days ago +1

    Are you saying there is a math compiler in DeepSeek R1? It's open source, so that can be checked.

    • @chrishayuk
      @chrishayuk  3 days ago +1

      They said in the paper they use a math verifier

  • @EliSpizzichino
    @EliSpizzichino 2 days ago +1

    Can you actually fine-tune DeepSeek R1? I see you used Qwen2.5.

  • @snehmehta
    @snehmehta 3 days ago +1

    Hi Chris, it's pretty cool, thanks for sharing.
    Can we try to generate the cold-start data from DeepSeek-R1-Zero just like the paper and train a LoRA? What do you think of that?

    • @chrishayuk
      @chrishayuk  3 days ago +2

      Yes, I plan to do a pure version with RL, so will do that when I have that ready (which should be very soon)

    • @snehmehta
      @snehmehta 3 days ago +1

      @@chrishayuk That would be great! I would like to contribute by researching, writing scripts, or generating data if possible.

  • @ApolloGemini11
    @ApolloGemini11 3 days ago

    Awesome video 👏🏼👏🏼👏🏼

  • @ianhaylock7409
    @ianhaylock7409 3 days ago +1

    14:52 Isn't the answer it gives here incorrect?

  • @blue-y3r
    @blue-y3r 3 days ago +1

    So what you are saying is that R1 will not perform well on non-logical, non-math queries, where they can't use a verifier? Like, what if I want to use R1 in a healthcare domain?

    • @chrishayuk
      @chrishayuk  3 days ago +2

      Nope, because verifiers work for that also, which I’m gonna show in an upcoming video

  • @andrewcameron4172
    @andrewcameron4172 3 days ago +1

    How about a video on creating a jsonl to fine-tune a model to write computer code?

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Yeah, I plan to do a new one on that using verifiers.

  • @user-qe2ps9vm9o
    @user-qe2ps9vm9o 3 days ago +2

    Is NVDA going to die?

    • @chrishayuk
      @chrishayuk  3 days ago +2

      I think a new Grand Theft Auto game is coming out; they'll be fine.

  • @danson3038
    @danson3038 1 day ago

    A video on a local agentic IDE, please.

  • @danson3038
    @danson3038 1 day ago

    excellent!

  • @santoshtelwane1776
    @santoshtelwane1776 3 days ago

    WOW Superb

  • @andrewcameron4172
    @andrewcameron4172 1 day ago

    Have a look at the Open R1 repo from Hugging Face; they are working with the community to replicate the DeepSeek R1 datasets, etc.

  • @dalsenov
    @dalsenov 3 days ago +2

    This resembles "first principles": don't teach me how to reason, I will find it myself!

  • @did28
    @did28 3 days ago +2

    real open ai

  • @anubisai
    @anubisai 3 days ago +1

    The N. Ireland / N. American accent mix is wild.

    • @chrishayuk
      @chrishayuk  3 days ago +3

      Agreed, love those accents. Mine is Scottish though

    • @wwkk4964
      @wwkk4964 3 days ago

      @@chrishayuk haha :)

    • @mrd6869
      @mrd6869 3 days ago +1

      @@chrishayuk You look like a musician who got into AI 😂. Like, I can see you on a synthesizer in a music video.

    • @chrishayuk
      @chrishayuk  3 days ago +1

      Hahaha, I'm terrible at music... but I think there are a lot of synergies. I like using lots of tools and techniques and meshing them together.

  • @dmalex321
    @dmalex321 3 days ago +2

    Wait a minute... you used a how-many-billion-parameter LLM to solve what a card-sized Casio calculator could solve in the '80s?

    • @agenticmark
      @agenticmark 3 дні тому +2

      one is hardware
      one is ml
      ml can do things hardware cant. generalize.

    • @aiknownc
      @aiknownc 2 days ago +1

      Obviously this is a toy example. The purpose is to explain how to generate accurate synthetic Chain of Thought data to use during the training process, which is quite valuable. Even better, he walks through it end to end within the context of DeepSeek's COLDSTART methodology.

  • @HiteshKrishanKumar
    @HiteshKrishanKumar 3 days ago +1

    *_Who do you think will win the AI race: China or the US? Please reply._*

    • @chrishayuk
      @chrishayuk  3 days ago +2

      I don’t believe there will be a winner… I believe the game is an infinite game, and players will join and drop off. There are no winners….

    • @HiteshKrishanKumar
      @HiteshKrishanKumar 3 days ago

      @ Don't you think it will be like the space race?

    • @llIllIlI
      @llIllIlI 3 days ago +2

      ​@@HiteshKrishanKumar To what finish line? AI is already here and people use it every day.

    • @EliSpizzichino
      @EliSpizzichino 2 days ago

      Unfortunately, I think it's a military race, and we'll never know for sure until it's too late.
      For the general public, open-source models will win; this video pretty much shows that already.

    • @aiknownc
      @aiknownc 2 days ago

      Unlike the space and nuclear arms races, where spies were the only way to get the latest technological advances, DS has OPEN SOURCED everything they did to produce this model. Imagine how much faster the space/nuclear arms race would have gone in that case! Open source has been the biggest, or nearly the biggest, accelerator for AI advancement in my opinion, especially within the last ~2 years.

  • @LokeKS
    @LokeKS 3 days ago +1

    How do you do this on Windows? I guess PEFT from Hugging Face. Cool.

    • @agenticmark
      @agenticmark 3 days ago +2

      bitsandbytes (bnb) quantizations cover many small models for Ollama on Windows/Linux, and yeah, PEFT adapters.
      I am pretty impressed with Mac ML, but I can't imagine not being on Linux with direct access to my 4090!

    • @chrishayuk
      @chrishayuk  3 days ago +1

      I’ll do a regular PyTorch video for the next one
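
      Not the video's code, but a minimal Hugging Face PEFT LoRA setup sketch of the kind this thread is pointing at; the base model ID, rank, and target modules are assumptions.

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer
          from peft import LoraConfig, get_peft_model

          model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed base model
          tokenizer = AutoTokenizer.from_pretrained(model_id)
          model = AutoModelForCausalLM.from_pretrained(
              model_id, torch_dtype=torch.bfloat16
          )

          # LoRA adapter on the attention projections; values are illustrative.
          lora = LoraConfig(
              r=8,
              lora_alpha=16,
              target_modules=["q_proj", "v_proj"],
              task_type="CAUSAL_LM",
          )
          model = get_peft_model(model, lora)
          model.print_trainable_parameters()
          # From here, train with a standard transformers Trainer
          # on the coldstart JSONL data.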

    • @LokeKS
      @LokeKS 2 days ago

      @@chrishayuk Nice

  • @BigAura
    @BigAura 9 hours ago

    I see there are now R1 reasoning datasets on Hugging Face, e.g. ServiceNow-AI/R1-Distill-SFT.