Coding Llama 3 from scratch in PyTorch - Part 1

  • Published May 5, 2024
  • In this video series, you will learn how to train and fine-tune the Llama 3 model from scratch.
    The goal is to code LLaMA 3 from scratch in PyTorch to create models with 3B, 6B, 35B and 45B params. In this first video, you'll learn about upcycling, downcycling and infini-attention.
    📚Papers:
    - Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: arxiv.org/abs/2212.05055
    - Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
    - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
    💻 To follow along you can use this colab notebook:
    - github.com/Blaizzy/Coding-LLM...
    🎥 Coding Llama 2 from scratch video series
    Part 1: ua-cam.com/users/liveXHmag4damTg
    Part 2: ua-cam.com/users/liveLSWDpFmbE90
    Part 3: • Coding Llama 2 from sc...
  • Science & Technology

COMMENTS • 12

  • @AC-go1tp
    @AC-go1tp 16 days ago +3

    This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you PC🙏!

    • @princecanuma
      @princecanuma  15 days ago

      Most welcome!
      It’s my pleasure:)
      I lived through this so others don’t have to.

  • @ngamcode2485
    @ngamcode2485 5 days ago

    This is very impressive and great content. Thank you!

  • @kishoretvk
    @kishoretvk 15 days ago

    Super impressive. Great value.
    One question:
    How do I further train the model on my custom content,
    instead of using LoRA?
    Can we fully train it further and add new memory?

    • @princecanuma
      @princecanuma  9 days ago

      Most welcome!
      You can do that, but that can be very expensive.
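      The cost tradeoff in this reply can be made concrete. Below is a minimal PyTorch sketch (a toy linear layer standing in for a transformer block; illustrative only, not actual Llama code) contrasting full fine-tuning, where every parameter gets gradients, with a LoRA-style low-rank adapter, where the base weights are frozen and only a small `B @ A` update is trained:

      ```python
      import torch
      import torch.nn as nn

      torch.manual_seed(0)

      # Toy "model": one linear layer standing in for a transformer block.
      base = nn.Linear(16, 16)

      # Full fine-tuning: every parameter is trainable and must be updated
      # (plus optimizer states), which is what makes it expensive at scale.
      full_params = sum(p.numel() for p in base.parameters() if p.requires_grad)

      # LoRA-style: freeze the base weights and train only a rank-r update.
      for p in base.parameters():
          p.requires_grad = False

      rank = 2
      A = nn.Parameter(torch.randn(rank, 16) * 0.01)  # down-projection
      B = nn.Parameter(torch.zeros(16, rank))         # up-projection, zero-init

      def lora_forward(x):
          # Base output plus the low-rank correction x @ (B @ A)^T.
          return base(x) + x @ (B @ A).T

      lora_params = A.numel() + B.numel()
      print(full_params, lora_params)  # → 272 64
      ```

      The frozen base never accumulates gradients, so only the 64 adapter parameters need optimizer state. At Llama scale the same ratio is what makes LoRA cheap and full continued training costly.
      
      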

  • @vivekpadman5248
    @vivekpadman5248 3 days ago

    Bro, how did you train Llama 3 without a paper?

    • @princecanuma
      @princecanuma  12 hours ago

      Could you elaborate?

    • @vivekpadman5248
      @vivekpadman5248 1 hour ago

      @@princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅

    • @princecanuma
      @princecanuma  1 hour ago +1

      @@vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance.
      Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel.
      ua-cam.com/play/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK.html&si=0Gyt9mdaA-ydiWOA
      Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.

    • @vivekpadman5248
      @vivekpadman5248 1 hour ago +1

      @@princecanuma Oh, understood. Thanks, I'll check it out, and also your video 💙

    • @princecanuma
      @princecanuma  36 minutes ago

      Most welcome :)