LoRA & QLoRA Fine-tuning Explained In-Depth

  • Published 23 Dec 2024

COMMENTS • 78

  • @DanielTompkinsGuitar
    @DanielTompkinsGuitar 11 months ago +31

    Thanks! This is among the clearest and most concise explanations of LoRA and QLoRA. Really great job.

  • @titusfx
    @titusfx 10 months ago +7

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 *Introduction to Low Rank Adaptation (LoRA) and QLoRA*
    - LoRA is a parameter-efficient fine-tuning method for large language models.
    - Explains the need for efficient fine-tuning in the training process of large language models.
    02:29 🛡️ *Challenges of Full Parameter Fine-Tuning*
    - Full parameter fine-tuning updates all model weights, requiring massive memory.
    - Limits fine-tuning to very large GPUs or GPU clusters due to memory constraints.
    04:19 💼 *How LoRA Solves the Memory Problem*
    - LoRA tracks changes to model weights instead of directly updating all parameters.
    - It uses low-rank matrices to efficiently represent the weight changes.
    06:11 🎯 *Choosing the Right Rank for LoRA*
    - Rank determines how precisely the weight-change matrix can be represented in LoRA fine-tuning.
    - For most tasks, rank can be set lower without sacrificing performance.
    08:12 🔍 *Introduction to Quantized LoRA (QLoRA)*
    - QLoRA is a quantized version of LoRA that reduces model size without losing precision.
    - It exploits the normal distribution of parameters to achieve compression and recovery.
    10:46 📈 *Hyperparameters in LoRA and QLoRA*
    - Discusses hyperparameters like rank, alpha, and dropout in LoRA and QLoRA.
    - The importance of training all layers and the relationship between alpha and rank.
    13:30 🧩 *Fine-Tuning with LoRA and QLoRA in Practice*
    - Emphasizes the need to experiment with hyperparameters based on your specific data.
    - Highlights the ease of using LoRA with integrations like Replicate and Gradient.
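
A minimal sketch of how the hyperparameters summarized above (rank, alpha, dropout, target layers, 4-bit quantization) typically appear in code, assuming the Hugging Face transformers + peft + bitsandbytes stack; this stack and the base model name are assumptions for illustration, since the video itself points to hosted integrations like Replicate and Gradient.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute any causal LM

# QLoRA side: load the frozen base model in 4-bit NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)

# LoRA side: rank, alpha, dropout, and which layers get adapters
lora_config = LoraConfig(
    r=8,                      # rank of the low-rank update matrices
    lora_alpha=8,             # alpha / r = 1, a reasonable default scale
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```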

  • @Vinayakan-s4y
    @Vinayakan-s4y 1 year ago +5

    I have been using these techniques for a while now without having a good understanding of each of the parameters. Thanks for giving a good overview of both the techniques and the papers.

  • @mandrakexTV
    @mandrakexTV 3 months ago +2

    This is the best detailed video and nicest explanation on YouTube right now. I do think your channel will grow because you are doing an EXCELLENT job. Thank you, man.

  • @andrepemmelaar8728
    @andrepemmelaar8728 4 months ago +2

    Very useful! Marvelously clear explanation with the right amount of detail about a subject that's worth understanding.

  • @gayathrisaranath666
    @gayathrisaranath666 1 month ago

    Thanks for this clear explanation about the topic!
    Your way of relating back to research papers is very interesting and helpful!

  • @SanjaySingh-gj2kq
    @SanjaySingh-gj2kq 1 year ago +2

    Good explanation of LoRA and QLoRA

  • @drstrangeluv1680
    @drstrangeluv1680 9 months ago

    I loved the explanation! Please make more such videos!

  • @thelitbit
    @thelitbit 6 months ago

    Great video! Referring to the paper and explaining each thing in detail really helps understand the concept to the fullest. Kudos!

  • @steve_wk
    @steve_wk 1 year ago +4

    I've watched a couple of your other videos - you're a very good teacher - thanks for doing this.

  • @naevan1
    @naevan1 7 months ago

    I love this video, man. I watched it at least 3 times and came back to it before a job interview as well. Please do more tutorials/explanations!

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 9 months ago +1

    This is really well presented

  • @VerdonTrigance
    @VerdonTrigance 11 months ago +1

    It was an incredible and very helpful video. Thank you, man!

  • @SantoshGupta-jn1wn
    @SantoshGupta-jn1wn 11 months ago

    Great video, I think the best explanation I've seen on this. I'm also really confused about why they picked the rank and alpha that they did.

  • @stutters3772
    @stutters3772 7 months ago

    This video deserves more likes

  • @varun_skywalker
    @varun_skywalker 11 months ago +1

    This is really helpful, Thank you!!

  • @anujlahoty8022
    @anujlahoty8022 8 months ago

    Loved the content! Simply explained, no BS.

  • @brianbarnes746
    @brianbarnes746 5 months ago

    Great explanation, best that I've seen

  • @YLprime
    @YLprime 9 months ago +19

    Dude, you look like the Lich King with those blue eyes.

    • @practicemail3227
      @practicemail3227 8 months ago

      True. 😅 He should be in an acting career, I guess.

    • @EntryPointAI
      @EntryPointAI  7 months ago +5

      You mean the Lich King looks like me, I think 🤪

  • @AbdoGhazala-y5p
    @AbdoGhazala-y5p 3 months ago

    Can you share the presentation document?

  • @CatarinaReis-g3y
    @CatarinaReis-g3y 5 months ago

    This saved me. Thank you. Keep doing this :)

  • @omarsherif88
    @omarsherif88 17 days ago

    Very useful, thank you!

  • @user-wp8yx
    @user-wp8yx 5 months ago +1

    I'm pulling for another vid on alpha. Oobabooga suggests twice your rank. The Chinese Alpaca-LoRA people use rank 8 with alpha 32 and I guess it worked. I've tried high alphas that make the model kinda crazy. Need guidance.

    • @EntryPointAI
      @EntryPointAI  5 months ago +1

      When in doubt, set alpha = rank for the effective scale factor to be 1. There are better ways to have a larger impact on training than bluntly multiplying the change in weights, like improving your dataset or dialing in the learning rate.

    • @user-wp8yx
      @user-wp8yx 5 months ago

      @@EntryPointAI This does make sense the way you put it. Thanks so much for your reply!
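
For reference, the scale factor mentioned in the reply above is alpha / rank: the learned update B·A is multiplied by it before being added to the frozen weights. A tiny NumPy illustration, with arbitrary sizes:

```python
import numpy as np

# Illustrative only: how the LoRA scale factor alpha / r enters the weight update.
d, r, alpha = 16, 8, 8            # hidden size, rank, alpha (alpha == r -> scale 1.0)
A = np.random.randn(r, d) * 0.01  # low-rank factor A (small random init)
B = np.zeros((d, r))              # low-rank factor B (zero init, so the update starts at 0)

scale = alpha / r                 # 8 / 8 = 1.0; alpha = 32 with r = 8 would give 4.0
delta_W = scale * (B @ A)         # effective change applied on top of the frozen weights
print(scale, delta_W.shape)       # 1.0 (16, 16)
```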

  • @nafassaadat8326
    @nafassaadat8326 7 months ago

    Can we use QLoRA in a simple ML model like a CNN for image classification?

  • @louisrose7823
    @louisrose7823 8 months ago

    Great video!

  • @Sonic2kDBS
    @Sonic2kDBS 7 months ago

    Some nice details here. Keep on.

  • @NathanielMaymon
    @NathanielMaymon 3 months ago

    What's the name of the paper you referenced in the video?

    • @EntryPointAI
      @EntryPointAI  3 months ago

      Here's LoRA: arxiv.org/abs/2106.09685
      and QLoRA: arxiv.org/abs/2305.14314

  • @nachiketkathoke8281
    @nachiketkathoke8281 6 months ago

    Really great explanation.

  • @kunalnikam9112
    @kunalnikam9112 8 months ago

    In LoRA, Wupdated = Wo + BA, where B and A are decomposed matrices with low ranks. I wanted to ask: what do the parameters of B and A represent? Are they both parameters of the pre-trained model, are both from the target dataset, or does one (B) represent the pre-trained model parameters while the other (A) represents the target dataset? Please answer as soon as possible.

    • @EntryPointAI
      @EntryPointAI  7 months ago +1

      Wo would be the original model parameters. A and B multiplied together represent the changes to the original parameters learned from your fine-tuning. So together they represent the difference between your final fine-tuned model parameters and the original model parameters. Individually A and B don't represent anything, they are just intermediate stores of data that save memory.

    • @kunalnikam9112
      @kunalnikam9112 7 months ago

      @@EntryPointAI got it!! Thank you
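
A small NumPy illustration of the reply above: Wo stays frozen, only the two low-rank factors are trained, and only their product carries meaning. The dimensions and values below are made up for the sketch.

```python
import numpy as np

# Rough sketch: W0 is the frozen pre-trained weight matrix; only B and A are trained.
# Neither factor means anything on its own; their product B @ A is the learned change.
d, r = 64, 4
W0 = np.random.randn(d, d)          # original (pre-trained) weights, kept frozen
B = np.random.randn(d, r) * 0.01    # trainable factor (values stand in for post-training weights)
A = np.random.randn(r, d) * 0.01    # trainable factor (values stand in for post-training weights)

delta_W = B @ A                     # what fine-tuning actually learned
W_updated = W0 + delta_W            # merged weights used at inference time
print(W0.size, B.size + A.size)     # 4096 frozen vs. 512 trainable parameters
```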

  • @ArunkumarMTamil
    @ArunkumarMTamil 7 months ago

    How does LoRA fine-tuning track changes by creating two decomposition matrices?

    • @EntryPointAI
      @EntryPointAI  7 months ago +1

      The matrices are multiplied together and the result is the changes to the LLM's weights. It should be explained clearly in the video, it may help to rewatch.

    • @ArunkumarMTamil
      @ArunkumarMTamil 7 months ago

      @EntryPointAI
      My understanding:
      Original weight = 10 * 10,
      decomposed to form two matrices A and B.
      Let's take the rank as 1, so A is 10 * 1
      and B is 1 * 10.
      Total trainable parameters: A + B = 20.
      In LoRA, even without any dataset training, if we simply add the A and B matrices to the original matrix, we can improve the accuracy slightly.
      And if we use a custom dataset in LoRA, the changes from the custom dataset will be captured by the A and B matrices.
      Am I right @EntryPointAI?

    • @EntryPointAI
      @EntryPointAI  7 months ago +1

      @@ArunkumarMTamil The trainable parameters math looks right. But these decomposed matrices are initialized so that their product starts at zero, so adding them without any custom training dataset will have no effect.
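
Illustrating that initialization point with the 10 × 10, rank-1 numbers from this thread: in the LoRA paper one factor (B) is zero-initialized while the other (A) is random, so the product B·A is zero and merging it changes nothing until training updates it. A minimal sketch:

```python
import numpy as np

# The 10 x 10, rank-1 numbers from this thread. One factor starts at zero, so the
# product B @ A is zero and merging it has no effect until training updates it.
d, r = 10, 1
W0 = np.random.randn(d, d)
B = np.zeros((d, r))                # zero-initialized factor (10 parameters)
A = np.random.randn(r, d) * 0.01    # randomly initialized factor (10 parameters)

print(B.size + A.size)              # 20 trainable parameters
print(np.allclose(W0 + B @ A, W0))  # True: identical to the original weights before training
```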

  • @Gayatritravelandfitnessvlogs
    @Gayatritravelandfitnessvlogs 3 months ago

    Thanks a ton!

  • @aashwinsharma8194
    @aashwinsharma8194 5 months ago

    Great explanation...

  • @egonkirchof
    @egonkirchof 6 months ago

    Why do we call training a model "pre-training" it?

    • @EntryPointAI
      @EntryPointAI  6 months ago

      Not sure if that's a rhetorical question, but I'll give it a go. You can call it just "training," but that might imply that it's ready to do something useful when you're done. If you call it "pre-training" it implies that you'll train it more afterward, which is generally true. So it may be useful in being a little more specific.

  • @markironmonger223
    @markironmonger223 1 year ago +1

    This was wonderfully educational and very easy to follow. Either that makes you a great educator or me an idiot :P Regardless, thank you.

    • @EntryPointAI
      @EntryPointAI  1 year ago +2

      let's both say it's the former and call it good! 🤣

  • @TheBojda
    @TheBojda 8 months ago

    Nice video, congrats! LoRA is about fine-tuning, but is it possible to use it to compress the original matrices to speed up inference? I mean decompose the original model's weight matrices into products of low-rank matrices to reduce the number of weights.

    • @rishiktiwari
      @rishiktiwari 8 months ago +1

      I think you mean distillation with quantisation?

    • @EntryPointAI
      @EntryPointAI  8 months ago +1

      Seems worth looking into, but I couldn't give you a definitive answer on what the pros/cons would be. Intuitively I would expect it could reduce the memory footprint but that it wouldn't be any faster.

    • @TheBojda
      @TheBojda 8 months ago +1

      @@rishiktiwari Thanks. I learned something new. :) If I understand correctly, this is a form of distillation.

    • @rishiktiwari
      @rishiktiwari 8 months ago

      @@TheBojda Cheers mate! Yes, in distillation there is a student-teacher configuration and the student tries to be like the teacher with fewer parameters (i.e., weights). This can also be combined with quantisation to reduce the memory footprint.
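
The compression idea discussed in this thread can be sketched with generic linear algebra (truncated SVD). This is not something the video or the LoRA paper prescribes, just an illustration of replacing a weight matrix with two low-rank factors:

```python
import numpy as np

# Generic low-rank compression of an existing weight matrix via truncated SVD,
# purely as an illustration of the idea discussed above.
d, r = 512, 32
W = np.random.randn(d, d)               # stand-in for a pre-trained weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
B = U[:, :r] * S[:r]                    # d x r factor (singular values folded in)
A = Vt[:r, :]                           # r x d factor
W_approx = B @ A                        # best rank-r approximation of W

print(W.size, B.size + A.size)          # 262144 weights vs. 32768 (~8x fewer)
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))  # relative approximation error
```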

  • @UfcFan-d6s
    @UfcFan-d6s 4 months ago

    Amazing for struggling students. Love from Korea😂

  • @archchana7756
    @archchana7756 4 months ago

    Very well explained, thanks :)

  • @RafaelPierre-vo2rq
    @RafaelPierre-vo2rq 9 months ago

    Awesome explanation! Which camera do you use?

    • @EntryPointAI
      @EntryPointAI  9 months ago

      Thanks, it's a Canon 6D Mk II.

  • @tgzhu3258
    @tgzhu3258 3 months ago

    so good!!

  • @SergieArizandieta
    @SergieArizandieta 8 months ago

    Wow, I'm a noobie in this field and I've been testing fine-tuning my own chatbot with different techniques. I found a lot of stuff, but it's not common to find an explanation of the main reason for using it. Thanks a lot <3

  • @princekhunt1
    @princekhunt1 2 months ago

    Nice

  • @chrisanderson1513
    @chrisanderson1513 6 months ago

    Saving me some embarrassment in future work meetings. :) Thanks for sharing.

  • @Ian-fo9vh
    @Ian-fo9vh 1 year ago +2

    Bright eyes

  • @Larimuss
    @Larimuss 5 months ago

    QLoRA lets me train on a 4070 Ti with only 12 GB VRAM, though I can't go over a 7B model.

  • @DrJaneLuciferian
    @DrJaneLuciferian 11 months ago

    I wish people would actually share links to papers they reference...

    • @EntryPointAI
      @EntryPointAI  11 months ago +2

      LoRA: arxiv.org/abs/2106.09685
      QLoRA: arxiv.org/abs/2305.14314
      Click "Download PDF" in top right to view the actual papers.

    • @DrJaneLuciferian
      @DrJaneLuciferian 11 months ago

      @@EntryPointAI Thank you, that's kind. I did already go look it up. Sorry I was frustrated. It's very common for people to forget to put links to papers in the show notes :^)

  • @vediodiary1754
    @vediodiary1754 9 months ago

    Oh my god, your eyes 😍😍😍😍 Everybody deserves a hot teacher 😂❤

  • @nabereon
    @nabereon 10 months ago

    Are you trying to hypnotize us with those eyes 😜

  • @rohitvishwakarma2871
    @rohitvishwakarma2871 5 months ago

    Gojo?

  • @619vijay
    @619vijay 5 months ago

    Eyes!

  • @TR-707
    @TR-707 11 months ago

    Ahh very interesting thank you!
    *goes to fine tune pictures of anime girls*

  • @ecotts
    @ecotts 8 months ago

    LoRa (Long Range) is a proprietary physical-layer radio communication technique that uses a spread-spectrum modulation scheme derived from chirp spread spectrum. It's a low-powered wireless platform that has become the de facto wireless platform of the Internet of Things (IoT). Get your own acronym! 😂

    • @EntryPointAI
      @EntryPointAI  8 months ago

      Fair - didn’t create it, just explaining it 😂

  • @kritarthlohomi3305
    @kritarthlohomi3305 3 months ago

    Bradley Cooper in Limitless, tf

  • @coco-ge4xg
    @coco-ge4xg 7 months ago

    OMG, I always get distracted by his blue eyes 😆 and ignore what he's talking about.

  • @Ben_dover5736
    @Ben_dover5736 6 months ago

    You have beautiful eyes.

  • @partymarty1856
    @partymarty1856 4 months ago

    blud why your eyes like that