Boost Fine-Tuning Performance of LLM: Optimal Architecture w/ PEFT LoRA Adapter-Tuning on Your GPU

  • Published 29 Sep 2024

COMMENTS • 37

  • @numannebuni
    @numannebuni 1 year ago +2

    The Colab notebook linked in the video description is different from the one shown in the video. Could you please share the Colab from the video?

  • @ziad12211
    @ziad12211 1 year ago +1

    Could you give some advice to someone who wants to train a model but is a beginner at programming? I spent a week trying to understand what is going on, but every time I just end up delving deeper into scientific papers.

    • @code4AI
      @code4AI  1 year ago

      I wrote about it in the community tab. Great question.

  • @janpashashaik2598
    @janpashashaik2598 1 year ago

    Hi,
    Thanks for the video; it is very helpful. I was trying to execute the notebook given in the description, but I am running into a session crash in the free tier of Google Colab while loading the shards. Can you please help me?

  • @lnakarin
    @lnakarin 1 year ago +2

    Nice tutorial, please share the notebook.

  • @riser9644
    @riser9644 1 year ago +1

    Can I please get the link to the original Colab that was used in the video?

  • @menggao3636
    @menggao3636 5 months ago

    Big help. Thank you for sharing.

  • @TylerLali
    @TylerLali 1 year ago +2

    Can you walk through the differences between the big LLMs, for example the datasets they were trained on? If I want to fine-tune a model, I'd like to understand more about the base model to ensure a good match, and I can't tell the differences from the model cards alone.

    • @code4AI
      @code4AI  1 year ago +1

      Currently I count more than 140 LLM variations and sizes. The best way I have found is simply to read the accompanying scientific publication to understand the specifics of each model. That means anywhere from 20 to 147 pages per model.

  • @Smytjf11
    @Smytjf11 1 year ago +2

    Very good callout about not just blindly trusting any model or adapters you find on Huggingface. Much better to do the training yourself, so you have control over the data it was trained on.
    Edit: well, for the stuff you can realistically do yourself. I know *I'm* not planning on training GPT-4 from scratch 😂

  • @dpaoster
    @dpaoster 1 year ago +1

    Great video. Please share the notebook used in the video.

  • @debashisghosh3133
    @debashisghosh3133 1 year ago +1

    Beautiful, beautiful, with lots of learning. You have mastered the art.

  • @serin32
    @serin32 1 year ago +1

    Thanks for this! I watched Sam's video and was starting to figure out how to use it for Flan-UL2, but got very confused by the module mapping. This video really helped! I only have 23 minutes of Google Colab Pro A100 access left, but at least I got it running before I ran out of time! Next I can play with smaller models until I get my GPU time next month. Currently Flan-UL2 is using 35.6 GB of GPU memory, so it just fits!

  • @yashjain6372
    @yashjain6372 1 year ago

    Very informative! Does fine-tuning with QLoRA/LoRA support this kind of dataset? If not, what changes should I make to my dataset? (A sketch of one possible formatting step follows below.)
    Review (col1):
    Nice cell phone, big screen, plenty of storage. Stylus pen works well.
    Analysis (col2):
    [{"segment": "Nice cell phone", "Aspect": "Cell phone", "Aspect Category": "Overall satisfaction", "sentiment": "positive"}, {"segment": "big screen", "Aspect": "Screen", "Aspect Category": "Design", "sentiment": "positive"}, {"segment": "plenty of storage", "Aspect": "Storage", "Aspect Category": "Features", "sentiment": "positive"}, {"segment": "Stylus pen works well", "Aspect": "Stylus pen", "Aspect Category": "Features", "sentiment": "positive"}]
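
    For illustration only, a minimal sketch of how such a two-column dataset could be flattened into a single text field for causal-LM LoRA fine-tuning. The column names "review" and "analysis" are assumptions, not from the video:

    from datasets import Dataset

    raw = Dataset.from_dict({
        "review": ["Nice cell phone, big screen, plenty of storage. Stylus pen works well."],
        "analysis": ['[{"segment": "Nice cell phone", "Aspect": "Cell phone", "sentiment": "positive"}]'],
    })

    def to_text(example):
        # Concatenate input and target so a causal LM learns to generate the
        # analysis JSON after seeing the review.
        example["text"] = f"Review: {example['review']}\nAnalysis: {example['analysis']}"
        return example

    formatted = raw.map(to_text)
    # The "text" column is then tokenized and passed to the Trainer as usual.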

  • @jakeh1829
    @jakeh1829 1 year ago

    Sir, what is the hyperparameter "alpha" in LoraConfig? How should we understand "The hyperparameter used for scaling the LoRA reparametrization"? Thank you, sir.
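
    For context, a minimal sketch of where lora_alpha sits in a PEFT LoraConfig: in PEFT the low-rank update BA is scaled by lora_alpha / r before being added to the frozen weight, so alpha controls how strongly the adapter changes the base model. The target_modules names below are assumptions for a GPT-style model:

    from peft import LoraConfig, TaskType, get_peft_model

    lora_config = LoraConfig(
        r=8,                    # rank of the low-rank matrices A and B
        lora_alpha=32,          # the BA update is scaled by lora_alpha / r = 4.0
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        lora_dropout=0.05,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )
    # peft_model = get_peft_model(base_model, lora_config)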

  • @nickynicky782
    @nickynicky782 1 year ago

    like like like like like like like like like like like like like like like like like

  • @alinajafistudent6906
    @alinajafistudent6906 1 year ago

    You are a legend man. Thanks a lot!

  • @miguel6800
    @miguel6800 1 year ago

    Very good video. Thanks for sharing it with us.

    • @code4AI
      @code4AI  1 year ago

      Glad you enjoyed it

  • @enchanted_swiftie
    @enchanted_swiftie 1 year ago

    Hello mate, I loved your tutorial series ❤
    __
    I have a question: I am trying to fine-tune "GPT-J" on my private data. I have multiple documents, all in raw text. So, as in the example, we would convert them into a Hugging Face dataset and then train the model.
    My doubt is:
    During training, how should I structure my prompt?
    Should I just give the raw text as-is?
    or
    Should I do some prompt engineering like Context:{} Question:{} Answer:{} for the model?
    Will you please shed some light on this?
    Thank you very much!

    • @code4AI
      @code4AI  1 year ago +2

      When you fine-tune (!) an LLM like GPT-J on your data, you need to work with a DataCollator, as in my code. For further details see: huggingface.co/docs/transformers/main_classes/data_collator#data-collator
      I would recommend you use the following DataCollator for language modeling: huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorForLanguageModeling
      What you mention in the second half of your comment as a prompt is ICL: in-context learning.
      1. With ICL you do not change the weights of your transformer model (LLM),
      2. with fine-tuning you change all weights of your transformer model, and
      3. with adapter-tuning you insert additional trainable tensors into your transformer.
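
      For reference, a minimal sketch (not taken from the video) of the recommended collator in a causal-LM setup like GPT-J. With mlm=False the collator pads each batch and derives the labels from the input_ids, which is what standard fine-tuning needs:

      from transformers import AutoTokenizer, DataCollatorForLanguageModeling

      tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default

      data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
      # Passed to the Trainer as data_collator=data_collator.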

    • @enchanted_swiftie
      @enchanted_swiftie 1 year ago

      @@code4AI Thank you for your response. I have followed your video on the difference between pre-training, fine-tuning and ICL. When I use ICL for question answering, the prompt often becomes so big that it exceeds the model's total token limit, and I also get CUDA out-of-memory errors for the bigger prompts.
      For that reason I was thinking of fine-tuning the model on the QA task for my private documents.
      Now, in your case you are showcasing the quotes dataset, so that is more of a "completion" task, while I am looking at generative question answering. So I think the way we prepare the dataset will be different (because the task is QA) from the quote-completion one in the video.
      So, as I was asking: in what format should I prepare my dataset for question answering? Is it okay to just pass the text documents as they are, or is some kind of different formatting required?
      Will you please shed some light on this? I really appreciate your work and the way you explain the process ♥
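
      Purely as an illustration (this question is not answered in the thread): one common way to prepare generative-QA examples is to serialize each record into a single prompt string before tokenization. The field names below are hypothetical:

      def format_qa_example(example):
          # Hypothetical fields "context", "question", "answer".
          example["text"] = (
              f"Context: {example['context']}\n"
              f"Question: {example['question']}\n"
              f"Answer: {example['answer']}"
          )
          return example

      # dataset = dataset.map(format_qa_example)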

  • @aayushsmarten
    @aayushsmarten 1 year ago

    Hello, this training took around 36 minutes; with more steps or more data it would take longer. I have seen the term "accelerator". Does it increase training speed? Should we use it? Any guidance on that? Please share, thanks.

    • @code4AI
      @code4AI  1 year ago +1

      In one of my next videos I'll use Huggingface Accelerate in my code ... Good point!

    • @aayushsmarten
      @aayushsmarten 1 year ago

      @@code4AI So, it does speed up the process, doesn't it?

    • @code4AI
      @code4AI  1 year ago +1

      As always, it depends. If you optimize it for your distributed hardware infrastructure, use optimal data handling, choose the right compiler, and your code is really optimized for the task at hand, then you could be in for a surprise.
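
      For context, a minimal sketch of the Hugging Face Accelerate pattern discussed here (not the code from the video): Accelerate wraps the model, optimizer and dataloader so the same training loop runs unchanged on one GPU, several GPUs, or TPU. The tiny model and data below are placeholders:

      import torch
      from torch.utils.data import DataLoader, TensorDataset
      from accelerate import Accelerator

      # Placeholder model and data, just to make the pattern concrete.
      model = torch.nn.Linear(16, 2)
      optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
      dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
      dataloader = DataLoader(dataset, batch_size=8)

      accelerator = Accelerator()
      model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

      for inputs, labels in dataloader:
          optimizer.zero_grad()
          loss = torch.nn.functional.cross_entropy(model(inputs), labels)
          accelerator.backward(loss)  # replaces loss.backward()
          optimizer.step()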

    • @aayushsmarten
      @aayushsmarten 1 year ago

      @@code4AI If possible, would you please cover "Generative Question Answering" on private documents? That would cover fine-tuning + Accelerate! Thanks 💖

  • @computerex
    @computerex 1 year ago

    LLaMA was trained on more tokens than open-source alternatives like GPT-J; that's why it's an appealing model to fine-tune.

    • @code4AI
      @code4AI  1 year ago

      The number of tokens a model was trained on is not indicative of its performance, only of the amount of input.

    • @computerex
      @computerex 1 year ago +1

      @@code4AI Older models like GPT-J or BLOOM were undertrained. One of the findings of the paper is that model performance keeps improving past the 1T-token mark. In short, LLaMA was trained for more epochs, and yes, that does correlate with better performance.

    • @code4AI
      @code4AI  1 year ago

      So it is the number of epochs for you?

    • @computerex
      @computerex 1 year ago

      @@code4AI Not sure I understand :) but basically the training token count and the number of epochs a model is trained for are both essentially measures of how much pre-training has been done.

    • @code4AI
      @code4AI  1 year ago

      You are absolutely right. It is an indication of how much pre-training has been done, but when you compare different LLM architectures it tells you nothing about the quality of the pre-training. Simply more (epochs or input tokens) does not mean better if you compare different smart transformer, GPT, and RL architectures.

  • @LearningWorldChatGPT
    @LearningWorldChatGPT 1 year ago

    Fantastic tutorial!
    Thank you very much for this wonderful explanation.