Llama - EXPLAINED!

  • Published 11 Jun 2024
  • ABOUT ME
    ⭕ Subscribe: ua-cam.com/users/CodeEmporiu...
    📚 Medium Blog: / dataemporium
    💻 Github: github.com/ajhalthor
    👔 LinkedIn: / ajay-halthor-477974bb
    RESOURCES
    [1 🔴] Llama 1 paper: arxiv.org/pdf/2302.13971.pdf
    [2 🔴] Llama 2 paper: scontent-lax3-1.xx.fbcdn.net/...
    [3 🔴] Llama code: github.com/facebookresearch/l...
    [4 🔴] Where I got the decoder only transformer architecture: ai.stackexchange.com/question...
    [5 🔴] Huggingface models on Llama that you can use for inference: huggingface.co/models?search=...
    [6 🔴] Llama vs Alpaca: sapling.ai/llm/alpaca-vs-llama2
    [7 🔴] Llama2 file using Qlora: colab.research.google.com/dri...
    [8 🔴] @1littlecoder 's video describing Qlora and how you can fine tune llama: • 🐐Llama 2 Fine-Tune wit...
    [9 🔴] Autotrain repo for 1 line training: github.com/huggingface/autotr...
    [10 🔴] Video describing how to use Autotrain (@abhishekkrthakur): • The EASIEST way to fin...
    PLAYLISTS FROM MY CHANNEL
    ⭕ Transformers from scratch playlist: • Self Attention in Tran...
    ⭕ ChatGPT Playlist of all other videos: • ChatGPT
    ⭕ Transformer Neural Networks: • Natural Language Proce...
    ⭕ Convolutional Neural Networks: • Convolution Neural Net...
    ⭕ The Math You Should Know : • The Math You Should Know
    ⭕ Probability Theory for Machine Learning: • Probability Theory for...
    ⭕ Coding Machine Learning: • Code Machine Learning
    MATH COURSES (7 day free trial)
    📕 Mathematics for Machine Learning: imp.i384100.net/MathML
    📕 Calculus: imp.i384100.net/Calculus
    📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
    📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
    📕 Linear Algebra: imp.i384100.net/LinearAlgebra
    📕 Probability: imp.i384100.net/Probability
    OTHER RELATED COURSES (7 day free trial)
    📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
    📕 Python for Everybody: imp.i384100.net/python
    📕 MLOps Course: imp.i384100.net/MLOps
    📕 Natural Language Processing (NLP): imp.i384100.net/NLP
    📕 Machine Learning in Production: imp.i384100.net/MLProduction
    📕 Data Science Specialization: imp.i384100.net/DataScience
    📕 Tensorflow: imp.i384100.net/Tensorflow
    #chatgpt #deeplearning #machinelearning #bert #gpt

COMMENTS • 63

  • @CodeEmporium
    @CodeEmporium  10 months ago +39

    Would you like to see more videos on Llama? Let me know. Have a wonderful day :)

    • @paisanareeprasertkul1950
      @paisanareeprasertkul1950 10 months ago +5

      Yes, definitely. One of the best explanations of the topic!

    • @manusrivastava2047
      @manusrivastava2047 10 months ago +1

      Great video, I love how well structured and informative your videos are. Would love to see how to use word embeddings from Llama 2 or another language model for transfer learning. Thanks and keep up the good work!

    • @ozne_2358
      @ozne_2358 10 months ago +4

      Yes, please. More details on the code, how the parameters are initialized from the parameter file and used in the various stages.

    • @scitechtalktv9742
      @scitechtalktv9742 8 months ago +1

      I am struggling to get Llama 2 to work reliably in Dutch, so that you can pose questions in Dutch and have Llama 2 answer in Dutch. (This is probably because Llama 2 was trained on data that contains very little Dutch.)
      I have had some success using special prompts, but sometimes it unexpectedly switches back to English.
      What technique(s) can I use to solve this?
      My use case: I have Dutch texts that I want to query in Dutch by means of Retrieval Augmented Generation (RAG), using a Llama 2 LLM, and get the answers in correct Dutch.
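One hedged way to attack this language-switching problem (a sketch, not from the video; the checkpoint name and the Dutch system prompt are assumptions) is to pin the answer language in the system slot of the Llama 2 chat template, so every RAG call carries an explicit "answer in Dutch" instruction alongside the retrieved context:

```python
# Sketch: force Dutch answers from a Llama 2 chat model in a RAG-style prompt.
# Assumes the gated HuggingFace checkpoint "meta-llama/Llama-2-7b-chat-hf";
# the system prompt wording is illustrative and not a guaranteed fix.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

system = ("Je bent een behulpzame assistent. Antwoord ALTIJD in het Nederlands, "
          "ook als de context of de vraag Engels bevat.")
context = "(hier komen de opgehaalde Nederlandse passages uit de retriever)"
question = "Wat is de hoofdconclusie van dit document?"

# Llama 2 chat format: [INST] <<SYS>> system <</SYS>> user turn [/INST]
prompt = (f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
          f"Context:\n{context}\n\nVraag: {question} [/INST]")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

If the model still drifts into English, fine-tuning on Dutch instruction data (or starting from a more multilingual checkpoint) is the usual next step.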

    • @user-yi8vs7lb7d
      @user-yi8vs7lb7d 8 months ago

      I'm waiting for the video!

  • @pipinstallyp
    @pipinstallyp 10 months ago +3

    Hey, thanks a lot for your videos. Your video on "Attention Is All You Need" helped me build an intuition back before transformers were really cool. It's lovely to see your video on Llama, as I actively get to fine-tune Llama on a day-to-day basis :) Much love.

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Super happy to hear! Thanks so much for watching :)

  • @aurkom
    @aurkom 9 months ago +7

    Would love a deep dive into stuff like LoRA and quantization (bitsandbytes library) as well. Perhaps doing it from scratch in PyTorch!

    • @CodeEmporium
      @CodeEmporium  9 months ago +3

      Perfect. I have coded out the transformer from scratch using PyTorch. Maybe I'll think of a similar series for Llama :)
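For the LoRA and quantization question above, here is a minimal hedged sketch using the peft and bitsandbytes integrations with transformers (the checkpoint, rank, and target modules are illustrative assumptions, not recommendations from the video):

```python
# Sketch: load Llama 2 in 4-bit (bitsandbytes) and attach LoRA adapters (peft).
# Assumes transformers, peft, bitsandbytes, and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections (an assumed choice)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the small LoRA matrices are trainable
```

This mirrors the QLoRA approach referenced in the description (resource [7]): a frozen 4-bit base model plus small trainable adapters is what lets fine-tuning fit on a single GPU.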

  • @naevan1
    @naevan1 6 months ago

    Amazing work, man. One of my favourite deep learning creators!

  • @jeswer9
    @jeswer9 6 months ago +1

    Yes, please do a deeper dive into the code! Super valuable video because of that part.

  • @share4713
    @share4713 9 months ago

    The more videos I watch, the more I understand a subject; this is probably because I can now see the subject from different angles and perspectives. I now have a better intuition for transformer architectures and can code one from scratch. Thank you.

  • @abhijitnayak1639
    @abhijitnayak1639 9 months ago +2

    Thank you for such an insightful video. Would definitely love a deep-dive video on the architecture and code of Llama 2. Could you please also do an implementation of BERT or RoBERTa fine-tuning (with the training process optimized via DeepSpeed)?
    Thanks again!!

  • @prasadraavi390
    @prasadraavi390 6 months ago

    Beautifully explained. Thank you. Yes, I want to know more about its architecture too.

  • @steel-r_ua
    @steel-r_ua 3 months ago

    Thanks for the great video and a GREAT way of presenting data and showing the code!

  • @prasadraavi390
    @prasadraavi390 6 months ago

    Beautifully explained. Thank you.

  • @dollarscholar2956
    @dollarscholar2956 10 months ago +1

    Clear, informative, well presented. Great video!

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Thanks so much for commenting :)

  • @spydeyftw
    @spydeyftw 9 months ago

    Good explanation with proper understanding!

  • @dinoscheidt
    @dinoscheidt 10 months ago

    Commenting for the algorithm. Very well explained. You have a talent!

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Much appreciated! Thank you!

  • @andresg297
    @andresg297 6 months ago

    Excellent explanation. Thank you

  • @YashVerma-ii8lx
    @YashVerma-ii8lx 5 months ago

    Thank you so much for explaining, brother!
    Would be really great if you could do a code walkthrough video as well!

  • @dan1ar
    @dan1ar 9 months ago +2

    Great video! Looking forward to deep dive into llama code

    • @CodeEmporium
      @CodeEmporium  9 months ago +1

      Sure thing. I have slated it on my TODOs :) Thank you for watching

  • @jiaxingyu8300
    @jiaxingyu8300 9 months ago

    Nice explanation!

  • @naevan1
    @naevan1 6 months ago

    Would you be interested in making a guide on fine-tuning Llama 2, or do you think it is oversaturated?

  • @gopalakrishna9651
    @gopalakrishna9651 4 months ago

    Yes, please: a deep dive into the architecture and a code walkthrough if possible.
    Thanks a lot for the video. May God's blessings be with you.

  • @alexandertakele7528
    @alexandertakele7528 4 months ago

    Thank you so much

  • @popamaji
    @popamaji 9 months ago +1

    I have not implemented the code for a decoder-only model, so I have 3 questions:
    1. Does it use the triangular mask? I have heard from two sources that it does, but I don't get it: we only feed inputs and not outputs (unlike the original transformer), so how does a triangular mask on the input data make sense?
    2. Why is it called "decoder only"? The architecture seems much closer to the encoder part of the original transformer model than to its decoder part, especially since the mask seems no different from the original encoder's.
    3. Is it autoregressive, or can it still act as an autoencoder and produce the outputs in one pass?
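To make the triangular-mask question concrete, here is a small illustration (a sketch of the standard technique, not code from the video). In a decoder-only model the training targets are just the input sequence shifted by one token, so a causal (lower-triangular) mask over the input is exactly what lets position i predict token i+1 without peeking ahead:

```python
# Sketch: causal (lower-triangular) attention mask used by decoder-only models.
import torch

seq_len = 5
# True where attention is allowed: token i may attend to tokens 0..i, never to the future.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(causal_mask.int())

# Applied to attention scores before the softmax: disallowed positions become -inf.
scores = torch.randn(seq_len, seq_len)
masked_scores = scores.masked_fill(~causal_mask, float("-inf"))
attention_weights = torch.softmax(masked_scores, dim=-1)
```

This also bears on the naming and autoregression questions: the block is called a "decoder" because it keeps the masked self-attention of the original transformer's decoder (while dropping the encoder and cross-attention), and generation is autoregressive, producing one token per forward pass rather than the whole output in a single pass.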

  • @dikshyakasaju7541
    @dikshyakasaju7541 10 months ago +3

    Very informative!! Would be sick if you could dive deeper.

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Yes! Thanks for watching! Will think about it as a future video / series

  • @adarshsaurabh7871
    @adarshsaurabh7871 9 months ago

    Can you please help me? I have multiple doubts.
    Since all of these models are LLMs that generate the next word based on the previous words, can I fine-tune them on any type of data? For example, I would like to make a model that can write poems and shayari for me, so can I train them for this task?
    Also, since Llama doesn't have an encoder, isn't that a disadvantage?
    Also, can you please make a video on the encoder and decoder and their specific details? Please 🤓🤓
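On the first doubt: yes, next-word models can in principle be fine-tuned on any text, including poems. Below is a hedged sketch of what that looks like with the HuggingFace Trainer (the file "poems.txt", the checkpoint choice, and all hyperparameters are assumptions for illustration; in practice a 7B model would usually be tuned with LoRA/QLoRA as in the sketch further up, rather than fully):

```python
# Sketch: fine-tune a causal LM on custom text (e.g. poems) via next-token prediction.
# Assumes a plain-text file "poems.txt" (hypothetical) with one poem per line.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"   # any causal LM checkpoint works in principle
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "poems.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="poem-llama", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```

On the second doubt: the missing encoder is not really a disadvantage for generation, since the prompt is conditioned on directly through the decoder's own masked self-attention.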

  • @popamaji
    @popamaji 9 months ago

    Please make a video about how the generative ability and reinforcement learning are used in language models.

  • @younessamih3188
    @younessamih3188 10 months ago

    Very helpful! That would be great ...

    • @CodeEmporium
      @CodeEmporium  10 months ago +1

      Thanks so much! I’ll think of a deep dive as a future video / series

  • @ajaytaneja111
    @ajaytaneja111 10 months ago

    Hi Ajay, I have been reading the Llama 2 research paper. They talk a lot about safety during pre-training, as you might have seen. Do you think they score over GPT in this aspect?

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Yea. That 77-page Llama 2 paper definitely makes the claim that it is safer. They have sections and infographics dedicated to showing this as well. That said, I would need to check how much of this safety is incorporated in the pre-training. I didn't think there would be much in this phase. But I haven't read the entire paper, so I may be wrong.

  • @ajaytaneja111
    @ajaytaneja111 9 months ago

    Hi Ajay, would love to hear your insights on PEFT - the theoretical aspects, of course. I have seen a lot of videos on PEFT and done some reading too. The theoretical aspects are not well explained.

    • @CodeEmporium
      @CodeEmporium  9 months ago +1

      Ajay! Yea for sure. I am interested to learn more about this too. I’ll read more and make some content on this soon :)

  • @popamaji
    @popamaji 9 months ago

    Is this a decoder in simplified form, or an encoder with a decoder mask?

  • @ruksharalam173
    @ruksharalam173 9 months ago

    It'd be great if you could please dig deeper into the Llama code and architecture.

  • @StrangeMemes52
    @StrangeMemes52 10 months ago +1

    Wow, amazing video 😁. So how is the language model fine-tuned after training? I mean, how does this fine-tuning work?

    • @CodeEmporium
      @CodeEmporium  10 months ago

      Fine-tuning is done depending on the specific task you want. In Llama Chat's and ChatGPT's case, we want fine-tuning on question answering. So we feed the model a bunch of question + answer pairs and the model parameters are "fine-tuned". Hope this helps.
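As a hedged illustration of that reply (the pair below is invented, and the exact template varies by project), supervised fine-tuning simply concatenates each question and answer into one training sequence, for Llama 2 chat models typically via the [INST] format, and then applies the usual next-token loss:

```python
# Sketch: turn a question/answer pair into one supervised fine-tuning example
# using the Llama 2 chat format (the example text is made up for illustration).
def format_qa(question: str, answer: str) -> str:
    return f"[INST] {question} [/INST] {answer}"

example = format_qa(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
print(example)
# Training on many such strings with the next-token objective nudges the model's
# parameters toward producing the answer whenever it sees the question.
```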

  • @tunkskabulungana46
    @tunkskabulungana46 2 months ago

    You said Llama is an 8-language model. Which programming languages are they? 😮

  • @NicholasRenotte
    @NicholasRenotte 9 months ago

    1.8k and closing in my boi!!!!

    • @CodeEmporium
      @CodeEmporium  9 months ago

      Ma guy. I will join the ranks of the 6 digit sub counts

  • @ajaytaneja111
    @ajaytaneja111 10 months ago

    Hi Ajay, I suppose they do grouped-query attention and not multi-head attention

    • @CodeEmporium
      @CodeEmporium  10 months ago

      I'll need to check the fine-grained details. Thanks for the heads up. If so, I'll address this in that future video

    • @ajaytaneja111
      @ajaytaneja111 10 months ago

      Thanks for the response, Ajay. As always, great video.
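For context on the exchange above: the Llama 2 paper does adopt grouped-query attention (GQA) for its larger 34B and 70B variants, while the smaller models keep standard multi-head attention. A minimal sketch of the idea (the head counts below are illustrative, not Llama 2's actual configuration): many query heads share a smaller set of key/value heads.

```python
# Sketch: grouped-query attention, where query heads share fewer key/value heads.
import torch

batch, seq, d_head = 2, 10, 64
n_q_heads, n_kv_heads = 8, 2              # 4 query heads per key/value head (illustrative)
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq, d_head)
k = torch.randn(batch, n_kv_heads, seq, d_head)
v = torch.randn(batch, n_kv_heads, seq, d_head)

# Repeat each KV head so it lines up with its group of query heads.
k = k.repeat_interleave(group, dim=1)     # -> (batch, n_q_heads, seq, d_head)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / d_head ** 0.5
out = torch.softmax(scores, dim=-1) @ v   # same output shape as standard multi-head attention
```

The payoff is a much smaller key/value cache at inference time, with quality close to full multi-head attention.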

  • @DaTruAndi
    @DaTruAndi 10 months ago +1

    I think you didn't describe RLHF fully. What you described was more SFT; you seemingly skipped mentioning the reward model explicitly. Maybe you meant it implicitly, but it could help to clarify this part of reinforcement learning

    • @rogermenezes
      @rogermenezes 10 months ago

      He has a very good series called "ChatGPT Explained" where he goes into a detailed explanation of RLHF: ua-cam.com/video/_MPJ3CyDokU/v-deo.html

    • @CodeEmporium
      @CodeEmporium  10 months ago +2

      Yea, that's true. I mentioned this as "humans determining what is a better answer" when I probably should have said "humans determine the better answer to train the reward model(s), and this in turn is used with the original fine-tuned model to further fine-tune it. And this happens via some proximal policy optimization" ~ or maybe something along these lines. Thanks for pointing it out. I'll clarify this in some follow-up videos in the near future too

    • @abzs5811
      @abzs5811 3 months ago

      @CodeEmporium lost me fam
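To make the reward-model step discussed in this thread explicit, here is a generic, hedged sketch of the pairwise preference loss used to train an RLHF reward model (a textbook-style illustration with random stand-in features, not code from Llama 2):

```python
# Sketch: pairwise (Bradley-Terry style) loss for training an RLHF reward model.
import torch
import torch.nn.functional as F

# Placeholder reward head: maps a pooled answer representation to a scalar score.
reward_head = torch.nn.Linear(16, 1)

chosen = torch.randn(4, 16)      # stand-in features of the human-preferred answers
rejected = torch.randn(4, 16)    # stand-in features of the rejected answers

r_chosen = reward_head(chosen).squeeze(-1)
r_rejected = reward_head(rejected).squeeze(-1)

# The human-preferred answer should receive the higher score.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()

# In full RLHF, the trained reward model's scores then serve as the reward signal
# when the supervised fine-tuned model is further optimized with PPO.
```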

  • @jackhale8497
    @jackhale8497 8 months ago

    😢 "Promo sm"

  • @azai.online
    @azai.online 8 months ago

    I do like Llama 2 and found it easy to use. I am using it in my own multi-application platform and it's great.