Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

  • Published Jan 29, 2025

COMMENTS • 744

  • @umarjamilai
    @umarjamilai  1 year ago +117

    Slides' PDF: github.com/hkproj/transformer-from-scratch-notes

    • @bhaskartripathi
      @bhaskartripathi 1 year ago +2

      I am not able to download the PDF file. My friends also tried. Would it be possible to put it on a downloadable link, please? Your content is too good and needs to be read again and again.

    • @mahek6110
      @mahek6110 1 year ago

      It's getting downloaded for me @@bhaskartripathi

  • @gabrielnsionu8583
    @gabrielnsionu8583 1 year ago +228

    This is arguably the best explanation of multi-head attention on the internet, hands down. It is very thorough, and most important to folks like me using the attention mechanism as the underpinning of a novel neural architecture for my deep reinforcement learning work. Sir, please never stop making these kinds of videos.

    • @umarjamilai
      @umarjamilai  1 year ago +12

      You're welcome! 🤓

    • @csikel22
      @csikel22 1 year ago +3

      I couldn't agree more. Best video on transformers I have seen so far. It doesn't get clearer than this. It would be very interesting to get some insight into why this whole thing works and what other variations and alternative architectures exist.

    • @rkbshiva
      @rkbshiva 1 year ago +2

      @@umarjamilai bro, you're a legend!!!!

    • @pablofe123
      @pablofe123 9 months ago +1

      There are still a couple of things that are not explained well in the video. Are the Q, K and V matrices the same matrix? And where do the parameter matrices Wq, Wk and Wv come from?
      Besides that, excellent video.

    • @peregudovoleg
      @peregudovoleg 9 months ago +3

      @@pablofe123 21:25 "Q, K and V are the same matrices". As for the W matrices, he only says that they are "parameter matrices", and parameters are what we learn during the training process.

  • @DembaDiop-om3gv
    @DembaDiop-om3gv 1 year ago +59

    The best explanation of "Attention is all you need", from my point of view. Guys, "this explanation is all you need". Thank you very much.

  • @ltbd78
    @ltbd78 10 months ago +2

    You are incredible. Please continue making these kinds of tutorials.

  • @utkarshashinde9167
    @utkarshashinde9167 10 months ago +6

    I cannot tell you how grateful I am for this explanation. Nowhere else have I found such a detailed and easy-to-understand description; a go-to video for every student preparing for interviews.

  • @payamabdi543
    @payamabdi543 3 days ago +1

    This is by far the best video out there explaining transformers deeply, with all the important details, step by step from the paper. Believe me, you won't find anything better than this.

  • @ChadieRahimian
    @ChadieRahimian 3 months ago +8

    Best explanation of the paper on UA-cam. I love your style which is tailored for people who know the basics on an academic level. It’s like sitting in a really good graduate level course at university. You are such a good teacher!

  • @joaomontenegro
    @joaomontenegro 1 year ago +2

    This video is amazing! Thanks to you, I finally understood what a Transformer is! Thank you!

    • @umarjamilai
      @umarjamilai  1 year ago +3

      You're welcome! To improve your understanding, I recommend watching my other video on how to code a Transformer model from scratch. Coding a model is the best way to internalize its inner workings. Thank you again for your support and have a nice day!

  • @hamzaomari7052
    @hamzaomari7052 10 months ago +5

    This is the best explanation. It took me 4 hours to take notes, revise, and go through it with you word by word, with the intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail.
    Something you cannot find in any other video.
    Thank you so much, sir; this is very instructive and helpful.

  • @sushantpenshanwar8038
    @sushantpenshanwar8038 1 year ago +43

    You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.

  • @JulianHarris
    @JulianHarris 8 months ago +17

    I'm so glad I found this again. Do NOT rely on YouTube watch history; it doesn't keep all of your history. This is definitely the best explanation of transformers and attention, and believe me, I've watched quite a few! Kudos again, Umar.

    • @umarjamilai
      @umarjamilai  8 months ago +4

      You should subscribe to the channel to never lose it 😇 thanks for the kind words.

  • @praveensoni1119
    @praveensoni1119 1 month ago +3

    "This explanation is all I need" to crack my DS interviews. I really liked the smooth buildup of Transformer concepts in this video. Thanks a lot, man!!

  • @prancingpony2785
    @prancingpony2785 2 months ago +2

    I want to appreciate the fact that you started from the basics and explained each and every step in detail. This is so great and so much needed for beginners.

  • @sinaabdi8033
    @sinaabdi8033 6 months ago +8

    I have read and watched a lot to understand the Transformer architecture. However, this is the best resource so far. Nobody goes into the minute details the way you do. Thank you. Please keep it up.

  • @sourish_ml
    @sourish_ml 5 days ago +1

    I am honestly very thankful that you made this video. It cleared up my whole concept and understanding of the paper "Attention is all you need".
    Thank you once again!! 🙌

  • @rajatadimeti2398
    @rajatadimeti2398 5 months ago +3

    This is one of the most compact and precise explanations of the transformer architecture that I could find on YouTube. Thanks for all the effort you have put in.

  • @NJCLM
    @NJCLM 1 year ago +9

    This video is surely in the top 3 of the 50 videos that I watched to understand this subject.
    We are very grateful to you; keep up the energy, and the YouTube numbers will follow!

    • @marsupilami125
      @marsupilami125 11 months ago +2

      Can you tell me the other 2?🙏

  • @_seeker423
    @_seeker423 11 months ago +7

    The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!

    • @_seeker423
      @_seeker423 11 months ago +1

      One thing that I felt was missing is a logical explanation of the role of the value vector (V).

  • @ajithshenoy5566
    @ajithshenoy5566 1 year ago +12

    Bless you, Umar. One of the finest tutorials out there. Please don't ever stop. We're willing to support you in every way possible.

  • @mohamedelnaggar8414
    @mohamedelnaggar8414 10 days ago +1

    An amazing, concise explanation without disregarding important details. Great job!

  • @saravanannatarajan6515
    @saravanannatarajan6515 6 months ago +3

    What a gem of a video! I would request people to read the paper and then come back here so that you will understand the value we get from the instructor. Awesome work, keep it up!

  • @mculabs
    @mculabs 1 year ago +5

    Probably the best explanation of the paper and the encoder and decoder sub layers. Kudos!!

  • @rachadlakis1
    @rachadlakis1 7 months ago +2

    Wow, this is an incredibly detailed explanation of the Transformer Model! Thank you for sharing all the insights and resources. Understanding the layers and processes involved is crucial for anyone working with this model. Keep up the great work!

  • @shivshankarsajeev8665
    @shivshankarsajeev8665 14 days ago +1

    One of the best explanations of Transformers I have seen.

  • @federicoblaseotto
    @federicoblaseotto 25 days ago

    Many, many thanks! This video was really useful for getting a better understanding of the transformer architecture.

  • @euro_trucker-r2v
    @euro_trucker-r2v 2 months ago +1

    This is the clearest explanation video I've ever seen for Attention and Transformer Architecture. Thank you very much! Please continue making such awesome videos.

  • @barretvermilion6359
    @barretvermilion6359 5 months ago +1

    Your video has clarified and tied together the missing pieces from reading papers and watching other videos, and it is the best explanation I've seen. My background is in psychology and psychometrics, so learning transformer architectures for my dissertation has been a slog, but you've saved me a lot of time wasted on confusing explanations. Thank you so much!

  • @snehotoshbanerjee1938
    @snehotoshbanerjee1938 8 months ago +5

    Umar, you are a great teacher. I have never seen such a great explanation of the transformer. Your transformer-from-scratch coding video is also awesome. So basically, you understand which parts need more explanation. Thanks for your effort.

  • @keithchua1723
    @keithchua1723 10 months ago +2

    Spent days trying to understand this and I wished I had come across this video first because now I understand everything fully. Immediately subscribed, keep it up!!

  • @hackie321
    @hackie321 8 months ago +14

    The best Transformer explanation on the internet so far, and I have seen almost all of them. Kudos! You are a true teacher. I dare to compare you with Andrew Ng. Please become a professor and not a corporate slave.

    • @laodrofotic7713
      @laodrofotic7713 7 months ago

      I think Dr. Umar Jamil is way better than Andrew Ng; I did Andrew's courses and think he is great too, but this person is way better.

    • @uditapatel778
      @uditapatel778 4 months ago

      Way better than Andrew Ng, for sure, at least for my learning style. Prof. Andrew is great too, though.

  • @lucasmolter1040
    @lucasmolter1040 3 months ago +1

    One cannot say it for sure, because there is an infinite amount of explanations on YouTube... but I can say that this is the best I have seen. Congrats on the great quality and all the effort that you clearly put into the material.

  • @BowenXie-b7b
    @BowenXie-b7b 1 year ago +3

    The best video explaining the Transformer so clearly that I have ever seen. Thanks very much for your efforts. I really appreciate your method of explaining every step with a concrete example and explicitly giving the shapes of all the matrices involved. The shapes of the matrices in each step were the most confusing part of understanding Transformer models for me, and you made them so clear. Thanks a lot, Umar.

    • @umarjamilai
      @umarjamilai  1 year ago +1

      You're welcome! You can connect with me on LinkedIn.

  • @nabanitadash7085
    @nabanitadash7085 5 months ago +1

    I have been religiously watching your videos and it has helped me understand difficult papers so smoothly. Kudos 👏 you are doing a great job. It feels like you are the next Andrej Karpathy.

  • @Patrick-wn6uj
    @Patrick-wn6uj 10 months ago +2

    This is the most important channel I have come across on YouTube. Keep creating these long-form videos; you are saving our lives in a huge way.

  • @vrvlbl
    @vrvlbl 1 year ago +3

    Amazing explanation. I struggled too long to understand the architecture until I landed on your video. Way to go!!

  • @muyassarabdullah1504
    @muyassarabdullah1504 2 months ago +1

    This is the best explanation I have found so far on the internet. Thanks, Umar.

  • @bsuhaib
    @bsuhaib 1 year ago +2

    This is called decoding a transformer. What I really liked was the explanation of each chunk. That was really helpful for this topic and surely taught me the approach to decode any problem.
    Jazaakallah ul Khair

  • @keviny2
    @keviny2 8 months ago +2

    Thanks Umar for the amazing video. This is the most comprehensive yet understandable walkthrough of the transformer architecture that I came across. Super helpful. I feel like I have a good foundation for tackling more complex LLMs because of it.

  • @KelianSchulz
    @KelianSchulz 27 days ago +1

    Thank you for the great explanation! Keep making these kinds of videos. Just reading this paper is hard, but having a visual, well-explained walkthrough of every step is very helpful.

  • @franciumruel615
    @franciumruel615 2 days ago +1

    This is the most amazing video I have seen on transformers. Thank you!

  • @xray788
    @xray788 4 months ago +2

    These kinds of videos just make MIT videos look like rookie work. Thank you, Umar; may God bless you.

  • @ua1bbf
    @ua1bbf 2 months ago +1

    I understood all of this pretty well. I have no experience in any of the math here, but the way you explained all the relational logic made it very easy to follow.

  • @silasnginyo7744
    @silasnginyo7744 1 year ago +1

    So far the best laid-out presentation of Transformers I have ever walked through.

  • @abhilashbalachandran7160
    @abhilashbalachandran7160 1 year ago +7

    Super useful. I really loved how you explained this with linear algebra. Very insightful; actually easier to understand than a lot of lectures at universities.

  • @mail2say
    @mail2say 5 months ago +1

    Very clear, precise explanation! I went through many articles and videos, but the concept was never clear. A well-thought-out presentation. Now I'm eager to go through your other videos. 👍

  • @Udayanverma
    @Udayanverma 1 year ago +2

    I understand much more deeply with your explanation. The rest of the world scares people with diagrams and tables without explaining the practical implementation. Thank you, dear!

  • @jamesmina7258
    @jamesmina7258 6 months ago +5

    The best laid-out presentation of Transformers. Thank you, Umar Jamil 🥰

  • @parthvadera1
    @parthvadera1 6 months ago

    I love the way you’ve explained it using matrices. Had some doubts after watching Andrej’s video, this clears it. Thank you so much!

  • @AIVidya
    @AIVidya 1 year ago +2

    One of the best Transformer videos I have encountered so far.

  • @andreicristea997
    @andreicristea997 1 year ago +2

    Finally, the fancy "black box" called the transformer has become more understandable to me. I'm really interested in the other content you are making. Thanks for the explanation.

  • @gabrielpetersson3416
    @gabrielpetersson3416 2 months ago +1

    Thanks again, Umar!

  • @kerrykilian9127
    @kerrykilian9127 8 months ago +6

    The best explanation of the paper on the whole internet.

  • @Jafar801
    @Jafar801 4 months ago +3

    You deserve a larger following and more recognition in the ML community.

  • @laodrofotic7713
    @laodrofotic7713 7 months ago +2

    I must say it started off a bit badly when you began writing with the red pen; I almost tuned out. But it turns out I have to agree: this is the best explanation of self-attention I have seen on YouTube. Congratulations, this is really good and properly explained, especially the QKV part.

  • @NazerkeSafina
    @NazerkeSafina 10 months ago +1

    This is brilliant. Thank you Umar for your hard work. Please keep new videos coming. You are helping immensely. May you live long and happy and healthy

  • @ameyadesai6382
    @ameyadesai6382 1 year ago +4

    The best explanation of this paper; I can't wait to see the other videos on this topic.

  • @megatroneata9911
    @megatroneata9911 1 year ago +1

    After watching this video and the stable diffusion video, I can say for sure that you are an amazing teacher. Extremely digestible content and easy to follow along.

  • @blacksword06
    @blacksword06 3 months ago +1

    The best explanation I have ever seen of the transformer architecture. Thanks a lot.

  • @sedthh
    @sedthh 1 year ago +10

    Thank you, this was really helpful! One minor correction: LayerNorm does not normalize to a 0-1 range; rather, it standardizes to zero mean with unit variance.

    • @umarjamilai
      @umarjamilai  1 year ago +1

      You're right! Thanks for pointing out.

  • @lethnisoff
    @lethnisoff 9 months ago +2

    Finally, after a lot of articles and videos, I found a video I could understand. Thank you, sir. I am not strong in math, but I think I understood a lot from this explanation.

  • @Nereus22
    @Nereus22 1 year ago +3

    This is really a great video, exactly what I was searching for! Everything that you mentioned was explained in detail (others skip a lot).

  • @ishaanjoshi6959
    @ishaanjoshi6959 1 year ago +1

    The best explanation of the attention mechanism I found online. Thank you so much, Umar, for making this video.

  • @haoming3430
    @haoming3430 10 months ago +2

    Your video is very helpful and easy to follow. I have to say this is the best tutorial about the transformer I've seen.

  • @vitoroliveiradesouza4214
    @vitoroliveiradesouza4214 8 months ago +2

    I'm really glad to have found your video! Congratulations on the clean and yet detailed explanation

  • @yuk-hoiyiu7023
    @yuk-hoiyiu7023 1 year ago +1

    The only video that explains the difference between training and inference in the Transformer model!

  • @KunalTiwariBCI
    @KunalTiwariBCI 8 months ago +2

    Bro, legit the best explanation I have ever seen so far.

  • @gauravmalik3911
    @gauravmalik3911 11 months ago +1

    A detailed explanation; you did a great job of explaining a difficult topic by dividing it into chunks. I don't think any part was missed. The best explanation.

  • @huseyngorbani6544
    @huseyngorbani6544 1 year ago +44

    This video is hands down the best explanation I've come across so far! The level of detail provided is fantastic, but if there's one aspect I'd love to delve deeper into, it's the normalization part. It would be incredibly helpful if you could expand on that topic a bit more.
    Furthermore, I'm quite curious about the process of weight learning. With so many weights involved, such as those for Q, K, V, and the fully connected layer, as well as the weights in the decoder part, understanding how they are learned would be immensely valuable. If you have any recommended resources or links that explain this aspect, I would greatly appreciate it. Thanks again for the amazing content!

    • @umarjamilai
      @umarjamilai  1 year ago +29

      Hi @huseyngorbani6544
      The process of weight learning is determined exclusively by the back-propagation algorithm. Since it's a fundamental algorithm in machine learning, I will make a video on how it works and how to write an autograd system from scratch, so that anyone, even with little maths background, can understand it. As you know, making videos, especially when it's not your source of income, is very difficult. I try to make high-quality content for free, not only for my own personal pleasure in teaching, but especially to help others struggling to enter this magical world called AI. Have faith, and I'll try to satisfy everyone's requests. Have a wonderful day with your family, friends, pets (and VS Code)!

    • @huseyngorbani6544
      @huseyngorbani6544 1 year ago +1

      @@umarjamilai Oh understood. Thank you.

    • @smartwakeAI
      @smartwakeAI 1 year ago +4

      @@umarjamilai Thanks for being such a genuine human being. Being extraordinarily smart while remaining humble is a difficult challenge that most highly intelligent people seem to fail. I am fairly new to AI and I loved your video! Thanks for making these videos! They are super helpful!

    • @Ankara_pharao
      @Ankara_pharao 1 year ago

      @@umarjamilai You made it to help us, and it works. Thank you.

  • @ShivaprakashYaragal
    @ShivaprakashYaragal 2 months ago +1

    One of the best explanations. Now I can relate it to J.D. Prince's book. My bigger task now is to bring this to satellite image data input and classification. That will be challenging, but it's lovely to see it coming through.

  • @AbhinavSharma-dc3kv
    @AbhinavSharma-dc3kv 9 months ago +2

    the best explanation for attention architecture. kudos to you sir!

  • @sanskargupta7085
    @sanskargupta7085 8 months ago +1

    I feel lucky enough to have come across this channel, amazing stuff!

  • @abc-by1kb
    @abc-by1kb 1 year ago +15

    Such a great video! Explained all the key concepts so clearly and precisely while giving very nice intuition!

  • @zeeshanmehdi3994
    @zeeshanmehdi3994 11 months ago +1

    Can't thank you enough; this is the best explanation of transformers I could find after days of trying to understand them. Thank you ❤

  • @shuchenwu170
    @shuchenwu170 10 months ago +1

    This tutorial translates complex and terse structures into intuitions. A masterpiece of tutorials!

  • @IsaacKLusuku
    @IsaacKLusuku 6 months ago +3

    we love you Umar...never stop delivering

  • @tgyawali
    @tgyawali 1 year ago +1

    Thank you so much for putting together such a detailed video. This helps technical people who do not have much research experience, but have some background in machine learning, to understand this very important and historic paper in AI.

  • @MichaelJentsch
    @MichaelJentsch 1 year ago +1

    Hi,
    I wanted to express my thanks for your fantastic video. Your clarity and expertise made a complex topic incredibly accessible. Your video has been a meaningful change for me.

    • @umarjamilai
      @umarjamilai  1 year ago

      Thank you for your kind words, Michael! Have a nice day

  • @saravanannatarajan6515
    @saravanannatarajan6515 11 months ago +1

    One of the best videos I have seen on this topic. Thanks a lot for making it easy for us. Great effort, hats off!

  • @nithinma8697
    @nithinma8697 4 months ago +1

    This is the best explanation of Transformers I have ever come across. Thank you for sharing. Expecting more such quality content 😊😊😊😊😊

  • @profyao
    @profyao 9 months ago +1

    Absolutely the best explanation for multi-head attention so far!

  • @atulsain6170
    @atulsain6170 1 year ago +1

    Wow, thank you so much. ❤
    I was banging my head against different papers, books, and videos for the last two days.
    It's the best explanation I could find.

    • @umarjamilai
      @umarjamilai  1 year ago +1

      Thanks! You should watch my other video on how to code the Transformer from scratch, that will also give you practical experience.

  • @atrijpaul4009
    @atrijpaul4009 1 year ago

    The best explanation of Attention on all of YouTube!!!!! Thank you, sir, for making this video and helping us.

  • @debjyotimukherjee8275
    @debjyotimukherjee8275 10 months ago +2

    An excellent video that gave a complete description with a great explanation. Looking forward to more such amazing content!

  • @cristinaballesteros93
    @cristinaballesteros93 11 months ago

    I have watched a lot of videos about transformers, and this is by far the best one. I finally understand how they work. Thank you so much!

  • @ariffahla482
    @ariffahla482 1 year ago +1

    I seldom comment on a YouTube video... but this is just too good to pass up. Thank you, Umar, for your relatively easy and comprehensible video on such a complex subject. It helps me a lot! You are awesome!

  • @victorariasvanegas7407
    @victorariasvanegas7407 6 months ago +1

    The best explanation on the whole internet; such wonderful work!

  • @ddstar
    @ddstar 1 year ago +1

    Excellent. You answered a lot of questions I had about where the weights come from and how they were updated

  • @70152136
    @70152136 1 year ago +1

    Your presentation skills are simply amazing!!! The best video on transformers I've seen so far.

  • @aloksee
    @aloksee 12 days ago +1

    Thank you, Umar - I learned self-attention and multi-head attention from this video👍🙏

  • @brunogatti383
    @brunogatti383 9 months ago +4

    Best video for attention mechanism hands down

  • @fransvanbuul3098
    @fransvanbuul3098 1 year ago +2

    Thank you so much, Sir. This was fantastic. I tried to work through the paper on my own, but not being an expert in the field, it was too dense for me to get through. I tried to find other resources to explain it to me, but they all seemed to stop short of really understanding it. Reading the paper again after your clear explanation, I finally think I understand most of it.

    • @umarjamilai
      @umarjamilai  1 year ago

      You're welcome! Always makes me happy to know I've helped somebody.

  • @shashankreddyboyapally4069
    @shashankreddyboyapally4069 9 months ago +2

    The queries, keys and values are not divided by simply splitting them; they are projected down to the head size by multiplying with a weight matrix, which is a learnable parameter. This also applies to the self-attention algorithm.
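
    As a hedged sketch of what this comment describes (sizes illustrative, not from the video): the input is first multiplied by a learnable W_q, and the result is then viewed as h heads of size d_k = d_model / h, which is equivalent to giving each head its own (d_model, d_k) slice of W_q:

```python
import numpy as np

seq_len, d_model, h = 4, 8, 2
d_k = d_model // h                         # per-head dimension
rng = np.random.default_rng(1)

x = rng.normal(size=(seq_len, d_model))    # input sequence
W_q = rng.normal(size=(d_model, d_model))  # learnable parameter matrix

q = x @ W_q                                # project first ...
q_heads = q.reshape(seq_len, h, d_k).transpose(1, 0, 2)  # ... then split into heads

# Splitting after the projection equals a per-head projection:
head0 = x @ W_q[:, :d_k]                   # same values as q_heads[0]
```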

  • @saima6759
    @saima6759 10 months ago +1

    The transformer model was never so clear to me! Thank you, Umar!

  • @dalilabdouraman3557
    @dalilabdouraman3557 1 year ago +2

    Definitely the best explanation of multi-head attention in the transformer... just awesome.

  • @marsupilami125
    @marsupilami125 11 months ago +1

    Thanks!

  • @SagarVibhute
    @SagarVibhute 1 year ago +3

    Kudos on the commendable work, and simplified explanation! I appreciate that you are also trying to explain the intuition behind each step and not just math. I'll view and re-view this a few times to understand more with successive passes. Thank you!

  • @jocemarnicolodijunior2851
    @jocemarnicolodijunior2851 4 months ago +1

    Honestly, the best video about this article I have seen!

  • @peregudovoleg
    @peregudovoleg 9 months ago +1

    38:56 Normalizing here, via layer norm, doesn't squish our values into [0, 1]; rather, it transforms them to have mean = 0 and std = 1. I know it's a bit confusing; some papers mix up normalization and standardization. For [0, 1] the formula is different: x = (x - x_min) / (x_max - x_min). I just asked GPT, and it said: "it is a DS thing to call standardization normalization." Great video nonetheless. I rewatch it every now and then just because it is so good and helps me visualize everything.
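
    The distinction drawn above (and in @sedthh's earlier comment) can be checked numerically; a small sketch, with an eps added for stability as in the paper:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Layer normalization: mean 0, (near) unit variance; NOT bounded to [0, 1]
eps = 1e-5
ln = (x - x.mean()) / np.sqrt(x.var() + eps)

# Min-max normalization: squishes values into [0, 1]
mm = (x - x.min()) / (x.max() - x.min())
```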

  • @maxwell77176
    @maxwell77176 7 months ago

    Thanks!

  • @tariqkhan1518
    @tariqkhan1518 9 months ago +2

    TBH, the best explanation of attention on the whole internet.