Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention

COMMENTS • 606

  • @HeduAI
    @HeduAI  3 years ago +32

    *CORRECTIONS*
    A big shoutout to the following awesome viewers for these 2 corrections:
    1. @Henry Wang and @Holger Urbanek - At (10:28), "dk" is actually the hidden dimension of the Key matrix and not the sequence length. In the original paper (Attention is all you need), it is taken to be 512.
    2. @JU PING NG
    - The result of concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is, so that the concatenation of z matrices happens horizontally and not vertically). With this we can apply a nn.Linear(9, 5) to get the final 7 x 5 shape.
    Here are the timestamps associated with the concepts covered in this video:
    0:00 - Recaps of Part 0 and 1
    0:56 - Difference between Simple and Self-Attention
    3:11 - Multi-Head Attention Layer - Query, Key and Value matrices
    11:44 - Intuition for Multi-Head Attention Layer with Examples
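
Correction 2 above can be sketched in a few lines of PyTorch. This is a minimal sketch with made-up values, assuming the video's toy shapes (7 tokens, 3 heads, each head's z matrix of shape 7 x 3):

```python
import torch
import torch.nn as nn

# Three attention heads, each producing a z matrix of shape 7 x 3
# (7 tokens, 3 dimensions per head). Values are random placeholders.
z_heads = [torch.randn(7, 3) for _ in range(3)]

# The correction: concatenate horizontally (along the feature
# dimension), giving 7 x 9 -- not vertically, which would give 21 x 3.
z_concat = torch.cat(z_heads, dim=-1)

# A final linear projection maps the concatenated heads to the output size.
w_o = nn.Linear(9, 5)
out = w_o(z_concat)

print(z_concat.shape, out.shape)  # torch.Size([7, 9]) torch.Size([7, 5])
```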

    • @amortalbeing
      @amortalbeing 2 years ago +2

      Where's the first video?

    • @HeduAI
      @HeduAI  1 year ago +4

      @@amortalbeing Episode 0 can be found here - ua-cam.com/video/48gBPL7aHJY/v-deo.html

    • @amortalbeing
      @amortalbeing 1 year ago

      @@HeduAI thanks a lot really appreciate it:)

    • @omkiranmalepati1645
      @omkiranmalepati1645 1 year ago

      Awesome...So dk value is 3?

    • @jasonwheeler2986
      @jasonwheeler2986 1 year ago +1

      @@omkiranmalepati1645 d_k = embedding dimensions // number of heads
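
In code, the reply's formula looks like this. The 512/8 split is from the original "Attention Is All You Need" settings; the 9-dimensional embedding with 3 heads is an assumption matching the toy example in the corrections above:

```python
# Per-head key/query width: the embedding dimension split across heads.
d_model, num_heads = 512, 8       # settings from the original paper
d_k = d_model // num_heads
print(d_k)  # 64

# Assumed toy example from the video: 9-dim embedding, 3 heads.
toy_d_k = 9 // 3
print(toy_d_k)  # 3
```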

  • @thegigasurgeon
    @thegigasurgeon 1 year ago +149

    Need to say this out loud: I saw Yannic Kilcher's video, read tons of material on the internet, went through at least 7 playlists, and this is the first time I really understood the inner mechanism of the Q, K and V vectors in transformers. You did a great job here

    • @HeduAI
      @HeduAI  1 year ago +8

      This made my day :,)

    • @afsalmuhammed4239
      @afsalmuhammed4239 9 months ago +1

      True

    • @exciton007
      @exciton007 8 months ago +1

      Very intuitive explanation!

    • @EducationPersonal
      @EducationPersonal 6 months ago +1

      Totally agree with this comment

    • @VitorMach
      @VitorMach 5 months ago +1

      Yes, no other video actually explains what the actual input for these are

  • @nitroknocker14
    @nitroknocker14 2 years ago +199

    All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.

    • @HeduAI
      @HeduAI  2 years ago +9

      This comment made my day :,) Thanks!

    • @bryanbaek75
      @bryanbaek75 2 years ago

      Me, too!

    • @lessw2020
      @lessw2020 2 years ago +1

      Definitely agree. These videos really crystallize a lot of knowledge, thanks for making this series!

    • @Charmente2014
      @Charmente2014 2 years ago

      ش

    • @devstuff2576
      @devstuff2576 1 year ago

      @@HeduAI Absolutely awesome. You are the best.

  • @ML-ok9nf
    @ML-ok9nf 6 months ago +6

    Absolutely underrated, hands down one of the best explanations I've found on the internet

  • @nurjafri
    @nurjafri 3 years ago +71

    Damn. This is exactly what a developer coming from other backgrounds needs.
    Simple analogies for a rapid understanding.
    Thanks a ton.
    Keep uploadinggggggggggg plss

    • @Xeneon341
      @Xeneon341 3 years ago +1

      Agreed, very well done. You do a very good job of explaining difficult concepts to a non-industry developer (fyi I'm an accountant) without assuming a lot of prior knowledge. I look forward to your next video on masked decoders!!!

    • @HeduAI
      @HeduAI  3 years ago +4

      @@Xeneon341 Oh nice! Glad you enjoyed these videos! :)

  • @HuyLe-nn5ft
    @HuyLe-nn5ft 8 months ago +5

    The important detail that sets you apart from the other videos and websites is that not only did you provide the model's architecture with numerous formulas, but you also demonstrated them with vectors and matrices, successfully walking us through each concept, complicated or trivial. You really did a good job!

  • @rohanvaidya3238
    @rohanvaidya3238 3 years ago +10

    Best explanation ever on Transformers !!!

  • @adscript4713
    @adscript4713 19 days ago

    As someone NOT in the field reading the Attention paper, after having watched DOZENS of videos on the topic this is the FIRST explanation that laid it out in an intuitive manner without leaving anything out. I don't know your background, but you are definitely a great teacher. Thank you.

    • @HeduAI
      @HeduAI  18 days ago

      So glad to hear this :)

  • @chaitanyachhibba255
    @chaitanyachhibba255 3 years ago +10

    Were you the one who wrote transformers in the first place? Because no one explained it like you did. This is undoubtedly the best info I have seen. I hope you keep posting more videos. Thanks a lot.

    • @HeduAI
      @HeduAI  3 years ago +1

      This comment made my day! :) Thank you.

  • @andybrice2711
    @andybrice2711 10 days ago

    This really is an excellent explanation. I had some sense that self-attention layers acted like a table of relationships between tokens, but only now do I have a real sense of how the Query, Key, and Value mechanism actually works.

  • @EducationPersonal
    @EducationPersonal 6 months ago +1

    This is one of the best Transformer videos on UA-cam. I hope UA-cam always recommends this Value (V), aka video, as a first Key (K), aka Video Title, when someone uses the Query (Q) as "Transformer"!! 😄

    • @HeduAI
      @HeduAI  6 months ago

      😄
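
The joke above is actually a faithful picture of attention as a soft dictionary lookup: a query is scored against every key, and the values are blended according to the softmaxed scores. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

def soft_lookup(query, keys, values):
    """Blend the values using softmax similarity between query and keys."""
    scores = keys @ query                            # one score per key
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the scores
    return weights @ values                          # weighted mix of the values

keys = np.array([[1.0, 0.0],    # "video title" 1
                 [0.0, 1.0]])   # "video title" 2
values = np.array([[10.0],      # "video" 1
                   [20.0]])     # "video" 2
query = np.array([4.0, 0.0])    # strongly matches the first key

# The result is dominated by the first value, with a small mix of the second.
print(soft_lookup(query, keys, values))
```

Unlike a hard dictionary, no single key "wins" outright; every value contributes in proportion to how well its key matches the query.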

  • @malekkamoua5968
    @malekkamoua5968 2 years ago +11

    I've been stuck for so long trying to get the Transformer Neural Networks and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!

    • @HeduAI
      @HeduAI  8 months ago

      Cheers!

  • @forresthu6204
    @forresthu6204 2 years ago +3

    Self-attention is a villain that has struck me for a long time. Your presentation has helped me to better understand this genius idea.

  • @ja100o
    @ja100o 1 year ago +1

    I'm currently reading a book about transformers and was scratching my head over the reason for the multi-headed attention architecture.
    Thank you so much for the clearest explanation yet that finally gave me this satisfying 💡-moment

  • @wireghost897
    @wireghost897 9 months ago

    Finally a video on transformers that actually makes sense. Not a single lecture video from any of the reputed universities managed to cover the topic with such brilliant clarity.

  • @rohtashbeniwal9202
    @rohtashbeniwal9202 11 months ago +4

    this channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science, she did a lot of hard work to get so much clarity in concepts (love from India)

    • @HeduAI
      @HeduAI  11 months ago +1

      Thank you Rohtash! You made my day! :) धन्यवाद

  • @hubertkanyamahanga2782
    @hubertkanyamahanga2782 8 months ago

    I am just speechless, this is unbelievable! Bravo!

  • @danielarul2382
    @danielarul2382 1 year ago

    One of the best explanations on Attention in my opinion.

  • @MGMG-li6lt
    @MGMG-li6lt 3 years ago +19

    Finally! You delivered me from long nights of searching for good explanations about transformers! It was awesome! I can't wait to see the part 3 and beyond!

    • @HeduAI
      @HeduAI  3 years ago +1

      Thanks for this great feedback!

    • @HeduAI
      @HeduAI  3 years ago +2

      “Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
      ua-cam.com/video/gJ9kaJsE78k/v-deo.html

  • @devchoudhary8892
    @devchoudhary8892 1 year ago +1

    best, best best explanation on transformer, you are adding so much value to the world.

  • @raunakdey3004
    @raunakdey3004 11 months ago

    Really love coming back to your videos to get a recap on multi-head attention and transformers! Sometimes I need to make my own specialized attention layers for the dataset in question, and sometimes, I dunno, it just helps to listen to you talk about transformers and attention! Really intuitive, and it helps me break out of some weird loop of algorithm design I might have gotten myself stuck in. So thank you so so much :D

  • @sebastiangarciaacosta5468
    @sebastiangarciaacosta5468 3 years ago +15

    The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!

    • @HeduAI
      @HeduAI  3 years ago +2

      Valar Dohaeris my friend ;)

  • @Scaryder92
    @Scaryder92 1 year ago

    Amazing video, showing how the attention matrix is created and what values it assumes is really awesome. Thanks!

  • @oliverhu1025
    @oliverhu1025 11 months ago

    Probably the best explanation of transformers I’ve found online. Read the paper, watched Yannic’s video, some paper reading videos and a few others, the intuition is still missing. This connects the dots, keep up the great work!

  • @SuilujChannel
    @SuilujChannel 1 year ago

    thanks for these great videos! The visualizations and extra explanations on details are perfect!

  • @rayxi5334
    @rayxi5334 1 year ago +1

    Better than the best Berkeley professor! Amazing!

  • @sujithkumar5415
    @sujithkumar5415 11 months ago

    This is quite literally the best attention mechanism video out there guys

  • @oludhe7
    @oludhe7 24 days ago

    Literally the best series on transformers. Even clearer than StatQuest and Luis Serrano, who also make things very clear.

  • @fernandonoronha5035
    @fernandonoronha5035 2 years ago

    I don't have words to describe how much these videos saved me, thank you!

  • @alankarmisra
    @alankarmisra 6 months ago

    3 days, 16 different videos, and your video "just made sense". You just earned a subscriber and a life-long well-wisher.

  • @aaryannakhat1842
    @aaryannakhat1842 2 years ago

    Spectacular explanation! This channel is sooo underrated!

  • @giridharnr6742
    @giridharnr6742 1 year ago

    It's one of the best explanations of Transformers. Just mind-blowing.

  • @binhle9475
    @binhle9475 1 year ago +1

    Your attention to detail and information structuring are just exceptional. The Avatar and GoT references on top were hilarious and make things perfect. You literally made a story out of complex deep learning concept(s). This is just brilliant.
    You have such a beautiful mind (if you get the reference :D). Please consider making more videos like this; such a gift is truly precious. May the force be always with you. 🤘

  • @sowmendas812
    @sowmendas812 1 year ago

    This is literally the best explanation for self-attention I have seen anywhere! Really loved the videos!

  • @skramturbo8499
    @skramturbo8499 1 year ago

    I really like the fact that you ask questions within the video. In fact, those are the same questions one has when first reading about transformers. Keep up the awesome work!

  • @alirezamogharabi8733
    @alirezamogharabi8733 2 years ago

    Great explanation and visualization, thanks a lot. Please keep making such helpful videos.

  • @onthelightway
    @onthelightway 2 years ago

    Incredibly well explained! Thanks a lot

  • @clintcario6749
    @clintcario6749 1 year ago

    These videos are really incredible. Thank you!

  • @Srednicki123
    @Srednicki123 1 year ago

    I just repeat what everybody else said: these videos are the best! thank you for the effort

  • @geetanshkalra8340
    @geetanshkalra8340 1 year ago

    This is by far the best video to understand Attention Networks. Awesome work !!

  • @persianform
    @persianform 1 year ago

    The best explanation of attention models on earth!

  • @srikanthkarapanahalli
    @srikanthkarapanahalli 1 year ago

    Awesome analogy and explanation !

  • @cihankatar7310
    @cihankatar7310 1 year ago

    This is the best explanation of transformers architecture with a lot of basic analogy ! Thanks a lot!

  • @MikeAirforce111
    @MikeAirforce111 1 month ago

    My goodness, you have talent as a teacher!! :-) This builds a very good intuition about what is going on. Very impressed. Subscribed!

  • @Mihre-ol3bk
    @Mihre-ol3bk 8 months ago +1

    This is how the self-attention should be explained.

  • @Ariel-px7hz
    @Ariel-px7hz 1 year ago

    Such a fantastic and detailed yet digestible explanation. As others have said in the comments, other explanations leave so many gaps. Thank you for this gem!

  • @Abhi-qf7np
    @Abhi-qf7np 2 years ago +1

    You are the best😄😄, This is THE Best explanation I have ever seen on UA-cam for Transformer Model, Thank you so much for this video.

  • @abdot604
    @abdot604 1 year ago

    Brilliant explanation, your channel deserves way more ATTENTION.

  • @kazeemkz
    @kazeemkz 4 months ago

    Spot on analysis. Many thanks for the clear explanation.

  • @krishnakumarprathipati7186
    @krishnakumarprathipati7186 3 years ago

    The MOST MOST MOST MOST ..........................useful and THE BEST video ever on Multi head attention........Thanks a lot for your work

    • @HeduAI
      @HeduAI  3 years ago

      So glad you liked it! :)

  • @user-ne2nr2yi1h
    @user-ne2nr2yi1h 4 months ago

    The best video I've ever seen for explaining transformers.

  • @mariosconstantinou8271
    @mariosconstantinou8271 1 year ago

    These videos are amazing, thank you so much! Best explanation so far!!

  • @MCMelonslice
    @MCMelonslice 10 months ago

    This is the best resource for an intuitive understanding of transformers. I will without a doubt point everyone towards your video series. Thank you so much!

  • @minruihu
    @minruihu 1 year ago

    It is impressive how you explain such complicated topics in a vivid and easy way!!!

  • @ghostvillage1
    @ghostvillage1 1 year ago

    Hands down the best series I've found on the web about transformers. Thank you

  • @bendarodes61
    @bendarodes61 1 year ago

    I've watched many video series about transformers, this is by far the best.

  • @markpadley890
    @markpadley890 3 years ago

    Outstanding explanation and well delivered, both verbally and with the graphics. I look forward to the next in this series

    • @HeduAI
      @HeduAI  3 years ago

      “Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
      ua-cam.com/video/gJ9kaJsE78k/v-deo.html

  • @chenlim2165
    @chenlim2165 11 months ago

    Bravo! After watching dozens of other explainer videos, I can finally grasp the reason for multi-headed attention. Excellent video. Please make more!

  • @rasyidanakbarf2482
    @rasyidanakbarf2482 9 months ago

    I love this vid so much, now I understand the whole multi-head self-attention thing very clearly. Thanks!

  • @robertco7
    @robertco7 11 months ago

    This is very clear and well-thought out, thanks!

  • @RafidAslam
    @RafidAslam 1 month ago

    Thank you so much! This is by far the clearest explanation that I've ever seen on this topic

  • @Alex-ul3xw
    @Alex-ul3xw 2 years ago

    Amazing explanation, thank you so much!

  • @wolfie6175
    @wolfie6175 1 year ago

    This is an absolute gem of a video.

  • @jinyunghong
    @jinyunghong 2 years ago

    Great explanation! Thank you so much!

  • @pedroviniciuspereirajunho7244
    @pedroviniciuspereirajunho7244 10 months ago

    Visualizing the matrices helped me understand transformers better.
    Again, thank you very much!

  • @AdityaRajPVSS
    @AdityaRajPVSS 2 years ago

    Awesome. Hats off to your level of conceptual understanding.

  • @shivam6565
    @shivam6565 10 months ago

    Finally I understood the concept of query, key and value. Thank you.

  • @adithyakaravadi8170
    @adithyakaravadi8170 1 year ago +1

    You are so good, thank you for breaking down a seemingly scary topic for all of us. The original paper requires a lot of background to understand clearly, and not everyone has it. I personally felt lost. Such videos help a lot!

  • @adityaghosh8601
    @adityaghosh8601 2 years ago

    Blown away by your explanation . You are a great teacher.

  • @maryamkhademi
    @maryamkhademi 2 years ago

    Thank you for putting so much effort in the visualization and awesome narration of these series. These are by far the best videos to explain transformers. You should do more of these videos. You certainly have a gift!

    • @HeduAI
      @HeduAI  1 year ago

      Thank you for watching! Yep! Back on it :) Would love to hear which topic/model/algorithm you are most wanting to see on this channel. Will try to cover it in the upcoming videos.

  • @cw9249
    @cw9249 1 year ago

    You are amazing. I've watched other videos and read materials, but nothing compares to your videos.

  • @yassine20909
    @yassine20909 1 year ago

    This is a great work, thank you.
    keep uploading. 👏

  • @shubheshswain5480
    @shubheshswain5480 3 years ago +1

    I went through many videos from Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Values so clearly. You made my day.

    • @HeduAI
      @HeduAI  3 years ago

      Glad to hear this Shubhesh :)

  • @Andrew6James
    @Andrew6James 3 years ago

    Wow. Amazing explanation! You have a gift for explaining quite complex material succinctly.

    • @HeduAI
      @HeduAI  1 year ago

      Thanks Andrew! Cheers! :D

  • @hewas321
    @hewas321 1 year ago

    No way. This video is insane!! The most accurate and excellent explanation of self-attention mechanism. Subscribed to your channel!

  • @gowthamkrishna6283
    @gowthamkrishna6283 2 years ago

    wow!! The best transformers series ever. Thanks a ton for making these

  • @artukikemty
    @artukikemty 11 months ago

    Thanks for posting, by far this is the most didactic Transformer presentation I've ever seen. AMAZING!

  • @rahulkumarchaudhary2474
    @rahulkumarchaudhary2474 6 months ago

    Hats off to you for this incredible tutorial! 🎩🚀

  • @jojo01925
    @jojo01925 2 years ago

    Thank you for the video. Best explanation I've seen.

  • @franzanders7762
    @franzanders7762 2 years ago

    I can't believe how good this is.

  • @adrianamichelavilagarcia5404
    @adrianamichelavilagarcia5404 11 months ago

    Thank you! This is so well explained.

  • @hesona9759
    @hesona9759 1 year ago

    The best video I've ever watched, thank you so much

  • @jirasakburanathawornsom1911
    @jirasakburanathawornsom1911 2 years ago

    Hands down the best transformer explanation. Thank you very much!

  • @bhavyaghai1924
    @bhavyaghai1924 11 months ago

    Educational + Entertaining. Nice examples and figures. Loved it!

  • @JDechnics
    @JDechnics 1 year ago

    Holy shit, was this a good explanation! Other blogs literally copy what the paper states (which is kinda confusing), but you explained it in such an intuitive and fun way! That's what I call talent!!

  • @frankietank8019
    @frankietank8019 8 months ago +1

    Hands down the best video on transformers I have seen! Thank you for taking your time to make this video.

  • @jackskellingtron
    @jackskellingtron 2 years ago

    This is the most intuitive explanation of transformers that I've seen. Thank you hedu! I'm in awe. Liked & subbed.

    • @HeduAI
      @HeduAI  1 year ago

      So glad to know this! :)

  • @mrmuffyman
    @mrmuffyman 1 year ago +1

    You are awesome!! I watched Yannic Kilcher's video first and was still confused by the paper, probably because there's so much detail skipped over in the paper and Kilcher's video. However, your video goes much slower and in depth so the explanations were simple to understand, and the whole picture makes sense now. Thank you!

  • @McBobX
    @McBobX 2 years ago

    That's what I've been looking for, for 3 days now! Thanks a lot!

  • @takudzwamakusha5941
    @takudzwamakusha5941 2 years ago

    Thank you, amazing explanation

  • @davidlazaro3143
    @davidlazaro3143 9 months ago

    This video is GOLD, it should be everywere! Thank you so much for doing such an amazing job 😍😍

  • @VADemon
    @VADemon 1 year ago

    Excellent examples and explanation. Don't shy away from using more examples of things that you love; this love shows and will translate to better work overall. Cheers!

  • @adarshkone9384
    @adarshkone9384 10 months ago

    Have been trying to understand this topic for a long time; glad I found this video now.

  • @melihekinci7758
    @melihekinci7758 1 year ago

    This is the best explanation I've ever seen!

  • @aritamrayul4307
    @aritamrayul4307 2 months ago

    Ohh, why am I only getting to know this channel now? This channel is criminally underrated!!

  • @haowenjohnwei7547
    @haowenjohnwei7547 8 months ago

    The best video I've ever seen! Thank you very much!

  • @madhu1987ful
    @madhu1987ful 1 year ago

    Wow. Just wow !! This video needs to be in the top most position when searched for content on transformers and their explanation

    • @HeduAI
      @HeduAI  1 year ago +1

      So glad to see this feedback! :)

  • @jasonpeloquin9950
    @jasonpeloquin9950 10 months ago

    Hands down the best explanation of the use of Query, Key and Value matrices. Great video with an easy example to understand.

  • @jonathanlarkin1112
    @jonathanlarkin1112 3 years ago +6

    Excellent series. Looking forward to Part 3!

    • @HeduAI
      @HeduAI  3 years ago +1

      “Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
      ua-cam.com/video/gJ9kaJsE78k/v-deo.html

  • @kennethm.4998
    @kennethm.4998 2 years ago

    You have a gift for explanations... Best I've seen anywhere online. Superb.