Self Attention in Transformer Neural Networks (with Code!)

  • Published 25 Nov 2024

COMMENTS • 160

  • @CodeEmporium
    @CodeEmporium  1 year ago +73

    If you think I deserve it, please consider liking the video and subscribing for more content like this :)

    • @tomoki-v6o
      @tomoki-v6o 1 year ago

      Do you have any idea how transformers generate new data?

    • @15jorada
      @15jorada 1 year ago

      You are amazing, man! Of course you deserve it! You are building transformers from the ground up! That's insane!

    • @vipinsou3170
      @vipinsou3170 1 year ago

      @@tomoki-v6o Using the decoder 😮😮😊

  • @marktahu2932
    @marktahu2932 1 year ago +12

    I have learnt so much between you, ChatGPT, and Alexander & Ava Amini at MIT 6.S191. Thank you all.

  • @jeffrey5602
    @jeffrey5602 1 year ago +14

    What's important is that for every token generation step we always feed the whole sequence of previously generated tokens into the decoder, not just the last one. So you start with the start token and generate a new token, then feed the start token plus the new token back into the decoder, so basically you just keep appending each generated token to the sequence of decoder inputs. That might not have been clear in the video. Otherwise great work. Love your channel!
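    A minimal sketch of that point, with a hypothetical `decoder` callable and `start_id`/`end_id` token ids (not the video's code): the whole sequence generated so far is fed back in at every step.

    ```python
    import numpy as np

    def greedy_decode(decoder, start_id, end_id, max_len=20):
        tokens = [start_id]                       # begin with the start token
        for _ in range(max_len):
            logits = decoder(np.array(tokens))    # feed ALL tokens generated so far
            next_id = int(np.argmax(logits[-1]))  # most likely next token
            tokens.append(next_id)                # append it and repeat
            if next_id == end_id:
                break
        return tokens
    ```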

  • @pranayrungta
    @pranayrungta 1 year ago +1

    Your videos are way better than the Stanford CS224N lectures.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Words I am not worthy of. Thank you :)

  • @MaksymLeshchenko-d7k
    @MaksymLeshchenko-d7k 1 year ago +16

    I usually don't write comments, but this channel really deserves one! Thank you so much for such a great tutorial. I watched your first video about Transformers and the Attention mechanism, which was really informative, but this one is even more detailed and useful.

    • @CodeEmporium
      @CodeEmporium  1 year ago +3

      Thanks so much for the compliments! This is the first in a series of videos called “Transformers from scratch”. Hope you’ll check the rest of the playlist out.

  • @nikkilin4396
    @nikkilin4396 9 months ago +6

    It's one of the best videos I have watched. The concepts are explained very well, especially with the code.

  • @ganesha4281
    @ganesha4281 1 month ago

    Hello Ajay, I was very happy to learn that you are a Kannadiga!

  • @tonywang7933
    @tonywang7933 1 year ago +1

    Thank you so much. I searched so many places, and this is the first place where a nice person is finally willing to spend the time to really dig in step by step. I'm going to value this channel as highly as Fireship now.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the compliments and glad you are sticking around!

  • @softwine91
    @softwine91 1 year ago +28

    What can I say, dude!
    God bless you.
    This is the only content on the whole of YouTube that really explains the self-attention mechanism in a brilliant way.
    Thank you very much.
    I'd like to know if the key, query, and value matrices are updated via backpropagation during the training phase.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      Thanks for the kind words. These matrices I mentioned in the code represent the actual data, so no. However, the 3 weight matrices that map a word vector to Q, K, V are indeed updated via backprop. Hope that little nuance makes sense.
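      A small NumPy sketch of that distinction (shapes are illustrative, not the video's exact notebook): x is data, while W_q, W_k, W_v are the parameters that backprop would update.

      ```python
      import numpy as np

      L, d_model, d_k, d_v = 4, 512, 64, 64
      x = np.random.randn(L, d_model)      # word vectors for a 4-word sentence (data)

      W_q = np.random.randn(d_model, d_k)  # trainable weight matrix
      W_k = np.random.randn(d_model, d_k)  # trainable weight matrix
      W_v = np.random.randn(d_model, d_v)  # trainable weight matrix

      q, k, v = x @ W_q, x @ W_k, x @ W_v  # per-word query / key / value vectors
      ```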

    • @picassoofai4061
      @picassoofai4061 1 year ago

      I definitely agree.

  • @varungowtham3002
    @varungowtham3002 1 year ago

    Hello Ajay, I was very happy to learn that you are a Kannadiga! Your videos are turning out very well.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Glad you liked this and thanks for watching! :)

  • @rainmaker5199
    @rainmaker5199 1 year ago +4

    This is great! I've been trying to learn attention but it's hard to get past the abstraction in a lot of the papers that mention it, much clearer this way!

  • @EngineeredFemale
    @EngineeredFemale 1 year ago +4

    I was legit searching for self attention concept vids and thinking that it sucked that you didn't cover it yet. And voilà, here we are. Thank you so much for uploading!!

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Glad I could deliver. Will be uploading more such content shortly :)

  • @rajpulapakura001
    @rajpulapakura001 1 year ago +1

    This is exactly what I needed! Can't believe self-attention is that simple!

  • @simonebonato5881
    @simonebonato5881 1 year ago +1

    One video to understand them all! Dude, thanks. I've tried to watch like 10 other videos on transformers and attention; yours was really super clear and much more intuitive!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for this compliment! Means a lot :)

  • @becayebalde3820
    @becayebalde3820 1 year ago +1

    This is pure gold, man!
    Transformers are complex, but this video really gives me hope.

    • @pratyushrao7979
      @pratyushrao7979 10 months ago

      What are the prerequisites for this video? Do we need to know about the encoder-decoder architecture beforehand? The video feels like I jumped right into the middle of something without any context. I'm confused.

    • @cv462-l4x
      @cv462-l4x 7 months ago

      @pratyushrao7979 There are playlists for the different topics.

  • @dataflex4440
    @dataflex4440 1 year ago

    This has been the most wonderful series on this channel so far.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks a ton! Super glad you enjoyed the series :D

  • @muskanmahajan04
    @muskanmahajan04 1 year ago

    The best explanation on the internet, thank you!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for the comment. Glad you liked it :)

  • @shivamkaushik6637
    @shivamkaushik6637 1 year ago

    With all my heart, you deserve a lot of respect.
    Thanks for the content. Damn, I missed my metro station because of you.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Hahahaha your words are too kind! Please check the rest of the “Transformers from scratch” playlist for more (it’s fine to miss the metro for education lol)

  • @srijeetful
    @srijeetful 9 months ago +1

    Extremely well explained. Kudos !!!!

  • @noahcasarotto-dinning1575
    @noahcasarotto-dinning1575 11 months ago

    Best video explaining this that I've seen by far.

  • @nexyboye5111
    @nexyboye5111 2 months ago

    Thanks, this is the only video I found useful on attention.

  • @PraveenHN-zj3ny
    @PraveenHN-zj3ny 7 months ago +2

    Very happy to see Kannada here.
    Great 😍 Love from Kannadigas

  • @pocco8388
    @pocco8388 1 year ago

    Best content I've ever seen. Thanks for this video.

  • @shailajashukla5841
    @shailajashukla5841 9 months ago

    Excellent, how well you explained. No other video on YouTube explained it like this. Really good job.

  • @shaktisd
    @shaktisd 11 months ago

    Excellent video. If you can, please make a hello world on self attention, e.g. first showing a PCA representation of the embeddings before self attention and after self attention, to show how context impacts the overall embedding.
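    A rough sketch of the kind of "hello world" being asked for here, assuming random weights and scikit-learn's PCA (purely illustrative, not from the video): project the embeddings to 2-D before and after one self-attention pass.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    L, d = 6, 64
    x = np.random.randn(L, d)                               # stand-in word embeddings
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d))
    y = attn @ (x @ Wv)                                     # contextualised embeddings

    before = PCA(n_components=2).fit_transform(x)
    after = PCA(n_components=2).fit_transform(y)
    print(before)
    print(after)
    ```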

  • @prashantlawhatre7007
    @prashantlawhatre7007 1 year ago +2

    Waiting for your future videos. This was amazing, especially the masked attention part.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      Thanks so much! Will be making more over the coming weeks

  • @ishwaragoudapatil9654
    @ishwaragoudapatil9654 3 months ago

    Nice explanation. Thanks a lot Kannadiga :)

  • @Wesker-he9cx
    @Wesker-he9cx 4 months ago

    Bro You're The Best, Mad Respect For You, I'm Subscribing

  • @ayoghes2277
    @ayoghes2277 1 year ago

    Thanks a lot for making the video!! This deserves more views.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for watching. Hope you enjoy the rest of the playlist as I code the entire transformer out!

  • @JBoy340a
    @JBoy340a 1 year ago

    Great walkthrough of the theory and then relating it to the code.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much! Will be making more of these over the coming weeks

  • @AI-xe4fg
    @AI-xe4fg 1 year ago

    Good video, bro.
    I was studying the Transformer this week and was still a little confused before I found your video.
    Thanks

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the kind words. I really appreciate it :)

  • @giovannibianco5996
    @giovannibianco5996 1 month ago

    Best video I've found about the topic, great.

  • @deepalisharma1327
    @deepalisharma1327 1 year ago

    Thank you for making this concept so easy to understand. Can’t thank you enough 😊

  • @MahirDaiyan7
    @MahirDaiyan7 1 year ago

    Great! This is exactly what I was looking for in all of your other videos.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the comment! There is more to come :)

  • @PaulKinlan
    @PaulKinlan 1 year ago

    This is brilliant, I've been looking for a bit more hands on demonstration of how the process is structured.

  • @junior14536
    @junior14536 1 year ago

    My god, that was amazing, you have a gift, my friend!
    Love from Brazil :D

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks a ton :) Hope you enjoy the channel

  • @Slayer-dan
    @Slayer-dan 1 year ago +2

    Huge respect ❤️

  • @chrisogonas
    @chrisogonas 1 year ago

    Awesome! Well illustrated. Thanks

  • @lawrencemacquarienousagi789

    Wonderful work you've done! I really love your video and have studied it twice. Thank you so much!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for watching! More to come :)

  • @ChrisCowherd
    @ChrisCowherd 1 year ago

    Fantastic explanation! Wow! You have a new subscriber. :) Keep up the great work

  • @sriramayeshwanth9789
    @sriramayeshwanth9789 1 year ago

    You made me cry, brother.

  • @chessfreak8813
    @chessfreak8813 1 year ago

    Thanks! You are very deserving and underrated!

  • @bradyshaffer3302
    @bradyshaffer3302 1 year ago

    Thank you for this very clear and helpful demonstration!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      You are so welcome! And be on the lookout for more :)

  • @faiazahsan6774
    @faiazahsan6774 1 year ago

    Thank you for explaining it in such an easy way. It would be great if you could upload some code on the GCN algorithm.

  • @rajv4509
    @rajv4509 1 year ago

    Absolutely brilliant! Thumba chennagidhay :)

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks a ton! Super glad you like this. I hope you like the rest of this series :)

  • @sockmonkeyadam5414
    @sockmonkeyadam5414 1 year ago

    You have saved me. Thank you.

  • @nandiniloomba
    @nandiniloomba 1 year ago

    Thank you for teaching this.❤

    • @CodeEmporium
      @CodeEmporium  1 year ago

      My pleasure! Hope you enjoy the series

  • @picassoofai4061
    @picassoofai4061 1 year ago

    Mashallah, man you are a rocket.

  • @ParthivShah
    @ParthivShah 5 months ago +1

    Really Appreciate Your Efforts. Love from Gujarat India.

  • @pulkitmehta1795
    @pulkitmehta1795 1 year ago

    Simply wow..

  • @maximilianschlegel3216
    @maximilianschlegel3216 1 year ago

    This is an incredible video, thank you!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for watching and commenting!

  • @jamesjang8389
    @jamesjang8389 1 year ago

    Amazing video! Thank you😊😊

  • @FelLoss0
    @FelLoss0 1 year ago +1

    Dear Ajay, thank you so much for your videos!
    I have a quick question here. Why did you transpose the values in the softmax function? Also... why did you specify axis=-1? I'm a newbie at this and I'd like to have strong and clear foundations.
    Have a lovely weekend :D
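    On the axis=-1 part, a sketch (not the video's exact function): each row of the L x L score matrix belongs to one query word, and we want that row to sum to 1, so the normalisation runs over the last axis (the keys). The transpose in the video's version is most likely just a broadcasting convenience; the keepdims form below avoids it.

    ```python
    import numpy as np

    def softmax(scores, axis=-1):
        scores = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
        e = np.exp(scores)
        return e / e.sum(axis=axis, keepdims=True)

    scores = np.random.randn(4, 4)   # q @ k.T for a 4-word sentence
    attn = softmax(scores)           # normalise over the keys (last axis)
    print(attn.sum(axis=-1))         # -> [1. 1. 1. 1.]
    ```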

  • @jazonsamillano
    @jazonsamillano 1 year ago

    Great video. Thank you very much.

  • @arunganesan1559
    @arunganesan1559 1 year ago +1

    Thanks!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the donation! And you are very welcome!

  • @mamo987
    @mamo987 1 year ago

    Amazing work! Very glad I subscribed

  • @bhavyageethika4560
    @bhavyageethika4560 1 year ago +1

    Why is it d_k in both Q and K in the np.random.randn call?
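    A short sketch of why both use d_k (illustrative shapes, not the notebook's exact values): q and k must share the same dimension for the dot product q @ k.T to be defined, while v can have its own dimension d_v.

    ```python
    import numpy as np

    L, d_k, d_v = 4, 8, 8
    q = np.random.randn(L, d_k)
    k = np.random.randn(L, d_k)  # must match q's d_k, or q @ k.T fails
    v = np.random.randn(L, d_v)  # d_v may differ from d_k

    scores = q @ k.T             # (L, L) attention scores
    print(scores.shape)
    ```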

  • @yonahcitron226
    @yonahcitron226 1 year ago

    this is amazing!

  • @imagiro1
    @imagiro1 1 year ago

    Got it, thank you very much, but one question. What I still don't understand: we are talking about neural networks, and they are trained. So, all the math you show here, how do we know (or make sure) that it actually happens inside the network? You don't train specific regions of the NN for specific tasks (like calculating a dot product), right?

  • @ritviktyagi9221
    @ritviktyagi9221 1 year ago +1

    How do we get the values of the q, k and v vectors after initializing them randomly? Great video, btw. Waiting for more such videos.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      The weight matrices that map the original word vectors to these 3 vectors are trainable parameters, so they would be updated by backpropagation during training.
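      A sketch of that training behaviour in PyTorch (not the video's NumPy code; layer sizes are illustrative): the Linear layers that produce q, k, v hold the trainable weights, and a backward pass gives them gradients for an optimizer to apply.

      ```python
      import torch
      import torch.nn as nn

      d_model, d_k = 512, 64
      to_q = nn.Linear(d_model, d_k)
      to_k = nn.Linear(d_model, d_k)
      to_v = nn.Linear(d_model, d_k)

      x = torch.randn(4, d_model)                        # 4 word vectors (data)
      q, k, v = to_q(x), to_k(x), to_v(x)
      attn = torch.softmax(q @ k.T / d_k ** 0.5, dim=-1)
      out = attn @ v

      out.sum().backward()                               # dummy loss, just to show gradients flow
      print(to_q.weight.grad.shape)                      # torch.Size([64, 512])
      ```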

    • @ritviktyagi9221
      @ritviktyagi9221 1 year ago

      @@CodeEmporium Thanks for the clarification.

  • @li-pingho1441
    @li-pingho1441 1 year ago

    You saved my life!!!!!

  • @yijingcui7736
    @yijingcui7736 11 months ago

    This is very helpful.

  • @dataflex4440
    @dataflex4440 1 year ago

    Brilliant Mate

  • @naziadana7885
    @naziadana7885 1 year ago

    Thank you very much for this great video! Can you please upload a video on self-attention code using a Graph Convolutional Network (GCN)?!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      I’ll look into this at some point. Thanks for the tips.

  • @7_bairapraveen928
    @7_bairapraveen928 1 year ago

    Why do we need to stabilise the variance of the attention scores computed from the query and key vectors?
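    A quick numerical illustration of the reason (a sketch using random unit-variance vectors): the raw dot product of q and k has variance roughly d_k, which would push the softmax into saturation, and dividing by sqrt(d_k) brings the variance back to roughly 1.

    ```python
    import numpy as np

    d_k, n = 64, 100_000
    q = np.random.randn(n, d_k)
    k = np.random.randn(n, d_k)

    raw = (q * k).sum(axis=1)      # n independent q . k dot products
    scaled = raw / np.sqrt(d_k)

    print(raw.var())               # ~64, grows with d_k
    print(scaled.var())            # ~1, stable regardless of d_k
    ```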

  • @klam77
    @klam77 1 year ago +1

    The "query", "key", and "value" terms come from the world of databases! So how do individual words in "My name is Ajay" each map to their own query, key and value semantically? That remains a bit foggy. I know you've shown random numbers in the example, but is there any semantic meaning to it? Is this the "embeddings" of the LLM?

  • @ProdbyKreeper
    @ProdbyKreeper 3 months ago

    appreciate!

  • @SIADSrikanthB
    @SIADSrikanthB 7 months ago

    I really like how you use Kannada language examples in your explanations.

  • @paull923
    @paull923 1 year ago

    Thx for your efforts!

  • @creativeuser9086
    @creativeuser9086 1 year ago +4

    How do we actually choose the dimensions of Q, K and V? Also, are they parameters that are fixed for each word in the English language, and do we get them from training the model? That part is a little confusing since you just mentioned that Q, V and K are initialized at random, so I assume they have to change during the training of the model.
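    One common convention, taken from the original Transformer paper rather than anything fixed in this video: d_q = d_k = d_v = d_model / num_heads. The q, k, v values are not fixed per English word; they are recomputed from each word's embedding by learned projection matrices, and it is those matrices (not the random numbers in the demo) that training updates.

    ```python
    d_model, num_heads = 512, 8
    d_k = d_v = d_model // num_heads   # 64 per head in the original paper's setup
    print(d_k, d_v)
    ```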

  • @rujutaawate5412
    @rujutaawate5412 1 year ago

    Thanks, @CodeEmporium / Ajay, for the great explanation!
    One quick question: can you please explain how the true values of Q, K, and V are actually computed? I understand that we start with random initialization, but do these get updated through something like backpropagation? If you already have a video on this, it would be great if you could state the name / redirect!
    Thanks once again for helping me speed up my AI journey! :)

    • @CodeEmporium
      @CodeEmporium  1 year ago

      That's correct, backprop will update these weights. For exact details, you can continue watching the playlist "Transformers From Scratch", where we will build a working transformer. This video was the first in that series. Hope you enjoy it :)

  • @virtualphilosophyjourney8897
    @virtualphilosophyjourney8897 11 months ago

    In which phase does the model take the pretrained info to decide the output?

  • @Slayer-dan
    @Slayer-dan 1 year ago

    Ustad 🙏

  • @SnehaSharma-nl9do
    @SnehaSharma-nl9do 9 months ago +2

    Kannada Represent!! 🖐

  • @gabrielnilo6101
    @gabrielnilo6101 1 year ago

    I sometimes stop the video and roll it back a few seconds to hear you explain something again, and I'm like: "No way that this works, this is insane." Some explanations of AI techniques are not enough, and yours are truly simple and easy to understand, thank you.
    Do you collab with anyone when making these videos, or is it all done by yourself?

    • @CodeEmporium
      @CodeEmporium  1 year ago +3

      Haha yea. Things aren’t actually super complicated. :) I make these videos on my own. Scripting, coding, research, editing. Fun stuff

  • @prasadhadkar1775
    @prasadhadkar1775 3 months ago

    I have a question: since we generated q, k and v randomly, how does the output you are getting in your Jupyter notebook have correct matrix values? Like, how is the value corresponding to "my" and "name" in the matrix high without any training?

  • @josephpark2093
    @josephpark2093 1 year ago

    I watched the video around 3 times but I still don't understand.
    Why are these awesome videos so unknown?

  • @commonguy7
    @commonguy7 1 month ago

    wow

  • @TechTrendSpectrum
    @TechTrendSpectrum 13 days ago

    Sir, I have an assignment to write a report on "Large Language Models are few-shot clinical information extractors",
    and I have to build such an LLM, and that is how I reached your video.
    Sir, can you please guide me?
    Always thankful!

  • @ayush_stha
    @ayush_stha 1 year ago

    In the demonstration, you generated the q, k & v vectors randomly, but in reality, what will the actual source of those values be?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Each of the q, k, v vectors will be a function of each word (or byte pair encoding) in the sentences. I say a “function” of the sentences since, to the word vectors, we add position encoding and then convert into q, k, v vectors via feed-forward layers. Some of the later videos in this “Transformers from scratch” playlist show code on exactly how it’s created. So you can check those out for more intel :)
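      A sketch of that pipeline (illustrative shapes and a standard sinusoidal encoding, not necessarily the exact code from the later videos): word vectors plus positional encoding, then three learned projections give q, k, v.

      ```python
      import numpy as np

      L, d_model, d_k = 4, 512, 64
      word_vecs = np.random.randn(L, d_model)                     # stand-in word vectors

      pos = np.arange(L)[:, None]
      i = np.arange(d_model)[None, :]
      angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
      pe = np.where(i % 2 == 0, np.sin(angles), np.cos(angles))   # sinusoidal positions

      x = word_vecs + pe                                          # position-aware inputs
      W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))
      q, k, v = x @ W_q, x @ W_k, x @ W_v
      print(q.shape, k.shape, v.shape)
      ```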

  • @wishIKnewHowToLove
    @wishIKnewHowToLove 1 year ago

    thx

  • @ajaytaneja111
    @ajaytaneja111 1 year ago

    Ajay, I don't think the point of capturing the context of the words 'after' has significance in language modelling. In language modelling you are predicting only the next word. For a task like machine translation, yes. Thus I don't think bi-directional RNNs have anything better to offer for language modelling than the regular (one-way) RNNs. Let me know what you think.

  • @govindkatyura7485
    @govindkatyura7485 1 year ago

    I have a few doubts:
    1. Do we use multiple FFNNs after the attention layer? Suppose we have 100 input words for the encoder, do 100 FFNNs get trained, one for each word? I checked the source code but they were using only one, so I'm confused how one FFNN can handle multiple embeddings, especially with a batch size.
    2. In the decoder, do we also pass multiple inputs, just like in the encoder layer, especially during training?
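    On question 1, a sketch of the usual setup (shapes are illustrative): each layer has one position-wise feed-forward network, and the same two weight matrices are applied at every word position and to every sequence in the batch, so there is no separate FFNN per input word.

    ```python
    import numpy as np

    batch, L, d_model, d_ff = 2, 100, 512, 2048
    x = np.random.randn(batch, L, d_model)          # output of the attention layer

    W1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
    W2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)

    hidden = np.maximum(0, x @ W1 + b1)             # ReLU, applied at every position
    out = hidden @ W2 + b2                          # same weights for all 100 positions
    print(out.shape)                                # (2, 100, 512)
    ```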

  • @anwarulislam6823
    @anwarulislam6823 1 year ago +1

    How could someone hack my brain waves and convolve them by evaluating my inner voice?
    May I know this procedure?
    #Thanks

    • @EngineeredFemale
      @EngineeredFemale 1 year ago +1

      Haha ikr. I felt the same. Was looking for a good Self attention video.

  • @McMurchie
    @McMurchie 1 year ago

    Hi, I noticed this has been added to the transformer playlist, but there are 2 unavailable tracks. Do I need them in order to get the full end-to-end grasp?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      You can follow the order of “transformers from scratch” playlist. This should be the first video in the series. Hope this helps and thanks for watching ! (It’s still being created so you can follow along :) )

  • @jonfe
    @jonfe 1 year ago

    I still don't understand the difference between Q, K and V. Can someone explain?

  • @thechoosen4240
    @thechoosen4240 1 year ago +2

    Good job bro, JESUS IS COMING BACK VERY SOON; WATCH AND PREPARE

  • @philhamilton3946
    @philhamilton3946 1 year ago

    What is the name of the textbook you are using?

    • @klam77
      @klam77 1 year ago

      If you watch the video carefully, the URL shows the books are "online" free-access bibles of the field.

  • @sometimesdchordstrikes...7876
    @sometimesdchordstrikes...7876 8 months ago

    @1:41 Here you have said that you want the context of the words that will be coming in the future, but in the masking part of the video you have said that it would be cheating to know the context of the words that are coming in the future.
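    On the apparent contradiction: the encoder's self-attention may look at words on both sides, but the decoder must not see tokens it has not generated yet, so its self-attention applies a causal mask before the softmax. A small sketch of that mask:

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    L = 4
    scores = np.random.randn(L, L)                      # q @ k.T for 4 positions
    mask = np.triu(np.ones((L, L)), k=1).astype(bool)   # True above the diagonal
    scores = np.where(mask, -np.inf, scores)            # hide "future" positions

    print(np.round(softmax(scores), 2))                 # upper triangle is all zeros
    ```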

  • @YT-yt-yt-3
    @YT-yt-yt-3 1 year ago

    I felt the q, k, v parameters were not explained very well. A similar-search analogy would give a better intuition for these parameters than explaining them as "what I can offer, what I actually offer".

  • @bkuls
    @bkuls 1 year ago +1

    Guru, doing well? I am also a Kannadiga!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Doin super well ma guy. Thanks for watching and commenting! :)

  • @thepresistence5935
    @thepresistence5935 1 year ago

    Bro, it's 100% better than your PPT videos.

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Thanks so much! Just exploring different styles :)

  • @kotcraftchannelukraine6118
    @kotcraftchannelukraine6118 1 year ago

    You forgot to show the most important thing: how to train self-attention with backpropagation. You forgot about the backward pass.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      This is the first video in a series of videos called “Transformers from scratch”. Later videos show how the entire architecture is trained. Hope you enjoy the videos!

    • @kotcraftchannelukraine6118
      @kotcraftchannelukraine6118 1 year ago

      @@CodeEmporium Thank you, I subscribed.

  • @ChethanaSomeone
    @ChethanaSomeone 1 year ago +2

    Seriously, are you from Karnataka? Your accent is so different, dude.

  • @azursmile
    @azursmile 8 months ago

    Lots of time on the mask, but none on training the attention matrix 🤔

  • @venkatsahith6795
    @venkatsahith6795 1 year ago

    Bro, why can't you work through an example while explaining?