Attention for Neural Networks, Clearly Explained!!!

  • Published 2 Jan 2025

COMMENTS • 442

  • @statquest  1 year ago +10

    To learn more about Lightning: lightning.ai/
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @koofumkim4571  1 year ago +76

    “StatQuest is all you need” - I really needed this video for my NLP course, but I'm glad it's out now. I got an A+ for the course; your precious videos helped a lot!

  • @Cld136  1 year ago +8

    Thanks for the wholesome content! Looking forward to a StatQuest video on the Transformer.

    • @statquest  1 year ago +1

      Wow!!! Thank you so much for supporting StatQuest!!! I'm hoping the StatQuest on Transformers will be out by the end of the month.

  • @atharva1509  1 year ago +144

    Somehow Josh always figures out what video we are going to need!

    • @yashgb  1 year ago +1

      Exactly, I was gonna say the same 😃

    • @statquest  1 year ago +16

      BAM! :)

    • @yesmanic  1 year ago +2

      Same here 😂

  • @MelUgaddan  1 year ago +12

    The level of explainability from this video is top-notch. I always watch your video first to grasp the concept then do the implementation on my own. Thank you so much for this work !

  • @sameepshah3835  6 months ago +2

    The amount of effort for some of these animations, especially in these videos on Attention and Transformers, is insane. Thank you!

    • @statquest  6 months ago

      Glad you like them!

  • @XDog1908  1 year ago +6

    This channel is pure gold. I'm a machine learning and deep learning student.

  • @Travel-Invest-Repeat  1 year ago +8

    Great work, Josh! Listening to my deep learning lectures and reading papers becomes way easier after watching your videos, because you explain the big picture and the context so well!! Eagerly waiting for the transformers video!

  • @sinamon6296  1 year ago +3

    Hi Mr. Josh, just wanna say that there is literally no one who makes it so easy for me to understand such complicated concepts. Thank you! Once I get a job I will make sure to give you guru dakshina! (meaning, an offering from students to their teachers)

    • @statquest  1 year ago

      Thank you very much! I'm glad my videos are helpful! :)

  • @clockent  1 year ago +23

    This is awesome mate, can't wait for the next installment! Your tutorials are indispensable!

  • @rutvikjere6392  1 year ago +9

    I was literally trying to understand attention a couple of days ago and Mr. BAM posts a video about it. Thanks 😊

  • @dylancam812  1 year ago +22

    Dang, this came out just 2 days after my neural networks final. I'm still so happy to see this video in my feed. You do such great work, Josh! Please keep it up for all the computer scientists and statisticians that love your videos and eagerly await each new post.

    • @statquest  1 year ago +2

      Thank you very much! :)

    • @Neiltxu  1 year ago +1

      @@statquest it came out 3 days before my Deep Learning and NNs final. BAM!!!

    • @statquest  1 year ago

      @@Neiltxu Awesome! I hope it helped!

    • @Neiltxu  1 year ago

      @@statquest for sure! Your videos always help! Btw, do you ship to Spain? I like the hoodies in your shop.

    • @statquest  1 year ago

      @@Neiltxu I believe the hoodies ship to Spain. Thank you for supporting StatQuest! :)

  • @aquater1120  1 year ago +4

    I was just reading the original attention paper and then BAM! You uploaded the video. Thank you for creating the best content on AI on UA-cam!

  • @brunocotrim2415  7 months ago +1

    Hello StatQuest, I would like to say thank you for the amazing job. This content helped me understand a lot about how attention works, especially because visual things help me understand better, and the way you join the visual explanation with the verbal one while keeping it interesting is on another level. Amazing work!

  • @aayush1204  1 year ago +3

    1 million subscribers INCOMING!!!
    Also huge thanks to Josh for providing such insightful videos. These videos really make everything easy to understand, I was trying to understand Attention and BAM!! found this gem.

    • @statquest  1 year ago +1

      Thank you very much!!! BAM! :)

  • @OsamaAlatraqchi  6 months ago +1

    This is the best explanation ever, not only in this video, but the entire course... Thanks a lot!

    • @statquest  6 months ago

      Glad you are enjoying the whole course.

  • @benmelis4117  8 months ago +1

    I just wanna let you know that this series is absolutely amazing. So far, as you can see, I've made it to the 89th video, guess that's something. Now it's getting serious tho. Again, love what you're doing here man!!! Thanks!!

    • @statquest  8 months ago +1

      Thank you so much!

    • @benmelis4117  8 months ago +1

      @@statquest Personally, since I'm a medical student, I really can't explain how valuable it is to me that you used so many medical examples in the videos. The moment you said in one of the first videos that you are a geneticist I was sold on this series; it's one of my favorite subjects at uni, crazy interesting!

    • @statquest  8 months ago +1

      @@benmelis4117 BAM! :)

  • @lunamita  10 months ago +1

    Can't thank this guy enough; he helped me get my master's degree in AI back in 2022. Now I'm working as a data scientist and I still keep going back to your videos.

  • @ArpitAnand-yd7tr  1 year ago +2

    The best explanation of Attention that I have come across so far ...
    Thanks a bunch❤

  • @usser-505  1 year ago +2

    The end is a classic cliffhanger for the series. You talk about how we don't need the LSTMs, and I waited an entire summer for transformers. Good job! :)

    • @statquest  1 year ago

      Ha! The good news is that you don't have to wait! You can binge! Here's the link to the transformers video: ua-cam.com/video/zxQyTK8quyY/v-deo.html

    • @usser-505  1 year ago +1

      @@statquest Yeah! I already watched it when you released it. I commented on how this deep learning playlist is becoming a series! :)

    • @statquest  1 year ago

      @@usser-505 bam!

  • @ncjanardhan  8 months ago +2

    The BEST explanation of Attention models!! Kudos & Thanks 😊

    • @statquest  8 months ago

      Thank you very much!

  • @saschahomeier3973  1 year ago +2

    You have a talent for explaining these things in a straightforward way. Love your videos. You don't have a video about Transformers yet, right?

    • @statquest  1 year ago +1

      The transformers video is currently available to channel members and patreon supporters.

  • @jacobverrey4075  1 year ago +1

    Josh - I've read the original papers and countless online explanations, and this stuff never makes sense to me. You are the one and only reason as to why I understand machine learning. I wouldn't be able to make any progress on my PhD if it wasn't for your videos.

    • @statquest  1 year ago

      Thanks! I'm glad my videos are helpful! :)

  • @weiyingwang2533  1 year ago +1

    You are amazing! The best explanation I've ever found on UA-cam.

  • @rafaeljuniorize  9 months ago +1

    This was the most beautiful explanation that I ever had in my entire life, thank you!

  • @mehmeterenbulut6076  1 year ago +2

    I was stunned when you started the video with a catchy jingle, man, cheers :D

  • @Aaron-n2d6i  3 days ago +2

    I was looking for how attention works in biological neural networks when I bumped into this video, and after two songs I finally realized it is about computers, but anyway, it is fascinating🎉

    • @Aaron-n2d6i  3 days ago +1

      It's really hilarious that I neglected the computing meaning of attention, but if this comment makes you smile, I think it's worthwhile.

    • @statquest  2 days ago

      bam!

  • @MartinGonzalez-wn4nr  1 year ago +4

    Hi Josh, I just bought your books. It's amazing the way that you explain complex things; reading the papers after watching your videos is easier.
    NOTE: waiting for the video on transformers

    • @statquest  1 year ago +2

      Glad you like them! I hope the video on Transformers is out soon.

  • @KevinKansas1  1 year ago +6

    The way you explain complex subjects in an easy-to-understand format is amazing! Do you have an idea of when you will release a video about transformers? Thank you Josh!

    • @statquest  1 year ago +6

      I'm shooting for the end of the month.

    • @JeremyHalfon  1 year ago

      @@statquest Hi Josh, any update on the following? Would definitely need it for my final tomorrow :))

    • @statquest  1 year ago +1

      @@JeremyHalfon I'm finishing my first draft today. Hope to edit it this weekend and record next week.

  • @nikolamarkovic9906  1 year ago +5

    for this video attention is all you need

  • @won20529jun  1 year ago +1

    I was literally just thinking I'd love an explanation of attention by SQ!!! Thanks for all your work.

  • @Ghost-ip3bx  6 months ago

    Hi StatQuest, I've been a long-time fan; your videos have helped me TREMENDOUSLY. For this video, I felt that if we could first get a larger picture of how attention works (how different words can have different weights, i.e., attending to them differently) and then go through a run with actual values, it'd be great! :) I also felt that the arrows and diagrams got a bit confusing in this one. Again, this is only constructive criticism, and maybe it works for others and just not for me. Nonetheless, thank you so much for all the time and effort you put into making your videos. You're helping millions of people out there clear their degrees and achieve life goals.

    • @statquest  6 months ago +1

      Thanks for the feedback! I'm always trying to improve how I make videos. Anyway, I work through the concepts more in my videos on transformers: ua-cam.com/video/zxQyTK8quyY/v-deo.html and if the diagrams are hard to follow, I also show how it works using matrix math: ua-cam.com/video/KphmOJnLAdI/v-deo.html

  • @ArpitAnand-yd7tr  1 year ago +1

    Really looking forward to your explanation of Transformers!!!

  • @abdullahbinkhaledshovo4969  1 year ago +1

    I have been waiting for this for a long time

    • @statquest  1 year ago

      Transformers comes out on Monday...

  • @rathinarajajeyaraj1502  1 year ago +1

    Much awaited one... Awesome as always.

  • @rrrprogram8667  1 year ago +1

    Excellent, Josh... So finally the MEGA BAMMM is approaching...
    Hope u r doing good...

    • @statquest  1 year ago

      Yes! Thank you! I hope you are doing well too! :)

  • @abrahammahanaim3859  1 year ago +1

    Hey Josh, your explanation is easy to understand. Thanks!

  • @rajatjain7894  1 year ago +1

    Was eagerly waiting for this video.

  • @imkgb27  1 year ago +2

    Many thanks for your great video!
    I have a question. You said that we calculate the similarity score between 'go' and EOS (11:30). But I think the vector (0.01,-0.10) is the context vector for "let's go" instead of "go" since the input includes the output for 'Let's' as well as the embedding vector for 'go'. It seems that the similarity score between 'go' and EOS is actually the similarity score between "let's go" and EOS. Please make it clear!

    • @statquest  1 year ago +1

      You can talk about it either way. Yes, it is the context vector for "Let's go", but it's also the encoding, given that we have already encoded "Let's", of the word "go".
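
      The similarity-score step this thread discusses can be sketched in a few lines of numpy. The numbers below are illustrative, loosely based on values quoted in these comments rather than read exactly from the video:

      ```python
      import numpy as np

      # Hypothetical 2-D encoder LSTM outputs (short-term memories / hidden states).
      encoder_states = np.array([[-0.76, 0.75],    # encoding of "Let's"
                                 [ 0.01, -0.10]])  # encoding of "go", given "Let's"
      decoder_eos = np.array([0.91, 0.38])         # decoder LSTM output for <EOS>

      # Similarity scores: one dot product per encoder step.
      scores = encoder_states @ decoder_eos

      # Softmax turns the scores into attention weights that sum to 1.
      weights = np.exp(scores) / np.exp(scores).sum()

      # Attention values: weighted sum of the encoder states.
      attention = weights @ encoder_states
      print(scores, weights, attention)
      ```

      Note how the weighted sum means each attention value blends information from every input word, with more weight on the more similar ones.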

  • @AntiPolarity  1 year ago +2

    can't wait for the video about Transformers!

  • @sourabhverma9034  5 months ago

    This is called Luong attention. In the earlier version, a simple neural net, trained along with the rest of the RNN, was used to get the similarity scores instead of the dot product; that older version was called Bahdanau attention.
    Thank you for the amazing video. I had to watch it twice to make sense of it, but it is amazingly done. If I can make a request/suggestion, showing the mathematical equations sometimes helps in making sense of things, so if you can include them in future videos, that would be great.

    • @statquest  5 months ago

      I'll keep that in mind.
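
      The contrast drawn above (dot-product vs. small-neural-net scoring) can be sketched as below. The weight matrices are random placeholders standing in for learned parameters, not values from the video:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      h_enc = rng.normal(size=(5, 4))  # 5 encoder hidden states, each of size 4
      h_dec = rng.normal(size=4)       # one decoder hidden state

      # Luong-style (dot-product) score: no extra parameters.
      luong_scores = h_enc @ h_dec

      # Bahdanau-style (additive) score: a tiny feed-forward net with learned
      # weights W1, W2, v -- randomly initialized here just to show the shapes.
      W1 = rng.normal(size=(4, 8))
      W2 = rng.normal(size=(4, 8))
      v = rng.normal(size=8)
      bahdanau_scores = np.tanh(h_enc @ W1 + h_dec @ W2) @ v

      print(luong_scores.shape, bahdanau_scores.shape)  # both (5,)
      ```

      Either way, the result is one score per encoder step, which is then pushed through a softmax.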

  • @aniket_mishr  3 months ago +1

    Thanks for the amazing explanation. TRIPLE BAM!!!

  • @machinelearninggoddess  1 month ago +1

    3:14 That and the vanishing gradient problem is a key factor. NNs update themselves with gradient descent, basically derivatives, and the deeper the LSTM, the more we are applying the derivative of a derivative of a derivative, and so on, to the gradient value. Since the original loss gradient is reduced astronomically every time a derivative is taken, beyond a dozen or so LSTM cells the gradient might become 0, and this results in the earlier LSTMs literally not learning. So not only do LSTMs not remember stuff from words long ago, they can't learn how to deal with words long ago either; a double whammy :(

    • @statquest  1 month ago +1

      bam! :)

    • @machinelearninggoddess  1 month ago +2

      @@statquest It's a double bam but it is directed at our faces and our NN, not at the problem we are trying to solve, which is really bad :(
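
      The vanishing-gradient effect described above is easy to demonstrate numerically. The 0.25 below is just an illustrative per-step derivative (the maximum slope of the sigmoid), not a value from the video:

      ```python
      # Backpropagating through a long chain of cells multiplies many
      # derivatives together; if each has magnitude < 1, the product
      # collapses toward 0, so early time steps barely learn.
      grad = 1.0
      per_step_derivative = 0.25  # e.g. the max slope of the sigmoid
      for step in range(50):
          grad *= per_step_derivative
      print(grad)  # astronomically small, effectively zero
      ```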

  • @chessplayer0106  1 year ago +4

    Ah excellent this is exactly what I was looking for!

    • @statquest  1 year ago +1

      Thank you!

    • @birdropping  1 year ago +1

      @@statquest Can't wait for the next episode on Transformers!

  • @okay730  1 year ago +2

    I'm excited for the video about transformers. Thank you Josh, your videos are extremely helpful

  • @naomilago  1 year ago +1

    The songs sung before the videos are contagious ❤

  • @abdullahhashmi654  1 year ago +1

    Been wanting this video for so long, gonna watch it soon!

  • @souravdey1227  1 year ago +1

    Had been waiting for this for months.

  • @hasansayeed3309  1 year ago +1

    Amazing video Josh! Waiting for the transformer video. Hopefully it'll come out soon. Thanks for everything!

    • @statquest  1 year ago

      Thanks! I'm working on it! :)

  • @rikki146  1 year ago +1

    When I see a new vid from Josh, I know today is a good day! BAM!

  • @envynoir  1 year ago +1

    Godsend! Just what I needed! Thanks Josh.

  • @familywu3869  1 year ago +2

    Thank you for the excellent teaching, Josh. Looking forward to the Transformer tutorial. :)

  • @x7A9cF2k  5 months ago +1

    Josh! Again, to get some attention with a cup of coffee, Double BAM!!

  • @yizhou6877  1 year ago +2

    I am always amazed by your tutorials! Thanks. When can we expect the transformer tutorial to be uploaded?

  • @thanhtrungnguyen8387  1 year ago +1

    can't wait for the next StatQuest

    • @statquest  1 year ago

      :)

    • @thanhtrungnguyen8387  1 year ago

      @@statquest I'm currently trying to fine-tune RoBERTa, so I'm really excited about the upcoming video; I hope future videos will also talk about BERT and fine-tuning BERT.

    • @statquest  1 year ago +1

      @@thanhtrungnguyen8387 I'll keep that in mind.

  • @Murattheoz  1 year ago +19

    I feel like I am watching a cartoon as a kid. :)

    • @statquest  1 year ago +1

      bam!

    • @Namenlos-r8f  4 months ago

      This is the first time I've seen a Turk on this platform. Are you a computer engineering student?

  • @tupaiadhikari  1 year ago

    At 13:38, are we concatenating the attention values and the output of the decoder LSTM for the translated word (EOS in this case), and then using a weight matrix of dimensions (4×4) to convert it into a dimension-4 pre-softmax output?

    • @statquest  1 year ago

      yep

    • @statquest  1 year ago +1

      If you want to see a more detailed view of what is going on at that stage, check out my video on Transformers: ua-cam.com/video/zxQyTK8quyY/v-deo.html In that video, I go over every single mathematical operation, rather than gloss over them like I do here.

    • @tupaiadhikari  1 year ago +1

      @@statquest Thank you, Professor Josh, for the clarifications!
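
      The concatenate-then-fully-connected step confirmed in this thread can be sketched as below. The weights and biases are random placeholders for what would be learned during training, and the input numbers are illustrative:

      ```python
      import numpy as np

      attention_values = np.array([-0.3, 0.3])  # illustrative attention values
      decoder_output = np.array([0.9, 0.4])     # illustrative <EOS> LSTM outputs

      # Concatenate into a length-4 vector, then apply a fully connected layer
      # with a 4x4 weight matrix and 4 biases.
      x = np.concatenate([attention_values, decoder_output])
      rng = np.random.default_rng(42)
      W = rng.normal(size=(4, 4))
      b = rng.normal(size=4)
      logits = x @ W + b

      # Softmax over the 4 output-vocabulary tokens gives the word probabilities.
      probs = np.exp(logits) / np.exp(logits).sum()
      print(probs.sum())  # 1.0
      ```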

  • @madjohnshaft  1 year ago +1

    I am currently taking the AI cert program from MIT. I thank you for your channel.

  • @capyk5455  1 year ago +1

    You're amazing Josh, thank you so much for all this content

  • @d_b_  1 year ago +1

    Thanks for this. The way you step through the logic is always very helpful

  • @manuelcortes1835  1 year ago

    I have a question that could benefit from clarification: in the final FC layer for word predictions, it is claimed that the Attention Values and 'encodings' are used as input (13:38). By 'encodings', do we mean the short-term memories from the top LSTM layer in the decoder?

    • @statquest  1 year ago +2

      Yes. We use both the attention values and the LSTM outputs (short-term memories or hidden states) as inputs to the fully connected layer.

  • @tupaiadhikari  1 year ago +1

    Thanks, Professor Josh, for such a great tutorial! It was very informative!

  • @sabaaslam781  1 year ago +1

    Hi Josh! No doubt, you teach in the best way. I have a request: I have been enrolled in a PhD and am going to start my work on graphs. Can you please make a video about Graph Neural Networks and their variants? Thanks.

  • @megalazer6378  6 months ago

    Could you please help me understand what math you did with the values [-0.3, 0.3, 0.9, 0.4] to get [-0.7, 4.7, -2, -2] at 13:42?

    • @statquest  6 months ago +1

      That fully connected layer is just a normal neural network with weights connecting every input to every output and bias terms in front of the outputs. For more details (to see the actual math), check out my video on Transformers: ua-cam.com/video/zxQyTK8quyY/v-deo.html

    • @megalazer6378  6 months ago +1

      @@statquest Thank you!

  • @Sarifmen  1 year ago

    13:15 So the attention for EOS is just 1 number (per LSTM cell) which combines references to all the input words?

  • @Xayuap  1 year ago +2

    weeeeee,
    video for tonite,
    thanks a lot

  • @ahmedwesam7286  23 days ago +1

    You are amazing, BAM!!

  • @luvxxb  1 year ago +1

    Thank you so much for making these great materials.

  • @akashat1836  9 months ago +1

    Hey Josh! Firstly, thank you so much for this amazing content!! I can always count on your videos for a better explanation!
    One quick clarification: before the fully connected layer, the first two numbers we get are from [scaled(input1-cell1) + scaled(input2-cell1)] and [scaled(input1-cell2) + scaled(input2-cell2)], right?
    And the other two numbers are from the outputs of the decoder, right?

    • @statquest  9 months ago

      Yes.

    • @akashat1836  9 months ago +1

      @@statquest Thank you for the clarification!

  • @mrstriker1847  1 year ago +3

    Please add this to the neural network playlist! Or don't; it's your video. I just want to be able to find it when I'm looking for it to study for class.

    • @statquest  1 year ago

      I'll add it to the playlist, but the best place to find my stuff is here: statquest.org/video-index/

  • @lequanghai2k4  1 year ago +1

    I am still learning this, so I hope the next video comes out soon.

    • @statquest  1 year ago

      I'm working on it as fast as I can.

  • @patrikszepesi2903  1 year ago

    Hi, great video. At 13:49, can you please explain how you get -0.3 and 0.3 for the input to the fully connected layer? Thank you.

    • @statquest  1 year ago +1

      The outputs from the softmax function are multiplied with the short-term memories coming out of the encoder's LSTM units. We then add those products together to get -0.3 and 0.3.
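
      That multiply-and-add step can be checked directly. The numbers below are the rounded values quoted in these comments (0.4 and 0.6 as the softmax attention weights; -0.76, 0.75, and 0.01 as encoder short-term memories), so treat them as illustrative:

      ```python
      # Softmax attention weights for the two input words ("Let's", "go").
      weights = [0.4, 0.6]

      # Encoder short-term memories, one list per LSTM cell.
      cell1_memories = [-0.76, 0.01]
      cell2_memories = [0.75, 0.01]

      # Attention value per cell: weight each memory by its softmax
      # output, then sum the products.
      attn1 = sum(w * m for w, m in zip(weights, cell1_memories))
      attn2 = sum(w * m for w, m in zip(weights, cell2_memories))
      print(round(attn1, 1), round(attn2, 1))  # -0.3 0.3
      ```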

  • @mymy-bi8ze  3 months ago

    Thanks as always for your great videos!!! This video was a little difficult for me. Can I ask a stupid question? At 13:31, the fully connected layer is addressed out of the blue, in my understanding. The inputs are the attention values and the EOS encoding, but how can the fully connected layer, which I think would have no information about the translated sentence's encodings, generate 'vamos'???

    • @statquest  3 months ago +1

      The entire model, the weights and biases in the LSTM units and the weights and biases in the fully connected layer, is trained with backpropagation. So we quantify how similar the output is to the desired output and modify the weights and biases (all of them) based on that, in an iterative way, until we get the desired output.

    • @mymy-bi8ze  3 months ago

      @@statquest Thank you so much for your reply! My question is why that fully connected layer is necessary. It isn't needed in Seq2Seq and vanilla LSTMs, so I'm wondering why it is necessary with attention and what its role is...

    • @statquest  3 months ago +1

      @@mymy-bi8ze We need something to combine the attention values with the values that come out of the decoder LSTMs.

    • @mymy-bi8ze  3 months ago +1

      @@statquest Thank you so much! I went back to Seq2Seq, and it was there too! Now I got it. Thanks again!

  • @shamshersingh9680  5 months ago +1

    Hi Josh, thanks again for the awesomest video ever made on attention models. It is so wonderfully made that it makes such an involved concept crystal clear. However, I have one small doubt. Up to time step 14:37 you explained attention with a single layer of LSTMs. But what if we have two layers in the encoder and decoder, as in the previous Seq2Seq Encoder-Decoder video? In that case, how would the attention be calculated?
    My guess is that we would calculate similarity scores between the second-layer LSTM output for each token and the second-layer LSTM output of the decoder, and feed the final similarity scores to the fully connected layer along with the outputs of the second-layer LSTM cells.
    Or would we calculate similarity scores between the LSTM output of each layer in the encoder and each layer in the decoder, and pass those to the FC layer along with the output of the second layer in the decoder, since that is the final output from the decoder?
    Thanks a lot again for being our saviour; your presence makes this the best time to learn new things.

    • @statquest  4 months ago +1

      Thank you! I'm pretty sure we would calculate the similarities between each layer in the encoder and each layer in the decoder and pass them to a fully connected layer.

  • @theelysium1597  1 year ago

    Since you asked for video suggestions in another video: a video about the EM and Mean Shift algorithms would be great!

  • @dvm509  10 months ago

    It makes negative sense to me at 13:49. I can see why you get -0.3, which is -0.76 * 0.4, but what is the calculation for the 0.3 after that? If I multiply 0.01 * 0.6 and add it to -0.3, it wouldn't be 0.3.

    • @statquest  10 months ago

      (-0.76 * 0.4) + (0.01 * 0.6) = -0.3 and (-0.76 * 0.4) + (0.01 * 0.6) = 0.3

    • @dvm509  10 months ago

      Do you mean (0.75 * 0.4) + (0.01 * 0.6) = 0.3? For the second part.

    • @statquest  10 months ago

      Yes.

  • @yoshidasan4780  1 year ago +1

    First of all, thanks a lot Josh! You made it so understandable for us, and I will be forever grateful to you for this!! Have a nice time! And can you please upload videos on Bidirectional LSTMs and BERT?

    • @statquest  1 year ago

      I'll keep those topics in mind.

  • @JL-vg5yj  1 year ago +1

    Super clutch, my final is on Thursday. Thanks a lot!

  • @andresg3110  1 year ago +1

    You are on Fire! Thank you so much

  • @yuanyuan524  1 year ago +1

    Best tutorial on YouTube.

  • @gordongoodwin6279  1 year ago +1

    Fun fact: if your vectors are scaled/mean-centered, cosine similarity is geometrically equivalent to the Pearson correlation, and the dot product is the same as the covariance (un-scaled correlation), up to a factor of n.
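
    This fun fact checks out numerically; a quick verification with random vectors:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    a, b = rng.normal(size=10), rng.normal(size=10)

    # Mean-center both vectors.
    a_c, b_c = a - a.mean(), b - b.mean()

    # Cosine similarity of the centered vectors equals the Pearson correlation.
    cosine = (a_c @ b_c) / (np.linalg.norm(a_c) * np.linalg.norm(b_c))
    pearson = np.corrcoef(a, b)[0, 1]
    print(np.isclose(cosine, pearson))  # True

    # The dot product of the centered vectors is n times the (biased) covariance.
    cov = np.cov(a, b, bias=True)[0, 1]
    print(np.isclose(a_c @ b_c, len(a) * cov))  # True
    ```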

  • @The-Martian73  1 year ago +1

    Great, that's really what I was looking for. Thanks, Mr. Starmer, for the explanation ❤

  • @markus_park  1 year ago +1

    Thanks! This was a great video!

  • @juliank7408  11 months ago +1

    Phew! Lots of things in this model, my brain feels a bit overloaded, haha
    But thanks! Might have to rewatch this

  • @방향-o7z  1 month ago

    Goal: compute the similarity with EOS, the encoder's last token, and use it to produce the decoder's first token.
    11:52 For one token: build a layer for every token, including the other tokens / for each token's layer, compute how similar it is to EOS with the dot product. => Each token gets a score.
    12:31 Running those scores through softmax gives values between 0 and 1. The more similar tokens are used more when producing the decoder's first token.
    13:48 The decoder runs softmax again to generate the decoder's first token.
    The important thing is that at 11:52 the computation was done 'for one token' at a time.
    - Originally, the whole stack for all tokens was sent to the decoder and used to produce the decoder's first token,
    - whereas attention takes a dot product over the whole stack for each token and uses that to produce the decoder's first token.

  • @frogloki882  1 year ago +2

    Another BAM!

  • @andrewsiah  1 year ago +1

    Can't wait for the transformer video!

    • @statquest  1 year ago

      I'm making great progress on it.

  • @owlrion  1 year ago +1

    Hey! Great video; this is really helping me with neural networks at the university. Do we have a date for when the transformer video comes out?

  • @handsomemehdi3445  1 year ago

    Hello, thank you for the video, but I am confused because some terms introduced in the original 'Attention is All You Need' paper were not mentioned in the video, for example keys, values, and queries. Furthermore, in the paper, the authors don't talk about cosine similarity or LSTMs. Can you please clarify this a little better?

    • @statquest  1 year ago +1

      The "Attention is all you need" manuscript did not introduce the concept of attention. That was done years earlier, and that is what this video describes. If you'd like to understand the "Attention is all you need" concept of transformers, check out my video on transformers here: ua-cam.com/video/zxQyTK8quyY/v-deo.html

  • @tangt304  1 year ago

    Another awesome video! Josh, do you plan to talk about BERT? Thank you!

  • @Thepando20  1 year ago

    Hi, great video SQ, as always!
    I had the same question as @manuelcortes1835, and I understand that the encodings are the LSTM outputs. However, at 9:02 the outputs are 0.91 and 0.38; maybe I am missing something here?

    • @statquest  1 year ago +1

      Yes, and at 13:36 they are rounded to the nearest 10th so they can fit in the small boxes. Thus 0.91 is rounded to 0.9 and 0.38 is rounded to 0.4.

    • @Thepando20  1 year ago +1

      Thank you, all clear!

  • @carloschau9310  1 year ago +1

    Thank you, sir, for your brilliant work!

  • @orlandopalmeira623  9 months ago

    Hello, I have a doubt. Is the initialization of the decoder's cell state and hidden state a context vector that is the representation (generated by the encoder) of the entire input sentence? And what about each hidden state (from the encoder) used in the decoder? Are they stored somehow? Thanks!!!

    • @statquest  9 months ago +1

      1) Yes, the context vector is a representation of the entire input.
      2) The hidden states in the encoder are stored for attention.

    • @orlandopalmeira623  9 months ago +1

      @@statquest Thanks!!

  • @shaktisd  1 year ago

    I have one fundamental question about how an attention model learns. Basically, a higher attention score is given to those pairs of words which have a higher softmax(Q·K) similarity score. Now the question is how the relationships in the sentence "The cat didn't climb the tree as it was too tall" are calculated, and how the model knows that in this case "it" refers to the tree and not the cat. Is it the large amount of data the model reads that helps it distinguish the difference?

    • @statquest  1 year ago +1

      Yes. The more data you have, the better attention is going to work.

  • @automatescellulaires8543  1 year ago +1

    Wow, I didn't think I would see this kind of stuff on this channel.

  • @Sergio.Freitas  1 year ago +1

    Valeu! (Thanks!)

    • @statquest  1 year ago

      Thank you very much for supporting StatQuest!!! TRIPLE BAM!!! :)

  • @MelideCrippa  1 year ago +1

    Thank you very much for your explanation! You are always super clear. Will the transformer video be out soon? I have a natural language processing exam in a week and I just NEED your explanation to get through it 😂

    • @statquest  1 year ago +1

      Unfortunately I still need a few weeks to work on the transformers video... :(

  • @seifeddineidani3256  1 year ago +1

    Thanks Josh, great video! ❤ I hope you upload the transformer video soon :)

  • @Fahhne  1 year ago +1

    Nice video, can't wait for the video about transformers.
    (I imagine it will be the next one?)

  • @Rykurex  1 year ago

    Do you have any courses with start-to-finish projects for people who are only just getting interested in machine learning?
    Your explanations of the mathematical concepts have been great, and I'd be more than happy to pay for a course that implements some of these concepts in real-world examples.

    • @statquest  1 year ago

      I don't have a course, but hope to have one one day. In the meantime, here's a list of all of my videos somewhat organized: statquest.org/video-index/ and I do have a book called The StatQuest Illustrated Guide to Machine Learning: statquest.org/statquest-store/