All The Math You Need For Attention In 15 Minutes

  • Published Dec 25, 2024

COMMENTS • 33

  • @ritvikmath
    @ritvikmath  3 months ago +7

    Ongoing Notes:
    1. I should note that the concept of which words pay attention to which others doesn't always line up with our human expectations. In this video, I frequently claim that "meal" should attend to "savory" and "delicious" but if you look at the attention weights matrix at 9:25, "meal" attends the most to "savory" but not so much to "delicious". In reality, the model is going to do what it needs to do to excel at next word prediction, which might mean taking a different approach to setting the attention layer weights than what our human brains would "neatly expect". Still, the illustration of "meal" attending to "savory" and "delicious" is usually correct, but I wanted to clarify that it's not guaranteed and that's not a bad thing.
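    A minimal sketch of the attention computation described in the note above, assuming toy random embeddings and illustrative dimensions (the token list, d_k, and the projection matrices are stand-ins, not values from the video): each row of the softmaxed score matrix is a distribution over which tokens a word attends to, and the output for that word is the corresponding weighted sum of value vectors.

    ```python
    import numpy as np

    # Toy sentence; embeddings and projections are random stand-ins, not learned weights.
    tokens = ["the", "savory", "delicious", "meal"]
    d_model, d_k = 8, 4
    rng = np.random.default_rng(0)

    X = rng.normal(size=(len(tokens), d_model))   # token embeddings (illustrative)
    W_q = rng.normal(size=(d_model, d_k))         # query projection (would be learned)
    W_k = rng.normal(size=(d_model, d_k))         # key projection (would be learned)
    W_v = rng.normal(size=(d_model, d_k))         # value projection (would be learned)

    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Scaled dot-product scores: entry (i, j) says how strongly token i attends to token j.
    scores = Q @ K.T / np.sqrt(d_k)

    # Softmax each row so the attention weights for a given token sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output vector is the weighted sum of the value vectors for that row of weights.
    output = weights @ V

    print(np.round(weights, 2))  # rows: attending token, columns: attended-to token
    ```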

  • @kumaraman147
    @kumaraman147 2 months ago +2

    Best video I have ever seen for explaining the attention mechanism; attention is finally clear to me now ❤

  • @YourDailyR
    @YourDailyR 2 months ago

    This channel is gold!!!

  • @adastraprojects9426
    @adastraprojects9426 9 days ago

    Crystal clear explanation, man. Immediately subscribed!

  • @eingram
    @eingram 3 months ago +3

    Perfect timing, learning about this in class right now!

  • @tantzer6113
    @tantzer6113 3 months ago +5

    Question: LLMs obviously 1) account for hierarchies of concepts/abstractions, and 2) perform complicated, decision-tree-like logical operations on those concepts (and words). Having read about attention and watched a dozen videos on it, I have never encountered an explanation of how attention can do these things. My guess is that the stacking of attention layers is instrumental in all of that, but I have seen no discussion or explanation of this.

    • @ScilentE
      @ScilentE 2 months ago

      I'm not sure if I would say LLMs "obviously" do those two things, but they are certainly emergent behaviors due to increases in compute. Scaling laws are pretty cool!

  • @Omsip123
    @Omsip123 3 months ago +1

    Liked, subscribed, and commented. This is pure gold!

  • @rubncarmona
    @rubncarmona 3 months ago +1

    Great job. I've been studying the subject on my own and had missed the visualization of vector sums in the value space. Thanks for posting.

    • @ritvikmath
      @ritvikmath  3 months ago +2

      Glad it was helpful!

  • @bin4ry_d3struct0r
    @bin4ry_d3struct0r 3 months ago +1

    Fantastic explanation! For the next videos in this series, please touch upon the role of the residual connection. I'm still iffy on what it's doing.

  • @Darkev77
    @Darkev77 3 months ago +2

    Oof, I really needed this a while ago, finally!

    • @ritvikmath
      @ritvikmath  3 months ago

      Sorry to be late but I hope it was worth it!

  • @softerseltzer
    @softerseltzer 2 months ago

    Great explanation, loved it!

  • @horoshuhin
    @horoshuhin 3 months ago

    Yessssss, let's talk about those in the next videos. This is a great channel because of the way you explain things. I don't know if it's too far ahead, but it would be awesome to see some small code examples too.

  • @jessicatran5467
    @jessicatran5467 3 months ago

    Thank you for these videos!!!

  • @sonaliganguli6553
    @sonaliganguli6553 3 months ago +4

    I waited for this for months...

    • @ritvikmath
      @ritvikmath  3 months ago +1

      Sorry for the wait! Hope it is worth it 😎

  • @vedantvashi9051
    @vedantvashi9051 3 months ago +1

    Can you do a video on how inputs (e.g., words, video, audio) are tokenized into vectors?

  • @juanluisesteban7394
    @juanluisesteban7394 3 months ago

    This is great. Thanks!

  • @VictorAlmeida27
    @VictorAlmeida27 3 months ago

    Are the attention values for a word laid out along the rows of the attention matrix? What do the columns represent? I always imagined this matrix to be like a covariance matrix, but by the looks of it I couldn't be more wrong.

  • @TechWithAbee
    @TechWithAbee 3 months ago

    ❤ Thanks

  • @radionnazmiev546
    @radionnazmiev546 2 months ago

    Amazing video! It would be nice to see how it's actually calculated on a small, few-word sentence.