The many amazing things about Self-Attention and why they work

  • Published 11 Jan 2025

COMMENTS • 24

  • @avb_fj
    @avb_fj  1 year ago +3

    Part 1 - Neural Attention: ua-cam.com/video/frosrL1CEhw/v-deo.html
    Part 3 - Transformers: ua-cam.com/video/0P6-6KhBmZM/v-deo.html

  • @eatchocolate
    @eatchocolate 25 days ago +1

    Very informative video. Thank you for the great step-by-step walkthrough ❤

  • @ControllerQuickSwaps
    @ControllerQuickSwaps 1 year ago +5

    Keep it up man, you're doing an amazing job. You have incredible production value.
    A small suggestion: I'd say talk a tiny bit slower when discussing technical details. I totally understand the words you're saying, but since you use very information-rich language, my brain needs a few more milliseconds to digest the meaning of each token. Not a huge issue though, you're doing great :)

    • @avb_fj
      @avb_fj  1 year ago +1

      Noted! Thanks for the kind words! 🙌🏼

  • @sharannagarajan4089
    @sharannagarajan4089 11 months ago +1

    Hey, I would love to know more about the way you compare self-attention to a feedforward layer that projects onto a different space. Do let me know what resource I can look at to learn more about this.

  • @soojinlee6191
    @soojinlee6191 5 months ago

    Thank you so much for making videos on transformers! Your explanations are very intuitive, among the best I've ever watched!

    • @avb_fj
      @avb_fj  5 months ago

      Wow, thanks! Glad you are enjoying the videos!

  • @saleemun8842
    @saleemun8842 1 year ago +1

    It's explained very well and easy to follow. I learned a lot, great work man!

  • @willikappler1401
    @willikappler1401 1 year ago +1

    Wonderful video, very well explained!

  • @shahriarshaon9418
    @shahriarshaon9418 1 year ago +1

    Can't appreciate it enough. Recently I cracked an interview and your videos helped me a lot. I was searching for your email, but couldn't find it. However, what I was going to request is whether you could make tutorials; by that I don't mean tutorials from scratch, but rather whether you have any plans to make videos on a capstone-project basis. Anyway, I can see your channel shining, all the best. Thanks again.
    😍

    • @avb_fj
      @avb_fj  1 year ago +3

      That's awesome dude! Thanks for your kind words and congrats on the interview! I generally don't make tutorials, and I have a long list of video ideas waiting in the backlog (I have a full-time job, so it's hard to find the time to make comprehensive tutorials), but I definitely plan to get into that space eventually. If you have any specific project idea you want to see covered, let me know in the comments.

    • @shahriarshaon9418
      @shahriarshaon9418 1 year ago

      @@avb_fj Yeah, sure. Actually, I am going to work on reconstructing CT scan images from MRI images and also predicting treatment doses for glioma patients, and I am planning to incorporate transformer-based deep learning models. If you have any suggestions or guidance, they would be highly appreciated. Thanks again and all the best for your upcoming videos; I hope to watch all of them.

  • @sharvani_0779
    @sharvani_0779 9 months ago +2

    Great content and simple explanation!

    • @avb_fj
      @avb_fj  9 months ago

      Glad you liked it!

  • @hieunguyentranchi947
    @hieunguyentranchi947 2 months ago

    WONDERFUL VIDEO!!! Btw, do you have any resources that talk about the analogy between "WX+B" in a perceptron and "softmax(QK)V+X" in transformers, and how transformers are an adaptive learning framework? And what was the talk that you mentioned in this video?

    • @avb_fj
      @avb_fj  2 months ago

      I wish I could find the talk I learnt that from. I remember reading/watching it somewhere during my university days, and it sort of stuck with me. I tried to find the resource back when I was working on this video, but unfortunately I couldn't. There are surprisingly few resources online about this.
      Anyway, here is a great paper that contains mathematical proofs of many important things about attention/transformers:
      arxiv.org/pdf/1912.10077
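
      For reference, the analogy being discussed can be written out with the standard scaled-dot-product attention formulation (Vaswani et al.); this is a generic sketch, not the notation from the talk mentioned above, which could not be located. A dense/perceptron layer applies a matrix that is learned and then frozen, while self-attention builds its mixing matrix from the input itself:

      y = Wx + b                                                  % dense layer: W, b fixed at inference

      Q = XW_Q, \quad K = XW_K, \quad V = XW_V
      Y = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V + X   % the softmax factor plays the role of W

      In both cases the learned parameters (W, b or W_Q, W_K, W_V) are fixed after training; the "adaptive" part is only the softmax(QK^T / sqrt(d_k)) matrix, which is recomputed from every new input X.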

  • @carloslfu
    @carloslfu 1 year ago +1

    Great content!

  • @saisaigraph1631
    @saisaigraph1631 2 months ago

    Bro Great... Thank you...

    • @avb_fj
      @avb_fj  2 months ago

      Glad you liked it!

  • @ramanShariati
    @ramanShariati 1 month ago

    LEGEND !

  • @Pokemon00158
    @Pokemon00158 1 year ago +1

    I do not really understand: when you say "adaptive", in the sense that a dense/weight layer is "fixed", what does that mean in practice? By the same logic, the dense/weight layer is also "adaptive" when it sees input data, since backpropagation changes the values inside it.

    • @avb_fj
      @avb_fj  1 year ago +1

      Aha, I see the point. Let me clarify.
      The adaptive nature I was referring to is when we are doing inference on an already trained neural network. Backprop is done only when we are training the network, but once it is trained, the weights of dense layers remain fixed and constant for every input.
      In self-attention, the key, query, and value networks also remain fixed, so each new input goes through the same multiply-add ops to derive the K, Q, V… but these combine to generate a new weight matrix that produces the final output (as shown in the video). Hope that clarifies it.
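
      To make this distinction concrete, here is a minimal NumPy sketch (single head, no biases or output projection; all names are illustrative, not code from the video). The dense layer reuses the same frozen W for every input, while self-attention recomputes an n-by-n mixing matrix from each input at inference time:

      import numpy as np

      def softmax(z, axis=-1):
          z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      rng = np.random.default_rng(0)
      n, d = 5, 8                                   # sequence length, embedding size

      # Dense layer: W and b are frozen after training and identical for every input.
      W = rng.normal(size=(d, d))
      b = rng.normal(size=d)

      def dense(X):
          return X @ W.T + b                        # same W, b no matter what X is

      # Self-attention: Wq, Wk, Wv are also frozen, but the mixing matrix
      # A = softmax(Q K^T / sqrt(d)) is recomputed from every new input X.
      Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

      def self_attention(X):
          Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T
          A = softmax(Q @ K.T / np.sqrt(d))         # input-dependent (n x n) "weight" matrix
          return A @ V

      X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
      # The same fixed W is applied to both inputs below, whereas
      # self_attention() derives a different A for each of them.
      print(dense(X1).shape, self_attention(X1).shape)   # (5, 8) (5, 8)
      print(dense(X2).shape, self_attention(X2).shape)   # (5, 8) (5, 8)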

  • @deekshitht786
    @deekshitht786 14 days ago

    You are awesome ❤

    • @avb_fj
      @avb_fj  13 days ago

      Haha you are awesome too!