Transformers - Part 1 - Self-attention: an introduction

  • Published 30 Sep 2024

COMMENTS • 18

  • @wangkuanlee3548
    @wangkuanlee3548 2 years ago +7

    Superb explanation. This is the clearest explanation of the concept of weight in self-attention I have ever heard. Thank you so much.

  • @mar-a-lagofbibug8833
    @mar-a-lagofbibug8833 3 years ago +3

    Thank you for sharing.

  • @kencheligeer3448
    @kencheligeer3448 3 years ago +3

    A brilliant explanation of self-attention!!! Thank you.

  • @kacemichakdi3048
    @kacemichakdi3048 2 years ago +1

    Thank you for your explanation. I just didn't understand how we choose W_k and W_q?

    • @lennartsvensson7636
      @lennartsvensson7636  2 years ago +1

      These matrices contain learnable parameters that can be trained using standard techniques from deep learning (a minimal sketch follows this thread).

    • @kacemichakdi3048
      @kacemichakdi3048 2 years ago

      Thank you
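  A minimal sketch of the point in the reply above, assuming a PyTorch implementation with made-up names and sizes (nothing below comes from the video): W_Q and W_K are ordinary weight matrices, initialized randomly and updated by gradient descent together with the rest of the network.

      import torch

      torch.manual_seed(0)
      d_model, d_k, seq_len = 8, 4, 5            # made-up sizes

      # W_Q and W_K are plain learnable parameters, initialized randomly.
      W_Q = torch.nn.Parameter(0.1 * torch.randn(d_k, d_model))
      W_K = torch.nn.Parameter(0.1 * torch.randn(d_k, d_model))
      optimizer = torch.optim.SGD([W_Q, W_K], lr=0.1)

      x = torch.randn(seq_len, d_model)          # one toy input sequence
      q = x @ W_Q.T                              # queries, shape (seq_len, d_k)
      k = x @ W_K.T                              # keys,    shape (seq_len, d_k)
      scores = torch.softmax(q @ k.T / d_k**0.5, dim=-1)

      # Any differentiable loss on the attention pattern (here a dummy one that
      # pushes each word to attend to itself) gives gradients for W_Q and W_K,
      # which are then updated like every other weight in the network.
      loss = torch.nn.functional.mse_loss(scores, torch.eye(seq_len))
      loss.backward()
      optimizer.step()

  In a full Transformer these matrices typically sit inside linear layers of each attention block, but the training mechanism is exactly the same.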

  • @mustafakocakulak5895
    @mustafakocakulak5895 3 years ago +1

    Best explanation ever :) Thank you

  • @murkyPurple123
    @murkyPurple123 3 years ago +1

    Thank you

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago

    The intuition buildup was amazing; you clearly explained why we need learnable parameters in the first place and how they can help relate similar words. Thanks for the explanation.

  • @euisasriani_01
    @euisasriani_01 2 years ago

    Thank you for the great explanation. I still don't understand how to obtain W_q and W_k.

  • @po-yupaulchen166
    @po-yupaulchen166 2 years ago

    Great and clear explanation. One question about W_Q and W_K. Since z_1 = k_1^T q_3 = x_1^T (W_K^T W_Q) x_3, and W_K and W_Q are trainable matrices, could we just combine them into a single matrix W_KQ = W_K^T W_Q to reduce the number of parameters?

    • @lennartsvensson7636
      @lennartsvensson7636  2 years ago

      What you are suggesting should be possible as long as the matrices are square (see the sketch below).
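  A quick numerical check of the exchange above (a sketch with made-up shapes, not code from the video): with square matrices the two projections can be merged into a single matrix W_KQ = W_K^T W_Q without changing the attention scores; how much this saves depends on how d_k compares to d_model.

      import torch

      torch.manual_seed(0)
      d_model = d_k = 6                          # the square case from the reply
      W_Q = torch.randn(d_k, d_model)
      W_K = torch.randn(d_k, d_model)
      x1, x3 = torch.randn(d_model), torch.randn(d_model)

      # Score with separate matrices: k_1^T q_3
      score_separate = (W_K @ x1) @ (W_Q @ x3)

      # The same score with one combined matrix W_KQ = W_K^T W_Q
      W_KQ = W_K.T @ W_Q                         # shape (d_model, d_model)
      score_combined = x1 @ W_KQ @ x3

      print(torch.allclose(score_separate, score_combined))   # True

      # Parameter counts: 2 * d_k * d_model for the separate matrices versus
      # d_model ** 2 for the merged one; with d_k < d_model the factored form
      # acts as a low-rank (and smaller) parameterization of W_KQ.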

  • @andrem82
    @andrem82 1 year ago

    Best explanation of self-attention I've seen so far. This is gold.

  • @exxzxxe
    @exxzxxe 2 years ago

    A first-class explanation of self-attention, the best on YouTube.

  • @jhnflory
    @jhnflory 2 years ago

    Thanks for putting these videos together!

  • @prasadkendre149
    @prasadkendre149 1 year ago

    Grateful forever.

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you !

  • @prateekpatel6082
    @prateekpatel6082 8 months ago

    Pretty bad example. Even if we have trainable W_q and W_k, what if there was a new sentence where we had Tom and he? The W_Q would still make word 9 point to Emma and she.