Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.

  • Published 5 Jun 2024
  • What are positional embeddings / encodings?
    📺 Follow-up video: Concatenate or add positional encodings? Learned positional embeddings. • Adding vs. concatenati...
    ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
    ► Outline:
    00:00 What are positional embeddings?
    03:39 Requirements for positional embeddings
    04:23 Sines, cosines explained: The original solution from the “Attention is all you need” paper
    📺 Transformer explained: • The Transformer neural...
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    NEW (channel update):
    🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
    Patreon: / aicoffeebreak
    Ko-fi: ko-fi.com/aicoffeebreak
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Paper 📄
    Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. proceedings.neurips.cc/paper/...
    ✍️ Arabic Subtitles by Ali Haidar Ahmad / ali-ahmad-0706a51bb .
    Music 🎵 :
    Discovery Hit by Kevin MacLeod is licensed under a Creative Commons Attribution 4.0 licence. creativecommons.org/licenses/...
    Source: incompetech.com/music/royalty-...
    Artist: incompetech.com/
    ---------------------------
    🔗 Links:
    AICoffeeBreakQuiz: / aicoffeebreak
    Twitter: / aicoffeebreak
    Reddit: / aicoffeebreak
    YouTube: / aicoffeebreak
    #AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research​

COMMENTS • 209

  • @anonymousanon4822
    @anonymousanon4822 10 months ago +24

    I found no explanation for this anywhere, and when reading the paper I missed the detail that each token's positional encoding consists of multiple values (calculated by different sine functions). Your explanation and visual representation finally made me understand! Fourier transforms are genius, and I'm amazed by how many different areas they show up in.

  • @yimingqu2403
    @yimingqu2403 2 years ago +11

    Love how the "Attention is all you need" paper appears with epic background music

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      It wasn't on purpose, but it is funny -- in hindsight 😅🤣

  • @444haluk
    @444haluk 2 years ago +13

    This video is a clear explanation of why you shouldn't add your positional encodings but concatenate them.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +6

      Extra dimensions dedicated exclusively to encoding position! Sure, but only if you have some to spare. 😅

    • @444haluk
      @444haluk 2 years ago +2

      @@AICoffeeBreak This method relocates the embeddings in a specific direction of the embedding space, so that the new position in the relevant embedding cluster has "another" meaning to other words of the "same kind" (say there is another instance of the same word later). But that place should be reserved for other semantics; otherwise the space is literally filled with "second position" coffee, "tenth position" me, "third position" good, etc. This can go wrong in soooo many ways. Don't get me wrong, I am a clear-cut "Chinese Room Experiment" guy: I don't think you can translate "he is a good doctor" before imagining an iconic low-resolution male doctor and recalling a memory of satisfaction and admiration of consummatory reward. But again, the "he" in "he did it again" and "man, he did it again" should literally have the same representation in the network to start discussing things.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +7

      You are entirely right. I was short in my comment because I commented on the same issue in Cristian Garcia's comment. But there is no way you would have seen it, so I will copy paste it here: 😅
      "Concatenating has the luxury of extra, exclusive dimensions dedicated to positional encoding with the upside of avoiding mixing up semantic and positional information. The downside is, you can afford those extra dimensions only if you have capacity to spare.
      So adding the positional embeddings to initial vector representations saves some capacity by using it for both semantic and positional information, but with the danger of mixing these up if there is no careful tuning on this (for tuning, think about the division by 10000 in the sine formula in "attention is all you need")."

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +6

      And you correctly read between the lines, because this was not explicitly mentioned in the video. In the video I explained what a balancing act it is between semantic and positional information, but you identified the solution: if adding them up causes such trouble, then... let's don't! 😂

    • @blasttrash
      @blasttrash 17 days ago +1

      @@AICoffeeBreak I'm new to AI, but what do you mean by the word "capacity"? Do you mean RAM? Do you mean that if we concatenate positional encodings to the original vector instead of adding them, it will take up more RAM/memory and therefore make training slower?
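
In this thread, "capacity" refers to model capacity, that is, embedding dimensions and the parameters acting on them, rather than RAM. Below is a minimal NumPy sketch of the add-vs-concatenate trade-off discussed above; it is an illustration only, not code from the video, and the sizes d_model=512 and d_pos=64 are hypothetical.

```python
# Illustrative sketch: adding vs. concatenating sinusoidal positional
# encodings. "Capacity" here means embedding dimensions / parameters.
import numpy as np

def sinusoidal_encoding(seq_len, dim, base=10000.0):
    """Sine/cosine positional encodings in the style of 'Attention is all you need'."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(dim // 2)[None, :]           # (1, dim/2)
    angles = pos / base ** (2 * i / dim)       # one frequency per dimension pair
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(angles)              # even dimensions: sine
    enc[:, 1::2] = np.cos(angles)              # odd dimensions: cosine
    return enc

seq_len, d_model, d_pos = 8, 512, 64           # hypothetical sizes
tokens = np.random.randn(seq_len, d_model)     # stand-in word embeddings

# Adding: position and meaning share the same d_model dimensions.
added = tokens + sinusoidal_encoding(seq_len, d_model)
print(added.shape)          # (8, 512) -- model width unchanged

# Concatenating: position gets exclusive extra dimensions, so every
# downstream weight matrix must grow from 512 to 576 input columns.
concatenated = np.concatenate(
    [tokens, sinusoidal_encoding(seq_len, d_pos)], axis=-1)
print(concatenated.shape)   # (8, 576) -- wider model, more parameters
```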

  • @adi331
    @adi331 2 years ago +20

    +1 for more vids on positional encodings.

  • @sqripter256
    @sqripter256 7 months ago +7

    This is the most intuitive explanation of positional encoding I have come across. Everyone out there explains how to do it, even with code, but not the why, which is more important.
    Keep this up. You have earned my subscription.

  • @20Stephanus
    @20Stephanus 2 years ago +2

    "A multi-dimensional, spurious correlation identifying beast..." ... wow. Douglas Adams would be proud of that.

  • @deepk889
    @deepk889 2 years ago +5

    I had my morning coffee with this and will make it a habit!

  • @woddenhorse
    @woddenhorse 2 years ago +2

    Multi-Dimensional Spurious Correlation Identifying Beast 🔥🔥
    That's what I am calling transformers from now on

  • @ausumnviper
    @ausumnviper 2 years ago +5

    Great explanation !! And Yes Yes Yes.

  • @sharepix
    @sharepix 2 years ago +4

    Letitia's Explanation Is All You Need!

  • @hannesstark5024
    @hannesstark5024 2 years ago +8

    +1 for a video on relative positional representations!

  • @yyyang_
    @yyyang_ 1 year ago +5

    I've read numerous articles explaining positional embeddings so far... however, this is surely the greatest & clearest ever

  • @garisonhayne668
    @garisonhayne668 2 years ago +5

    Dang it, I learned something and my morning coffee isn't even finished.
    It's going to be one of *those* days.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +1

      Sounds like a good day to me! 😅
      Wish you a fruitful day!

  • @karimedx
    @karimedx 2 years ago +3

    Nice explanation

  • @rahulchowdhury3722
    @rahulchowdhury3722 1 year ago +2

    You've got a solid understanding of the mathematics of signal processing

  • @jayjiyani6641
    @jayjiyani6641 1 year ago +1

    Very intuitive. I knew there was sine/cosine positional encoding, but it's explained so effectively that I actually got it here. 👍👍

  • @speed-stick
    @speed-stick 10 months ago +2

    Bro
    Where have you been hiding all this time?
    This is next level explaining

  • @kryogenica4759
    @kryogenica4759 2 years ago +3

    Make Ms. Coffee Bean spill the beans on positional embeddings for images

  • @harshkumaragarwal8326
    @harshkumaragarwal8326 2 years ago +3

    great explanation :)

  • @tanmaybhayani
    @tanmaybhayani 22 days ago +1

    Amazing! This is the best explanation for positional encodings period. Subscribed!!

  • @yusufani8
    @yusufani8 2 years ago +2

    Probably the clearest explanation of positional encoding :D

  • @khursani8
    @khursani8 2 years ago +5

    Thanks for the explanation
    Interested to know about rotary position embedding

  • @ylazerson
    @ylazerson 2 years ago +2

    Just watched this again for a refresher; the best video out there on the subject!

  • @full-stackmachinelearning2385
    @full-stackmachinelearning2385 1 year ago +2

    BEST AI channel on YouTube!!!!!

  • @ConsistentAsh
    @ConsistentAsh 2 years ago +6

    I was browsing through some channels after first stopping on Sean Cannell's, and I noticed your channel. You've got a great little channel building up here. I decided to drop by and show some support. Keep up the great content and I hope you keep posting :)

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +3

      Thanks for passing by and for the comment! I appreciate it!

  • @SyntharaPrime
    @SyntharaPrime 1 year ago +2

    Great explanation - It might be the best. I think I finally figured it out. I highly appreciate it.

  • @Phenix66
    @Phenix66 2 years ago +47

    Great stuff :) Would love to see more of that, especially for images or geometry!

  • @huonglarne
    @huonglarne 2 years ago +2

    This explanation is incredible

  • @kevon217
    @kevon217 8 months ago +3

    Super intuitive explanation, nice!

  • @magnuspierrau2466
    @magnuspierrau2466 2 years ago +9

    Great explanation of the intuition of positional encodings used in the Transformer!

  • @elinetshaaf75
    @elinetshaaf75 2 years ago +5

    great explanation of positional embeddings. Just what I need.

  • @raoufkeskes7965
    @raoufkeskes7965 4 months ago +2

    The most brilliant positional encoding explanation EVER. That was a GOD-level explanation.

  • @exoticcoder5365
    @exoticcoder5365 10 months ago +1

    The best explanation of how exactly position embeddings work !

  • @MaximoFernandezNunez
    @MaximoFernandezNunez 11 months ago +1

    I finally understand the positional encoding! Thanks

  • @ashish_sinhrajput5173
    @ashish_sinhrajput5173 9 months ago +2

    I watched a bunch of videos on positional embeddings, but this video gave me a very clear intuition behind them. Thank you very much for this great video 😊

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +2

      Thanks, that's great to hear! ☺️

  • @ugurkap
    @ugurkap 2 years ago +6

    Explained really well, thank you 😊

  • @mbrochh82
    @mbrochh82 1 year ago +3

    This is probably the best explanation of this topic on YouTube! Great work!

  • @tonoid117
    @tonoid117 2 years ago +9

    What a great video! I'm doing my Ph.D. in NLU, so this came in very handy. Thank you very much and greetings from Ensenada, Baja California, Mexico :D!

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +3

      Thanks, thanks for visiting from so far away! Greetings from Heidelberg Germany! 👋

  • @gauravchattree5273
    @gauravchattree5273 1 year ago +4

    Amazing content. After seeing this, all the articles and research papers make sense.

  • @roberto2912
    @roberto2912 3 months ago +1

    I loved your simple and explicit explanation. You've earned a sub and like!

  • @DerPylz
    @DerPylz 2 years ago +6

    Thanks, as always, for the great explanation!

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      It was Ms. Coffee Bean's pleasure! 😅

  • @helenacots1221
    @helenacots1221 1 year ago +6

    amazing explanation!!! I have been looking for a clear explanation on how the positional encodings actually work and this really helped! thank you :)

  • @aterribleyoutuber9039
    @aterribleyoutuber9039 5 months ago +1

    This was very intuitive, thank you very much! Needed this, please keep making videos

  • @markryan2475
    @markryan2475 2 years ago +5

    Great explanation - thanks very much for sharing this.

  • @deepshiftlabs
    @deepshiftlabs 2 years ago +2

    Brilliant video. This was the best explanation of positional encodings I have seen. It helped a TON!!!

    • @deepshiftlabs
      @deepshiftlabs 2 years ago

      I also make AI videos. I am more into the image side (convolutions and pooling), so it was great to see more AI educators.

  • @bartlomiejkubica1781
    @bartlomiejkubica1781 4 months ago +1

    Great! It took me forever before I found your videos, but I finally understand it. Thank you soooo much!

  • @nicohambauer
    @nicohambauer 2 years ago +6

    Sooo good!

  • @timoose3960
    @timoose3960 2 years ago +4

    This was so insightful!

  • @ravindrasharma85
    @ravindrasharma85 3 days ago +1

    excellent explanation!

  • @PenguinMaths
    @PenguinMaths 2 years ago +6

    This is a great video! Just found your channel and glad I did, instantly subscribed :)

  • @matt96920
    @matt96920 1 year ago +4

    Excellent! Great work!

  • @WhatsAI
    @WhatsAI 2 years ago +7

    Super clear and amazing (as always) explanation of sines and cosines positional embeddings! 🙌

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +3

      Thanks! Always happy when you visit!

  • @shamimibneshahid706
    @shamimibneshahid706 2 years ago +5

    I feel lucky to have found your channel. Simply amazing ❤️

  • @javiervargas6323
    @javiervargas6323 2 years ago +2

    Thank you. It is one thing to know the formula and apply it, and another thing to understand the intuition behind it. You made it very clear. All the best.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +1

      Well said! -- Humbled to realize this was put in context with our video, thanks.
      Thanks for watching!

  • @gemini_537
    @gemini_537 2 months ago +1

    Gemini: This video is about positional embeddings in transformers.
    The video starts with an explanation of why positional embeddings are important. Transformers are a type of neural network that has become very popular for machine learning tasks, especially when there is a lot of data to train on. However, transformers do not process information in the order that it is given. This can be a problem for tasks where the order of the data is important, such as language translation. Positional embeddings are a way of adding information about the order of the data to the transformer.
    The video then goes on to explain how positional embeddings work. Positional embeddings are vectors that are added to the input vectors of the transformer. These vectors encode the position of each element in the sequence. The way that positional embeddings are created is important. The embeddings need to be unique for each position, but they also need to be small enough that they do not overwhelm the signal from the original data.
    The video concludes by discussing some of the different ways that positional embeddings can be created. The most common way is to use sine and cosine functions. These functions can be used to create embeddings that are both unique and small. The video also mentions that there are other ways to create positional embeddings, and that these methods may be more appropriate for some types of data.
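
For reference, the sine- and cosine-based encoding summarized above is the one defined in "Attention is all you need" (Vaswani et al., 2017):

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```

Here pos is the token position, i indexes pairs of embedding dimensions, and d_model is the embedding size; the resulting vector is added to the token embedding at that position.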

  • @justinwhite2725
    @justinwhite2725 2 years ago +5

    In another video I've seen, apparently it doesn't matter whether positional embeddings are learned or static. It seems as though the rest of the model makes accurate deductions regardless.
    This is why I was not surprised that Fourier transforms seem to work nearly as well as self-attention.

    • @meechos
      @meechos 2 years ago

      Could you please elaborate, maybe using an example?

  • @aasthashukla7423
    @aasthashukla7423 7 months ago +1

    Thanks Letitia, great explanation

  • @alphabetadministrator
    @alphabetadministrator 1 month ago +1

    Hi Letitia. Thank you so much for your wonderful video! Your explanations are more intuitive than almost anything else I've seen on the internet. Could you also do a video on how positional encoding works for images, specifically? I assume they are different from text because images do not have the sequential pattern text data have. Thanks!

    • @AICoffeeBreak
      @AICoffeeBreak  1 month ago +1

      Thanks for the suggestion. I don't think I will get to do this in the next few months. But the idea of image position embeddings is that those representations are most often learned. The gist of it is to divide the image into patches, let's say 9, and then to number them from 1 to 9 (from the top-left to the bottom-right). Then let gradient descent learn better representations of these addresses.
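
A minimal sketch of such learned patch position embeddings (an illustration of the idea described above, not the channel's code; the 3x3 grid of 9 patches and the embedding size of 64 are hypothetical):

```python
# Learned position embeddings for image patches: number the patches from the
# top-left to the bottom-right and let gradient descent tune one "address"
# vector per patch index, jointly with the rest of the model.
import torch
import torch.nn as nn

num_patches, embed_dim = 9, 64                             # 3x3 grid, hypothetical size
patch_features = torch.randn(1, num_patches, embed_dim)    # stand-in patch embeddings

# One learnable vector per patch position (index 0 = top-left, 8 = bottom-right).
pos_embedding = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

# The learned "address" is added to each patch embedding before the transformer.
x = patch_features + pos_embedding
print(x.shape)   # torch.Size([1, 9, 64])
```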

  • @yonahcitron226
    @yonahcitron226 1 year ago +3

    amazing stuff! so clear and intuitive, exactly what I was looking for :)

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +2

      Thanks for watching and appreciating! 😊

  • @ColorfullHD
    @ColorfullHD 1 month ago +1

    Lifesaver! Thank you for the explanation.

  • @clementmichaud724
    @clementmichaud724 11 months ago +1

    Very well explained! Thank you so much!

  • @conne637
    @conne637 2 years ago +2

    Great content! Can you do a video about Tabnet please? :)

  • @bdennyw1
    @bdennyw1 2 years ago +5

    Nice explanation! I’d love to hear more about multidimensional and learned position encodings

  • @Galinator9000
    @Galinator9000 2 years ago +2

    These videos are priceless, thank you!

  • @amirhosseinramazani757
    @amirhosseinramazani757 2 years ago +3

    Your explanation was great! I got everything I wanted to know about positional embeddings. Thank you :)

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      Awesome, thanks for the visit! ☺️

  • @amirpourmand
    @amirpourmand 2 years ago +3

    Awesome. Thank you for making this great explanation. I highly appreciate it.

  • @saurabhramteke8511
    @saurabhramteke8511 1 year ago +2

    Hey, Great Explanation :). Love to see more videos.

  • @gopikrish999
    @gopikrish999 2 years ago +4

    Thank you for the explanation! Can you please make a video on Positional information in Gated Positional Self Attention in ConViT paper?

  • @user-gk3ue1he4d
    @user-gk3ue1he4d 9 months ago +1

    Great work! Clear and deep explanation!

  • @adeepak7
    @adeepak7 3 months ago +1

    Very good explanation!! Thanks for this 🙏🙏

    • @AICoffeeBreak
      @AICoffeeBreak  3 months ago +1

      Thank You for your wonderful message!

  • @oleschmitter55
    @oleschmitter55 7 months ago +2

    So helpful! Thank you a lot!

  • @hedgehog1962
    @hedgehog1962 1 year ago +2

    Really Thank you! Your video is just amazing!

  • @CristianGarcia
    @CristianGarcia 2 years ago +9

    Thanks Letitia! A vid on relative positional embeddings would be nice 😃
    Implementations seem a bit involved, so I've never used them in my toy examples.

    • @CristianGarcia
      @CristianGarcia 2 years ago +2

      Regarding this topic, I've seen positional embeddings sometimes being added and sometimes being concatenated with no real justification for either 😐

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +4

      Concatenating has the luxury of extra, exclusive dimensions dedicated to positional encoding with the upside of avoiding mixing up semantic and positional information. The downside is, you can have those extra dimensions only if you have capacity to spare.
      So adding the positional embeddings to initial vector representations saves some capacity by using it for both semantic and positional information with the danger of mixing these up if there is no careful tuning on this (for tuning, think about the division by 10000 in the sine formula in "attention is all you need").

  • @aloksharma4611
    @aloksharma4611 11 months ago +1

    Excellent explanation. Would certainly like to learn about other encodings in areas like image processing.

  • @EpicGamer-ux1tu
    @EpicGamer-ux1tu 2 years ago +2

    Great video, many thanks!

  • @jayk253
    @jayk253 1 year ago +1

    Amazing explanation! Thank you so much !

  • @erikgoldman
    @erikgoldman 1 year ago +2

    this helped me so much!! thank you!!!

  • @jfliu730
    @jfliu730 1 year ago

    The best video about positional embeddings I have ever seen

  • @antoniomajdandzic8462
    @antoniomajdandzic8462 2 years ago +2

    love your explanations !!!

  • @andyandurkar7814
    @andyandurkar7814 1 year ago +2

    Just an amazing explanation ...

  • @anirudhthatipelli8765
    @anirudhthatipelli8765 1 year ago +1

    Thanks, this was so clear! Finally understood position embeddings!

  • @user-fg4pr4ct6g
    @user-fg4pr4ct6g 9 months ago +1

    Thanks, your videos helped the most

  • @jayktharwani9822
    @jayktharwani9822 1 year ago +1

    great explanation. really loved it. Thank you

  • @montgomerygole6703
    @montgomerygole6703 11 months ago +1

    Wow, thanks so much! This is so well explained!!

  • @sborkes
    @sborkes 2 years ago +3

    I really enjoy your videos 😄!
    I would like a video about using transformers with time-series data.

  • @nitinkumarmittal4369
    @nitinkumarmittal4369 4 months ago +1

    Loved your explanation, thank you for this video!

  • @bingochipspass08
    @bingochipspass08 4 months ago +1

    What a lovely explanation & video!.. Thank you!

    • @AICoffeeBreak
      @AICoffeeBreak  4 months ago +2

      Glad you enjoyed it! Thanks for the visit and leaving a comment.

    • @bingochipspass08
      @bingochipspass08 4 months ago +1

      @@AICoffeeBreak Thank you again!.. subscribed!!

    • @AICoffeeBreak
      @AICoffeeBreak  4 months ago +2

      @@bingochipspass08 Oh, great, then I'll see you on future videos as well.

  • @pypypy4228
    @pypypy4228 1 month ago +1

    This was awesome! I don't have a complete understanding yet, but it definitely pushed me closer to understanding. Did you make a video about relative positions?

    • @AICoffeeBreak
      @AICoffeeBreak  1 month ago +2

      Yes, I did! ua-cam.com/video/DwaBQbqh5aE/v-deo.html

  • @user-ru4nb8tk6f
    @user-ru4nb8tk6f 9 months ago +1

    so helpful, appreciate it!

  • @zhangkin7896
    @zhangkin7896 2 years ago +2

    Really great!

  • @richbowering3350
    @richbowering3350 1 year ago

    Best explanation I've seen - good work!

  • @machinelearning5964
    @machinelearning5964 1 year ago +1

    Cool explanation

  • @andybrice2711
    @andybrice2711 1 month ago

    The positional encodings are not that weird when you think about them like the hands of a clock. They're a way of expressing a continuous, unlimited sequence, at multiple levels of precision, within a confined space. The tips of clock hands also trace out sine and cosine patterns at various frequencies.
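
A small sketch of that clock picture (illustrative only, reusing the paper's sine formula with arbitrary sizes): the first dimension pair behaves like a fast second hand and changes a lot between neighbouring positions, while later pairs behave like a slow hour hand and barely move.

```python
# Clock-hand view of sinusoidal encodings: each dimension pair rotates at its
# own frequency; fast "hands" resolve nearby positions, slow "hands" coarse ones.
import numpy as np

d_model, base = 8, 10000.0
i = np.arange(d_model // 2)                    # dimension-pair index
for pos in range(4):                           # a few consecutive positions
    angles = pos / base ** (2 * i / d_model)   # frequencies 1, 1/10, 1/100, 1/1000
    print(pos, np.round(np.sin(angles), 3))    # first pair varies fast, last slowly
```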

  • @arishali9248
    @arishali9248 1 year ago +1

    Beautiful explanation

  • @xv0047
    @xv0047 1 year ago +1

    Good explanation.

  • @avneetchugh
    @avneetchugh 1 year ago +2

    Awesome, thanks!

  • @klammer75
    @klammer75 1 year ago +1

    This is an amazing explanation! Tku!!!🤓🥳🤩

  • @Nuwiz
    @Nuwiz 10 months ago +1

    Nice explanation!

  • @johannreiter1087
    @johannreiter1087 9 months ago +1

    Awesome video, thanks :)

  • @ylazerson
    @ylazerson 2 years ago +2

    amazing video - rockin!