Attention mechanism: Overview

  • Published 23 Aug 2024
  • This video introduces you to the attention mechanism, a powerful technique that allows neural networks to focus on specific parts of an input sequence. Attention is used to improve the performance of a variety of machine learning tasks, including machine translation, text summarization, and question answering.
    Enroll in this course on Google Cloud Skills Boost → goo.gle/436ZFPR
    View the Generative AI Learning path playlist → goo.gle/LearnG...
    Subscribe to Google Cloud Tech → goo.gle/Google...

COMMENTS • 68

  • @googlecloudtech
    @googlecloudtech  1 year ago +3

    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

  • @PapiJack
    @PapiJack 5 months ago +17

    Great video. One tip: include some sort of pointer so you can direct the viewer's attention to a particular part of the slide. It helps in following your explanation of the information-dense slides.

  • @alexanderscoutov
    @alexanderscoutov 9 months ago +27

    4:08 "H_b"... I could not find H_b here :-( I don't understand what the H_d7 entities in the diagram are. So confusing.

    • @aqgi7
      @aqgi7 5 months ago +1

      I think she meant H_d, with d for decoder. H_d7 would be the 7th hidden state produced by the decoder. But it's not clear why H_d7 appears three times (or more).

  • @user-tq2og8wo7m
    @user-tq2og8wo7m 7 months ago +12

    Besides some mistakes, the inversion mechanism is not clear here. Where on the final slide is it shown? All I see is a correct order of words. It would be great to visualize where and how the reordering occurs.

  • @llewsub
    @llewsub 1 year ago +30

    confusing

  • @cy12343
    @cy12343 1 year ago +61

    So confusing...😵‍💫

    • @alileo1578
      @alileo1578 9 months ago

      Yeah, many of these concepts depend on neural networks and on deducing the parameters with back-propagation.

    • @JohnDoe-pq8yw
      @JohnDoe-pq8yw 3 months ago

      This takes place after the base model is trained, and there are fine-tuning mechanisms as well, so this is not confusing at all; it is part of the information about LLMs.

  • @KumR
    @KumR 2 months ago

    Felt like being explained in person. Thanks a lot.

  • @for-ever-22
    @for-ever-22 5 months ago +1

    Thanks to the creator. I will be coming back to this video, which is amazing and well detailed.

  • @manjz7hm
    @manjz7hm 8 months ago +15

    Google should give some attention to simplifying the content for the public; I couldn't completely get the concept.

  • @samuelqueiroz156
    @samuelqueiroz156 11 months ago +16

    Still not clear to me. How does the network know which hidden state should have the higher score?

    • @unknown-otter
      @unknown-otter 10 months ago +10

      I guess the answer you were looking for is the following: the same way the network knows how to classify digits, for example. It learns it by optimizing a loss function through backprop. So, attention is not a magic thing that connects inputs with outputs but just a mechanism for a network to learn what it needs to attend to.
      One cool thing is that you can think of an attention head as a fully connected layer with weights that change based on the input. While a normal fully connected layer has fixed weights and will process any data with them, an attention head first calculates what would be most beneficial in that input data and then runs it through a fully connected layer!
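      A rough numpy sketch of that idea (toy sizes, hypothetical names like W_q and W_k, not the video's notation): the mixing weights are recomputed from the input itself, while the projection matrices are ordinary parameters learned by back-propagation.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        d = 4
        rng = np.random.default_rng(0)
        W_q = rng.normal(size=(d, d))  # learned by backprop in a real model
        W_k = rng.normal(size=(d, d))  # learned by backprop in a real model

        def attention_layer(X):
            # X: (T, d) hidden states; returns an input-dependent re-weighting of X.
            Q, K = X @ W_q, X @ W_k
            scores = Q @ K.T / np.sqrt(d)  # how strongly each position attends to the others
            A = softmax(scores)            # these weights depend on X, unlike a fixed dense layer
            return A @ X

        X = rng.normal(size=(3, d))
        print(attention_layer(X).shape)    # (3, 4)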

  • @devloper_hs
    @devloper_hs 2 months ago +1

    For those asking about the alpha that isn't shown: it's written as "a", actually. It's just a coefficient that, when multiplied by the hidden vectors, produces the attention output.
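    For reference, the usual textbook formulation behind that "a" (symbol names here are the common ones, not necessarily the slide's): each weight is a normalized score between the previous decoder state s_{t-1} and an encoder hidden state h_i, and the context vector c_t is their weighted sum:

      a_{t,i} = exp(score(s_{t-1}, h_i)) / Σ_j exp(score(s_{t-1}, h_j))
      c_t     = Σ_i a_{t,i} · h_i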

  • @aakidatta
    @aakidatta 9 months ago +8

    I watched it almost 4 times and still can't figure it out. Where is alpha on the slide at 3:58?

  • @Udayanverma
    @Udayanverma 10 months ago +6

    Where is alpha in this whole diagram? Why do you guys make it more difficult than it is?

  • @abhiksaha3451
    @abhiksaha3451 9 months ago +7

    Is this video made by a generative AI 😂?

  • @richardglady3009
    @richardglady3009 11 months ago +8

    Very complex concepts that were well presented. I may not understand everything (I didn't, but that is a reflection of my ignorance), yet the overall picture of what occurred is clear. Thank you.

  • @KiranMundy
    @KiranMundy 8 months ago +8

    Very helpful video, but I got confused at one point and am hoping you can help clarify some points.
    At timestamp 4:14: You talk of "alpha" representing the attention weight at each time step. I don't see any "alpha" onscreen, so I am a bit confused. Is "alpha" a weight that gets adjusted with training and indicates how important that particular word is at time step 1 in the decoding process?
    I'm also not completely clear on the difference between a hidden state and weights; could you explain this?
    It would help me if, while explaining, you could point to the value you're referring to onscreen, and if you could clarify that when you talk about the time step, you are referring to the first decoder time step (is that right?).

    • @NetworkDirection
      @NetworkDirection 8 months ago +3

      I assume by 'alpha' she means 'a'

    • @m1erva
      @m1erva 6 months ago

      The hidden state is the activation for each word.

  • @ipurelike
    @ipurelike 11 months ago +12

    Too high-level, not enough detail... where are the dislikes?

  • @saurabhmahra4084
    @saurabhmahra4084 10 months ago +62

    You are the example why everyone should not start making youtube videos. You literally made a simple topic look complex.

    • @jiadong7873
      @jiadong7873 5 months ago +1

      agree

    • @Dom-zy1qy
      @Dom-zy1qy 5 months ago +16

      Disagree heavily. For me, this was more palatable than other videos I'd seen on the subject.
      I don't see the point of needlessly harsh criticism.

    • @Omsip123
      @Omsip123 2 months ago +1

      You are the example why commenting should be disabled

    • @Omsip123
      @Omsip123 2 months ago +2

      Besides, you probably meant to write "not everyone should" instead of "everyone should not" but that might be too complex too.

    • @baluandhavarapu
      @baluandhavarapu 1 month ago +2

      That's an incredibly rude thing to say. And I disagree

  • @changliu7553
    @changliu7553 2 days ago

    Why do you go from "the cat eat the mouse" to "black cat eat the mouse"? Is this a mistake? Thanks.

  • @user-tq2og8wo7m
    @user-tq2og8wo7m 7 months ago +4

    Besides some mistakes, it is still not clear to me how the inverting mechanism operates. All I can observe is an already correctly ordered sequence of words. It would be great to visualize where and how the ordering occurs.

  • @BR-hi6yt
    @BR-hi6yt 1 year ago +5

    Thanks for the hidden states, very clear.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 9 months ago +2

    I think you are introducing an interesting angle that hasn’t been presented before. Thanks.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 9 months ago +4

    Where’s the alpha on the slide?

  • @yashvander
    @yashvander 1 year ago +11

    Just a quick question: I'm not able to wrap my head around how the encoder gets the decoder hidden state annotated as Hd?

    • @kartikthakur-ql9yn
      @kartikthakur-ql9yn 1 year ago +1

      The encoder doesn't get the decoder hidden states.. it's the opposite.

    • @MrAmgadHasan
      @MrAmgadHasan 1 year ago +1

      What happens is: the encoder encodes the input and passes it to the decoder. For each time step in the output, the decoder gets the hidden states of all encoder time steps, concatenated as a matrix. It then calculates the attention weights.
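      A minimal sketch of that flow, assuming dot-product scoring, toy sizes, and a placeholder decoder update (real models learn the scoring function and the RNN cell):

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        rng = np.random.default_rng(1)
        T_in, T_out, d = 5, 3, 8
        encoder_states = rng.normal(size=(T_in, d))  # all encoder hidden states, kept as a matrix

        decoder_state = rng.normal(size=d)           # current decoder hidden state (H_d)
        for t in range(T_out):                       # one attention pass per output time step
            scores = encoder_states @ decoder_state  # score every encoder state against H_d
            weights = softmax(scores)                # attention weights for this output step
            context = weights @ encoder_states       # weighted sum passed to the decoder
            decoder_state = np.tanh(context + decoder_state)  # placeholder for the real RNN update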

    • @thejaniyapa3660
      @thejaniyapa3660 1 year ago

      @@MrAmgadHasan Thanks for the explanation. Then how are the encoder hidden states said to be associated with each word (3:26)? It should be the part of the sentence before the nth word plus the nth word.

  • @arkaganguly1
    @arkaganguly1 2 months ago +2

    Confusing video. Very difficult to follow

  • @mushroom4533
    @mushroom4533 4 months ago +1

    Hard to understand the final slide....

  • @ChargedPulsar
    @ChargedPulsar 6 months ago +2

    I think these tutorials are thrown onto the internet to further slow down and confuse people. The video explains nothing. It will only make sense to people who already know the attention mechanism.

  • @dariovicenzo8139
    @dariovicenzo8139 7 months ago +3

    just a waste of time and memory for youtube servers

  • @iGhostr
    @iGhostr 8 months ago +2

    confusing😢

  • @kartikpodugu
    @kartikpodugu 7 months ago

    I think this is an explanation of the general attention mechanism, not attention in transformers.

  • @user-ez9ex8hx6v
    @user-ez9ex8hx6v 8 months ago +1

    Ok got it watched thank you yeah

  • @gg-ke6mw
    @gg-ke6mw 8 months ago

    This is so confusing.
    Why are Google courses so difficult to understand?

  • @franktaylor7978
    @franktaylor7978 26 days ago

    It should be "The black cat .."

  • @thinkmath4270
    @thinkmath4270 28 days ago

    It started well but fizzled out as it progressed. Unnecessarily confusing. Anyway, a good attempt.

  • @muskduh
    @muskduh 1 year ago

    thanks

  • @HelenaCrawford-q6f
    @HelenaCrawford-q6f 1 day ago

    Smith Linda Thompson Michelle Perez Ronald

  • @user-ez9ex8hx6v
    @user-ez9ex8hx6v 8 months ago

    Yeah okay watched

  • @tamurchoudary3452
    @tamurchoudary3452 4 months ago +1

    Regurgitating spoon-fed knowledge… Google has fallen behind.

  • @user-su9pg1jo4x
    @user-su9pg1jo4x 7 months ago

    4:04 There is no alpha, but an "a" in the sum on the left.

  • @interweb3401
    @interweb3401 20 days ago

    Not clear!

  • @primeentelechy
    @primeentelechy 1 month ago

    I'm sorry, but this video is complete rubbish. Incoherent explanation that is unlikely to help anyone. Plus a number of little errors that just should not be there in such a short video, not to mention in one made by one of the world's most prominent tech companies. Even the example "English sentence" chosen isn't actually a valid English sentence 🤦‍♂️

  • @julius3005
    @julius3005 6 months ago

    The explanation is poor; they hide a large number of the steps.

  • @abhyutaichou8322
    @abhyutaichou8322 1 year ago

  • @kislaya7239
    @kislaya7239 6 months ago

    This is a poor video for someone who does not know this topic.

  • @yahavx
    @yahavx 1 year ago +15

    confusing

    • @Shmancyfancy536
      @Shmancyfancy536 5 months ago

      You’re not gonna learn it in 5 min

  • @user-ep1tz8gv8f
    @user-ep1tz8gv8f 11 months ago +5

    confusing

  • @sergeykurk
    @sergeykurk 10 months ago +3

    confusing

  • @vedansharts8274
    @vedansharts8274 1 year ago +5

    confusing

  • @dimitrisparaschakis3280
    @dimitrisparaschakis3280 11 months ago +3

    confusing

  • @VikasDubeyg
    @VikasDubeyg 1 year ago +9

    confusing