transformer neural network simply explained
- Published 10 Jun 2024
- #transformer #neuralnetwork #nlp #machinelearning
Hello, in this video I share a simple step-by-step explanation of how Transformer neural networks work.
Timestamps
0:00 - Intro
0:45 - Understanding attention technique
1:40 - Problem with sequence networks
1:57 - Motivation for Transformer networks
3:22 - Positional Encoding
4:32 - Vanilla Attention
5:22 - Self Attention
6:52 - Multi-Head Attention
7:50 - Residual Connection & Normalization
9:53 - Masked Multi-Head Attention
Amazing visualization of this topic:
3:00 1 Input Embedding
3:40 2 Positional Encoding
4:08 3 Encoder Layer
4:30 3a. Multi-head Attention
7:51 3.a.ii Residual Addition, Layer Normalization & Pointwise Feed-Forward
9:28 4 Output Embedding & Positional Encoding
9:33 5 Decoder Layer
10:07 5.a. Masked Multi-head Attention
It finally cleared up what attention is for me! Thank you so much!!
I'm so glad!
Great explanation.
Thank you for sharing!
Loved your explanation. What an amazing mix of clear presentation and insightful visualization! Please keep it up.
Thank you kindly!
You teach very intuitively ❤️... Please make more videos on deep learning concepts; people will definitely like them. :)
Thank you, I will
this is soooo helpful~! Just love it
Glad it was helpful!
Nice explanation
Well done Eniola, totally enjoyed your video!!
Thank you so much!
you won a subscribe!
Perfect thank you so much
You're welcome 😊
😭 I'm subscribing directly 😌
Thanks
Hi, great explanation! Which software did you use for your animation and presentation?
Very nicely explained, ma'am. Can you please make more videos on ttn to make it clearer?
So crisp and precise.
When you're stacking multiple decoders, do we stack the masked multi-head attention too?
Yes, we do; the masked multi-head attention unit is part of each decoder layer.
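A minimal PyTorch sketch of that idea (my own illustration, not the video's code; cross-attention and the feed-forward sublayer are omitted, and the sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer; masked self-attention is built into every layer."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        seq_len = x.size(1)
        # Causal mask: True marks future positions that may NOT be attended to.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.masked_attn(x, x, x, attn_mask=mask)
        return self.norm(x + attn_out)  # residual connection + layer normalization

# Stacking 6 decoder layers therefore stacks 6 masked multi-head attention units.
decoder = nn.ModuleList([DecoderLayer() for _ in range(6)])
x = torch.randn(1, 10, 512)  # (batch, sequence length, d_model)
for layer in decoder:
    x = layer(x)
```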
Hello! Wonderful explanation and video! I had one question: when going from the encoder layer to the embedding layer, what is the input to the output embedding layer? Is it still "I am a student"? How are the French words incorporated?
I guess, for the output embedding, would the lookup table be the learned French translations?
What is received in the output embedding?
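Not from the video, but the standard setup may help: during training, the output embedding receives the target (French) sentence shifted one position to the right, a scheme known as teacher forcing; at inference time it receives whatever tokens the decoder has generated so far. A tiny Python sketch with made-up tokens:

```python
# Hypothetical tokens for the video's example sentence; not taken from the video.
french_target = ["<sos>", "je", "suis", "étudiant", "<eos>"]

# Teacher forcing: the output embedding looks up the target tokens shifted
# right, while the encoder output (for "I am a student") enters the decoder
# through the cross-attention sublayer.
decoder_input  = french_target[:-1]  # ["<sos>", "je", "suis", "étudiant"]
decoder_labels = french_target[1:]   # ["je", "suis", "étudiant", "<eos>"]

# The output embedding itself is a learned lookup table over the French
# vocabulary, trained jointly with the rest of the model; it is not a fixed
# table of translations.
```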
OK, how are the query, key, and value vectors calculated?
From the query, key, and value weight matrices, which are randomly initialized at the start of training.
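A minimal NumPy sketch of that computation (the dimensions, seed, and weight scale are illustrative assumptions, not values from the video):

```python
import numpy as np

d_model, seq_len = 512, 4
rng = np.random.default_rng(0)

X = rng.standard_normal((seq_len, d_model))          # one embedding per token
W_Q = 0.01 * rng.standard_normal((d_model, d_model)) # randomly initialized,
W_K = 0.01 * rng.standard_normal((d_model, d_model)) # then learned during
W_V = 0.01 * rng.standard_normal((d_model, d_model)) # training

# Queries, keys, and values are three linear projections of the same input.
Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Scaled dot-product attention (single head, so the scale is sqrt(d_model)).
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V  # each row is a weighted mix of the value vectors
```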
I couldn't get the idea of masked multi-head attention; any comment would be appreciated.
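One way to see it: during training the decoder is fed the whole target sentence at once, so without a mask every position could cheat by attending to later words. The mask simply blocks those positions before the softmax. A minimal NumPy sketch with made-up scores:

```python
import numpy as np

# Made-up attention scores for a 4-token target sequence.
scores = np.array([[0.9, 0.1, 0.4, 0.2],
                   [0.3, 0.8, 0.5, 0.1],
                   [0.2, 0.6, 0.7, 0.3],
                   [0.1, 0.4, 0.2, 0.9]])

# The causal mask sets every "future" score to -inf, so after the softmax
# each token attends only to itself and earlier tokens: the decoder cannot
# peek at words it has not generated yet.
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
masked = np.where(mask, -np.inf, scores)

weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)
print(weights.round(2))  # the upper triangle is all zeros
```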