Part 1 - Neural Attention: ua-cam.com/video/frosrL1CEhw/v-deo.html
Part 3 - Transformers: ua-cam.com/video/0P6-6KhBmZM/v-deo.html
Very informative video. Thank you for the great step by step walkthrough❤
Keep it up man, you're doing an amazing job. You have incredible production value.
A small suggestion I'd say is to talk a tiny bit slower when discussing technical details. I totally understand the words you're saying, but since you use very information-rich language my brain can need a few more milliseconds to digest the meaning of each token. Not a huge issue though, you're doing great :)
Noted! Thanks for the kind words! 🙌🏼
Hey, I would love to know more about the way you compare self attention to a feedforward layer that projects onto a different space. Do let me know what resources I can look at to learn more about it.
Thank you so much for making videos on transformer! Your explanations are very intuitive, one of the best I've ever watched!
Wow, thanks! Glad you are enjoying the videos!
It was explained very well and easy to follow. I learned a lot, great work man!
Wonderful video, very well explained!
Can't appreciate it enough. Recently I cracked one interview and your videos helped me a lot. I was searching for your mail, but couldn't find it. What I was gonna request is whether you can make tutorials. By that I don't mean tutorials from scratch, but rather whether you have any plans to make videos on a capstone project basis. Anyway, I can see your channel shining, all the best. Thanks again.
😍
That's awesome dude! Thanks for your kind words and congrats on the interview! I generally don't make tutorials, and I have a long list of video ideas waiting in the backlog (I have a full-time job, so it's hard to find time to make comprehensive tutorials), but I definitely plan to get into the space eventually. If you have any specific project idea you want to see covered, let me know in the comments.
@@avb_fj yeah sure. Actually I am going to work on reconstructing CT scan images from MRI images and also predicting treatment doses for glioma patients, and I am planning to incorporate transformer-based deep learning models. If you have any suggestions and guidance, it will be highly appreciated. Thanks again and all the best for your upcoming videos, hope to watch all of them.
Great content and simple explanation!
Glad you liked it!
WONDERFUL VIDEO!!! Btw do you have any resources that talk about the analogy between "WX+B" in a Perceptron and "softmax(QK)V+X" in transformers, and how transformers are an adaptive learning framework? And what was the talk that you mentioned in this video?
I wish I could find which talk I learnt that in. I remember reading/watching it somewhere during my university days, and it sort of stuck with me. I had tried to find the resource back when I was working on this video, but unfortunately I couldn’t find it. There are surprisingly few resources online about this.
Anyway here is a great paper that contains mathematical proofs of many important things about attention/transformers:
arxiv.org/pdf/1912.10077
Great content!
Bro Great... Thank you...
Glad you liked it!
LEGEND !
I do not really understand. When you say "adaptive", in the sense that a dense/weight layer is "fixed", what does that mean in practice? By the same logic, the dense/weight layer is also "adaptive" when it sees input data, since backpropagation changes the values inside it.
Aha I see the point. Let me clarify.
The adaptive nature I was referring to is when we are running inference on an already trained neural network. Backprop is done only when we are training the network, but once it is trained, the weights of dense layers remain fixed and constant for every input.
In Self Attention, the key, query and value neural networks also remain fixed, so each new input goes through the same multiply-add ops to derive the K, Q, V… but these combine to generate a new, input-dependent weight matrix that produces the final output (as shown in video). Hope that clarifies it.
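If it helps, here is a rough sketch of that idea in plain Python (not the code from the video). The names `W_Q`, `W_K`, `W_V`, `attention_weights` and `self_attention` and the tiny 2x2 values are made up for illustration, and it uses the usual scaled dot-product form without the residual "+X" term. The projection matrices never change at inference time, yet the attention matrix that mixes the values comes out different for each input.

```python
import math

def matmul(a, b):
    # Plain matrix product of two lists-of-lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    # Row-wise softmax, shifted by the row max for numerical stability.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical "trained" projection weights: fixed for every input at inference.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[1.0, 0.0], [0.0, 1.0]]
W_V = [[0.5, 0.0], [0.0, 0.5]]

def attention_weights(x):
    # Same multiply-add ops for every input...
    q = matmul(x, W_Q)
    k = matmul(x, W_K)
    scale = math.sqrt(len(W_K[0]))
    scores = [[v / scale for v in row] for row in matmul(q, transpose(k))]
    # ...but the resulting mixing matrix depends on the input itself.
    return softmax_rows(scores)

def self_attention(x):
    # Output = (input-dependent attention matrix) @ (values).
    return matmul(attention_weights(x), matmul(x, W_V))

x1 = [[1.0, 0.0], [0.0, 1.0]]  # two different tokens
x2 = [[2.0, 0.0], [2.0, 0.0]]  # two identical tokens

a1 = attention_weights(x1)
a2 = attention_weights(x2)
print(a1)  # each token attends more to itself
print(a2)  # identical tokens attend uniformly
print(self_attention(x1))
```

A dense layer would apply the same matrix to `x1` and `x2`. Here `a1` and `a2` differ even though `W_Q`, `W_K` and `W_V` were never touched, which is the "adaptive" part.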
You are awesome ❤
Haha you are awesome too!