*Github Code* - github.com/explainingai-code/VIT-Pytorch
*Patch Embedding* - Vision Transformer (Part One) - ua-cam.com/video/lBicvB4iyYU/v-deo.html
*Attention* in Vision Transformer (Part Two) - ua-cam.com/video/zT_el_cjiJw/v-deo.html
*Implementing Vision Transformer* (Part Three) - ua-cam.com/video/G6_IA5vKXRI/v-deo.html
Best explanation of multi-head attention I have attended to! I already had a reasonable intuition but still gathered so much more. Massive respect to your work 🙏
Thank you! Really glad that it was of help.
Amazing explanation... I had not come across such a beautiful and easy explanation of transformers, which otherwise seem extremely difficult... this channel deserves millions of subscribers 🎉
Thank you for the kind words :)
Sir, can you please explain Dual Attention Vision Transformers (DaViT)?
Would rearranging by heads before splitting into q, k, v cause any logical difference? It just means fewer lines of code and operations, but I mostly wanted to verify, as it felt the same to me.
@sladewinter Yes, I agree with you, both seem the same to me as well.
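@sladewinter To make the equivalence concrete, here is a minimal sketch (not the exact code from the repo; the shapes B, N, embed_dim, num_heads are just illustrative) checking that splitting the fused qkv projection into q, k, v first and then rearranging each by heads gives the same tensors as rearranging by heads first and then splitting:

```python
import torch

B, N, embed_dim, num_heads = 2, 16, 64, 4   # illustrative sizes, not the video's config
head_dim = embed_dim // num_heads

# Pretend this is the output of a single fused qkv linear layer: (B, N, 3 * embed_dim)
qkv = torch.randn(B, N, 3 * embed_dim)

# Option A: split into q, k, v first, then rearrange each tensor by heads
q_a, k_a, v_a = qkv.chunk(3, dim=-1)                            # each (B, N, embed_dim)
q_a = q_a.reshape(B, N, num_heads, head_dim).transpose(1, 2)    # (B, heads, N, head_dim)
k_a = k_a.reshape(B, N, num_heads, head_dim).transpose(1, 2)
v_a = v_a.reshape(B, N, num_heads, head_dim).transpose(1, 2)

# Option B: rearrange by heads first, then pull out q, k, v in one go
qkv_b = qkv.reshape(B, N, 3, num_heads, head_dim).permute(2, 0, 3, 1, 4)  # (3, B, heads, N, head_dim)
q_b, k_b, v_b = qkv_b[0], qkv_b[1], qkv_b[2]

# Both orderings only re-index the same elements, so the results match exactly
assert torch.equal(q_a, q_b) and torch.equal(k_a, k_b) and torch.equal(v_a, v_b)
print("Both orderings give identical q, k, v tensors")
```

Since neither ordering does any arithmetic, just reshapes and permutes of the same fused projection, the resulting q, k, v (and therefore the attention output) are identical either way. The second form just saves a few lines, as you said.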
Great content! This is helping a lot!! Keep it up :)
Thank you :)
Helpful, much appreciated. Sir, how about self-attention in the image context?
Thank you! I didn't quite get what you mean by self-attention in the image context. Could you clarify a bit?