Swin Transformer - Paper Explained
- Published 12 Jun 2024
- A brief explanation of the Swin Transformer paper.
Paper link: arxiv.org/abs/2103.14030
Table of Contents:
00:00 Intro
00:13 Patch Embedding
02:56 Swin transformer block
03:57 W-MSA
05:14 SW-MSA
08:56 Masked MSA implementation
14:58 Patch Merging
16:12 Stages
18:28 Image classification result
19:12 Relative position bias
Icon made by Freepik from flaticon.com
By far one of the best and most complete Swin Transformer explanations on the entire Internet.
Thanks!
@soroushmehraban Hi sir, could you also explain the FasterViT and GCViT papers?
Thorough! Very comprehensible, thank you.
Really informative, it helped me a lot to understand many concepts here. Keep up the good work!
Thanks! I’ll try my best.
Thanks for the good explanation!
That's The Most Illustrative Video Of Swin-Transformers on The Internet!
Glad you enjoyed it 😃
@soroushmehraban Yes, absolutely, thanks so much! Although I have a quick question more related to PyTorch, about the code at 12:49 (line 239). First, what does the -1 mean here, and what exactly does it do to the tensor? Second, where does [4, 16] come from? The 4 isn't mentioned in the reshaping. Thanks in advance.
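For the first part of the question: -1 in reshape/view is standard PyTorch (and NumPy) semantics, telling the framework to infer that one dimension so the total element count is preserved. A minimal sketch of this behavior (the specific tensor from the video is not reproduced here; shapes are illustrative only):

```python
import numpy as np

x = np.arange(64)       # 64 elements in a flat array

# -1 means "infer this dimension": 64 elements / 4 rows = 16 columns,
# so the resulting shape is (4, 16). The same rule applies to
# torch.Tensor.reshape / .view in PyTorch.
y = x.reshape(4, -1)
print(y.shape)          # (4, 16)

# Only one dimension may be -1; the others must divide the element count.
z = x.reshape(-1, 8)    # inferred as (8, 8)
print(z.shape)          # (8, 8)
```

Where the 4 itself comes from depends on the video's code (likely a window or batch-related size), so that part is best answered from the source at 12:49.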
Very well explained, thank you Soroush.
Glad you liked it
Great video! Thanks
Thanks for the feedback 🙂
I enjoyed it very much.
Amazing video!
Thanks!
You deserve more likes and subscribers
Thanks man🙂 appreciated
Perfect description.
Glad it was helpful 🙂
Why does the channel count increase from C to 4C after merging?
Because we downsample the width by 2 and the height by 2. That's a 4x downsampling in spatial resolution, and those values are moved into the channel dimension. It's just a simple tensor rearrangement: the total number of elements stays the same.
For example, 10x10x2 = 200 elements before merging.
After merging it's 5x5x8 = 200.
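The arithmetic above can be sketched in code. This is a minimal NumPy illustration of the patch-merging rearrangement (the Swin paper concatenates the features of each 2x2 group of neighboring patches; the actual model also applies a linear layer on the 4C channels afterward, which is omitted here):

```python
import numpy as np

# A (H, W, C) = (10, 10, 2) feature map: 10 * 10 * 2 = 200 elements.
x = np.arange(10 * 10 * 2).reshape(10, 10, 2)

# Take the four interleaved sub-grids of each 2x2 patch neighborhood,
# each of shape (H/2, W/2, C) = (5, 5, 2).
x0 = x[0::2, 0::2, :]  # top-left patch of each 2x2 group
x1 = x[1::2, 0::2, :]  # bottom-left
x2 = x[0::2, 1::2, :]  # top-right
x3 = x[1::2, 1::2, :]  # bottom-right

# Concatenate along channels: (5, 5, 8), i.e. C -> 4C.
merged = np.concatenate([x0, x1, x2, x3], axis=-1)
print(merged.shape)               # (5, 5, 8)
print(x.size == merged.size)      # True: 200 elements either way
```

So the spatial resolution drops by 2 in each dimension while the channels grow by 4, and no information is discarded at this step.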