Thank you so much for the explanation!
Great explanation. Love the music + the voice :)
Thanks. Glad you liked it!
This is brilliant!
Thanks 👍
Thank you for your fabulous explanation
Thank you!! So nicely explained
You're welcome. So would you like to see more paper explanations, or more coding videos?
Great Video!
Thanks for making it! :)
Thank you so much for the explanation. Please keep the videos coming.
Sure will do!
Thanks for the video, dear AI Bites. I was struggling to understand the Swin architecture, and you explained it very clearly. But I would like to ask about the motivation for choosing different C values. Why is it important? An explanation would give me a more meaningful understanding.
huh? ViT was the first backbone Transformer arch for vision, not Swin
awesome spot. And thanks for this info.
This is seriously underrated. I enjoyed the visual approach. Thanks and regards for your effort in making this explanation. Cheers🎊👍
Thank you so much Harshad! 😊
Great explanation
Thanks!
Very clear explanation of the paper idea, thanks.
very encouraging to keep making videos :)
@@AIBites Keep it up, man!
Thank you so much
Can you kindly explain this line in the paper, related to the patch merging layer: "The first patch merging layer concatenates the features of each group of 2 × 2 neighboring patches, and applies a linear layer on the 4C-dimensional concatenated features".
Thank you for the video
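For anyone puzzling over that line, here is a toy sketch of what it describes, in plain NumPy. The random weight matrix stands in for the learned projection; the shapes are the point, not the values:

```python
import numpy as np

# Toy patch-merging sketch (illustrative, not the official Swin code).
# Input: an H x W grid of patch tokens, each with C features.
H, W, C = 4, 4, 8
x = np.random.randn(H, W, C)

# 1) Group each 2x2 neighborhood of patches and concatenate
#    their features -> one 4C-dimensional vector per group.
x = x.reshape(H // 2, 2, W // 2, 2, C)      # split into 2x2 blocks
x = x.transpose(0, 2, 1, 3, 4)              # (H/2, W/2, 2, 2, C)
merged = x.reshape(H // 2, W // 2, 4 * C)   # (H/2, W/2, 4C)

# 2) Apply a linear layer on the 4C-dimensional concatenated
#    features (in Swin it projects down to 2C).
W_proj = np.random.randn(4 * C, 2 * C) / np.sqrt(4 * C)
out = merged @ W_proj                       # (H/2, W/2, 2C)

print(merged.shape, out.shape)   # (2, 2, 32) (2, 2, 16)
```

So the spatial resolution halves in each direction while the channel count doubles, which is how Swin builds its hierarchical feature maps.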
Good explanation
Do you think these Swin Transformers would be useful in real-time object detection? (Are they fast enough?)
Thank you, well done!
great video with excellent visualization, thanks a lot
Glad you like it! :)
Can you cover a bit more on using Swin for object detection, please?
Thanks for this great video. Just one question: why is a linear layer used in patch merging when we could rearrange the input patches directly with a reshape?
Great question. One thing to note is that a reshape only rearranges the features into a 4C-dimensional vector; the linear layer then learns how to mix those features and also reduces the dimension from 4C to 2C, which a plain reshape can't do.
Maybe they're trying to make the model learn how to merge the patches, like solving a jigsaw puzzle?
@@AIBites Can we use a convolution in this scenario?
Thank you for illustrating this architecture. Can you make more videos on the segmentation algorithms that are in use nowadays, please? Thanks.
Sure. Will plan to make one on SegFormers.
@@AIBites cool ❤️
And thanks for this presentation
Thanks for the explanation. Please review more SOTA papers.
Sure will do Saeed! Thx. 🙂
Love the voice!
The video is awesome! Thanks a lot!
Glad you liked it!
Very informative video!
Thanks! Glad you liked it.
Thank you for the great effort.
My pleasure!
AMAZING EXPLANATION!
great explanation, thank you
Thanks for your positive comment! :)
great work, thanks :)
cool, thank you
Thanks for the video. It was awesome and easy to follow. However, even though the window architecture reduces the complexity of computing self-attention, I think we still have a computational issue for the overall image, and the attention becomes local as in CNNs instead of global as in RNNs. Anyway, thanks for your explanation.
How are you saying such complex things so easily 😫 I couldn't even understand what he said 🤕
@@readera84 what don't you understand? maybe I can give you a hand
@@keroldjoumessi9597 The windows shifting diagonally... can you make it clearer to me?
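On the diagonal shift: between consecutive Swin blocks, the window grid is displaced by half a window size in both directions, which in practice is implemented as a cyclic shift of the feature map before re-partitioning it into windows. A tiny single-channel NumPy illustration (window size M is my choice here, not from the thread):

```python
import numpy as np

M = 4                                 # window size
x = np.arange(8 * 8).reshape(8, 8)    # toy 8x8 feature map, 1 channel

# Shifted-window step: roll the map up-left by M//2 so the new
# window grid sits diagonally between the previous windows.
shifted = np.roll(x, shift=(-M // 2, -M // 2), axis=(0, 1))

# Partition into non-overlapping MxM windows, same as before.
windows = shifted.reshape(8 // M, M, 8 // M, M).transpose(0, 2, 1, 3)
print(windows.shape)   # (2, 2, 4, 4): a 2x2 grid of 4x4 windows
```

After the roll, tokens that sat in four different windows in the previous block now share one window, which is how information flows across window boundaries without computing global attention.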
Excellent review, thanks. I've subscribed for future papers! Do you use manim for your animations?
Hi Gary, Thanks for your comments! In some places I use manim but not always. :)
Next should be "Dynamic Head: Unifying Object Detection Heads with Attentions"
agreed
Thanks, Raja, for pointing it out. We will try to prioritise that paper at some point.
Great video. But can you refrain from putting music in the background while explaining? It's a little distracting when viewing at higher speed.
Sure will take it on board when making the future ones 👍
In NLP, you have at most ~100,000 words to permute and train with. With images? Well, ViT with 400M images can hardly manage to match ImageNet :)