Wow. Thank you Mak!!!
I was learning about this model and having no idea where to start. This video must have saved me weeks of reading materials and going through source codes!
Amazing work, I really love the DETR video series, your explanations are so clear and easy to understand!
Is there any chance these latest videos will be also available in text format on your github page, like in the case of DAB-DETR and Deformable DETR?
Certainly! Will get started on that
There we go: github.com/adensur/blog/tree/main/computer_vision_zero_to_hero/30_rt_detr
Will do other videos too, in due time
Amazing work! Can you please do a video on VRWKV? It's a very recent vision encoder with linear attention. Thanks a lot for your videos.
Sure thing! Never heard of this one before, will be interesting
It’s surprising that transformers are not very good at small object detection, as self-attention should help the decoder find them.
I think this is a performance problem. Looking at "fat" transformer-based models like Co-DETR (arxiv.org/pdf/2211.12860), they don't seem to have any problem with APsmall. Perhaps it's just that attending over high-resolution feature maps is too expensive for a small, real-time model like RT-DETR?
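To put some rough numbers on "expensive": self-attention cost scales with the square of the token count, so high-resolution feature maps (which small objects need) blow up fast. A back-of-the-envelope sketch, with my own assumed numbers (640x640 input, 256-d embedding), not figures from the video:

```python
# Rough cost model for self-attention over a backbone feature map.
# All numbers here are illustrative assumptions, not measurements.

def attention_tokens(h, w, stride):
    """Number of tokens when an h x w image is downsampled by `stride`."""
    return (h // stride) * (w // stride)

def self_attention_cost(n_tokens, dim=256):
    """Crude FLOP proxy for one self-attention layer: O(n^2 * d)."""
    return n_tokens * n_tokens * dim

img = 640                                        # assumed input size
coarse = attention_tokens(img, img, stride=32)   # 400 tokens (20 x 20)
fine = attention_tokens(img, img, stride=8)      # 6400 tokens (80 x 80)

ratio = self_attention_cost(fine) // self_attention_cost(coarse)
print(coarse, fine, ratio)  # 400 6400 256
```

So attending over the stride-8 map (where small objects are still visible) is about 256x the cost of the stride-32 map, which is plausibly why a real-time model avoids it while heavier models like Co-DETR can afford it.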