I've almost finished a third of your uploaded videos. It feels like someone is reading papers with me. The feeling is great! Thanks so much, Yannic. Keep it up!
Paper from 1 year ago is now "a bit old". Just amazing how fast the field moves.
This shows there is still so much left to be done in DL. One thing I notice is that every point is making a prediction. According to research from Uber, there are location-sensitive CNNs that could also be tried in these cases. Would love to see something like a combination of the two ideas.
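For anyone curious, the Uber idea referenced here is (I assume) CoordConv: concatenate normalized x/y coordinate channels onto the feature map before a convolution, so the following layers can make location-sensitive predictions. A minimal NumPy sketch of that coordinate-channel step:

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized x/y coordinate channels to a feature map.

    feat: array of shape (C, H, W). Returns (C + 2, H, W).
    The conv applied afterwards sees absolute position explicitly,
    which is the core of the CoordConv idea (sketch, not Uber's code).
    """
    c, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)[:, None].repeat(w, axis=1)  # varies top-to-bottom
    xs = np.linspace(-1.0, 1.0, w)[None, :].repeat(h, axis=0)  # varies left-to-right
    return np.concatenate([feat, ys[None], xs[None]], axis=0)

feat = np.zeros((3, 4, 5))
out = add_coord_channels(feat)
# out.shape == (5, 4, 5); out[3] is the y channel, out[4] the x channel
```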
Corner pooling is a really smart way to greatly enlarge the receptive field, sort of like deformable convolution. But DETR seems to solve this problem in object detection more generally, since it uses the full image as its receptive field.
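The corner pooling trick is simple enough to sketch: for a top-left corner, each location takes the max over everything to its right plus the max over everything below it, so evidence from the whole object's extent reaches the corner. A NumPy sketch (running suffix maxima, as in the CornerNet paper):

```python
import numpy as np

def top_left_corner_pool(x):
    """Top-left corner pooling on a (H, W) feature map (sketch).

    horiz[i, j] = max over x[i, j:]   (everything to the right, incl. self)
    vert[i, j]  = max over x[i:, j]   (everything below, incl. self)
    Output is their sum; bottom-right pooling would scan the other way.
    """
    horiz = np.maximum.accumulate(x[:, ::-1], axis=1)[:, ::-1]
    vert = np.maximum.accumulate(x[::-1, :], axis=0)[::-1, :]
    return horiz + vert

x = np.array([[0., 1., 0.],
              [2., 0., 0.],
              [0., 0., 3.]])
p = top_left_corner_pool(x)
# p[0, 0] = max(row 0) + max(col 0) = 1 + 2 = 3
```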
These embeddings get more interesting the more you think about them. It is essentially two neural networks inventing a language to talk to each other. If we can make this interpretable, it might open up a lot of possibilities.
Any specific reason you went with this approach over the other very similar boxless/keypoint detection approaches, like CenterNet ("Objects as Points") or CSP ("Center and Scale Prediction"), which don't even require the laborious embeddings while performing equally well or better? Or the "CenterNet: Keypoint Triplets for Object Detection" paper, which is basically a combination of the CornerNet and center approaches.
I mean, they all basically do the same thing (keypoint detection), which in my opinion is quite different from what you suggested with the cross-attention matrix from the attention heads?
Yes this paper didn't turn out to be exactly what I hoped, but still interesting. I chose it just because it sounded like fun.
I really like your content. Can you make an explanation of CenterNet ("Objects as Points")? I don't quite get the idea of its loss function.
How can I find research papers the way you do?
So, does the network predict a WxHxC tensor for the heatmap branch?
Yes, one for the top-left corners and one for the bottom-right corners.
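So each branch outputs a per-class heatmap, and detections are the local maxima in it. A hedged sketch of reading peaks out of one such (C, H, W) map; the real pipeline additionally keeps the top-k peaks and then pairs top-left with bottom-right corners via the embeddings, which is omitted here:

```python
import numpy as np

def extract_peaks(heatmap, thresh=0.5):
    """Pick (class, y, x, score) detections from a (C, H, W) heatmap.

    A point is kept if it is a local maximum in its 3x3 neighbourhood
    and its score exceeds `thresh`. Sketch only, not the paper's code.
    """
    c, h, w = heatmap.shape
    padded = np.pad(heatmap, ((0, 0), (1, 1), (1, 1)),
                    constant_values=-np.inf)
    peaks = []
    for ci in range(c):
        for yi in range(h):
            for xi in range(w):
                s = heatmap[ci, yi, xi]
                if s <= thresh:
                    continue
                window = padded[ci, yi:yi + 3, xi:xi + 3]  # 3x3 neighbourhood
                if s >= window.max():
                    peaks.append((ci, yi, xi, float(s)))
    return peaks

hm = np.zeros((1, 5, 5))
hm[0, 2, 3] = 0.9
peaks = extract_peaks(hm)
print(peaks)  # [(0, 2, 3, 0.9)]
```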
Hey, have you done videos on the older but still heavily used architectures: Faster R-CNN, SSD, YOLOv3, RetinaNet?
Nicely explained
Thank you for the explanation.
First subscription of my life. Thanks for your video!
The person who put the thumbs down has Oppositional defiant disorder (ODD)🤣
My intuition is that using paired keypoints is cheaper but likely less accurate than anchor boxes. For example, it is not clear what the paper does when overlapping objects share the same keypoint (e.g. top-left). Using keypoints is interesting nevertheless. I found another recent paper that uses keypoints inside a transformer to replace the RGB tracking-and-matching pipeline for a pose tracking task: arxiv.org/pdf/1912.02323.pdf
Your videos are great!! Keep them up :) Why do you think they decided to go with these push and pull losses instead of a triplet loss? A triplet loss seems almost identical to the push + pull losses they propose.
No idea, but it's either the first thing they tried, or they tried a bunch of things and this worked the best.
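For anyone comparing the two: the CornerNet-style pull/push losses on 1-D corner embeddings are short enough to write out. Pull drags the two corners of one object toward their mean embedding; push separates the means of different objects by a margin (1 in the paper). A sketch, not the authors' implementation:

```python
def pull_push_losses(tl, br, delta=1.0):
    """Associative-embedding losses in the CornerNet style (sketch).

    tl, br: lists of 1-D embeddings for the top-left / bottom-right
    corner of each ground-truth object (tl[k] pairs with br[k]).
    Returns (pull, push): pull penalizes corners of the same object
    disagreeing; push penalizes means of DIFFERENT objects being
    closer than the margin `delta`, hinge-style like a triplet loss.
    """
    n = len(tl)
    means = [(t + b) / 2.0 for t, b in zip(tl, br)]
    pull = sum((t - m) ** 2 + (b - m) ** 2
               for t, b, m in zip(tl, br, means)) / n
    push = 0.0
    if n > 1:
        push = sum(max(0.0, delta - abs(means[k] - means[j]))
                   for k in range(n) for j in range(n) if j != k) / (n * (n - 1))
    return pull, push

# two objects whose corner pairs already agree and sit 2 apart:
pull, push = pull_push_losses([0.0, 2.0], [0.0, 2.0])
# pull == 0.0 (corners match) and push == 0.0 (means farther than delta)
```

The family resemblance to a triplet loss is the hinge in the push term; the difference is that there is no explicit anchor/positive/negative sampling, since every ground-truth pair supplies its own positives and every other object its negatives.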
It's 11 PM right now in Taiwan (GMT+8).
YT: it's time to read a paper
embeddings of 1 dimension, not 1 number. 1 number wouldn't work lol
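Right, and since only relative distances between the scalars matter, grouping corners at inference is just nearest-neighbour matching on one number per corner. A toy sketch of that grouping step (greedy matching, not the paper's exact procedure):

```python
def match_corners(tl_embs, br_embs, max_dist=0.5):
    """Greedy corner grouping by 1-D embedding distance (sketch).

    Each embedding is a single scalar; a top-left corner is paired
    with the nearest unused bottom-right embedding, provided they are
    within `max_dist` of each other. Returns (tl_index, br_index) pairs.
    """
    pairs = []
    used = set()
    for i, t in enumerate(tl_embs):
        best = min((j for j in range(len(br_embs)) if j not in used),
                   key=lambda j: abs(t - br_embs[j]), default=None)
        if best is not None and abs(t - br_embs[best]) <= max_dist:
            pairs.append((i, best))
            used.add(best)
    return pairs

pairs = match_corners([0.1, 1.9], [2.0, 0.0])
print(pairs)  # [(0, 1), (1, 0)]
```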
first