Vision Transformer and its Applications

Open Data Science

37,144 views

Published May 27, 2024
The vision transformer is a recent breakthrough in the area of computer vision. While transformer-based models have dominated the field of natural language processing since 2017, CNN-based models are still demonstrating state-of-the-art performance in vision problems. Last year, a group of researchers from Google figured out how to make a transformer work on image recognition. They called it the "vision transformer". Follow-up work by the community demonstrated superior performance of vision transformers not only in recognition but also in other downstream tasks such as detection, segmentation, multi-modal learning and scene text recognition, to mention a few.
In this talk, Rowel Atienza will go into a deeper understanding of the model architecture of vision transformers. Most importantly, Rowel will focus on the concept of self-attention and its role in vision. Then, he will present different model implementations utilizing the vision transformer as the main backbone.
Since self-attention can be applied beyond transformers, Rowel Atienza will also discuss a promising direction in building general-purpose model architectures. In particular, networks that can process a variety of data formats such as text, audio, image and video.
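As a rough illustration of the self-attention concept mentioned above, here is a minimal single-head sketch in plain Python. It is not taken from the talk: the helper names (`matmul`, `softmax`, `self_attention`), the identity projection matrices, and the toy 3-token input are all assumptions chosen to keep the example small.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x: n tokens, each a list of d features.
    w_q, w_k, w_v: d x d_k projection matrices (hypothetical, fixed here).
    Returns (output, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]           # transpose of K
    scores = matmul(q, k_t)                        # n x n token similarities
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights             # weighted sum of values

# Toy input: 3 tokens with 2 features each; identity projections are an assumption.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
```

Each row of `attn` sums to 1, so every output token is a weighted average of all value vectors. In a vision transformer the tokens would be patch embeddings rather than this toy input.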
→ To watch more videos like this, visit aiplus.training ←
Do You Like This Video? Share Your Thoughts in Comments Below
You can also visit our website and choose the nearest ODSC event to attend and experience all our trainings and workshops:
odsc.com/california/
odsc.com/apac/
Sign up for the newsletter to stay up to date with the latest trends in data science: opendatascience.com/newsletter/
Follow Us Online!
• Facebook: / opendatasci
• Instagram: / odsc
• Blog: opendatascience.com/
• LinkedIn: / open-data-science
• Twitter: / odsc
Science & Technology

COMMENTS • 24

@DrAIScience 26 days ago
Very very very nice explanation!!! I like learning the foundation/origin of the concepts where models are derived..
@jhjbm1959 6 months ago ⁺³
This video provides a clear step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else.
Thank you.
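The image-to-input-features step described in the comment above can be sketched roughly as follows. This is a simplified illustration, not the talk's implementation: it assumes a single-channel image stored as a list of rows, non-overlapping square patches, and a hypothetical fixed projection matrix `weight` (in a real model the projection is learned, and a class token and position embeddings are typically added afterwards).

```python
def patchify(image, patch_size):
    """Split an H x W grid into flattened, non-overlapping patches (row-major order)."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches

def embed(patches, weight):
    """Linearly project each flattened patch (length p*p) with a (p*p x d) matrix."""
    return [
        [sum(p * w for p, w in zip(patch, col)) for col in zip(*weight)]
        for patch in patches
    ]

# Toy 4x4 "image" with pixel values 0..15, cut into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, 2)          # 4 patches, each of length 4

# Hypothetical 4 x 3 projection matrix (learned in a real model).
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
tokens = embed(patches, weight)       # 4 tokens, each of dimension 3
```

Here `patches[0]` is `[0, 1, 4, 5]` (the top-left 2x2 block) and its embedding `tokens[0]` is `[5, 6, 9]`. The resulting token list is what a Transformer encoder would consume.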
@crapadopalese 1 year ago ⁺⁷
10:46 - this is a mistake; the convolution is not equivariant to scaling - if the bird is scaled, the output of the convolution will not be simply a scaling of the original output. That would only be true if you also rescale the filters.
@PrestonRahim 1 year ago ⁺⁵
Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.
@xXMaDGaMeR 1 year ago ⁺³
amazing lecture, thank you sir!
@rikki146 1 year ago ⁺¹
20:17 I think the encoder blocks are stacked in parallel fashion rather than sequential?
@DrAIScience 26 days ago
Do you have a video about beit or dino?
@sahil-vz8or 11 months ago ⁺¹
You said 196 patches in the ImageNet data. The number of patches will depend on the input image size and the patch size. E.g., if the input image is 400x400 and the patch size is 8x8, then the number of patches will be (400x400)/(8x8) = 50x50 = 2500.
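The arithmetic in the comment above can be checked with a short sketch. It assumes square, non-overlapping patches that divide the image evenly; the function name `num_patches` is hypothetical, and the 224x224 input with 16x16 patches is an assumption that happens to be consistent with the 196 figure, not something stated in this thread.

```python
def num_patches(height, width, patch_size):
    """Count non-overlapping square patches that tile a height x width image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return (height // patch_size) * (width // patch_size)

# Assumed common setting: 224x224 input, 16x16 patches -> 14 * 14 patches.
print(num_patches(224, 224, 16))  # 196

# The commenter's example: 400x400 input, 8x8 patches -> 50 * 50 patches.
print(num_patches(400, 400, 8))   # 2500
```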
@ailinhasanpour 1 year ago ⁺⁴
thanks for sharing , it was extremely helpful 💯
@OpenDataScienceCon 1 year ago
Thank you!
@scottkorman4953 1 year ago ⁺⁴
What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simplistic way?
@muhammadshahzaibiqbal7658 1 year ago
Thanks for sharing.
@PRASHANTKUMAR-ze6mj 1 year ago ⁺¹
thanks for sharing
@anirudhgangadhar6158 1 year ago
Great resource!
@user-co6pu8zv3v 10 months ago
Thank you, sir
@DrAIScience 26 days ago
Are you the channel owner??
@hoangtrung.aiengineer 1 year ago
Thank you for making such a great video
@capocianni1043 1 year ago
Thank you for this genuine knowledge.
@mohammedrakib3736 2 months ago
Fantastic Video! Really loved the detailed explanation step-by-step.
@liangcheng9856 9 months ago
awesome
@saimasideeq7254 6 months ago
Thank you, much clearer
@improvement_developer8995 9 months ago ⁺¹
Tax evader 🤮
@improvement_developer8995 9 months ago ⁺¹
🤮
