Self-Training with Noisy Student (87.4% ImageNet Top-1 Accuracy!)

  • Published 25 Jan 2025
  • This video explains the new state-of-the-art for ImageNet classification from Google AI. The technique takes an interesting approach to knowledge distillation: rather than using the student network for model compression (usually either faster inference or lower memory requirements), it iteratively scales up the capacity of the student network. The video also covers other interesting aspects of the paper, such as noising the student via data augmentation, dropout, and stochastic depth, as well as ideas like class balancing of the pseudo-labels. (A rough code sketch of the training loop follows the links below.)
    Thanks for watching! Please Subscribe!
    Self-training with Noisy Student improves ImageNet classification: arxiv.org/pdf/...
    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
    arxiv.org/pdf/...
    Billion-scale semi-supervised learning for state-of-the-art image and video classification:
    / billion-scale-semi-sup...
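
To make the approach in the description concrete, here is a minimal, self-contained sketch of the Noisy Student loop in PyTorch-flavored Python. Everything in it is a toy stand-in rather than the paper's setup: `make_model`, the tensor shapes, the confidence threshold, and the per-class cap are placeholders for the actual EfficientNet-B7 / L2 models, the ImageNet and JFT data, and the paper's RandAugment, dropout, stochastic depth, and class-balancing recipe. It only tries to show the structure: the teacher predicts without noise, the student trains with noise on labeled plus pseudo-labeled data, and each new student is at least as large as its teacher before being promoted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10  # toy setting; the paper uses the 1000 ImageNet classes


def make_model(width):
    # Stand-in for an EfficientNet of growing capacity; the dropout layer is
    # one of the student's noise sources (RandAugment and stochastic depth,
    # used in the paper, are omitted here).
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, width), nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(width, NUM_CLASSES),
    )


def pseudo_label(teacher, unlabeled, threshold=0.3, per_class=64):
    # The teacher runs WITHOUT noise (eval mode, clean inputs); low-confidence
    # images are dropped and each class is capped to roughly balance the set.
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(unlabeled), dim=1)
    conf, labels = probs.max(dim=1)
    keep_x, keep_y = [], []
    for c in range(NUM_CLASSES):
        idx = ((labels == c) & (conf > threshold)).nonzero(as_tuple=True)[0][:per_class]
        keep_x.append(unlabeled[idx])
        keep_y.append(labels[idx])
    return torch.cat(keep_x), torch.cat(keep_y)


def train(model, x, y, epochs=3, lr=0.1):
    # The student is trained WITH noise: dropout inside the model plus a crude
    # input perturbation standing in for data augmentation.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        noisy_x = x + 0.1 * torch.randn_like(x)
        loss = F.cross_entropy(model(noisy_x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


# Toy labeled / unlabeled tensors so the sketch runs end to end.
labeled_x = torch.randn(256, 3, 32, 32)
labeled_y = torch.randint(0, NUM_CLASSES, (256,))
unlabeled_x = torch.randn(1024, 3, 32, 32)

# 1) Train the initial teacher on labeled data only.
teacher = train(make_model(width=128), labeled_x, labeled_y)

# 2) Noisy Student iterations: pseudo-label the unlabeled data, train an
#    equal-or-larger noised student on labeled + pseudo-labeled data, then
#    promote the student to be the next teacher.
for width in (256, 512):  # the student's capacity grows each iteration
    pl_x, pl_y = pseudo_label(teacher, unlabeled_x)
    student = train(make_model(width),
                    torch.cat([labeled_x, pl_x]),
                    torch.cat([labeled_y, pl_y]))
    teacher = student
```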

COMMENTS • 31

  • @Thomasssien
    @Thomasssien 4 years ago

    A very accessible discussion of the paper that gets its main ideas across well. Thanks, subbed!

  • @geo2073
    @geo2073 5 years ago +2

    Loved it! Please continue making more of these!

  • @neutrinocoffee1151
    @neutrinocoffee1151 5 years ago +1

    Thanks for making this video! Clear and succinct.

  • @BlakeEdwards333
    @BlakeEdwards333 5 years ago +1

    Amazing, thank you! I do research involving knowledge distillation, and this is great motivation to keep moving in that direction.

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you!! I have been really interested in distillation lately as well! Planning on making a few videos of odd papers I came across while doing a literature review of distillation!

    • @BlakeEdwards333
      @BlakeEdwards333 5 years ago +1

      @@connor-shorten Awesome, please do! I'd love to talk with you sometime about my work.

    • @connor-shorten
      @connor-shorten 5 years ago

      @@BlakeEdwards333 That sounds great! Could you send me an email at henryailabs@gmail.com to discuss this? Looking forward to it!

  • @hocusbogus7930
    @hocusbogus7930 5 years ago +1

    Thank you for your work; I greatly appreciate it. One suggestion for future episodes: if applicable, could you dedicate a slide or a few sentences to the novelty and/or contributions as stated in the paper? That would be great to know.
    For this paper, I'm inclined to think it was the stochastic depth noise, and I couldn't make out whether anything else was new.

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you so much for the suggestion! I also don't think the incremental scaling up of the student networks was the key idea (in the paper they attribute +0.5% to this, compared to +1.9% for the noise trio of stochastic depth, dropout, and data augmentation). I think the scale of the experiments is novel as well: it's definitely a computationally intense workload to go from EfficientNet-B7 to the bigger L0, L1, and L2 models, and it illustrates an underexplored / underpromoted training methodology. Generally it flips common knowledge on its head, in the sense that we usually think of distillation as a technique for reducing capacity / producing faster models, but this shows how it can be used to train larger models as well. I haven't completely surveyed the literature on distillation, so please share any papers similar to this study!

  • @SergePanev
    @SergePanev 5 years ago +1

    Subbed and recommended to all the people I know :)
    Keep up the great work!

  • @LouisChiaki
    @LouisChiaki 4 years ago

    Can you turn on the captions? Much appreciated!

  • @sayakpaul3152
    @sayakpaul3152 4 years ago

    Thank you as always. Could you elaborate a bit more on the stochastic depth & class balancing parts?

  • @arkasaha4412
    @arkasaha4412 5 years ago +1

    Awesome work! Can you consider doing a series dedicated to the intuition behind loss functions and their use cases?

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you, and thanks for the suggestion! It sounds like an interesting project!

    • @arkasaha4412
      @arkasaha4412 5 years ago +1

      @@connor-shorten Really looking forward to it!

  • @ArturHolanda91
    @ArturHolanda91 5 years ago +1

    Thank you and keep up the good work.

  • @mikemihay
    @mikemihay 5 years ago +1

    Thank you!

    • @connor-shorten
      @connor-shorten 5 years ago +1

      Thank you!

    • @mikemihay
      @mikemihay 5 years ago +1

      @@connor-shorten It helps me so much; please don't stop making these videos.

  • @jonatan01i
    @jonatan01i 5 years ago +1

    Train two teacher networks and train the student only on images whose content the teachers agree on. (A rough sketch of this idea appears after this thread.)

    • @maxmetz01
      @maxmetz01 4 years ago +1

      Hey, I know it's been five months. I really like your idea. Did you try it out?

    • @jonatan01i
      @jonatan01i 4 years ago

      @@maxmetz01 Nope, sadly no.
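
A rough, hypothetical sketch of the two-teacher agreement filter suggested in this thread (this is not from the paper; the models, tensor shapes, and function name below are placeholders). The filtered images and their agreed-upon labels could then feed the same noisy student training loop sketched under the video description.

```python
import torch

def agreed_pseudo_labels(teacher_a, teacher_b, unlabeled):
    """Keep only images on which both teachers predict the same class,
    and use that agreed-upon class as the hard pseudo-label."""
    teacher_a.eval()
    teacher_b.eval()
    with torch.no_grad():
        pred_a = teacher_a(unlabeled).argmax(dim=1)
        pred_b = teacher_b(unlabeled).argmax(dim=1)
    agree = pred_a == pred_b  # boolean mask of images the teachers agree on
    return unlabeled[agree], pred_a[agree]

# Toy usage with two tiny stand-in "teachers" and random images.
teacher_a = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
teacher_b = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images, labels = agreed_pseudo_labels(teacher_a, teacher_b, torch.randn(64, 3, 32, 32))
```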

  • @mendi1122
    @mendi1122 5 years ago +3

    Only 1% improvement

    • @Riitasusei
      @Riitasusei 5 years ago +2

      Well, nowadays it's hard to improve by even 1%.

    • @connor-shorten
      @connor-shorten 5 years ago +6

      Usain Bolt only broke the 100m record by 0.28 seconds lol