I love these historical papers
A 7-year-old paper is called historical. This is what should be called progress.
For somebody who left machine learning 3 years ago to move to software development, these videos are pure gold for catching up with the cutting edge and knowing whether what I learned back then is still relevant today.
Going the other direction here: 30 years of coding, but now I won't touch any coding that isn't machine learning :D His videos are certainly gold.
The paper that everyone cites in their introduction. Thanks for sharing!!
What I love most about these is when he mentions what is being used now compared to the paper, plus all the little gems of information he adds here and there on top of the paper.
26:20 The VGG (2014) paper "Very deep convolutional networks for large-scale image recognition" mentions explicitly that they tried LRN and found it's not worth the trouble:
> First, we note that using local response normalisation (A-LRN network) does not improve on the model A without any normalisation layers. We thus do not employ normalisation in the deeper architectures (B-E).
Again, in "Batch normalization: Accelerating deep network training by reducing internal covariate shift" (2015):
> Remove Local Response Normalization. While Inception and other networks (Srivastava et al., 2014) benefit from it, we found that with Batch Normalization it is not necessary.
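For readers who have not seen it, here is the LRN scheme as defined in the AlexNet paper (added for reference, not part of the original comment): each activation is normalized by the summed squared activity of n adjacent channels at the same spatial position,

$$ b^{i}_{x,y} = a^{i}_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big(a^{j}_{x,y}\big)^{2} \Big)^{\beta} $$

with k = 2, n = 5, alpha = 1e-4, beta = 0.75 in AlexNet (N is the number of channels). This is the layer the quotes above say later architectures dropped.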
30:20 SqueezeNet and LeNet talked about this: stacking three 3x3 convs covers the same receptive field as a single 7x7 conv, but with far fewer parameters and much less compute.
31:37 Global Max Pool was introduced later, I think; it helps when the dimensions of the image are variable. 32:00 Global max pooling also means that the dense params can be dialed back a lot (see the small sketch below).
66,406 citations as of now. Crazy. Good paper, keep it coming.
Can you do one on the Adam optimizer? I think it is so ubiquitous that people don't even cite it anymore XD
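A minimal PyTorch sketch of the global-pooling point above (my own illustration, not from the video; the layer and channel sizes are invented): a global-pooling head works for any input size and needs far fewer dense parameters than a flatten head.

```python
import torch
import torch.nn as nn

# Toy feature extractor; channel counts are made up for this sketch.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
)

# Flatten head: tied to one fixed input size (assumes 32x32 feature maps here).
flatten_head = nn.Sequential(nn.Flatten(), nn.Linear(128 * 32 * 32, 10))

# Global-pooling head: input-size agnostic, and the dense layer shrinks to 128 -> 10.
pool_head = nn.Sequential(nn.AdaptiveMaxPool2d(1), nn.Flatten(), nn.Linear(128, 10))

print(sum(p.numel() for p in flatten_head.parameters()))  # ~1.3M parameters
print(sum(p.numel() for p in pool_head.parameters()))     # ~1.3K parameters

# The pooled head handles variable image dimensions without any changes.
for h, w in [(32, 32), (48, 64)]:
    x = torch.randn(1, 3, h, w)
    print(pool_head(features(x)).shape)  # torch.Size([1, 10]) for both sizes
```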
Thanks a ton for this series! And the clarification of which techniques stayed and which are gone is highly appreciated!
I genuinely wish I had a teacher like Yannic five years ago
I love the classic paper series. I do not have a master's or a PhD, but I want to learn deep learning, and this series helps us cover the basics.
I don't know why I am paying for college... Your videos are amazing!
Thank you so much for spending the time to walk through the paper. The world's a better place because of folks like you!
Wow! This is such a great idea (classic paper series)! Love it :D
4:20 One of the main reasons it's not used as much is that BatchNorm has much the same overall effect, but works better. That being said, I still find dropout quite effective at reducing overfitting, especially with smaller sample sizes and when used on the dense layers at the end.
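A small sketch of that pattern (my own hedged illustration, not anything from the video): batch norm in the conv stack, dropout only on the dense layers at the end.

```python
import torch.nn as nn

model = nn.Sequential(
    # Conv stack regularized by batch normalization.
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),
    # Dropout only in the dense classifier head, where overfitting bites hardest.
    nn.Dropout(p=0.5),
    nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)
model.train()  # dropout is active in training; model.eval() disables it at test time
```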
In the beginning was the AlexNet, and the AlexNet was with DNN, and the AlexNet was DNN.
Interesting! You have sparked my interest in learning about the present state of dropout layers.
I was told that this guy is on a break?!
He is so productive even while on a break.
Apparently not from a reliable source ;)
He is; the classic paper videos are usually pre-recorded.
enjoyed the way you presented it, thank you
Wonderfully presented, thank you! :) I look forward to taking the rest of the journey through this subject with you and your channel. :)
You should make more classics... I just love knowing about them.
Thanks for the explanation. It helped me understand and learn a lot of things I couldn't have if I had read the paper by myself.
Love your channel. How can someone stay up to date with new advancements in ML/DL? It's counterintuitive to me that larger models overfit less.
To prevent overfitting, the loss landscape should be as smooth as possible so that the model can generalize better. If I remember correctly, residual connections + batch normalization help smooth the loss landscape (see the small sketch below), maybe because they allow us to build even deeper models.
My bet for the future is actually on Bayesian networks, which use a learnable version of Gaussian dropout.
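For concreteness (my own hedged illustration, not from the comment or the paper): a basic residual block with batch normalization, the combination credited above with smoothing the loss landscape.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, with an identity skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the skip term keeps gradients flowing in deep stacks

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```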
Thank you for a beautiful explanation with a retrospective overview.
At test time, the crops and reflections were also used by another paper called OverFeat, which crushed the '13 ImageNet detection challenge, I suppose. (46:06)
You need a microphone that's not so sensitive that it records every tiny little sound. Actually, it could be automatic gain control (AGC) increasing the amplification when you are not speaking, so every swallow gets recorded louder than you want. Recommendation: turn off AGC if you can.
I love your channel. I'm one of the first to watch your videos and smash the like button.
A lot of the stuff mentioned, like dropout, went out of fashion when batch normalization was introduced. I still use it in denoisers, though.
Please, more videos like this one!
Thanks for this great video! I really enjoyed.
Amazing, can you do their book as well :D
Does anyone know where the statement at 7:56 that large models don't overfit comes from?
search for "deep learning double descent"
This is helping me with my literature review assignment, hahaha 😂 thank you!
Hi, I have been following your channel since DETR. Can you make a video explaining DeepSORT?
TIA
Great video! Thank you!
So you're saying people don't care about parameter size?
The link doesn't work properly because the download doesn't start.
Do you have a powerpoint presentation on this paper?
Great channel, subscribed, liked!
I can nearly guarantee that their net did not really overfit. I've trained many nets, and past lab mates have increased the number of parameters, and the test loss never keeps increasing. Feel free to check my Google Scholar for paper examples of what I mean, but I am sure their nets did fine, especially since they didn't quantify whether their net overfit.
Regardless, fun paper!
Can AlexNet detect multiple objects in a single frame?
So if dropout is not being used, what is being used now?
My bet for the future is on Bayesian networks, which use a sort of learnable version of Gaussian dropout.
Global Average Pooling
Does anyone know what he's using to annotate over the papers?
OneNote
@@YannicKilcher Thanks
Wow. It's a real classic.
Yannic is gonna be in the Stone Age soon. Can't wait for the invention of the wheel.
38:33 Why don't people use dropout anymore?
It's usually fine without.
👍
"Action potential" is the name of the spike/signal itself, not the name of the activation threshold. It is a really dumb name :(
Why don't people care about overfitting now?