[Classic] ImageNet Classification with Deep Convolutional Neural Networks (Paper Explained)
- Published 28 Jun 2024
- #ai #research #alexnet
AlexNet was the start of the deep learning revolution. Up until 2012, the best computer vision systems relied on hand-crafted features and highly specialized algorithms to perform object classification. This paper was the first to successfully train a deep convolutional neural network on not one, but two GPUs, and it outperformed the competition on ImageNet by a wide margin.
OUTLINE:
0:00 - Intro & Overview
2:00 - The necessity of larger models
6:20 - Why CNNs?
11:05 - ImageNet
12:05 - Model Architecture Overview
14:35 - ReLU Nonlinearities
18:45 - Multi-GPU training
21:30 - Classification Results
24:30 - Local Response Normalization
28:05 - Overlapping Pooling
32:25 - Data Augmentation
38:30 - Dropout
40:30 - More Results
43:50 - Conclusion
Paper: www.cs.toronto.edu/~hinton/abs...
Abstract:
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
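The figures in the abstract (about 60 million parameters and 650,000 neurons) can be checked with a little arithmetic. Below is a minimal sketch that walks through the layer sizes. The kernel counts, kernel sizes, strides and 3x3/stride-2 overlapping pooling follow the paper's architecture description; the paddings (2 for conv2, 1 for conv3 to conv5) are an assumption chosen to keep spatial sizes as described, and the input is assumed to be 227x227 (the paper states 224x224, but 227 is what makes the 55x55 first-layer output come out).

```python
# Sketch: recompute AlexNet's spatial sizes, parameter count and neuron count.
# Assumptions: 227x227 input, paddings of 2 (conv2) and 1 (conv3-conv5).

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, out_channels, kernel, stride, pad, in_channels_per_filter, pool_after)
# conv2, conv4 and conv5 only connect to the kernel maps on the same GPU,
# so each filter sees half of the previous layer's channels (48, 192, 192).
CONV_LAYERS = [
    ("conv1", 96, 11, 4, 0, 3, True),
    ("conv2", 256, 5, 1, 2, 48, True),
    ("conv3", 384, 3, 1, 1, 256, False),
    ("conv4", 384, 3, 1, 1, 192, False),
    ("conv5", 256, 3, 1, 1, 192, True),
]
FC_LAYERS = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]

def alexnet_summary(input_size=227):
    size, params, neurons, channels = input_size, 0, 0, 3
    for _name, out_c, k, s, p, in_c, pool in CONV_LAYERS:
        size = conv_out(size, k, s, p)
        params += k * k * in_c * out_c + out_c  # weights + biases
        neurons += size * size * out_c          # output activations
        channels = out_c
        if pool:
            # Overlapping max pooling: 3x3 windows with stride 2.
            size = conv_out(size, 3, 2)
    fan_in = size * size * channels  # 6 * 6 * 256 = 9216 inputs to fc6
    for _name, units in FC_LAYERS:
        params += fan_in * units + units
        neurons += units
        fan_in = units
    return size, params, neurons

final_size, n_params, n_neurons = alexnet_summary()
print(final_size, n_params, n_neurons)  # → 6 60965224 659272
```

That is roughly 61 million parameters and 659,000 neurons, in line with the rounded figures in the abstract; note that the three dense layers account for about 58.6 million of the parameters.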
Links:
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
Minds: www.minds.com/ykilcher
Parler: parler.com/profile/YannicKilcher
LinkedIn: / yannic-kilcher-488534136
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar (preferred to Patreon): www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments:
I love these historical papers
A 7-year-old paper is called historical. That's what I'd call progress.
The paper that everyone cites in the introduction. Thanks for sharing!!
For somebody who left machine learning 3 years ago to move to software development, these videos are pure gold for catching up with the cutting edge and for knowing that what I learned back then is still relevant today.
Going the other direction here: 30 years of coding, but now I won't touch any coding that isn't machine learning :D His videos are certainly gold.
Thanks a ton for this series! The clarifications of which techniques stayed and which are gone are highly appreciated!
What I love most about these is when he mentions what is used now compared to the paper, plus all the little gems of information here and there on top of the paper.
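30:20 SqueezeNet and LeNet talked about it. A stack of small convolutions covers the same receptive field as one large convolution at lower cost: three stacked 3x3 convolutions see a 7x7 region (two see 5x5), but need 27C^2 weights instead of 49C^2 for C input and output channels.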
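The receptive-field arithmetic behind this is easy to check. A minimal sketch, assuming stride-1 convolutions, no biases, and the same number of channels C in and out of every layer (C = 64 is an arbitrary choice):

```python
# Sketch: receptive field and weight count of stacked 3x3 convolutions
# versus a single large convolution with the same receptive field.
# Assumptions: stride 1, no biases, C input and C output channels per layer.

def stacked_receptive_field(kernel, depth):
    """Receptive field of `depth` stacked stride-1 convs with a kernel x kernel window."""
    return depth * (kernel - 1) + 1

def conv_weights(kernel, channels):
    """Weights of one 2D conv layer mapping `channels` -> `channels`."""
    return kernel * kernel * channels * channels

C = 64
for depth in (2, 3, 4):
    rf = stacked_receptive_field(3, depth)
    stacked = depth * conv_weights(3, C)
    single = conv_weights(rf, C)
    print(f"{depth} x 3x3 -> {rf}x{rf} field: {stacked} weights vs {single} for one {rf}x{rf} conv")
```

Under these assumptions, it takes four stacked 3x3 layers (not three) to reach a 9x9 receptive field, and in every case the stack uses fewer weights than the single large kernel, while also adding extra nonlinearities between layers.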
31:37 I think global max pooling was introduced later; it helps when the image dimensions vary. 32:00 Global max pooling also means that the dense-layer parameters can be cut back a lot.
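To illustrate both points, here is a small sketch using plain nested lists indexed [channel][row][col]. The dense-layer sizes are AlexNet's (a 256 x 6 x 6 conv5 output feeding 4096 units); putting a global pool in front of that layer is a hypothetical modification for comparison, not something the paper does. Global average pooling, the variant more common in later networks, is included alongside global max pooling.

```python
# Sketch: global pooling over a feature map stored as nested lists
# indexed [channel][row][col]. The output has one value per channel,
# however large the spatial dimensions are.

def global_max_pool(fmap):
    return [max(max(row) for row in channel) for channel in fmap]

def global_avg_pool(fmap):
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in fmap]

def dense_params(fan_in, units):
    """Weights plus biases of a fully connected layer."""
    return fan_in * units + units

# Two single-channel maps of different sizes both pool to one value per channel.
small = [[[1, 5], [3, 2]]]
large = [[[0, 1, 2], [9, 4, 4], [1, 1, 1]]]
print(global_max_pool(small), global_max_pool(large))

# AlexNet flattens its 256 x 6 x 6 conv5 output into a 4096-unit layer...
flatten_fc6 = dense_params(256 * 6 * 6, 4096)
# ...while a hypothetical global pool in front of it would feed only 256 values.
pooled_fc6 = dense_params(256, 4096)
print(flatten_fc6, pooled_fc6)
```

With these sizes the first dense layer shrinks from about 37.8 million to about 1.05 million parameters, at the cost of discarding where in the image each feature fired.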
66,406 citations as of now. Crazy. Good paper; keep them coming.
Can you do one on the Adam optimizer? I think it is so ubiquitous that people don't even cite it anymore XD
Wow! This is such a great idea (classic paper series)! Love it :D
26:20 The VGG (2014) paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" explicitly mentions that they tried LRN and found it wasn't worth the trouble:
> First, we note that using local response normalisation (A-LRN network) does not improve on the model A without any normalisation layers. We thus do not employ normalisation in the deeper architectures (B-E).
Again, in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (2015):
> Remove Local Response Normalization. While Inception and other networks (Srivastava et al., 2014) benefit from it, we found that with Batch Normalization it is not necessary.
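For reference, the normalization AlexNet applies is b_i = a_i / (k + alpha * sum_j a_j^2)^beta, where a_i is the activity of kernel map i at a given position and the sum runs over the n adjacent kernel maps (clipped at the first and last channel). A minimal sketch for a single spatial position, using the paper's hyperparameters k = 2, n = 5, alpha = 1e-4, beta = 0.75; real implementations vectorize this over all positions.

```python
# Sketch of AlexNet-style local response normalization across channels,
# applied to the N channel activations at one spatial position (x, y).

def local_response_norm(activations, k=2.0, n=5, alpha=1e-4, beta=0.75):
    N = len(activations)
    out = []
    for i, a_i in enumerate(activations):
        lo = max(0, i - n // 2)
        hi = min(N - 1, i + n // 2)
        # Sum of squares over the n "adjacent" kernel maps around channel i.
        sq_sum = sum(a * a for a in activations[lo:hi + 1])
        out.append(a_i / (k + alpha * sq_sum) ** beta)
    return out

# A strongly active channel (30.0) damps its neighbours more than distant channels.
print(local_response_norm([1.0, 2.0, 30.0, 0.5]))
```

This "lateral inhibition" between neighbouring channels is the effect that, as the quotes above note, later networks found unnecessary once batch normalization was available.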
Wonderfully presented, thank you! :) I look forward to taking the rest of the journey through this subject with you and your channel. :)
I love the classic paper series. I don't have a master's or a PhD, but I want to learn deep learning, and this series helps us cover the basics.
I genuinely wish I had a teacher like Yannic five years ago
Thank you so much for spending time walking through the paper. The world's a better place because of folks like you!
Interesting! You have sparked my interest in learning about the present state of dropout layers.
Thank you for a beautiful explanation with a retrospective overview.
Pure gold! You deserve more subscribers!!
Thanks for the explanation. It helped me understand and learn a lot of things that I couldn't have if I had read the paper by myself.
I was told that this guy is on a break?!
He is so productive even in a break.
Apparently not from a reliable source ;)
He is; the classic paper videos are usually pre-recorded.
Enjoyed the way you presented it, thank you.
Thanks for this great video! I really enjoyed it.
Great video! Thank you!
In the beginning was the AlexNet, and the AlexNet was with DNN, and the AlexNet was DNN.
4:20 One of the main reasons it's not used as much is that BatchNorm has the same overall effect, only better. That said, I still find dropout quite effective at reducing overfitting, especially with small sample sizes and when used on the dense layers at the end.
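For comparison, the paper applies dropout to the first two fully connected layers: each hidden unit's output is set to zero with probability 0.5 during training, and at test time all units are used but their outputs are multiplied by 0.5. Most frameworks today use the equivalent "inverted" form, which rescales the surviving units during training so that inference needs no change. A minimal pure-Python sketch of both:

```python
import random

def dropout_train(xs, p=0.5, rng=random):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

def dropout_eval(xs):
    """With inverted dropout, inference is the identity."""
    return list(xs)

def alexnet_style_eval(xs, p=0.5):
    """The paper's variant: no rescaling in training, multiply by (1 - p) at test time."""
    return [x * (1.0 - p) for x in xs]

rng = random.Random(0)
print(dropout_train([1.0] * 8, p=0.5, rng=rng))  # each entry is 0.0 or 2.0
print(alexnet_style_eval([2.0, 4.0]))
```

Both variants keep the expected activation the same between training and testing; the inverted form simply moves the scaling into training time.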
Please, more videos like this one!
I love your channel. I'm one of the first to watch your videos and smash the like button.
Great channel, subscribed, liked!
Love your channel. How can someone stay up to date with new advancements in ML/DL? It's counterintuitive to me that larger models overfit less.
To prevent overfitting, the loss landscape should be as smooth as possible so that the model can generalize better. If I remember correctly, residual connections and batch normalization help smooth the loss landscape, maybe because they allow us to build even deeper models.
My bet for the future is actually on Bayesian networks, which use a learnable version of Gaussian dropout.
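A minimal sketch of plain (non-learnable) Gaussian dropout: each activation is multiplied by noise drawn from a normal distribution with mean 1 and variance alpha. Choosing alpha = p / (1 - p) matches the mean and variance of inverted Bernoulli dropout with drop rate p. Making alpha a trainable parameter is roughly the idea behind variational dropout; that part is not shown here, and the helper names are illustrative.

```python
import random

def alpha_for_drop_rate(p):
    """Noise variance that matches inverted Bernoulli dropout with drop rate p."""
    return p / (1.0 - p)

def gaussian_dropout(xs, alpha, rng=random):
    """Multiply each unit by noise ~ N(1, alpha); alpha is a variance, gauss() takes a std."""
    sigma = alpha ** 0.5
    return [x * rng.gauss(1.0, sigma) for x in xs]

rng = random.Random(1)
print(gaussian_dropout([1.0, 2.0, 3.0], alpha=alpha_for_drop_rate(0.5), rng=rng))
```

Unlike Bernoulli dropout, no unit is ever switched off completely; every activation is just perturbed multiplicatively.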
I don't know why I'm paying for college... Your videos are amazing!
Wow. It's a real classic.
Amazing, can you do their book as well :D
You should make more classics... I just love knowing about them.
Test-time crops and reflections were also used by a later paper, OverFeat, which won the localization task of the 2013 ImageNet challenge (ILSVRC 2013). (46:06)
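A minimal sketch of the ten-crop test-time scheme described in the AlexNet paper: take the four corner crops and the center crop, add their horizontal reflections, and average the softmax outputs over all ten. Images are plain 2D lists here (channels omitted), and `predict` is a hypothetical stand-in for a model that returns class probabilities.

```python
# Sketch of ten-crop test-time augmentation: 4 corners + center, plus mirrors.

def crop(img, top, left, size):
    return [row[left:left + size] for row in img[top:top + size]]

def hflip(img):
    return [row[::-1] for row in img]

def ten_crops(img, size):
    h, w = len(img), len(img[0])
    offsets = [(0, 0), (0, w - size),                  # top-left, top-right
               (h - size, 0), (h - size, w - size),    # bottom-left, bottom-right
               ((h - size) // 2, (w - size) // 2)]     # center
    crops = [crop(img, top, left, size) for top, left in offsets]
    return crops + [hflip(c) for c in crops]

def ten_crop_predict(img, size, predict):
    """Average the class probabilities returned by `predict` over the ten crops."""
    probs = [predict(c) for c in ten_crops(img, size)]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]

# Tiny 4x4 "image" and a dummy model that always returns the same probabilities.
img = [[r * 10 + c for c in range(4)] for r in range(4)]
print(len(ten_crops(img, 2)))
print(ten_crop_predict(img, 2, lambda crop_: [0.25, 0.75]))
```

In the paper's setting, `img` would be a 256x256 image and `size` would be 224.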
This is helping me with my literature review assignment, hahaha 😂 thank you!
Do you have a PowerPoint presentation on this paper?
A lot of the things mentioned, like dropout, went out of fashion when batch normalization was introduced. I still use dropout in denoisers, though.
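For reference, batch normalization standardizes each feature using the mean and variance of the current mini-batch, then applies a learnable scale (gamma) and shift (beta); the BN paper notes that it also acts as a regularizer, in some cases eliminating the need for dropout. A minimal training-mode sketch for a single feature; the running averages used at inference time are omitted.

```python
# Sketch of batch normalization (training mode) for one feature across a mini-batch.

def batch_norm_train(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across the mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

print(batch_norm_train([1.0, 2.0, 3.0, 4.0]))
print(batch_norm_train([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0))
```

Because each example's output depends on the other examples in its batch, the statistics add a small amount of noise during training, which is one common explanation for the regularizing effect.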
Hi, I've been following your channel since DETR. Can you make a video explaining DeepSORT?
Thanks in advance.
You need a microphone that’s not so sensitive it records every tiny little sound. Actually, it could be automatic gain control (AGC) that increases amplification when you are not speaking and records every time you swallow with more volume than you want. Recommendation: Turn off AGC if you can.
Does anyone know where the statement at 7:56 that large models don't overfit comes from?
Search for "deep learning double descent"
👍
Does anyone know what he's using to annotate over the papers?
OneNote
@YannicKilcher Thanks
Can AlexNet detect multiple objects in a single frame?
So you're saying people don't care about parameter size?
So if dropout is not being used, what is being used now?
My bet for the future is on Bayesian networks, which use a sort of learnable version of Gaussian dropout.
Global Average Pooling
I can nearly guarantee that their net did not really overfit. I've trained many nets, and past lab mates have increased the number of parameters without the test loss continuing to increase. Feel free to check my Google Scholar for papers that show what I mean, but I'm sure their nets did fine, especially since they didn't quantify whether their net overfit.
Regardless fun paper!
The link doesn't work properly because the download doesn't start.
Yannic is gonna be in the Stone Age soon. Can't wait for the invention of the wheel.
38:33 Why don't people use dropout anymore?
It's usually fine without.
"Action potential" is the name of the spike/signal itself, not the name of the activation threshold. It is a really dumb name :(
Why don't people care about overfitting nowadays?