[Classic] ImageNet Classification with Deep Convolutional Neural Networks (Paper Explained)

  • Published 28 Jun 2024
  • #ai #research #alexnet
    AlexNet was the start of the deep learning revolution. Up until 2012, the best computer vision systems relied on hand-crafted features and highly specialized algorithms to perform object classification. This paper was the first to successfully train a deep convolutional neural network on not one, but two GPUs, and it outperformed the competition on ImageNet by a large margin.
    OUTLINE:
    0:00 - Intro & Overview
    2:00 - The necessity of larger models
    6:20 - Why CNNs?
    11:05 - ImageNet
    12:05 - Model Architecture Overview
    14:35 - ReLU Nonlinearities
    18:45 - Multi-GPU training
    21:30 - Classification Results
    24:30 - Local Response Normalization
    28:05 - Overlapping Pooling
    32:25 - Data Augmentation
    38:30 - Dropout
    40:30 - More Results
    43:50 - Conclusion
    Paper: www.cs.toronto.edu/~hinton/abs...
    Abstract:
    We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
    Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
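    To make the abstract concrete, here is a minimal PyTorch-style sketch of an AlexNet-like network: five convolutional layers (some followed by max-pooling), ReLU nonlinearities, dropout in the fully-connected layers, and a 1000-way output. The channel counts and kernel sizes below follow the common single-GPU reformulation and are illustrative assumptions, not the paper's original two-GPU split.

    import torch
    import torch.nn as nn

    class AlexNetSketch(nn.Module):
        def __init__(self, num_classes: int = 1000):
            super().__init__()
            # Five convolutional layers, some followed by overlapping max-pooling
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
            )
            # Three fully-connected layers with dropout; the 1000-way softmax is folded into the loss
            self.classifier = nn.Sequential(
                nn.Dropout(p=0.5),
                nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, 4096), nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x):
            x = self.features(x)
            return self.classifier(torch.flatten(x, 1))

    logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))  # one 224x224 RGB image -> 1000 logits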
    Links:
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / yannic-kilcher-488534136
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar (preferred to Patreon): www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 62

  • @aa-xn5hc
    @aa-xn5hc 3 years ago +61

    I love these historical papers

    • @Dendus90
      @Dendus90 3 years ago +14

      A 7-year-old paper is called historical. This is what should be called progress.

  • @DiegoJimenez-ic8by
    @DiegoJimenez-ic8by 3 years ago +18

    The paper that everyone cites in the introduction. Thanks for sharing!!

  • @05xpeter
    @05xpeter 3 years ago +15

    For somebody who left machine learning 3 years ago to move to software development, these videos are pure gold in terms of catching up with the cutting edge and knowing that what I learned back then is still relevant today.

    • @agenticmark
      @agenticmark 6 months ago

      Going the other direction here: 30 years coding, but now I won't touch any coding that isn't machine learning :D His videos are certainly gold.

  • @Alex-ms1yd
    @Alex-ms1yd 3 years ago +2

    Thanks a ton for this series! And the clarification of which techniques stayed and which are gone is highly appreciated!

  • @MinecraftLetstime
    @MinecraftLetstime 3 years ago +2

    What I love most about these is when he mentions what is being used now compared to the paper, and all the little gems of information here and there on top of the paper.

  • @herp_derpingson
    @herp_derpingson 3 years ago +27

    30:20 SqueezeNet and LeNet talked about it. Having a 9x9x9 conv is equivalent to stacking three 3x3x3 filters, but it takes cubically more time to compute.
    .
    31:37 Global max pooling was introduced later, I think; it helps when the dimensions of the image are variable. 32:00 Global max pooling also means that the dense params can be dialed back a lot.
    .
    66406 citations as of now. Crazy. Good paper, keep it coming.
    .
    Can you do one on the Adam optimizer? I think it is so ubiquitous that people don't even cite it anymore XD
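
    (On the two points above, a rough illustrative sketch, not taken from the paper or the video: stacking small 3x3 convolutions reaches a large receptive field with fewer weights than one big kernel, and global pooling makes the classifier head independent of the input resolution. The channel count of 256 is an arbitrary assumption.)

    import torch
    import torch.nn as nn

    c = 256  # assumed channel count, for illustration only

    big = nn.Conv2d(c, c, kernel_size=9, padding=4)        # one 9x9 convolution
    stacked = nn.Sequential(                               # four 3x3 convs cover the same 9x9 receptive field
        *[nn.Conv2d(c, c, kernel_size=3, padding=1) for _ in range(4)]
    )
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(big), count(stacked))                      # roughly 5.3M vs 2.4M weights

    # Global max pooling: the head always sees a fixed-size vector, whatever the image size
    pool = nn.AdaptiveMaxPool2d(1)
    for hw in (7, 10):                                     # feature maps of different spatial sizes
        print(pool(torch.randn(1, c, hw, hw)).flatten(1).shape)  # always torch.Size([1, 256])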

  • @MikeAirforce111
    @MikeAirforce111 3 years ago +3

    Wow! This is such a great idea (classic paper series)! Love it :D

  • @CosmiaNebula
    @CosmiaNebula 2 years ago

    26:20 The VGG (2014) paper "Very deep convolutional networks for large-scale image recognition" mentioned explicitly that they tried LRN and found it's not worth the trouble.
    > First, we note that using local response normalisation (A-LRN network) does not improve on the model A without any normalisation layers. We thus do not employ normalisation in the deeper architectures (B-E).
    Again, in "Batch normalization: Accelerating deep network training by reducing internal covariate shift" (2015):
    > Remove Local Response Normalization. While Inception and other networks (Srivastava et al., 2014) benefit from it, we found that with Batch Normalization it is not necessary.
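
    (For reference, the local response normalization being discussed divides each activation by a power of the summed squares of activations in a few neighbouring channels at the same position, b = a / (k + alpha * sum(a^2))^beta. A minimal sketch using PyTorch's built-in layer; the hyperparameters mirror the paper's reported values, though PyTorch's convention divides alpha by the window size, so the constants are not exactly equivalent.)

    import torch
    import torch.nn as nn

    lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
    x = torch.randn(1, 96, 55, 55)   # e.g. activations after AlexNet's first conv layer
    y = lrn(x)                       # same shape, rescaled across neighbouring channels
    print(y.shape)                   # torch.Size([1, 96, 55, 55])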

  • @bengineer_the
    @bengineer_the 3 years ago

    Wonderfully presented, thank you! :) I look forward to taking the rest of the journey through this subject with you and your channel. :)

  • @MrjbushM
    @MrjbushM 3 years ago +5

    I love the classic paper series. I do not have a master's or PhD, but I want to learn deep learning, and this series helps us cover the basics.

  • @fulin3397
    @fulin3397 3 years ago +4

    I genuinely wish I had a teacher like Yannic five years ago

  • @michaelcho
    @michaelcho 3 years ago +1

    Thank you so much for spending time to walk through the paper. The world's a better place because of folks like you!

  • @somecalc4964
    @somecalc4964 3 years ago

    Interesting! You have sparked my interest in learning about the present state of dropout layers.

  • @barisois
    @barisois 3 years ago

    Thank you for a beautiful explanation with a retrospective overview.

  • @dbsjro
    @dbsjro 3 years ago +1

    Pure gold! You deserve more subscribers!!

  • @AVINASHKUMAR-cz9sm
    @AVINASHKUMAR-cz9sm 3 years ago

    Thanks for the explanation. It helped me understand and learn a lot of things which I couldn't have if I had read the paper by myself.

  • @yaxiongzhao6640
    @yaxiongzhao6640 3 years ago +12

    I was told that this guy is on a break?!

    • @bingbingsun6304
      @bingbingsun6304 3 years ago +1

      He is so productive even on a break.

    • @peterfireflylund
      @peterfireflylund 3 years ago +2

      Apparently not from a reliable source ;)

    • @ruiwang780
      @ruiwang780 3 years ago +1

      He is; the classic paper videos are usually pre-recorded.

  • @surajrao9729
    @surajrao9729 10 months ago

    Enjoyed the way you presented it, thank you.

  • @ai_station_fa
    @ai_station_fa 2 years ago

    Thanks for this great video! I really enjoyed it.

  • @vladimirbosinceanu5778
    @vladimirbosinceanu5778 2 years ago

    Great video! Thank you!

  • @sacramentofwilderness6656
    @sacramentofwilderness6656 3 years ago +1

    In the beginning was the AlexNet, and the AlexNet was with DNN, and the Alexnet was DNN.

  • @darioushs
    @darioushs 3 years ago +1

    4:20 One of the main reasons it's not used as much is that BatchNorm has much the same overall effect, but better. That being said, I still find dropout to be quite effective at lowering overfitting, especially on smaller sample sizes and when used on the dense layers at the end.
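
    (A minimal sketch of the usage described above: dropout only on the dense layers at the end, active during training and disabled at evaluation. The layer sizes are arbitrary assumptions.)

    import torch
    import torch.nn as nn

    head = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(),
        nn.Dropout(p=0.5),           # randomly zeroes half of the activations while training
        nn.Linear(256, 10),
    )

    x = torch.randn(4, 512)
    head.train()
    y_train = head(x)                # stochastic: a fresh dropout mask is sampled per call
    head.eval()
    y_eval = head(x)                 # deterministic: dropout acts as the identity at eval time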

  • @xl0xl0xl0
    @xl0xl0xl0 3 years ago

    Please, more videos like this one!

  • @quebono100
    @quebono100 3 years ago +1

    I love your channel. I'm one of the first to watch your videos and smash the like button.

  • @forcanadaru
    @forcanadaru 3 years ago

    Great channel, subscribed, liked!

  • @rishikaushik8307
    @rishikaushik8307 3 years ago +4

    Love your channel. How can someone stay up to date with new advancements in ML/DL? It's counter-intuitive to me that larger models overfit less.

    • @andyblue953
      @andyblue953 3 years ago +1

      To prevent overfitting, the loss landscape should be as smooth as possible so that the model can better generalize. If I remember correctly, residual connections + batch normalization help smooth the loss landscape, maybe because they allow us to build even deeper models.
      My bet for the future is actually on Bayesian networks, which use a learnable version of Gaussian dropout.
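
      (A sketch of the residual-connection-plus-batch-norm pattern mentioned above; a generic basic block rather than any specific paper's exact design.)

      import torch
      import torch.nn as nn

      class ResidualBlock(nn.Module):
          def __init__(self, channels: int):
              super().__init__()
              self.body = nn.Sequential(
                  nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                  nn.BatchNorm2d(channels),
                  nn.ReLU(inplace=True),
                  nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                  nn.BatchNorm2d(channels),
              )

          def forward(self, x):
              return torch.relu(x + self.body(x))   # the skip connection adds the input back

      block = ResidualBlock(64)
      print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])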

  • @pranshu041
    @pranshu041 3 years ago +1

    I don't know why I am paying for college... Your videos are amazing!

  • @utku_yucel
    @utku_yucel 3 years ago +1

    Wow, it's really a classic.

  • @Fortnite_king954
    @Fortnite_king954 3 years ago +1

    Amazing, can you do their book as well :D

  • @vishnum9613
    @vishnum9613 3 years ago +4

    You should make more classics... I just love learning about them.

  • @aritraroygosthipaty3662
    @aritraroygosthipaty3662 3 years ago +1

    At test time, the crops and reflections were also used by another paper called OverFeat, which crushed the '13 ImageNet detection challenge, I suppose. (46:06)

  • @jesschil266
    @jesschil266 3 years ago +2

    This is helping me with my literature review assignment, hahaha 😂 thank you!

  • @arshikantony3021
    @arshikantony3021 3 years ago

    Do you have a powerpoint presentation on this paper?

  • @galchinsky
    @galchinsky 3 years ago

    A lot of the stuff mentioned, like dropout, went out of fashion when batch normalization was introduced. I still use it in denoisers, though.

  • @vivekpandian08
    @vivekpandian08 3 years ago +1

    Hi, I have been following your channel since DETR. Can you make a video explaining DeepSORT?
    TIA

  • @vtrandal
    @vtrandal 1 year ago

    You need a microphone that’s not so sensitive it records every tiny little sound. Actually, it could be automatic gain control (AGC) that increases amplification when you are not speaking and records every time you swallow with more volume than you want. Recommendation: Turn off AGC if you can.

  • @liuwu7350
    @liuwu7350 3 years ago +1

    Does anyone know where the statement that large models don't overfit (7:56) comes from?

    • @YannicKilcher
      @YannicKilcher 3 years ago +1

      search for "deep learning double descent"

  • @user-ui5dg3nr3r
    @user-ui5dg3nr3r 1 year ago

    👍

  • @nulliusinverba7732
    @nulliusinverba7732 3 years ago +1

    Does anyone know what he's using to annotate over the papers?

  • @geetumolbabu5696
    @geetumolbabu5696 3 years ago

    Can AlexNet detect multiple objects in a single frame?

  • @axequalsb8431
    @axequalsb8431 3 years ago +1

    So you're saying people don't care about parameter size?

  • @MinecraftLetstime
    @MinecraftLetstime 3 years ago +1

    So if dropout is not being used, what is being used now?

    • @andyblue953
      @andyblue953 3 years ago

      My bet for the future is on Bayesian networks, which use a sort of learnable version of Gaussian dropout.

    • @AndreiMargeloiu
      @AndreiMargeloiu 3 years ago

      Global Average Pooling

  • @brandomiranda6703
    @brandomiranda6703 3 years ago

    I can nearly guarantee that their net did not really overfit. I've trained many nets, and past lab mates have increased the number of parameters and the test loss never keeps increasing. Feel free to check my Google Scholar for paper examples of what I mean, but I am sure their nets did fine, especially since they didn't quantify whether their net overfit.
    Regardless, fun paper!

  • @tiziangottschlich8800
    @tiziangottschlich8800 1 year ago

    The link doesn't work properly because the download doesn't start.

  • @Aniket7Tomar
    @Aniket7Tomar 3 years ago +5

    Yannic is gonna be in the stone age soon. Can't wait for the invention of the wheel.

  • @shivamraisharma1474
    @shivamraisharma1474 3 years ago

    38:33 Why don't people use dropout anymore?

  • @peterfireflylund
    @peterfireflylund 3 years ago

    "Action potential" is the name of the spike/signal itself, not the name of the activation threshold. It is a really dumb name :(

  • @ziyangchen2301
    @ziyangchen2301 1 year ago

    Why do people not care about overfitting now?