[Classic] ImageNet Classification with Deep Convolutional Neural Networks (Paper Explained)

  • Published 17 Nov 2024

COMMENTS • 61

  • @aa-xn5hc
    @aa-xn5hc 4 years ago +66

    I love these historical papers

    • @Dendus90
      @Dendus90 4 years ago +15

      A 7-year-old paper is called historical. This is what should be called progress.

  • @05xpeter
    @05xpeter 4 years ago +17

    For somebody who left machine learning 3 years ago to move to software development, these videos are pure gold for catching up with the cutting edge and knowing that what I learned back then is still relevant today.

    • @agenticmark
      @agenticmark 11 months ago

      Going the other direction here: 30 years of coding, but now I won't touch any coding that isn't machine learning :D His videos are certainly gold.

  • @DiegoJimenez-ic8by
    @DiegoJimenez-ic8by 4 years ago +18

    The paper that everyone cites in the introduction. Thanks for sharing!!

  • @MinecraftLetstime
    @MinecraftLetstime 4 years ago +2

    What I love most about these is when he mentions what is used now compared to the paper, and all the little gems of information here and there on top of the paper.

  • @CosmiaNebula
    @CosmiaNebula 2 years ago

    26:20 The VGG (2014) paper "Very deep convolutional networks for large-scale image recognition" mentioned explicitly that they tried LRN and found it was not worth the trouble:
    > First, we note that using local response normalisation (A-LRN network) does not improve on the model A without any normalisation layers. We thus do not employ normalisation in the deeper architectures (B-E).
    Again, in "Batch normalization: Accelerating deep network training by reducing internal covariate shift" (2015):
    > Remove Local Response Normalization. While Inception and other networks (Srivastava et al., 2014) benefit from it, we found that with Batch Normalization it is not necessary.
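
A minimal sketch of the two normalisation options contrasted above, assuming PyTorch's built-in layers (the LRN hyperparameters are the ones reported in the AlexNet paper; the tensor shape is made up for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 96, 27, 27)  # a batch of AlexNet-sized feature maps

# Local response normalisation with the AlexNet hyperparameters
# (n=5, alpha=1e-4, beta=0.75, k=2); it normalises across neighbouring channels.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

# The batch normalisation layer that later architectures use instead.
bn = nn.BatchNorm2d(num_features=96)

print(lrn(x).shape, bn(x).shape)  # both preserve the shape: (8, 96, 27, 27)
```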

  • @herp_derpingson
    @herp_derpingson 4 years ago +27

    30:20 SqueezeNet and LeNet talked about it. Stacking small 3x3 convolutions covers the same receptive field as one large convolution, but with far fewer parameters and much less compute.

    31:37 Global Max Pool was introduced later, I think; it helps when the dimensions of the image are variable. 32:00 Global max pooling also means that the dense params can be dialed back a lot.

    66406 citations as of now. Crazy. Good paper, keep it coming.

    Can you do one on the Adam optimizer? I think it is so ubiquitous that people don't even cite it anymore XD
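
A rough sketch of both points above, assuming PyTorch (the large-kernel-versus-stacked-3x3 comparison is the VGG-style factorisation, and the channel count C is made up for illustration):

```python
import torch
import torch.nn as nn

C = 64  # hypothetical channel count

# One large convolution vs. a stack of three 3x3 convolutions: same 7x7
# receptive field, far fewer parameters for the stack.
big   = nn.Conv2d(C, C, kernel_size=7, padding=3)
stack = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1) for _ in range(3)])

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(big), params(stack))  # 7*7*C*C + C  vs  3*(3*3*C*C + C)

# Global max pooling makes the head independent of the input resolution,
# so the dense parameters stay small no matter the image size.
gmp = nn.AdaptiveMaxPool2d(1)
for size in (224, 320):
    feats = torch.randn(1, C, size // 32, size // 32)
    print(gmp(feats).flatten(1).shape)  # always (1, C)
```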

  • @Alex-ms1yd
    @Alex-ms1yd 4 years ago +2

    Thanks a ton for this series! And the clarification of which techniques stayed and which are gone is highly appreciated!

  • @fulin3397
    @fulin3397 4 years ago +4

    I genuinely wish I had a teacher like Yannic five years ago

  • @MrjbushM
    @MrjbushM 4 years ago +4

    I love the classic paper series. I don't have a master's or PhD, but I want to learn deep learning, and this series helps us cover the basics.

  • @pranshu041
    @pranshu041 3 years ago +2

    I don't know why I am paying for college... Your videos are amazing!

  • @michaelcho
    @michaelcho 4 years ago +1

    Thank you so much for spending time to walk through the paper. The world's a better place because of folks like you!

  • @MikeAirforce111
    @MikeAirforce111 4 years ago +3

    Wow! This is such a great idea (classic paper series)! Love it :D

  • @darioushs
    @darioushs 4 years ago +1

    4:20 One of the main reasons it's not used as much is that BatchNorm has much the same overall effect, but works better. That being said, I still find dropout quite effective at reducing overfitting, especially with smaller sample sizes and when used on the dense layers at the end.
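
For context, a minimal sketch of AlexNet-style dropout on the dense layers, which the comment above still finds useful on small datasets (assuming PyTorch; layer sizes follow the AlexNet classifier head):

```python
import torch.nn as nn

# Dropout (p=0.5, as in AlexNet) applied only to the fully connected head;
# calling model.eval() disables it automatically at test time.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),  # ImageNet classes
)
```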

  • @sacramentofwilderness6656
    @sacramentofwilderness6656 4 years ago +1

    In the beginning was the AlexNet, and the AlexNet was with DNN, and the Alexnet was DNN.

  • @somecalc4964
    @somecalc4964 4 years ago

    Interesting! You have sparked my interest in learning about the present state of dropout layers.

  • @yaxiongzhao6640
    @yaxiongzhao6640 4 years ago +12

    I was told that this guy is on a break?!

    • @bingbingsun6304
      @bingbingsun6304 4 years ago +1

      He is so productive even on a break.

    • @peterfireflylund
      @peterfireflylund 4 years ago +2

      Apparently not from a reliable source ;)

    • @ruiwang780
      @ruiwang780 4 years ago +1

      He is; the classic paper videos are usually pre-recorded.

  • @surajrao9729
    @surajrao9729 1 year ago +1

    enjoyed the way you presented it, thank you

  • @bengineer_the
    @bengineer_the 4 years ago

    Wonderfully presented, thank you! :) I look forward to taking the rest of the journey through this subject with you and your channel. :)

  • @vishnum9613
    @vishnum9613 4 years ago +3

    You should make more classics... I just love knowing about them.

  • @AVINASHKUMAR-cz9sm
    @AVINASHKUMAR-cz9sm 3 years ago

    Thanks for the explanation. It helped me understand and learn a lot of things which I couldn't have if I had read the paper by myself.

  • @rishikaushik8307
    @rishikaushik8307 4 years ago +4

    Love your channel. How can someone stay up to date with new advancements in ML/DL? It's counter-intuitive to me that larger models overfit less.

    • @andyblue953
      @andyblue953 4 years ago +1

      To prevent overfitting, the loss landscape should be as smooth as possible so that the model generalizes better. If I remember correctly, residual connections + batch normalization help smooth the loss landscape, maybe because they allow us to build even deeper models.
      My bet for the future is actually on Bayesian networks, which use a learnable version of Gaussian dropout.
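
A minimal sketch of the residual-connection-plus-batch-norm pattern mentioned above, assuming PyTorch (a generic residual block, not taken from any specific paper discussed here):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, plus an identity skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The skip connection gives gradients a direct path back through the
        # network, which is one reason very deep stacks stay trainable.
        return torch.relu(x + self.body(x))

print(ResidualBlock(64)(torch.randn(2, 64, 16, 16)).shape)  # (2, 64, 16, 16)
```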

  • @barisois
    @barisois 3 years ago

    Thank you for a beautiful explanation with a retrospective overview.

  • @aritraroygosthipaty3662
    @aritraroygosthipaty3662 4 years ago +1

    At test time, the crops and reflections were also used by another paper called OverFeat, which crushed the '13 ImageNet detection challenge, I suppose. (46:06)

  • @vtrandal
    @vtrandal 1 year ago

    You need a microphone that’s not so sensitive it records every tiny little sound. Actually, it could be automatic gain control (AGC) that increases amplification when you are not speaking and records every time you swallow with more volume than you want. Recommendation: Turn off AGC if you can.

  • @sonOfLiberty100
    @sonOfLiberty100 4 years ago +1

    I love your channel. I'm one of the first to watch your videos and smash the like button.

  • @galchinsky
    @galchinsky 3 years ago

    A lot of the stuff mentioned, like dropout, went out of fashion when batch normalization was introduced. I still use it in denoisers though.

  • @xl0xl0xl0
    @xl0xl0xl0 3 years ago

    Please, more videos like this one!

  • @ai_station_fa
    @ai_station_fa 2 years ago

    Thanks for this great video! I really enjoyed it.

  • @Fortnite_king954
    @Fortnite_king954 4 years ago +1

    Amazing, can you do their book as well :D

  • @liuwu7350
    @liuwu7350 3 years ago +1

    Does anyone know where the statement that large models don't overfit (7:56) comes from?

    • @YannicKilcher
      @YannicKilcher 3 years ago +1

      search for "deep learning double descent"

  • @jesschil266
    @jesschil266 4 years ago +2

    This is helping me with my literature review assignment, hahaha 😂 thank you!

  • @vivekpandian08
    @vivekpandian08 4 years ago +1

    Hi, I have been following your channel since DETR. Can you make a video explaining DeepSORT?
    TIA

  • @vladimirbosinceanu5778
    @vladimirbosinceanu5778 2 years ago

    Great video! Thank you!

  • @axequalsb8431
    @axequalsb8431 4 years ago +1

    So you're saying people don't care about parameter size?

  • @tiziangottschlich8800
    @tiziangottschlich8800 2 years ago

    The link doesn't work properly because the download doesn't start.

  • @arshikantony3021
    @arshikantony3021 4 years ago

    Do you have a powerpoint presentation on this paper?

  • @forcanadaru
    @forcanadaru 4 years ago

    Great channel, subscribed, liked!

  • @brandomiranda6703
    @brandomiranda6703 3 years ago

    I can nearly guarantee that their net did not really overfit. I've trained many nets, and past lab mates have increased the number of parameters and the test loss never keeps increasing. Feel free to check my Google Scholar for paper examples of what I mean, but I am sure their nets did fine, especially since they didn't quantify whether their net overfit.
    Regardless, fun paper!

  • @geetumolbabu5696
    @geetumolbabu5696 3 years ago

    Can AlexNet detect multiple objects in a single frame?

  • @MinecraftLetstime
    @MinecraftLetstime 4 years ago +1

    So if dropout is not being used, what is being used now?

    • @andyblue953
      @andyblue953 4 years ago

      My bet for the future is on Bayesian networks, which use a sort of learnable version of Gaussian dropout.

    • @AndreiMargeloiu
      @AndreiMargeloiu 4 years ago

      Global Average Pooling
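
A sketch of the global-average-pooling head mentioned in the reply above, assuming PyTorch (the backbone output shape and class count are made up for illustration); it replaces AlexNet's large dense layers with a single small linear layer:

```python
import torch
import torch.nn as nn

num_classes = 1000                      # hypothetical
features = torch.randn(4, 512, 7, 7)    # output of some convolutional backbone

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # global average pool: (N, 512, 7, 7) -> (N, 512, 1, 1)
    nn.Flatten(),
    nn.Linear(512, num_classes),
)
print(head(features).shape)  # (4, 1000)
```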

  • @nulliusinverba7732
    @nulliusinverba7732 4 years ago +1

    Does anyone know what he's using to annotate over the papers?

  • @utku_yucel
    @utku_yucel 4 years ago +1

    Wow, it's a real classic.

  • @Aniket7Tomar
    @Aniket7Tomar 4 years ago +5

    Yannic is gonna be in the stone age soon. Can't wait for the invention of the wheel.

  • @shivamraisharma1474
    @shivamraisharma1474 4 years ago

    38:33 Why don't people use dropout anymore?

  • @Plutos11
    @Plutos11 1 year ago

    👍

  • @peterfireflylund
    @peterfireflylund 4 years ago

    "Action potential" is the name of the spike/signal itself, not the name of the activation threshold. It is a really dumb name :(

  • @ziyangchen2301
    @ziyangchen2301 2 years ago

    Why do people not care about overfitting now?