Self-Training with Noisy Student (87.4% ImageNet Top-1 Accuracy!)

  • Published 25 Jan 2025
  • This video explains the new state-of-the-art for ImageNet classification from Google AI. The technique takes an interesting approach to knowledge distillation: rather than using the student network for model compression (usually either faster inference or lower memory requirements), it iteratively scales up the capacity of the student network. The video also covers other interesting aspects of the paper, such as noising the student via data augmentation, dropout, and stochastic depth, as well as ideas like class balancing of the pseudo-labels. (A rough code sketch of the training loop follows the links below.)
    Thanks for watching! Please Subscribe!
    Self-training with Noisy Student improves ImageNet classification: arxiv.org/pdf/...
    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
    arxiv.org/pdf/...
    Billion-scale semi-supervised learning for state-of-the-art image and video classification:
    / billion-scale-semi-sup...
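
To make the approach in the description concrete, here is a minimal, self-contained sketch of the Noisy Student loop in PyTorch-flavored Python. Everything in it is a toy stand-in rather than the paper's setup: `make_model`, the tensor shapes, the confidence threshold, and the per-class cap are placeholders for the actual EfficientNet-B7 / L2 models, the ImageNet and JFT data, and the paper's RandAugment, dropout, stochastic depth, and class-balancing recipe. It only tries to show the structure: the teacher predicts without noise, the student trains with noise on labeled plus pseudo-labeled data, and each new student is at least as large as its teacher before being promoted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10  # toy setting; the paper uses the 1000 ImageNet classes


def make_model(width):
    # Stand-in for an EfficientNet of growing capacity; the dropout layer is
    # one of the student's noise sources (RandAugment and stochastic depth,
    # used in the paper, are omitted here).
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, width), nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(width, NUM_CLASSES),
    )


def pseudo_label(teacher, unlabeled, threshold=0.3, per_class=64):
    # The teacher runs WITHOUT noise (eval mode, clean inputs); low-confidence
    # images are dropped and each class is capped to roughly balance the set.
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(unlabeled), dim=1)
    conf, labels = probs.max(dim=1)
    keep_x, keep_y = [], []
    for c in range(NUM_CLASSES):
        idx = ((labels == c) & (conf > threshold)).nonzero(as_tuple=True)[0][:per_class]
        keep_x.append(unlabeled[idx])
        keep_y.append(labels[idx])
    return torch.cat(keep_x), torch.cat(keep_y)


def train(model, x, y, epochs=3, lr=0.1):
    # The student is trained WITH noise: dropout inside the model plus a crude
    # input perturbation standing in for data augmentation.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        noisy_x = x + 0.1 * torch.randn_like(x)
        loss = F.cross_entropy(model(noisy_x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


# Toy labeled / unlabeled tensors so the sketch runs end to end.
labeled_x = torch.randn(256, 3, 32, 32)
labeled_y = torch.randint(0, NUM_CLASSES, (256,))
unlabeled_x = torch.randn(1024, 3, 32, 32)

# 1) Train the initial teacher on labeled data only.
teacher = train(make_model(width=128), labeled_x, labeled_y)

# 2) Noisy Student iterations: pseudo-label the unlabeled data, train an
#    equal-or-larger noised student on labeled + pseudo-labeled data, then
#    promote the student to be the next teacher.
for width in (256, 512):  # the student's capacity grows each iteration
    pl_x, pl_y = pseudo_label(teacher, unlabeled_x)
    student = train(make_model(width),
                    torch.cat([labeled_x, pl_x]),
                    torch.cat([labeled_y, pl_y]))
    teacher = student
```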

COMMENTS • 31

  • @Thomasssien
    @Thomasssien 4 years ago

    A very accessible discussion of the paper that gets its main ideas across well. Thanks, subbed!

  • @geo2073
    @geo2073 5 years ago +2

    Loved it! Please continue making more of these!

  • @neutrinocoffee1151
    @neutrinocoffee1151 5 years ago +1

    Thanks for making this video! Clear and succinct.

  • @BlakeEdwards333
    @BlakeEdwards333 5 years ago +1

    Amazing, thank you! I do research involving knowledge distillation, and this is great motivation to keep moving in that direction.

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you!! I have been really interested in distillation lately as well! Planning on making a few videos of odd papers I came across while doing a literature review of distillation!

    • @BlakeEdwards333
      @BlakeEdwards333 5 years ago +1

      @@connor-shorten Awesome, please do! I'd love to talk with you sometime about my work.

    • @connor-shorten
      @connor-shorten 5 years ago

      @@BlakeEdwards333 That sounds great! Could you send me an email at henryailabs@gmail.com to discuss this? Looking forward to it!

  • @hocusbogus7930
    @hocusbogus7930 5 years ago +1

    Thank you for your work; I greatly appreciate it. One suggestion for future episodes: if applicable, could you dedicate a slide or a few sentences to the novelty and/or contributions as stated in the paper? That would be great to know.
    For this paper, I'm inclined to think it was the stochastic depth noise, and I couldn't make out whether anything else was new.

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you so much for the suggestion! I also don't think the incremental scaling up of the student networks was the key idea (in the paper they attribute +0.5% to this, compared to +1.9% for the noise trio of stochastic depth, dropout, and data augmentation). I think the scale of the experiments is novel as well: it's definitely a computationally intense workload to go from EfficientNet-B7 to the bigger L0, L1, and L2 models, and it illustrates an underexplored / underpromoted training methodology. Generally it flips common knowledge on its head, in the sense that we usually think of distillation as a technique for reducing capacity / producing faster models, but this shows how it can be used to train larger models as well. I haven't completely surveyed the literature on distillation, so please share any papers similar to this study!

  • @SergePanev
    @SergePanev 5 years ago +1

    Subbed and recommended to all the people I know :)
    Keep up the great work!

  • @LouisChiaki
    @LouisChiaki 4 years ago

    Can you turn on the captions? Much appreciated!

  • @sayakpaul3152
    @sayakpaul3152 4 years ago

    Thank you as always. Could you elaborate a bit more on the stochastic depth & class balancing parts?

  • @arkasaha4412
    @arkasaha4412 5 years ago +1

    Awesome work! Can you consider doing a series dedicated to the intuition behind loss functions and their use cases?

    • @connor-shorten
      @connor-shorten 5 years ago

      Thank you, and thanks for the suggestion! It sounds like an interesting project!

    • @arkasaha4412
      @arkasaha4412 5 years ago +1

      @@connor-shorten Really looking forward to it!

  • @ArturHolanda91
    @ArturHolanda91 5 years ago +1

    Thank you and keep up the good work.

  • @mikemihay
    @mikemihay 5 years ago +1

    Thank you!

    • @connor-shorten
      @connor-shorten 5 years ago +1

      Thank you!

    • @mikemihay
      @mikemihay 5 years ago +1

      @@connor-shorten It helps me so much; please don't stop making these videos.

  • @jonatan01i
    @jonatan01i 5 years ago +1

    Train two teacher networks and train the student only on images whose content the teachers agree on. (A rough sketch of this idea appears after this thread.)

    • @maxmetz01
      @maxmetz01 4 years ago +1

      Hey, I know it's been five months. I really like your idea. Did you try it out?

    • @jonatan01i
      @jonatan01i 4 years ago

      @@maxmetz01 Nope, sadly no.
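
A rough, hypothetical sketch of the two-teacher agreement filter suggested in this thread (this is not from the paper; the models, tensor shapes, and function name below are placeholders). The filtered images and their agreed-upon labels could then feed the same noisy student training loop sketched under the video description.

```python
import torch

def agreed_pseudo_labels(teacher_a, teacher_b, unlabeled):
    """Keep only images on which both teachers predict the same class,
    and use that agreed-upon class as the hard pseudo-label."""
    teacher_a.eval()
    teacher_b.eval()
    with torch.no_grad():
        pred_a = teacher_a(unlabeled).argmax(dim=1)
        pred_b = teacher_b(unlabeled).argmax(dim=1)
    agree = pred_a == pred_b  # boolean mask of images the teachers agree on
    return unlabeled[agree], pred_a[agree]

# Toy usage with two tiny stand-in "teachers" and random images.
teacher_a = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
teacher_b = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images, labels = agreed_pseudo_labels(teacher_a, teacher_b, torch.randn(64, 3, 32, 32))
```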

  • @mendi1122
    @mendi1122 5 years ago +3

    Only 1% improvement

    • @Riitasusei
      @Riitasusei 5 years ago +2

      Well, nowadays it's hard to improve by even 1%.

    • @connor-shorten
      @connor-shorten 5 years ago +6

      Usain Bolt only broke the 100m record by 0.28 seconds lol