Self-Training with Noisy Student (87.4% ImageNet Top-1 Accuracy!)

  • Published 11 Nov 2019
  • This video explains the new state-of-the-art for ImageNet classification from Google AI. This technique has a very interesting approach to Knowledge Distillation. Rather than using the student network for model compression (usually either faster inference or lower memory requirements), this approach iteratively scales up the capacity of the student network. This video also explains other interesting aspects of the paper, such as the use of noise via data augmentation, dropout, and stochastic depth, as well as ideas like class balancing with pseudo-labels! (A rough code sketch of the training loop follows the links below.)
    Thanks for watching! Please Subscribe!
    Self-training with Noisy Student improves ImageNet classification: arxiv.org/pdf/1911.04252.pdf
    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
    arxiv.org/pdf/1905.11946.pdf
    Billion-scale semi-supervised learning for state-of-the-art image and video classification:
    / billion-scale-semi-sup...
  • Science & Technology
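
A rough sketch of the training loop described above, in PyTorch. This is a minimal illustration, not the paper's implementation: make_model, augment, the random tensors, and the width schedule are toy stand-ins for EfficientNet, RandAugment, and the labeled/unlabeled image sets, and stochastic depth as well as the paper's class balancing are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(width):
    # Toy classifier of a given capacity; in the paper this role is played by
    # EfficientNet-B7 for the first teacher, then the larger L0/L1/L2 students.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, width),
        nn.ReLU(),
        nn.Dropout(p=0.5),               # dropout: one of the student's noise sources
        nn.Linear(width, 10),
    )

def augment(x):
    # Stand-in for RandAugment input noise: horizontal flip plus a little jitter.
    return torch.flip(x, dims=[3]) + 0.1 * torch.randn_like(x)

def train(model, x, y, epochs=5):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    model.train()                        # noise (dropout, augmentation) is active
    for _ in range(epochs):
        loss = F.cross_entropy(model(augment(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy "ImageNet": 32x32 RGB tensors with 10 classes, plus unlabeled images.
x_lab = torch.randn(256, 3, 32, 32)
y_lab = torch.randint(0, 10, (256,))
x_unlab = torch.randn(1024, 3, 32, 32)

# 1) Train the initial teacher on labeled data.
teacher = train(make_model(width=256), x_lab, y_lab)

for width in (512, 1024):                # each new student is equal in size or larger
    # 2) The teacher pseudo-labels the unlabeled images WITHOUT noise.
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(x_unlab), dim=1)
    conf, pseudo = probs.max(dim=1)
    keep = conf > 0.3                    # confidence filtering on pseudo-labels

    # 3) Train a larger, noised student on labeled + pseudo-labeled data.
    student = train(make_model(width),
                    torch.cat([x_lab, x_unlab[keep]]),
                    torch.cat([y_lab, pseudo[keep]]))

    # 4) The student becomes the teacher for the next iteration.
    teacher = student
```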

COMMENTS • 31

  • @Thomasssien 4 years ago

    A very accessible discussion of the paper that gets its main ideas across well. Thanks, subbed!

  • @neutrinocoffee1151 4 years ago +1

    Thanks for making this video! Clear and succinct.

  • @SergePanev 4 years ago +1

    Subbed and recommended to all the people I know :)
    Keep up the great work!

  • @geo2073 4 years ago +2

    Loved it! Please make more like this!

  • @BlakeEdwards333 4 years ago +1

    Amazing! Thank you! I do research involving knowledge distillation, and this is great motivation to keep moving in that direction.

    • @connorshorten6311 4 years ago

      Thank you!! I have been really interested in distillation lately as well! Planning on making a few videos on odd papers I came across while doing a literature review of distillation!

    • @BlakeEdwards333 4 years ago +1

      @@connorshorten6311 Awesome, please do! I'd love to talk sometime with you about my work.

    • @connorshorten6311 4 years ago

      @@BlakeEdwards333 That sounds great! Could you send me an email at henryailabs@gmail.com to discuss this? Looking forward to it!

  • @hocusbogus7930 4 years ago +1

    Thank you for your work, I greatly appreciate it. One suggestion for future episodes: if applicable, could you dedicate a slide or a few sentences to the novelty and/or contributions as stated in the paper? That would be great to know.
    In this paper I'm inclined to think it was the stochastic depth noise, and I couldn't make out if anything else was new.

    • @connorshorten6311 4 years ago

      Thank you so much for the suggestion! I don't think the incremental scaling up of the student networks was the key idea either (in the paper they attribute +0.5% to it, compared to 1.9% for the noise trio of stochastic depth, dropout, and data augmentation). I think the scale of the experiments is novel as well: going from EfficientNet-B7 to the bigger L0, L1, and L2 models is a computationally intense workload, and it illustrates an underexplored / underpromoted training methodology. More generally, it flips common knowledge on its head: we usually think of distillation as a technique for reducing capacity / getting faster models, but this shows how it can be used for larger models as well. I haven't completely surveyed the literature on distillation, so please share any papers similar to this study!

  • @ArturHolanda91 4 years ago +1

    Thank you and keep up the good work.

  • @sayakpaul3152 4 years ago

    Thank you as always. Could you elaborate a bit more on the stochastic depth & class balancing parts?

  • @arkasaha4412 4 years ago +1

    Awesome work! Can you consider doing a series dedicated to the intuition behind loss functions and their use cases?

    • @connorshorten6311 4 years ago

      Thank you and thanks for the suggestion, it sounds like an interesting project!

    • @arkasaha4412 4 years ago +1

      @@connorshorten6311 Really looking forward to it!

  • @LouisChiaki 3 years ago

    Can you turn on the captions? Much appreciated!

  • @mikemihay 4 years ago +1

    Thank you!

    • @connorshorten6311 4 years ago +1

      Thank you!

    • @mikemihay 4 years ago +1

      @@connorshorten6311 It helps me so much, please don't stop making these videos

  • @jonatan01i 4 years ago +1

    Train two teacher networks and train the student only on images whose content both teachers agree on (a rough sketch of this idea follows the thread below).

    • @maxmetz01 4 years ago +1

      Hey, I know it's been five months. I really like your idea. Did you try it out?

    • @jonatan01i 4 years ago

      @@maxmetz01 Nope, sadly no.
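
A minimal sketch of the two-teacher agreement idea proposed in this thread, assuming teacher_a, teacher_b, and x_unlab are illustrative placeholders for two independently trained classifiers and a batch of unlabeled images:

```python
import torch
import torch.nn.functional as F

def agreed_pseudo_labels(teacher_a, teacher_b, x_unlab):
    # Pseudo-label with both teachers (no noise at inference time).
    teacher_a.eval()
    teacher_b.eval()
    with torch.no_grad():
        pred_a = F.softmax(teacher_a(x_unlab), dim=1).argmax(dim=1)
        pred_b = F.softmax(teacher_b(x_unlab), dim=1).argmax(dim=1)
    keep = pred_a == pred_b              # keep only images the teachers agree on
    return x_unlab[keep], pred_a[keep]
```

The returned subset could then feed the same noised-student training step as in the earlier sketch.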

  • @mendi1122 4 years ago +3

    Only 1% improvement

    • @zhijingli6691 4 years ago +2

      Well, nowadays it's hard to improve by even 1%.

    • @connorshorten6311 4 years ago +6

      Usain Bolt only broke the 100m record by 0.28 seconds lol