Do ImageNet Classifiers Generalize to ImageNet? (Paper Explained)

  • Published 2 May 2024
  • Has the world overfitted to ImageNet? What if we collect another dataset in exactly the same fashion? This paper gives a surprising answer!
    Paper: arxiv.org/abs/1902.10811
    Data: github.com/modestyachts/Image...
    Abstract:
    We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
    Authors: Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar
    Links:
    YouTube: / yannickilcher
    Twitter: / ykilcher
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
  • Science & Technology

COMMENTS • 20

  • @herp_derpingson
    @herp_derpingson 4 years ago +8

    This is awesome! I was not able to hold on to my papers.
    It's interesting that nobody thought of accuracy as a function of both skill and difficulty before.

  • @jrkirby93
    @jrkirby93 4 years ago +7

    So, to sum it up: "Better models will struggle less on harder test sets."
    I'd call this statement "the difficulty bias". I think this work does not prove that overfitting never occurs on ImageNet, but it does show that the difficulty bias is a stronger effect than the overfitting bias (see the sketch after this thread). So if overfitting to the ImageNet test set does occur, it's probably not a particularly strong effect.

    • @Vaishaal
      @Vaishaal 4 years ago +1

      I agree this work doesn't *prove* overfitting doesn't happen, but this work plus a few other related works imply adaptive overfitting isn't a *huge* issue in ML:
      1. papers.nips.cc/paper/9117-a-meta-analysis-of-overfitting-in-machine-learning
      2. papers.nips.cc/paper/9190-model-similarity-mitigates-test-set-overuse

    • @YannicKilcher
      @YannicKilcher  4 years ago +3

      True, I just find it's not what anyone would have expected.
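
    The "difficulty bias" reading matches the paper's headline result: across models, accuracy gains on the original test set translate to larger gains on the new one, i.e. a linear fit of new-test accuracy against original-test accuracy has slope above 1. A minimal sketch of such a fit, using made-up accuracy pairs rather than the paper's actual numbers:

    ```python
    # Minimal sketch of the linear-fit reading of Figure 1 (hypothetical
    # accuracy pairs, not the paper's data): regress new-test accuracy on
    # original-test accuracy across a handful of models.
    import numpy as np

    orig = np.array([0.63, 0.70, 0.76, 0.79, 0.82])  # original test accuracy
    new = np.array([0.49, 0.57, 0.65, 0.69, 0.73])   # new (V2) test accuracy

    slope, intercept = np.polyfit(orig, new, deg=1)
    print(f"slope = {slope:.2f}")  # slope > 1: a gain on the original test
    # set translates to a *larger* gain on the new, harder test set
    ```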

  • @prachi07kgp
    @prachi07kgp 3 years ago +2

    Wow, thanks for putting it so succinctly, saved me so much time

  • @drdca8263
    @drdca8263 3 years ago

    The super-holdout idea seems like a good one, if it isn't too costly. I hope people start doing that.

  • @ash3844
    @ash3844 1 year ago

    Hi, loved this content. But at least half of the base architecture resembles Tacotron 2. Could you please make a detailed video on the Tacotron 2 architecture? Thanks in advance.

  • @arpitaggarwal7167
    @arpitaggarwal7167 3 years ago +1

    So can one say that transfer learning is here to stay, or is overfitting to the ImageNet dataset still a possibility?

  • @bluel1ng
    @bluel1ng 4 years ago +4

    Strange plots in Fig. 1 @ 5:00: why did they not use the same axis scaling for new and original accuracy? The x/y ranges are so similar that a non-skewed projection would have been no problem at all.

    • @Vaishaal
      @Vaishaal 4 years ago +6

      This was simply done for aesthetic reasons. Using the same axis scaling produces a lot of white space.

    • @mgpoirot
      @mgpoirot 1 year ago

      @@Vaishaal Surely, can't have that in a 72-page document!

  • @Guytron95
    @Guytron95 4 years ago +3

    I wonder if a third set, produced by 50/50 randomly selecting instances from each set, would fall halfway between the two linear relations.

    • @rainerzufall1868
      @rainerzufall1868 4 years ago +3

      Yes, but that is obvious: split the data points in the top-1 error into two sums (one per dataset) and you see that you are just averaging the two error rates!

    • @Vaishaal
      @Vaishaal 4 years ago +2

      Yes, this is exactly what would happen; see the sketch below.
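
    The averaging argument can be written out directly. A toy sketch with hypothetical error rates:

    ```python
    # Toy sketch of why a 50/50 mix of the two test sets lands halfway:
    # the mixed error rate is just the average of the per-set error rates.
    err_original, err_new = 0.24, 0.37  # hypothetical top-1 error rates
    n_original = n_new = 10_000         # equal-sized halves

    total_errors = err_original * n_original + err_new * n_new
    err_mixed = total_errors / (n_original + n_new)
    print(err_mixed)                    # 0.305 == (0.24 + 0.37) / 2
    ```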

  • @dribnet1
    @dribnet1 4 years ago +3

    Great summary. Wouldn't an easy and revealing experiment here be training a binary classifier to discriminate between the old and new test sets?

    • @YannicKilcher
      @YannicKilcher  4 years ago +3

      They do this in the paper's appendix; they reach about 53% accuracy (a sketch of such an experiment follows this thread).

    • @MrSystemStatic
      @MrSystemStatic 4 years ago +1

      @@YannicKilcher That's just guessing at that point.
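
    For reference, a minimal sketch of what such a discriminator experiment could look like. This is an illustrative setup with random placeholder features, not the authors' appendix code; in practice the features would come from a pretrained network:

    ```python
    # Train a binary classifier to tell the old test set from the new one.
    # feats_old / feats_new are placeholder feature matrices here; with
    # random features, cross-validated accuracy sits near chance (~50%).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    feats_old = np.random.randn(1000, 512)  # placeholder for real features
    feats_new = np.random.randn(1000, 512)

    X = np.vstack([feats_old, feats_new])
    y = np.array([0] * len(feats_old) + [1] * len(feats_new))

    clf = LogisticRegression(max_iter=1000)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"discriminator accuracy: {acc:.2%}")  # near 50% means the sets
    # are hard to tell apart; the paper reports roughly 53%
    ```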

  • @tristanwegner
    @tristanwegner 1 year ago +1

    Is there anything stopping cheating researchers from training on the (original) test set itself, to get more clout for models that perform well? I mean, even a new test set like this would not reveal such cheating if the underlying model is at least decent, because the cheater basically had a bigger dataset to work with, which should lead to better generalization to the V2 test set.

  • @not_a_human_being
    @not_a_human_being 3 years ago +1

    They should've "calibrated it" (by throwing away images) on some older models to make sure scores match FIRST, and only THEN done their comparison (sketched below)! I'm not an expert, but I can't see why this wasn't done.
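
    A minimal sketch of that calibration idea, assuming a per-image difficulty proxy such as one minus the MTurk selection frequency. This is a hypothetical helper illustrating the commenter's suggestion, not something the paper implements this way:

    ```python
    # Sketch: drop the hardest new-test images until a reference model's
    # accuracy on the kept subset matches its original-test score.
    import numpy as np

    def calibrate(correct: np.ndarray, hardness: np.ndarray,
                  target_acc: float) -> np.ndarray:
        """Return kept indices whose mean accuracy reaches target_acc.

        correct  -- boolean per-image: was the reference model right?
        hardness -- per-image difficulty proxy (e.g. 1 - selection frequency)
        """
        keep = np.argsort(hardness)  # ascending hardness: easiest first
        while correct[keep].mean() < target_acc and len(keep) > 1:
            keep = keep[:-1]         # drop the hardest remaining image
        return keep
    ```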