Do ImageNet Classifiers Generalize to ImageNet? (Paper Explained)

  • Published 16 Dec 2024

COMMENTS • 22

  • @herp_derpingson
    @herp_derpingson 4 years ago +8

    This is awesome! I was not able to hold on to my papers.
    It's interesting that nobody thought of accuracy as a function of both skill and difficulty before.

  • @jrkirby93
    @jrkirby93 4 years ago +8

    So, to sum it up: "Better models will struggle less on harder test sets."
    I'd call this statement "the difficulty bias". I think this work does not prove that overfitting never occurs on ImageNet, but it does show that the difficulty bias is a stronger effect than the overfitting bias. So if overfitting to the ImageNet test set does occur, it's probably not a particularly strong effect.

    • @Vaishaal
      @Vaishaal 4 years ago +1

      I agree this work doesn't *prove* overfitting doesn't happen, but this work plus a few other related works imply adaptive overfitting isn't a *huge* issue in ML:
      1. papers.nips.cc/paper/9117-a-meta-analysis-of-overfitting-in-machine-learning
      2. papers.nips.cc/paper/9190-model-similarity-mitigates-test-set-overuse

    • @YannicKilcher
      @YannicKilcher  4 years ago +3

      True, I just find it's generally not what anyone would have expected.

  • @bluel1ng
    @bluel1ng 4 years ago +4

    Strange plots in Fig. 1 @ 5:00: Why did they not use the same axis scaling for new and original accuracy? The x/y ranges are so similar that a non-skewed projection would have been no problem at all.

    • @Vaishaal
      @Vaishaal 4 years ago +6

      This was simply done for aesthetic reasons. Using the same axis scaling produces a lot of white space.

    • @I-did-not-ask-for-a-handle
      @I-did-not-ask-for-a-handle 1 year ago

      @Vaishaal Surely, can't have that in a 72-page document!

  • @prachi07kgp
    @prachi07kgp 4 years ago +2

    Wow, thanks for putting it so succinctly, saved me so much time

  • @znotft
    @znotft 5 months ago

    Shouldn't test sets v1 and v2 be indistinguishable?

  • @Guytron95
    @Guytron95 4 years ago +3

    I wonder if a third set, produced by 50/50 randomly selecting instances from each set, would fall halfway between the two linear relations.

    • @rainerzufall1868
      @rainerzufall1868 4 years ago +3

      Yes, but that is obvious. Split the data points in the top-1 error into two sums (one per dataset) and you see that you are just averaging the two error rates! (A quick numeric sketch follows this thread.)

    • @Vaishaal
      @Vaishaal 4 years ago +2

      Yes, this is exactly what would happen.
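
A quick numeric sketch of the averaging argument above. The error rates are made-up numbers, not results from the paper; the point is only that the top-1 error of a 50/50 mix is the total number of mistakes over the total count, i.e. the mean of the two per-set errors.

```python
# Mixing two test sets 50/50: the top-1 error of the mix is the mean of the
# per-set errors. err_v1 and err_v2 below are made-up, illustrative numbers.
n = 10_000            # examples drawn from each test set
err_v1 = 0.25         # hypothetical top-1 error on the original test set
err_v2 = 0.37         # hypothetical top-1 error on the new (v2) test set

mistakes = err_v1 * n + err_v2 * n   # total mistakes on the mixed set
err_mixed = mistakes / (2 * n)       # = (err_v1 + err_v2) / 2
print(err_mixed)                     # 0.31 -- halfway between the two lines
```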

  • @arpitaggarwal7167
    @arpitaggarwal7167 4 years ago +1

    So can one say that transfer learning is here to stay, or is overfitting to the ImageNet dataset still a possibility?

  • @ash3844
    @ash3844 2 years ago

    Hi, loved this content. But at least half of the base architecture resembles Tacotron 2. Could you please make a detailed video on the Tacotron 2 architecture? Thanks in advance.

  • @drdca8263
    @drdca8263 4 years ago

    The super-holdout idea seems like a good one, if it isn't too costly. I hope people start doing that.

  • @dribnet1
    @dribnet1 4 years ago +3

    Great summary. Wouldn't an easy and revealing experiment here be training a binary classifier to discriminate between the old and new test sets?

    • @YannicKilcher
      @YannicKilcher  4 years ago +3

      They do this in the paper's appendix; they reach about 53% accuracy (a rough sketch of such a probe follows this thread).

    • @MrSystemStatic
      @MrSystemStatic 4 years ago +1

      @YannicKilcher That's just guessing at that point.
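
A rough sketch of the discriminator probe discussed in this thread, not the paper's exact setup: the folder paths, the frozen ResNet-18 feature extractor and the logistic-regression head are assumptions for illustration. Accuracy near 50% means the probe can barely tell the two test sets apart.

```python
# Binary "which test set did this image come from?" probe (illustrative).
import numpy as np
import torch
from torchvision import datasets, models, transforms
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical local copies of the original and v2 test images.
v1 = datasets.ImageFolder("imagenet_v1_val", transform=preprocess)
v2 = datasets.ImageFolder("imagenet_v2", transform=preprocess)

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep features
backbone.eval()

def featurize(dataset, label):
    """Extract frozen features and tag every image with its source-set label."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=64)
    feats = []
    with torch.no_grad():
        for images, _ in loader:
            feats.append(backbone(images).numpy())
    X = np.concatenate(feats)
    return X, np.full(len(X), label)

X1, y1 = featurize(v1, 0)  # label 0 = original test set
X2, y2 = featurize(v2, 1)  # label 1 = new (v2) test set
X, y = np.concatenate([X1, X2]), np.concatenate([y1, y2])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("discriminator accuracy:", clf.score(X_te, y_te))  # near 0.5 = near chance
```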

  • @tristanwegner
    @tristanwegner 2 years ago +1

    Is there anything stopping cheating researchers from training on the (original) test set itself, to get more clout for models that perform well? I mean, even a new test set like this would not reveal such cheating if the underlying model is at least decent, because the cheater basically had a bigger dataset to work with, which should lead to better generalization to the v2 test set.

  • @ego_sum_liberi
    @ego_sum_liberi 4 months ago

    Awesome.. Thank you!!!

  • @not_a_human_being
    @not_a_human_being 4 years ago +1

    They should've "calibrated" it (by throwing away images) on some older models to make sure the scores match FIRST, and only THEN done their comparison! I'm not an expert, but I can't see why this wasn't done.