Manifold Mixup: Better Representations by Interpolating Hidden States

  • Published 31 Dec 2024

COMMENTS • 32

  • @vincenzodelzoppo9125
    @vincenzodelzoppo9125 4 years ago +8

    Nice paper about regularization.
    Such an elegant solution for disentangling manifolds in the hidden states. Most of the networks I have seen basically learn only in the last layers, while the backbone just extracts kind-of-random features.

  • @herp_derpingson
    @herp_derpingson 5 years ago +17

    So many papers in rapid succession. This guy is on fire!
    \m/

    • @YannicKilcher
      @YannicKilcher  5 years ago +11

      or I'm just procrastinating on doing the dishes :p

    • @valthorhalldorsson9300
      @valthorhalldorsson9300 4 years ago +5

      It's 9 months later and based on the rate of new videos I'm starting to worry you'll never get around to those dishes

    • @CosmiaNebula
      @CosmiaNebula 4 years ago

      @valthorhalldorsson9300 Sooner than he gets to the dishes, a robot arm will be doing them.

  • @dermitdembrot3091
    @dermitdembrot3091 3 years ago +2

    If the bottleneck layer makes the data linearly separable, it may as well just be the last hidden layer. In that case this seems to be a technique for making the last hidden representation not just linearly separable but well-spaced. And I think it would induce the softmax inputs to seek a region where the softmax is approximately linear.

  • @turbocaveman
    @turbocaveman 2 years ago

    This is so coooool. It’s like saying here’s a cat, here’s a dog, here’s a mix of both.

    • @mucabi
      @mucabi 6 months ago

      It's exactly that. Basically it's the extension of MixUp data augmentation to the whole NN. Each layer has an input and an output, and each layer individually learns the best representation. Now we treat the latent representation of the previous layer (e.g. cat, dog) as our input and mix those accordingly (see the sketch below).
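      A minimal sketch of that idea, assuming a small PyTorch MLP; the layer split, alpha value, and shapes below are illustrative, not taken from the paper's reference implementation:

      ```python
      # Minimal Manifold Mixup sketch in PyTorch (illustrative, not the authors' official code).
      # Idea: pick a random layer k, run the batch up to it, interpolate its hidden states with
      # a shuffled copy of the batch using lam ~ Beta(alpha, alpha), then continue the forward
      # pass and mix the two label sets with the same lam in the loss.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class MixupMLP(nn.Module):
          def __init__(self, in_dim=2, hidden=128, n_classes=2):
              super().__init__()
              self.layers = nn.ModuleList([
                  nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
                  nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
                  nn.Linear(hidden, n_classes),  # logits
              ])

          def forward(self, x, y=None, alpha=2.0):
              if y is None:  # plain inference path, no mixing
                  for layer in self.layers:
                      x = layer(x)
                  return x
              # Training path with Manifold Mixup.
              k = torch.randint(0, len(self.layers), (1,)).item()       # layer to mix at (0 = input mixup)
              lam = torch.distributions.Beta(alpha, alpha).sample().item()
              perm = torch.randperm(x.size(0))                          # pairs each sample with another one
              for i, layer in enumerate(self.layers):
                  if i == k:
                      x = lam * x + (1 - lam) * x[perm]                 # interpolate hidden states here
                  x = layer(x)
              return x, y, y[perm], lam

      # Usage: the loss applies the same interpolation weight to the two target sets.
      model = MixupMLP()
      xb, yb = torch.randn(32, 2), torch.randint(0, 2, (32,))
      logits, y_a, y_b, lam = model(xb, yb)
      loss = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)
      loss.backward()
      ```

      Note that when k = 0 the interpolation happens before any layer, which is plain input MixUp; that is why the comment above describes Manifold Mixup as the extension of MixUp to the whole network.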

  • @frenchmarty7446
    @frenchmarty7446 2 years ago

    I agree with your point that not every layer (especially the lower layers) will or should be linearly separable.
    However, I think the objective of Manifold Mixup is to act more as a regularization penalty: a given layer should be non-linearly separable only insofar as the benefits (to accuracy) outweigh the mixup penalty. The mixup adds a bias towards linearity, but not a strict requirement.
    Like all regularization methods, there will probably have to be a lot more fine-tuning and testing before we know if, when, and how it gives the right bias-variance trade-off.

  • @yanjieze
    @yanjieze 2 years ago

    Thanks! Your paper explanation is really awesome!

  • @kevon217
    @kevon217 1 year ago

    great explanation. thanks!

  • @dude8309
    @dude8309 5 years ago +2

    Wow! Super interesting paper and great insights.

  • @levikok1810
    @levikok1810 4 years ago +1

    Great video, thanks a million!

  • @selfhelp119
    @selfhelp119 4 years ago

    amazing technique!

  • @ulissemini5492
    @ulissemini5492 3 years ago +11

    I like the video, but it's at 256 likes right now so I can't disturb the balance, sorry!

  • @ЗакировМарат-в5щ
    @ЗакировМарат-в5щ 4 years ago

    As I understand it, this technique is also good for NN pruning.

  • @EngineerNick
    @EngineerNick 3 years ago

    Thank you! :)

  • @rahuldeora5815
    @rahuldeora5815 5 years ago +2

    Nice

  • @AntonPanchishin
    @AntonPanchishin 5 years ago +1

    This video is another great Colab candidate: colab.research.google.com/drive/1qUDe3ENm3fnxND7iibyEF1Ixcw7nu4mK . Thanks again Yannic! Your video inspired me to create a Colab IPython notebook that tested out this architecture. I love the concept! It was a pain to implement using TensorFlow Keras layers, but it does appear to help. I also decided that instead of just comparing it to a vanilla classifier, we could compare it to the "Worst" classifier from your other video about "Focusing on the Biggest Losers". Have a great weekend!

  • @dimitriognibene8945
    @dimitriognibene8945 4 years ago

    So many new hyperparameters...

  • @meditationMakesMeCranky
    @meditationMakesMeCranky 5 years ago +1

    I am not an expert and I have not read the paper carefully, but this method seems more like a fancy data augmentation method than regularization.
    Also, there is something to be said about the spiral example: I personally think that batch norm does a very good job there. It only seems not good enough because we humans are biased; we "know" the true representation from experience and by guessing the intentions of whoever made the dataset :)

    • @levikok1810
      @levikok1810 4 years ago +2

      Good point. I would say it's somewhere in between. You sort of create new 'averaged' samples to teach the model to be 'unsure' sometimes, and this way the model converges to a more stable representation. (See the sketch at the end of this thread.)

    • @flightrisk7566
      @flightrisk7566 3 years ago

      @levikok1810 That analogy reminds me of DINO and CutMix.
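
      A tiny sketch of the 'averaged samples' point above; the cat/dog labels, array shapes, and lam value are made-up illustrations:

      ```python
      # Mixing two inputs with weight lam also mixes their one-hot labels, so the network
      # is explicitly trained to be 'unsure' (e.g. 70% cat / 30% dog) on the blended point.
      import numpy as np

      def mixup_pair(x1, y1_onehot, x2, y2_onehot, lam):
          """Interpolate a pair of examples and their one-hot targets with the same weight."""
          x_mix = lam * x1 + (1 - lam) * x2
          y_mix = lam * y1_onehot + (1 - lam) * y2_onehot
          return x_mix, y_mix

      cat = np.array([1.0, 0.0])           # one-hot label for 'cat'
      dog = np.array([0.0, 1.0])           # one-hot label for 'dog'
      x_cat = np.random.randn(3, 32, 32)   # stand-in image tensors
      x_dog = np.random.randn(3, 32, 32)

      x_mix, y_mix = mixup_pair(x_cat, cat, x_dog, dog, lam=0.7)
      print(y_mix)  # [0.7 0.3] -> the soft target itself encodes the intended uncertainty
      ```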