Reconciling modern machine learning and the bias-variance trade-off

Yannic Kilcher

12,719 views

Published 28 Dec 2024

COMMENTS

@PeterJMPuyneers 4 years ago ⁺³
I struggled to understand this paper because I lacked the conceptual background, but after seeing your explanation, everything is clear.
Thank you very much
@AntonPanchishin 5 years ago ⁺⁵
Mind blown. Super cool! I have so many tests to rerun with higher parameter count now
@danielbigham 5 years ago ⁺⁵
Fantastic video -- thank you! Fascinating...
@MLDawn 3 years ago ⁺¹
You did a great job. This just left me speechless!!!
@kristoferkrus 5 years ago ⁺¹¹
Mind blown. Very interesting paper! Does this mean that if you are in the regime where the test loss has started to decrease again (as a function of the number of parameters) and you add more training examples, your test accuracy will get worse, because it becomes harder for the optimizer to find a simple function that perfectly matches the training data? In theory, this could make it beneficial to reduce the number of training examples, but intuitively, that feels wrong.
@YannicKilcher 5 years ago ⁺²
That's a very interesting point. Technically yes, but I agree it seems strange.
@YannicKilcher 5 years ago ⁺³
I think it all comes down to the inductive bias given implicitly by the network architecture and the optimizer. In this framework, adding training data will take capacity away from the inductive bias and potentially worsen your result.
@andreg5206 4 years ago ⁺⁹
I know this is 10 months old, but at the end of 2019 OpenAI published a paper that suggests exactly what you imply here: openai.com/blog/deep-double-descent/
@kristoferkrus 4 years ago
@andreg5206 Yes, I saw that; that's so bizarre! Thanks for reminding me about it :)
@995Fede 5 years ago ⁺²
I started reading this paper over the last few days and I can confirm that it is really interesting! However, I have some doubts about the way they evaluate the MSE (how do they deal with the fact that the function h(x) is complex?) and the zero-one loss/norm of coefficients (since it is a multi-class classification problem, they probably use one-hot encoding, but again, how do they deal with the complex h(x)? Moreover, if they use one-hot encoding, the regressor is a 2D matrix, so what norm are they plotting? The L2 norm for matrices?). Did you try to reproduce their plots with the MNIST database? Are these technical passages clear to you? Thank you again for the video!
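One possible way to set up the quantities asked about above, as a minimal sketch and not the paper's confirmed procedure. Everything here is an assumption made for illustration: real-valued cosine features with a random phase stand in for the complex h(x), labels are one-hot encoded, the coefficient matrix is measured with the Frobenius norm, and the data is synthetic rather than MNIST. The names `random_fourier_features`, `fit_min_norm`, and `run` are hypothetical.

```python
import numpy as np


def random_fourier_features(X, W, b):
    # Assumption: real features cos(<w_k, x> + b_k) replace complex exp(i<w_k, x>).
    return np.cos(X @ W + b)


def fit_min_norm(Phi, Y):
    # lstsq returns the minimum-norm least-squares solution when the
    # system is underdetermined (more features than training points).
    coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return coef


def run(n_train=40, n_test=200, dim=5, n_classes=3,
        feature_counts=(10, 20, 40, 80, 320), seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic labels from a random linear teacher (hypothetical data).
    teacher = rng.normal(size=(dim, n_classes))
    X_train = rng.normal(size=(n_train, dim))
    X_test = rng.normal(size=(n_test, dim))
    y_train = np.argmax(X_train @ teacher, axis=1)
    y_test = np.argmax(X_test @ teacher, axis=1)
    # One-hot targets, so the regressor is a (n_features, n_classes) matrix.
    Y_train = np.eye(n_classes)[y_train]
    Y_test = np.eye(n_classes)[y_test]

    results = []
    for p in feature_counts:
        W = rng.normal(size=(dim, p))
        b = rng.uniform(0.0, 2.0 * np.pi, size=p)
        Phi_train = random_fourier_features(X_train, W, b)
        Phi_test = random_fourier_features(X_test, W, b)
        coef = fit_min_norm(Phi_train, Y_train)
        pred_test = Phi_test @ coef
        results.append({
            "n_features": p,
            "train_mse": float(np.mean((Phi_train @ coef - Y_train) ** 2)),
            "test_mse": float(np.mean((pred_test - Y_test) ** 2)),
            # Zero-one loss: predict the class with the largest output.
            "test_zero_one": float(np.mean(np.argmax(pred_test, axis=1) != y_test)),
            # For a 2D array, np.linalg.norm defaults to the Frobenius norm.
            "coef_norm": float(np.linalg.norm(coef)),
        })
    return results


if __name__ == "__main__":
    for row in run():
        print(row)
```

Sweeping `feature_counts` past `n_train` is what lets the fit interpolate the training set; whether the resulting test curve shows a second descent depends on the data and seed, so no particular curve shape is claimed here.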
@DasGrosseFressen 4 years ago ⁺⁶
A high-complexity solution be like "Braaaah! Brrraah!" 😂👍
@DrAhdol 5 years ago ⁺³
This is an interesting paper; I wonder if this applies to boosting/bagging with models that don't have many parameter options, like multinomial naive Bayes. Would parameter optimization on ensemble models have the same effect when the base models within are linear? An interesting option for some testing here.
@YannicKilcher 5 years ago
Seems worth a try :) don't even know if boosting models can overfit in the classic sense...
@sayakpaul3152 4 years ago
This is such an amazing study. So many synergies with the Deep Double Descent paper.
@gyeonghokim 3 years ago ⁺¹
Thanks a lot!
@herp_derpingson 5 years ago ⁺¹
Can you elaborate on the Hilbert space thing? What does Hilbert space have to do with neural networks?
@YannicKilcher 5 years ago
That's a bit too much for a YT comment, but the concept is usually well explained in introductory ML classes in the advanced section of kernelized SVMs.
@singhay_mle 5 years ago ⁺¹
Look up 3Blue1Brown's video on it
@herp_derpingson 5 years ago
@singhay_mle That does not explain what that has to do with neural networks.
@singhay_mle 5 years ago ⁺²
@herp_derpingson Sure, try this: users.umiacs.umd.edu/~hal/docs/daume04rkhs.pdf. Also, it has more to do with the kernel used by SVM/SVC than with NNs.
@agusavior_channel 2 years ago
Very clear
@ujjwalkar1886 2 years ago
Does the complexity of H mean the number of features here?
