You got SO CLOSE to mentioning the bias-variance tradeoff, you even described it quite well, but you didn’t. It’s the name of the rule underlying this principle, and it runs much deeper than machine learning, even describing some aspects of information theory! Very worth understanding, and I highly recommend anyone interested looks it up.
You are 100% correct
Great catch, thanks for pointing it out!
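For anyone who does go look it up: the decomposition being referenced is, for squared error, usually written like this (a standard statement, not something taken from the video itself):

```latex
% Squared-error risk at a point x, assuming y = f(x) + \varepsilon
% with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2:
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Flexible models such as high-degree polynomials shrink the bias term but inflate the variance term, which is exactly the overfitting picture in the video.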
Life is more like differential equations than polynomials
Life is more like life than mathematics.
@@gotri7 hmmm pretty obvious since all things that are things are at least similar but actually exactly equivalent to themselves
That observation is more recurrent than helpful LOL
Hmm, idk about that, I think life is about differintegral equations.
Especially a system of non-linear PDEs - life is chaotic and there are no general solutions!
bro how about the normal distribution or statistical models?
what bro sends me after I tell him he should study for the final instead of gaming all day
Haha! But more people underfit by barely studying at all!
Your channel is CRIMINALLY underrated. This was a brilliant watch.
Thank you! That means a lot!
Never saw it that way but it's true. I definitely overfit when studying for exams. Thank you for the insightful video!!
You’re welcome!
I take advantage of this ALL the time on tests and exams. Instead of memorizing specific things word-for-word (which I find painfully slow, and just painful in general, the few times I do it), I just moderately memorize the few key points and talk out my ass.
This also kind of reminds me of the "Einstellung" effect, which makes worked examples more useful, at least if you are interleaving enough, because that way you learn better when to use e.g. the product rule.
Never thought about the Barnum effect as overfitting, thanks for the insight :)
When i was a teen i thought i could make a machine learning that is made by bayesian logic but also works like decision tree to classify(but instead of branching deeeply, it uses and and or logic to classify) even tho i dont have deep understanding of programming even by now
Wild educated guess: you're still a teen or below and you're not from the U.S.
- Cover up ("when i was a teen")
- Grammar ("have deep understanding")
- Lack of capitalization
Farewell and git gud at covering things up lmao
Have you heard about grokking though (generalization beyond overfitting in machine learning)?
Never heard of that. Let me look that up!
Was coming here to mention grokking.
When I watched this vid I thought you had at least 100k subscribers and years of YouTube experience. Great video! Keep it up!
Thanks so much! That really means a lot!
I thought the same thing! This really is a great and professional video
OMG same
Ya same
Amazing content, you deserve more recognition for the quality you produce!
Hey, thank you for posting this video.
I wasn't in the mood to dive into this specific topic deeply, but just watching segments of this video, hearing the nice voice and music, and contemplating the graphs and formulas was so relaxing and kind of inspiring. I feel better now and am going back to my learning routine.
Great impact on curious minds, dear Author.
Thank you!
I avoid overfitting by not studying at all
Maybe a polynomial is just completely the wrong model for temperature evolution in general
I agree. Any reasonably well-behaved curve can be described as a combination of sine waves via a Fourier transform. It seems much more efficient to catalog these frequencies when describing the fit of a function, and it would respond better to additional data when tuning against validation.
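For what it's worth, here is a minimal sketch of that idea in Python (NumPy only): the temperature-like signal, the number of harmonics K, and every name below are invented for illustration, not taken from the video.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily temperature-like signal: one yearly cycle plus noise
t = np.arange(365.0)
y = 10 + 8 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 1.5, t.size)

# Truncated Fourier basis: constant term plus the first K harmonics
K = 3
cols = [np.ones_like(t)]
for k in range(1, K + 1):
    cols.append(np.sin(2 * np.pi * k * t / 365))
    cols.append(np.cos(2 * np.pi * k * t / 365))
X = np.column_stack(cols)  # shape (365, 2K + 1)

# Least-squares fit of the 2K + 1 Fourier coefficients
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ coef
print("residual std:", np.std(y - fit))  # ~1.5, i.e. about the noise level
```

With K = 3 the whole year is summarized by seven coefficients, and more data mostly sharpens those same coefficients instead of demanding a new functional form, unlike a high-degree polynomial.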
Instant sub
Pretty underrated content ❤
great video man! just subscribed
Thanks for the sub!
The video is extremely insightful and creative, love it, keep it up
This is some great content, keep it up
But in the overparameterized regime, it starts to generalize again after the interpolation threshold, in the second descent. Does that mean we should study even more 😅?
Very insightful. Lovely!
Mythical pull, nice vid
great vid
Thanks!
maybe if you study even more you experience the double descent phenomenon
interesting
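Since double descent keeps coming up in this thread, here is a rough sketch of it using minimum-norm least squares on random Fourier features. Every choice below (target function, feature scale, sample sizes) is an illustrative assumption, and the exact curve depends on the seed, but the test error typically spikes near the interpolation threshold p ≈ n and falls again in the overparameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: noisy sine with n_train training points
n_train, n_test = 20, 200
target = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(-1, 1, n_train)
y_tr = target(x_tr) + 0.1 * rng.normal(size=n_train)
x_te = np.linspace(-1, 1, n_test)
y_te = target(x_te)

def features(x, W, b):
    # Random Fourier features: cos(w * x + b)
    return np.cos(np.outer(x, W) + b)

for p in [5, 10, 15, 20, 25, 40, 100, 400]:
    W = rng.normal(0, 5, p)
    b = rng.uniform(0, 2 * np.pi, p)
    # Minimum-norm least squares via the pseudoinverse;
    # it interpolates the training data once p >= n_train
    theta = np.linalg.pinv(features(x_tr, W, b)) @ y_tr
    mse = np.mean((features(x_te, W, b) @ theta - y_te) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:.3f}")
```

In these toy models, piling on capacity well past the interpolation point really can help again, which is the (half-joking) "study even more" reading above.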
I didn't watch the whole video but the opening analogy is clearly wrong. Overfitting is when your model is too powerful for your problem and the amount of data you have. Or, seen from the opposite perspective, you don't have enough data for your model.
When you study, you're (supposedly) not touching the complexity of your model (your brain, the number of neurons and connections it has) but you're adding more data. How could this lead to overfitting?
The problem could be that you're choosing the wrong exercises to practice, focusing on a small part of your curriculum. But then, it's not overfitting, it's biased data.
Thanks for the comment! When you do practice problems for an exam, you are training your brain's problem-solving model to solve the exam questions. Overfitting occurs when people memorise the answers to the practice problems instead of understanding the reasoning behind them (or, in some scenarios, when they overcomplicate things).
Towards the end of the video, I touched upon this and explained how we can avoid overfitting: train our brain using practice problems, but then use a timed mock exam as validation to ensure that our brain is not just memorising the specific answers to the practice questions but actually understanding them and applying the knowledge to unseen questions.
So, overfitting is not a result of adding more data points (as you have correctly mentioned), but a result of focusing too much on the specific examples or patterns in the training set without generalising the underlying principles.
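To make the mock-exam-as-validation idea concrete, here is a tiny hypothetical sketch (all data and sizes invented): training error keeps dropping as the polynomial degree grows, but error on the held-out "mock exam" bottoms out and then climbs, which is the overfitting signal described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Practice problems" (training set) and a held-out "mock exam" (validation set)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.2 * rng.normal(size=x.size)
x_tr, y_tr = x[:25], y[:25]
x_va, y_va = x[25:], y[25:]

for degree in [1, 3, 5, 9, 15]:
    coef = np.polyfit(x_tr, y_tr, degree)  # high degrees may warn: ill-conditioned
    err = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
    print(f"degree {degree:2d}: train {err(x_tr, y_tr):.3f}   validation {err(x_va, y_va):.3f}")
```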
Well, hyperfitting and the double U want to enter the chat…