This shows that the quality and value of a video does not depend on how fancy the animations are, but on how expert and pedagogical the speaker is. Really brilliant! I assume you spent a lot of time designing that course, so thank you for this!
Wow, thanks!
Totally agree. I learn a lot from his short videos. Precise, concise, enough math, enough ludic examples. True professor mind.
This is the best explanation of Ridge regression that I have ever heard! Fantastic! Hats off!
"Now that we understand the REASON we're doing this, let's get into the math."
The world would be a better place if more abstract math concepts were approached this way, thank you.
good point
Watched these 5 years ago to understand the concept and I passed an exam. Coming back to it now to refresh my memory, still very well explained!
Nice! Happy to help!
I was searching the whole internet for ridge regression and stumbled upon this video, which is by far the best explanation you can find anywhere. Thanks.
This is awesome! Lots of machine learning books or online courses don't bother explaining the reason behind Ridge regression, you helped me a lot by pulling out the algebraic and linear algebra proofs to show the reason WHY IT IS THIS! Thanks!
This really helps me! Definitely the best ridge and lasso regression explanation videos on YouTube. Thanks for sharing! :D
It's so inspiring to see how you get rid of the c^2! I learned Ridge but didn't know why! Thank you for making this video!
This is gold. Thank you so much!
This is, by far, the best explanation of Ridge Regression that I could find on YouTube. Thanks a lot!
This really is gold, amazing!
Excellent video! One more thing to add - if you're primarily interested in causal inference, like estimating the effect of daily exercise on blood pressure while controlling for other variables, then you want an unbiased estimate of the exercise coefficient and standard OLS is appropriate. If you're more interested in minimizing error on blood pressure predictions and aren't concerned with coefficients, then ridge regression is better.
Also left out is how we choose the optimal value of lambda by using cross-validation on a selection of lambda values (don't think there's a closed form expression for solving for lambda, correct me if I'm wrong).
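The cross-validation idea in the comment above can be sketched roughly as follows. This is only a minimal illustration, not the video's method: the data are synthetic, the lambda grid is arbitrary, the helper names `ridge_fit`, `cv_mse`, and `choose_lambda` are hypothetical, and the intercept is left out for brevity.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Average validation MSE of ridge over k folds for one lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[val] - X[val] @ beta) ** 2))
    return float(np.mean(errors))

def choose_lambda(X, y, lambdas, k=5):
    """Return the lambda with the lowest cross-validated MSE, plus all scores."""
    scores = [cv_mse(X, y, lam, k=k) for lam in lambdas]
    return lambdas[int(np.argmin(scores))], scores

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(size=100)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # arbitrary candidate grid
best_lam, scores = choose_lambda(X, y, lambdas)
print("best lambda:", best_lam)
```

The same folds are reused for every lambda (fixed `seed`), so the candidates are compared on identical splits.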
Brilliant! Just found your channel and can't wait to watch them all!!!
This really helped a lot. A big thanks to you Ritvik!
You, Ritvik, are simply amazing. Thank you!
Thank you soooo much!!! You explain everything so clear!! and there is no way I couldn't understand!
Amazing video, you really explained why we do things which is what really helps me!
best explanation of any topic i've ever watched , respect to you sir
You are the best of all.... you explained all the things,,, so nobody is gonna have problems understanding them.
It's just awesome. Thanks for this amazing explanation. Settled in mind forever.
So so so very helpful! Thanks so much for this genuinely insightful explanation.
I think it's explained very fast, but still very clearly. For my level of understanding it's just perfect!
I subscribed just after watching this. Great foundation for ML basics
Thanks a lot.. I watched many videos and read blogs before this but none of them clarified at this depth
This is literally the best video on ridge regression
The explanation is so clear!! Thank you so much!!
Your data science videos are the best I have seen on YouTube till now. :)
Waiting to see more
I appreciate it!
Your explanation is extremely good!
The best ridge regression lecture ever.
Superb. Thanks for such a concise video. It saved a lot of time for me. Also, subject was discussed in a fluent manner and it was clearly understandable.
Very good explanation in an easy way!
Amazingly helpful. Thank you.
These explanations are by far the best ones I have seen so far on youtube ... would really love to watch more videos on the intuitions behind more complicated regression models
great video, the explanation is really clear!
simple and effective video, thank you!
great video, brief and clear.
Excellent explanation, thanks!
excellent video! Keep up the great work!
best explanation on ridge reg. so far
Stunning! Absolute gold!
seriously!!!
Fantastic! It's like getting the Cliff's Notes for Machine Learning. These videos are a great supplement/refresher for concepts I need to knock the rust off of. I think he takes about 4 shots of espresso before each recording though :)
I'm impressed by your explanation. Great job
Thanks! That means a lot
excellent video, thanks.
Thank you. I make the comment because I know I will never need to watch it again! Clearly explained..
Glad it was helpful!
I love this video, really informative! Thanks a lot
great video! thank you very much.
Stunning!! Need more access to your coursework
Excellent explanation .
Awesome, thanks a million for the great video! Searching to see if you have done a video on LASSO regression :-)
Very clear. Thank you!
I was looking for the math behind the algorithm. Thank you for explaining it.
No problem!
you are the man, keep doing what you're doing
Beautiful explanation
Thank you so much!!!!
You should add that all the variables (dependent and independent) need to be normalized before doing a ridge regression. This is because betas in regular OLS vary with the scale of the predictors, so a ridge regression would penalize those predictors that must take on a large beta merely because of their scale. Once you normalize the variables, your A^t*A matrix becomes (up to a factor of the sample size) a correlation matrix of the predictors. The regression is called "ridge" regression because you add lambda*I to A^t*A, which adds the lambda value to the diagonal of the correlation matrix, which is like a ridge. Great video overall, though, for starting to understand this regression.
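A small numerical check of the point above, as a sketch under assumptions: the two predictors and their scales are made up, and `lam` is an arbitrary value. It standardizes the columns of A, confirms that A^t*A divided by the sample size matches the correlation matrix, and shows that the ridge term changes only the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two made-up predictors on very different scales.
A = np.column_stack([
    rng.normal(0, 1, n),      # roughly unit scale
    rng.normal(0, 1000, n),   # much larger scale
])

# Standardize each column to mean 0 and standard deviation 1.
Z = (A - A.mean(axis=0)) / A.std(axis=0)

gram = Z.T @ Z        # A^t * A after standardization
corr = gram / n       # equals the correlation matrix of the predictors

lam = 0.5             # arbitrary penalty, for illustration
ridge_matrix = gram + lam * np.eye(Z.shape[1])

print(np.round(corr, 3))
print(np.round(ridge_matrix - gram, 3))  # lam on the diagonal, 0 elsewhere
```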
Excellent explanation! Could you please do a similar video for Elastic-net?
thanks for the nice explanation
This is great, thank you!
Great videos thanks for making it
THIS IS ONE HELL OF A VIDEO !!!!
Thank you!
great video - thanks
Huge thanks!
Anyone else get anxiety when he wrote with the marker?? Just me?
Felt like he was going to run out of space 😂
Thank you so much though, very helpful :)
very helpful, thanks
Brilliant! Could you make more videos about Cross validation, RIC, BIC, and model selection.
It is a brilliant video. Great
bless this is amazing
best explanation ever!
I don't have money to pay him so leaving a comment instead for the algo. He is the best.
a big thanks
Thanks for this one!
I would trade diamonds for this explanation (well, allegorically! :) ) Thank you!!
Thank you! Your explaining is really good, Sir. Do you have time to make a video explaining the adaptive lasso too?
excellent video!!!!
Hi and thanks for the video. Can you explain briefly why, when the m_i and t_i variables are highly correlated, the estimators β0 and β1 are going to have very big variance? Thanks a lot in advance!
Hi, same question here 😶‍🌫️
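One way to see the effect asked about above is a small simulation. This is a sketch on synthetic data, not the video's tax/income example: the correlation values, the true coefficients (both 1), the noise level, and the helper name `simulate_coef_std` are all assumptions. The design matrix is held fixed and only the noise is redrawn, so the spread of the estimates is the sampling variance of the coefficients.

```python
import numpy as np

def simulate_coef_std(rho, lam, n=50, n_sims=2000, seed=0):
    """Std and mean of coefficient estimates over repeated noisy samples.

    rho is the correlation between the two predictors;
    lam = 0 gives OLS, lam > 0 gives ridge.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)  # fixed design
    true_beta = np.array([1.0, 1.0])
    gram = X.T @ X + lam * np.eye(2)
    estimates = np.empty((n_sims, 2))
    for s in range(n_sims):
        y = X @ true_beta + rng.normal(size=n)  # fresh noise each time
        estimates[s] = np.linalg.solve(gram, X.T @ y)
    return estimates.std(axis=0), estimates.mean(axis=0)

std_low, mean_low = simulate_coef_std(rho=0.0, lam=0.0)      # OLS, uncorrelated
std_high, mean_high = simulate_coef_std(rho=0.99, lam=0.0)   # OLS, correlated
std_ridge, mean_ridge = simulate_coef_std(rho=0.99, lam=5.0) # ridge, correlated

print("OLS std, rho=0:     ", std_low)
print("OLS std, rho=0.99:  ", std_high)
print("ridge std, rho=0.99:", std_ridge)
print("OLS mean, rho=0.99: ", mean_high)
```

With highly correlated predictors the data cannot tell the two coefficients apart, so individual OLS estimates swing widely even though their average stays near the true values. Ridge narrows that spread at the cost of some bias.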
hey, great video and excellent job
This is gold indeed!
Dude ! Hats off 🙏🏻
this is great stuff
I think it's the best video ever made
Very well explained :)
You are awesome!
very good explanation in an easy way!!
Finally, someone who talks quickly.
Thanks for this really helpful video!
Could you explain why the independent variables in A should be standardized for Ridge and Lasso Regression?
a very nice video
Super clear
Awesome, thanks man
Awesome video! Very intuitive and easy to understand. Are you going to make a video using the probit link?
Excellent approach to discussing Lasso and Ridge regression. It could have been better if you had discussed how Lasso yields sparse solutions! Anyway, nice discussion.
Shouldn't the radius of the Circle be c instead of c^2 (at time around 7:00)?
Amazing!!!
Thanks for the video! silly question: where is your L2 norm video, can you provide a link? (subscribed)
Brilliant simplification of this topic. No need for fancy presentation to explain the essence of an idea!!
SUPER !!! You have to become a professor and replace all those other ones !!
Sir, a question about 4:54: I understand that in the tax/income example the VARIANCE of the beta0-beta1's is high, since there's an additional beta2 affecting things. However, the MEAN in the population should be the same, even with high variance, shouldn't it? Thanks in advance!
Thanks! Why can't lambda be negative? What if, to improve variance, the slope needs to be increased rather than decreased?
Can anyone explain the statement "The efficient property of any estimator says that the estimator is the minimum variance unbiased estimator"? What does minimum variance denote here?
@ritvik when you said that the estimated coefficients have small variance, does that refer to the tendency of obtaining different estimated values of those coefficients? I tend to confuse this term 'variance' with the statistic Variance (spread of the data!).
Variance is the change in prediction accuracy of an ML model between training data and test data.
Simply put, if an ML model predicts with an accuracy of "x" on training data and its prediction accuracy on test data is "y", then
Variance = x - y
A smaller variance would thus mean the model is fitting less noise on the training data, reducing overfitting.
This definition was taken from: datascience.stackexchange.com/questions/37345/what-is-the-meaning-of-term-variance-in-machine-learning-model
Hope this helps.
@benxneo thanks mate!