If you want to see why Lasso can set parameters to 0 and Ridge can not, check out: ua-cam.com/video/Xm2C_gTAl8c/v-deo.html
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Love how you keep these videos introductory and don't go into the heavy math right away, which would be confusing.
Love the series!
Thank you!
I am eternally grateful to you and those videos!! Really saves me time in preparing for exams!!
Happy to help!
One of the best explanations of Ridge and Lasso regression I have seen to date... Keep up the good work... Kudos!!!
Thanks! :)
My teacher is 75 years old and lectured on Lasso for an hour without really explaining it. But this is a war I can win :), thanks to your efforts.
I love it!!! Glad my video is helpful! :) p.s. I got the joke too. Nice! ;)
Why is this scenario so often the reality? I also check StatQuest's videos very often to really understand things. Thanks @StatQuest
(Possible) Fact: 78% of people who understand statistics and machine learning attribute their comprehension to StatQuest.
bam! :)
@@statquest Double Bam !!
I am so happy to easily understand these methods after only a few minutes (after spending so many hours studying without really understanding what it was about). Thank you so much, your videos are incredibly helpful! 💯☺
Great to hear!
Hi, I can't thank you enough for explaining the core concepts in such short amount of time. Your videos help a lot! My appreciations are beyond words.
Thank you!
Good video, but didn't really explain how LASSO gets to make a variable zero. What's the difference between squaring a term and using the absolute value for that?
Intuitively, as the slope gets close to zero, the savings from shrinking the squared penalty become insignificant compared to the increase in the sum of the squared errors. In other words, the smaller your slope, the closer its square is to 0, so shrinking it further can't outweigh the increase in the sum of squared errors, and Ridge stops short of zero. In contrast, the absolute-value penalty keeps saving a fixed amount per unit of slope, so it can overcome the increase in the sum of squared errors and push the slope all the way to zero.
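To make that concrete, here is a minimal sketch (not from the video) that grid-searches a single slope under each penalty; the data, the lambda value, and the grid are invented just to illustrate the point above.

```python
# A minimal sketch: grid-search one slope under a squared vs. absolute-value penalty.
# The data, lambda, and grid are made up purely to illustrate the comment above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)               # a useless predictor: y is unrelated to x

lam = 50.0                            # chosen large enough to zero out the lasso slope
slopes = np.linspace(-1, 1, 20001)    # candidate slopes; the grid includes exactly 0

ssr = np.array([np.sum((y - b * x) ** 2) for b in slopes])
ridge_best = slopes[np.argmin(ssr + lam * slopes ** 2)]     # squared penalty
lasso_best = slopes[np.argmin(ssr + lam * np.abs(slopes))]  # absolute-value penalty

print(f"Ridge slope: {ridge_best:.4f}")   # small, but (almost surely) not exactly 0
print(f"Lasso slope: {lasso_best:.4f}")   # lands at exactly 0 once lambda is big enough
```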
@@theethatanuraksoontorn2517 Maybe this discussion on stack-exchange will clear things up for you: stats.stackexchange.com/questions/151954/sparsity-in-lasso-and-advantage-over-ridge-statistical-learning
@@statquest Thanks for reading the comments and responding!
@@programminginterviewprep1808 I'm glad to help. :)
@@statquest I didn't reply before, but the answer really helped me a lot, with basic machine learning and now artificial neural networks, thank you very much for the videos and the replies :D
That "Bam???" cracks me up. Thanks for your work!
:)
A very well-explained video and an easy way to learn material that would otherwise look complicated and take a long time to understand from a textbook. I have never used ridge or lasso regression, I just stumbled upon the terms and got curious, but now I feel like I've gained valuable data analysis knowledge that I can potentially use in the future.
Glad it was helpful!
Hi man, really LOVE your videos. Right now I'm studying Data Science and Machine Learning and more often than not your videos are the light at the end of the tunnel, so thanks!
Came here because I didn't understand it at all when my professor lectured about LASSO in my university course... I have a much better understanding now thank you so much!
Awesome!! I'm glad the video was helpful. :)
Some video ideas to better explain the following topics:
1. Monte Carlo experiments
2. Bootstrapping
3. Kernel functions in ML
4. Why ML is black box
OK. I'll add those to the to-do list. The more people that ask for them, the more priority they will get.
@@statquest That is great! keep up the great work!
@@statquest yes we need it please do plsssssssssssssssssssssssssssssssss
plsssssssssssssssssssssssssssssssssssssssssssssss
Bootstrapping is explained well in the Random Forest video.
Do it for us... thanks good stuff
The difference between BAM??? and BAM!!! is hilarious!!
:)
@@statquest Can you please explain how the irrelevant parameters "shrink"? How does Lasso go to zero when Ridge doesn't?
@@SaiSrikarDabbukottu I show how it all works in this video: ua-cam.com/video/Xm2C_gTAl8c/v-deo.html
I am eternally grateful to you. You've helped immensely with my last assessment at uni to finish my bachelor's.
Congratulations!!! I'm glad my videos were helpful! BAM! :)
This channel is pure gold. This would have saved me hours of internet search... Keep up the good work!
Thank you! :)
NOBODY IS GOING TO TALK ABOUT THE EUROPEAN / AFRICAN SWALLOW REFERENCE???? Are you all dummies or something? It made my day. Plus, top-notch video, congratulations. BAMM!
bam!
Great people know the subtle differences that are not visible to the common eye.
love you sir
Thanks!
Don't think your Monty Python reference went unnoticed
(Terrific and very helpful video, as always)
Thanks so much!!! :)
Oh it absolutely did. And it was much loved!
Josh - as always your videos are brilliant in their simplicity! Please keep up your good work!
Thanks, will do!
Just love the way you say 'BAM?'.....a feeling of hope mixed with optimism, anxiety and doubt 😅
:)
Every time I think your video subject is going to be daunting, I find your explanation dispels that thought pretty quickly. Nice job!
Wow. This new understanding just slammed into me. Great job. Thank you.
Glad it was helpful!
My friend and I are studying. When the first BAM came, we fell into laughter for about 5 minutes. Then the DOUBLE BAM would have caused catastrophic laughter if we hadn't stopped it. I want you to be my professor, please!
BAM! :)
I really appreciated the inclusion of swallow airspeed as a variable above and beyond the clear-cut explanation. Thanks Josh. ;-)
:)
Me too!
Thank you so much for the video !
I have watched several of your videos and I prefer to watch your video first, then see the real math formula. When I did that, the formula became so much easier to understand!
For instance, I didn't even know what a 'norm' is, but after watching your video it was very easy to understand!
Awesome! I'm glad the videos are helpful. :)
This is brilliant. Thanks for making it publicly available
You're welcome! :)
I came for the quality content, fell in love with the songs and bam.
BAM! :)
Explained in a very simple yet very effective way! Thank you for your contribution Sir
Hooray! I'm glad you like my video. :)
Thank you, Josh, for this exciting and educational video! It was really insightful to learn both the superficial difference (i.e. how the coefficients of the predictors are penalized) and the significant difference in terms of application (i.e. some useless predictors may be excluded through Lasso regression)!
Double BAM! :)
a man of his word...very clearly explained!
Thank you! :)
So easy to understand. And I like the double BAM!!!
Thanks!
Your videos make it so easy to understand. Thank you!
Thank you! :)
Your intro songs remind me of Phoebe from the TV show "Friends", and the songs are amazing for starting the videos on a good note, cheers!
You should really check out the intro song for this StatQuest: ua-cam.com/video/D0efHEJsfHo/v-deo.html
You have a gift for teaching! Excellent videos!
Thanks! :)
Excellent video, Josh! An amazing way to explain statistics. Thank you so much! Regards from Querétaro, México.
Muchas gracias! :)
Hi Josh, thanks for the clear explanation of regularization techniques. Very exciting. God bless you for your efforts.
Glad you enjoyed it!
Yeahhhh!!! I was the first to express gratitude to Josh for this awesome video!! Thanks, Josh, for posting this, and man, your channel is growing... 4 months ago it was at 12k. You have the better stats ;)
Hooray! Yes, the channel is growing and that is very exciting. It makes me want to work harder to make more videos as quickly as I can. :)
@@statquest please keep on going... You are our saviour
Statquest is like Marshall Eriksen from HIMYM teaching us stats. BAM? Awesome work Josh.
Thanks!
The other day, I had homework to write about Lasso and I struggled.. wish I had seen this video a few days earlier.. Thank you as always!
Bam! I appreciate the pace of the videos. Thanks for doing this.
Thanks! :)
Airspeed of swallow lol. These videos are really helping me a ton, very simply explained and entertaining as well!
Glad you like them!
Thank you so much for making these videos! I had to give a presentation about LASSO at university.
I hope the presentation went well! :)
@@statquest Thx. It did :)
Thanks very much. A clear explanation of these similar models. A great video I will keep forever.
Incredibly great explanations of regularization methods, thanks a lot.
Thanks! :)
Thank you for clarifying that the Swallow can be African or European
bam! :)
Thanks for posting, my new favourite youtube channel absolutely !!!!
Wow, thanks!
Great... please continue covering other models... thank you so much.
Thanks!
The beginning songs are always amazing hahaha!!
Awesome! :)
Thanks for the video. They make difficult concepts seem really easy.
Thank you! :)
@@statquest Can you make a similar video for LSTM?
Hi Josh! I am a big fan of your videos and it is clearly the best way to learn machine learning. I would like to ask you if you will be uploading videos relating to deep learning and NLP as well. If so, that will be awesome. BAM!!!
Right now I'm finishing up Support Vector Machines (one more video), then I'll do a series of videos on XGBoost and after that I'll do neural networks and deep learning.
StatQuest with Josh Starmer Thanks, Josh, for the updates. I'll send you a request on LinkedIn.
Harvard should hire you. Your videos never fail me!
Thank you for such great content!
Thank you very much!!!
One more use case for Ridge/Lasso regression: 1) when there are few data points, 2) when there is high multicollinearity between variables.
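A hedged sketch of that first use case, with more predictors than data points; the sample sizes and penalty strengths (alpha) here are made up for illustration, using scikit-learn.

```python
# A sketch of the "few data points" use case: 50 predictors but only 20 samples.
# Sample sizes and alpha values are invented; this is not from the video.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))             # only 20 samples but 50 predictors
y = 3.0 * X[:, 0] + rng.normal(size=20)   # only the first predictor actually matters

ols = LinearRegression().fit(X, y)        # can fit the training data perfectly (overfits)
ridge = Ridge(alpha=10.0).fit(X, y)       # shrinks all 50 coefficients toward 0
lasso = Lasso(alpha=0.5).fit(X, y)        # sets most of the 50 coefficients to exactly 0

print("OLS training MSE:", round(float(np.mean((ols.predict(X) - y) ** 2)), 6))
print("Nonzero Lasso coefficients:", int(np.count_nonzero(lasso.coef_)))
```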
Million BAM for this channel 🎉🎉🎉
Thank you!
I prefer the intro where it is firmly claimed that StatQuest is bad to the bone. And yes, I think this is fundamental.
That’s one of my favorite intros too! :)
But I think my all time favorite is the one for LDA.
Yes I agree! Together these two could be the StatQuest manifesto summarising what people think about stats!
So true!
Seriously the best videos ever!!
Thanks!
Both the Ridge and Lasso videos made me want to cry. (Know you aren't alone if anyone else feels the same.) Also noteworthy: Ridge Regression avoids problems introduced by having more predictor variables than observations, or when multicollinearity is an issue. This example avoids either condition. Triple Bam. (Obviously, I am taking the definition too literally; it's a relative statement about the ratio of variables to observations.)... Nevertheless, there's no end to my confusion. I was approaching "understanding" using the ISLR book... but you can actually get two different perspectives on the same topic and then be worse off due to variance in how the concepts are presented. That said, you're still awesome, StatQuest, and you are invited to play guitar at my funeral when I end things from trying to learn ML.
(gonna check the StatsExchange link down below that you provided. Thank you, sir!!!)
Thank you once again Josh!
bam!
Dude you are an absolute lifesaver! keep it up!!!
Hooray! I'm glad I could help. :)
Great video, clear explanation, loved the Swallows reference! Keep it up! :)
Awesome, thank you!
Love the work. I remember reading books about linear regression where they spent like 5 pages on these 2 topics, but I still had no clue what they really do =))
Glad it was helpful!
Love the fact that you reply to every single comment here in YT haha
Brilliant explanation
Didn't need to check out any other video.
Thank you!
Best youtube channel
Thank you! :)
Thanks! I finally understand how they shrink parameters!
Love your videos... extremely helpful and crystal clear explanations... but your songs... let's say you have a very promising career as a statistician... no question.
;)
Amazing video, explanation is fantastic. I like the song along with the concept :)
Bam! :)
so incredible, so well explained
Thanks!
Amazing explanation. Loved the Monty Python reference :D
:)
Thank you so much for these videos you are a literal godsend. You should do a video on weighted least squares!!
Wow! so easy to understand this! Thanks very much!
Thanks!
The best explanation ever.
wonderfully explained
Thank you! :)
A StatQuest a day, keeps Stat fear away!
I love it! :)
That Monty Python reference though... good video btw :)
Ha! I'm glad you like the video. ;)
Excellent information
Thanks!
Finally, I found 'The One'!
:)
Hooray!!!! excellent video as always
Thank you!
Hooray, indeed!!!! Glad you like this one! :)
Fantastic videos - very well explained!
Thank you! :)
My favourite youtuber!
Thank you! :)
You are the best. Thank you so much
Thanks!
Great video! The topic is really well explained
Thank you!
I'm reminded of Phoebe while listening to your videos' intros!!
What about this intro song: ua-cam.com/video/D0efHEJsfHo/v-deo.html
@@statquest haha!!
Magnificent video
Thanks!
Why can't ridge reduce a weight/parameter to 0 like lasso?
This is awesome, thank you so much, you explained it so well. I will recommend this video to everyone I know who is interested. I also watched your lasso video and it was just as good, thank you.
Thank you very much! :)
Amazing! Thank you so much for this!
Thanks!
You are the best! I understand it now!
Thanks!
I enjoy the content and your jam so much! '~Stat Quest~~'
Thanks!
Always amazing videos.
Thank you!
Great explanation
Thank you! :)
I have seen some articles mentioning that Ridge Regression is better at handling multicollinearity between variables compared to Lasso, but I am not sure of the reason why, since the only difference between Lasso and Ridge is the way they penalize the coefficients.
Thank you Josh for sharing this video. Could you please do a video on Bayesian statistics and Monte Carlo methods?
I hope to do Bayesian statistics soon.
This was gold!
Thank you!
Thanks a lot for the explanation !!!
You are welcome!
Thanks Josh!
You're welcome! :)
How do Ridge or Lasso know which variables are useless? Will they not also shrink the parameters of important variables?
I am also looking for the answer to this. I'm just using my intuition here, but here's what I think. The least important variables have terrible predictive value, so shrinking their coefficients barely changes the residuals. If we add a penalty for keeping these variables (especially with a large lambda, comparable in magnitude to the squared residuals), then decreasing the coefficient of a "bad predictor" causes a comparatively small increase in the residuals relative to the decrease in the penalty, because that predictor is mostly random. In contrast, decreasing the coefficient of a "good predictor" (which is less random) causes a significant change in the residuals, so that coefficient only undergoes a small change before the increase in residuals outweighs the penalty savings. This is why the minimization reduces the coefficients of "bad predictors" faster than those of "good predictors". I take it this is especially true when cross-validating.
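A rough sketch of that intuition, using scikit-learn's cross-validated Lasso and Ridge; the data and the "one useful, one useless predictor" setup are invented for the demo.

```python
# Cross-validation picks lambda (alpha); Lasso typically drives the useless
# predictor's coefficient to exactly 0, while Ridge only shrinks it.
# Data and setup are made up for illustration.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(42)
n = 200
useful = rng.normal(size=n)
useless = rng.normal(size=n)                  # pure noise, unrelated to y
X = np.column_stack([useful, useless])
y = 2.0 * useful + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)                           # alpha chosen by 5-fold CV
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)  # alpha chosen by CV too

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # useless one is typically exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # useless one is small but nonzero
```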
If you draw the curves of y = x and y = x^2, you will find that the gradient of y = x^2 vanishes near the origin, so the coefficient is very hard to decrease all the way to zero with an optimization approach like SGD.
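A tiny numeric check of that point (lambda = 1 is arbitrary here): near beta = 0 the ridge penalty's gradient fades away, while the lasso penalty keeps pushing with constant force.

```python
# Compare the penalty gradients as beta approaches 0 (lambda is arbitrary).
lam = 1.0
for beta in [1.0, 0.1, 0.01, 0.001]:
    ridge_grad = 2 * lam * beta   # d/d_beta of lam * beta^2 -> shrinks along with beta
    lasso_grad = lam              # d/d_beta of lam * |beta| for beta > 0 -> constant
    print(f"beta={beta:<7} ridge gradient={ridge_grad:<9} lasso gradient={lasso_grad}")
```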
Very nice explanations, better than UDEMY!
Thanks a lot!
Thank you, Sir! Great help.
Awesome, your explanation just simplifies everything.
Requesting videos on the rest of the algorithms as well.
Thank you.
I'm working on them :)