Regularization - Explained!
- Published 17 Dec 2024
- We will explain Ridge, Lasso and a Bayesian interpretation of both.
ABOUT ME
⭕ Subscribe: www.youtube.co...
📚 Medium Blog: / dataemporium
💻 Github: github.com/ajh...
👔 LinkedIn: / ajay-halthor-477974bb
RESOURCES
[1] Graphing calculator to plot nice charts: www.desmos.com
[2] Refer to section 6.2 on "Shrinkage Methods" for mathematical details: hastie.su.doma...
[3] Karush-Kuhn-Tucker conditions for constrained optimization with inequality constraints: en.wikipedia.o...
[4] Stack Exchange discussions on [3]: stats.stackexc...
[5] Proof of ridge regression: stats.stackexc...
[6] Laplace distribution (or double exponential distribution) used for lasso prior: en.wikipedia.o...
[7] @ritvikmath's amazing video for the Bayesian interpretation of lasso and ridge regression: • Bayesian Linear Regres...
[8] Distinction between Maximum "Likelihood" Estimations and Maximum "A Posteriori" Estimations: agustinus.kris...
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.ne...
📕 Calculus: imp.i384100.ne...
📕 Statistics for Data Science: imp.i384100.ne...
📕 Bayesian Statistics: imp.i384100.ne...
📕 Linear Algebra: imp.i384100.ne...
📕 Probability: imp.i384100.ne...
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.ne...
📕 Python for Everybody: imp.i384100.ne...
📕 MLOps Course: imp.i384100.ne...
📕 Natural Language Processing (NLP): imp.i384100.ne...
📕 Machine Learning in Production: imp.i384100.ne...
📕 Data Science Specialization: imp.i384100.ne...
📕 Tensorflow: imp.i384100.ne...
Why is this so underrated? This should be on everyone's playlist for linear regression.
Hats off, man :)
My man looks sharp and dapper
Haha. Thanks! I think this shirt looked better on camera than in person. :)
Excellent videos! Great graphing for the intuition of L1 regularization, where parameters become exactly zero (9:45), compared with the behavior of L2 regularization.
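As a small illustration of that point (a sketch, not from the video; the synthetic data, feature count, and alpha values below are made-up assumptions), scikit-learn's Lasso drives irrelevant coefficients to exactly zero while Ridge only shrinks them:

```python
# Sketch: L1 (Lasso) vs L2 (Ridge) on synthetic data.
# Dataset, noise level, and alpha values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # 10 features, only 3 informative
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)                  # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)                 # L2 penalty

print("Lasso:", np.round(lasso.coef_, 3))           # irrelevant features typically exactly 0
print("Ridge:", np.round(ridge.coef_, 3))           # irrelevant features small but nonzero
```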
I had to watch it twice to truly digest it, but I like your approach to the contour plot in particular. I hope to boost your channel with my comments a tiny bit ;). tyvm!
What I was taught and what is helpful to know, IMO:
1) Speaking on an abstract level about what regularization achieves: it penalizes large weights, which in effect punishes overly complex (e.g., high-degree) terms.
2) The notion of L1 and L2 regularization: when you talk about the "Gaussian" prior for Ridge, you could also talk about the "Laplace" distribution for the Lasso prior instead of calling it the double exponential distribution (a short sketch of this Bayesian view follows below).
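As a sketch of that Bayesian view (assuming a linear model y = Xθ + ε with Gaussian noise of variance σ², along the lines of references [6] and [7]), the MAP estimate turns the prior into the penalty:

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\ p(y \mid X, \theta)\,p(\theta)
  = \arg\min_{\theta}\ \lVert y - X\theta \rVert_2^2 \;-\; 2\sigma^2 \log p(\theta)

% Gaussian prior:
p(\theta) \propto e^{-\lVert\theta\rVert_2^2 / (2\tau^2)}
  \;\Rightarrow\; \arg\min_{\theta}\ \lVert y - X\theta \rVert_2^2 + \lambda \lVert\theta\rVert_2^2
  \quad (\text{Ridge},\ \lambda = \sigma^2/\tau^2)

% Laplace (double exponential) prior:
p(\theta) \propto e^{-\lVert\theta\rVert_1 / b}
  \;\Rightarrow\; \arg\min_{\theta}\ \lVert y - X\theta \rVert_2^2 + \lambda \lVert\theta\rVert_1
  \quad (\text{Lasso},\ \lambda = 2\sigma^2/b)
```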
Thanks so much for your comments Paul! And yea, I feel like I have seen similar contour plots in books but never truly understood “why” they were like that until I started diving into details myself. Hopefully in the future I can explain it in a way that you’d be able to get it in a single pass through the video too :)
Hi Ajay, great video, as always. One suggestion, with your permission ;) I think it might be worthwhile introducing the concept of regularization by comparing:
Feature elimination (which is equivalent to making the weight zero) vs. reducing the weight (which is regularization), elaborating on this, and then drifting towards Lasso and Ridge. ;)
Well hello everyone right back at you Ajay! These are fire, the live viz is on point!
Thank you for noticing ma guy. I will catch up to the 100K gang soon. Pls wait for me 😂
@CodeEmporium 😂 you're one hunnit in my eyes 🙏
Thank you very much for this answer, I have been looking for it for a while: 7:42
8:27 yi < -(lambda/2)
Such an awesome video! Can't believe I hadn't made the connection between ridge and Lagrangians, it literally has a lambda in it lol!
With the lasso intuition, the stepwise function you get for theta, how do you get the conditions on the right, i.e. yi < lambda/2? I thought perhaps instead of writing theta < 0, you are just using the implied relationship between yi and lambda. E.g. if theta < 0, then |theta| = -theta, which after optimising gives theta = y - lambda/2, i.e. y = lambda/2 + theta. But then I get the opposite conditions to yours... i.e. as theta is negative in this case, wouldn't that give y = lambda/2 + theta < lambda/2?
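For anyone puzzling over this, here is the one-dimensional sub-problem I believe the timestamps above (8:27, 9:45) refer to, worked case by case (assuming the objective is (y_i − θ)² + λ|θ|):

```latex
\min_{\theta}\ f(\theta) = (y_i - \theta)^2 + \lambda\,\lvert\theta\rvert

\theta > 0:\quad f'(\theta) = -2(y_i - \theta) + \lambda = 0
  \ \Rightarrow\ \theta = y_i - \tfrac{\lambda}{2},\ \text{consistent only if } y_i > \tfrac{\lambda}{2}

\theta < 0:\quad f'(\theta) = -2(y_i - \theta) - \lambda = 0
  \ \Rightarrow\ \theta = y_i + \tfrac{\lambda}{2},\ \text{consistent only if } y_i < -\tfrac{\lambda}{2}

\lvert y_i \rvert \le \tfrac{\lambda}{2}:\quad \theta = 0 \quad (\text{soft thresholding})
```

So in the θ < 0 branch the stationary point is y_i + λ/2 rather than y_i − λ/2, and requiring that candidate to actually be negative is what yields the condition y_i < −λ/2 noted at 8:27.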
Good explanation!
Always find interesting things here. Keep going. Good luck.
Hah! Glad that is the case. I am here to pique that interest :)
Nice explanation of the Bayesian interpretation.
Isn't regularization just the Lagrange multiplier? The optimum point is where the gradient of the constraint is proportional to the gradient of the cost function.
It is mathematically written in the same way, but they are not the same. Lagrange multipliers are used when you need to min/max a given function subject to a constraint, and then you solve for the value of lambda; in regularisation, we set the lambda value ourselves. Regularisation gives us a penalty if we take steps towards the non-minimum direction and thus lets us come back to the correct direction in the following iteration.
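To put that connection in symbols (a sketch using ridge; see the KKT reference [3] and the Stack Exchange threads [4], [5]), the constrained and the penalized problems are:

```latex
\text{constrained:}\quad \min_{\theta}\ \lVert y - X\theta \rVert_2^2
  \quad \text{subject to} \quad \lVert\theta\rVert_2^2 \le t

\text{penalized (Lagrangian):}\quad \min_{\theta}\ \lVert y - X\theta \rVert_2^2 + \lambda\,\lVert\theta\rVert_2^2
```

By the KKT conditions, every constraint level t corresponds to some multiplier λ ≥ 0 and vice versa, so both problems trace out the same family of solutions; the practical difference the reply points out is that in regularization we choose λ directly as a hyperparameter instead of deriving it from a given t.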
Love your awesome videos! Salute! Thank you so much!
You are so welcome! I am happy this helps
How does Gauss-Newton for nonlinear regression change with (L2) regularization?
Nice video, thanks! The only thing I think is slightly incorrect is calling polynomials with increasing degree "complex". Since you are talking about maths, I was expecting to see the imaginary unit when I first heard "complex".
Great content on your channel. I just found it! Heh, I used Desmos to debug/visualize too!
I just added a video explaining easy multilayer backpropagation. The book math with all the subscripts is confusing, so I did it without any. Much simpler to understand.
Thank you! And Solid work on that explanation :)
Thank you
AWESOME!!!!! thanks!
thx !