Thank you. One of the best explanations of L1 vs L2 regularization!
I have taken courses and put a lot of effort into reading material online, but your explanation is by far the one that will remain indelible in my mind. Thank you
Oh my G. After 5 years of confusion, I finally understood Lp regularization!
Thank you so much Alex!
@Emm -- not sure how/if I can reply to your comment.
An iso-surface is the set of points such that a function f(x) has constant value, e.g. all x such that f(x) = c. For a Gaussian distribution, for example, this is an ellipse, shaped according to the eigenvectors and eigenvalues of the covariance matrix.
So, the iso-surfaces of theta1^2 + theta2^2 are circles, while the iso-surfaces of |theta1|+|theta2| look like diamonds. The iso-surface of the squared error on the data is also ellipsoidal, with a shape that depends on the data.
Alpha scales the importance of the regularization term in the loss function, so higher alpha means more regularization.
I didn't prove the sparsity assertion in the recording, but effectively, the "sharpness" of the diamond shape on the axes (specifically, the discontinuous derivative at e.g. theta1 = 0) means that the optimum of the combined (data + regularization) objective can fall at a point where some of the parameters are exactly zero. If the penalty were differentiable at those points, this would essentially never happen -- the optimum would almost always be at some (possibly small, but) non-zero value.
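A quick way to see this numerically is to fit both penalties and count how many coefficients come out exactly zero. Here is a minimal sketch using scikit-learn's Lasso and Ridge (not tools used in the video; the data and alpha value are arbitrary):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    # Toy regression problem where only the first 3 of 10 features matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    true_theta = np.zeros(10)
    true_theta[:3] = [2.0, -1.5, 0.5]
    y = X @ true_theta + 0.1 * rng.normal(size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty (diamond iso-surfaces)
    ridge = Ridge(alpha=0.1).fit(X, y)   # squared L2 penalty (circular iso-surfaces)

    print("exact zeros, L1:", np.sum(lasso.coef_ == 0))  # typically several
    print("exact zeros, L2:", np.sum(ridge.coef_ == 0))  # typically none; just small values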
Best explanation of regularization I ever saw! Concise, detailed just enough, and covers all the practically important aspects. Thank you Sir!
Nice video, this is what I dig for on YouTube: an actual concise, clear explanation worth any paid course.
Very few videos online give some key concepts here, like what we're truly trying to minimize with the penalty expression. Most just give the equation but never explain the intuition behind L1 and L2. Kudos man
Whoa, I wasn't ready for the superellipse, that's a nice surprise. That helps me understand the limit case of p -> inf. Also exciting to think about rational values of p such as the 0.5 case.
Major thanks for the picture at 7 minutes in. I learned about the concept of compressed sensing the other day, but didn't understand how optimization under regularized L1 norm leads to sparsity. This video made it click for me. :)
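For anyone who wants to play with these shapes, here is a minimal matplotlib sketch (my own, not from the video) that draws the iso-surface |theta1|^p + |theta2|^p = 1 for a few values of p, including p = 0.5 and a large p that approaches the square-like superellipse limit:

    import numpy as np
    import matplotlib.pyplot as plt

    grid = np.linspace(-1.5, 1.5, 400)
    T1, T2 = np.meshgrid(grid, grid)

    for p in [0.5, 1, 2, 10]:                     # star-like, diamond, circle, near-square
        level = np.abs(T1) ** p + np.abs(T2) ** p
        plt.contour(T1, T2, level, levels=[1.0])  # the iso-surface at value 1

    plt.gca().set_aspect("equal")
    plt.title("Iso-surfaces of |theta1|^p + |theta2|^p for p = 0.5, 1, 2, 10")
    plt.show()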
Best explanation yet on what ridge regression does.
Wonderful video to give some intuition on L1 vs L2. Thank you!
great video even after 10 years! thanks! :)
First heard of this via more theoretical material. Very cool to see a discussion from a more applied (?) perspective.
This really is an incredible explanation of the idea behind regularization. Thanks a lot for your insight!
Great presentation with very reasonable depth!
Thank you! That's a very clear and concise explanation.
Thank you!! This really helped to understand the difference between L1 and L2.
Very clear explained, helped a lot, thanks Alex!
Wow, that was such a great explanation. Thank you.
Next time, I'd love it if you included the effect lambda has on regularization, including visuals!
Thank you Alexander - very well explained !
What excellent videos you have posted! Congratulations!
Many thanks for the brilliant video !!
English major: Brevity is the soul of wit.
Statistics/Math major: Verbal SCAD-type regularization is the soul of wit.
I just found your videos now; thank you for such a wonderful explanation, it really helps me understand this term.
OMG! This stuff is just way too cool! I love maths.
As my old friend Borat would say: Very Nice!
Why don't we draw concentric circles and diamonds as well, to represent the optimization space of the regularization term?
Awesome description, thanks 🙏
This is superb. Thanks for putting it together.
Thanks! This helps me understand the regularization term a lot.
I learned a lot from this video. Thank you!
awesome explanation. thank you
Thank you for the great explanation. Some questions:
1. At 2:09 the slide says that the regularization term alpha x theta x thetaTranspose is known as the L2 penalty. However, going by the formula for the Lp norm, isn't your term missing the square root? Shouldn't the L2 regularization be alpha x squareroot(theta x thetaTranspose)?
2. At 3:27 you say "the decrease in the mean squared error would be offset by the increase in the norm of theta". Judging from the tone of your voice, I would guess that statement should be self-apparent from this slide. However, am I correct in understanding that this concept is not explained here; rather, it is explained two slides later?
+RandomUser20130101 "L2 regularization" is used loosely in the literature to mean either the Euclidean distance or the squared Euclidean distance. Certainly the L2 norm has a square root, and in some cases (L2,1 regularization, for example; see en.wikipedia.org/wiki/Matrix_norm) the square root is important, but often it is not; it does not change, for example, the isosurface shape. So there should exist values of alpha (the regularization strength) that make the two penalties equivalent; alternatively, the path of solutions traced out as alpha changes is the same.
Offset by increase: regularization is being explained in these slides generally; using the (squared) norm of theta is introduced as a notion of "simplicity" in the previous slides, and I think it is not hard to see (certainly if you actually solve for the values) that producing the regression curve in the upper right of the slide at 3:27 requires large coefficient values, which creates a trade-off between the two terms. Two slides later is the geometric picture in parameter space, which certainly also illustrates this trade-off.
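To make the "same path of solutions" point concrete, here is a rough numerical sketch (a toy example of my own using scipy, not material from the lecture; the data and alpha are arbitrary): at the minimizer of MSE + alpha * ||theta||^2, the unsquared penalty with alpha' = 2 * alpha * ||theta*|| has the same gradient condition, so both convex objectives share that minimizer.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
    mse = lambda th: np.mean((X @ th - y) ** 2)

    alpha = 0.5
    # Minimizer of MSE + alpha * ||theta||_2^2 (squared penalty).
    th_sq = minimize(lambda th: mse(th) + alpha * np.sum(th ** 2), np.ones(3)).x

    # Matching strength for the unsquared norm: alpha' = 2 * alpha * ||theta*||.
    alpha_prime = 2 * alpha * np.linalg.norm(th_sq)
    th_unsq = minimize(lambda th: mse(th) + alpha_prime * np.linalg.norm(th), np.ones(3)).x

    print(np.allclose(th_sq, th_unsq, atol=1e-3))  # should print True (same solution, up to optimizer tolerance)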
+Alexander Ihler Thank you for the info.
Thanks for the videos, I really enjoy learning from them!
Sometimes I wish some profs would present a YouTube playlist of good videos instead of giving their lectures themselves. This is so much better explained. There are so many good resources on the net; why are there still so many bad lectures given?
Fuck, that's true, but depressing.
How do you identify the extremities of the ellipse from its equation?
Finally I know what those isosurface diagrams found in PRML mean.
Apologies, but what is the rationale for the concentric ellipses? I understood the L1/L2 area though.
Awesome explanation, sir. Thanks much!
Thank you, that was an elegant explanation.
Great video! Did help me a lot!
Beautiful!
That was a great video Alex!
The most perfect video
Thank you for the excellent video.
Nice, clear explanation. Thnx.
Thank you so much!!!
Thanks, you make great videos :)
Awesome!
Sorry, I can give only one like :)
I love your accent
How is L1 regularization performed?
Just replace the "regularizing" cost term that is the sum of squared parameter values (the L2 penalty) with one that is the sum of the absolute values of the parameters (the L1 penalty).
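In code terms, a minimal numpy sketch of the two objectives might look like this (my own illustrative version, with alpha as the regularization strength; the function names are just placeholders):

    import numpy as np

    def l2_penalized_loss(theta, X, y, alpha):
        # Squared-error data term plus the sum of squared parameters (ridge / L2 penalty).
        return np.mean((X @ theta - y) ** 2) + alpha * np.sum(theta ** 2)

    def l1_penalized_loss(theta, X, y, alpha):
        # Same data term, but the penalty is the sum of absolute parameter values (lasso / L1).
        return np.mean((X @ theta - y) ** 2) + alpha * np.sum(np.abs(theta))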
Thank You!
Lasso gives sparse parameter vectors. QUOTE OF THE DAY. Go ahead and finish the report :P
6:47
What is the best in the real world? Why does your boss keep paying you?