Excellent: "you can think of the holy grail of machine learning is to find an in sample estimate of the out of sample error. If you get that, you are done, minimize it and you go home".
Small clarification :- in 16:00 the professor says Lebesgue of order 1 upto lebesgue of order Q - I presume he meant legendre ... Lebesgue to the best of my knowledge is in integration ...
Such a wonderful material consists of non-scary math, simple yet clear visualization, and plain language accompanied by lots of small hints to keep following. It takes audience and go deep into such an important topic in machine learning. Awesome!
Great lecture (as always!). What I find confusing is the content of slide 9 (28:42) - from where is the derivation coming, that: gradient Ein(_w reg_) is proportional to -_w reg_ ? Do you know that? I cannot get this derivation point.
+Filip Wójcik E_in will be minimized by a w closest to w_lin with the constraint that w be within some circle not containing w_lin. Convince yourself that w should point in the direction of w_lin, since if it didn't we could rotate it along the circle C until it did and have it be closer to w_lin. So we want w pointing at w_lin but the gradient is perpendicular to the ellipse containing w_lin as its center and pointing away from the interior of the ellipse. Therefore when w points to w_lin the gradient is pointing away from w_lin and their directions are opposite. vectors with opposite directions differ by a negative sign only.
The point is like the following. When you are trying to get the minimal Ein by updating w using gradient descent, you always update w with a small move in the opposite direction of Ein'(w), the gradient at w, i.e., w = w - t*Ein'(w). When you reach the minimal point of Ein, it usually requires Ein'(w) = 0; but now with the constraint w^2
@@markoiugdefsuiegh Monarchist When you move the w in the _tangential direction_ to the constraint circle, that means you are always _on_ the circle. I guess what you meant is that, you move the w in the direction of the _tangent line_ that touches the circle at point w. But that is not how it works, and you must move w on the circle. Ein'(w) always has two components, one is tangential to the circle, and one is normal. When it reaches minimum while w is on the circle, the tangential component of Ein'(w) becomes zero.
Tikhonov regularization, and the weight decay, somehow reminds me of the projection methods for ordinary differential equation. (i.e. algebraic-differential equations, or differential equations with constraints or on a manifolds).
In the homework#6, Question#2, out of sample (Eout) classification error is required to be calculated. But looks like the right answer is only obtained, if classification error is calculated in the non-linear transformation domain. But I remember prof mentioned, Eout is always measured on the X (input) space not the Z (transformed space). Any thoughts on this?
From the point of view of forming a polynomial model, the coefficients of monomials are not independent. Legendre polynomials ensure that different components of Z are orthogonal, hence the coefficients are independent. It is like to construct a w space with orthonormal basis. The neat part is that sum{ wi * zi} expresses vector (w1, ..., wn) of the space.
This guy is a rock star. These lectures are so satisfying and useful.
I haven't seen any other class where regularization was explained in such depth. I love Prof. Abu-Mostafa's teaching.
A great class! Thank you to Yaser Abu-Mostafa and Caltech for making it available to the public!
Excellent: "you can think of the holy grail of machine learning is to find an in sample estimate of the out of sample error. If you get that, you are done, minimize it and you go home".
59:49
"You don't hide behind a great-looking derivation, when the basis of it is shaky"
Damn, this line is 🔥🔥
1:05:49 "Heuristic is heuristic but we are still scientists and engineers"
The best words ever!
Small clarification: at 16:00 the professor says Lebesgue of order 1 up to Lebesgue of order Q. I presume he meant Legendre; Lebesgue, to the best of my knowledge, belongs to integration theory.
Good observation. He indeed meant Legendre!
It's amazing how well this man understands what he is talking about, and how clearly he establishes the key notions... just superb!
So enjoyable! One of the best lectures ever!
Same feeling here. Impressive!!
Such wonderful material: non-scary math, simple yet clear visualizations, and plain language, with lots of small hints to keep you following along. It takes the audience deep into such an important topic in machine learning. Awesome!
This lecture is insanely clear at explaining regularization!
I always wanted to know the intuition behind regularization... one of the best lectures on it, along with Andrew Ng's.
His classes are really amazing!
Excellent lecture. He really makes sense out of raw math.
Never heard regularization explained in such an interesting way :)
thanks
Excellent... for the first time I have understood what regularization is :)
Not going to say I understand everything here, but he's a really great instructor.
Great lecture (as always!).
What I find confusing is the content of slide 9 (28:42): where does the derivation come from that
the gradient of Ein at _w reg_ is proportional to -_w reg_?
Does anyone know? I cannot get past this step of the derivation.
+Filip Wójcik E_in will be minimized by the w closest to w_lin, under the constraint that w lie within some circle not containing w_lin. Convince yourself that w should point in the direction of w_lin: if it didn't, we could rotate it along the circle C until it did, and w would then be closer to w_lin. So we want w pointing at w_lin, but the gradient is perpendicular to the level-set ellipse centered at w_lin and points away from the interior of that ellipse. Therefore, when w points toward w_lin, the gradient points away from w_lin, and their directions are opposite. Vectors with opposite directions differ only by a negative scale factor.
The point is the following. When you try to minimize Ein by updating w with gradient descent, you always move w a small step in the direction opposite to Ein'(w), the gradient at w, i.e., w = w - t*Ein'(w). Reaching the unconstrained minimum of Ein usually requires Ein'(w) = 0; but now, with the constraint w^T w <= C, the minimum can land on the boundary, where only the component of Ein'(w) along the constraint circle has to vanish, leaving a gradient that is normal to the circle, i.e., parallel to -w.
@@markoiugdefsuiegh Monarchist When you move w in the _tangential direction_ to the constraint circle, that means you always stay _on_ the circle. I guess what you meant is that you move w in the direction of the _tangent line_ that touches the circle at the point w. But that is not how it works; you must move w along the circle. Ein'(w) always has two components, one tangential to the circle and one normal to it. When the minimum is reached while w is on the circle, the tangential component of Ein'(w) becomes zero.
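For anyone still stuck on slide 9, here is the argument from the replies above written out as equations (my own sketch in the lecture's notation; writing the proportionality constant as 2λ/N is just the parametrization the professor chooses so that the augmented error comes out clean):

```latex
% Slide 9 in equations: the constrained problem and its unconstrained equivalent.
\min_{\mathbf{w}}\; E_{\mathrm{in}}(\mathbf{w})
  \quad \text{subject to} \quad \mathbf{w}^{\mathsf{T}}\mathbf{w} \le C

% At the constrained minimum the gradient has no component along the
% constraint surface, so it must be anti-parallel to w_reg:
\nabla E_{\mathrm{in}}(\mathbf{w}_{\mathrm{reg}})
  = -2\,\frac{\lambda}{N}\,\mathbf{w}_{\mathrm{reg}}
  \qquad \text{for some } \lambda \ge 0

% Equivalently, w_reg satisfies the unconstrained first-order condition
\nabla\Big( E_{\mathrm{in}}(\mathbf{w})
  + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf{T}}\mathbf{w} \Big)
  \Big|_{\mathbf{w}=\mathbf{w}_{\mathrm{reg}}} = \mathbf{0},
  \qquad \text{i.e. it minimizes } \;
  E_{\mathrm{aug}}(\mathbf{w}) = E_{\mathrm{in}}(\mathbf{w})
  + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf{T}}\mathbf{w}.
```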
Tikhonov regularization and weight decay somehow remind me of projection methods for ordinary differential equations (i.e., differential-algebraic equations, or differential equations with constraints or on manifolds).
4:35 What does the intensity level between -0.2 and 0.2 correspond to in the image at the bottom right? Thank you.
It's the difference in out-of-sample error between a complex model and a simpler one.
In Homework #6, Question 2, the out-of-sample (Eout) classification error has to be calculated. But it looks like the right answer is obtained only if the classification error is computed in the non-linear transformation domain. I remember the professor mentioned that Eout is always measured in the X (input) space, not the Z (transformed) space. Any thoughts on this?
At 56:34, what technique is he talking about?
@35:27 But what is the capital I in the solution for w_reg?
Identity matrix
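To make the role of that I concrete, here is a minimal numpy sketch of the weight-decay solution shown around 35:27, w_reg = (Z^T Z + λI)^{-1} Z^T y (the toy Z and y below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N points in a 3-dimensional feature space Z (made up for illustration).
N = 20
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = Z @ np.array([0.5, 1.0, -2.0]) + 0.1 * rng.normal(size=N)

lam = 0.1                    # the regularization parameter lambda
I = np.eye(Z.shape[1])       # the "capital I": identity matrix, same size as Z^T Z

# Unregularized least squares vs. the weight-decay (ridge) solution:
w_lin = np.linalg.solve(Z.T @ Z, Z.T @ y)
w_reg = np.linalg.solve(Z.T @ Z + lam * I, Z.T @ y)

print("w_lin:", w_lin)
print("w_reg:", w_reg)       # shrunk toward zero; more so as lam grows
```

A side benefit: for λ > 0, Z^T Z + λI is always invertible, even when Z^T Z itself is singular.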
This guy is a legend... I wonder why he doesn't teach on any MOOC platform like Udacity or Coursera.
If monomials form a basis of the polynomials, then they are independent, right? Why can't we just use them instead of Legendre polynomials?
+Attila Kun What's nice about the Legendre polynomials is that they form an orthogonal basis.
From the point of view of forming a polynomial model, the coefficients of the monomials are not independent. Legendre polynomials ensure that the different components of z are orthogonal, hence the coefficients are independent. It is like constructing a w space with an orthonormal basis; the neat part is that sum{ w_i * z_i } then expresses the vector (w_1, ..., w_n) of that space.
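A quick numerical check of that orthogonality claim (my own sketch, not from the lecture): compare the Gram matrices of the monomials and the Legendre polynomials under the inner product integral of f(x)g(x) over [-1, 1].

```python
import numpy as np
from numpy.polynomial import legendre

# Approximate <f, g> = integral of f(x) g(x) over [-1, 1] with a Riemann sum.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def gram(rows):
    """Pairwise inner products of functions sampled as rows of a matrix."""
    return rows @ rows.T * dx

Q = 4
monomials = np.array([x**k for k in range(Q + 1)])
legendres = np.array([legendre.legval(x, [0.0] * k + [1.0]) for k in range(Q + 1)])

np.set_printoptions(precision=3, suppress=True)
print("Monomial Gram matrix (nonzero off-diagonal entries -> coefficients interfere):")
print(gram(monomials))
print("Legendre Gram matrix (diagonal -> the z-coordinates do not interfere):")
print(gram(legendres))
```

The monomial Gram matrix has plenty of nonzero off-diagonal entries, while the Legendre one is diagonal, which is the sense in which the coordinates (and hence the weights) don't interfere with each other.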
ty
Thanks Prof. Yaser, great lessons.
Prof. Yaser, thank you.
What a man! Incredible!
This guy rocks, really
Amazing, thank you!
How are the biases calculated in theory, and for what estimator against which true value?
If you work with synthetic data, you should be able to evaluate them empirically :)
Watch the earlier videos; there is one called Bias-Variance something.
SUPERB!
Where can I download the slides? They are so helpful for me.
The slides are available on the EdX course forum (CS1156).
GOD professor ❤️
thank you so much
"Want to punish the noise more than the signal" @51m
What an insight!
Thank you sir, great work
Let's get carried away, like people get carried away with medicine :)
Could you tell me which university this teacher is from?
Caltech
👍👍👍👍👍