I'm a visual learner and this video is exactly what I'm looking for! Great content!
8:27 is such a revelation! That was exactly what I was confused about, thanks for giving it a special mention!
This is such high-quality stuff! Thanks for the graphical explanation at the beginning!
I'm here from yesterday's 3b1b video on Newton's method of finding roots, after wondering if there's any way to use it for minimizing a function. Mainly to see why we can't use it instead of Stochastic Gradient Descent in Linear Regression. Turns out the Hessian of functions with many components can be large and computationally expensive, and also that if the function isn't well approximated by a parabola, the method can lead you far away from the minimum. Still it was nice to see how the operation works in practice, and you mentioned the same points about Hessians too. Good job 😊👍
Just finished calculus 1 and learning about Newton’s method brought me here. The visuals were fantastic and the explanation was clear. I’ll need to learn a lot more to grasp the entire concept, but it’s exciting to see topics taught like this for us visual learners.
Subbed 😁
Great quality! Wow! Your explanations are clear, structured and very easy to understand - thank you!
What the what?! Even I understood this. Killer tutorial!
Yayy!
It's rare for a less-viewed video to give the best explanation. Your presentations are almost on par with 3Blue1Brown or Khan Academy! I don't know why this video has so few views!!
This guy knowwwwsssss🔥🔥🙌🙌I love 3blue1brown
HOLYYYYY FKKK !!!! I really wish I had come across your video long before I took the painful route to learning all this… definitely a big recommendation for everyone I know who's just started an optimisation course. Great work !!!!!
Loved this - very helpful! I knew this a long time ago and forgot much of it, so this was an excellent refresher, accessible to many. (And this is coming from a Stanford / Caltech grad working in computer vision, machine learning, and related things.)
Your videos are so good, I wish they were a thing when I took my course on continuous optimization. My professor could never. I wish you would keep making them though!!!
Hi Bachir, what an interesting series, very helpful. Can't wait to see the next episode.
Hello Mr. Bachir El Khadir,
I recently came across your channel and was truly impressed by your videos and your clear explanations. I've just started working with AI and am also using the Manim library (created by Grant Sanderson) to make animated explanations.
I would really appreciate any advice you could offer, and I'm also curious to learn more about how you create your videos.
It is indeed a truly amazing explanation, and it helps me understand Newton's method visually.
Your explanation is awesome. The extension from the root-finding scenario to the minimum-finding problem was exactly my question.
This channel just needs more videos to get rocket growth!! Very good quality stuff.
This is brilliant, thank you. Hope you give us more visual insight into calculus-related things.
Loved the graphical presentation
Amazing explanation! This is very helpful for understanding. Thanks a lot, sir.
Brilliant visualization and explanation
What a concise and informative explanation!!! Thank you SO MUCH!! I'm subscribing to your channel from now on!
WoW! This is amazing work man, thank you.
Thank you for your amazing comment :-)
Excellent video. I especially liked how you linked it back to the root finding version we learned in school. My one beef with this video is that that's an unfair depiction of Tuco.
Crystal clear explanation, thank you!
Amazing video! I could save a lot of time! Thank you very much.
Another problem is that for negative curvature, the method climbs uphill. E.g. ML loss functions tend to have a lot of saddle points, which attract the method, so gradient descent is used instead, because it can find the direction down from the saddle.
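To make that concrete, here is a minimal sketch using a toy 1D function (chosen for illustration, not the one from the video): starting where the curvature is negative, the pure Newton step heads for the nearby local maximum, while gradient descent from the same point heads down toward the minimum.

```python
# Toy function f(x) = x^3 - 3x: local max at x = -1, local min at x = +1
# (assumed for illustration only).
df  = lambda x: 3 * x**2 - 3   # f'(x)
d2f = lambda x: 6 * x          # f''(x), negative for x < 0

x_newton = x_gd = -0.5         # start where the curvature is negative
for _ in range(20):
    x_newton -= df(x_newton) / d2f(x_newton)  # pure Newton step
    x_gd     -= 0.1 * df(x_gd)                # gradient descent step

print(f"Newton ends near           {x_newton:+.4f}  (the local max at -1)")
print(f"Gradient descent ends near {x_gd:+.4f}  (the local min at +1)")
```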
Very nice video, complicated topic made easy to understand.
@Visually Explained, could you help me understand @8:26, why the Newton method can be rewritten from x_k - [grad^2 f(x_k)]^{-1} * grad f(x_k) to x_k - g(x_k) / grad g(x_k)?
This was such an awesome explanation, so grateful thank you.
still waiting for the video on Quasi-Newton Methods!! especially BFGS
Amazing video. Looking forward to more.
Is my intuition correct (7:21) that if the curvature is high you take a small step, and vice versa?
Brilliant explanation, thank you so much.
Excellent. Please explain L-BFGS.
Hi, can you please explain how you convert alpha into 1 over the second derivative at x_k at 7:06? Thank you!
Sure. Consider the quadratic approximation f(x) ~ f(xk) + f'(xk) (x - xk) + 1/2 f''(xk) (x-xk)^2 at the bottom of the screen at 7:06. To minimize the right hand side, we can take the derivative with respect to x and set it to zero (i.e., f'(xk) + f''(xk) (x - xk) = 0). If you solve for x, you get x = xk - 1 / f''(xk) * f'(xk).
@@VisuallyExplained Got it. Thank you so much for the explanation! :)
@@VisuallyExplained I appreciate your answer and video explanation. I have one confusion. Why do we want to take the derivative in the RHS? In other words, why did we decide to take the minimizer of the quadratic equation as the next step?
@@prub4146 What we are really trying to do is minimize the LHS (i.e., the function f), but it is often hard to do that directly. Instead, we approximate f by a quadratic function (the one in the RHS), and we minimize that quadratic instead. (The minimizer of a quadratic function admits a simple analytical formula, which we find by taking the derivative.) The hope is that the quadratic function is a good enough approximation that its minimum and the minimum of f are close to each other. Let me know if this explanation is clear enough, otherwise I can expand a bit more.
@@VisuallyExplained Thank you for the explanation. Thanks
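If it helps to see the algebra from the replies above checked mechanically, here is a small sympy sketch; the symbols f_k, df_k, d2f_k stand for f(xk), f'(xk), f''(xk).

```python
import sympy as sp

# Symbols standing in for f(xk), f'(xk), f''(xk) and the points x, xk
x, xk, fk, dfk, d2fk = sp.symbols("x x_k f_k df_k d2f_k")

# Quadratic model from the reply above:
#   q(x) = f(xk) + f'(xk)(x - xk) + 1/2 f''(xk)(x - xk)^2
q = fk + dfk * (x - xk) + sp.Rational(1, 2) * d2fk * (x - xk) ** 2

# Set dq/dx = 0 and solve for x
minimizer = sp.solve(sp.Eq(sp.diff(q, x), 0), x)[0]
print(sp.expand(minimizer))   # -> x_k - df_k/d2f_k, i.e. xk - f'(xk)/f''(xk)
```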
Hey, can you do a sum-of-squares / DSOS optimization tutorial for postgraduate students?
just amazing explanation!!
Truly an amazing video!!
You reading out the whole thing made it confusing.
Can you explain what you meant by pick a direction "IE" @1:51?
Or did you mean i.e., an abbreviation for 'that is'?
Hope next time you don't read " = " as "double dash".
Do you have a video for quasi-Newton?
Great video. Thank you!
Where can we see the quasi-Newton video??
Great explanation
Love the video!
Sir, do you use the "MANIM" library of Python to create these beautiful animations in your great videos?
Very good explanation
oh man is this gold
Great content
Nice videos! May I know what tools you use to make these figures?
very useful, thanks !
Amazingly presented, thank you.
How come, by subtracting a multiple of the slope from the current iterate, we find the minimum point?
Thank you, brilliant stuff.
Great video man. God bless you
Amazing job! Thanks a lot!!
holy cow this was super helpful
Nice explanation
Loved the video
lovely explanation 🤩🤩🤩🤩🤩🤩
Thanks a lot 😊
Thank you for this amazing visualization. Is it also possible to find roots of a multivariable vector function (f: R^n -> R^m)? The resources I found solved this by using the Jacobian matrix, such that x_{k+1} = x_k - J^{-1} f, where J^{-1} is the inverse or the pseudoinverse. Is this method referred to as the Newton method for a vector function, or is it a completely different method? Any help and reference to resources would be greatly appreciated.
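In case a concrete sketch helps: that update is the multivariate Newton(-Raphson) method for root finding, and when m ≠ n the pseudoinverse variant coincides with a Gauss-Newton / least-squares style step. Below is a minimal example where the system F and the starting point are made up for illustration.

```python
import numpy as np

# Newton's method for a vector function F: R^n -> R^m, as in the comment above:
#   x_{k+1} = x_k - J(x_k)^+ F(x_k)
# Example system (made up for illustration): solve x^2 + y^2 = 4 and x = y.
def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x - y])

def J(v):
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

v = np.array([1.0, 0.5])   # initial guess
for _ in range(10):
    # lstsq solves J @ step = F; for non-square J this is the pseudoinverse step
    step, *_ = np.linalg.lstsq(J(v), F(v), rcond=None)
    v = v - step

print(v)      # ~ [1.4142  1.4142], i.e. (sqrt(2), sqrt(2))
print(F(v))   # ~ [0. 0.]
```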
Iterative optimization towards a target or goal is a syntropic process -- teleological.
Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics!
Teleological physics (syntropy) is dual to non teleological physics (entropy).
Synchronic lines/points are dual to enchronic lines/points.
Points are dual to lines -- the principle of duality in geometry.
"Always two there are" -- Yoda.
Concepts are dual to percepts -- the mind duality of Immanuel Kant.
Mathematicians create new concepts all the time from their perceptions or observations.
The first statement you made explained half of my confusion 😩🤲
Can someone please tell me what algebra is needed to get Newton's method from the Taylor series stated at 6:58? Thank you in advance.
I have explained this in another comment. Let me paste it here:
"Sure. Consider the quadratic approximation f(x) ~ f(xk) + f'(xk) (x - xk) + f''(xk) (x-xk)^2 at the bottom of the screen at 7:06. To minimize the right hand side, we can take the derivative with respect to x and set it to zero (i.e., f'(xk) + f''(xk) (x - xk) = 0). If you solve for x, you get x = xk - 1 / f''(xk) * f'(xk)."
Hope this answers your question.
@@VisuallyExplained thank you it really helped me!
I didn't understand how you changed alpha to 1/f''(x) at 7:00.
Thank you. Very helpful.
The first iteration gives me 1.25, not 1.7. Is this a mistake in the video, or am I doing something wrong?
x_(k+1) = x - (1/(6x)) * (3x^2 - 3)
Evaluating at x = 2:
x_(k+1) = 2 - (1/(6*2)) * (3*2^2 - 3) = 1.25
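For what it's worth, coding up the same update with the derivatives used in the comment (f'(x) = 3x^2 - 3 and f''(x) = 6x, i.e. f(x) = x^3 - 3x up to a constant; this may not match the video's example exactly) also gives 1.25 on the first step.

```python
# Assumes f'(x) = 3x^2 - 3 and f''(x) = 6x, as in the comment above;
# this may not be exactly the function shown in the video.
df  = lambda x: 3 * x**2 - 3
d2f = lambda x: 6 * x

x = 2.0
for k in range(4):
    x = x - df(x) / d2f(x)          # Newton step for minimization
    print(f"x_{k + 1} = {x:.4f}")
# x_1 = 1.2500, x_2 = 1.0250, x_3 = 1.0003, x_4 = 1.0000
```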
You run into another problem with this method when you evaluate the Hessian at a point where it's not positive-definite. Then you're suddenly calculating a saddle point or even a maximum of the approximation which might lead you farther and farther away from the desired minimum of f(x).
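A tiny illustration of that point, with a toy function not from the video: for f(x, y) = x^2 - y^2 the Hessian is indefinite everywhere, so the quadratic model has a saddle, and the pure Newton step jumps straight to it rather than decreasing f.

```python
import numpy as np

# f(x, y) = x^2 - y^2: gradient and constant, indefinite Hessian
grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])
hess = np.array([[2.0,  0.0],
                 [0.0, -2.0]])

p = np.array([3.0, 1.0])
p_next = p - np.linalg.solve(hess, grad(p))
print(p_next)   # [0. 0.] -- the saddle point of f, not a minimum
```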
Thank you so much!
Again amazing
This is great.
What is the plotting tool you are using?
Thanks! For this video I used the excellent library manim: github.com/3b1b/manim
Good job. I am subscribing !
Awesome, thank you!
Tyvm sir
Great video!
Please use different background music. It's all weird and out of tune :)
I'm curious to know where you are from; my guesses are Egypt and Morocco.
Morocco. Was it that obvious? :-)
More!
amazing
Hello, great video. I am currently following a course on non-linear optimization, and I would like to make videos like this for my own problems. I think you used manim for this video; is the code available somewhere so I can take a look? Thanks.
I am sad.
You are way too underrated :(
Thank you for the words of encouragement, I appreciate it!
holy good.
Well done, but you skipped over intermediate steps, which made you lose me.
Thank you for the feedback! Would you mind elaborating a little bit on where in the video I lost you? It will help me a lot for future videos.
Adds a lot.
😢
Newton's Method is now LESS clear than before watching this vid.
Great content