If anyone is wondering why the mixed derivatives are the same: it's Schwarz's theorem.
Grant
that's 3blue1brown, wrong channel
You have the same voice as the guy on 3blue1brown
@David Beyer Was looking for this comment. thanks lol
I noticed that too... then I checked who he was.
No wonder this made sense to me XD He is a blessing
He is the same guy
Grant Sanderson worked for Khan Academy.
Thank you so much, here I get more information in one day than at university in a month)))
Hey Grant, love your videos from Khan and 3Blue1Brown!
What does the Hessian matrix represent geometrically? In particular, what does the determinant of the Hessian matrix measure?
That's really a good question! Sadly I can't answer it now, but I'll use it as inspiration to look into it when I have the time. I think the best strategy for approaching this problem is to calculate the determinant for some 2- or 3-dimensional functions and then play around with different values of x, y, and z.
Pretty sure the determinant of a matrix represents the signed volume scaling of the corresponding linear map in that dimension. So the determinant of the Hessian matrix might have something to do with how the rate of change of the rate of change of that particular function scales volume. I might just be talking hot air tho lol
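One standard geometric reading (summarizing common facts, not something from the video): the Hessian's eigenvalues measure the second-order rate of change along its principal directions, and the determinant is their product, so its sign tells you whether those curvatures agree (an extremum) or disagree (a saddle). In two dimensions it is exactly the discriminant of the second partial derivative test:

```latex
\det H = \prod_i \lambda_i , \qquad
\det H_{2\times 2} = f_{xx} f_{yy} - f_{xy}^{2}
```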
Awesome video! Thank you! And wow! It's 3Blue1Brown's voice doing this video!
the moment I clicked on this link, oh this is the 3blue1brown guy!
Awesome guy.... Mr Sanderson.
I love u Khan!! u saved me today
You are perfect, thanks for your videos and your funny mood :)
Yo Khan Academy, thank you for making these videos. They are real lifesavers at times ^w^
So clear and helpful!
3 blue one brown?
Yes!! It's Grant only..
DUDE, why don't you tell me how to find extrema with this!!!
Gamer Sparta You evaluate this matrix at a critical point (found by setting the gradient to zero) and find its eigenvalues. Then determine whether the matrix is positive definite, negative definite, or semidefinite.
Can someone explain why the ideal learning rate for 2 or more dimensions in the gradient descent algorithm is the inverse of the Hessian (matrix of second partial derivatives)?
This guy explains it well: medium.com/@ranjeettate/learning-rate-in-gradient-descent-and-second-derivatives-632137dad3b5 . Intuitively, the first derivative approximates the change in loss w.r.t. x as a straight line; second derivatives add information about the curvature of the loss function.
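For anyone who wants to see that concretely, here's a minimal numpy sketch (the quadratic and the helper names are made up for illustration, not from the linked article): the Newton update replaces the scalar learning rate with the inverse Hessian, i.e. x_new = x - H(x)^-1 grad(x).

```python
import numpy as np

def newton_step(grad, hess, x):
    """One second-order step: solve H(x) p = grad(x), then move by -p."""
    p = np.linalg.solve(hess(x), grad(x))  # solve instead of forming H^-1 explicitly
    return x - p

# Toy convex quadratic: f(x, y) = 2x^2 + xy + y^2
grad = lambda v: np.array([4 * v[0] + v[1], v[0] + 2 * v[1]])
hess = lambda v: np.array([[4.0, 1.0], [1.0, 2.0]])

x = np.array([3.0, -2.0])
x = newton_step(grad, hess, x)
print(x)  # [0. 0.] -- one step lands exactly on the minimizer of a quadratic
```

That exactness on quadratics is the sense in which the inverse Hessian is the "ideal" learning rate: near a minimum, any smooth loss looks like its quadratic approximation.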
thank you so much! so helpful
So this Hessian matrix is valid only for scalar-valued functions, right? If my intuition is correct, then for a vector-valued function with, say, 4 components, would there be 4 Hessian matrices?
What if the output of the function f were a vector with 3 rows instead of a single expression? How would the Hessian change?
awesome! :-)
I have a question:
what kind of tools are you using when you work??
I really wanna get that blackboard tool :-) thx in advance
a math-animating Python library ... I guess
Is this the 3blue1brown guy?
What if the function is a matrix itself? The Hessian matrix will be a matrix of matrices?
...yep
Tensor?
Albanovaphi7 just a block matrix
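Building on the answers above, a small sympy sketch (the component functions are made up): a vector-valued f gives one Hessian per scalar component, and stacking them yields the rank-3 object / block matrix mentioned here.

```python
import sympy as sp

x, y = sp.symbols('x y')
components = [x**2 * y, sp.sin(x) + y**2, x * y]  # a 3-component vector-valued f

# One 2x2 Hessian per component; together they form a 3x2x2 stack (a rank-3 tensor).
hessians = [sp.hessian(fi, (x, y)) for fi in components]
for H in hessians:
    print(H)
```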
Do you guys know which lecture/series/playlist is this video from? Please let me know! Thanks!
Wunan Zeng Khan Academy multivariable calculus playlist
What is the point of finding H, why do we use it? Is it some sort of solution or something, I don't really get it.
When will the matrix not be symmetric?
when the function is not continuous
@@ethanbooth1174 Sure about that? I mean, for the second mixed derivatives to be different, they have to exist.
Ah, so this is where the formula for the discriminant comes from. We can see that taking the determinant of the Hessian gives the formula for the discriminant. I know it works for R^2; I will verify for R^3 and R^n as an exercise. Thanks!
explain
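A quick sympy check of the observation above (the function is a made-up example): the determinant of the 2x2 Hessian is exactly the discriminant D = f_xx * f_yy - f_xy^2 from the second partial derivative test.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2  # critical points at (1, 0) and (-1, 0)

H = sp.hessian(f, (x, y))
D = H.det()  # equals f_xx * f_yy - f_xy**2

print(H)                      # Matrix([[6*x, 0], [0, 2]])
print(D.subs({x: 1, y: 0}))   # 12 > 0, and f_xx > 0 there -> local minimum
print(D.subs({x: -1, y: 0}))  # -12 < 0 -> saddle point
```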
0:00 - 0:06 sorry what? Don't think I've ever been confused so quickly in a tutorial.
How do I know which Khan Academy course this video is from?
You can find this video on the Khan Academy website by using the search bar at the top of the screen and typing in "hessian."
Here is the link:
www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/quadratic-approximations/v/the-hessian-matrix
Thank you for the amazing explanation
Sal, if I have 1 equation and 6 independent variables, my first partial derivatives form a vector with 6 terms. If I follow, the Hessian will be a 6x6 matrix. Is that correct? Thanks!!! I contribute to you because your program and platform make an amazing contribution!
I know it's not 3blue1Brown answering but you're right.
@@jyly261 thanks for confirming
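A quick sympy sanity check of those shapes (the 6-variable function below is made up for illustration):

```python
import sympy as sp

v = sp.symbols('x1:7')  # the six independent variables x1 ... x6
f = sum(c * s**2 for c, s in zip(range(1, 7), v)) + v[0] * v[5]  # one scalar equation

grad = [sp.diff(f, s) for s in v]  # first partials: a vector with 6 terms
H = sp.hessian(f, v)

print(len(grad))  # 6
print(H.shape)    # (6, 6), and symmetric since the mixed partials are continuous
```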
Thank you sir, this video has given me a good idea
I think in place of the Hessian you have actually written the Hessian transpose!
Thank you!
Is there a vector form of the multivariable Taylor series?
I'm pretty sure that's the Taylor expansion using the Jacobian for the derivatives.
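For reference, the standard vector form for a scalar-valued f packages the first-order term with the gradient and the second-order term with the Hessian (for a vector-valued f, the first-order term uses the Jacobian, as noted above):

```latex
f(\mathbf{x} + \mathbf{h}) \approx f(\mathbf{x})
  + \nabla f(\mathbf{x})^{\mathsf{T}} \mathbf{h}
  + \tfrac{1}{2}\, \mathbf{h}^{\mathsf{T}} H_f(\mathbf{x})\, \mathbf{h}
```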
Thank you
hey, you are 3blue1brown?
Wow, amazing. Thank you!
Is this 3blue1brown as the lecturer?
is he 3 blue 1 brown ???
Is this the same guy as 3 blue 1 brown?
Good day, I was wondering whether you know any Python library that has implemented second-order gradient descent with the Hessian error matrix. If you can point me in the right direction, it would be very helpful. Thanks in advance, kind regards
Shantanu
Jax
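To expand on the "Jax" answer: as far as I know, core JAX doesn't ship a ready-made Newton optimizer, but jax.grad and jax.hessian make it a few lines to roll your own (the least-squares loss below is a made-up toy). SciPy's optimize.minimize can also take an explicit Hessian via methods such as 'Newton-CG' or 'trust-ncg'.

```python
import jax
import jax.numpy as jnp

def loss(w):
    # toy least-squares problem; swap in your own model's loss
    X = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    y = jnp.array([1.0, 2.0, 3.0])
    return jnp.sum((X @ w - y) ** 2)

grad_fn = jax.grad(loss)
hess_fn = jax.hessian(loss)

w = jnp.zeros(2)
for _ in range(5):
    w = w - jnp.linalg.solve(hess_fn(w), grad_fn(w))  # Newton step: H^-1 grad
print(w)
```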
When are Fxy and Fyx not equal?
When F is not C2
@@matakos22 Sorry, I don't know what C2 means ... do you have an example please?
@@joluju2375 Continuously differentiable functions are sometimes said to be of class C1. A function is of class C2 if the first and second derivatives of the function both exist and are continuous. More generally, a function is said to be of class Ck if the first k derivatives f′(x), f′′(x), ..., f^(k)(x) all exist and are continuous. If the derivatives f^(n) exist for all positive integers n, the function is smooth, or equivalently, of class C∞.
@@matakos22 Thanks. So, for Fxy and Fyx not to be equal, they have to exist. Then, if F is not C2 and Fxy and Fyx exist, it means that Fxy or Fyx is not continuous. Right ?
@@joluju2375 Yes, or they could also be undefined
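For a concrete instance, the textbook counterexample is f(x, y) = xy(x² − y²)/(x² + y²) with f(0, 0) = 0: both mixed partials exist at the origin but are discontinuous there, so Schwarz's theorem doesn't apply and they disagree (−1 vs +1). A rough numerical sketch (the step sizes are ad hoc):

```python
# Textbook counterexample: mixed partials exist at the origin but differ,
# because the second partials are discontinuous there.
def f(x, y):
    return x * y * (x**2 - y**2) / (x**2 + y**2) if (x, y) != (0.0, 0.0) else 0.0

def fx(y, h=1e-8):
    # central-difference estimate of f_x at the point (0, y)
    return (f(h, y) - f(-h, y)) / (2 * h)

def fy(x, h=1e-8):
    # central-difference estimate of f_y at the point (x, 0)
    return (f(x, h) - f(x, -h)) / (2 * h)

k = 1e-3  # outer step, deliberately much larger than the inner step h
fxy = (fx(k) - fx(-k)) / (2 * k)  # d/dy of f_x at the origin -> about -1
fyx = (fy(k) - fy(-k)) / (2 * k)  # d/dx of f_y at the origin -> about +1
print(fxy, fyx)
```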
The video is on Khan Academy, but the voice is from 3b1b, heh? 🧐🧐
And the rest of the video?
When can fxy != fyx?
in honor of Otto Hesse
you're related?
Aren't you the guy of 3 Blue 1 Brown?
interesting
I don't get it
fitz?
Genius.
Dude I’m in 8th grade doing calc 1 and I already understand this.
What about functions which have three variables 😩
He explains it at 4:24
@@BayesianBrain thank you
@@BayesianBrain what about a vector valued function?
Grant???
I thought I clicked on a 3b1b video
Came for Neo and Morpheus, left disappointed.
Go check 3b1b, he sounds just like you, he is a cool guy
It's the same guy 😂
Joel McAllister I know :))
Jay Shree Ram
👌👌👌👌
If you differentiate x first, then y, shouldn't it be "dxdy"? Why do you keep putting it backwards?
No, the way he does it is notationally correct. Of course, you are free to write things the way you like, but he is following the convention.
Well you should think of it this way: d/dx (df/dy), so you take df/dy and then differentiate it with respect to x, so the video is correct. In other words we start with the partial derivative with respect to y and then differentiate it with respect to x.
because we move right to left in Leibniz notation
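In symbols, the convention described above is that the operators apply right to left:

```latex
\frac{\partial^{2} f}{\partial x \, \partial y}
  = \frac{\partial}{\partial x}\!\left( \frac{\partial f}{\partial y} \right)
```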
First
You don't explain the mixed derivative thing clearly, disliked.
because there are other videos dedicated to this.
Aye 3B1B