I never understood what gradient descent and a cost function are until I watched this video 🙏🙏
Best explanation of the cost function; we learned it as master's students and the course couldn't explain it as well.. simply brilliant
I have seen many teachers explaining the same concept, but your explanations are next level. Best teacher.
For those who are confused: the convergence derivative will be dJ/dm.
What's J in this? The Y values? I'm super confused about this d/dm of m, because it would just be 1. And I think m is just the total number of values. Shouldn't the slope be d/dx of y?
@@tusharikajoshi8410 It will be the cost, or loss (J).
m_new = m - dJ/dm * alpha, where J is the loss (cost) and alpha is the learning rate.
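In standard notation (assuming the 1/(2n) squared-error cost J used in the video and learning rate alpha), the update above reads:

```latex
m_{\text{new}} = m - \alpha \,\frac{dJ}{dm},
\qquad
J(m) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```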
Super helpful
I don't think so, because it's actually Newton's method.
Why am I not surprised by such a lucid and amazing explanation of the cost function, gradient descent, global minima, learning rate... maybe because watching you make complex things seem easy and normal has become one of my habits. Thank you SIR
I knew there would be an Indian who could make all this stuff easy!! Thanks Krish
A small comment at 17:35. I guess it is the derivative of J(m) with respect to m, in other words, the rate of change of J(m) over a minute change of m. That gives us the slope at instantaneous points, especially for non-linear curves where the slope is not constant. At each point (m, J(m)), gradient descent travels in the opposite direction of the slope to find the global minimum, with the smaller learning rate. Please correct me if I am missing something.
Thanks for a wonderful video on this concept @Krish; your videos are very helpful for understanding the math intuition behind the concepts. I am a super beneficiary of your videos. Huge respect!!
This math is the same as in the Coursera machine learning course.
Thank you sir for this great content ..
Really awesome video, so much better than many famous online portals charging huge amounts of money to teach these things.
It's hard to find an easy explanation of gradient descent on YouTube. This video is the exception.
The video was really great. But I would like to point out that for the derivative you took in the convergence theorem, instead of (dm/dm) it should be the derivative of the cost function with respect to m. Also, a little suggestion: at the end it would have been helpful if you had mentioned what m was, the total number of points or the slope of the best-fit line. Apart from this, the video helped me a lot; I hope you add a text note somewhere in this video to help others.
You just made the whole concept clear with this video; you are a great teacher.
Best video on YouTube for understanding the intuition and math (surface level) behind linear regression.
Thank you for such great content
Hi Krish, thanks for the video. Some queries/clarifications required:
1. We do not take the gradient of m w.r.t. m; that will always be 1. We take the gradient of J w.r.t. m.
2. If we have already calculated the cost function J at multiple values of m, then why do we need gradient descent, since we already know the m where J is minimum?
3. So we start with an m, calculate grad(J) at that point, update m with m' = m - grad(J) * learn_rate, and repeat till we reach some convergence criterion.
Please let me know if my understanding is correct.
Yes this is correct
I think in real-life problems we don't evaluate the cost at many values of m up front; we have to train the model to reach that minimum-loss point while performing gradient descent.
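For anyone who wants to see point 3 above concretely, here is a minimal sketch in Python (toy data and a made-up learning rate; the intercept is ignored for simplicity, and the variable names are illustrative):

```python
import numpy as np

# Toy data (illustrative values only), roughly y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m = 0.0        # initial guess for the slope
alpha = 0.01   # learning rate (made-up value)
n = len(x)

for _ in range(1000):
    y_hat = m * x                                  # predictions for the current m
    grad_J = -(1.0 / n) * np.sum((y - y_hat) * x)  # dJ/dm for J = (1/2n) * sum((y - y_hat)**2)
    m = m - alpha * grad_J                         # the update from point 3
    if abs(grad_J) < 1e-6:                         # a simple convergence criterion
        break

print(m)  # converges to the least-squares slope (about 2.0 for this data)
```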
How do we find the best Y-intercept?
Best explanation of Linear Regression 🙏🙏🙏. Simply wow 🔥🔥
This is the best stuff i ever came across on this topic !
I don't see a link on the top right corner for the implementation as you said in the end.
Every line you speak is so important for understanding this concept... thank you.
Watched this video 3 times back to back. Now it's embedded in my mind forever. Thanks Krish, great explanation!!
I knew the concept of linear regression but didn't know the logic behind it, i.e., the way the line of regression is chosen. Thanks for this!
Such a great explanation of gradient descent and convergence theorem.
Before watching this video I was struggling with the concepts exactly like you were struggling in plotting the gradient descent curve. ☺️Thanks for explaining this beautifully.
How can I not say that you are amazing!! I was struggling to understand the importance of gradient descent and you made it clear to me in the simplest way possible.. Thank you so much sir :)
Really, thank you Krish.
You just cleared my doubts on the cost function and gradient descent. I first watched Andrew Ng's class but still had a few doubts; after seeing your video it's crystal clear.
Thank You...
Similar to Andrew Ng's Coursera course; it was kind of a revision for me 😊😊
Can you please suggest how to begin in order to learn machine learning?
@@ArpitDhamija Do you have knowledge of machine learning? If so, please advise me; I saw so many resources but wasn't able to settle on one.
@@Gayathri-jo4ho This playlist itself is a fantastic place to start, or you can enroll in the course "Machine Learning A-Z" by Kirill Eremenko on Udemy. The course will give you an intuitive understanding of the ML algorithms. Then it's up to you to research and study the math behind each concept. References: KDnuggets, Medium, MachineLearningPlus, and lots more.
@@shhivram929 thank you
Exactly. This is the equivalent of Andrew Ng's description
The only video that made gradient descent so simple that even 2nd-grade students would understand it.
The best I've come across on gradient descent and convergence theorem
Now I understand what GD means. Thanks always, Krish
Finally, I understood gradient descent perfectly...
So beautifully explained... I did not find this kind of clarity anywhere else... keep up the good work...
I feel so sad for him... because only aspiring data scientists are going to watch this video, so he will have fewer subscribers, not even comparable with what he is giving... Really, hats off to you sir. I have taken 2 paid online classes, but I don't think they are better than you. Never.
Great! Fantastic! Fantabulous! Tasting the satisfaction of learning completely - only in your videos!!!!!
I had so much difficulty understanding gradient descent, but after this video it's perfectly clear.
Bro, how do we update the slope?
Value of the video is just undefinable! Thanks a lot :)
God bless you too sir, explained very well. The basics help grow high-level understanding.
Please add the in-depth math intuition of other algorithms like logistic regression, random forest, support vector machines, and ANNs.. Many thanks for the clear explanation of linear regression.
Thank you so much Krish. Nowhere else could I find such a detailed explanation.
You made my Day!
Hi. Can you please do a video about the architecture of machine learning systems in the real world? How does it really work in real life? For example, how Hadoop (Pig, Hive), Spark, Flask, Cassandra, and Tableau are all integrated to create a machine learning architecture. Like an end-to-end view.
Thank you my friend, you are a great teacher!
Really great sir. I very much thank you sir for this clear explanation
We would also recommend your videos to our students!
I think in the convergence theorem part, the derivative should be d(J(m))/d(m): as in a y-x graph we take the derivative of y w.r.t. x, and here our Y is J(m) and X is m.
Yeah, I think the same thing.
This guy was born to teach
Your explanations are the clearest!!!
After watching this 3 times everything is clear
Repetition is the key
Never found a better explanation.
Thank you so much for all your efforts... knowledge, pace of speech, and the ability to make things easy are the nicest skills that you hold...
Awesome!! Cleared all my doubts after seeing this video! Thanks a lot Mr. Krish for creating in-depth content on this subject!
Best video on the theory of linear regression! Thank you so much Krish!
I wish I could like this thousand times.
Thank you for sharing this insightful video about linear regression. While I found it informative, I'm uncertain about how it addresses the challenge of avoiding local minima. I'd greatly appreciate it if you could provide some insights on this aspect as well.
Thank you for this awesome explanation!
your videos are clear and easy to understand
The graph of the cost function is not gradient descent. Gradient descent is differentiating the cost function with respect to m and using that derivative to update m.
Thank You Sir, You have explained everything about gradient Descent in the best possible easiest way !!
Please try to upload videos in this series every 2 days...
I couldn't understand it when Andrew Ng was teaching, but you, bro!!!
Implementation part:
Multiple linear Regression - ua-cam.com/video/5rvnlZWzox8/v-deo.html
Simple linear Regression - ua-cam.com/video/E-xp-SjfOSY/v-deo.html
Yaar, you nailed it man. After watching so many videos I had some idea; after finishing your video I'm now completely clear 😍😍😍😍
Right
Oh my gosh, this is the most awesome tutorial I've ever seen. God bless you sir 🤩🤩
Thank you so much, Krish!
At 22:50 sir said that when it reaches the global minimum the slope value will be 0, and that value of m will be considered for the best-fit line. But isn't the value of the slope the same as m? Please clear this doubt @Krish Naik sir.
When you are writing the convergence theorem, it should be m - d(J(m))/dm * alpha.
Great tutorial sir, got things pretty quickly with this video, thank you.
This is the equivalent of Andrew Ng's description. But I never understood this concept until watching this video.
Nice Explanation, I like this.
Good explanation, now all my queries are clear.
Thank you Krish bhaiya!
Sir, no words to explain it, simply superb.
Great...
Very good and detailed explanation
Great sir. Love this video
Thanks so much sir.. you're doing good for the community
As always Krish very well explained!!
It should be the derivative of J(m) w.r.t. m, which will give the slope of the J-vs-m curve.
Loved it. Thanks Krish.
Hi Krish, that was an awesome explanation of gradient descent with respect to finding the optimal slope.
But in linear regression both the slope and the intercept are tweakable parameters; how do we achieve the optimal intercept value in linear regression?
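For what it's worth, the usual answer is that gradient descent treats the intercept just like the slope: the cost J is a function of both m and c, and each step updates both parameters simultaneously, each with its own partial derivative. A sketch, assuming J(m, c) = (1/2n) Σ (yᵢ - ŷᵢ)² with ŷᵢ = m·xᵢ + c:

```latex
m := m - \alpha \frac{\partial J}{\partial m},
\qquad
c := c - \alpha \frac{\partial J}{\partial c},
\quad\text{where}\quad
\frac{\partial J}{\partial m} = -\frac{1}{n} \sum_{i} (y_i - \hat{y}_i)\, x_i,
\qquad
\frac{\partial J}{\partial c} = -\frac{1}{n} \sum_{i} (y_i - \hat{y}_i)
```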
Thank you! This video was so good!
Excellent Explanation
Thank you sir... I get to learn so much from you.
Finally... I got to know how it works 👍
Very well explained. Thank you.
Dear Krish: at 14:42 you mention that the curve is called gradient descent. I believe this is not true. Gradient descent is not the name of that curve; gradient descent is an optimization algorithm.
My god, that was clear as crystal... thanks Krish
Sir, please elaborate on this topic more: please add what the assumptions of linear regression are
and what conditions need to be satisfied to apply linear regression.
It's my humble request to you, sir. I am better able to understand the topics that you teach, so sir, it's my request to you.
This video is really helpful.
I generally don't comment. But you are like an angel for students like me who hate math but love programming.
Hi sir, great content and I'm a big fan of your work. Let me ask a doubt about the cost function: many books or blogs take the cost function as (1/n) * SUM((Y - Y^)^2), but you used (1/2n) * SUM((Y - Y^)^2), so I was a bit confused by that part. Thank you for the wonderful content, thank you so much sir.
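In case it helps: the extra 1/2 is only a convenience factor. Scaling a cost function by a positive constant does not move its minimum, and the 1/2 cancels the 2 that comes down when you differentiate the square. A quick check, assuming ŷᵢ = m·xᵢ + c:

```latex
\frac{\partial}{\partial m} \left[ \frac{1}{2n} \sum_{i} (y_i - \hat{y}_i)^2 \right]
= \frac{1}{2n} \sum_{i} 2\,(y_i - \hat{y}_i)\,(-x_i)
= -\frac{1}{n} \sum_{i} (y_i - \hat{y}_i)\, x_i
```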
Best explanation. Thank you!
Thanks for all the great, well-prepared videos. I think you meant d(J(m))/d(m) at 17:45, is that correct?
Thanks Krish, you are helping a lot.
You are the ultimate; I got answers to so many questions. The video is good.
Also note that the learning rate should not be taken very, very small, as it will then take a very long time to reach the global minimum.
great video sir, so lucid
Great video, I understood the concept.
Hats off
Please always link the previous videos to help go through the topics in sequence
Nice tutorial. Thank you
lovely! love it.
In a single sentence: "You're the best."