Really nice and simple explanation. I tried the practice question and the cost function is stuck at 31.something with m = 1.04, b = 0.001, learning_rate = 0.0002 and not reducing, so I guess the number of iterations needed is around 50. Am I correct, or did I do something wrong?
You use a for loop over y_predicted to find the value of cost, but there is only one y_predicted array, so why did you use the loop? You can do it in a simpler way by squaring y_predicted directly, right?
Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
Sir, can you please upload the slides as well?
I've been struggling with my online lectures on machine learning. Your videos are so helpful. I can't thank you enough!
👍👍🙏
3Blue1Brown is a great channel so is your explanation. Kudos to you!
Also, it is quite appreciable how you positively promote and credit others' good work. That kind of genuineness is much needed.
How to learn coding for beginners | Learn coding for free: ua-cam.com/video/CptrlyD0LJ8/v-deo.html
For people who want to know what's behind the scenes:
The reason the partial derivative of the MSE with respect to m is -(2/n) Σ x_i (y_i - (m x_i + b)) is the chain rule from calculus.
When we differentiate with respect to m, the m itself disappears (m^1 differentiates to 1), leaving only x_i; the chain rule lets us dissect the function.
Suppose we have a function F(m) = (am + b)^2. We handle the outer square first, giving 2(am + b), then multiply by the inner derivative d/dm (am + b) = a, so F'(m) = 2a(am + b). Apply the same chain rule to the MSE above and you get -(2/n) Σ x_i (y_i - (m x_i + b)).
Please don't accept it as-is; otherwise you never learn completely why things work, and you won't come up with your own solutions. The easy way never gets you where you want to go.
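Written out in full (my own restatement of the comment's argument), the chain-rule computation for the MSE is:

```latex
% MSE and its partial derivatives via the chain rule
J(m,b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial J}{\partial m}
  = \frac{1}{n}\sum_{i=1}^{n} 2\bigl(y_i - (m x_i + b)\bigr)\cdot(-x_i)
  = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr)

\frac{\partial J}{\partial b}
  = \frac{1}{n}\sum_{i=1}^{n} 2\bigl(y_i - (m x_i + b)\bigr)\cdot(-1)
  = -\frac{2}{n}\sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)
```

The minus sign comes from the inner derivative of (y_i - (m x_i + b)), which is -x_i with respect to m and -1 with respect to b.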
No one really understands why we have -(2/n) instead of 2/n. If you do the calculation, even with the chain rule, you will get 2/n; you will never get negative values!
@@datalearningsihan I think it is indeed to prevent any negative values from occurring.
I have gone through so many materials and couldn't understand a thing on these, but this video is amazing .Thanks for putting all you videos.
Glad it was helpful!
Stochastic vs Batch vs Mini gradient descent: ua-cam.com/video/IU5fuoYBTAM/v-deo.html
Step by step roadmap to learn data science in 6 months: ua-cam.com/video/H4YcqULY1-Q/v-deo.html
Machine learning tutorials with exercises:
ua-cam.com/video/gmvvaobm7eQ/v-deo.html
It has become so clear that I am gonna teach it to my dog.
👍🙂
Just Do it....
Dont do it!
He may become a threat to humanity!
@@Austin-pw2ud he may become the cleverest but he'll remain being a good boy
@@eresque7766 ouuuuuchh! Tht touched my ♥
Finally, found the best ML tutorials. Coding with mathematics combined and explained very clearly. Thank you!
I followed tonnes of tutorials on gradient descent. Nothing came close to the simplicity of your explanation. Now I have a good grasp of this concept! thanks for this sir!
👍☺️
This is the best tutorial i have ever seen. This is truly from scratch. Thank you so much
I’m so excited to see you uploaded a new video on machine learning. I’ve watched your other 3 a couple of times. They’re really top notch. Thank you. Please keep this series going. You’re a great teacher too.
Thank you so much for the detailed explanation! I have difficulty understanding these theories, and most channels just explain without mentioning the basics. With your explanation, it is now soooo clear! Amazing!!
If there were an award for the best teacher in the world, it would go to this person, Programming Hero, and Brackeys.
🙏 thanks for your kind words ayuro
@@codebasics Please can you make a playlist on tutorial of opencv python?
You explained in the simplest way this complex concept. Best teacher in the world 🎉🎉
Glad you liked it ! 😊
This was an excellent explanation! Not too technical and explained in simple terms without losing its key elements. I used this to supplement Andrew Ng's Machine learning course on Coursera (which has gotten technical real quick) and it's been really helpful thanks
Glad you found it useful Chia Jeng.
Sir u r the best teacher I ever got for Machine Learning.
Glad it was helpful!
Who are the people disliking these videos? These people work hard and make these videos for us. Please, if you don't like it, don't watch it, but don't dislike it; it is misleading to the people who come to watch these videos. I know many of us have studied some of these concepts before, but he is making videos for everyone, not for a select few. I feel that this channel's videos are amazing and don't deserve any dislikes.
Thanks Ayush. I am moved by your comment and kind words. I do put a lot of effort into making these videos. Dislikes are fine, but if these people gave a reason why they disliked, it would help me a lot in terms of feedback and future improvements 😊
This video is just enough to describe the excellence of your explanation. Simply mind blowing.
can you help me how to plot all values of m and b on chart?
I think this is the best gradient descent tutorial, even better than Andrew Ng sir's.
I got stuck with Andrew sir's tutorial and later came here.
Finally got it... Thanks a lot, bro 🙏🙏
This is the best ML course I've ever come across!
Waooo, for a long time I've struggled to really understand the gradient descent algorithm. I feel like a pro
Thanks for these wonderful tutorials. Literally the best channel I have found on YouTube for Data Science. I have followed your guidance since the beginning and it has helped me very much.
Exercise output: Learning Rate = 0.0002, m = 1.0177381667350405, b = 1.9150826165722297, Cost = 31.604511334602297, Previous Cost = 31.604511334602297, Iteration = 415533
PS: Try it in PyCharm; Jupyter gets stuck after 91915 iterations.
how did you come to this learning rate to be your input ?
how did you tweek the number of iterations ?
Thanks for sharing your result; my most recent tweak only reduced the cost to 31.60451133489039. With the built-in LinearRegression (the sklearn module approach), the cost ends up as 31.60451133352958, so use that as the benchmark, I suppose.
This tutorial made me finally understand gradient descent and the cost function... I don't know how you did it, but you did... thanks man. I really appreciate it.
You're very welcome Prince :) I am glad your concepts are clear now.
codebasics no problem keep it up you’re a great teacher
Indians as always, so smart and brilliant people
Thank you for the video it helped me a lot
You are the best teacher for data science... thanks
Glad you enjoyed it
This was a difficult topic for me; then I spent the time to watch your video, thank you for making my learning easier! Very nice explanation.
👍😊
so calmly and nicely u have explained a tough topic to beginners
It's the most helpful video I have seen till now on gradient descent . Great work . Looking forward for more videos on machine learning .
Is this the same as:
reg = linear_model.LinearRegression()
reg.fit()
that was shown in your previous two videos?
Thank you, I think I found the right channel for machine learning
Great. Happy learning.
At 4:00, why are we squaring the errors and not raising them to the 4th power? That would also make sure the expression is always positive.
Omg!!! This is my first time seeing people to calculate how gradients decent works!!!!
thank you for such easy explanation. was reading about gradient descent many times but this is the first time I understood the math behind that.
Sir, your explanation is the best I have found for ML!!
Rectification: in the Jupyter notebook code given on GitHub there is a small error in the plt.scatter() function; we should use linewidths=5.
If we pass linewidth='5' it generates a type error. Do check it!!
I'm confused. This is something very new to me despite having studied calculus in my undergrad years. I did not get it fully, but the code worked on my end. Perhaps, as I get into different models, I'll slowly understand this. Thanks for sharing all these.
Yup clarie. The tip here is to go slowly without getting overwhelmed. Don't give up and slowly you will start understanding it 😊👍
I don't know why you are so underrated. Only 73K SUBSCRIBERS. You deserve way more than that, I mean the way you clear the concepts. You're simply awesome man.
I am happy this was helpful to you
now he got 281K and in future I expect it to be more :D
Waoh ,waoh. Codebasics to the world.You are such a great teacher sir.Thanks for sharing these series.........
Best video on the topic I’ve seen so far!
Thanks
Hey: you are absolutely excellent. I have seen many guys offering machine learning tutorials. None is as simple, as clear and as educative as you are. Best regards, Sukumar Roy Chowdhury - ex Kolkata, Portland, OR, USA
Sukumar, I am glad this video helped 👍🙏
The best tutorial on Gradient Descent !
When you calculate partial derivatives, don't assume x or y is zero; treat them as constants instead.
For example, for
f(x,y) = x*y
your partial derivatives would come out as 0 under that assumption,
but they should be y (with respect to x) and x (with respect to y).
No, why would the partial derivative be zero? We have to treat the other variable as a constant: ∂f(x,y)/∂x = x·(∂y/∂x) + y = y, since ∂y/∂x = 0 when y is held constant, and likewise ∂f(x,y)/∂y = x + y·(∂x/∂y) = x.
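For what it's worth, the claim above can be checked numerically with a central finite difference (a quick sketch of my own, not from the original thread; the helper names are hypothetical):

```python
# Numerically check that for f(x, y) = x*y, the partial derivative
# with respect to x equals y and with respect to y equals x.
def f(x, y):
    return x * y

def partial_x(f, x, y, h=1e-6):
    # central difference in x, holding y fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # central difference in y, holding x fixed
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 3.0, 5.0))  # ≈ 5.0, i.e. the value of y
print(partial_y(f, 3.0, 5.0))  # ≈ 3.0, i.e. the value of x
```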
Great tutorial, explained in very easy language in very less time.
Glad you liked it ram.
Thank you!
Now it's crystal clear what the gradient function, the cost function (MSE), the slope, and the intercept are, and what the scikit-learn library uses to implement such machine learning content.
Hi Sir,
learning_rate = 0.001
m 3.893281222433439e+95, b 5.493754515745566e+93, cost 1.000639601934861e+193, iteration 102m
This is my result.
You are really a great teacher.
Dear Sir,
Thank you for replying. Actually, the day after I posted my comment I realized that I had not reached the minimum value; "cost 1.000639601934861e+193" is written in scientific notation, so it looks very small, but actually it's not. I then changed my learning rate and number of iterations, and below is my result:
m 1.0201403245621825, b 1.7448479267089132, cost 31.606177703903207, iteration 99999
Best video on gradient descent and the cost function. Understood the math pretty well. Excellent. Love from Pakistan.
I wanted to thank you before even finishing the video, just to tell you that you made my day by sharing this lesson.
sara i am glad you liked it and thanks for leaving a comment :)
The exercise that you shared takes a large number of iterations to reach the correct intercept and coefficient... my laptop hung many times doing that problem 😵💫😵💫😵💫😵💫
code (try this):

import pandas as pd
import math

data = pd.read_csv("D:\\Machine_learning\\Grad_des\\test_scores.csv")
x = data['math'].to_numpy()
y = data['cs'].to_numpy()

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.001
    prev = 0
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        md = -(2/n) * sum(x * (y - y_predicted))
        bd = -(2/n) * sum(y - y_predicted)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        if math.isclose(cost, prev, rel_tol=1e-09):
            break
        print("m {}, b {}, cost {}, iteration {}".format(m_curr, b_curr, cost, i))
        prev = cost

gradient_descent(x, y)
Please help me, I've a doubt.
While calculating the slope of the cost function, if we don't know the cost function beforehand, how can we calculate its slope? I mean, if I know my cost function looks like a sigmoid (for example), then I can use the sigmoid's derivative to find the slope of the cost function.
But if I don't know what my cost function looks like, how can I decide which derivative formula to use to calculate the slope?
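A possible sketch of an answer (my own illustration, not from the thread): in this tutorial the cost function is chosen by us (MSE), so its derivative is known in closed form. But even a cost function you can only evaluate can have its slope estimated numerically:

```python
# Estimate the slope of a "black-box" cost function with a central
# finite difference. cost() here is a hypothetical stand-in; in the
# video it would be the MSE as a function of m.
def cost(m):
    return (m - 2)**2 + 1

def slope(f, m, h=1e-6):
    # central difference: (f(m+h) - f(m-h)) / (2h)
    return (f(m + h) - f(m - h)) / (2 * h)

print(slope(cost, 5.0))  # ≈ 6.0, matching d/dm = 2*(m - 2) at m = 5
```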
Finally, I learned Gradient Descent; thank you so much 🙏
GOOD!!!
this thing in solution --> ''' Good students always try to solve exercise on their own first and then look at the ready made solution
I know you are an awesome student !! :)
Hence you will look into this code only after you have done your due diligence.
If you are not an awesome student who is full of laziness then only you will come here
without writing single line of code on your own. In that case anyways you are going to
face my anger with fire and fury !!!
'''it really works :P
thanks
Ha ha, nice. I knew it would work for someone at least 😊
sumit you are awesome
Edited*
I struggled a bit with the math.isclose() function.
m = 1.017738166...
b = 1.9152193111... with a learning rate of 0.0002.
I had a 0.00018 learning rate in the beginning, but I found out that I had a small typo.
Thank you for your time and the precious knowledge that you share with us free of charge!
Thanks to you I finally understood what the gradient descent is
Best ever video on Gradient Descent.
This is so far the best I have come across. My question is: how are 2 and 3 the expected values for m and b? Are they some constant values?
Thanks
Taking the equation y = 2x + 3, where 2 is the slope and 3 is the intercept: by calculating values of x and y satisfying the equation at 1:21, he came up with the given arrays of x and y. The arrays are points which lie on the straight line y = 2x + 3.
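The arrays from the video can be checked directly (a trivial sketch):

```python
# Each y value equals 2*x + 3, so all five points lie on the line y = 2x + 3.
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
assert all(yi == 2 * xi + 3 for xi, yi in zip(x, y))
print("all points lie on y = 2x + 3")
```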
Guys, I have a doubt: when the cost is low we get higher accuracy, but when we ran 10 iterations the cost was near 0, and when we ran 1000 iterations the cost came out around 1, yet we got good accuracy. Why is that?
I think this is the best explanation of Gradient Descent on the Internet!
I am happy this was helpful to you.
You can say that again...It is the best I have seen.Thanks so much sir
One of the best Tutorial for Gradient Descent.
I tried it and it was fun,
I took iterations = 1000000,
learning_rate= 0.0002
from the gradient_descent function m= 1.0192813318173286, b = 1.8057225128259167, in 120760 iterations,
while from sklearn regression model coefficient = 1.01773624 and intercept = 1.9152193111568891.
what should be the range of iterations? because the lower the number the better
Great video and teaching method. You have an art of keeping things simple but still teach advanced concepts. I get a very good and quick overview and understanding from your videos. Thanks a lot.
Question: a formula is shown for MSE at 4:30 in this video, replacing y_predicted with the line formula. xi is the input, so m·xi + b would give yi, not y_predicted, and then (yi − y_predicted) would be zero. I think you need to substitute yi with the line formula and not y_predicted. Correct me if I am wrong.
Great videos btw I just started learning ML and your videos are amazing. Thanks for making these!
Hi, I just wanted to say that it doesn't need the list comprehension (23:18):
cost = (1/n) * sum([val**2 for val in (y - y_predicted)]) is the same as (1/n) * sum((y - y_predicted)**2)
I like your video, Thank you very much!
Blessings!
Yup, I know. I kept it to make the tutorial very easy and simple for those who don't understand vectorized operations through numpy.
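The equivalence mentioned above is easy to verify on a small array (the values here are my own example):

```python
import numpy as np

y = np.array([5.0, 7.0, 9.0])
y_predicted = np.array([4.0, 8.0, 9.0])
n = len(y)

# list-comprehension form from the video
with_list_comp = (1/n) * sum([val**2 for val in (y - y_predicted)])
# vectorized form: square the whole residual array at once
vectorized = (1/n) * np.sum((y - y_predicted)**2)

assert with_list_comp == vectorized  # both compute the same MSE
print(with_list_comp)
```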
great and simple approach to learning gradient descent . Thank you for your effort
Superb!!
Your lectures are very good and make complicated things very easy. May you keep growing in your life.
Such an excellent tutorial, the clearest I have seen on this topic. Kudos. Thank you.
Best explanation ever seen. Love from Bangladesh.
In machine learning, we have inputs and outputs; from these values we derive an 'equation' known as the prediction function.
First we draw the line which is most appropriate and passes nearest to the output points.
Then we calculate the Mean Squared Error (a popular cost function): MSE = (1/n) * Σ_{i=1..n} (yi - y_predicted)^2
y_predicted = m*x + b
Gradient descent is an algorithm that finds the best-fit line for a given training data set.
Code:

import numpy as np

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.08
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        md = -(2/n) * sum(x * (y - y_predicted))  # derivative w.r.t. m
        bd = -(2/n) * sum(y - y_predicted)        # derivative w.r.t. b
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print("m {}, b {}, cost {}, iteration {}".format(m_curr, b_curr, cost, i))

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)

We are supposed to find the learning rate for which the cost decreases continuously.
Sir, while explaining the theory you said that we take different-sized steps to reach the minimum. But while doing the practical, you gave the learning rate a constant value. According to your definition, the step size should be dynamic in nature.
Nope. The learning rate is constant, but the value of the derivative changes dynamically at each step.
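That answer can be sketched with a toy cost function (my own example, not the video's code): with a fixed learning rate, the step learning_rate * gradient still shrinks automatically, because the derivative shrinks near the minimum.

```python
# Toy cost J(m) = (m - 2)**2 with gradient dJ/dm = 2*(m - 2).
# The learning rate is fixed, yet each step gets smaller as m
# approaches the minimum at m = 2.
m = 0.0
learning_rate = 0.1
steps = []
for _ in range(5):
    gradient = 2 * (m - 2)
    step = learning_rate * gradient
    m = m - step
    steps.append(abs(step))
print(steps)  # each step is 0.8 times the previous one
```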
Sir, I have another doubt: we are importing and using LinearRegression from sklearn. Does gradient descent happen inside the linear regression model to give us the result, or should I use the gradient descent model separately?
Hey, thanks for creating all these playlists. These are so good. I think the viewers should at least like and comment in order to show some love and support.
Thank you! What does scikit-learn regression give as m and b if we don't use gradient descent, and is it even worth doing it?
It was a very useful video. After watching many other videos, I understood the concept in the best way after watching your video. Keep making such tutorials which are simple and easy to understand the complex topics. Thankyou.
Hi,
Why are we only subtracting every time in the gradient descent function?
What if the m gradient is negative? Shouldn't we write m_curr = m_curr + learning_rate * md ?
Thanks in advance
If the gradient md is negative, then subtracting learning_rate * md actually increases m_curr, so the single subtraction rule moves m in the right direction in both cases; we don't need a separate addition rule.
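A small sketch (a toy cost of my own, not the video's code) showing that the subtraction rule converges from both sides of the minimum, whatever the sign of the gradient:

```python
# J(m) = (m - 2)**2 has its minimum at m = 2; dJ/dm = 2*(m - 2).
# Starting below the minimum the gradient is negative, so subtracting
# learning_rate * gradient *increases* m; starting above, it decreases m.
learning_rate = 0.1
for m in (0.0, 4.0):
    for _ in range(200):
        gradient = 2 * (m - 2)
        m = m - learning_rate * gradient
    print(m)  # converges toward 2.0 from either side
```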
So calmly and nicely you have explained a tough topic to beginners. Very thankful to you. Waiting for your new videos.
Glad you enjoyed it
Very clear, concise and helpful! Thank you !
Sir, at 25:40 you say that the optimum values for m and b are 2 and 3; how do we know that?
I have a question: in the end, how would we know that the expected values of m and b are 2 and 3 while adjusting the parameters? The teacher here is adjusting the parameters so that the values of m and b come out as 2 and 3, even though we got an even lower cost, around 0.00010, in his first run of the program. Confused. Another question: is the goal here to find the lowest cost for some expected values of m and b?
I watched this video three times, and after the third time it finally became understandable.
Glad that it helped.
You are doing fabulous work. Such easy explanations for tough topics!
I am happy this was helpful to you.
Note to the commenters below who say you don't need the for loop: the instructor uses a for loop with a variable "i" that is not used in the body because he wants to print the intermediate values, so you can watch them converge to the solution. If he computed the solution directly without the loop, you would never see the printed variables changing.
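For reference, a sketch of the kind of loop being described, with the standard MSE derivatives assumed; the print inside the loop is the whole point, since it shows m, b, and cost at every iteration:

```python
# Gradient descent for y = m*x + b with mean squared error.
# "i" only drives the iterations; the print lets you watch convergence.
def gradient_descent(x, y, iterations=1000, learning_rate=0.01):
    m_curr = b_curr = 0.0
    n = len(x)
    for i in range(iterations):
        y_pred = [m_curr * xi + b_curr for xi in x]
        cost = (1 / n) * sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
        md = -(2 / n) * sum(xi * (yi - yp) for xi, yi, yp in zip(x, y, y_pred))
        bd = -(2 / n) * sum(yi - yp for yi, yp in zip(y, y_pred))
        m_curr -= learning_rate * md
        b_curr -= learning_rate * bd
        print(f"m {m_curr:.4f}, b {b_curr:.4f}, cost {cost:.4f}, iteration {i}")
    return m_curr, b_curr

# Assumed sample data from the video; m and b approach 2 and 3.
m, b = gradient_descent([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
```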
I have a doubt.
Can you help me plot all the values of m and b on a chart?
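One simple approach, assuming the data and derivatives from the video: append m and b to lists inside the loop, then plot the lists afterwards.

```python
# Record every intermediate m and b, then plot the histories.
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
n = len(x)

m = b = 0.0
m_history, b_history = [], []
for _ in range(1000):
    y_pred = [m * xi + b for xi in x]
    md = -(2 / n) * sum(xi * (yi - yp) for xi, yi, yp in zip(x, y, y_pred))
    bd = -(2 / n) * sum(yi - yp for yi, yp in zip(y, y_pred))
    m -= 0.01 * md
    b -= 0.01 * bd
    m_history.append(m)
    b_history.append(b)

try:  # plotting is optional; skipped gracefully if matplotlib is absent
    import matplotlib.pyplot as plt
    plt.plot(m_history, label="m")
    plt.plot(b_history, label="b")
    plt.xlabel("iteration")
    plt.legend()
    plt.savefig("convergence.png")   # or plt.show() in a notebook
except ImportError:
    pass
```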
I have been searching for a practical understanding of gradient descent, and I hit the jackpot. Thanks a lot. Question: if we can find the prediction function through manual trial and error, what's the purpose of using these predefined ML models?
Pritam, it's a good question. With random trial and error it could take so many permutations and combinations, and so much computing power, that you would still not reach the answer. It is like trying to guess someone's 12-character password: can you do that effectively? Nope. But suppose that every time you guess, I give you a hint about how right or wrong you are, for example by telling you that your first 2 characters are correct. Now your trial-and-error strategy is optimized and you can reach the answer efficiently. GD is exactly that: at every step of your trial and error, it tells you how wrong you are and gives you a direction, a hint, for your next trial.
While doing the gradient descent exercise, I get an overflow error. Is this because the data frame is not uniform, or is there something I can do about it?
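A likely cause (assumption, since the exact exercise data isn't shown here): when the x values are large, the gradients are large too, so with the same learning rate the updates overshoot and m blows up until Python overflows. Two common fixes are lowering the learning rate or rescaling the features. A sketch with hypothetical large-scale data:

```python
# With large x, a learning rate that worked for small x makes m explode.
# Rescaling x (or shrinking the rate) restores convergence.
def run(x, y, rate, iters):
    n = len(x)
    m = b = 0.0
    for _ in range(iters):
        md = -(2 / n) * sum(xi * (yi - (m * xi + b)) for xi, yi in zip(x, y))
        bd = -(2 / n) * sum(yi - (m * xi + b) for xi, yi in zip(x, y))
        m -= rate * md
        b -= rate * bd
    return m, b

x = [1000, 2000, 3000]                   # hypothetical large-scale feature
y = [5, 7, 9]

m_bad, _ = run(x, y, rate=0.01, iters=10)     # diverges: |m| becomes astronomical
x_scaled = [xi / max(x) for xi in x]          # rescale x into [0, 1]
m_ok, b_ok = run(x_scaled, y, rate=0.5, iters=5000)
print(abs(m_bad), m_ok, b_ok)
```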
How did you get the graph of gradient descent in the Jupyter notebook?
I think gradient_descent(x, y) in your Jupyter code is not a built-in function.
Sir, my question is: I have gone through some of this ML series and I feel like I can understand the concepts, but the problem is that I am new to coding. I come from another field and need to learn coding. Can you suggest what I should learn first for ML?
I spent 2 days of my time trying to learn this elsewhere... love it... blast!
A little hard for me! I can't do the exercise myself. But I'm 100% sure no one in the world teaches this more simply. Keep doing it, love you a lot!
Awesome explanation, please keep it up. I also appreciate how you credit others for their work; that's very rare.
Really nice and simple explanation. I tried the practice question and the cost function is stuck at 31.something with m = 1.04, b = 0.001, learning_rate = 0.0002, and it's not reducing, so I guess the number of iterations needed is around 50. Am I correct, or did I do something wrong?
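Without the exercise data it's hard to say exactly, but a cost stuck at a plateau usually means the learning rate is too small or far more iterations are needed, often thousands rather than 50. Rather than guessing the iteration count, a common trick is to stop when the cost stops improving, e.g. with math.isclose. A sketch on assumed sample data:

```python
# Stop when successive costs are effectively equal instead of fixing
# the iteration count in advance. Data here is assumed (y = 2x + 3).
import math

x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
n = len(x)

m = b = 0.0
prev_cost = float("inf")
iterations_used = 0
for i in range(1_000_000):
    y_pred = [m * xi + b for xi in x]
    cost = (1 / n) * sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
    if math.isclose(cost, prev_cost, abs_tol=1e-12):
        iterations_used = i          # cost has flattened out; we are done
        break
    prev_cost = cost
    md = -(2 / n) * sum(xi * (yi - yp) for xi, yi, yp in zip(x, y, y_pred))
    bd = -(2 / n) * sum(yi - yp for yi, yp in zip(y, y_pred))
    m -= 0.01 * md
    b -= 0.01 * bd

print(iterations_used, round(m, 3), round(b, 3))
```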
Insightful!
Deep understanding of ML is necessary. You explained it very well
Watching this at 2x, like if you are too 😂
I am not a native English speaker; normal speed is fine for me 😂
Clearly broken down concepts, very very good video, thank you for this amazing guide!
Glad it was helpful!
Your explanation is great. It's really helpful.
Very effective teaching. Thanks for the videos.
Sir, does gradient descent work only for simple linear regression, or can I apply it to multiple linear regression also?
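It extends to multiple features: treat the slopes as a vector w and the per-feature partial derivatives fall out of the same MSE formula, one per column of X. A sketch with hypothetical two-feature data generated from y = 2*x1 + 3*x2:

```python
# Gradient descent for multiple linear regression, vectorized with numpy.
# Data is hypothetical: y = 2*x1 + 3*x2 exactly, with no intercept.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0],
              [4.0, 2.0]])
y = np.array([8.0, 7.0, 15.0, 14.0])

n = len(y)
w = np.zeros(2)          # one weight per feature
b = 0.0
rate = 0.05
for _ in range(20000):
    error = y - (X @ w + b)
    w_grad = -(2 / n) * X.T @ error      # one partial derivative per feature
    b_grad = -(2 / n) * error.sum()
    w -= rate * w_grad
    b -= rate * b_grad

print(np.round(w, 3), round(b, 3))  # w approaches [2, 3], b approaches 0
```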
You use a for loop over y_predicted to compute the cost, but isn't there only one value of y_predicted? Why the for loop? Couldn't you do it the simple way by just squaring y_predicted?
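A note on this: y_predicted actually holds one prediction per sample, not a single value, which is why the loop (or comprehension) is there. With numpy the per-sample loop does collapse into vector operations, a sketch with assumed data and arbitrary current guesses for m and b:

```python
# y_predicted is an array of n predictions, one per sample.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
m, b = 1.0, 0.5                      # arbitrary current guesses

y_predicted = m * x + b              # array of 5 predictions, not one number
cost = np.mean((y - y_predicted) ** 2)
print(y_predicted.size, cost)        # -> 5 32.25
```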
Sharp, to the point, succinct. Great stuff!
It's easy for me to learn ML from your videos.
Sir, I followed your tutorial but I'm getting a runtime overflow warning in Python. How do I fix that?
How do you plot those iterations / visual representations at 26:42?
Thanks for the machine learning playlist.