Corrections:
9:03. The values for the intercept and slope should be the most recent estimates, 0.86 and 0.68, instead of the original random values, 0 and 1.
9:33. The slope should be 0.7.
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Nice to see this right when I was getting confused XD. Great job!
At 9:40, in the top right corner, "...and the new line..." shows slope 0.07, which is a typo too. It should be 0.7!
@@louislesage3856 You are correct. Dang, I hate typos. ;)
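For anyone coding along, here is a minimal Python sketch of the single SGD step the correction is about, plugging in the most recent estimates (0.86 and 0.68) rather than the initial guesses; the data point and learning rate below are made-up placeholders, not values from the video.

```python
# One stochastic gradient descent step for the line y = intercept + slope * x,
# using the squared residual of a single randomly picked point (x_i, y_i).
intercept, slope = 0.86, 0.68   # most recent estimates (per the correction), not 0 and 1
learning_rate = 0.01            # placeholder value
x_i, y_i = 3.0, 2.5             # hypothetical sample, not from the video

residual = y_i - (intercept + slope * x_i)

# Derivatives of (y_i - (intercept + slope * x_i))**2 with respect to each parameter
d_intercept = -2 * residual
d_slope = -2 * residual * x_i

# Step sizes = learning rate * derivatives; move opposite to the gradient
intercept = intercept - learning_rate * d_intercept
slope = slope - learning_rate * d_slope

print(round(intercept, 2), round(slope, 2))
```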
I came down to the comments to check if I was right, and thank God I am :)
I have another question regarding the new data sample:
what if this new data sample is an outlier?
The step will make the line fit only this new point, and the old samples will be ignored.
Do we need to add a check for outliers before we apply stochastic gradient descent?
@@adelsalaheldeen You should always check for outliers, no matter what you are doing.
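For what it's worth, here is one simple way to screen for outliers before running SGD: a minimal sketch using the common 1.5 * IQR rule (the data and threshold are just illustrative, not a recommendation for every dataset).

```python
import numpy as np

def inlier_mask(values, k=1.5):
    """True for points inside the Tukey fences (q1 - k*IQR, q3 + k*IQR)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

y = np.array([1.2, 1.9, 2.8, 3.0, 3.1, 15.0])  # 15.0 looks like an outlier
keep = inlier_mask(y)
y_clean = y[keep]   # keeps everything except 15.0; only these points go to SGD
print(y_clean)
```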
I love that you go slow, really slow, and do not assume people have understood everything or skip steps at all. Truly a wonderful explanation for seemingly hard-to-grasp stuff! Keep up your good work!!
Thank you!
Without this channel... Machine learning is incomplete...
MEGAAAA BAMMMMM
Can you tell me about batch gradient descent?
Any video ?????????
Pretty sure you mean... MECHA BAM! I’ll see myself out...
@@dani123456785 ruder.io/optimizing-gradient-descent/index.html#batchgradientdescent
Every time I type in some notion and one of your videos pops up, I know the probability of understanding that notion is 100%, and that the effort function is already minimized, so I quickly converge towards optimal comprehension :D
Hail to the great JS!
This is awesome!!! Can I quote you on my website?
This man should be awarded!!
I feel like ML is simple math put out in a complicated way, and people like Josh pop in and make the math simply understandable!
And ofc his teachings are BAAAAAAAAAAM!
Can't believe I'm taking in wisdom he shared 4 years ago!
Thanks!
This video is so much better than what we have in university. Thank you man, you are a legend.
Thank you!
Our university's professor just screenshotted the whole gradient descent and stochastic gradient descent video and that was our notes for the topic😶🌫😶🌫, should've just pasted the link to Josh's video tbfr😮💨😮💨
Best explanation ever. At first I was sceptical, but the BAMs kinda grow on you after a while :)
Nice! :)
You are the best ML instructor I have so far come across!!!
Wow, thanks!
This is like my second video on your channel and holy moly everything you explain is so clear and just clicks in my head. I truly appreciate this, you are a blessing to the learners
Awesome! Thank you very much! :)
You are great. I'm glad I found you. Whenever I get stuck with the theory of something, you're there to help.
bam! :)
Another great and simple video. It's always a pleasure to see that there is a stat-quest video about a thing I'm looking for. Thank you!
Glad you enjoyed it!
This is gold, appreciate it! I really like how you take things one step at a time. It helps me understand better!! BAM!!!
BAM! :)
Bro, I don't know how you did it. You are gooood!
Your subscribers have increased like crazy since the last time I came here too!
Thank you very much! Hooray! The channel is growing and that is very exciting for me. It's an inspiration to keep making videos. :)
Just leaving a mark here to appreciate all the work you've done, Mr. Josh. Thank you very much!
Thank you! :)
Thank you Mr. Josh. Your videos are really game changers. I love them and your songs even more. I will buy so much of your merchandise when I am employed
BAM!
Super helpful, taught me more than my uni prof, your teaching method is effective and hilarious at the same time.
Thanks! :)
@@statquest Btw, what are the criteria for mini-batch stochastic gradient descent?
Like, if I have a number of data points, should I group the ones whose values are closely related?
E.g.: 10, 30, 69, 38, 59, 16
Then I would group them into groups --> (10, 16) (30, 38) (59, 69) and randomly select an element from each group, then do the math.
Or do I just go with three random data points and do the math?
Thanks!
best channel for machine learning with quality content
Thank you! :)
I am here to study for an exam I'll have soon, and you are saving me lots of time. Plus, it's so much more entertaining than my incomprehensible slides! THANK YOU!
Good luck on your exam! :)
@@statquest Thank you so much!! Please keep up the amazing videos!
How did the exam go?
@@statquest you are the nicest person in this world! It is on Thursday so we will see :oo
@@statquest I passed my exam and got my bachelor's this year!! Thank you so much!
I am super dunked off of vodka and coffee right now and I feel like I just understood every complex maths class I've ever taken before. I understand now! THANK YOU! even my impaired mind can comprehend this at X2 speed.
bam!
I am sooo sad that I did not find this channel sooner, but now I know what I'm going to do for the next weeks or months :) Great job! Really informative videos!
Awesome! Thank you!
good name you got there lol
I got your book last week from Amazon. It's incredible. Thanks for your work
TRIPLE BAM! Thank you so much! :)
Best explanation in the shortest time possible
Thank you!
I'm literally liking this video and commenting after the intro song. Well done!!
BAM! :)
This channel is a GEM
Thank you!
I'm writing my thesis, and you are my hero
Good luck! :)
I'm not sure what I like more: the clear examples or Josh's silly smooth voice on the "double bam"
Dude you’re just freakin good at explaining this stuff
Thank you! :)
Man, when you love something you can achieve great things. This lecture is at the graduate and doctorate level, yet I would say even a high school student can understand it, and that is the trick! Sorry, but not everyone can do it... Joshua, congratulations!
@@mohammedouallal2 Thank you very much! :)
Another solid video. Thanks a million!! I had to go through a bit of problem solving to neatly wrap my functions and compare the execution times in Python. But yeah, I found that on the same data set (small in size) regular gradient descent (batch gradient descent) was faster but less accurate than stochastic gradient descent in calculating the slope & intercept for the line of best fit.
My Example:
- 13 data points
- Solved for Slope & Intercept using both types of gradient descent
- Used the derivative of the sum of squared residuals
Batch Gradient Descent Time = 0.0967 s
Stochastic Gradient Descent Time = 1.2740 s
Linear Regression function from scipy stats = 0.0015 s
Very cool! :)
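In case anyone wants to try a comparison like this themselves, here is a rough sketch of how the two loops could be set up and timed (the data, learning rate, and step counts are placeholders, so your timings and accuracy will vary):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 13)                         # 13 made-up data points
y = 0.7 * x + 1.0 + rng.normal(0, 0.3, 13)

def batch_gd(x, y, lr=0.01, steps=1000):
    intercept, slope = 0.0, 1.0
    for _ in range(steps):
        residuals = y - (intercept + slope * x)   # uses ALL points every step
        intercept -= lr * (-2 * residuals.sum())
        slope -= lr * (-2 * (residuals * x).sum())
    return intercept, slope

def stochastic_gd(x, y, lr=0.01, steps=1000):
    intercept, slope = 0.0, 1.0
    for _ in range(steps):
        i = rng.integers(len(x))                  # ONE random point per step
        r = y[i] - (intercept + slope * x[i])
        intercept -= lr * (-2 * r)
        slope -= lr * (-2 * r * x[i])
    return intercept, slope

for name, fit in [("batch", batch_gd), ("stochastic", stochastic_gd)]:
    start = time.perf_counter()
    intercept, slope = fit(x, y)
    print(f"{name}: intercept={intercept:.2f}, slope={slope:.2f}, "
          f"time={time.perf_counter() - start:.4f} s")
```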
Josh, you make everything easy to understand! Many Thanks!
Thank you!
Dude... you are funny... honestly... I didn't think I would laugh while learning about SGD... I was surprised, entertained, and amazed by your video. Thanks for that. Now I go back to writing my stuff :D
Glad you enjoyed it!
Better than any stats class I’ve ever taken
Thanks! :)
You were born to explain ML!
Thanks!
Please upload every day, you are my Machine Learning Hero!
Thank you!
You're a special kind of awesome! I have learned so much from your videos! Thank you!
Thanks!
This + bandcamp?? Dude you are my hero
Thanks! :)
One word ! Revolutionary
Lots of love from India 🇮🇳
Bam 💥
Thank you! :)
Thanks for these amazing videos and especially for the smile you bring on my face with each BAM :)
Thank you so much! :)
Josh Starmer from the Statquest is coming in clutch for the examination!
Good luck!
I think I am addicted to ML after following your channel
Nice! :)
Brilliant explanation. I was forgetting one little thing that was bugging me about SGD. This helped a lot!
Awesome! :)
StatQuest, you're YYDS!!!!
Thank you!
Short and Clear explanation. Thanks a lot!!!
Thanks!
Triple BAM!? 💥 My heart can’t take it! Quest on. 👍
Great videos. Suggestions for future videos: Kernel / Support Vector Machines. ICA (Independent Component Analysis). SOM (Self-Organizing Map). Convolutional Nets. Backpropagation algs for NN training.
Yes, eagerly waiting for the video on Support vector machines.
Thanks for your work! Your explanation is well thought out, clear, and entertaining.
You're very welcome!
Can not appreciate your channel more!!! CAN NOT!
Thank you! :)
You just made machine learning look so simple
BAAAAAAMMMMMMMMM
Thanks!
Seriously man, this was the first video I watched on your channel and you are amazing. I am going to watch all the videos now.
Thank you for sharing your knowledge!
5 seconds into the intro: *smashes subscribe*
BAM!!!
Super, super, super explanation... until watching this video, I was very much confused about GD. Thanks a lot!
Thanks! :)
Thank you for explaining this clearly. Your videos are easy to understand. Thank you so much. Please make a video on SGD with momentum and issues of SGD with saddle points.
Very helpful! Everything is clear and well explained: super BAM!
Thank you! :)
Math is not that hard to understand when it's explained properly. For me, this concept went from being super complex to something super simple and logical. Thanks for all the work you put in these videos, you explained stuff in a magnificent way.
Thank you for helping with my PhD research!
Good luck with your PhD! :)
Super effective instructional approach...best wishes
Thank you very much! :)
I would love it if you made a series/playlist of all the basic Machine Learning videos. Found the best channel for ML!
See: statquest.org/video-index/#machine
@@statquest 0.o you are a savior!!!
The musical intro was LIT
Your channel is super incredible, it has helped me a lot and I always recommend it to everybody! What about a StatQuest on time series analysis? Pleaseeeeeee! Thanks! :) Triple BAAAAAM!!!
Time series is on the to-do list. It will still be a while before I get to it, but I'll do my best.
@@statquest Super BAAAAM! thanks for the answer! It will be life changing! :) keep up the amazing work!
brief and clear explanation, great
Glad you liked it!
bro is a savior
:)
Just ordered a red Tshirt from you. Thanks for the great work.
Awesome, thank you!
Thank you. It's very very very clear and helpful.
Thanks!
Your videos are simply amazing. A big thank you!!!
Thanks! :)
Thanks for clearly explaining stochastic gradient descent. :)
Thanks!
Thank you.. Your video is very helpful in breaking down the concepts to basics :)
Glad it was helpful!
I really appreciate your efforts and love your way of explaining complicated concepts in Stats and ML in a calm and cool way. Related to this SGD video, I have a question.
My understanding of an epoch is that it tells how many times the learning algorithm has seen the training data. An iteration corresponds to one parameter update in the gradient descent algorithm. Batch size = # training samples used to express the loss function. Iterations per epoch = # training samples (N) / batch size. In SGD, since one random sample is chosen at each step, the batch size = 1. There would be N iterations per epoch, and these N iterations may involve repeated sampling of some training data points.
My question is: in a deep neural network, is SGD slower than batch gradient descent because, with the SGD optimizer, forward and backward propagation occur more frequently and there are more iterations per epoch? And is it the opposite when we use SGD for linear regression in ML?
Thank you
Using the data in batches is usually the most efficient approach since you can align the amount of data used with the amount of high-speed memory available for it.
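If it helps make the epoch/iteration bookkeeping concrete, here is a minimal sketch of one epoch of mini-batch gradient descent on a simple line fit; the batch size, learning rate, and data are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 32)                        # 32 made-up training samples
y = 0.7 * x + 1.0 + rng.normal(0, 0.3, 32)

intercept, slope, lr = 0.0, 1.0, 0.01
batch_size = 8                                   # batch_size = 1 would be plain SGD

# One epoch = one pass over the data = len(x) / batch_size iterations (here 32 / 8 = 4)
order = rng.permutation(len(x))                  # shuffle once per epoch
for start in range(0, len(x), batch_size):
    idx = order[start:start + batch_size]
    residuals = y[idx] - (intercept + slope * x[idx])
    intercept -= lr * (-2 * residuals.sum())     # one parameter update = one iteration
    slope -= lr * (-2 * (residuals * x[idx]).sum())
```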
Clearly explained indeed! Great video!
Thank you! :)
Prof.BAM is very impressive!
Thanks! :)
Fantastic explanation, Man!!👍👍
Thank you! :)
I could frankly say I learned the theory of stats from you.
double bam! :)
@Josh, why would people dislike these videos, I wonder! SGD is a cost saver on large datasets.
bam!
Hi Mr. Starmer, I hope you have a wonderful day.
Thank you!
Thanks man. It looks easy when learned from your channel.
Bam! :)
Hi Josh,
Sorry, just a small comment (I hope you don't mind): it might be good to add an annotation for the errata of a video directly onto the video itself at the relevant timestamp; this way students will not accidentally miss the errata note.
Kind regards,
Ben
That's a great idea. However, there is no way to do that right now. YouTube used to have a feature that allowed that, and it was awesome. But they took it away because they said it did not work with mobile. They said they were working on a replacement, but that was years ago and they have not mentioned it since.
@@statquest Aw, that is unfortunate. Thanks anyway, Josh.
I give it 1 year and this will become the default channel to complement uni courses.
That would be awesome! :)
Excellent explanation. Thank you!
Thanks! :)
Hey man, you are awesome. Please make videos about more sophisticated deep learning models: CNNs, RNNs, and reinforcement learning.
Thank you! I'm always working on new stuff and excited about what's coming up.
Excellent and Informative and Bam!!!!!!!!!!!!!
BAM! :)
What do I get when completing all your quests? 🤗
So much thank you at this point too!
TRIPLE BAM!!! :)
I have to admit it is clearly explained!! Amazing.
bam!
@@statquest Double bam. Oh no wait, it's a triple bam!
@@sophie0010 YES!
Great tutorial, loved it!
Thanks! :)
Great job! Thanks Josh!
Definitely better than reading my black and white book full of jargon.
:)
Excellent explanation. Very helpful.
Thanks! :)
Thanks a lot, Josh, for this video. Can you please make a video on SGD with momentum?
I'll keep that in mind.
Very cool, Josh!
Thank you!
Thanks! :)
How are you so good at explaining?
:)
Thank you for such a good explanation!!
Thanks! :)
Hello Sir,
When we use mini-batch gradient descent, do we choose the mini-batches randomly, or are they selected in the sequence the original data was divided into small mini-batches?
Thank you
You can do it either way.
@@statquest
Thank you Sir.
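In case a concrete picture of "either way" helps, here is a tiny sketch of both options (the sizes and data are made up); shuffling first is the more common choice in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size = 20, 5
indices = np.arange(n)

# Option 1: mini-batches taken in the original order of the data
sequential_batches = [indices[i:i + batch_size] for i in range(0, n, batch_size)]

# Option 2: shuffle first, then slice into mini-batches (random selection)
shuffled = rng.permutation(n)
random_batches = [shuffled[i:i + batch_size] for i in range(0, n, batch_size)]
```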
You are a golden god
Thanks! :)
Thank you very much. Really clear explanation!
Glad you liked it!
You made my day thanks!
Hooray!
Hi Josh, when you calculated the slopes using the new sample value at around 9:12, I am wondering why 0 is plugged in as the intercept and 1 as the slope coefficient (which are the initial guesses) instead of the most recent estimates (intercept = 0.86 and slope = 0.68). Thanks so much for your help!
You're right. Josh probably forgot to update that in the example. Plug in those values and you'd get even more accurate values for the intercept and slope.
Cool. Thanks for your reply!
Haz H is correct. That's just a typo.
These videos are amazing.
When adding a new sample, it looked like a bit of an outlier compared to the clusters where you took the original random points. So how much weight do you give one new sample compared to random values from tightly packed clusters?
I'm pretty sure all values are given equal weights.
Josh, could you please explain the difference between GBM, XGBoost, LightGBM, etc.?
Really nice. Subscribed.
Thank you!
Good, very good. Now the next explanation should be SPGD. I'm really waiting for that.
I'll keep that in mind.
I have recently come across the concept of the Stochastic Gradient Ascent (SGA). Do you happen to know how the Stochastic Gradient Descent (SGD) method is related to the SGA? I assume that the SGA attempts to 'maximize' the loss function unlike the SGD, but I am unsure about the reason(s) why someone would want to maximize the loss function.
I don't think you'd apply gradient ascent to a loss function. Instead, you might apply it to a likelihood function, where you want to maximize the likelihood of an estimator or something like that.
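To make the connection concrete: a gradient-ascent step on a log-likelihood is exactly a gradient-descent step on the negative log-likelihood. Here is a minimal sketch; the Normal-mean example and learning rate are just illustrative.

```python
import numpy as np

data = np.array([1.0, 2.0, 1.5])

def log_likelihood_gradient(mu):
    # d/d(mu) of sum(log N(x | mu, 1)) = sum(x - mu)
    return np.sum(data - mu)

mu, lr = 0.0, 0.1

mu_ascent = mu + lr * log_likelihood_gradient(mu)      # maximize the log-likelihood
mu_descent = mu - lr * (-log_likelihood_gradient(mu))  # minimize the negative log-likelihood

print(mu_ascent == mu_descent)                         # True: they are the same update
```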
Thanks for the great video Josh!
Just a quick question: why is it that at 9:02 the values for the intercept and slope in the derivatives are the original random values of 0 and 1, instead of the most recent estimates of 0.86 and 0.68?
That's just a typo. I've now included a note about this in the video's description. Unfortunately YouTube will not let me edit videos after they are posted.
Nice you explained that clearly 👌👍🙂
Thank you 🙂
I gotta say this channel is amazing. It's especially nice as a complement to the math side I learn in university.
Wow, thanks!