Stochastic Gradient Descent, Clearly Explained!!!

  • Published 3 Jan 2025

COMMENTS • 548

  • @statquest
    @statquest  5 years ago +59

    Corrections:
    9:03. The values for the intercept and slope should be the most recent estimates, 0.86 and 0.68, instead of the original random values, 0 and 1.
    9:33. The slope should be 0.7.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
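
    For anyone who wants to check the corrected numbers by hand, here is a minimal Python sketch of the update step the video describes (one randomly chosen sample per step, derivatives of the squared residual); the function and variable names are just illustrative:

        import random

        def sgd_step(intercept, slope, x, y, learning_rate=0.01):
            # Derivatives of the squared residual for a single point (x, y):
            #   d/d(intercept) = -2 * (y - (intercept + slope * x))
            #   d/d(slope)     = -2 * x * (y - (intercept + slope * x))
            residual = y - (intercept + slope * x)
            # Each step must start from the MOST RECENT estimates -- exactly
            # the point of the 9:03 correction above.
            new_intercept = intercept - learning_rate * (-2 * residual)
            new_slope = slope - learning_rate * (-2 * x * residual)
            return new_intercept, new_slope

        data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]   # made-up points
        intercept, slope = 0.0, 1.0                   # random initial guesses
        for _ in range(100):
            x, y = random.choice(data)
            intercept, slope = sgd_step(intercept, slope, x, y)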

    • @yifanzhu8999
      @yifanzhu8999 4 years ago +3

      Nice to see this right when I was getting confused XD. Great job!

    • @louislesage3856
      @louislesage3856 4 years ago +1

      At 9:40, in the top right corner: "...and the new line... slope 0.07" is a typo too. Should be 0.7!

    • @statquest
      @statquest  4 years ago

      @@louislesage3856 You are correct. Dang, I hate typos. ;)

    • @adelsalaheldeen
      @adelsalaheldeen 4 years ago

      I came down to the comments to check if I was right, and thank god I was :)
      I have another question regarding the new data sample:
      what if this new sample is an outlier?
      The step will make the line fit this new point only, and the old samples will be ignored.
      Do we need to check for outliers before we apply stochastic gradient descent?

    • @statquest
      @statquest  4 years ago +1

      @@adelsalaheldeen You should always check for outliers, no matter what you are doing.
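
      For example, one common screen (just a sketch, not from the video) flags points that fall outside 1.5 * IQR of the quartiles before fitting anything:

          def iqr_outliers(values):
              # Rough quartiles are fine for a quick screen.
              ordered = sorted(values)
              n = len(ordered)
              q1, q3 = ordered[n // 4], ordered[(3 * n) // 4]
              iqr = q3 - q1
              lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
              return [v for v in values if v < lo or v > hi]

          print(iqr_outliers([1.2, 1.9, 2.1, 2.4, 2.6, 9.8]))  # -> [9.8]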

  • @letranminhkhoa7492
    @letranminhkhoa7492 1 year ago +67

    I love that you go slow, really slow, and do not assume people have understood everything or skip steps. Truly a wonderful explanation of seemingly hard-to-grasp stuff! Keep up your good work!!

  • @rrrprogram8667
    @rrrprogram8667 5 years ago +427

    Without this channel... Machine learning is incomplete...
    MEGAAAA BAMMMMM

    • @dani123456785
      @dani123456785 5 years ago

      Can u tell me about batch gradient descent?

    • @dani123456785
      @dani123456785 5 years ago

      Any video?????????

    • @Ertihan
      @Ertihan 5 years ago +1

      Pretty sure you mean... MECHA BAM! I’ll see myself out...

    • @sandyjust
      @sandyjust 5 years ago

      @@dani123456785 ruder.io/optimizing-gradient-descent/index.html#batchgradientdescent

  • @Privacy-LOST
    @Privacy-LOST 5 years ago +97

    Every time I type some notion and one of your videos pops up, I know the probability of understanding that notion is 100%, and that the effort function is already minimized, so I quickly converge towards optimal comprehension :D
    Hail to the great JS!

    • @statquest
      @statquest  5 years ago +23

      This is awesome!!! Can I quote you on my website?

  • @vamsikrishna-j9n
    @vamsikrishna-j9n 10 months ago +6

    This man should be given an award!!
    I feel ML is simple math put out in a complicated way, and people like Josh pop in exactly there and make the math simply understandable....!
    And ofc his teachings are BAAAAAAAAAAM!
    Can't believe I'm taking in wisdom he recorded 4 years ago!

  • @xfadl
    @xfadl 2 years ago +35

    this video is so much better than what we have in university. Thank you man, you are a legend

    • @statquest
      @statquest  2 years ago +3

      Thank you!

    • @walterpinkmantanay1577
      @walterpinkmantanay1577 2 years ago +2

      Our university's professor just screenshotted the whole gradient descent and stochastic gradient descent video and that was our notes for the topic😶‍🌫😶‍🌫, should've just pasted the link to Josh's video tbfr😮‍💨😮‍💨

  • @danmartin7198
    @danmartin7198 5 years ago +177

    Best explanation ever. At first I was sceptical, but the BAMs kinda grow on you after a while :)

  • @ramankumar41
    @ramankumar41 1 year ago +3

    You are the best ML instructor I have so far come across !!!

  • @khadajhin4019
    @khadajhin4019 4 years ago +25

    This is like my second video on your channel and holy moly, everything you explain is so clear and just clicks in my head. I truly appreciate this; you are a blessing to learners.

    • @statquest
      @statquest  4 years ago +1

      Awesome! Thank you very much! :)

  • @orioncloud4573
    @orioncloud4573 3 years ago +3

    You are great. I'm glad I found you. Whenever I get stuck with the theory of something, you're there to help.

  • @gilao
    @gilao 7 months ago +1

    Another great and simple video. It's always a pleasure to see that there is a StatQuest video about the thing I'm looking for. Thank you!

    • @statquest
      @statquest  7 months ago +1

      Glad you enjoyed it!

  • @Pwaing
    @Pwaing 11 months ago +3

    This is gold, appreciate it! I really like how you take things one step at a time. It helps me understand better!! BAM!!!

  • @shivujagga
    @shivujagga 5 years ago +9

    Bro, I don't know how you did it. You are gooood!
    Your subscribers increased like crazy since the last time I came here too!

    • @statquest
      @statquest  5 years ago +4

      Thank you very much! Hooray! The channel is growing and that is very exciting for me. It's an inspiration to keep making videos. :)

  • @insanaprilian8184
    @insanaprilian8184 4 years ago +4

    Just leaving a mark here to appreciate all the work you've done, Mr. Josh. Thank you very much!

  • @pavanvamsitadikonda3843
    @pavanvamsitadikonda3843 3 years ago +6

    Thank you Mr. Josh. Your videos are really game changers. I love them and your songs even more. I will buy so much of your merchandise when I am employed

  • @vuphong2003
    @vuphong2003 5 years ago +24

    Super helpful, taught me more than my uni prof, your teaching method is effective and hilarious at the same time.

    • @statquest
      @statquest  5 years ago +2

      Thanks! :)

    • @vuphong2003
      @vuphong2003 5 years ago +1

      @@statquest Btw, what are the criteria for mini-batch stochastic gradient descent?
      Like, if I have a number of data points, should I group the ones whose values are closely related?
      E.g.: 10, 30, 69, 38, 59, 16
      Then I'd group them into groups --> (10, 16) (30, 38) (59, 69), randomly select an element from each group, and do the math.
      Or do I just go with three random data points and do the math?
      Thanks!
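
      In standard practice (this isn't something the video covers), mini-batches are not grouped by similar values; each batch is just a random subset, typically produced by shuffling the data and slicing it, as in this sketch:

          import random

          def minibatches(data, batch_size):
              # Shuffle, then slice: every batch is a random, roughly
              # representative subset -- no value-based grouping needed.
              shuffled = data[:]          # copy; leave the caller's list alone
              random.shuffle(shuffled)
              for i in range(0, len(shuffled), batch_size):
                  yield shuffled[i:i + batch_size]

          for batch in minibatches([10, 30, 69, 38, 59, 16], batch_size=3):
              print(batch)                # e.g. [59, 10, 38] then [16, 69, 30]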

  • @abhisheksaxena6413
    @abhisheksaxena6413 4 years ago +2

    best channel for machine learning with quality content

  • @charlinemontial9217
    @charlinemontial9217 5 years ago +3

    I am here to study for an exam I'll have soon, and you are saving me lots of time. Plus, it's so much more entertaining than my incomprehensible slides! THANK YOU!

    • @statquest
      @statquest  5 years ago +2

      Good luck on your exam! :)

    • @charlinemontial9217
      @charlinemontial9217 5 years ago +2

      @@statquest Thank you so much !! Please keep it up with your amazing videos !

    • @statquest
      @statquest  5 years ago +2

      How did the exam go?

    • @charlinemontial9217
      @charlinemontial9217 5 years ago +1

      @@statquest you are the nicest person in this world! It is on Thursday so we will see :oo

    • @charlinemontial9217
      @charlinemontial9217 5 years ago +4

      @@statquest I passed my exam and got my bachelor this year!! Thank you so much!

  • @abigailbarton195
    @abigailbarton195 3 years ago +2

    I am super dunked off of vodka and coffee right now and I feel like I just understood every complex maths class I've ever taken before. I understand now! THANK YOU! even my impaired mind can comprehend this at X2 speed.

  • @tekkkkkkkkkkk
    @tekkkkkkkkkkk 4 years ago +6

    I am sooo sad that I did not find this channel sooner, but now I know what I'm going to do for the next weeks or months :) Great job! Really informative videos!

  • @sgiri2012
    @sgiri2012 4 months ago +1

    I got your book last week from Amazon. It's incredible. Thanks for your work

    • @statquest
      @statquest  4 months ago

      TRIPLE BAM! Thank you so much! :)

  • @alihaider2655
    @alihaider2655 2 years ago +2

    Best explanation in the shortest time possible

  • @thebiggerpicture__
    @thebiggerpicture__ 2 years ago +1

    I'm literally liking this video and commenting after the intro song. Well done!!

  • @hungrywaffle123
    @hungrywaffle123 2 years ago +1

    This channel is a GEM

  • @SarcasticOnion
    @SarcasticOnion 3 years ago +1

    I'm writing my thesis, and you are my hero

  • @philrobinson2924
    @philrobinson2924 5 years ago +11

    I'm not sure what I like more: the clear examples or Josh's silky smooth voice on the "double bam"

  • @Yangselw
    @Yangselw 4 years ago +7

    Dude you’re just freakin good at explaining this stuff

    • @statquest
      @statquest  4 years ago

      Thank you! :)

    • @mohammedouallal2
      @mohammedouallal2 4 years ago +1

      Man, when you love something you can achieve great things. This lecture is graduate and doctorate level, yet I would say even a high school student can understand it, and that is the trick! Sorry, but not everyone can do it... Joshua, congratulations!

    • @statquest
      @statquest  4 years ago

      @@mohammedouallal2 Thank you very much! :)

  • @j8ahmed
    @j8ahmed 4 years ago +1

    Another solid video. Thanks a million!! I had to go through a bit of problem solving to neatly wrap my functions and compare the execution times in Python. But yeah, I found that on the same (small) data set, regular gradient descent (batch gradient descent) was faster but less accurate than stochastic gradient descent in calculating the slope & intercept for the line of best fit.
    My Example:
    - 13 data points
    - Solved for Slope & Intercept using both types of gradient descent
    - Used Sum of squared residuals derivative
    Batch Gradient Descent Time = 0.0967 s
    Stochastic Gradient Descent Time = 1.2740 s
    Linear Regression function from scipy stats = 0.0015 s
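
    For readers who want to reproduce this kind of comparison, here is a rough, self-contained sketch (made-up data and fixed step counts, so timings will differ from the numbers above):

        import random, time

        def batch_gd(xs, ys, lr=0.01, steps=1000):
            # Full-batch gradient descent: every step uses ALL the data.
            intercept, slope = 0.0, 1.0
            n = len(xs)
            for _ in range(steps):
                d_int = sum(-2 * (y - (intercept + slope * x))
                            for x, y in zip(xs, ys)) / n
                d_slp = sum(-2 * x * (y - (intercept + slope * x))
                            for x, y in zip(xs, ys)) / n
                intercept -= lr * d_int
                slope -= lr * d_slp
            return intercept, slope

        def sgd(xs, ys, lr=0.01, steps=1000):
            # Stochastic gradient descent: every step uses ONE random point.
            intercept, slope = 0.0, 1.0
            for _ in range(steps):
                i = random.randrange(len(xs))
                residual = ys[i] - (intercept + slope * xs[i])
                intercept -= lr * (-2 * residual)
                slope -= lr * (-2 * xs[i] * residual)
            return intercept, slope

        xs = [x / 2 for x in range(13)]                       # 13 data points
        ys = [0.5 + 0.7 * x + random.gauss(0, 0.1) for x in xs]
        for fit in (batch_gd, sgd):
            t0 = time.perf_counter()
            print(fit.__name__, fit(xs, ys), f"{time.perf_counter() - t0:.4f} s")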

  • @eulerthegreatestofall147
    @eulerthegreatestofall147 2 years ago +2

    Josh, you make everything easy to understand! Many Thanks!

  • @JtotheAKOB
    @JtotheAKOB 4 years ago +2

    dude....you are funny....honestly.....i didn't think i would laugh while learning about SGD....i was surprised, entertained and amazed by your video. thanks for that. now i go back to writing my stuff :D

  • @nataliebogda6554
    @nataliebogda6554 4 years ago +2

    Better than any stats class I’ve ever taken

  • @barankaplan4308
    @barankaplan4308 4 years ago +1

    You were born to explain ML!

  • @ibanguniverse811
    @ibanguniverse811 5 years ago +2

    please upload every day, you R my Machine Learning Hero

  • @thomasbates9189
    @thomasbates9189 1 year ago +1

    You're a special kind of awesome! I have learned so much from your videos! Thank you!

  • @stevenicholes5649
    @stevenicholes5649 5 years ago +4

    This + bandcamp?? Dude you are my hero

  • @riteish01
    @riteish01 2 years ago +1

    One word ! Revolutionary
    Lots of love from India 🇮🇳
    Bam 💥

  • @samakh61
    @samakh61 5 years ago +3

    Thanks for these amazing videos and especially for the smile you bring on my face with each BAM :)

    • @statquest
      @statquest  5 years ago

      Thank you so much! :)

  • @TheClockmister
    @TheClockmister 5 days ago +1

    Josh Starmer from StatQuest is coming in clutch for the examination!

  • @snarsule
    @snarsule 5 years ago +1

    I think I am addicted to ML after following your channel

  • @Beenum1515
    @Beenum1515 5 years ago +2

    Brilliant explanation. I was forgetting one little thing about SGD and it was bugging me. This helped a lot!

  • @astromq8870
    @astromq8870 10 months ago +2

    StatQuest, you're YYDS!!!!

  • @parvathyprathap4344
    @parvathyprathap4344 3 years ago +1

    Short and Clear explanation. Thanks a lot!!!

  • @SurrenderPink
    @SurrenderPink 5 years ago +6

    Triple BAM!? 💥 My heart can’t take it! Quest on. 👍

  • @pat4rush
    @pat4rush 5 years ago +3

    Great videos. Suggestions for future videos: Kernel / Support Vector Machines. ICA (Independent Component Analysis). SOM (Self-Organizing Map). Convolutional Nets. Backpropagation algs for NN training.

    • @TheAbhiporwal
      @TheAbhiporwal 5 years ago +2

      Yes, eagerly waiting for the video on Support vector machines.

  • @thegt
    @thegt 1 year ago +1

    Thanks for your work! Your explanation is well thought out, clear, and entertaining.

  • @fivehuang7557
    @fivehuang7557 5 years ago +1

    Can not appreciate your channel more!!! CAN NOT!

  • @rajchoksi3533
    @rajchoksi3533 4 years ago +4

    You just made machine learning look so simple
    BAAAAAAMMMMMMMMM

    • @statquest
      @statquest  4 years ago

      Thanks!

    • @rajchoksi3533
      @rajchoksi3533 4 years ago

      Seriously man, this was the first video I watched on your channel and you are amazing. I am going to watch all the videos now.
      Thank you for sharing your knowledge!

  • @trevorfedyna1155
    @trevorfedyna1155 4 years ago +11

    5 seconds into the intro: *smashes subscribe*

  • @maheshjayaraman6856
    @maheshjayaraman6856 4 years ago +1

    Super, super, super explanation... until watching this video I was very confused by GD. Thanks a lot!

  • @gmayank32
    @gmayank32 5 years ago +2

    Thank you for explaining this clearly. Your videos are easy to understand. Thank you so much. Please make a video on SGD with momentum and issues of SGD with saddle points.

  • @giuliocipriani4659
    @giuliocipriani4659 4 years ago +6

    Very helpful! Everything is clear and well explained: super BAM!

  • @lekjov6170
    @lekjov6170 5 years ago +3

    Math is not that hard to understand when it's explained properly. For me, this concept went from being super complex to something super simple and logical. Thanks for all the work you put in these videos, you explained stuff in a magnificent way.

  • @NuclearSpinach
    @NuclearSpinach 3 years ago +1

    Thank you for helping with my PhD research!

    • @statquest
      @statquest  3 years ago +1

      Good luck with your PhD! :)

  • @professorg000
    @professorg000 5 years ago +1

    Super effective instructional approach...best wishes

    • @statquest
      @statquest  5 years ago

      Thank you very much! :)

  • @isseym8592
    @isseym8592 2 years ago

    I would love it if you made a series/playlist of all the basics of Machine Learning videos. Found the best channel for ML!

    • @statquest
      @statquest  2 years ago

      See: statquest.org/video-index/#machine

    • @isseym8592
      @isseym8592 2 years ago +1

      @@statquest 0.o you are a savior!!!

  • @chyldstudios
    @chyldstudios 5 years ago +1

    The musical intro was LIT

  • @danielromero-alvarez5392
    @danielromero-alvarez5392 5 years ago +2

    Your channel is super incredible, it has helped me a lot and I always recommend it to everybody! What about a StatQuest on time series analysis? Pleaseeeeeee! Thanks! :) Triple BAAAAAM!!!

    • @statquest
      @statquest  5 years ago +3

      Time series is on the to-do list. It will still be a while before I get to it, but I'll do my best.

    • @danielromero-alvarez5392
      @danielromero-alvarez5392 5 years ago

      @@statquest Super BAAAAM! Thanks for the answer! It will be life-changing! :) Keep up the amazing work!

  • @shelo1747
    @shelo1747 2 years ago +1

    brief and clear explanation, great

  • @zhiyuzhang7096
    @zhiyuzhang7096 1 year ago +1

    bro is a savior

  • @OlivierNayraguet
    @OlivierNayraguet 2 years ago +1

    Just ordered a red T-shirt from you. Thanks for the great work!

  • @whaysdsdsd973
    @whaysdsdsd973 2 years ago +1

    Thank you. It's very very very clear and helpful.

  • @manasatallam108
    @manasatallam108 5 years ago +1

    Your videos are simply amazing. A big thank you!!!

  • @dekroplay5373
    @dekroplay5373 2 years ago +1

    Thanks for clearly explaining stochastic gradient descent. :)

  • @89rmehra
    @89rmehra 4 years ago +1

    Thank you... Your video is very helpful in breaking the concepts down to basics :)

  • @Shrikant_Anand
    @Shrikant_Anand 10 months ago +1

    I really appreciate your efforts and love your way of explaining complicated concepts in stats and ML in a calm and cool way. Related to this SGD video, I have a doubt.
    My understanding of an epoch is that it tells how many times the learning algorithm has seen the training data. An iteration corresponds to one parameter update in the gradient descent algorithm. Batch size = # training samples used to express the loss function. Iterations per epoch = # training samples (N) / batch size. In SGD, since one random sample is chosen at each step, the batch size = 1. There would be N iterations per epoch, and these N iterations may involve repeated sampling of some training data points.
    My question is: in a deep neural network, is SGD slower than batch gradient descent because, with the SGD optimizer, forward and backward propagation happen more frequently and there are more iterations per epoch? And is it the opposite when we use SGD for linear regression in ML?
    Thank you

    • @statquest
      @statquest  10 months ago

      Using the data in batches is usually the most efficient approach since you can align the amount of data used with the amount of high-speed memory available for it.
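
      To make that trade-off concrete, here is a tiny sketch of the arithmetic behind it (illustrative numbers only): smaller batches mean more parameter updates, and therefore more forward/backward passes, per epoch:

          import math

          def updates_per_epoch(n_samples, batch_size):
              # One parameter update (one forward + backward pass) per batch.
              return math.ceil(n_samples / batch_size)

          n = 10_000
          for batch_size in (1, 32, n):   # SGD, mini-batch, full batch
              print(batch_size, updates_per_epoch(n, batch_size))
          # 1     -> 10000 updates/epoch: many tiny, poorly vectorized steps
          # 32    -> 313 updates/epoch: fits fast memory, vectorizes well
          # 10000 -> 1 update/epoch: one big, expensive step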

  • @reformed_attempt_1
    @reformed_attempt_1 4 years ago +1

    Clearly explained indeed! Great video!

  • @fangf22
    @fangf22 4 years ago +1

    Prof. BAM is very impressive!

  • @hemaswaroop7970
    @hemaswaroop7970 4 years ago +1

    Fantastic explanation, Man!!👍👍

  • @orioncloud4573
    @orioncloud4573 3 years ago +1

    I could frankly say I learned the theory of stats from u.

  • @millionwolves
    @millionwolves 3 years ago +1

    @Josh, why would people dislike these videos, I wonder! SGD is a cost saver on large datasets.

  • @darceysinclair8929
    @darceysinclair8929 2 years ago +1

    Hi Mr. Starmer, I hope you have a wonderful day.

  • @GauravSingh-ku5xy
    @GauravSingh-ku5xy 4 years ago +1

    Thanks man. It looks easy when learned from your channel.

  • @benphua
    @benphua 5 years ago +1

    Hi Josh,
    Sorry, just a small comment (I hope you don't mind): it might be good to add an annotation for a video's errata directly onto the video itself at the relevant time stamp. That way students won't accidentally miss the errata note.
    Kind regards,
    Ben

    • @statquest
      @statquest  5 years ago

      That's a great idea. However, there is no way to do that right now. YouTube used to have a feature that allowed it, and it was awesome. But they took it away because they said it did not work on mobile. They said they were working on a replacement, but that was years ago and they have not mentioned it since.

    • @benphua
      @benphua 5 years ago +1

      @@statquest Aw, that is unfortunate. Thanks anyway, Josh!

  • @RazineBensari
    @RazineBensari 5 years ago +1

    I give it 1 year and this will become the default channel to complement uni courses.

    • @statquest
      @statquest  5 years ago

      That would be awesome! :)

  • @kiran082
    @kiran082 4 years ago +1

    Excellent explanation. Thank you!

  • @AbedMotasemi
    @AbedMotasemi 5 years ago +1

    Hey man, you are awesome. Please make videos about more sophisticated deep learning models: CNNs, RNNs, and reinforcement learning.

    • @statquest
      @statquest  5 years ago

      Thank you! I'm always working on new stuff and excited about what's coming up.

  • @Mars7822
    @Mars7822 2 years ago +1

    Excellent and Informative and Bam!!!!!!!!!!!!!

  • @19SpeedFinger19
    @19SpeedFinger19 4 years ago +2

    What do I get when I complete all your quests? 🤗
    So much thanks at this point too!

  • @sophie0010
    @sophie0010 2 years ago +1

    I have to admit it is clearly explained!! Amazing!

    • @statquest
      @statquest  2 years ago +1

      bam!

    • @sophie0010
      @sophie0010 2 years ago +1

      @@statquest double bam. Oh no wait, it's a triple bam

    • @statquest
      @statquest  2 years ago

      @@sophie0010 YES!

  • @meghnanatarajan6355
    @meghnanatarajan6355 4 years ago +1

    Great tutorial, loved it!

  • @yulinliu850
    @yulinliu850 5 years ago +1

    Great job! Thanks Josh!

  • @smaug9833
    @smaug9833 4 years ago +1

    Definitely better than reading my black and white book full of jargon

  • @DharmendraSingh-qv6nb
    @DharmendraSingh-qv6nb 5 years ago +1

    Excellent explanation. Very helpful.

  • @rishitjoshi8774
    @rishitjoshi8774 8 months ago +1

    Thanks a lot, Josh, for this video. Can you please make a video on SGD with momentum?

    • @statquest
      @statquest  8 months ago +1

      I'll keep that in mind.

  • @adityatrivediii
    @adityatrivediii 4 years ago +1

    Very cool, Josh!
    Thank you!

  • @efrdefrd10
    @efrdefrd10 3 years ago +1

    How are you so good at explaining?

  • @jaelbutler7966
    @jaelbutler7966 5 years ago +1

    Thank you for such a good explanation!!

  • @taruchitgoyal3735
    @taruchitgoyal3735 11 months ago +1

    Hello Sir,
    When we use mini-batch gradient descent, do we choose each mini-batch randomly, or are the batches selected in the sequence in which the original data was divided into small mini-batches?
    Thank you

  • @mpkrass
    @mpkrass 4 years ago +1

    You are a golden god

  • @muhammadumarsotvoldiev8768
    @muhammadumarsotvoldiev8768 2 years ago +1

    Thank u very much. Really clear explanation!

  • @dmitricherleto8234
    @dmitricherleto8234 3 years ago +1

    You made my day thanks!

  • @ashleychen6263
    @ashleychen6263 5 years ago +1

    Hi Josh, when you calculated the slopes using the new sample value at around 9:12, I am wondering why 0 is plugged in as the intercept and 1 as the slope coefficient (which are the initial guesses) instead of the most recent estimates (intercept = 0.86 and slope = 0.68). Thanks so much for your help!

    • @hazboy7328
      @hazboy7328 5 years ago +1

      You're right. Josh probably forgot to update that in the example. Plug in those values and you'd get even more accurate values for the intercept and slope.

    • @ashleychen6263
      @ashleychen6263 5 years ago +2

      Cool. Thanks for your reply!

    • @statquest
      @statquest  5 years ago +1

      Haz H is correct. That's just a typo.

  • @mikehynz
    @mikehynz 1 year ago

    These videos are amazing.
    When adding a new sample, it looked like a bit of an outlier compared to the clusters the original random points came from. So how much weight do you give one new sample compared to random values from tightly packed clusters?

    • @statquest
      @statquest  1 year ago +1

      I'm pretty sure all values are given equal weights.

  • @mingyuanguan9979
    @mingyuanguan9979 5 years ago +26

    Josh, could you please explain the differences between GBM, XGBoost, LightGBM, etc.?

  • @luthfishahab
    @luthfishahab 2 years ago +1

    Really nice. Subscribed.

  • @Kvothe123
    @Kvothe123 2 years ago

    Good, very good. Now the explanation to follow is SPGD. I'm really waiting for that.

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @SuperSerbia123
    @SuperSerbia123 5 months ago

    I have recently come across the concept of the Stochastic Gradient Ascent (SGA). Do you happen to know how the Stochastic Gradient Descent (SGD) method is related to the SGA? I assume that the SGA attempts to 'maximize' the loss function unlike the SGD, but I am unsure about the reason(s) why someone would want to maximize the loss function.

    • @statquest
      @statquest  5 months ago +1

      I don't think you'd apply gradient ascent to a loss function. Instead, you might apply it to a likelihood function, where you want to maximize the likelihood of an estimator or something like that.
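
      In other words, ascent on a function is just descent on its negative, so the two methods share all their machinery; a tiny sketch with a toy objective (all names illustrative):

          def ascent_step(theta, grad, lr=0.1):
              # Gradient ASCENT: move WITH the gradient to maximize.
              return theta + lr * grad(theta)

          def descent_step(theta, grad, lr=0.1):
              # Gradient DESCENT: move AGAINST the gradient to minimize.
              return theta - lr * grad(theta)

          # Maximizing a toy log-likelihood L(theta) = -(theta - 3)**2 by
          # ascent gives the same update as minimizing -L(theta) by descent:
          grad_L = lambda t: -2 * (t - 3)
          grad_negL = lambda t: 2 * (t - 3)
          assert ascent_step(0.0, grad_L) == descent_step(0.0, grad_negL)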

  • @Lj-zn6ej
    @Lj-zn6ej 5 years ago +1

    Thanks for the great video, Josh!
    Just a quick question: why is it that at 9:02 the values for the intercept and slope in the derivatives are the original random values of 0 and 1 instead of the most recent estimates of 0.86 and 0.68?

    • @statquest
      @statquest  5 years ago +2

      That's just a typo. I've now included a note about this in the video's description. Unfortunately YouTube will not let me edit videos after they are posted.

  • @youssefhunter5225
    @youssefhunter5225 3 years ago +1

    Nice, you explained that clearly 👌👍🙂

  • @tostupidforname
    @tostupidforname 4 years ago +2

    I gotta say, this channel is amazing. It's especially nice as a complement to the math side I learn in university.