How Gradient Descent Works. Simple Explanation

  • Published 22 Aug 2024
  • The video explains what gradient descent is and how it works, with a simple example. Basic intuition and explanation are revealed in the video. The contents are:
    0:09 - What is gradient descent?
    0:30 - Example.
    0:39 - Step 1: Start with a random point and find the gradient (derivative) of the given function.
    1:28 - Step 2: Set the learning rate, which determines how big a step to take in the direction opposite to the gradient.
    1:47 - Step 3: Perform the calculations over iterations.
    1:58 - Initialize the parameters.
    2:20 - Calculations on the 1st iteration.
    3:20 - Calculations on the 2nd iteration.
    4:26 - ...until we reach a global or local minimum.
    I hope this explanation of how gradient descent works is useful for deep learning beginners and a good reminder for machine learning / deep learning experts. (A minimal code sketch of these steps follows below.)
    #deeplearning #gradientdescent #ai
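    A minimal sketch of the steps above, assuming the values from the video's worked example (f(x) = (x + 5)^2, starting point x = -3, learning rate 0.01 — values also echoed in the comments below):

```python
# Gradient descent on f(x) = (x + 5)^2, whose derivative is 2(x + 5).

def gradient(x):
    # Step 1: the gradient (derivative) of the given function.
    return 2 * (x + 5)

x = -3.0              # the starting point
learning_rate = 0.01  # Step 2: how big a step to take against the gradient

# Step 3: perform the calculations over iterations.
for i in range(1, 501):
    x = x - learning_rate * gradient(x)  # move opposite to the gradient
    if i <= 2:
        print(f"iteration {i}: x = {x:.4f}")  # -3.0400, then -3.0792

print(f"after {i} iterations: x = {x:.4f}")  # approaches the minimum at x = -5
```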

COMMENTS • 188

  • @lovekildetoft5658
    @lovekildetoft5658 3 years ago +6

    All other videos on gradient descent are at least 20 minutes long. This one is five, and it made me understand more than any of those videos. Thank you!

  • @blendaguedes
    @blendaguedes 4 years ago +45

    Sometimes we just need two loops to understand the whole thing. Thank you!

  • @uncoded0
    @uncoded0 2 years ago +12

    Thank you! After many hours of trying to understand gradient descent, I finally get it, thanks to this video. Thank you!

  • @anamikabhowmick6322
    @anamikabhowmick6322 3 years ago +11

    This is one of the best and easiest ways to learn and understand gradient descent. Thank you so much for this!

  • @impzhu3088
    @impzhu3088 3 years ago +9

    That's the way to explain a concept! An example with detailed steps. Thank you so much!

  • @riki2404
    @riki2404 3 years ago +2

    Thank you for such a clear explanation. Short and precise, no unnecessary talk.

  • @DataScienceGarage
    @DataScienceGarage 5 years ago +4

    If you found this video useful, I highly recommend checking out these related ones:
    -- Calculate Convolutional Layer Volume in ConvNet (ua-cam.com/video/3uEd0ErqGzU/v-deo.html)
    -- Adam. Rmsprop. Momentum. Optimization Algorithm. - Principles in Deep Learning (ua-cam.com/video/YacPECoI5SY/v-deo.html)
    -- Numpy Argsort - np.argsort() - function. Simple Example (ua-cam.com/video/6W8UHvn8ckg/v-deo.html)
    -- Python Regular Expression (RegEx). Extract Dates from Strings in Pandas DataFrame (ua-cam.com/video/E4avBXbNOGc/v-deo.html)

  • @wtry0067
    @wtry0067 4 years ago +14

    It's very short and very useful. I got the clarity I was looking for.
    Thanks once again.

  • @abdelrahmane657
    @abdelrahmane657 1 year ago

    Oh my god, you are excellent. You make the difference on YouTube. Thank you so much. 🎉🙏👏🙌👌👍✌🏼

  • @hemantsah8567
    @hemantsah8567 4 years ago +2

    It is easy... I spent 2 days learning gradient descent... then I came to your video... Thanks, bro!

  • @aalishanhunzai
    @aalishanhunzai 4 months ago

    Bro, thank you so much for your efforts. I couldn't find a simpler explanation of gradient descent than this one.

  • @sukanya4498
    @sukanya4498 2 years ago +4

    Love this video ❤️, very simple and precise! Thank you!

  • @Elementiah
    @Elementiah 2 months ago

    Thank you so much for this! This is the perfect explanation! 😄

  • @TingBie
    @TingBie 5 months ago

    Thanks for this example, simple and spot-on!

  • @Alex-pd5xc
    @Alex-pd5xc 1 year ago

    Wow dude, very clearly explained, and you made it simple for me to understand. Cheers, man!

  • @samvanoye
    @samvanoye 6 months ago

    Perfectly explained, thanks!

  • @alexbarq1900
    @alexbarq1900 2 years ago +2

    I get the idea, but is there any reason for not doing the simple math to find the local min?
    dy/dx = 2(x+5)
    If we want to find the min, we just set dy/dx = 0... then:
    0 = 2(x+5)
    x = -5

  • @glenfernandes253
    @glenfernandes253 2 years ago +1

    How do you know how many iterations to run before reaching the global/local minimum? What if it reaches the minimum and starts climbing up the other side?

  • @strykeregziadahmed9562
    @strykeregziadahmed9562 1 year ago

    2 hours into a DL course and I didn't get it.
    5 minutes made my day. This is how learning should actually be done.

  • @muhammadhashir7949
    @muhammadhashir7949 2 years ago +1

    Thank you so much, your work was practical. I loved it a lot and understood gradient descent. Before that I spent lots of time but didn't understand it properly.

  • @phaniraju0456
    @phaniraju0456 4 years ago +2

    I bow to you for this great clarification... loved it!

  • @josephsmy1994
    @josephsmy1994 2 years ago +1

    Awesome explanation! Straight to the point.

  • @eramitvajpeyee85
    @eramitvajpeyee85 3 years ago

    Thank you so much for explaining it in a short and easy way!! Please keep uploading content like this.

  • @yaminikommi5406
    @yaminikommi5406 2 years ago +2

    We can take any number as the initial parameters and learning rate.

  • @mattk6182
    @mattk6182 3 years ago +2

    Using x as your means of showing multiplication is confusing; it makes it look like you took the derivative wrong, as 2x(x+5). Maybe in future videos leave the x out so the multiplication is implied.

  • @zafarnasim9267
    @zafarnasim9267 2 years ago +1

    You made it so simple. Great Job!

  • @sanurcucuyeva7040
    @sanurcucuyeva7040 1 year ago +1

    Hi, thanks for the explanation. If our function is hard, at what point in the iteration should we stop to find the minimum point?

  • @murat2073
    @murat2073 2 years ago +1

    Thank you, Sir! You are a HERO!!!

  • @twicestay6683
    @twicestay6683 9 months ago

    Thanks a lot!!! But I'd like to ask: why is the learning rate 0.01? Is it a random number? Thanks!

  • @michaelscott8572
    @michaelscott8572 4 years ago +1

    What I don't get is: when we use this method in a neural net, we don't know the error function. We just have some points. So how can I build the derivative?

  • @omkarkadam5715
    @omkarkadam5715 3 years ago

    Thanks, mate. Finally enlightened!

  • @smurfNA
    @smurfNA 1 year ago

    Hey! So do we choose the learning rate? And the gradient is simply just the function, right?

  • @abdellatifmarghan7521
    @abdellatifmarghan7521 7 months ago

    Thank you. Great explanation!

  • @mohamedelkhanche707
    @mohamedelkhanche707 2 years ago

    Ohhhh, wonderful! I was shocked, this is insane. Thank you from all my heart!

  • @pwan3971
    @pwan3971 1 year ago

    Thanks a lot, really appreciate the video. This makes so much sense now.

  • @yasamannazemi6706
    @yasamannazemi6706 3 years ago +2

    It was so simple and helped me a lot :)
    Thanks👍🏻

  • @praneethcj6544
    @praneethcj6544 4 years ago +1

    Simple and clear... yet it needs more detail!

  • @Slendich
    @Slendich 2 years ago

    Really great and simple explanation. Thank you!

  • @ajaykushwaha4233
    @ajaykushwaha4233 3 years ago

    Best explanation ever.

  • @luisurena1770
    @luisurena1770 3 years ago +2

    Damn, there's always an Indian guy who helps me understand everything 🔥🔥🔥🔥

  • @abdanettaye8217
    @abdanettaye8217 3 years ago

    A good start, thank you!

  • @hindbelkharchiche1654
    @hindbelkharchiche1654 3 years ago

    Thank you... the explanation is as simple as it is useful.

  • @blinky1892
    @blinky1892 1 year ago

    How do we know what the y value of the parabola is at any given x? 😊

  • @mbogitechconpts
    @mbogitechconpts 2 years ago

    Beautiful video. I have to like it.

  • @nawaab9275
    @nawaab9275 3 years ago

    Thanks for saving the semester!

  • @fmikael1
    @fmikael1 2 years ago

    Thanks for the great explanation. Everyone else always complicates it.

  • @SuperYtc1
    @SuperYtc1 1 year ago +1

    This is a good video.

  • @radhar5349
    @radhar5349 2 years ago

    Great explanation. Easy to grasp the concept.

  • @eliashossain4327
    @eliashossain4327 1 year ago

    Best explanation.

  • @dennisjoseph4528
    @dennisjoseph4528 4 years ago

    Great job explaining this as simply as possible, Sir.

  • @9891676610
    @9891676610 2 years ago

    Awesome explanation. Thanks a lot!!

  • @basheeralwaely9658
    @basheeralwaely9658 3 years ago

    Well done, sir, very easy to understand.

  • @tevinwright5109
    @tevinwright5109 1 month ago

    GREAT VIDEO

  • 2 years ago

    Perfect !

  • @bharatcreations7154
    @bharatcreations7154 2 years ago

    Can we compute the same thing without getting into the learning rate?

  • @george4746
    @george4746 3 years ago

    Thanks, it was very clear and concise.

  • @RayhanAhmedsimanto
    @RayhanAhmedsimanto 5 years ago

    Amazing Practical Explanation. Great work.

  • @bhavikdudhrejiya4478
    @bhavikdudhrejiya4478 4 years ago

    Very good video. I appreciate your hard work. Keep uploading more videos.

  • @supantha118
    @supantha118 1 year ago

    Thank you so much

  • @sandipmaity2687
    @sandipmaity2687 4 years ago +1

    Amazing Explanation :) Really simple and to the point 😀

  • @colton3000
    @colton3000 2 years ago +1

    How do we find the learning rate?

  • @denisplotnikov6875
    @denisplotnikov6875 2 years ago

    How would we use this example for stochastic gradient descent?

  • @machinelearningid3931
    @machinelearningid3931 4 years ago +2

    Thanks, this gave me light in the darkness.

  • @TheJayenz
    @TheJayenz 2 years ago

    Thank you so much!

  • @Snetter
    @Snetter 2 years ago

    Nice work! Thanks!

  • @muhammadhilmirozan1266
    @muhammadhilmirozan1266 3 years ago

    Thanks for the explanation!

  • @thankyouthankyou1172
    @thankyouthankyou1172 2 years ago

    Useful, thank you!

  • @mastan775
    @mastan775 4 years ago

    Very good explanation... thanks a lot.

  • @ericklestrange6255
    @ericklestrange6255 4 years ago +5

    Didn't explain how to calculate the direction we are moving in (the minus sign), why the derivatives, etc.

    • @ak-ot2wn
      @ak-ot2wn 4 years ago +3

      That's what I've been looking for for several days, and nobody mentions it. Anyway, I still think it is trivial: if your derivative is negative, you have to "move" to the right (in the two-variable case); if it is positive, you have to "move" to the left. (A code sketch of this sign logic follows this thread.)

    • @debayondharchowdhury2680
      @debayondharchowdhury2680 4 years ago

      He also didn't talk about loss calculation. Why do we need to calculate the loss at all if we can simply use gradient descent on the function?

    • @blendaguedes
      @blendaguedes 4 years ago +1

      @@debayondharchowdhury2680 Your loss function is the one measuring the difference between your output and your "y". You calculate the gradient of your loss function. In his example, he shows something that looks like a 'mean squared error' loss function to me, and he is doing linear regression with only one input "x".
      I recommend the Andrew Ng classes on Coursera. Have a good time!

    • @blendaguedes
      @blendaguedes 4 years ago

      @@ak-ot2wn I totally agree with what you are saying; the only issue is that when you are programming, you don't see which direction your vector is going. So basically, if the error is going down, keep going; if it starts to increase, go back. You can just stop, or you can make your learning rate smaller to increase your accuracy.

    • @rssaiganesh
      @rssaiganesh 4 years ago

      I think this comment thread is looking for the math behind the gradient descent formula. Apologies if I misunderstood. But here is a link that helped me: towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e
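      A minimal sketch of the sign logic from the reply above, assuming the video's example f(x) = (x + 5)^2 with derivative 2(x + 5): the minus sign in the update moves x to the right when the derivative is negative and to the left when it is positive.

```python
# The minus sign in x_next = x - lr * dy/dx moves x opposite to the slope.
# Assumes the video's example f(x) = (x + 5)^2, so dy/dx = 2(x + 5).

def gradient(x):
    return 2 * (x + 5)

lr = 0.01
for x in (-8.0, -3.0):  # one point left of the minimum at -5, one right of it
    g = gradient(x)
    x_next = x - lr * g
    direction = "right" if g < 0 else "left"
    print(f"x = {x}: dy/dx = {g}, so the update moves {direction}, to {x_next}")
```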

  • @user-qj1lm1xh2z
    @user-qj1lm1xh2z 2 years ago

    Well done 👏

  • @arvinds7182
    @arvinds7182 1 year ago

    On point👏

  • @shankaks7217
    @shankaks7217 1 year ago

    Why did we choose 0.01 as the learning rate?

  • @davidbarnwell_virtual_clas6729
    @davidbarnwell_virtual_clas6729 2 years ago

    How do we choose the learning rate? Good video, but it's things like that I'd love to know.

    • @DataScienceGarage
      @DataScienceGarage 2 years ago +1

      Hi! Choosing a learning rate is often not an easy task. I usually run experiments on model performance with multiple learning rates (manual tuning, grid search hyperparameter tuning, Bayesian search, etc.). (A small sketch of this idea follows this thread.)

    • @davidbarnwell_virtual_clas6729
      @davidbarnwell_virtual_clas6729 2 years ago

      @@DataScienceGarage Ahh...ok...I get you...it's very interesting.
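      A minimal sketch of the "try multiple learning rates" idea from the reply above, reusing the video's toy function f(x) = (x + 5)^2; in a real model the score would be a validation loss rather than f(x) itself, and the candidate rates might come from a grid or a tuner.

```python
# Try several learning rates on the toy function f(x) = (x + 5)^2 and
# compare where 100 steps of gradient descent end up.

def f(x):
    return (x + 5) ** 2

def gradient(x):
    return 2 * (x + 5)

def run(lr, steps=100, x=-3.0):
    for _ in range(steps):
        x -= lr * gradient(x)
    return x

for lr in (0.1, 0.01, 0.001, 0.0001):  # commonly tried values
    x_final = run(lr)
    print(f"lr={lr}: x after 100 steps = {x_final:.4f}, f(x) = {f(x_final):.6f}")
```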

  • @gerleenjosuegoya8777
    @gerleenjosuegoya8777 1 year ago

    Thank you!

  • @boniface385
    @boniface385 1 year ago

    Hi, why is the learning rate 0.01? Can it be any random learning rate, for example 0.2, 0.02, or anything else? I'd appreciate a fast reply, thank you 😊🙏🏻🙏🏻🙏🏻

    • @DataScienceGarage
      @DataScienceGarage 1 year ago +1

      Hello! Thanks for watching this video, I'm glad it was useful for you. When modelling an ML system, you can specify any learning rate. However, good practice is to use 0.1, 0.01, 0.001, or 0.0001. Each ML model has its own architecture, different training data, hyperparameters, etc., so the learning rate can be adapted separately for each case.
      Here, I used 0.01 just for demonstration purposes.

    • @boniface385
      @boniface385 1 year ago

      @@DataScienceGarage thank you so much for the explanation. 🫶🏻

  • @dveerraju1852
    @dveerraju1852 1 month ago

    How can you know the learning rate?

  • @ydkmusic
    @ydkmusic 4 years ago

    Great video! There is a typo around 3:50. The bottom equation should be x_2 = .... instead of x_1.

  • @davidkayode6679
    @davidkayode6679 3 years ago

    Wonderful Video!!! Thank You!

  • @kronlogic2408
    @kronlogic2408 3 years ago

    For iteration 2, shouldn't the second line be x2 = and not x1 =?

  • @karthiklogan9384
    @karthiklogan9384 3 years ago

    Really helpful, sir. Thank you so much!

  • @darkman8939
    @darkman8939 3 years ago

    Thanks, very helpful.

  • @grinfacelaxu
    @grinfacelaxu 2 months ago

    Nice!

  • @bhavya2301
    @bhavya2301 3 years ago

    Thank you.

  • @seathru1232
    @seathru1232 3 years ago

    GREAT!

  • @emrecik9882
    @emrecik9882 1 year ago

    Thanks

  • @bernardaslabutis5098
    @bernardaslabutis5098 3 years ago

    Thank you, it helped!

  • @AJ-et3vf
    @AJ-et3vf 3 years ago

    Very useful! Awesome ❤️

  • @pearlsofwisdom2416
    @pearlsofwisdom2416 4 years ago

    Good explanation, but it would have been better if you had elaborated on the formula and why it is used to reach the next step. Why is the derivative multiplied by the learning rate, and why is it then subtracted from the first point's value?

    • @blendaguedes
      @blendaguedes 4 years ago

      The learning rate makes the decay slow. Without it, his first iteration would give -3 - 4 = -7, overshooting the minimum.
      Can you see where this is going? By going slowly, he keeps decreasing his "y" until he gets as close as possible to -5. Sometimes, to reach the minimum, you have to make your learning rate smaller while computing your weights. (A worked version of this update follows below.)
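      A worked version of that first update, assuming the video's values (starting point x0 = -3, derivative dy/dx = 2(x + 5), learning rate 0.01):
      x1 = x0 - (learning_rate) * (dy/dx)
      x1 = -3 - (0.01) * (2 * (-3 + 5))
      x1 = -3 - 0.04 = -3.04
      Without the learning rate, the step would be the full derivative: -3 - 4 = -7, jumping past the minimum at -5.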

  • @Hasasinful
    @Hasasinful 3 years ago

    Thanks, just what I needed!

  • @diegososa5280
    @diegososa5280 3 years ago

    Thank you very much!

  • @MuditDahiya
    @MuditDahiya 4 years ago

    Very nice explanation!!

  • @mohsinjunaid8454
    @mohsinjunaid8454 1 year ago

    Thanks!

  • @govardhan3099
    @govardhan3099 3 years ago

    Greatly explained...

  • @harshithbangera7905
    @harshithbangera7905 3 years ago +1

    How do we know -5 is the global minimum... is it when the gradient or derivative becomes 0?

    • @explovictinischool2234
      @explovictinischool2234 1 year ago

      Hello, better late than never.
      Let's assume we have reached -5 at step Xn. However, we don't yet know that we have reached the local minimum.
      We perform another step Xn+1 with the formula, which gives:
      Xn+1 = Xn - (learning_rate) * (dy/dx)
      Xn+1 = -5 - (0.01) * (2 * (-5 + 5))
      Xn+1 = -5 - (0.01) * 0
      Xn+1 = -5
      And so we have Xn+1 = Xn, which means we cannot progress any further, i.e. we have reached the local minimum. (A small stopping-rule sketch follows this thread.)
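      A minimal sketch of that stopping rule, assuming the same example function: iterate until an update no longer changes x by more than a small tolerance (i.e. the gradient is effectively 0).

```python
# Stop when x barely moves between iterations: dy/dx ~ 0, a (local) minimum.
# Assumes the video's example f(x) = (x + 5)^2, so dy/dx = 2(x + 5).

def gradient(x):
    return 2 * (x + 5)

x, lr, tol = -3.0, 0.01, 1e-8
steps = 0
while True:
    x_next = x - lr * gradient(x)
    steps += 1
    if abs(x_next - x) < tol:  # the update no longer changes x
        break
    x = x_next

print(f"Stopped after {steps} steps at x = {x_next:.6f}")  # ~ -5
```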

  • @dtakamalakirthidissanayake9770
    @dtakamalakirthidissanayake9770 4 years ago

    Thank You So Much. Great Simple Explanation!!!

  • @jimyang8824
    @jimyang8824 4 years ago

    Good explanation!

  • @moazelsawaf2000
    @moazelsawaf2000 4 years ago +1

    Thanks a lot, sir!

  • @AlfredEssa
    @AlfredEssa 3 years ago

    Good job!

  • @codingtamilan
    @codingtamilan 4 years ago

    How did you draw that curve so it is fixed at -5?
    Is its centre always at -5?

    • @blendaguedes
      @blendaguedes 4 years ago +1

      First you decide which your loss function will be. In his case it was (5+x)^2, or x^2 + 10x + 25. Then you program gradient descent to find the minimum of the function. It depends on your function.

    • @codingtamilan
      @codingtamilan 4 years ago +1

      @@blendaguedes Thank you... pleasure to meet you!

  • @MrAnindyabanerjee
    @MrAnindyabanerjee 4 years ago

    Thank you