Gradient Descent Implementation from Scratch in Python

  • Published 15 Sep 2024
  • In this video we show how you can implement the batch gradient descent and stochastic gradient descent algorithms from scratch in Python.
    ** SUBSCRIBE:
    www.youtube.co...
    You can find the Jupyter Notebook for this video on our Github repo here: github.com/end...
    ** Gradient descent for linear regression video: • Linear Regression with...
    ** Follow us on Instagram for more endless engineering:
    / endlesseng
    ** Like us on Facebook:
    / endlesseng
    ** Check us out on twitter:
    / endlesseng

COMMENTS • 50

  • @Pmarmagne
    @Pmarmagne 3 years ago +2

    I'm currently reading Python Machine Learning by Sebastian V.
    I wish that all the code in this book was as clear as it is in your video.
    Thank you for posting it

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you! I am glad you found this video clear and useful. Please let me know if there are other topics you would like to see videos on

  • @arungandhi5612
    @arungandhi5612 3 years ago +1

    Excellent! We absolutely need more implementations of these algorithms.

  • @neonlearn
    @neonlearn 5 years ago +1

    Hey, this was extremely useful 😍😍😍

  • @sudiptosen3418
    @sudiptosen3418 4 years ago +1

    Best explanation EVER!
    SUBSCRIBED!!

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching Rohan! I am glad you found the video useful

  • @sandyjust
    @sandyjust 4 years ago +2

    The stochastic gradient descent implementation is not correct. You are supposed to shuffle the data and, on every iteration, pick only one random data point (or several, in the mini-batch case) for calculating the parameter (theta) update.
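
    (A minimal sketch of the per-sample update this comment describes; the
    names X, y, theta, and alpha are illustrative, not the notebook's own:)

        import numpy as np

        def sgd(X, y, alpha=0.01, n_epochs=50, seed=0):
            """Stochastic gradient descent for the line y ~ theta0 + theta1 * x."""
            rng = np.random.default_rng(seed)
            theta = np.zeros(2)
            for _ in range(n_epochs):
                # Reshuffle the sample order at the start of every epoch
                for i in rng.permutation(len(X)):
                    x_bar = np.array([1.0, X[i]])   # augmented input [1, x_i]
                    error = x_bar @ theta - y[i]    # residual for ONE sample
                    theta -= alpha * error * x_bar  # update from that sample alone
            return theta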

  • @musabgulfam4229
    @musabgulfam4229 4 years ago +1

    How is h(x_i) = theta^T x_bar? Please explain. Thanks in advance.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      The formulation of the model is derived in detail in my Linear Regression video; see the explanation here: ua-cam.com/video/fkS3FkVAPWU/v-deo.html
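
      (For readers of the thread, the convention being asked about, as a
      minimal sketch with illustrative numbers:)

          import numpy as np

          theta = np.array([0.5, 2.0])   # [theta_0 (intercept), theta_1 (slope)]
          x_i = 3.0                      # a raw scalar feature
          x_bar = np.array([1.0, x_i])   # x_i augmented with a leading 1
          h = theta @ x_bar              # h(x_i) = theta^T x_bar = theta_0 + theta_1 * x_i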

  • @rishabhjain2559
    @rishabhjain2559 4 years ago +2

    Thank you for amazing explanation

  • @valentinfontanger4962
    @valentinfontanger4962 4 years ago +1

    I love the format !

  • @ravikirankb
    @ravikirankb 3 years ago +1

    Please zoom into the notebook for better visibility

  • @ericazombie793
    @ericazombie793 3 years ago +1

    no "while" loop in your stochastic gradient decent function??? What happened there?

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Hi Ying, not sure I understand your question. There is a for loop in the stochastic gradient descent function

    • @EndlessEngineering
      @EndlessEngineering  2 years ago

      @Nutty Jedi the function used to split the data is sklearn.model_selection.train_test_split; that function shuffles the data by default. See the link below for the function's documentation on the sklearn page.
      scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

  • @lailazahra3022
    @lailazahra3022 2 years ago

    Very well explained.
    It would be much appreciated if you implemented the Structure from Motion algorithm,
    or suggested a good resource for learning how to do that.
    Thanks

  • @kantconnect3626
    @kantconnect3626 3 years ago +1

    Hey..... Thanks so much for this video...... But please can you do the same with OCTAVE PROGRAMMING LANGUAGE

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Hey Kant, thanks for watching.
      The reason I chose to go with Python is that it is used a lot in the data science / machine learning community. It is also in higher demand by companies than Octave. I will think about making some future videos with Octave, but for now I think it will mostly be Python.

  • @obinnaizima9387
    @obinnaizima9387 4 years ago

    Hi, many thanks for providing these materials -- for free! However, I didn't really understand the implementation of the SGD, as it seems you didn't shuffle and randomly choose points but looped over the entire dataset. Kindly clarify.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      Hi Obinna, thanks for your question.
      I used the sklearn.model_selection.train_test_split functionality to split the data. That functionality has an option to shuffle the data, and it is True by default. See the documentation here: scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
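
      (A minimal illustration of the shuffling being discussed; shuffle=True is
      the documented default of train_test_split:)

          import numpy as np
          from sklearn.model_selection import train_test_split

          X = np.arange(10).reshape(-1, 1)   # ten ordered samples
          y = np.arange(10)

          # shuffle=True by default, so rows are randomly reordered before splitting
          X_train, X_test, y_train, y_test = train_test_split(
              X, y, test_size=0.3, random_state=42
          )
          print(y_train)   # no longer in the original 0..9 order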

  • @berkc5323
    @berkc5323 4 years ago

    How can I turn this into a polynomial model? I need to increase the complexity of the model and see the cost decrease as the complexity goes higher.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      You can still use gradient descent with a general n-th-order polynomial model; you just have to collect your data in a form that fits y = theta^T X so you can run, for example, batch gradient descent. The Wikipedia page on polynomial regression has a good example: en.wikipedia.org/wiki/Polynomial_regression
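
      (A hedged sketch of the approach this reply describes: expand each scalar
      x into polynomial features so the model stays linear in theta; the
      function name is illustrative:)

          import numpy as np

          def polynomial_design_matrix(x, degree):
              """Columns [1, x, x^2, ..., x^degree], so that y = X_poly @ theta
              is still linear in the parameters theta."""
              return np.vander(x, N=degree + 1, increasing=True)

          x = np.linspace(0.0, 1.0, 5)
          X_poly = polynomial_design_matrix(x, degree=3)   # shape (5, 4)
          # Any batch gradient descent written for y = theta^T x_bar applies
          # unchanged, with x_bar replaced by a row of X_poly.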

  • @kabilan8617
    @kabilan8617 4 years ago

    @7:11 what do you mean by params[:,iteration] = params?
    Please explain this line.
    Thank you

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      Hi Kabilan, thanks for your question.
      This line stores the value of the parameters at every iteration; since there is more than one parameter, I use [:, iteration] to store a column of parameters. You can take a look at NumPy array slicing for more information on that.
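
      (A minimal sketch of the column-slice storage pattern described here; the
      array names are illustrative, not the notebook's:)

          import numpy as np

          n_params, n_iterations = 2, 5
          history = np.zeros((n_params, n_iterations))   # one column per iteration
          params = np.zeros(n_params)

          for iteration in range(n_iterations):
              params = params + 0.1              # stand-in for a gradient step
              history[:, iteration] = params     # store this iteration's column
          # history[:, k] now holds the parameter vector after iteration k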

  • @sandyjust
    @sandyjust 4 years ago

    @10:35 is it not "params = params - alpha * gradient / num_samples"?

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Only if you make the gradient = np.array([1.0, X]) * (y_hat - y). But that is not how I have defined the problem; please see the start of the video for the mathematical description.

  • @gourabchanda9268
    @gourabchanda9268 5 years ago +1

    Sir, Thanks a lot.

  • @fuckooo
    @fuckooo 4 years ago

    Nice video, subbed

  • @geocarvalhont
    @geocarvalhont 5 years ago +1

    Thanks man, any chance you could implement fuzzy c-means (FCM)? I'm suffering trying to understand it and implement kernel fuzzy c-means (KFCM). Nice project, thanks again!

    • @EndlessEngineering
      @EndlessEngineering  5 years ago

      Thanks! Glad you enjoyed the video. I do not have one planned for FCM soon, but I will put it on my list!

  • @mjar3799
    @mjar3799 4 years ago

    Hi,
    Thanks for your nice illustration.
    In your code, you assumed that we already know the derivative of the cost function. Could you please show an example of some complicated function and then show us how to write Python code to differentiate it!
    Thanks

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hi Mohammed, thanks for your question.
      Please check out my video Linear Regression with Gradient Descent + Least Squares for all the mathematical details
      video here --> ua-cam.com/video/fkS3FkVAPWU/v-deo.html

    • @mjar3799
      @mjar3799 4 years ago

      Endless Engineering
      Thanks for your response,
      I already checked that video, and thank you again for the nice job!
      My question is how to write Python code to differentiate the cost function!

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      @@mjar3799 I am not sure I understand your question. Is it that you want Python code to compute the derivative of any cost function? That is a little out of scope here, since for linear regression we assume the cost function has a certain structure, which is why the derivative is computed mathematically. If you want code that computes the derivative of any function, you can try something like SymPy: www.sympy.org/en/index.html
      If you want numerical computation of the derivative, I would recommend just using the tools in SciPy or NumPy.
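
      (A minimal sketch of the SymPy route mentioned above, differentiating a
      single-sample squared-error cost symbolically; the symbols are illustrative:)

          import sympy as sp

          theta0, theta1, x, y = sp.symbols('theta0 theta1 x y')

          # Squared-error cost for one sample under the linear model
          cost = (theta0 + theta1 * x - y) ** 2 / 2

          # Symbolic partial derivatives with respect to each parameter
          grad0 = sp.diff(cost, theta0)   # -> theta0 + theta1*x - y
          grad1 = sp.diff(cost, theta1)   # -> x*(theta0 + theta1*x - y)
          print(grad0, grad1)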

  • @beautifulmercury
    @beautifulmercury 5 years ago +1

    what about a vectorized cost function? :D

    • @samblattner1659
      @samblattner1659 4 years ago

      Rachel Newell yeah, you can do this and it increases efficiency so much. In the video above, to calculate the cost and the gradient, he is looping over the data. You can vectorize it as follows (note the single quote as the transpose operator!): cost = (y - X*theta)'*(y - X*theta)/(2N), where y is Nx1, X is Nx2 (the i-th row is (1, x_i)), and theta is 2x1. The gradient is then grad = (X'*X*theta - X'*y)/N, so the update rule is theta = theta - alpha*grad!
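
      (The same vectorized computation as a NumPy sketch; @ and .T stand in for
      the MATLAB-style ' transpose used in the comment above:)

          import numpy as np

          def vectorized_cost_and_grad(X, y, theta):
              """Cost and gradient with no Python loop over the data.
              X is (N, 2) with rows (1, x_i); y is (N,); theta is (2,)."""
              N = len(y)
              residual = X @ theta - y               # (N,) vector of errors
              cost = residual @ residual / (2 * N)   # (y - X theta)'(y - X theta)/(2N)
              grad = X.T @ residual / N              # gradient of that cost
              return cost, grad

          # One descent step: theta = theta - alpha * grad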

  • @sandyjust
    @sandyjust 4 years ago

    Also, I think gradient = np.array([1.0, X]) * (y_hat - y)

  • @annie157
    @annie157 3 years ago

    Can you please provide your email? I have an error in my code and it has been killing me for days.

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Hi Annie, you can send an email to endlessengineeringphd@gmail.com

  • @beautifulmercury
    @beautifulmercury 5 years ago

    what about a vectorized cost function? :D

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hey Rachel! Do you mean a cost function that generates a cost vector? That would certainly be mathematically possible, but the math would get a little messy. And I am not exactly sure what that would buy you

    • @TheGheezoinky
      @TheGheezoinky 4 years ago

      A vectorised cost function apparently does all the calculations in one go rather than iteratively, so it is much faster for larger amounts of data.
      Also, I'm a noob in ML... just started studying, but I read this in my book.