Tutorial 12- Stochastic Gradient Descent vs Gradient Descent

  • Published 5 Oct 2024
  • Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
    Deep Learning Playlist: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    You can buy my book on Finance with Machine Learning and Deep Learning from the URL below
    amazon url: www.amazon.in/...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO
    3 THINGS to support my channel
    LIKE
    SHARE
    &
    SUBSCRIBE
    TO MY YouTube CHANNEL

COMMENTS • 99

  • @shashanktripathi3034
    @shashanktripathi3034 4 years ago +6

    Krish sir, your YouTube channel is just like the GITA for me: as one gets all the answers to life in the GITA, I get all my doubts cleared on your channel.
    Thank you, Sir.

    • @kartikdave659
      @kartikdave659 3 years ago

      After becoming a member, how can I get the data science material? Can you please tell me?

  • @BalaguruGupta
    @BalaguruGupta 3 years ago +14

    Amazing explanation Sir! You'll always be the hero for the AI Enthusiasts. Thanks a lot!

  • @ravindrav1895
    @ravindrav1895 2 years ago +2

    Whenever I am confused about a topic, I come back to this channel and watch your videos, and it helps me a lot, sir. Thank you, sir, for an amazing explanation.

  • @saurabhnigudkar6115
    @saurabhnigudkar6115 4 years ago +6

    Best Deep Learning playlist on youtube

  • @nagesh866
    @nagesh866 3 years ago +5

    what an amazing teacher you are. Crystal clear.

  • @lakshminarasimhanvenkatakr3754
    @lakshminarasimhanvenkatakr3754 4 years ago +3

    This is an excellent explanation; anyone can understand it, given such a granular level of detail.

  • @nitayg1326
    @nitayg1326 4 years ago +15

    My God! Finally I am clear about GD, SGD and mini-batch SGD!

  • @ajithtolroy5441
    @ajithtolroy5441 4 years ago +2

    I saw many videos but this one is quite comprehensible and informative

  • @archanamaurya89
    @archanamaurya89 3 years ago +6

    This video is such a light bulb moment for me :D Thank you so very much!!

  • @fedisalhi6320
    @fedisalhi6320 4 years ago +8

    Excellent explanation, it was really helpful thank you.

  • @Anand-uw2uc
    @Anand-uw2uc 4 years ago +9

    Good explanation! But you did not say much about when to use SGD, although you explained GD and mini-batch SGD more clearly.

    • @vishaldas6346
      @vishaldas6346 4 years ago +1

      There is not much to explain about SGD when you are talking about 1 data point at a time out of a dataset of 1000 data points.

  • @SandeepKashyap-ek2hx
    @SandeepKashyap-ek2hx 2 years ago +1

    You are a HERO sir

  • @gayathrijpl
    @gayathrijpl 1 year ago

    such a clean way of explanation

  • @guytonedhai
    @guytonedhai 1 year ago

    How are you so good at explaining 😭😭😭😭😭 Thanks a lot ♥♥♥

  • @khuloodnasher1606
    @khuloodnasher1606 4 years ago

    Really, this is the best video I've ever seen; it explains the concept better than famous schools.

  • @ArthurCor-ts2bg
    @ArthurCor-ts2bg 4 years ago

    Krish, you condense the subject most meaningfully.

  • @VVV-wx3ui
    @VVV-wx3ui 5 years ago +1

    Superb...simply superb. Understood the concept now from the loss function. Well done Krish.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago +1

    Thanks Krish. Good video. I want to use all this knowledge in my next deep learning batch by ineuron.

  • @RishikeshGangaDarshan
    @RishikeshGangaDarshan 3 years ago

    Good, good, clearly explained. Nobody can explain like this.

  • @tonyzhang2501
    @tonyzhang2501 3 years ago +1

    Thank you, it is a clear explanation. I got it!

  • @koustavdutta5317
    @koustavdutta5317 3 years ago +2

    Hi Krish, one request: like this playlist, please make long videos for the ML playlist on the loss functions and optimizers used in various ML algorithms, mainly for classification algorithms.

  • @lj123-g9d
    @lj123-g9d 2 months ago

    So simply explained

  • @Skandawin78
    @Skandawin78 5 years ago

    Your videos are an excellent reference to brush up on these concepts.

  • @severnsevern1445
    @severnsevern1445 3 years ago

    Great explanation. Very clear. Thanks!

  • @nikkitha92
    @nikkitha92 4 years ago +1

    Sir, your videos are amazing. Can you please explain the latest methodologies such as BERT and ELMo?

  • @jsverma143
    @jsverma143 4 years ago +1

    Negative and positive slopes are best explained as follows:
    on the left side of the curve the tangent makes an angle of more than 90 degrees, so the slope is negative; on the other side the angle is less than 90 degrees, so the slope is positive.
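
The sign argument in the comment above can be checked numerically. This is a minimal sketch, not code from the video: the quadratic loss with its minimum at w = 3, the names `loss`, `slope` and `step`, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def loss(w):
    """Hypothetical bowl-shaped loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def slope(w):
    """Derivative of the loss: the slope of the tangent at w."""
    return 2.0 * (w - 3.0)


def step(w, learning_rate=0.1):
    """One gradient descent update: move against the slope."""
    return w - learning_rate * slope(w)


left, right = 1.0, 5.0  # one point on each side of the minimum

# Left of the minimum the slope is negative, so the update increases w.
print("left: ", slope(left), "->", step(left))

# Right of the minimum the slope is positive, so the update decreases w.
print("right:", slope(right), "->", step(right))
```

In both cases the update moves w toward the minimum, which is why the same rule works on either side of the curve.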

  • @rabidub733
    @rabidub733 6 months ago

    thanks for this! great explanation

  • @nansonspunk
    @nansonspunk 1 year ago

    yes i really liked this explanation thanks

  • @Kurtmind
    @Kurtmind 2 years ago

    Excellent explanation Sir!

  • @allaboutdata2050
    @allaboutdata2050 4 years ago +1

    What an explanation 🧡. Great!! Awesome!!

  • @chinmaybhat9636
    @chinmaybhat9636 4 years ago

    Awesome @KrishNaik Sir.

  • @aditisrivastava7079
    @aditisrivastava7079 4 years ago +2

    Just wanted to ask if you could also suggest some good online resources that we can read to bring more clarity.

  • @taranilakshmi9680
    @taranilakshmi9680 5 years ago

    Explained very well. Thank you.

  • @rohitsaini8480
    @rohitsaini8480 1 year ago

    Sir, please solve my problem. In my view, we do gradient descent to find the best value of m (the slope in linear regression, taking b = 0). If we use all the points, then we must already know at which value of m the loss is lowest, so why do we have to use a learning rate to update the weight when we already know the best value?

  • @achrafkmout9398
    @achrafkmout9398 3 years ago

    very good explanation

  • @gauravsingh2425
    @gauravsingh2425 5 years ago

    Thanks Krish !!! very nice explanation

  • @vinuvarshith6412
    @vinuvarshith6412 1 year ago

    Top notch explanation!

  • @bijaynayak6473
    @bijaynayak6473 5 years ago +5

    Hello Sir, could you share the link to the code you explained? This video series is very nice; in a short period we can cover so many concepts. :)

  • @uttamchoudhary5229
    @uttamchoudhary5229 5 years ago +1

    Great video man 👍👍..Please keep it up. I am waiting for next videos

  • @jiayuzhou6051
    @jiayuzhou6051 4 months ago

    the only video that explains

  • @sreejus8218
    @sreejus8218 3 years ago

    If we use a sample of the outputs to find the loss, do we use its derivative to update all the weights, or only the weights of the respective output?

  • @bhavanapurohit2627
    @bhavanapurohit2627 3 years ago +2

    Hi, is it completely theoretical or will you code in further sessions?

  • @siddharthachatterjee9959
    @siddharthachatterjee9959 4 years ago

    Good attempt 👍. Please record with camera on manual focus.

  • @ankitbiswas8380
    @ankitbiswas8380 2 years ago

    You mentioned that SGD takes place in linear regression. I didn't understand that comment. Even in your linear regression videos, the mean squared error is a sum of squares over all data points. So how is SGD linked to linear regression?

  • @soheljagirdar8830
    @soheljagirdar8830 3 years ago +1

    4:17 SGD has a minimum of 256 records to find the error / minima; you said it's 1 record at a time.

    • @pramodyadav4422
      @pramodyadav4422 3 years ago +1

      I read a few articles which say that in SGD "one data point is picked at random from the whole data set at each iteration". The 256 records you're talking about may be mini-batch SGD: "It is also common to sample a small number of data points instead of just one point at each step and that is called “mini-batch” gradient descent."

    • @tejasvigupta07
      @tejasvigupta07 3 years ago

      @pramodyadav4422 Yeah, even I have read that in SGD only one data point is selected and used for the update in each iteration, instead of all of them.
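
The distinction discussed in this thread comes down to how many records feed each weight update. The following is a minimal sketch under stated assumptions, not the code from the video: the function `train`, its parameters, the learning rate and the synthetic 1000-record dataset are all made up for illustration. Only the batch sizes (all records, 1 record, 256 records) come from the comments above.

```python
import numpy as np


def train(X, y, batch_size, epochs=50, learning_rate=0.05, seed=0):
    """Fit y ~ X @ w by minimising mean squared error; returns w.

    batch_size == len(X)      -> gradient descent (all records per update)
    batch_size == 1           -> stochastic gradient descent
    1 < batch_size < len(X)   -> mini-batch SGD
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the records every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ error / len(idx)  # MSE gradient on this batch
            w -= learning_rate * grad
    return w


# Hypothetical data: 1000 records, 2 features, known weights plus small noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.01 * data_rng.normal(size=1000)

w_gd = train(X, y, batch_size=1000)   # 1 update per epoch
w_sgd = train(X, y, batch_size=1)     # 1000 updates per epoch
w_mini = train(X, y, batch_size=256)  # 4 updates per epoch
print(w_gd, w_sgd, w_mini)
```

The loop is identical in all three cases; only `batch_size` changes, which is why the three methods are usually presented as variants of one algorithm.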

  • @rababmaroc3354
    @rababmaroc3354 4 years ago

    Thank you very much for your efforts. Please, how can we solve a portfolio allocation problem using this algorithm? Please answer me.

  • @response2u
    @response2u 2 years ago

    Thank you, sir!

  • @rdf1616
    @rdf1616 4 years ago

    Good explanation! Thanks.

  • @vishaljhaveri7565
    @vishaljhaveri7565 3 years ago

    Thank you sir.

  • @r7918
    @r7918 3 years ago

    I have one question regarding this topic: this concept is applicable to linear regression, right?

  • @vineetagarwal18
    @vineetagarwal18 1 year ago

    Great Sir

  • @codewithbishal895
    @codewithbishal895 8 days ago

    Excellent

  • @akfvc8712
    @akfvc8712 3 years ago

    Great video, excellent effort. Appreciated!!

  • @minakshiboruah1356
    @minakshiboruah1356 3 years ago

    @12:02 Sir, it should be mini-batch stochastic G.D.

  • @muhammedsahalot8683
    @muhammedsahalot8683 4 months ago

    Which has the higher convergence speed, SGD or GD?

  • @syedsaqlainabatool3399
    @syedsaqlainabatool3399 3 years ago

    This is what i was looking for

  • @percyjardine5724
    @percyjardine5724 4 years ago

    thanks Krish

  • @yukeshnepal4885
    @yukeshnepal4885 4 years ago +2

    8:58, using GD it converges quickly, while using mini-batch SGD it follows a zigzag path. How??

    • @kannanparthipan7907
      @kannanparthipan7907 4 years ago +1

      In the case of mini-batch SGD we consider only some of the points, so the calculation deviates somewhat from usual gradient descent, where we consider all the values. A simple analogy: GD is like the total population and mini-batch SGD is like a sample. The two will never be equal, and the sample's distribution will always deviate somewhat from the population's.
      We can't use GD everywhere because of the computation time; mini-batch SGD gives an approximately correct result.

    • @bhargavpotluri5147
      @bhargavpotluri5147 4 years ago +1

      @kannanparthipan7907 The deviation will be there in the final output, i.e. the final converged result. The question is why we see it during the process of convergence. If we consider different samples in every epoch, then it is understandable that there can be zigzag results during convergence. But if only one sample of k records is considered, why is there a zigzag during convergence?

    • @bhargavpotluri5147
      @bhargavpotluri5147 4 years ago +2

      OK, now I got it. For every iteration the samples are picked at random, hence the zigzag. Just went through other articles.
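
The conclusion of this thread, that random sampling at every iteration causes the zigzag, can be seen by comparing gradients at a single fixed weight. This is a rough sketch with assumed values: the noisy line with slope 3, the batch size of 32 and the helper `gradient` are illustrative and do not come from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(size=1000)  # hypothetical noisy line with slope 3


def gradient(w, idx):
    """Derivative of the mean squared error of y ~ w * x over the records in idx."""
    return np.mean(2.0 * (w * x[idx] - y[idx]) * x[idx])


w = 0.0  # evaluate every gradient at the same weight

# Gradient descent: all 1000 records, so the gradient at w is one fixed number.
full = gradient(w, np.arange(1000))

# Mini-batch SGD: each iteration draws a different random batch of 32 records,
# so the gradient at the same w differs from draw to draw.
batches = [gradient(w, rng.choice(1000, size=32, replace=False)) for _ in range(5)]

print("full-batch gradient: ", full)
print("mini-batch gradients:", batches)
```

Each mini-batch gradient is a noisy estimate of the full-batch one. The steps point in roughly the right direction but not exactly, which shows up as a zigzag path toward the minimum.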

  • @funpoint3966
    @funpoint3966 6 months ago

    Please work out your camera issue; it seems to be set to autofocus, resulting in a little disturbance.

  • @manojsalunke2842
    @manojsalunke2842 4 years ago

    At 9:28 you said SGD will take more time to converge than GD; then which is faster, SGD or GD????

  • @ruchikalalit1304
    @ruchikalalit1304 4 years ago +1

    Have you made videos on the practical implementation of all this work? If so, please share the links.

  • @abhrapuitandy3327
    @abhrapuitandy3327 4 years ago

    please do tell about stochastic gradient ascent also

  • @alsabtilaila1923
    @alsabtilaila1923 3 years ago

    Great one!

  • @NaveenKumar-ts1om
    @NaveenKumar-ts1om 2 months ago

    Awesome KRISHHHHHH

  • @pareesepathak7348
    @pareesepathak7348 3 years ago

    Can you share the paper for reference, and also resources on deep learning for image processing?

  • @AjanUnderscore
    @AjanUnderscore 2 years ago

    Thank u sir 🙏🙏🙌🧠🐈

  • @goodnewsdaily-tamil1990
    @goodnewsdaily-tamil1990 2 years ago

    1000 likes for you man👏👍

  • @sathvikambati3464
    @sathvikambati3464 2 years ago

    Thanks

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks buddy

  • @aminuabdulsalami4325
    @aminuabdulsalami4325 5 years ago

    Great guy.

  • @rameshthamizhselvan2458
    @rameshthamizhselvan2458 5 years ago

    Excellent!

  • @muralimohan6974
    @muralimohan6974 3 years ago

    How can we take k inputs at the same time?

  • @khushboosoni2788
    @khushboosoni2788 1 year ago

    Sir, can you please explain the SPGD algorithm to me?

  • @phaneendra3700
    @phaneendra3700 4 years ago

    hats off man

  • @samiabidah4197
    @samiabidah4197 3 years ago

    Please, what is the difference between GD and batch GD?

  • @_JoyshreeMozumder
    @_JoyshreeMozumder 3 years ago

    What is the source of the data points?

  • @a.sharan8876
    @a.sharan8876 1 year ago

    py:28: RuntimeWarning: overflow encountered in scalar power
    cost = (1/n)*sum([value**2 for value in(y-y_predicted)])
    Hey bro, I am stuck here with this error. I could not understand the error itself; could you suggest a solution? I have just started practicing an ML algorithm.
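
The comment does not include the surrounding code, so the cause of this warning cannot be determined from it. One common cause, assumed here purely for illustration, is a learning rate that is too large: each update overshoots, the cost grows at every iteration, and with enough iterations it exceeds the floating-point range. The sketch below uses a hypothetical `gradient_descent` function, made-up data and two made-up learning rates to show the difference; it stops before the cost actually overflows.

```python
import numpy as np


def gradient_descent(x, y, learning_rate, iterations=100):
    """Fit y ~ m * x + b with plain gradient descent; returns m, b and the cost history."""
    m = b = 0.0
    n = len(x)
    costs = []
    for _ in range(iterations):
        y_predicted = m * x + b
        # Same mean squared error as in the comment, written in vectorised form.
        costs.append(np.mean((y - y_predicted) ** 2))
        m -= learning_rate * (-2.0 / n) * np.sum(x * (y - y_predicted))
        b -= learning_rate * (-2.0 / n) * np.sum(y - y_predicted)
    return m, b, costs


# Hypothetical data lying exactly on y = 2x + 3.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])

_, _, diverging = gradient_descent(x, y, learning_rate=0.5)    # assumed too large
_, _, converging = gradient_descent(x, y, learning_rate=0.01)  # assumed small enough

print("cost with learning_rate=0.5: ", diverging[0], "->", diverging[-1])
print("cost with learning_rate=0.01:", converging[0], "->", converging[-1])
```

If the cost printed at each iteration keeps growing rather than shrinking, reducing the learning rate or scaling the input features is worth trying first.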

  • @praneethcj6544
    @praneethcj6544 4 years ago

    Perfect ..!!!

  • @ting-yuhsu4229
    @ting-yuhsu4229 4 years ago

    You are AWESOME! :)

  • @RaviRanjan_ssj4
    @RaviRanjan_ssj4 4 years ago

    great video !!

  • @thanicssubakar6303
    @thanicssubakar6303 5 years ago +1

    Nice bro

  • @shubhangiagrawal336
    @shubhangiagrawal336 3 years ago

    good video

  • @atchutram9894
    @atchutram9894 5 years ago

    Switch off the autofocus feature in your camera. It is distracting.

  • @shekharkumar1902
    @shekharkumar1902 4 years ago

    Confusing one !

  • @devaryan2201
    @devaryan2201 2 years ago

    Do change your method of teaching. It seems like someone has read a book and is just trying to copy that content from one side... Use your own ideas for it
    :)

  • @chalapathinagavarmabhupath8432
    @chalapathinagavarmabhupath8432 5 years ago

    Your videos are good but the camera was bad.