23. Accelerating Gradient Descent (Use Momentum)

  • Published 22 Nov 2024

COMMENTS • 48

  • @gigik64
    @gigik64 5 years ago +40

    Jesus man, I remember back before I started college when I checked out Prof Strang’s calculus series.
    He’s aged quite a lot since that series, but he’s still sharp as a tack. And I’m astonished that even at his age he knows so much about machine learning; I didn’t think it was his field.
    Huge kudos, Gilbert Strang, huge kudos.

    • @marsag3118
      @marsag3118 3 years ago +5

      Impressive indeed. I'd be happy to be 50% as sharp at that age as he is here.

  • @georgesadler7830
    @georgesadler7830 3 years ago +3

    Professor Strang, thank you for an old-fashioned lecture on Accelerating Gradient Descent.
    These topics are very theoretical for the average student.

  • @Arin177
    @Arin177 1 year ago

    Those who have the sixth edition of Introduction to Linear Algebra can enjoy this course!!! In my view this course really increases the value of the book.

  • @franzdoe5558
    @franzdoe5558 4 years ago +6

    Such a great lecturer, just as in his classic Linear Algebra lecture series. Really nice to see him up and healthy, as sharp and as great a step-by-step explainer as ever.

  • @nguyenbaodung1603
    @nguyenbaodung1603 3 years ago +1

    I'm so happy to see you here. I only trust you when it comes to lectures.

  • @marjavanderwind4251
    @marjavanderwind4251 5 years ago +4

    Wow, this old man is so smart. I wish I could see more lectures from him and learn much more of this stuff.

    • @yefetbentili128
      @yefetbentili128 5 years ago +1

      Absolutely! This man is a pure treasure.

    • @mdrasel-gh5yf
      @mdrasel-gh5yf 4 years ago +1

      Check out his linear algebra course; it is one of MIT's most-liked playlists.
      ua-cam.com/video/7UJ4CFRGd-U/v-deo.html

  • @dengdengkenya
    @dengdengkenya 5 years ago +20

    Why are there not more comments for such a great course? MIT is a great university!

  • @honprarules
    @honprarules 4 years ago +3

    He radiates knowledge. Love the content!

  • @yubai6549
    @yubai6549 4 years ago +2

    Wishing the old gentleman good health. Thank you very much!

  • @何浩源-r2y
    @何浩源-r2y 5 years ago +1

    Prof Boyd is also a very good teacher!
    I enjoy his lectures very much.

  • @vaisuliafu3342
    @vaisuliafu3342 3 years ago +4

    Such great lecturing makes me wonder how much of MIT students' success is due to innate ability and how much is due to superior teaching.

    • @PrzemyslawSliwinski
      @PrzemyslawSliwinski 2 years ago +3

      In terms of this very lecture: think of the professor as the gradient, with your ability being the momentum. ;)

  • @MsVanessasimoes
    @MsVanessasimoes 3 years ago +1

    I loved this amazing lecture. Great professor, and great content. Thanks for sharing it openly on YouTube.

  • @newbie8051
    @newbie8051 1 year ago

    A tough course to follow, from what I feel (I'm currently in my 4th semester of undergrad).
    Great lecture by Prof. Strang; I feel kinda dumb after listening to it, will try again.

  • @casual_dancer
    @casual_dancer 1 year ago

    Finally a lecture that explains the magic numbers in momentum! Those shorter video formats are great for an introduction but leave me confused about the math behind it. Love the ground-up approach to explaining.
    Could anyone tell me what the book that Professor Strang mentioned at 06:53 of the lecture is?

    • @scotts.9460
      @scotts.9460 1 year ago

      web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
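
For reference, the "magic numbers" this comment asks about are plausibly the classical optimal momentum parameters for a quadratic whose Hessian eigenvalues lie in [m, M]; a sketch in that notation (the exact constants are my reconstruction, not a quote from the lecture):

$$
s = \left(\frac{2}{\sqrt{M}+\sqrt{m}}\right)^{2},
\qquad
\beta = \left(\frac{\sqrt{M}-\sqrt{m}}{\sqrt{M}+\sqrt{m}}\right)^{2}.
$$

These bring the per-step convergence factor down from $(M-m)/(M+m)$ for plain gradient descent to $(\sqrt{M}-\sqrt{m})/(\sqrt{M}+\sqrt{m})$; with $m = b$ and $M = 1$ that is $(1-\sqrt{b})/(1+\sqrt{b})$.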

  • @brendawilliams8062
    @brendawilliams8062 3 years ago

    It’s nice you got it on a linear line.

  • @meow75714
    @meow75714 4 years ago +1

    Wow, beautiful, now I see why it oscillates.
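
A minimal numerical sketch of that oscillation, assuming the lecture's model problem $f(v) = \tfrac12 v^{\mathsf T} S v$ with $S = \mathrm{diag}(1, b)$ and $b \ll 1$ (the step sizes and the momentum form below are my reconstruction, not code from the course):

```python
import numpy as np

# Model problem: f(v) = 0.5 * v^T S v with S = diag(1, b), b << 1,
# so the condition number of S is 1/b and plain gradient descent zig-zags.
b = 0.01
S = np.diag([1.0, b])

def grad(v):
    return S @ v  # gradient of 0.5 * v^T S v

def plain_gd(v0, s, steps):
    # With the classical step s = 2/(M+m) = 2/(1+b), the first coordinate
    # flips sign every iteration (oscillation) while the second creeps to 0.
    v = v0.copy()
    history = [v.copy()]
    for _ in range(steps):
        v = v - s * grad(v)
        history.append(v.copy())
    return np.array(history)

def momentum_gd(v0, s, beta, steps):
    # Momentum in the lecture's form: z_{k+1} = beta*z_k + grad(x_k),
    # then x_{k+1} = x_k - s*z_{k+1}.
    v, z = v0.copy(), np.zeros_like(v0)
    for _ in range(steps):
        z = beta * z + grad(v)
        v = v - s * z
    return v

v0 = np.array([1.0, 1.0])
print(plain_gd(v0, s=2 / (1 + b), steps=5)[:, 0])      # +1, -0.98, +0.96, ... : oscillation
s_opt = (2 / (1 + np.sqrt(b))) ** 2                    # optimal step size
beta_opt = ((1 - np.sqrt(b)) / (1 + np.sqrt(b))) ** 2  # optimal momentum
print(np.linalg.norm(momentum_gd(v0, s_opt, beta_opt, steps=100)))  # tiny: converged
```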

  • @antaresd1
    @antaresd1 4 years ago

    Crystal clear! Thank you very much for sharing it

  • @RLDacademyGATEeceAndAdvanced
    @RLDacademyGATEeceAndAdvanced 2 years ago

    Excellent lecture

  • @Schweini8
    @Schweini8 9 months ago

    Why is it enough to assume x follows an eigenvector to demonstrate the rate of convergence?

  • @alessandromarialaspina9997
    @alessandromarialaspina9997 2 years ago

    Can this procedure be expanded to deal with problems in multiple dimensions? So a, b, c, and d are not scalars but actually vectors themselves, representing the inputs x1, x2, x3 to a function f(x1, x2, x3). How would you form R that way, and would you have different condition numbers for each element of b?
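
A sketch of how the analysis extends to n dimensions (my reading of the setup, not a worked example from the lecture): expand everything in the eigenbasis of the symmetric matrix S, and the iteration decouples into independent 2x2 blocks, one per eigenvalue, so there is a single condition number for S rather than one per component:

$$
S q_i = \lambda_i q_i,\qquad
x_k = \sum_{i} c_{k,i}\, q_i,\qquad
z_k = \sum_{i} d_{k,i}\, q_i
\;\Longrightarrow\;
\begin{bmatrix} c_{k+1,i} \\ d_{k+1,i} \end{bmatrix}
= R(\lambda_i)
\begin{bmatrix} c_{k,i} \\ d_{k,i} \end{bmatrix},
$$

where $R(\lambda)$ is the same 2x2 one-direction matrix as in the scalar analysis (written out further down this page). The full iteration matrix is block diagonal, $\operatorname{diag}\!\big(R(\lambda_1),\dots,R(\lambda_n)\big)$, the overall rate is $\max_i \rho\big(R(\lambda_i)\big)$, and the single condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ of $S$ sets the best achievable rate.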

  • @vnpikachu4627
    @vnpikachu4627 3 years ago +2

    At 27:00, why follow the direction of an eigenvector? It just comes out of nowhere.

    • @ky8920
      @ky8920 3 years ago

      I think it has something to do with PCA.

    • @e2DAiPIE
      @e2DAiPIE 1 year ago +1

      Can anyone provide some clarification here?
      I think why we would like to follow an eigenvector is made clear, but what's not clear to me is why we expected this to work before deriving the result (that f decreases faster).
      I can see that following an eigenvector reduces the problem of inverting a block matrix containing the original S to just inverting a much smaller matrix of scalars. So maybe this strategy was just wishful thinking that paid off?
      Insight would be very welcome. Thanks.

    • @Schweini8
      @Schweini8 9 months ago

      @@e2DAiPIE Maybe: if you can show that the method converges in every eigenvector direction, then it also converges with at least the same rate in every other direction (since any vector x can be written as a linear combination of the eigenvectors of S).
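
That superposition argument can be made concrete. A sketch, assuming the update form used in the lecture, $z_{k+1} = \beta z_k + \nabla f(x_k)$ and $x_{k+1} = x_k - s\, z_{k+1}$ (the matrix entries below are my reconstruction): substituting $x_k = c_k q$ and $z_k = d_k q$ with $Sq = \lambda q$, so that $\nabla f(x_k) = S x_k = \lambda c_k\, q$, closes the recursion in just two scalars,

$$
\begin{aligned}
d_{k+1} &= \beta d_k + \lambda c_k,\\
c_{k+1} &= c_k - s\, d_{k+1} = (1 - s\lambda)\, c_k - s\beta\, d_k,
\end{aligned}
\qquad\text{i.e.}\qquad
\begin{bmatrix} c_{k+1} \\ d_{k+1} \end{bmatrix}
=
\begin{bmatrix} 1 - s\lambda & -s\beta \\ \lambda & \beta \end{bmatrix}
\begin{bmatrix} c_k \\ d_k \end{bmatrix}
= R
\begin{bmatrix} c_k \\ d_k \end{bmatrix}.
$$

So each eigendirection evolves on its own, and the eigenvalues of R (not of S) give the per-step contraction factor in that direction; checking eigenvectors therefore loses no generality.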

  • @itay4178
    @itay4178 4 years ago

    Such a great lecturer. Thank you!

  • @ShadowGamer-qy7ls
    @ShadowGamer-qy7ls 2 years ago

    That guy who is always taking the photos

  • @archibaldgoldking
    @archibaldgoldking 3 years ago

    β is just the momentum :)

  • @vishalpoddar
    @vishalpoddar 4 years ago +1

    Why do we need to make the eigenvector as small as possible?

    • @samymohammed596
      @samymohammed596 3 years ago

      You mean why are we trying to make the eigenvalue as small as possible? I am also wondering the same... if we make the eigenvalues of R small, then R^k --> 0 as k --> ∞, and you end up with c_k, d_k --> 0, and what good is that? I am surely missing a few parts of this story...

    • @0ScarletBlood0
      @0ScarletBlood0 3 years ago +3

      @@samymohammed596 1) If, on the contrary, the powers of R were increasing, the new values of c_k, d_k would increase with them, meaning that x_k = c_k*q would never settle at the minimum of the function but diverge from it.
      2) You do want the value of d_k to approach zero: that means z_k = d_k*q = 0, which makes x_(k+1) = x_k, so the point of convergence is found at the minimum of the function.
      It's true that R^k --> 0 as k --> ∞, but we do not compute these values that many times! Taking this into account, R^k*[c_0, d_0] is not [0, 0].

    • @samymohammed596
      @samymohammed596 3 years ago +1

      @@0ScarletBlood0 Ah, of course you are right about wanting d_k = 0! :):) Thanks for making that point clear!
      I certainly see the issue with the powers of R increasing and then causing immediate divergence. Yes, better for the eigenvalues of R to have magnitude < 1, because then at least you don't start off with divergence...
      But then you might hit zero... I guess you need a little skill to pick the parameters s, beta to ensure that your problem is well defined, so that you reach convergence (d_k = 0) before the powers of R run away and make the whole thing zero! Just my 2 cents... but thanks very much for your reply!

    • @ky8920
      @ky8920 3 years ago

      @@samymohammed596 That matrix has full rank as long as β != 0.

    • @brendawilliams8062
      @brendawilliams8062 3 years ago

      All I know is it’s based on symmetry and the remaining 5 will be at the end of the spool.
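
To make the convergence question in this thread concrete: c_k, d_k --> 0 is exactly what we want, because the minimizer of the model f = (1/2) x^T S x is x* = 0, and what the parameters s, β control is the spectral radius of R. A small numerical check (the matrix R and the optimal s, β follow the reconstruction sketched earlier on this page; treat both as assumptions):

```python
import numpy as np

# Spectral radius of the 2x2 one-direction iteration matrix R(lam) for
# momentum on a quadratic: rho(R) < 1 in every eigendirection of S means
# c_k, d_k --> 0, i.e. x_k converges to the minimizer x* = 0.
def spectral_radius(lam, s, beta):
    R = np.array([[1.0 - s * lam, -s * beta],
                  [lam,            beta]])
    return max(abs(np.linalg.eigvals(R)))

b = 0.01                                             # eigenvalues of S lie in [b, 1]
s = (2 / (1 + np.sqrt(b))) ** 2                      # optimal step size
beta = ((1 - np.sqrt(b)) / (1 + np.sqrt(b))) ** 2    # optimal momentum

# Worst case over all eigendirections matches the predicted rate (1-√b)/(1+√b).
worst = max(spectral_radius(lam, s, beta) for lam in np.linspace(b, 1.0, 1001))
print(worst, (1 - np.sqrt(b)) / (1 + np.sqrt(b)))    # both approximately 0.818
```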

  • @anarbay24
    @anarbay24 4 years ago

    Why is f equal to (1/2) x^T S x? The prof did not explain what S is. Does anyone know what it is?

    • @sheelaagarwal3392
      @sheelaagarwal3392 4 years ago

      See lecture 22 for the definition.

    • @ky8920
      @ky8920 3 years ago

      This subchapter is limited to convex functions. Convexity provides a nice property: a local minimum is also the global minimum.
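
For anyone landing here without Lecture 22: S appears to be the symmetric positive definite matrix of the quadratic model problem (my paraphrase of that setup, with the linear term a possibly set to zero in this lecture):

$$
f(x) = \tfrac{1}{2}\, x^{\mathsf T} S x - a^{\mathsf T} x,
\qquad
\nabla f(x) = S x - a,
\qquad
x^{*} = S^{-1} a .
$$

Positive definiteness ($x^{\mathsf T} S x > 0$ for all $x \neq 0$) is what makes f strictly convex, so the local minimum found by descent is the global one; with $a = 0$ the minimizer is simply $x^{*} = 0$.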

  • @omaribrahim3370
    @omaribrahim3370 4 years ago

    Momentum forsenCD

  • @ostrodmit
    @ostrodmit 4 years ago +3

    Would they please stop calling Nesterov's algorithm "descent"? It's not a descent method, as Nesterov himself keeps repeating. Otherwise, a wonderful lecture, and an impressive feat for the lecturer given his age.

    • @ketan9318
      @ketan9318 4 years ago

      I agree with your point.

  • @naterojas9272
    @naterojas9272 4 years ago

    I'm back! 🤓

  • @murat7456
    @murat7456 4 years ago +1

    The boss is 85 years old and his mind is razor sharp.