22. Gradient Descent: Downhill to a Minimum

MIT OpenCourseWare

82,965 views

Published Dec 29, 2024

COMMENTS • 44

@perlaramos8783 4 years ago ⁺⁴⁰
Gradient Descent is at 34:33
@NaveenKumar-yu6eo 4 years ago
That helped, thank you.
@samirhajiyev6905 2 years ago
Thank you.
@georgesadler7830 3 years ago ⁺¹³
Professor Strang, thank you for a straightforward lecture on Gradient Descent: Downhill to a Minimum and its relationship with convex functions. The examples are important for a deep understanding of this topic in numerical linear algebra.
@tungohoang9201 3 years ago ⁺⁸
Very clear and natural to follow the lesson. Thank you so much, Professor Strang. Btw, his books are also wonderful.
@Musabbir_Sakib 4 years ago ⁺¹⁴
Very natural way of teaching. Thank you, sir.
@TheRsmits 3 years ago ⁺¹⁴
If Calc 1 introduced the term argmin for the place where the minimum occurs, there would be less confusion, as students often mistake argmin for the actual min.
@paradisal2014 3 years ago ⁺¹
Thanks for this lol
@trevandrea8909 6 months ago
Thank you so much
@martinspage 5 years ago ⁺¹⁷
His picture with grad(f) pointing up is a bit misleading around 9:00, I think. grad(f) is a vector in the x-y plane, pointing in the direction you should move in the x-y plane to maximize the increase in f.
@quocanhhbui8271 5 years ago ⁺¹
True, this has been bothering me since last year when I started Calc 3.
@RC98.19 4 years ago ⁺³
I think both of you and Prof. Strang are right. Actually, what Prof. Strang plotted on the board is a level graph (as the previous comment mentioned). Given a function f(x, y) = ax + by, we can plot the level graph by setting f(x, y) = C (some constant). If we increase the constant level by level, we can see that we're actually shifting the level graph in the direction of grad(f). That direction is perpendicular to the level graph.
In my view, Prof. Strang wanted to show that the gradient is perpendicular to the level graph. However, he didn't notice that the arrow he drew is pointing upward. This is probably what confused you.
@davidbenz2280 5 months ago ⁺²
First of all, 2x + 5y = 0 is not a plane, as Professor Strang says. Rather, it is a level "curve" of the plane described by f(x,y) = 2x + 5y (with f(x,y) set to 0). The level "curves" of a plane in 3D are parallel lines in the xy plane. Then, Professor Strang really makes an error when he says that the gradient is somehow perpendicular to the plane. No, the gradient is perpendicular to the level "curves" of the plane, that is, the parallel lines in the xy plane. And all movement to any new z value is in the plane. I also think the way he drew the plane was very confusing, as he didn't even try to approximate its actual orientation in 3D.
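A small numerical check may help settle the picture discussed in this thread. The sketch below uses the function f(x, y) = 2x + 5y from the comments; the helper name `numerical_gradient`, the test point, and the step `h` are illustrative choices, not anything from the lecture. It estimates the gradient by central differences and checks that it is perpendicular to the direction (5, -2), along which f does not change.

```python
# f(x, y) = 2x + 5y, the linear example discussed in this thread.
def f(x, y):
    return 2.0 * x + 5.0 * y


def numerical_gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at the point (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy


# Arbitrary test point (an assumption; any point gives the same gradient).
x0, y0 = 1.0, -3.0
grad = numerical_gradient(f, x0, y0)

# Direction of the level lines 2x + 5y = C in the x-y plane.
level_direction = (5.0, -2.0)

# Moving along a level line leaves f unchanged ...
change_along_level = f(x0 + level_direction[0], y0 + level_direction[1]) - f(x0, y0)

# ... and the gradient is perpendicular to that direction (dot product near 0).
dot = grad[0] * level_direction[0] + grad[1] * level_direction[1]

print("gradient estimate:", grad)
print("change along level line:", change_along_level)
print("gradient . level direction:", dot)
```

The gradient here has two components and lives in the x-y plane, which is the point the commenters make about the drawing.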
@satyamwarghat9987 5 years ago ⁺⁶
Wow, the video quality is awesome
and Professor Gilbert Strang's lecture is the best
@mkelly66 3 years ago ⁺⁶
Your lectures are a pleasure to watch (and learn from)!
@ashwinmanickam 4 years ago ⁺⁵
34:36 gradient descent
@naterojas9272 4 years ago ⁺³
Omg look at how clean those top boards are 🤩
@kirinkirin9593 5 years ago ⁺⁴
What beautiful functions. That's why I love linear algebra.
@에헤헿-l7v 1 year ago ⁺¹
Why isn't the gradient [x, by] at 42:25, since f is multiplied by 1/2?
@samuelyeo5450 4 years ago ⁺⁵
How did he get all the equations for x_k, y_k and f_k at 45:35? Specifically, how did he get (b-1)/(b+1) and vice versa? I rearranged the equation to make x_{k+1} and y_{k+1} the subjects, but instead I got x_{k+1} = x_k (1 - 2 s_k), where x_k = x_0 = b.
@theos- 4 years ago
Same question here. Please post the answer if you found it.
@zacharylee9030 4 years ago ⁺¹
I think he has already used the optimized step size (s_k) for the iteration. He didn't tell us what s_k looks like; he just showed us the final equations to explain the idea of good or bad convergence.
@ky8920 3 years ago ⁺²
At the limit [x,y]=[x,y]-[sx,bsy], and to minimize f=0.5x^2+0.5y^2 we must have |x|=|y|. So |x-sx|=|y-bsy|, which gives s=2/(1+b), and [x,y]=[x,y]-[sx,bsy]=[(1-s)x,(1-bs)y]. We get x=(b-1)/(b+1)*x_old, y=(1-b)/(b+1)*y_old, etc.
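The factors asked about in this thread can be reproduced numerically. The sketch below assumes f(x, y) = 1/2 (x^2 + b y^2), the starting point (x_0, y_0) = (b, 1), and an exact line search at every step, which matches what the replies suggest. For a quadratic with Hessian S = diag(1, b), the step that minimizes f along the negative gradient g is s = (g'g)/(g'Sg). The function names and the value b = 0.1 are illustrative choices.

```python
def gradient(x, y, b):
    """Gradient of f(x, y) = 0.5 * (x**2 + b * y**2)."""
    return x, b * y


def exact_step(x, y, b):
    """One steepest-descent step with exact line search.

    For a quadratic with Hessian S = diag(1, b), the step size that
    minimizes f along -g is s = (g . g) / (g . S g).
    """
    gx, gy = gradient(x, y, b)
    s = (gx * gx + gy * gy) / (gx * gx + b * gy * gy)
    return x - s * gx, y - s * gy, s


b = 0.1                      # illustrative value with 0 < b < 1
x, y = b, 1.0                # assumed starting point (x0, y0) = (b, 1)
r = (b - 1.0) / (b + 1.0)    # the ratio asked about in the thread

history = []
for k in range(1, 6):
    x, y, s = exact_step(x, y, b)
    history.append((k, x, y, s))
    # Closed forms to compare against: x_k = b * r**k, y_k = (-r)**k
    print(k, x, b * r**k, y, (-r) ** k, s)
```

Under these assumptions every step uses the same s = 2/(1+b), so x is multiplied by 1 - s = (b-1)/(b+1) and y by 1 - bs = (1-b)/(1+b) at each iteration.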
@RC98.19 4 years ago ⁺²
Around 40:27. Does anybody know how to derive the reduction rate in terms of m/M (the condition number)? Any tips or references?
@Hotheaddragon 4 years ago ⁺¹
By condition number I guess he meant (this is what I got)
lambda(max) / lambda(min)
max eigenvalue / min eigenvalue
which was 1/b for that example
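One standard way to obtain a reduction factor involving m/M is sketched below. It assumes the Hessian eigenvalues are bounded between m and M everywhere, and that each step uses exact line search. This is the usual textbook argument, not necessarily the one intended in the lecture. For the example in this thread, m = b and M = 1.

```latex
\begin{aligned}
&\text{Assume } mI \preceq \nabla^2 f(x) \preceq MI \text{ with } 0 < m \le M. \\[4pt]
&\text{Upper bound along the gradient direction:}\quad
  f\big(x_k - s\nabla f(x_k)\big) \le f(x_k) - s\,\|\nabla f(x_k)\|^2 + \tfrac{M s^2}{2}\,\|\nabla f(x_k)\|^2 . \\[4pt]
&\text{Exact line search does at least as well as } s = 1/M:\quad
  f(x_{k+1}) \le f(x_k) - \tfrac{1}{2M}\,\|\nabla f(x_k)\|^2 . \\[4pt]
&\text{Lower bound at the minimizer } x^*:\quad
  f(x^*) \ge f(x_k) - \tfrac{1}{2m}\,\|\nabla f(x_k)\|^2
  \;\Longrightarrow\;
  \|\nabla f(x_k)\|^2 \ge 2m\,\big(f(x_k) - f(x^*)\big). \\[4pt]
&\text{Combining the two:}\quad
  f(x_{k+1}) - f(x^*) \le \Big(1 - \frac{m}{M}\Big)\big(f(x_k) - f(x^*)\big).
\end{aligned}
```

For f = 1/2 (x^2 + b y^2) with 0 < b < 1, the actual reduction per step with exact line search from (b, 1) is ((1-b)/(1+b))^2, which is smaller than the bound 1 - b. The bound is therefore safe but not tight.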
@brainstormingsharing1309 3 years ago ⁺²
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍👍
@HieuLe-un7ll 2 years ago
I think the grad(f) at 16:00 should be 0.5(S + S transpose)x - a, right? Anyway, thank you for the amazing lecture!
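The formula in this comment can be checked component by component. The derivation below assumes the function under discussion is f(x) = 1/2 x^T S x - a^T x, as the comment implies, with the gradient written as a column vector.

```latex
\begin{aligned}
f(x) &= \tfrac12\, x^{T} S x - a^{T} x
      = \tfrac12 \sum_{i,j} S_{ij}\, x_i x_j - \sum_i a_i x_i , \\[4pt]
\frac{\partial f}{\partial x_k}
     &= \tfrac12 \sum_j S_{kj}\, x_j + \tfrac12 \sum_i S_{ik}\, x_i - a_k , \\[4pt]
\nabla f(x) &= \tfrac12\,(S + S^{T})\,x - a , \\[4pt]
\nabla f(x) &= S x - a \quad \text{when } S = S^{T}.
\end{aligned}
```

So the general formula in the comment is correct, and it reduces to Sx - a whenever S is symmetric, which is presumably the case the lecture has in mind.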
@nadeemqaiser 1 year ago
Thanks, Teacher!
@TheNeutralGuy0 3 months ago
For those who have not taken the previous 22 lectures, this lecture won't help much.
@finweman 5 years ago ⁺¹
I am hoping for a discussion about conventions for derivatives. Much of the material I've seen makes the gradient a row vector, which leads to the derivatives being the transpose of what he shows. In his example, the derivative of a'x is 'a', which is contrary to intuition from single-variable calculus, though he uses that intuition for x'Sx.
@tommy-lee-johnes 3 years ago
To get the intuition, write out the product a'x and then differentiate with respect to x. You will get a.
@Anskurshaikh 2 years ago
Same. I feel a lot of people are using different notations for these vector/matrix derivatives. Nobody takes the time to elaborate on the details :(
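The convention question in this thread can be made concrete with a finite-difference check. The sketch below takes the column-vector gradient convention the commenters attribute to the lecture. The vectors `a` and `x`, the symmetric matrix `S`, and all helper names are made-up illustrations. It checks that the gradient of a'x has components a_i, and that the gradient of x'Sx is 2Sx for symmetric S, mirroring d/dx(ax) = a and d/dx(sx^2) = 2sx in one variable.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(S, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in S]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


# Illustrative data (assumptions, not from the lecture).
a = [2.0, -1.0, 3.0]
S = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]   # symmetric
x = [0.5, -1.5, 2.0]

grad_linear = numerical_gradient(lambda v: dot(a, v), x)
grad_quadratic = numerical_gradient(lambda v: dot(v, matvec(S, v)), x)
expected_quadratic = [2.0 * c for c in matvec(S, x)]

print("gradient of a'x:", grad_linear, "expected:", a)
print("gradient of x'Sx:", grad_quadratic, "expected:", expected_quadratic)
```

In the row-vector (Jacobian) convention the same numbers appear as a' and 2x'S; only the layout changes, not the entries.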
@nabeelali6721 5 years ago ⁺¹
Wonderful teaching
@gopalkulkarni402 3 years ago ⁺⁴
Isn't grad(f) supposed to be [x by] instead of [2x 2by]?
@Andrew_J123 3 years ago ⁺³
Yes, I had the same objection. I think he glossed over the 1/2 in the function. It's a multiple of the same vector, so in the grand scheme of things I don't think it matters too much, but that said, having [x by] would have eased my mind.
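For reference, the partial derivatives can be written out directly. This assumes the function on the board is f(x, y) = 1/2 (x^2 + b y^2), as the comments indicate.

```latex
\begin{aligned}
f(x, y) &= \tfrac12\left(x^{2} + b\,y^{2}\right), \\[4pt]
\frac{\partial f}{\partial x} &= x, \qquad
\frac{\partial f}{\partial y} = b\,y, \\[4pt]
\nabla f(x, y) &= \begin{bmatrix} x \\ b\,y \end{bmatrix}.
\end{aligned}
```

With an exact line search, replacing the gradient by twice the gradient leaves the search direction unchanged and only halves the optimal step size, so the iterates come out the same either way.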
@shenzheng2116 4 years ago
At 26:51, the professor writes Gradient(f) = entries of X^-1. Does anyone know how to get that equation? Thanks!
@samuelyeo5450 4 years ago ⁺¹
If f(X) = -ln(det(X)), gradient(f) = (derivatives of det(X))/det(X) in matrix form, which is the same as a matrix of the entries of X^-1. I'm not too certain myself, but this does make sense to me.
@yuchaoli6385 4 years ago
en.wikipedia.org/wiki/Adjugate_matrix this gives the answer
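The relationship discussed in this thread can be checked numerically on a small case. The sketch below assumes f(X) = -ln(det X), as stated in the reply, and uses a made-up symmetric 2x2 matrix with positive determinant. All helper names are illustrative. Treating the four entries as independent variables, the partial derivative with respect to X_ij should equal -(X^-1)_ji, so the sign and transpose are worth tracking even though the transpose disappears for symmetric X.

```python
import math


def det2(X):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def inv2(X):
    """Inverse of a 2x2 matrix via the adjugate divided by the determinant."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]


def f(X):
    return -math.log(det2(X))


def numerical_gradient(X, h=1e-6):
    """Central-difference estimate of df/dX_ij, entries treated as independent."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


# Illustrative symmetric matrix with det = 1.75 > 0 (an assumption).
X = [[2.0, 0.5],
     [0.5, 1.0]]

G = numerical_gradient(X)
X_inv = inv2(X)

for i in range(2):
    for j in range(2):
        print(i, j, G[i][j], -X_inv[j][i])
```

The adjugate link in the thread explains why this works: each partial derivative of det X is a cofactor, and the cofactors divided by det X are the entries of the inverse (transposed).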
@ronsreacts 4 months ago
💯👍
@jerrywilsonwilliams2431 4 years ago
❤️❤️❤️❤️❤️
@pnachtwey 1 year ago
He is too long-winded. Why not use a simple function of x, y? Find the derivatives and start doing a few iterations. Finally he gets to gradient descent. Gradient descent works, but there are better algorithms. The line search idea is a good start. WTF is wrong with this guy? A simple Python program or even Excel would be much more meaningful. Thumbs down.
@SuperDeadparrot 1 year ago
What the hell is a Hessian?
@John-wx3zn 9 months ago
This sounds like a bunch of nonsense.
@beloaded3736 1 year ago
Wonderful fella professor ☺️
