As a noob I just wanted to see what loss functions in more complex networks look like. Was not disappointed.
@18:00 the speaker, Tom Goldstein, is answering a question: Is this the whole error surface? His answer contains good news and bad news. It's good news insofar as it's pretty far relative to the weights involved, but adding skip connections does not convert it to a convex optimization problem. At least that's what I get from the question and the answer.
Great lecture! A very clear explanation of how the network influences the loss function.
Great video summary of all the work in the Maryland lab!
Thank you professor!! I love this video.
38:45 Why do we find saddle points? How do we apply saddle points in research?
Saddle points are points where some directions curve upwards and others downwards. But why are they useful? Good question. I looked it up:
“one of the reasons neural network research was abandoned (once again) in the late 90s was *because the optimization problem is non-convex*. The realization from the work in the 80s and 90s that neural networks have an exponential number of local minima, along with the breakout success of kernel machines, also led to this downfall, as did the fact that networks may get stuck on poor solutions. Recently we have evidence that the issue of non-convexity may be a non-issue, which changes its relationship vis-a-vis neural networks.”
What does this mean? Well, say we want to average the values in some neighborhood which is in n-dimensional space. But we can’t just compute the Gaussian kernel because it becomes (potentially exponentially) worse as we go up in dimensions. So we need to unfold the manifold to a 2d Euclidean space (a flat coordinate system). What’s the issue? Local minima (areas which look like minima in a restricted region of a function) can get our averaging machine stuck as it applies a stochastic gradient descent algorithm. And there are exponentially many local minima in a neural network in general, so we worry that there’s no guarantee of optimization with neural networks at all. Well shit.

The thing is, though, that almost all of the critical points of high-dimensional surfaces are saddle points, not local minima. Saddle points pose no problem to stochastic gradient descent. And if there is any randomness in our data, it’s exponentially likely that all the local minima are close to the global minima. Therefore local minima are not a problem.
Basically, saddle points are the highly prevalent critical points in parameter space that don’t pose a problem for the algorithms and architectures we want to use. Local minima do pose a problem, but we’ve found that in high dimensions they are only in certain places (near global minima). So you can’t use saddle points in your data for anything special; it’s just that a lot of algorithms (like Newton, gradient descent, and quasi-Newton) treat saddle points like local minima and thus get stuck much more often than they should. (A side note: there’s something called “saddle-free Newton,” written about in 2014, but it’s been seen that SGD works just as well without needing to compute a Hessian over a lot of parameters.) Hope that helps a bit. A tiny numerical sketch of the saddle-vs-minimum behaviour follows below.
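To make that concrete, here is a small toy example of my own (not from the video): plain gradient descent started exactly on the ridge of f(x, y) = x² − y² converges to the saddle at the origin and stalls, while adding a little gradient noise (the “stochastic” part of SGD) pushes the iterate off the saddle.

```python
# Toy sketch: plain vs. noisy gradient descent on f(x, y) = x^2 - y^2,
# whose only critical point (0, 0) is a saddle, not a minimum.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])   # df/dx = 2x, df/dy = -2y

def descend(noise_scale, steps=100, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array([1.0, 0.0])                # start exactly on the ridge y = 0
    for _ in range(steps):
        g = grad(p) + noise_scale * rng.normal(size=2)
        p = p - lr * g
    return p

print(descend(noise_scale=0.0))   # ~[0, 0]: exact GD converges to the saddle and stalls there
print(descend(noise_scale=0.01))  # x ~ 0 but |y| is large: the noise pushed it off the saddle
```

On this toy function the y-direction is unbounded below, so the noisy run just keeps descending; in a real network it would instead drift toward some lower-loss region.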
A very good and funny video, it brings a great sense of entertainment!
How much computational power does it cost to evaluate the loss landscape using this method, compared to a more naive method?
Thanks, this is a great video. Do you see any issues/fundamental differences in applying these techniques to sequence models? Is there any research doing so?
Amazing video
Interesting research. Has the code been shared?
github.com/tomgoldstein/loss-landscape
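For anyone wondering what that repo does conceptually, here is a rough, non-authoritative sketch of the paper’s filter-normalized random-direction visualization; the names `model`, `loss_fn`, and `data_loader` are placeholders, not the repo’s actual API. Two random directions are drawn, each filter is rescaled to the norm of the corresponding filter in the trained weights, and the loss is evaluated on a 2D grid in that plane.

```python
# Conceptual sketch of filter-normalized 2D loss slices (not the repo's code).
import torch

def random_filter_normalized_direction(model):
    """One random direction with each filter rescaled to the norm of the
    corresponding filter in the model's weights (filter normalization)."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:                      # conv/linear weights: normalize per filter (row)
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:                                # biases / BN params: commonly left out of the perturbation
            d.zero_()
        direction.append(d)
    return direction

@torch.no_grad()
def loss_on_grid(model, loss_fn, data_loader, steps=11, span=1.0):
    """Loss on a (steps x steps) grid in the plane spanned by two
    filter-normalized random directions around the trained weights."""
    model.eval()
    theta = [p.detach().clone() for p in model.parameters()]
    dx, dy = (random_filter_normalized_direction(model) for _ in range(2))
    alphas = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            for p, t, x, y in zip(model.parameters(), theta, dx, dy):
                p.copy_(t + a * x + b * y)   # move the weights to theta + a*dx + b*dy
            total, n = 0.0, 0
            for inputs, targets in data_loader:
                # assumes loss_fn returns the mean loss over the batch
                total += loss_fn(model(inputs), targets).item() * len(targets)
                n += len(targets)
            grid[i, j] = total / n
    for p, t in zip(model.parameters(), theta):
        p.copy_(t)                           # restore the original weights
    return grid
```

Note the cost in this sketch: a steps × steps grid means steps² full passes over whatever data you evaluate the loss on, which is why a coarse grid or a data subset is typically used (and roughly addresses the compute question asked above).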
This helps me a lot
Can I get the PDF file? Thanks.