Deep Neural Networks are usually trained from a given parameter initialization using SGD until convergence at a local optimum. This paper goes a different route: given a novel network architecture for a known dataset, can we predict the final network parameters without ever training them? The authors build a Graph Hypernetwork (GHN) and train it on a novel dataset of diverse DNN architectures to predict high-performing weights. The results show that not only can the GHN predict weights with non-trivial performance, it can also generalize beyond the distribution of training architectures, predicting weights for networks that are much larger, deeper, or wider than any seen during training. A minimal toy sketch of the training setup follows the errata below.
OUTLINE:
0:00 - Intro & Overview
6:20 - DeepNets-1M Dataset
13:25 - How to train the Hypernetwork
17:30 - Recap on Graph Neural Networks
23:40 - Message Passing mirrors forward and backward propagation
25:20 - How to deal with different output shapes
28:45 - Differentiable Normalization
30:20 - Virtual Residual Edges
34:40 - Meta-Batching
37:00 - Experimental Results
42:00 - Fine-Tuning experiments
45:25 - Public reception of the paper
ERRATA:
- Boris' name is obviously Boris, not Bori
- At 36:05, Boris mentions that they train the first variant, yet on closer examination, we decided it's more like the second
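A minimal toy sketch (PyTorch, not the authors' code) of the core idea behind training such a hypernetwork: the predicted parameters H(a; θ) are plugged directly into a functional forward pass of the target network f, so the ordinary task loss on f(x; a, H(a; θ)) backpropagates into the hypernetwork's own parameters θ. The real GHN-2 encodes each architecture as a computational graph and uses message passing to produce per-node parameters; here an architecture is reduced to a fixed random embedding vector purely for illustration, and all names (HyperNet, arch_embeddings, ...) are hypothetical.
```python
# Toy sketch: differentiable weight prediction for a tiny MLP.
# The hypernetwork's parameters theta are the only trained quantities;
# the target network's weights are predicted, never trained directly.
import torch
import torch.nn as nn
import torch.nn.functional as F

IN, HID, OUT, EMB = 32, 64, 10, 16

class HyperNet(nn.Module):
    """Maps an architecture embedding to the flat parameter vector of f."""
    def __init__(self):
        super().__init__()
        n_params = HID * IN + HID + OUT * HID + OUT
        self.net = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(),
                                 nn.Linear(128, n_params))

    def forward(self, a_emb):
        return self.net(a_emb)

def f(x, w_flat):
    """Functional forward pass of the target MLP using predicted weights,
    so gradients flow back through w_flat into the hypernetwork."""
    i = 0
    W1 = w_flat[i:i + HID * IN].view(HID, IN); i += HID * IN
    b1 = w_flat[i:i + HID]; i += HID
    W2 = w_flat[i:i + OUT * HID].view(OUT, HID); i += OUT * HID
    b2 = w_flat[i:i + OUT]
    return F.linear(F.relu(F.linear(x, W1, b1)), W2, b2)

hypernet = HyperNet()
opt = torch.optim.Adam(hypernet.parameters(), lr=1e-3)
arch_embeddings = torch.randn(8, EMB)   # stand-ins for 8 "architectures"

for step in range(100):
    a_emb = arch_embeddings[torch.randint(0, 8, (1,))]  # sample an architecture
    x = torch.randn(64, IN)                              # toy input batch
    y = torch.randint(0, OUT, (64,))                     # toy labels
    w_pred = hypernet(a_emb).squeeze(0)                  # H(a; theta)
    loss = F.cross_entropy(f(x, w_pred), y)              # L(f(x; a, H(a; theta)), y)
    opt.zero_grad()
    loss.backward()   # gradients reach only the hypernetwork's theta
    opt.step()
```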
This is so cool! Courageous first author. 💪 It's great how you provide this platform for authors to defend their papers. Excited to see more episodes like this!
Wow... Great explanation... Authors of papers should come forward like this to share the thinking behind their architectures... Excellent video.
oh cool, having the author -- Leveled up!
Loved this! Hope there will be more videos with authors helping explain the paper or possibly discussions with them on your Discord server too!
Oh yes this is bloody brilliant
I think I prefer the regular format where Yannic explains the paper himself. It feels more natural. This one had a nervous feel to it with the author present. Perhaps it would be better if Yannic explained the paper separately first and followed up with the authors later for their thoughts/perspectives.
It might just be because this is his first video doing this; he was much quieter a couple of years ago on his regular videos too.
I want both types of interaction, but would prefer a longer stretch of explanation before each author "answer"/additional exposition. The really nice thing about Yannic's videos is that he normally starts at the beginning. Letting Yannic's explanation-ordering skill guide the discussion a bit more would make this more digestible.
I agree, especially with your second point. It would be nice if the paper were explained in one segment, and then a follow-up dialog with the author could tie up any loose ends.
Love this new interview format!
I wonder whether a paper-overview monologue (classic style) followed by an interview-style discussion with an author would be an even more engaging format. Just an idea, but I think I'd love this, and maybe you like it too? :)
Borya, well done! A very original idea!
Yeah me too
Boom! Great format. I wish you had asked a bit more about f(x, a, H(a, θ)) and how they made it differentiable.
Great idea! It's much more interactive if the author presents their paper.
Nice one, really enjoyed the conversation and explanations
amazing idea Yannic, I like this a lot!
Pretty cool format!
Hopefully a new series!!!
It's harder to follow in this format; the interruptions and dead air make it difficult to focus on the key points.
I wonder what accuracy this NN is able to get when overfitting to a single batch of architectures.
this is gold
This is groundbreaking. The implication is that if we have fully developed knowledge of a task, we can experiment with different network architectures relatively easily, without having to train each one in turn. We can tear down the architectures that require a major investment of resources, replace them with the most promising of a set of more efficient candidates, and then train to refine.
Cool!
how to train a NN to predict NNs
Does it work on itself?
Statisticians have done this for the past 50 years; the only difference is using a fully connected graph to initialize the process.
It has no potential to generalize, no potential to move in the direction of AGI. Instead of fitting a curve, this model fits a graph, continuous spaces vs. discrete spaces. What is the innovation here?
How are you going to roast the authors if you are on a call with them? :D
Isn't it bor-I-s, not b-O-ris?
Didn't know you were left-handed Yannic
Deep Neural Networks Upside Down