Deep Neural Networks are usually trained from a given parameter initialization using SGD until convergence at a local optimum. This paper goes a different route: given a novel network architecture for a known dataset, can we predict the final network parameters without ever training them? The authors build a Graph Hypernetwork (GHN) and train it on a novel dataset of diverse DNN architectures to predict high-performing weights. The results show that not only can the GHN predict weights with non-trivial performance, it can also generalize beyond the distribution of training architectures, predicting weights for networks that are much larger, deeper, or wider than any seen during training. A minimal toy sketch of the training setup follows the errata below.
OUTLINE:
0:00 - Intro & Overview
6:20 - DeepNets-1M Dataset
13:25 - How to train the Hypernetwork
17:30 - Recap on Graph Neural Networks
23:40 - Message Passing mirrors forward and backward propagation
25:20 - How to deal with different output shapes
28:45 - Differentiable Normalization
30:20 - Virtual Residual Edges
34:40 - Meta-Batching
37:00 - Experimental Results
42:00 - Fine-Tuning experiments
45:25 - Public reception of the paper
ERRATA:
- Boris' name is obviously Boris, not Bori
- At 36:05, Boris mentions that they train the first variant, yet on closer examination, we decided it's more like the second
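A minimal toy sketch (PyTorch, not the authors' code) of the core idea behind training such a hypernetwork: the predicted parameters H(a; θ) are plugged directly into a functional forward pass of the target network f, so the ordinary task loss on f(x; a, H(a; θ)) backpropagates into the hypernetwork's own parameters θ. The real GHN-2 encodes each architecture as a computational graph and uses message passing to produce per-node parameters; here an architecture is reduced to a fixed random embedding vector purely for illustration, and all names (HyperNet, arch_embeddings, ...) are hypothetical.
```python
# Toy sketch: differentiable weight prediction for a tiny MLP.
# The hypernetwork's parameters theta are the only trained quantities;
# the target network's weights are predicted, never trained directly.
import torch
import torch.nn as nn
import torch.nn.functional as F

IN, HID, OUT, EMB = 32, 64, 10, 16

class HyperNet(nn.Module):
    """Maps an architecture embedding to the flat parameter vector of f."""
    def __init__(self):
        super().__init__()
        n_params = HID * IN + HID + OUT * HID + OUT
        self.net = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(),
                                 nn.Linear(128, n_params))

    def forward(self, a_emb):
        return self.net(a_emb)

def f(x, w_flat):
    """Functional forward pass of the target MLP using predicted weights,
    so gradients flow back through w_flat into the hypernetwork."""
    i = 0
    W1 = w_flat[i:i + HID * IN].view(HID, IN); i += HID * IN
    b1 = w_flat[i:i + HID]; i += HID
    W2 = w_flat[i:i + OUT * HID].view(OUT, HID); i += OUT * HID
    b2 = w_flat[i:i + OUT]
    return F.linear(F.relu(F.linear(x, W1, b1)), W2, b2)

hypernet = HyperNet()
opt = torch.optim.Adam(hypernet.parameters(), lr=1e-3)
arch_embeddings = torch.randn(8, EMB)   # stand-ins for 8 "architectures"

for step in range(100):
    a_emb = arch_embeddings[torch.randint(0, 8, (1,))]  # sample an architecture
    x = torch.randn(64, IN)                              # toy input batch
    y = torch.randint(0, OUT, (64,))                     # toy labels
    w_pred = hypernet(a_emb).squeeze(0)                  # H(a; theta)
    loss = F.cross_entropy(f(x, w_pred), y)              # L(f(x; a, H(a; theta)), y)
    opt.zero_grad()
    loss.backward()   # gradients reach only the hypernetwork's theta
    opt.step()
```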
This is so cool! Courageous first author. 💪 It's great how you provide this platform for authors to defend their papers. Excited to see more episodes like this!
Wow... Great explanation... Authors of papers should come forward like this to share the thinking behind their architectures... Excellent video.
oh cool, having the author -- Leveled up!
Loved this! Hope there will be more videos with authors helping explain the paper or possibly discussions with them on your Discord server too!
Oh yes this is bloody brilliant
I think I prefer the regular format where Yannic explains the paper himself. It feels more natural. This one had a nervous feel to it with the author present. Perhaps it would be better if Yannic explained the paper separately first and followed up with the authors later for their thoughts/perspectives.
It might just be because this is his first video doing this; he was much quieter a couple of years ago on his regular videos too.
I want both types of interaction, but would prefer a longer stretch of explanation before each author "answer"/additional exposition. The really nice thing about Yannic's videos is that he normally starts at the beginning. Letting Yannic's explanation-ordering skill guide the discussion a bit more would make this more digestible.
I agree, especially with your second point. It would be nice if the paper were explained in one segment, and then a follow-up dialog with the author could tie up any loose ends.
Love this new interview format!
I wonder whether a paper-overview monologue (classic style) followed by an interview-style discussion with an author would be an even more engaging format. Just an idea, but I think I'd love this, and maybe you like it too? :)
Borya, well done! A very original idea!
Yeah me too
Boom! Great format. I wish you had asked a bit more about f(x, a, H(a, θ)) and how they made it differentiable.
Great idea! It's much more interactive if the author presents their paper.
Nice one, really enjoyed the conversation and explanations
amazing idea Yannic, I like this a lot!
Pretty cool format!
Hopefully a new series!!!
It's harder to follow in this format; the interruptions and dead air make it difficult to focus on the key points.
I wonder what accuracy this NN is able to get when overfitting to a single batch of architectures.
this is gold
This is groundbreaking. The implication is that if we have fully developed knowledge of a task, we can experiment with different network architectures relatively easily, without having to train each one in turn. We can tear down the architectures that require a major investment of resources, replace them with the most promising of a set of more efficient candidates, and then train to refine.
Cool!
how to train a NN to predict NNs
Does it work on itself?
Statisticians have done this for the past 50 years; the only difference is using a fully connected graph to initialize the process.
It has no potential to generalize, no potential to move in the direction of AGI. Instead of fitting a curve, this model fits a graph, continuous spaces vs. discrete spaces. What is the innovation here?
How are you going to roast the authors if you are on a call with them? :D
Isn't it bor-I-s, not b-O-ris?
Didn't know you were left-handed Yannic
Deep Neural Networks Upside Down