OUTLINE:
0:00 - Intro & Overview
3:05 - Weight-generation vs Fine-tuning for few-shot learning
10:10 - HyperTransformer model architecture overview
22:30 - Why the self-attention mechanism is useful here
34:45 - Start of Interview
39:45 - Can neural networks even produce weights of other networks?
47:00 - How complex does the computational graph get?
49:45 - Why are transformers particularly good here?
58:30 - What can the attention maps tell us about the algorithm?
1:07:00 - How could we produce larger weights?
1:09:30 - Diving into experimental results
1:14:30 - What questions remain open?
Paper: arxiv.org/abs/2201.04182
ERRATA: I introduce Max Vladymyrov as Mark Vladymyrov
Long introduction was great; it is good to be able to understand, with drawings, what is actually happening.
Describing it as a buffet is exactly right for this amount of content. This makes it great for everyone: those looking for a summary, an in-depth dive, or to implement/adapt it for themselves.
Love the longer first half that's more like your earlier work. IMO the interview should be a short Q&A that lets the authors respond to the parts you were unsure about or criticized. I much prefer when the paper review is more in depth (ideally even longer than in this video).
I am a big fan of your long-introduction version. In my opinion, the way you illustrate your thinking is way more insightful than at least half of the videos in which the authors were included. In many papers, the authors could act as supplementary information for the main concepts.
Hi Yannic, I've been following your channel since the very beginning and I've always enjoyed your style. Since you're asking for comments about this new format of interviewing papers' authors, I'd like to share my 2-cent impressions. I much preferred your former style of unbiased reviews on your own, which were really professional and right to the technical points. These interviews, on the other hand, are more "deferential" and less impartial. I found your previous style much more insightful and useful. Thank you anyway for your work; your channel is my preferred one for keeping updated on the subject. I'm a senior MLE at a big telco company in Italy. Thanks!
As feedback is called for, I just wanted to say that I mostly watch the paper explanations. I like the way you explain; that's really good to have.
I gotta be honest, your explanations are the best for me because you’re very good at explaining things whereas these researchers are a little more specialized in research. I do like that you interview them though. I’d always ask a question like “how did you come up with this idea” or “what was the inspiration for this idea?”
Love your content! Keep experimenting.
Long intro was great - we get your explanation and then the interview is a bonus!
2:00 Why not both? If you're into recycling content, we could have 3 videos: the paper review, the interview with the authors, and then the paper review interleaved with comments from the authors. Everyone is happy and you get more content for the same price (minus editing, though if you script the interleaved video before the interview you already know where the commentary will be). EDIT: Oh, this video is kinda like this already.
Really appreciate the time you take to make videos like this!
I love in depth conversations that aren't afraid to be technical
Jesus Christ. What an incredible result!
Great Paper, Great Interview.
Damn I'm quick ;) Thanks for the content homie
Long explanation w interview please.
Livestream interview with chat Q&A from the viewers at the end (last 15 minutes or so) would be great. Nick Zentner has been doing long-form geology interviews for the last couple of months and it has been superlative for discovering new questions and ideas.
Regarding the comment at 8:34: in one of my projects I'm using a neural network for a regression-type problem, and I found I got much smoother interpolation by switching most of the hidden layers to use asinh as the activation function. I have no idea how general that is, or whether smoothness is even a desirable feature when you're trying to output weights for another neural network.
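A minimal sketch of what that activation swap could look like, assuming a small PyTorch regression MLP (the AsinhMLP name, layer sizes, and sample data are made up for illustration and are not from the video or paper):

import torch
import torch.nn as nn

class AsinhMLP(nn.Module):
    def __init__(self, in_dim=1, hidden=64, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        # asinh is smooth and odd, roughly linear near 0 and logarithmic for
        # large |x|, which is what the commenter credits for the smoother
        # interpolation compared to piecewise-linear activations like ReLU.
        x = torch.asinh(self.fc1(x))
        x = torch.asinh(self.fc2(x))
        return self.out(x)

model = AsinhMLP()
y = model(torch.randn(8, 1))  # 8 sample inputs -> 8 regression outputs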
Is it possible to try this approach but generate MLP models? I'm wondering whether a hypernetwork for NeRF models is possible.
Love both methods (yours more), but it's lovely to have both sides.
Is there any recommended video or talk about semi-supervised learning research? Because I only know about the teacher model and semi-GAN... Thanks!
If we want to input a real number x into a NN, it is a lot better to represent it as a vector of sines sin(N_i * x) with various N_i (random Fourier features, positional encodings, etc.).
Maybe, if we want the NN to output a number precisely, we could make it output a vector of sines and then guesstimate what number is encoded in that vector?
Or output it as a weighted sum of the entries of a vector (like the coarse and fine tuning knobs on old devices, but with a lot more knobs) with weights from a geometric progression, like (0.8)^i:
x = 1000 * sum_i X_i * (0.8)^i
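A rough NumPy sketch of both ideas in the comment above, assuming randomly drawn frequencies N_i, a 16-entry vector, and the base 0.8 and scale 1000 mentioned in the comment (the function names and sizes are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
N = rng.normal(scale=10.0, size=16)       # random frequencies N_i

def encode(x):
    # random-Fourier-feature style input encoding: x -> sin(N_i * x)
    return np.sin(N * x)

def decode(v, base=0.8, scale=1000.0):
    # geometric-progression readout: x = 1000 * sum_i v_i * 0.8**i
    weights = base ** np.arange(len(v))
    return scale * np.sum(v * weights)

features = encode(3.14)                   # what the NN would see as input
x_hat = decode(rng.uniform(size=16))      # how an output vector maps back to a number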
Question: how "Hyper"/meta can you get with a setup like this before the resulting performance gets worse/doesn't improve?
at this rate, we’ll see the hyper hyper transformer in another 4 years
I really like the format, but I feel that the length of the videos is a bit intimidating, at least for me. I understand that it is hard to condense such an in-depth scientific discussion, but I think videos under an hour at least would be more attractive to a lot of people.
Maybe it might be more useful if the interviewees get to watch your explanation before the interview.
Then they know what you've covered, or what they think you've given an incorrect impression of.
You said his name very approximately correct so it turned out to be unintentionally insulting, lol.
Long videos please.
I prefer two videos: one with the interview and one with the explanation... But I also feel you are less critical when you do the interview. I think it might be good for you to criticize, and then the author can get a chance to rebut the criticism.
love it
Hey Yannic, I would much rather have two videos: one video of you formally taking your time to go over and explain the paper, and another that is the interview with the author (if you feel the paper is good enough that it warrants it).
Honestly, I usually just watch your interpretation to get up to speed on what the papers are for, but I tend not to want to listen to the conversations with authors, just because that 'flow of information' feels different to my brain and isn't what I want when watching these videos. I do like having the option, though, which is why I feel two videos are better.
Then you can cross-link between the videos to drive YouTube engagement even more.
it may be that self-attention is slightly conscious
Hey, no jokes here, the attention might get self-conscious.
I find the terminology overly misleading at this level.
Someday though, it will be used as evidence against us.
➕Long intro
The SOTA ML buffet.
Transformer is spelled with a capital T.
Bad name for this model because it subsumes a potentially very important concept. It should be renamed.