HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)

  • Published Jan 4, 2025

COMMENTS • 40

  • @YannicKilcher  2 years ago +3

    OUTLINE:
    0:00 - Intro & Overview
    3:05 - Weight-generation vs Fine-tuning for few-shot learning
    10:10 - HyperTransformer model architecture overview
    22:30 - Why the self-attention mechanism is useful here
    34:45 - Start of Interview
    39:45 - Can neural networks even produce weights of other networks?
    47:00 - How complex does the computational graph get?
    49:45 - Why are transformers particularly good here?
    58:30 - What can the attention maps tell us about the algorithm?
    1:07:00 - How could we produce larger weights?
    1:09:30 - Diving into experimental results
    1:14:30 - What questions remain open?
    Paper: arxiv.org/abs/2201.04182
    ERRATA: I introduce Max Vladymyrov as Mark Vladymyrov

  • @chochona019  2 years ago +25

    The long introduction was great; it's good to be able to understand, with drawings, what is actually happening.

  • @Zed_Oud  2 years ago +13

    Describing it as a buffet is exactly right for this amount of content. This makes it great for everyone: whether you're looking for a summary, an in-depth dive, or to implement/adapt it yourself.

  • @NavinF  2 years ago +5

    Love the longer first half that's more like your earlier work. IMO the interview should be a short Q&A that lets the authors respond to parts you were unsure about or criticized. I much prefer when the paper review is more in-depth (ideally even longer than in this video).

  • @mahdipourmirzaei1048  2 years ago +2

    I am a big fan of your long-introduction version. In my opinion, the way you illustrate your thoughts is far more insightful than at least half of the videos in which the authors were included. For many papers, the authors could serve as supplementary information for the main concepts.

  • @enniograsso2968  2 years ago +7

    Hi Yannic, I've been following your channel since the very beginning and have always enjoyed your style. Since you're asking for comments about this new format of interviewing papers' authors, I'd like to share my two cents. I much preferred your former style of unbiased reviews on your own, which were really professional and right to the technical point. These interviews, on the other hand, are more "deferential" and less unbiased. I found your previous style much more insightful and useful. Thank you anyway for your work; your channel is my preferred one for keeping up to date on the subject. I'm a senior MLE at a big telco company in Italy. Thanks!

  • @YvesQuemener  2 years ago +3

    As feedback is called for, I just wanted to say that I mostly watch the paper explanations. I like the way you explain; that's really good to have.

  • @sammay1540  2 years ago +5

    I gotta be honest, your explanations are the best for me because you're very good at explaining things, whereas these researchers are a little more specialized in research. I do like that you interview them, though. I'd always ask a question like "How did you come up with this idea?" or "What was the inspiration for it?"
    Love your content! Keep experimenting.

  • @Yenrabbit  2 years ago +1

    Long intro was great - we get your explanation and then the interview is a bonus!

  • @DamianReloaded  2 years ago +11

    2:00 Why not both? If you're into recycling content, we could have 3 videos: the paper review, the interview with the authors, and then the paper review interleaved with comments from the authors. Everyone is happy and you get more content for the same price (minus editing, though if you script the interleaved video before the interview, you already know where the commentary will go). EDIT: Oh, this video is kinda like this already.

  • @qwerty123443wifi  2 years ago

    Really appreciate the time you take to make videos like this!

  • @hamandchees3  2 years ago

    I love in-depth conversations that aren't afraid to be technical.

  • @boffo25  2 years ago

    Jesus Christ. What an incredible result!

  • @BboyDschafar  2 years ago

    Great Paper, Great Interview.

  • @UberSpaceCow  2 years ago +1

    Damn I'm quick ;) Thanks for the content homie

  • @norm7090  2 years ago +3

    Long explanation with interview, please.

  • @Guytron95  2 years ago

    A livestream interview with chat Q&A from the viewers at the end (the last 15 minutes or so) would be great. Nick Zentner has been doing long-form geology interviews for the last couple of months, and it has been superlative for discovering new questions and ideas.

  • @quickdudley  2 years ago

    Regarding the comment at 8:34: in one of my projects I'm using a neural network for a regression-type problem, and I found I got much smoother interpolation by switching most of the hidden layers to use asinh as the activation function. I have no idea how general that is, or whether smoothness is even a desirable feature when you're trying to output weights for another neural network.
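
    A minimal PyTorch sketch of this idea (the architecture, layer sizes, and batch are hypothetical, not taken from the project described above):

        import torch
        import torch.nn as nn

        class AsinhMLP(nn.Module):
            """Regression MLP with asinh hidden activations.

            asinh is smooth, non-saturating, and roughly linear near zero,
            which may give smoother interpolation than ReLU.
            """
            def __init__(self, in_dim=1, hidden=64, out_dim=1):
                super().__init__()
                self.fc1 = nn.Linear(in_dim, hidden)
                self.fc2 = nn.Linear(hidden, hidden)
                self.fc3 = nn.Linear(hidden, out_dim)

            def forward(self, x):
                x = torch.asinh(self.fc1(x))  # smooth activation in place of ReLU
                x = torch.asinh(self.fc2(x))
                return self.fc3(x)            # linear head for regression

        model = AsinhMLP()
        y = model(torch.randn(8, 1))          # batch of 8 scalar inputs -> (8, 1)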

  • @theodorosgalanos9663  2 years ago +1

    Is it possible to try this approach but generate MLP models? I'm wondering whether a hypernetwork for NeRF models is possible.
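
    A generic sketch of a hypernetwork that emits the weights of a small MLP (this is not the paper's HyperTransformer; the names, sizes, and conditioning input z are hypothetical):

        import torch
        import torch.nn as nn

        class HyperNet(nn.Module):
            """Maps a conditioning vector z to the weights of a small
            target MLP, then runs that MLP on input x."""
            def __init__(self, z_dim=8, in_dim=3, hidden=16, out_dim=1):
                super().__init__()
                self.in_dim, self.hidden, self.out_dim = in_dim, hidden, out_dim
                n_params = in_dim * hidden + hidden + hidden * out_dim + out_dim
                self.generator = nn.Linear(z_dim, n_params)  # emits all target weights

            def forward(self, z, x):
                p = self.generator(z)                        # flat parameter vector
                i = self.in_dim * self.hidden
                W1 = p[:i].view(self.hidden, self.in_dim)    # generated layer-1 weights
                b1 = p[i:i + self.hidden]
                j = i + self.hidden
                W2 = p[j:j + self.hidden * self.out_dim].view(self.out_dim, self.hidden)
                b2 = p[j + self.hidden * self.out_dim:]
                h = torch.relu(x @ W1.T + b1)                # forward pass of generated MLP
                return h @ W2.T + b2

        net = HyperNet()
        out = net(torch.randn(8), torch.randn(4, 3))         # z, batch of 4 -> (4, 1)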

  • @oluwayomiolugbuyi6670  2 years ago

    I love both formats (yours more), but it's lovely to have both sides.

  • @KnowNothingJohnSnow  2 years ago

    Is there any recommended video or talk about semi-supervised learning research? Because I only know about teacher models and semi-GANs. Thanks!

  • @АлексейТучак-м4ч  2 years ago

    If we want to input a real number x into a NN, it is much better to represent it as a vector of sines sin(N_i * x) with various N_i (random Fourier features, positional encodings, etc.).
    Maybe, if we want a NN to output a number precisely, we could make it output a vector of sines and then estimate which number is encoded in that vector?
    Or output it as a weighted sum of the entries of a vector (like the coarse and fine tuning knobs on old devices, but with many more knobs), with weights from a geometric progression like (0.8)^i:
    x = 1000 * Σ_i X_i * (0.8)^i
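
    A minimal NumPy sketch of both ideas (the frequency bank, vector sizes, and constants are hypothetical, chosen only for illustration):

        import numpy as np

        rng = np.random.default_rng(0)

        # Random-Fourier-feature-style encoding: represent the scalar x
        # as a vector of sines with a bank of random frequencies N_i.
        freqs = rng.uniform(0.1, 10.0, size=32)

        def encode(x):
            return np.sin(freqs * x)              # shape (32,) feature vector

        # Geometric-weight decoding, as suggested above:
        # x = scale * sum_i X_i * r**i, where X is the network's output
        # vector (a random stand-in here).
        def decode_geometric(X, scale=1000.0, r=0.8):
            return scale * np.sum(X * r ** np.arange(len(X)))

        X = rng.uniform(-1.0, 1.0, size=16)       # stand-in for an NN output
        print(encode(3.14))                       # sine features for x = 3.14
        print(decode_geometric(X))                # scalar reconstructed from X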

  • @Supreme_Lobster  2 years ago

    Question: how "hyper"/meta can you get with a setup like this before the resulting performance gets worse or stops improving?

  • @patrickl5290  2 years ago

    At this rate, we'll see the hyper-hyper-transformer in another 4 years.

  • @mr.heuhoi1446  2 years ago +3

    I really like the format, but I feel that the length of the videos is a bit intimidating, at least for me. I understand that it is hard to condense such an in-depth scientific discussion, but I think videos under an hour would be more attractive to a lot of people.

  • @vzxvzvcxasd7109  2 years ago

    Maybe it might be more useful if the interviewees got to watch your explanation before the interview.
    Then they'd know what you've covered, and where they think you've given an incorrect impression.

  • @XOPOIIIO  2 years ago +1

    You said his name only approximately correctly, so it turned out to be unintentionally insulting, lol.

  • @chinmayakaundanya3151  2 years ago

    Long videos please.

  • @ssssssstssssssss  2 years ago

    I prefer two videos: one with the interview and one with the explanation... But I also feel you are less critical when you do the interview as well. I think it might be good for you to criticize first, and then the author gets a chance to rebut the criticism.

  • @starkest  2 years ago

    love it

  • @CreativeBuilds  2 years ago

    Hey Yannic, I'd much rather have two videos: one of you formally taking your time to go over and explain the paper, and another that is the interview with the author (if you feel the paper is good enough to warrant it).
    Honestly, I usually just watch your interpretation to get up to speed on what the papers are about, but I tend not to want to listen to the conversations with the authors, just because that "flow of information" feels different to my brain and isn't what I want when watching these videos. I do like having the option, though, which is why I feel two videos are better.
    Then you can cross-link between the videos to drive YouTube engagement even more.

  • @bojan368  2 years ago +2

    it may be that self-attention is slightly conscious

    • @laurenpinschannels  2 years ago +1

      Hey, no jokes here. The attention might get self-conscious.

    • @petevenuti7355  2 years ago

      I find the terminology overly misleading at this level.
      Someday, though, it will be used as evidence against us.

  • @kimchi_taco  2 years ago

    ➕Long intro

  • @norik1616  2 years ago

    The SOTA ML buffet.

  • @bjornlindqvist8305  2 years ago

    Transformer is spelled with a capital T.

  • @ThinkTank255  2 years ago

    Bad name for this model because it subsumes a potentially very important concept. It should be renamed.