Parameter Prediction for Unseen Deep Architectures (w/ First Author Boris Knyazev)

  • Published 2 Jan 2025

COMMENTS • 34

  • @YannicKilcher  3 years ago +16

    Deep Neural Networks are usually trained from a given parameter initialization using SGD until convergence at a local optimum. This paper goes a different route: given a novel network architecture for a known dataset, can we predict the final network parameters without ever training them? The authors build a Graph HyperNetwork and train it on a novel dataset of diverse DNN architectures to predict high-performing weights. The results show that not only can the GHN predict weights with non-trivial performance, but it can also generalize beyond the distribution of training architectures to predict weights for networks that are much larger, deeper, or wider than anything seen in training.
    OUTLINE:
    0:00 - Intro & Overview
    6:20 - DeepNets-1M Dataset
    13:25 - How to train the Hypernetwork
    17:30 - Recap on Graph Neural Networks
    23:40 - Message Passing mirrors forward and backward propagation
    25:20 - How to deal with different output shapes
    28:45 - Differentiable Normalization
    30:20 - Virtual Residual Edges
    34:40 - Meta-Batching
    37:00 - Experimental Results
    42:00 - Fine-Tuning experiments
    45:25 - Public reception of the paper
    ERRATA:
    - Boris' name is obviously Boris, not Bori
    - At 36:05, Boris mentions that they train the first variant, yet on closer examination, we decided it's more like the second
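    The description above outlines the GHN pipeline: encode an architecture as a graph of operations, refine node embeddings by message passing, then decode each node's embedding into that node's parameters. A minimal, purely illustrative sketch of that flow (all names, the averaging scheme, and the tiling decoder are hypothetical simplifications, not the paper's implementation):

```python
# Illustrative sketch of the Graph HyperNetwork (GHN) idea: an architecture
# is a graph of operations; node embeddings are refined by message passing
# along forward edges; a decoder maps each final embedding to that node's
# parameter tensor. Everything here is a toy stand-in, not the paper's code.
import random

DIM = 4  # embedding size for each node/operation

def init_embedding(op_type):
    # Hypothetical lookup: each op type gets a deterministic pseudo-random embedding.
    rng = random.Random(hash(op_type) % (2**32))
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def message_passing(embeddings, edges, rounds=2):
    # Average-neighbor propagation; forward edges mirror the forward pass.
    for _ in range(rounds):
        new = []
        for i, emb in enumerate(embeddings):
            neigh = [embeddings[src] for (src, dst) in edges if dst == i]
            if neigh:
                avg = [sum(vals) / len(neigh) for vals in zip(*neigh)]
                emb = [(a + b) / 2 for a, b in zip(emb, avg)]
            new.append(emb)
        embeddings = new
    return embeddings

def decode(embedding, shape):
    # Tile the embedding to the requested parameter shape -- the crudest
    # possible answer to the "different output shapes" problem in the outline.
    rows, cols = shape
    flat = [embedding[(r * cols + c) % DIM] for r in range(rows) for c in range(cols)]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# A toy 3-node architecture: conv -> relu -> linear, with forward edges.
ops = ["conv3x3", "relu", "linear"]
edges = [(0, 1), (1, 2)]
emb = message_passing([init_embedding(op) for op in ops], edges)
params = {op: decode(e, (2, 3)) for op, e in zip(ops, emb)}
```

    In the actual paper the decoder is learned and shape-aware, and the graph propagation is trained end to end; this sketch only shows the data flow.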

  • @AICoffeeBreak 3 years ago +30

    This is so cool! Courageous first author. 💪 It's great how you deliver this platform for the authors to defend their papers. Excited to see some next episodes like this!

  • @ravikiranramachandra1000 3 years ago +4

    Wow... Great explanation... The authors of the papers should come forward like this to give their thinking behind architectures... Excellent video.

  • @swayson5208 3 years ago +16

    oh cool, having the author -- Leveled up!

  • @mgostIH 3 years ago +33

    Loved this! Hope there will be more videos with authors helping explain the paper or possibly discussions with them on your Discord server too!

  • @priyamdey3298 3 years ago +38

    I think I prefer the regular way of Yannic explaining the paper himself. It feels more natural; this one had a nervy feeling to it with the author present. Perhaps it would be better if Yannic explained the paper himself first, separately, and followed up later with the authors for their thoughts/perspectives.

    • @mgostIH 3 years ago +9

      It might just be because it's his first video doing this, he was much quieter a couple years ago too on his regular videos.

    • @laurenpinschannels 3 years ago +11

      I want both types of interaction, but would prefer a longer period of explanation before each author "answer"/additional exposition. The really nice thing about Yannic's videos is that he normally starts at the beginning. Letting Yannic's explanation-ordering skill guide the discussion a bit more would make this more digestible.

    • @dennisestenson7820 3 years ago +1

      I agree, especially with your second point. It would be nice if the paper were explained in one segment, and then a follow-up dialog with the author could tie up any loose ends.

  • @danidini4444 3 years ago +1

    Love this new interview format!

  • @florianjug 3 years ago +9

    I wonder if a paper overview monologue (classic style) followed by an interview style discussion/interview with an author after that would not be an even more engaging format. Just an idea, but I think I’d love this and maybe you like this idea too? :)

  • @wadyn95 3 years ago +14

    Borya is great. A very original idea!

  • @JTMoustache 3 years ago +6

    Boom! Great format. I wish you had asked a bit more about f(x, a, H(a, θ)) and how they made it differentiable.
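    The differentiability the comment asks about comes from the chain rule: the predicted weights w = H(a, θ) are a differentiable function of the hypernetwork parameters θ, so the task loss on f(x, a, w) backpropagates through f and H into θ. A scalar toy example with hand-derived gradients (names are illustrative, not the paper's code):

```python
# Why f(x, a, H(a, theta)) is trainable end to end: gradients of the task
# loss flow through the predicted weight w = H(theta) back into theta.
# Scalar toy example; H and f are hypothetical stand-ins.
def H(theta):           # "hypernetwork": predicts a weight from theta
    return 2.0 * theta  # dw/dtheta = 2

def f(x, w):            # "target network" using the predicted weight
    return w * x        # df/dw = x

def loss_and_grad(theta, x, y):
    w = H(theta)
    pred = f(x, w)
    loss = (pred - y) ** 2
    # Chain rule: dL/dtheta = dL/dpred * dpred/dw * dw/dtheta
    grad = 2.0 * (pred - y) * x * 2.0
    return loss, grad

theta = 0.5
for _ in range(50):     # plain gradient descent on the hypernetwork parameter
    _, g = loss_and_grad(theta, x=1.0, y=3.0)
    theta -= 0.1 * g
# H(theta) converges toward the weight that fits y = 3 * x.
```

    In the paper the same principle holds at scale: autodiff handles the chain rule through the GHN and the sampled architecture jointly.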

  • @AndreiMargeloiu 3 years ago +2

    Great idea! It's much more interactive if the author presents their paper.

  • @realinformhunter 3 years ago

    Nice one, really enjoyed the conversation and explanations

  • @blanamaxima 3 years ago

    amazing idea Yannic, I like this a lot!

  • @manuelplank5406 3 years ago

    Pretty cool Format!

  • @billykotsos4642 3 years ago

    Hopefully a new series !!!

  • @NeoShameMan 3 years ago +7

    It's harder to follow in this format; the interruptions and dead air make it hard to focus on key points.

  • @maxim_ml 2 years ago

    I wonder what accuracy this NN can reach when overfitting to a single batch of architectures.

  • @roomo7time 3 years ago

    this is gold

  • @zrebbesh 3 years ago +2

    This is groundbreaking. The implication is that if we have a fully developed knowledge of a task, we can then experiment with different network architectures relatively easily, without having to train each one in turn. We can tear down the ones that are a major investment in resources and replace them with the most promising of a bunch of different more efficient architectures, then train to refine.

  • @gren287 3 years ago +1

    Cool!

  • @MasamuneX 3 years ago

    how to train a NN to predict NN's

  • @MadsterV 3 years ago

    Does it work on itself?

  • @michaelwangCH 3 years ago

    Statisticians have been doing this for the past 50 years; the only difference is using a fully connected graph to initialize the process.
    It has no potential to generalize, no potential to move in the direction of AGI. Instead of fitting a curve, this model is fitting a graph: continuous spaces vs. discrete spaces. What is the innovation here?

  • @gaborfuisz9516 3 years ago +3

    How are you going to roast the authors if you are on a call with them? :D

  • @NoNameAtAll2 3 years ago +4

    isn't it bor-I-s? not b-O-ris

  • @andres_pq 3 years ago

    Didn't know you were left-handed Yannic

  • @MrJorgeceja123 3 years ago

    Deep Neural Networks Upside Down