Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors)

  • Published 19 Jan 2025

COMMENTS • 30

  • @YannicKilcher  3 years ago  +4

    OUTLINE:
    0:00 - Intro & Overview
    18:10 - Interview Start
    21:20 - Symmetry priors vs conserved quantities
    23:25 - Example: Pendulum
    27:45 - Noether Network Model Overview
    35:35 - Optimizing the Noether Loss
    41:00 - Is the computation graph stable?
    46:30 - Increasing the inference time computation
    48:45 - Why dynamically modify the model?
    55:30 - Experimental Results & Discussion
    Paper: arxiv.org/abs/2112.03321
    Website: dylandoblar.github.io/noether-networks/
    Code: github.com/dylandoblar/noether-networks

  • @GreyMatter168  3 years ago  +11

    Thanks for the great questions Yannic! A pleasure to be on the show :)

  • @Daniel-ih4zh  3 years ago  +15

    A lot of these works around equivariance and symmetry preservation have equations very similar to slow feature analysis. If we treat x_{i+1} - x_i as a discrete gradient, minimising its square is similar to minimising the Dirichlet energy, and minimising the Dirichlet energy is a variational solution to minimising some Laplacian. SFA also has connections to solutions of the generalised eigenvector problem.
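
    A minimal PyTorch sketch (hypothetical names, not the paper's code) of the discrete Dirichlet-energy / slow-feature-style penalty described above, applied to a learned embedding g of each frame; the paper's Noether loss likewise penalizes changes in a learned quantity across a rollout, but see the paper for its exact form:

        import torch

        def dirichlet_energy(z):
            # z: (T, d) embeddings g(x_1), ..., g(x_T) of one sequence.
            diffs = z[1:] - z[:-1]                  # discrete "gradient" x_{i+1} - x_i in feature space
            return (diffs ** 2).sum(dim=-1).mean()  # discrete Dirichlet energy / SFA-style slowness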

  • @connor-shorten  3 years ago  +11

    Super excited to watch this later, awesome paper selection Yannic!

  • @konstantinwilleke6292  3 years ago  +1

    The paper-explained format together with the authors is simply excellent. Hope you'll keep 'em coming! I do have a minor comment, though, for your consideration: the small red line at the bottom of the thumbnail (glad you aren't doing dumbnails for these) makes it seem like I had already watched it (or 75% of it).

  • @JBoy340a  3 years ago  +1

    Nicely done. I really like this format where you have the author(s) explain the information that examples and parts of the paper are trying to convey.

  • @oneman7094  3 years ago  +5

    Thumbnail suggestion:
    I thought that I had already watched the video, since it has a red bar at the bottom.

  • @Batu135  3 years ago  +5

    This format of questioning a paper's creator about their own paper is the best I can imagine.

  • @mrbeancanman  3 years ago

    Love the interview style; it's very helpful in clearing up any misconceptions. Great stuff!

  • @marilysedevoyault465  3 years ago  +1

    So interesting to see research like this: prediction from the data of the object of study (one single sequence), using that sequence's own data to make the prediction. I'm sure this type of algorithm will be very useful in the future for filtering out (removing) useless sequences from a large collection of sequences used to make predictions in real life. I wouldn't be surprised to see such an algorithm used in robots someday. Thank you

  • @vladimirkomkov2404  2 years ago

    Thanks a lot for the clear explanations!
    Feature request: how about making the background static, to make watching a bit easier?

  • @alvarofrancescbudriafernan2005  3 years ago  +2

    Great job Ferran, and your talk at DL BCN was super interesting!

  • @danielejuda978  3 years ago  +1

    It's not clear what the use of this is... Does it make real predictions? Or is it like training and testing on the same data? Also, maybe the only thing that is predicted as conserved is the background (r2 mode, sorry).

  • @mithrillis  3 years ago  +7

    Great breakdown of the paper! It's interesting to find some similarity between this quantity-conservation approach and "no negative sampling" self-supervised learning like BYOL or SwAV. The staggered updates of the inner and outer loops are like the two steps in expectation-maximization algorithms (a sketch of this inner/outer loop appears after this thread). I guess this kind of restriction on what one update step can do, based on some "prior" over the parameters from the previous training step, is crucial for preventing collapse.
    I'm still wondering how this algorithm actually learns informative conserved quantities, though. It seems you could end up with a less useful conserved quantity like the frame colour spectrum, yet still plausibly get the same attention map. How do you constrain which symmetries (corresponding to the conserved quantity) are learned, when you do not specify which symmetries you want to enforce?

    • @oncedidactic  3 years ago

      On the face of it, it seems like data augmentation would help chip away at the "adversarial* symmetries" you mention, to coin a term.
      *er, I mean "non-robust" or whatever
      Not a full solution, but a stepping stone to study the issue.
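
    A self-contained toy sketch of the staggered inner/outer updates mentioned above, not the paper's code; the linear predictor W, the learned quantity A, and the learning rates are all hypothetical stand-ins for the paper's networks:

        import torch

        torch.manual_seed(0)
        d = 4
        W = (0.1 * torch.randn(d, d)).requires_grad_()   # toy predictor: x_{t+1} ≈ W x_t
        A = (0.1 * torch.randn(1, d)).requires_grad_()   # toy conserved quantity: g(x) = A x

        def rollout(W, x0, steps):
            xs = [x0]
            for _ in range(steps):
                xs.append(xs[-1] @ W.T)
            return xs

        x_seq = [torch.randn(d) for _ in range(6)]       # one observed sequence
        x0, targets = x_seq[0], x_seq[1:]

        # Inner step: adapt the predictor on this ONE sequence so that g is conserved
        # along its rollout (one gradient step on the conservation loss w.r.t. W).
        preds = rollout(W, x0, len(targets))
        conservation = sum(((xt @ A.T - preds[0] @ A.T) ** 2).sum() for xt in preds[1:])
        gW, = torch.autograd.grad(conservation, W, create_graph=True)
        W_adapted = W - 1e-2 * gW

        # Outer step (training only): the prediction loss of the ADAPTED predictor is
        # backpropagated through the inner step, so gradients also reach A and g is
        # meta-learned to be a quantity whose conservation actually helps prediction.
        preds_adapted = rollout(W_adapted, x0, len(targets))
        task_loss = sum(((p - t) ** 2).sum() for p, t in zip(preds_adapted[1:], targets))
        task_loss.backward()                             # populates W.grad and A.grad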

  • @simoncorbeil4081  1 year ago

    Great and very interesting presentation.
    I have 3 questions:
    1) I'm wondering how to make sure that the g network doesn't learn a trivial conserved quantity.
    2) Also, if more than one quantity is conserved (like a dynamical system that conserves both mass and energy), is there a way to handle it?
    3) How would this method compare to PINNs (physics-informed neural networks)? Injecting the residual of the dynamical equations into the loss function is a way to guide the network to learn conserved quantities (see the sketch below).
    Is there any advantage to one method or the other?
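
    A minimal sketch of the PINN-style residual loss referred to in question 3, for a pendulum obeying theta'' + (g/L) sin(theta) = 0; theta_net and the collocation times t are hypothetical, and this is not the paper's method:

        import torch

        g_over_L = 9.81 / 1.0   # gravitational acceleration over pendulum length

        def pendulum_residual_loss(theta_net, t):
            # t: (N, 1) collocation times, created with requires_grad=True.
            theta = theta_net(t)
            dtheta = torch.autograd.grad(theta.sum(), t, create_graph=True)[0]
            d2theta = torch.autograd.grad(dtheta.sum(), t, create_graph=True)[0]
            residual = d2theta + g_over_L * torch.sin(theta)   # ODE residual
            return (residual ** 2).mean()                      # added to the data-fitting loss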

  • @G12GilbertProduction  3 years ago

    I just fought through this energy-pendulum theorem; does the algorithm gain from unentropic arguments, or do the functions of the theorem only differ at the iteration level?

  • @dermitdembrot3091  3 years ago

    For the pendulum experiments with results shown at 1:06:00, isn't it weird to compare Noether + symbolic regression to no-Noether + MLP? Wouldn't you want to compare to no-Noether + symbolic regression, or even all four combinations? I don't see an intrinsic reason for the Noether procedure and symbolic regression to be coupled.

  • @Kram1032  3 years ago  +1

    Might be interesting to throw this method at random contexts and see what happens. Perhaps it could find useful conserved quantities in unexpected places!

  • @odysseus9672  3 years ago  +1

    Scene changes would actually be trivial to handle. Instead of inter-frame L2, take a triple-frame median of the squared error (or something similar). That would allow for occasional sudden changes while still enforcing sameness most of the time.
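
    One possible PyTorch reading of this suggestion, with hypothetical names; the median variant is only the commenter's idea, not the loss used in the paper:

        import torch

        def robust_inter_frame_loss(frames):
            # frames: (T, C, H, W) clip, T >= 4.
            sq = ((frames[1:] - frames[:-1]) ** 2).flatten(1).mean(dim=1)  # (T-1,) mean squared change per step
            windows = sq.unfold(0, 3, 1)                                   # sliding triples of consecutive changes
            return windows.median(dim=1).values.mean()                     # median tolerates occasional cuts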

  • @drdca8263  3 years ago  +1

    Nitpicking on how a word was used: I think the way you used the word "behold" in "behold actually writing this down and implementing the [...]" was a nonstandard use of the term? In my experience, "behold" is generally used when the thing can actually be observed/seen, not just imagined. I think it generally implies that the thing is there, not just hypothetical.

  • @JTMoustache  3 years ago  +3

    A class is a conserved quantity - dogs are symmetric - DOG IS ENERGY! 🔥🐶💣
    Also... they don't do just one gradient step; there's a loop in the train function. What's going on here? Why the big lie? Conspiracy..
    ALSO, how do you train G(x)!? Man... is it only contrastive?? I will need to read the paper; I haven't done that since this channel was created.
    Cool stuff tho

  • @444haluk  3 years ago  +1

    I think even though you understand Noether's theorem correctly, you didn't understand the implications for intelligence. Humans are map creators. There is rarely a graph of attributes when it comes to representing the problem; we are almost always doing mapping on manifolds. And that is not just an aesthetic sentence: we actively act on a map and perceive the world via prediction. The symmetry encoding in the brain is via actions. You can take a route (action) with your eye (both the extraocular and ciliary muscles), come back to the same point (perception), and that's a symmetry.

  • @444haluk  3 years ago

    Oh c'mon, I have never heard of "approximate conservation". Friction turns into heat; energy is conserved (unless the gravitational field has been affected). I don't think this works for slow dissipation.

  • @SimonJackson13  3 years ago

    Conservation of acceleration? Seems a lot of broken glass agrees.

    • @SimonJackson13  3 years ago

      Strange how the nature of a singularity on f(x)x/x can give 3 or 4 given interchange of integral and sum order, different approaches to a singularity.

    • @SimonJackson13  3 years ago

      Because there are examples where interchanging the order of an infinite summation affects the sum, it can't be assumed there are only 3 ways of behaving toward a singularity?

    • @SimonJackson13  3 years ago

      Not that much is approaching a singularity. But that's where completeness lies.