Neural ODE - Pullback/vJp/adjoint rule

  • Published 13 Jun 2024
  • How do you backpropagate through the integration of an Ordinary Differential Equation, for instance, to train Neural ODEs to fit data? This requires the solution of an adjoint ODE running backward in time (a minimal code sketch of this idea follows the timestamps below). Here are the notes: github.com/Ceyron/machine-lea...
    -------
    👉 This educational series is supported by the world-leaders in integrating machine learning and artificial intelligence with simulation and scientific computing, Pasteur Labs and Institute for Simulation Intelligence. Check out simulation.science/ for more on their pursuit of 'Nobel-Turing' technologies (arxiv.org/abs/2112.03235 ), and for partnership or career opportunities.
    -------
    📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
    📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
    💸 : If you want to support my work on the channel, you can become a Patron here: / mlsim
    🪙: Or you can make a one-time donation via PayPal: www.paypal.com/paypalme/Felix...
    -------
    ⚙️ My Gear:
    (Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
    - 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
    - ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
    - 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
    - 🔌 Laptop Charger: amzn.to/3ja0imP
    - 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
    - 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
    If I had to purchase these items again, I would probably change the following:
    - 🎙️ Rode NT: amzn.to/3NUIGtw
    - 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
    As an Amazon Associate I earn from qualifying purchases.
    -------
    Timestamps:
    00:00:00 Neural ODE integration in a wrapper function
    00:01:15 Scientific Computing Interpretation
    00:01:30 Only interested in the final time value
    00:02:22 Task: Backpropagation of cotangent information
    00:03:53 Interpretation of the input cotangents
    00:05:17 Without unrolling the ODE integrator (we want OtD instead of DtO)
    00:07:59 General Pullback or vJp definition
    00:13:39 (1a) Parameter Cotangent: Starting with ODE constraint
    00:14:18 (1b) Total derivative wrt parameter vector
    00:16:31 (1c) Inner product with adjoint variable
    00:21:57 (1d) Integration by Parts
    00:24:13 (1e) Move right-hand-side Jacobian
    00:26:09 (1f) Investigating the limit evaluation
    00:28:13 (1g) Adding an artificial zero
    00:30:38 (1h) Identify the adjoint problem
    00:39:39 (1i) Discussing the adjoint problem
    00:45:59 (2a) IC cotangent: Starting with ODE constraint
    00:47:33 (2b) Total derivative wrt initial condition
    00:49:04 (2c) Inner product with adjoint variable
    00:50:15 (2d) Integration by Parts and moving the Jacobian
    00:53:27 (2e) Add artificial zero
    00:55:31 (2f) Identify the adjoint problem
    00:59:30 (2g) Discussing the new adjoint problem
    01:02:47 (3a) Final Time Cotangent: Starting with general solution to an ODE
    01:03:51 (3b) Total derivative wrt "T"
    01:05:21 (3c) Build vector-Jacobian product
    01:07:13 Full Pullback rule
    01:11:35 No adjoint problem needed if only interested in final time cotangent
    01:12:22 Adjoint Problem can be stepped through if only interested in IC cotangent
    01:13:53 Parameter cotangent needs full adjoint trajectory
    01:14:29 How to evaluate the vJp of the ODE dynamics
    01:17:08 (1) Save primal trajectory and interpolate
    01:21:27 (2) Run primal problem in reverse alongside the adjoint ODE
    01:26:51 How to evaluate the functional inner product for the parameter cotangent?
    01:28:51 Introduce another auxiliary ODE problem to accumulate the quadrature backward in time
    01:34:18 One large ODE running backward in time
    01:39:23 Summary
    01:41:57 Outro
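
The timestamps above end with "one large ODE running backward in time": the primal state, the adjoint state, and the quadrature for the parameter cotangent are stacked into a single augmented system and integrated in reverse. Below is a minimal, self-contained sketch of that idea in JAX, re-integrating the primal state in reverse alongside the adjoint (strategy (2) above) with a hand-rolled fixed-step RK4 integrator. The toy dynamics and all names (`f`, `rk4_step`, `odeint_final`, `odeint_vjp`) are illustrative assumptions for this sketch, not the notation of the handwritten notes; an adaptive solver, or interpolating a stored primal trajectory (strategy (1)), would work just as well.

```python
import jax
import jax.numpy as jnp

# Toy "neural" dynamics u' = f(u, theta, t); theta holds a flattened weight
# matrix and a bias. All names here are illustrative, not the notes' notation.
def f(u, theta, t):
    d = u.shape[0]
    W = theta[: d * d].reshape(d, d)
    b = theta[d * d :]
    return jnp.tanh(W @ u + b)


def rk4_step(g, y, t, dt):
    """One classic RK4 step for y' = g(y, t); dt may be negative."""
    k1 = g(y, t)
    k2 = g(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = g(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = g(y + dt * k3, t + dt)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def odeint_final(u0, theta, T, n_steps=1000):
    """Forward solve that only returns u(T) (the 'wrapper function' view)."""
    dt = T / n_steps

    def body(u, i):
        return rk4_step(lambda y, s: f(y, theta, s), u, i * dt, dt), None

    uT, _ = jax.lax.scan(body, u0, jnp.arange(n_steps))
    return uT


def odeint_vjp(u0, theta, T, u_bar_T, n_steps=1000):
    """Pull back the cotangent u_bar_T on u(T) to (u0_bar, theta_bar, T_bar)
    by integrating one augmented ODE (primal state, adjoint state, parameter
    quadrature) backward in time."""
    d, p = u0.shape[0], theta.shape[0]
    dt = T / n_steps
    uT = odeint_final(u0, theta, T, n_steps)

    # Final-time cotangent needs no adjoint solve at all.
    T_bar = jnp.vdot(u_bar_T, f(uT, theta, T))

    def aug_dynamics(aug, t):
        u, lam = aug[:d], aug[d : 2 * d]
        # vJp of the dynamics: lam^T df/du and lam^T df/dtheta, Jacobian-free.
        _, pull = jax.vjp(lambda uu, th: f(uu, th, t), u, theta)
        dfdu_T_lam, dfdth_T_lam = pull(lam)
        # Forward-time derivatives of (u, lambda, quadrature for theta_bar).
        return jnp.concatenate([f(u, theta, t), -dfdu_T_lam, -dfdth_T_lam])

    aug_T = jnp.concatenate([uT, u_bar_T, jnp.zeros(p)])

    def body(aug, i):
        # Step from t = T - i*dt down to t = T - (i+1)*dt.
        return rk4_step(aug_dynamics, aug, T - i * dt, -dt), None

    aug_0, _ = jax.lax.scan(body, aug_T, jnp.arange(n_steps))
    u0_bar = aug_0[d : 2 * d]   # = lambda(0), the initial-condition cotangent
    theta_bar = aug_0[2 * d :]  # accumulated quadrature, the parameter cotangent
    return u0_bar, theta_bar, T_bar


# Hypothetical usage with made-up shapes and values:
d = 3
theta = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (d * d + d,))
u0 = jnp.ones(d)
u0_bar, theta_bar, T_bar = odeint_vjp(u0, theta, T=1.0, u_bar_T=jnp.ones(d))
print(u0_bar, theta_bar, T_bar)
```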

COMMENTS • 18

  • @fredxu9826
    @fredxu9826 5 months ago +2

    this playlist is the best thing that has happened to me so far in 2024 :)

    • @fredxu9826
      @fredxu9826 5 months ago

      One question: why are we directly using the ODE rather than the integral equation in this derivation? Previously, we were using the integral form.

    • @MachineLearningSimulation
      @MachineLearningSimulation  5 months ago

      That's amazing! 😊 I really enjoyed making these videos on autodiff rules. In case you are interested, I also summarized most of the rules on my website: fkoehler.site/tables/ . It's still a work in progress; let me know if you find any mistakes.
      And happy new year, btw.
      Regarding your question: I assume that by the "previous video" you are referring to the pushforward/Jvp video (ua-cam.com/video/69KlO-kbxJ8/v-deo.html ). I think the approach via the ODE (or, in other words, the "condition equation") is the most general. From it, we can derive both the pushforward and the pullback. If I remember correctly, I also hint at this in this video. You can also see this at the point where we take the total derivative of the ODE condition under point (1): github.com/Ceyron/machine-learning-and-simulation/blob/main/english/adjoints_sensitivities_automatic_differentiation/rules/broadcasted_function_pullback.pdf . You can then right-multiply this with the tangent vector \dot{\theta} and recover the auxiliary solve we had in the pushforward video.
      I opted to use the integral equation in the pushforward video because I found it more intuitive there (and to mix things up a bit). So far, I haven't found a good way to derive the pullback with the integral equation.
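
For reference, here is a compact statement of the pullback relations the derivation via the ODE condition arrives at; this is a summary in its own notation, and signs or symbols may differ slightly from the handwritten notes.

```latex
% Pullback of u(T) = ODESolve(u_0, theta, T) given an incoming cotangent \bar{u}_T
\begin{aligned}
  &\text{Primal ODE: } \dot{u} = f(u, \theta, t), \quad u(0) = u_0,
    \quad \text{output } u(T) \text{ with cotangent } \bar{u}_T \\
  &\text{Adjoint ODE (solved backward in time): } \quad
    \dot{\lambda} = -\Bigl(\tfrac{\partial f}{\partial u}\Bigr)^{\!\top}\!\lambda,
    \qquad \lambda(T) = \bar{u}_T \\
  &\text{Cotangents: } \quad
    \bar{u}_0 = \lambda(0), \qquad
    \bar{\theta} = \int_0^T \Bigl(\tfrac{\partial f}{\partial \theta}\Bigr)^{\!\top}\!\lambda \,\mathrm{d}t, \qquad
    \bar{T} = \bar{u}_T^{\top} f\bigl(u(T), \theta, T\bigr)
\end{aligned}
```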

  • @r.d.7575
    @r.d.7575 1 year ago

    Another gem. Given the amount of prerequisite material needed here, it would be nice to review the entire pushforward/jvp playlist at some point. Thank you!

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +1

      Thanks a lot 😊
      There is still an intro video missing on what these primitive rules are. I'm still working on a good one, maybe also together with the intro to autodiff. Can't promise a time, but it will come 😊

  • @ibonitog
    @ibonitog 3 months ago +1

    What software are you using for handwriting? Cheers!

  • @claudiocasellato
    @claudiocasellato 1 year ago

    Thank you 🙏

  • @nickbishop7315
    @nickbishop7315 9 months ago +1

    I am struggling to understand where the vector-Jacobian product comes from at 7:59. Is there anywhere I can read more about this / see how these equalities are derived?

    • @MachineLearningSimulation
      @MachineLearningSimulation  9 months ago

      Great question! 😊
      This is a general finding. As of now, I don't have a good intro video yet, because I could not come up with a superb intuition. In the meantime, I think this more practical video with JAX can be helpful: ua-cam.com/video/T6IgdbDvS_E/v-deo.htmlsi=L7UyIBlp4t9hgnd6
      Next to the documentation of JAX, I can also recommend the documentation of the autodiff ecosystem in Julia: juliadiff.org/ChainRulesCore.jl/stable/maths/propagators.html
      (They call vjps "pullbacks")
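
As a concrete illustration of the vJp/pullback object referred to at 7:59, here is a tiny JAX example; the function `g` and the input values are made-up stand-ins, not anything from the video.

```python
import jax
import jax.numpy as jnp

# A made-up function R^3 -> R^2 whose pullback we want.
def g(x):
    return jnp.stack([jnp.sin(x[0]) * x[1], x[2] ** 2])

x = jnp.array([0.3, 2.0, -1.0])

# jax.vjp returns the primal output together with the pullback closure:
# given a cotangent v on the output, it returns v^T (dg/dx), the cotangent
# on the input, without ever forming the full Jacobian.
y, pullback = jax.vjp(g, x)
v = jnp.array([1.0, 0.0])   # cotangent on the output
(x_bar,) = pullback(v)      # cotangent on the input

print(y, x_bar)
```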

    • @nickbishop7315
      @nickbishop7315 9 months ago

      @MachineLearningSimulation Thanks for the links! Honestly, some of the content on this channel is gold. The Julia link seems particularly helpful; I will have a look. I definitely need to brush up on my understanding of manifolds/differential forms, I think!

  • @leonriccius2684
    @leonriccius2684 1 year ago +1

    Super nice video! Couldn't really make the connection from the general info I had on the adjoint method to how it's done in the Neural ODE paper, but this is clear to me now. Just one question: at 14:30 you say that we are taking the total derivative of the ODE w.r.t. theta, but then you only use a partial. I was also surprised about the subsequent expansion of the partial derivative of the right hand side. Shouldn't we take d/d theta instead?

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Thanks for the kind comment 😊
      Regarding your question about the total derivative: it's a bit fuzzy here, and it could definitely have been a bit clearer. There was a good reason I opted to use the "partial" notation for the total derivative, but I can't recall it anymore 😅.
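
For readers puzzling over the same point: the derivative with respect to theta has to be a total one because the trajectory u(t; theta) itself depends on theta, so differentiating the ODE condition produces both the explicit and the implicit term. This is a paraphrase of step (1b), not a quote from the notes:

```latex
% Total derivative of the ODE condition w.r.t. theta (paraphrase, not the notes)
\frac{\mathrm{d}}{\mathrm{d}\theta}\Bigl(\dot{u} - f\bigl(u(t;\theta), \theta, t\bigr)\Bigr) = 0
\;\Longrightarrow\;
\frac{\mathrm{d}}{\mathrm{d}t}\,\frac{\mathrm{d}u}{\mathrm{d}\theta}
  = \frac{\partial f}{\partial u}\,\frac{\mathrm{d}u}{\mathrm{d}\theta}
  + \frac{\partial f}{\partial \theta}
```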

    • @leonriccius2684
      @leonriccius2684 1 year ago +1

      @MachineLearningSimulation I see. Sort of glad to hear that I'm not the only one who sometimes forgets stuff I took a deep dive into a while ago 😄 Keep up the good work; I really like the channel!

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Thanks :)

  • @RameezRaja-qc9fi
    @RameezRaja-qc9fi 1 year ago

    Dear sir, thank you very much for your very informative videos.
    I would like to make a request.
    Could you please make a video series on CFD coding in Python on natural convection in a vertical channel?
    Kind regards and best wishes,
    Ramez

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +1

      Hi, thanks for the comment :)
      There will be a series on numerical methods for PDEs in more detail. However, I can't promise any concrete scenarios like natural convection yet.