Neural ODE - Pullback/vJp/adjoint rule
- Published 13 Jun 2024
- How do you backpropagate through the integration of an ordinary differential equation, for instance, to train Neural ODEs to fit data? This requires solving an adjoint ODE that runs backward in time. Here are the notes: github.com/Ceyron/machine-lea...
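As a point of contrast to the video's continuous-adjoint derivation, here is a minimal, hypothetical JAX sketch (my own example, not the video's code) of the naive alternative: differentiating straight through an unrolled Euler integration, i.e. discretize-then-optimize (DtO) rather than the optimize-then-discretize (OtD) route the video takes. The dynamics function and all parameter values are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def dynamics(x, theta):
    # Hypothetical right-hand side f(x, theta); stands in for a neural net
    return jnp.tanh(theta * x)

def integrate(x0, theta, dt=0.01, n_steps=100):
    # Explicit Euler, fully unrolled via scan so JAX can backprop
    # through every time step (discretize-then-optimize)
    def step(x, _):
        return x + dt * dynamics(x, theta), None
    x_final, _ = jax.lax.scan(step, x0, None, length=n_steps)
    return x_final

def loss(theta):
    # Toy loss on the final-time value only, matching the video's setup
    return jnp.sum(integrate(jnp.array([1.0]), theta) ** 2)

grad_theta = jax.grad(loss)(0.5)  # cotangent w.r.t. the (scalar) parameter
```

This works but stores every intermediate step in memory; the adjoint method derived in the video avoids that by solving a second ODE backward in time.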
-------
👉 This educational series is supported by the world leaders in integrating machine learning and artificial intelligence with simulation and scientific computing, Pasteur Labs and Institute for Simulation Intelligence. Check out simulation.science/ for more on their pursuit of 'Nobel-Turing' technologies (arxiv.org/abs/2112.03235 ), and for partnership or career opportunities.
-------
📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
💸 : If you want to support my work on the channel, you can become a patron here: / mlsim
🪙: Or you can make a one-time donation via PayPal: www.paypal.com/paypalme/Felix...
-------
⚙️ My Gear:
(Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
- 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
- ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
- 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
- 🔌 Laptop Charger: amzn.to/3ja0imP
- 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
- 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
If I had to purchase these items again, I would probably change the following:
- 🎙️ Rode NT: amzn.to/3NUIGtw
- 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
As an Amazon Associate I earn from qualifying purchases.
-------
Timestamps:
00:00:00 Neural ODE integration in a wrapper function
00:01:15 Scientific Computing Interpretation
00:01:30 Only interested in the final time value
00:02:22 Task: Backpropagation of cotangent information
00:03:53 Interpretation of the input cotangents
00:05:17 Without unrolling the ODE integrator (we want OtD instead of DtO)
00:07:59 General Pullback or vJp definition
00:13:39 (1a) Parameter Cotangent: Starting with ODE constraint
00:14:18 (1b) Total derivative wrt parameter vector
00:16:31 (1c) Inner product with adjoint variable
00:21:57 (1d) Integration by Parts
00:24:13 (1e) Move right-hand-side Jacobian
00:26:09 (1f) Investigating the limit evaluation
00:28:13 (1g) Adding an artificial zero
00:30:38 (1h) Identify the adjoint problem
00:39:39 (1i) Discussing the adjoint problem
00:45:59 (2a) IC cotangent: Starting with ODE constraint
00:47:33 (2b) Total derivative wrt initial condition
00:49:04 (2c) Inner product with adjoint variable
00:50:15 (2d) Integration by Parts and moving the Jacobian
00:53:27 (2e) Add artificial zero
00:55:31 (2f) Identify the adjoint problem
00:59:30 (2g) Discussing the new adjoint problem
01:02:47 (3a) Final Time Cotangent: Starting with general solution to an ODE
01:03:51 (3b) Total derivative wrt "T"
01:05:21 (3c) Build vector-Jacobian product
01:07:13 Full Pullback rule
01:11:35 No adjoint problem needed if only interested in final time cotangent
01:12:22 Adjoint Problem can be stepped through if only interested in IC cotangent
01:13:53 Parameter cotangent needs full adjoint trajectory
01:14:29 How to evaluate the vJp of the ODE dynamics
01:17:08 (1) Save primal trajectory and interpolate
01:21:27 (2) Run primal problem reversely alongside adjoint ODE
01:26:51 How to evaluate the functional inner product for the parameter cotangent?
01:28:51 Introduce another auxiliary ODE problem to accumulate the quadrature reversely in time
01:34:18 One large ODE running reversely in time
01:39:23 Summary
01:41:57 Outro
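The derivation summarized in the timestamps ends with "one large ODE running reversely in time." Below is my own illustrative JAX sketch of that result under simplifying assumptions (explicit Euler stepping, the full primal trajectory stored rather than interpolated, and the usual continuous-adjoint sign convention); it is not the video's code, and the dynamics `f` is a hypothetical stand-in for a neural network.

```python
import jax
import jax.numpy as jnp

def f(x, theta):
    # Hypothetical dynamics; stands in for a neural network f(x, theta)
    return jnp.tanh(theta * x)

def adjoint_pullback(x0, theta, x_T_bar, dt=0.01, n_steps=100):
    # (1) Primal solve (explicit Euler), storing the trajectory so the
    #     backward pass can evaluate vJps of the dynamics along it
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * f(xs[-1], theta))
    # (2) Backward solve of the adjoint ODE lam' = -(df/dx)^T lam with
    #     lam(T) = x_T_bar, while accumulating the parameter cotangent
    #     as a quadrature of (df/dtheta)^T lam
    lam = x_T_bar
    theta_bar = jnp.zeros_like(theta)
    for k in range(n_steps, 0, -1):
        _, vjp_fn = jax.vjp(f, xs[k], theta)
        x_cot, th_cot = vjp_fn(lam)
        lam = lam + dt * x_cot               # adjoint ODE, stepped backward
        theta_bar = theta_bar + dt * th_cot  # quadrature for parameter cotangent
    return lam, theta_bar  # (IC cotangent lam(0), parameter cotangent)
```

Both returned cotangents should agree with unrolled autodiff up to the O(dt) discretization error of the Euler schemes.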
this playlist is the best thing that has happened to me so far in 2024 :)
One question: why are we directly using the ODE rather than the integral equation in this derivation? Previously, we were using the integral form.
That's amazing! 😊 I really enjoyed these videos on autodiff rules. In case you are interested, I also summarized most of the rules on my website: fkoehler.site/tables/ . It's still a work in progress, let me know if you find any mistakes.
And happy new year, btw.
Regarding your question: I assume that by the "previous video" you are referring to the pushforward/Jvp video (ua-cam.com/video/69KlO-kbxJ8/v-deo.html ). I think the approach via the ODE (or, in other words, the "condition equation") is the most general. From it, we can derive both the pushforward and the pullback. If I remember correctly, I also hint at this in this video. You can also see this at the point where we took the total derivative of the ODE condition under point (1): github.com/Ceyron/machine-learning-and-simulation/blob/main/english/adjoints_sensitivities_automatic_differentiation/rules/broadcasted_function_pullback.pdf . You can then right-multiply this with the tangent vector \dot{\theta} and recover the auxiliary solve we had in the pushforward video.
I opted to use the integral equation in the pushforward video because I found it more intuitive there (and to mix things up a bit). So far, I haven't found a good way to derive the pullback with the integral equation.
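For reference, the full pullback rule that the derivation in the video arrives at can be summarized compactly as follows (my own summary; sign conventions may differ slightly from the handwritten notes):

```latex
% Pullback rule for x(T) = \text{ODESolve}(x_0, \theta, T)
% with dynamics \dot{x} = f(x, \theta, t):
\begin{align}
  \dot{\lambda}(t) &= -\left(\frac{\partial f}{\partial x}\right)^{\!T} \lambda(t),
    \quad \lambda(T) = \bar{x}_T
    && \text{(adjoint ODE, solved backward in time)} \\
  \bar{x}_0 &= \lambda(0)
    && \text{(initial-condition cotangent)} \\
  \bar{\theta} &= \int_0^T \left(\frac{\partial f}{\partial \theta}\right)^{\!T} \lambda(t)\, \mathrm{d}t
    && \text{(parameter cotangent)} \\
  \bar{T} &= \bar{x}_T^{\,T}\, f\big(x(T), \theta, T\big)
    && \text{(final-time cotangent)}
\end{align}
```

This also makes the observations in the later timestamps visible: the final-time cotangent needs no adjoint solve at all, while the parameter cotangent needs the whole adjoint trajectory for its quadrature.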
Another gem. Given the amount of prerequisite material needed here, it would be nice to review the entire pushforward/jvp playlist at some point. Thank you !
Thanks a lot 😊
An intro video to what these primitive rules are is still missing. I'm still working on a good one, maybe together with the intro to autodiff. I can't promise a date, but it will come 😊
What software are you using for handwriting? Cheers!
Thanks, it's Xournal++.
Thank you 🙏
You’re welcome 😊
I am struggling to understand where the vector-Jacobian product comes from at 7:59. Is there anywhere I can read more about this / see how these equalities are derived?
Great question! 😊
This is a general finding. I don't have a good intro video yet, because I could not come up with a superb intuition. In the meantime, I think this more practical video with JAX can be helpful: ua-cam.com/video/T6IgdbDvS_E/v-deo.htmlsi=L7UyIBlp4t9hgnd6
Next to the documentation of JAX, I can also recommend the documentation of the autodiff ecosystem in Julia: juliadiff.org/ChainRulesCore.jl/stable/maths/propagators.html
(They call vjps "pullbacks")
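As a small concrete illustration (my own example, not from the video or the linked docs): `jax.vjp` returns the primal output together with a pullback closure that maps an output cotangent v to the input cotangent v^T J, which is exactly the vector-Jacobian product in question.

```python
import jax
import jax.numpy as jnp

def g(x):
    # Toy function R^2 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[0])])

x = jnp.array([1.0, 2.0])
y, pullback = jax.vjp(g, x)  # y is the primal output g(x)
v = jnp.array([1.0, 0.0])    # cotangent on the output
(x_bar,) = pullback(v)       # v^T J: here this picks out the first row of the Jacobian
```

Choosing unit vectors for v extracts the Jacobian row by row, which is one way to see why reverse mode is cheap when the output is low-dimensional.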
@MachineLearningSimulation Thanks for the links! Honestly, some of the content on this channel is gold. The Julia link seems particularly helpful; I will have a look. Definitely need to brush up on my understanding of manifolds/differential forms, I think!
Super nice video! Couldn't really make the connection from the general info I had on the adjoint method to how it's done in the Neural ODE paper, but this is clear to me now. Just one question: at 14:30 you say that we are taking the total derivative of the ODE w.r.t. theta, but then you only use a partial. I was also surprised about the subsequent expansion of the partial derivative of the right hand side. Shouldn't we take d/d theta instead?
Thanks for the kind comment 😊
Regarding your question with the total derivative: it's a bit fuzzy here. It could have definitely been a bit clearer. There was a good reason I opted to use the "partial" notation for the total derivative, but I can't recall anymore 😅.
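For readers puzzled by the same step: written out, differentiating the ODE constraint with respect to θ produces both chain-rule terms, because the state x itself depends on θ. This is why the total/partial distinction gets fuzzy (my own summary, modulo the notation in the notes):

```latex
% Total derivative of the ODE constraint \dot{x} = f(x(t; \theta), \theta, t)
% with respect to the parameter vector \theta:
\frac{\mathrm{d}}{\mathrm{d}\theta}\, \dot{x}
  = \frac{\partial f}{\partial x}\, \frac{\mathrm{d} x}{\mathrm{d}\theta}
  + \frac{\partial f}{\partial \theta}
```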
@@MachineLearningSimulation I see. Sort of glad to hear that I'm not the only one that sometimes forgets stuff that I took a deep dive into a while ago 😄 Keep up the good work, really like the channel!
Thanks :)
Dear sir, thank you very much for your very informative videos.
I would like to make a request.
Could you please make a video series on CFD coding in Python for natural convection in a vertical channel?
Kind regards and best wishes,
Ramez
Hi, thanks for the comment :)
There will be a series on numerical methods for PDEs in more detail. However, I can't promise any concrete scenarios like natural convection yet.