I remember academic discussions in 1994 about how something like a dog catching a frisbee was a neural network solving an ODE, which was cool. I think that was when people started engaging with the topic.
Awesome intro my friend. Thanks
Excellent overview. Thanks!
Great video! Is there a reason you use an empty data set repeated 5000 times when you actually have some data points? It seems you hard code these data points into the loss function instead.
It's a hack to make it run 5000 times without dealing with data handling. If you had real data for the digital twinning, you could batch it more nicely by using that. But since we weren't batching, it's just equivalent.
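To make the hack concrete, here is a minimal sketch in Python (not the video's Julia/Flux code). The data points and model are hypothetical stand-ins; the point is that the data is hard-coded inside the loss, so the "dataset" handed to the training loop carries no information and repeating an empty tuple 5000 times just runs the optimizer 5000 times:

```python
# Hypothetical observations, hard-coded into the loss as in the video's approach.
t_data = [0.0, 0.5, 1.0, 1.5]
x_data = [1.0, 0.6, 0.2, -0.3]

def model(t, w, b):
    return w * t + b                    # toy stand-in for the neural network

def loss(w, b):
    # The data lives here, not in the batches handed to the trainer.
    return sum((model(t, w, b) - x) ** 2 for t, x in zip(t_data, x_data)) / len(t_data)

w, b, lr = 0.0, 1.0, 0.05
empty_batches = [()] * 5000             # analogue of repeating an empty data set
for _ in empty_batches:                 # each "batch" carries no information
    h = 1e-6                            # finite-difference gradients, for illustration only
    gw = (loss(w + h, b) - loss(w - h, b)) / (2 * h)
    gb = (loss(w, b + h) - loss(w, b - h)) / (2 * h)
    w, b = w - lr * gw, b - lr * gb
```

With real data you would instead iterate over actual batches and have the loss take them as arguments; here the two are equivalent.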
excellent explanation
At 14:11 it says that we can improve the fit by increasing the size of the NN, using more training points, or by training for more iterations. Wouldn't the type of loss function used also have an effect on the performance of the learning algorithm?
The first thing I was thinking of is that the sum-of-squares loss we used isn't exactly the best for neural networks. If I recall correctly, the sum-of-squares loss is non-convex with respect to the parameters when using tanh activation functions, so optimization over this loss function is going to give suboptimal results.
Maybe this is negligible in the case of this problem, but I think that using cross-entropy loss or one of its variants would provide better performance.
The particular loss we use here isn't really important. The key idea is how to incorporate a physical equation (such as Hooke's law, used in the video) into a loss function.
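The idea of putting a physical equation into the loss can be sketched as follows (a hedged illustration, not the video's code): the total loss is a data-fitting term plus a physics residual that penalizes violations of Hooke's law, F = -k*x, i.e. m*x'' + k*x = 0. The model, constants, and measurements below are all assumed for the example:

```python
import math

k, m = 1.0, 1.0                         # assumed spring constant and mass

def predict(t, params):
    a, omega = params                   # tiny "model": x(t) = a*cos(omega*t)
    return a * math.cos(omega * t)

def second_derivative(t, params, h=1e-4):
    # central finite difference, purely for illustration
    return (predict(t + h, params) - 2 * predict(t, params) + predict(t - h, params)) / h**2

t_obs = [0.0, 1.0]
x_obs = [1.0, math.cos(math.sqrt(k / m))]   # hypothetical measurements

def loss(params):
    # data term: fit the observations
    data = sum((predict(t, params) - x) ** 2 for t, x in zip(t_obs, x_obs))
    # physics term: penalize violation of m*x'' + k*x = 0 at collocation points
    ts = [0.1 * i for i in range(20)]
    physics = sum((m * second_derivative(t, params) + k * predict(t, params)) ** 2
                  for t in ts) / len(ts)
    return data + physics
```

For k = m = 1 the exact solution x(t) = cos(t) drives both terms to (near) zero, so `loss([1.0, 1.0])` is tiny, while parameters that fit the data but violate the physics get penalized by the second term.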
Hi! What is p in the loss function L(p)?
The neural network weights (parameters) being optimized.
This is so interesting!!!
Is it possible to formulate a loss function for a system of ODEs?
Yes, just add all of the losses.
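A minimal sketch of what "add all of the losses" means for a system, using assumed candidate functions in Python: write the harmonic oscillator as a first-order system (taking k = m = 1), form one residual loss per equation, and sum them.

```python
import math

# Example system: x' = v,  v' = -x  (harmonic oscillator, assuming k = m = 1).
def x_pred(t): return math.cos(t)       # candidate solutions (here the exact ones)
def v_pred(t): return -math.sin(t)

def d(f, t, h=1e-6):                    # central-difference derivative
    return (f(t + h) - f(t - h)) / (2 * h)

ts = [0.1 * i for i in range(30)]       # collocation points

loss1 = sum((d(x_pred, t) - v_pred(t)) ** 2 for t in ts) / len(ts)   # residual of x' = v
loss2 = sum((d(v_pred, t) + x_pred(t)) ** 2 for t in ts) / len(ts)   # residual of v' = -x
total = loss1 + loss2                   # "just add all of the losses"
```

In practice the candidate functions would be neural networks of the parameters being optimized, and `total` would be the L(p) handed to the optimizer.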
Can you do the same work in Python, and make some videos about getting the parameters back?
Thanks a lot for the great work
Hi, how can I cite your work for my thesis?
I guess cite the course notes? github.com/mitmath/18337 as a website? Or just pick a relevant SciML paper like arxiv.org/abs/2001.04385: these discussions tend to be the appendix material (expanded a bit more of course).