Hi Brent, watched your content for a while now and I appreciate it.
I’m glad you found the Cholesky decomposition way of doing OLS; that would have been my suggestion. I would say - I believe the emphasis is on the C in Cholesky, not the H.
Also, I could see the if check you put in taking longer because it has to evaluate that condition every time, which isn’t free and adds a lot of conditional branching for a modern CPU. Alternatively, I’d maybe see if you could utilize ndarray with iterators and reducers for constructing the design matrix. It could probably utilize some magic SIMD vectorized operations and could be “perfectly parallelizable” with parallel iterators interfacing to Rayon. Also, I believe Cholesky decompositions are built into ndarray’s matrix library, and I (strangely) found it to be faster than nalgebra in my own testing. A last point - I would consider moving toward spline regression (e.g. B-splines) over polynomial regression if you find you’re using polynomials of high degree. They tend to handle the bias/variance trade-off better for complicated functions.
Hi Jordan, thanks for the kind words and the tips!
Yeah, I was surprised that the if version sped it up too. Maybe the test case I used was a sweet spot and both smaller and larger cases would perform differently. The C3H2 case I ran in the video did run more slowly than I expected. I also mixed that change with moving some of the variables out of the loop too, so I might need to revisit that section more carefully.
I will have to take a closer look at ndarray. I use their arrays in one of my other projects, but I haven't tried their matrix library. I only recently heard about the unstable SIMD module, so I've been itching to try to write some of that myself, but I could settle for using it under the hood! I also felt like there should be constructors for that matrix in nalgebra but couldn't find them in the documentation either.
The spline suggestion is very interesting too. We typically cap out at fourth-order polynomials, which correspond to our fourth-order Taylor series expansion of a molecule's potential energy surface. It sounds like the splines might even give us a different way of modeling the potential energy surface to begin with.
Anyway, thanks again! You've given me a lot more to think about.
Interesting topic! How about flamegraph / inferno? I'd like to learn more about this topic. Flamegraph seems cool, but I still don't really know how to read and analyze the info in that graph. If you have experience with it, please make a video, or you could share about it in a livestream if you don't mind. Thanks!
I knew I had used cargo flamegraph before, and actually in that anpass project I have a recipe for it in the Makefile. I might even prefer that visualization for the time spent in each function (compared to the flow chart thing in kcachegrind), but I really like how kcachegrind can also show the source code. On a closer look, they appear to be showing slightly different data points, so it's probably worth using them both. In particular, the flamegraph is showing me spending like 33% of my time indexing matrices, which only shows up as part of another block in kcachegrind. I'll try to use it in a video or stream too, thanks for the suggestion!
One thing that may improve the performance of your project is using “&” instead of “&&” and “|” instead of “||”. This way you are saying that you want to evaluate the whole thing and don't want to short-circuit it. It has sometimes helped me improve performance on some benchmarks.
Another thing that has helped me before: do some plain linear algebra before writing the algorithm. I've gotten under 1 microsecond on linear regressions given that I know the x axis is symmetric. There are some bases/algorithms that let you compute things at compile time and end up with a nice matrix and a nice system of equations.
Interesting suggestions! I thought short-circuiting would usually speed things up, but as always, I guess you have to measure to be sure!