Prof Steve.... Just keep publishing these videos forever :)
If you apply LASSO to lectures on this topic, only Steve's videos will survive.
A wonderful book! I never saw such a combination of book, video, and codes from the author. Everything is clearly explained. I don't know how to express my gratitude in words!
Thanks for publishing these videos. I'm more of a programmer than a maths person, but it's really nice to have an idea about what algorithms there are out there to interpret datasets.
Fantastic lecture, Steve! Probably my favourite one to date...
Wow - the best visualization of this topic I have seen so far. It's just amazing how the world learns today: virtually, from anywhere, online.
Since the Covid crisis confined me to home, you have become one of my favorite YouTubers. Great succinct explanations with real applicability to problems both abstract and practical. THANK YOU!!
These videos are so much better than any lecture that I had at the university!
Loved this. So sad I discovered this channel so late! Finally, a channel that doesn't dumb things down and really helps improve rigor, both mathematically and conceptually, without being daunted by research-paper notation and lingo.
I request a series of videos on Optimization: how it works in different algorithms across supervised, semi-supervised, unsupervised, and reinforcement learning.
My favourite channel of all time. I hope we're going to get videos on Interpretability for machine learning.
Thank you for the crystal-clear lecture. And the topic is fantastic because: a) the linear model (simplicity), and b) interpretability (for the reasons you have clearly explained yourself). I am looking forward to more content, and I am ready to buy yet another of your books, professor.
Hi Professor Steve, thank you so much ❤️.
I have learned a lot from your videos, Prof. Brunton. Thank you!
Thanks for the clear explanation and ample good examples.
I used LASSO & Elastic Net for a sports betting prediction model this year in college basketball. The LASSO model did better than EN. Thanks for the explanation! It was very timely for me. :)
GREAT lecture. Knew most of the content, but had to watch it to the end anyways.
Nice job! Great visuals. Looking forward to seeing more topics! Thanks for putting your content online.
You make these topics engaging. Thanks.
Such a great lecture! Deep but enjoyable on a Saturday morning :) Thank you, professor.
This is pure gold...
Thank you for always publishing amazing videos!
Very interesting video, Professor. As you mentioned, the Elastic Net algorithm combines the benefits of the Ridge Regression and the LASSO algorithms. Is there a circumstance in which one would specifically use LASSO, rather than simply always going with Elastic Net? Does Elastic Net require significantly more computation to implement? Are there issues that come with the greater generality of Elastic Net that LASSO doesn't suffer from?
I want to know this as well!
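Not an authoritative answer, but a couple of practical notes: elastic net adds a second hyperparameter (the l1/l2 mix) to tune, and the classic case where the two differ is correlated features, where LASSO tends to pick one feature from a correlated group while elastic net spreads weight across it. A minimal scikit-learn sketch with made-up data, if you want to see it yourself:

    import numpy as np
    from sklearn.linear_model import Lasso, ElasticNet

    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    # Two nearly identical (highly correlated) features plus three irrelevant ones.
    A = np.column_stack([z, z + 0.01 * rng.normal(size=200),
                         rng.normal(size=200), rng.normal(size=200),
                         rng.normal(size=200)])
    b = 3 * z + 0.1 * rng.normal(size=200)

    lasso = Lasso(alpha=0.1).fit(A, b)                     # pure l1 penalty
    enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(A, b)   # mixed l1/l2 penalty
    print("LASSO coefs:      ", lasso.coef_)   # often zeroes one of the correlated pair
    print("Elastic net coefs:", enet.coef_)    # tends to share weight across the pair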
Excellent, as always. Extremely good content.
Excellent presentation Steve.
Most of the stuff (not all, but most) this guy is talking about is very cool, and his presentation is very good and constructive, so a big thank you. But what he is talking about has actually been known for a pretty long time (an extra non-quadratic term in the minimization was discussed by mathematicians even in the 19th century, and Tibshirani is not the first to discover its consequences; Americans always think that when they find something, they are the first to discover it), and it has little to do with why learning algorithms and data-driven methods are powerful. What this guy is describing is classical linear algebra put into some nice algorithmic iterations. That is not the center of gravity of data-driven science. I mean, you have to know this stuff, of course, and if you studied science in Europe (not the USA, but Europe) you knew this linear algebra and much more by the time you finished your undergraduate degree (in Italy you have to finish whole books on quadratic forms to pass an undergrad linear algebra exam). The real power lies in the probabilistic theory developed by the Soviet mathematicians Vapnik and Chervonenkis, which really made the distinction between classical statistical and probabilistic decision theory and what people nowadays call AI.
my fav one, just keep publishing
Thanks!
Thank you for the great contribution.
Thank you for this very helpful video. I was looking for a sparse regression method and went directly to pySindy. Unfortunately, however, our data is not suited to being interpreted as a dynamical system. Long story short: out of the big selection of possible regression techniques, I now have some kind of overview, and SR3 should be the next step.
Thanks for this great lecture.
Thank you so much for your clear presentations. Have you been working with causal inference? I have been reading the work of Judea Pearl, but I find it not very accessible. If you have experience with causal inference, it would be great to hear your insights.
Hi Prof Brunton, please correct me if I'm wrong: at 25:43, the least-squares solution is at lambda = 1, not 0, right? Since 1/0 would throw an error.
Thanks for the comment. Yes, I see the confusion. The x-axis label "1/lambda" is not technically correct; it just conveys the trend that the axis value increases as lambda decreases, so we shouldn't read it literally as 1/lambda. What I mean is that when lambda -> 0 in the upper-right optimization problem, there is no sparsity penalization and the optimization returns the least-squares solution.
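For concreteness, the optimization being referenced is presumably the standard LASSO problem (standard notation, not copied from the slide):

    \min_x \; \|Ax - b\|_2^2 + \lambda \|x\|_1

When \lambda = 0 the penalty vanishes and this is exactly ordinary least squares; as \lambda grows, more entries of x are driven exactly to zero.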
Love these videos; they're easy to watch and understand, even over morning coffee ☕
Thank you very much Dr. Steve.
Thank you so much, Sir. A very insightful video.
Could you please throw some light on how to decide the threshold value of lambda in LASSO Regression? Is it dependent on the number of features?
Thanks again, Sir.
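Not from the video, but the usual answer: lambda is chosen by cross-validation rather than by a formula in the number of features. A minimal scikit-learn sketch on made-up data:

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 20))     # 100 samples, 20 candidate features
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.5, 0.5]      # only 3 features are truly active
    b = A @ x_true + 0.1 * rng.normal(size=100)

    # LassoCV sweeps a grid of lambda values (called alpha in sklearn)
    # and keeps the one with the best cross-validated error.
    model = LassoCV(cv=5).fit(A, b)
    print("chosen alpha:", model.alpha_)
    print("recovered nonzeros:", np.flatnonzero(model.coef_))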
Amazing Math visualizations!!! In particular, what software/programming language did you use to create the 3D versions of the Tibshirani plots? (minute 20:00).
I think that the intuition behind the Sparsity induced by the L1 norm is much clearer in higher dimensions. It's a shame that we have to stop at 3 dimensions. Still many thanks for the visualization!
great lectures!!!!!
many thanks!
Is there a talk on SR3? Sounds really cool! Will check out the paper
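For anyone else curious, and from memory (check the paper, e.g. Zheng et al., "A Unified Framework for Sparse Relaxed Regularized Regression: SR3"): SR3 introduces a relaxed copy w of the coefficients and solves roughly

    \min_{x, w} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda R(w) + \tfrac{\kappa}{2}\|x - w\|_2^2

where R is the sparsity-promoting penalty (e.g. l1 or l0) applied to w rather than to x, which makes the subproblems better conditioned.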
Please make a video explaining ARMAX model estimation method. Thank you.
Dear Steven, it appears that you have (partially) reinvented kernel-based system identification, popularized by Dr. Lennart Ljung as ReLS. Essentially, it uses inv(x*x') instead of Tikhonov diagonal loading, which is as optimal a solution as it can get. IMHO, it is all about how to formalize your "physical" knowledge of the system. BTW, ReLS's FLOP count is orders of magnitude lower than for biased estimation, compressed sensing, LASSO, etc.
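For readers unfamiliar with the "Tikhonov diagonal loading" being contrasted here, a minimal numpy sketch of the closed-form Tikhonov-regularized (ridge) solution, on made-up data:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(50, 10))   # made-up regressors
    b = rng.normal(size=50)         # made-up observations
    lam = 0.1                       # regularization strength

    # "Diagonal loading": add lam * I to the normal equations, i.e.
    # x = (A^T A + lam * I)^{-1} A^T b
    x = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    print(x)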
just amazing
Hello Sir
It was a great video. Thank you for this.
Could you also make a video on SISSO?
What is the reference paper that connects SVMs and the elastic net?
Here is the paper: arxiv.org/abs/1409.1976
Thank you Prof :)
Thanks for watching!
Why is the SINDy spot not located at the minimum of the test curve? You put it instead at the knee of the Pareto curve. In ML, we usually use cross validation to locate the minimum of the loss function for the test dataset.
Hi Professor, please tell us how we can support this channel. Shall we just buy the book, or will you set up a Patreon account?
Awesomeness thank you👍
Hi Steve, could we get a lecture on SGD (stochastic gradient descent) and backpropagation?!
Thanks
What kind of app do you use in your videos?
You can get a similar effect with OBS Studio: add a PowerPoint presentation with a blue background, and use a blue chroma key to make the blue transparent.
@@zhanzo thank you!
Sir, can you please make a video on the restricted isometry property?
Everything's great here; the only thing is that the side-by-side images at 15:00 aren't selling it for me. I get that l1 would be pointy while l2 would be spherical. But you say, and the consensus says, that l2 can intersect at multiple points, yet the image shows a tangent. Are we talking about the not-shown possibility of that blue line cutting through and forming a secant? But if that's the case, then the same could happen for the diamond. This is unclear to me.
EDIT (20 seconds later, lol): AH! The idea is the dimensionality of the point of intersection.
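Exactly: a worked 2D example of that dimensionality point, with made-up numbers. Project the unconstrained optimum v = (1, 2) onto each unit ball:

    P_{\ell_2}(v) = v / \|v\|_2 = (1, 2)/\sqrt{5} \approx (0.45, 0.89)    % no zeros
    P_{\ell_1}(v): soft-threshold each entry by \theta = 1 \Rightarrow (0, 1)    % exactly sparse

The l1 projection lands on a corner of the diamond, and the corners sit on the coordinate axes, which is exactly where the zeros come from; the l2 ball has no corners, so a generic tangency has all coordinates nonzero.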
can anyone download the book?
Economic models are typically dynamic systems of difference equations, not differential equations... is SINDy applicable to difference equations? If we can discover the nonlinear systems that generate economic data, that would be awesome... but I guess interpretability would still be limited :).
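Not an official answer, but yes, SINDy has a discrete-time variant: instead of fitting derivatives, you regress the next state on a library of functions of the current state, x_{k+1} = Theta(x_k) Xi. A minimal sketch assuming PySINDy's discrete-time interface, with a made-up logistic-map trajectory:

    import numpy as np
    import pysindy as ps

    # Made-up data: logistic map x_{k+1} = r * x_k * (1 - x_k)
    r, n = 3.6, 1000
    x = np.empty(n)
    x[0] = 0.5
    for k in range(n - 1):
        x[k + 1] = r * x[k] * (1 - x[k])

    # discrete_time=True regresses x_{k+1} on a polynomial library of x_k
    model = ps.SINDy(discrete_time=True)
    model.fit(x.reshape(-1, 1))
    model.print()   # should recover something close to 3.6 x - 3.6 x^2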
wow!!
I Like Someone Who Looks Like You
I Like To Be Told
I Like To Take Care of You
I Like To Take My Time
I Like To Win
I Like You As You Are
I Like You, Miss Aberlin
I want to do PhD again :)
another comment for the algorithm
Thanks for this excellent lecture!