Linear Regression 2 [Matlab]

  • Published 21 Jul 2024
  • This video describes how the singular value decomposition (SVD) can be used for linear regression in Matlab (part 2).
    Book Website: databookuw.com
    Book PDF: databookuw.com/databook.pdf
    These lectures follow Chapter 1 from: "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz
    Amazon: www.amazon.com/Data-Driven-Sc...
    Brunton Website: eigensteve.com
    This video was produced at the University of Washington
  • Science & Technology

COMMENTS • 30

  • @sergiohuaman6084
    @sergiohuaman6084 3 years ago +1

    These videos should have 10K+ views. The instructor is amazing and the methodology is clear and can be applied directly. Congratulations Steve, keep up the great work!

  • @interminas08
    @interminas08 4 years ago +16

    Looks like the housing dataset isn't available in Matlab R2020a anymore. However, you can find it in the DATA zip file for Python on the book website. :)

    • @Eigensteve
      @Eigensteve  4 years ago +3

      Thanks for the tip

    • @syoudipta
      @syoudipta 1 year ago +1

      Thank you for pointing that out. This awesome course would have felt incomplete without this exercise.

    • @qatuqatsi8503
      @qatuqatsi8503 4 months ago

      Thanks, I'm looking at this a full 4 years later and I couldn't find the dataset anywhere! XD

  • @slevon28
    @slevon28 3 years ago +2

    Thank you very much for the great content! I like the fact that you keep this topic on the mathematical side rather than taking a "use toolbox X" approach. This is of high value to me. I am currently working through all your videos and I also just bought your book a minute ago.
    Finally, this channel seems massively underrated to me.
    Best regards from Germany.

  • @alikadhim2558
    @alikadhim2558 4 years ago

    Thanks for the great illustration.

  • @delaramra5572
    @delaramra5572 3 years ago +1

    Many thanks. Is it similar if you feed the FFT data (the Fourier transform of the input and output) into the regression command? In some cases it can give a better answer than time-domain data. Should both parts of the FFT data (real + imaginary) be given to the regression solver? Is there any key point to watch for?
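    Not from the video, just an illustration of the question above: a minimal sketch of stacking the real and imaginary parts of the FFT so the least-squares solver only sees real numbers. The signals, the frequency band kept, and the single-complex-gain model are all placeholder assumptions.

    u = randn(1024,1);                 % placeholder input signal
    y = filter([1 0.5],[1 -0.3],u);    % placeholder output signal
    Uf = fft(u); Yf = fft(y);
    keep = 2:200;                      % frequency band to keep (assumption)
    % model Yf(k) ~ (g1 + 1i*g2)*Uf(k); write it as a real-valued least-squares problem
    A = [real(Uf(keep)) -imag(Uf(keep)); imag(Uf(keep)) real(Uf(keep))];
    b = [real(Yf(keep)); imag(Yf(keep))];
    g = A\b;                           % g(1) + 1i*g(2) is the fitted complex gain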

  • @ankitchatterjee5343
    @ankitchatterjee5343 4 years ago +1

    Sir, can you share more insights on the validation part?

  • @kouider76
    @kouider76 3 years ago

    Excellent as usual thanks

  • @FelipeCondo
    @FelipeCondo 3 years ago

    The video is pretty helpful, thank you Professor. How could I put a sinusoidal wave from internal waves in the ocean into the b = Ax form? The model is SSH = A cos(k x cos(theta) + k y sin(theta) - w t - phi), where A, theta, and phi are my variables. I mean I am trying to fit a plane wave, but I do not get the direction (theta) right. Could you give me a tip please?
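    A hedged sketch, not from the video: assuming the intended phase is k(x cos(theta) + y sin(theta)) - w t with k and w known, the model is linear in C1 = A cos(phi) and C2 = A sin(phi) for any fixed theta, so one option is to grid-search theta and solve a small least-squares problem for each candidate. All data below are synthetic placeholders.

    N = 500;                                             % synthetic observation points
    x = 1e4*rand(N,1); y = 1e4*rand(N,1); t = 3600*rand(N,1);
    k = 2*pi/2000; w = 2*pi/600;                         % wavenumber and frequency (assumed known)
    ssh = 0.5*cos(k*(x*cos(0.7)+y*sin(0.7)) - w*t - 1.2) + 0.05*randn(N,1);
    thetas = linspace(0,2*pi,360); res = zeros(size(thetas));
    for j = 1:numel(thetas)
        psi = k*(x*cos(thetas(j)) + y*sin(thetas(j))) - w*t;   % phase for this direction
        M = [cos(psi) sin(psi)];                               % SSH ~ C1*cos(psi) + C2*sin(psi)
        res(j) = norm(ssh - M*(M\ssh));                        % least-squares residual for this theta
    end
    [~,jbest] = min(res);                                      % best-fitting direction
    psi = k*(x*cos(thetas(jbest)) + y*sin(thetas(jbest))) - w*t;
    c = [cos(psi) sin(psi)]\ssh;
    A_amp = norm(c); phi = atan2(c(2),c(1));                   % recover A and phi from C1, C2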

  • @songurtechnology
    @songurtechnology 3 months ago

    Thank you Steve ❤

  • @athenaserra8010
    @athenaserra8010 2 years ago +1

    Multiple linear regression?

  • @gustavoexel3341
    @gustavoexel3341 2 years ago +2

    Choosing whether to watch the Matlab or the Python version is like choosing between a dubbed movie and one with subtitles: on one hand you're watching the version originally made by the author, on the other hand if you watched the other version you would understand it much better.

    • @Eigensteve
      @Eigensteve  2 years ago +2

      That is such a good analogy! FWIW, we always watch subbed over dubbed :)

    • @_J_A_G_
      @_J_A_G_ 1 year ago

      Why choose when you can watch both? I also found it interesting to read the comments and see that some questions are repeated, while others seem to come from the different mindset of those choosing the respective language (rather than being language-related). Though the observation is not statistically significant. :)

  • @nami1540
    @nami1540 2 years ago

    I am confused about the design of the data matrix A at this point. Didn't we state at the beginning that each column is a snapshot? How can each row contain a measurement now? It makes sense when I look at this from the perspective of regression, but it does not combine well.

    • @_J_A_G_
      @_J_A_G_ 1 year ago

      If you by "snapshot" means "sample" I agree that the initial videos stacked the features in columns and each column was a sample (a set of measurements for the same situation). It was sometimes even an entire image reshaped into a column.
      To me that was confusing, I'm used to put samples into rows. Unfortunately the convention seems to be different for different situations, as you say, and I don't think he mentioned the change.
      My understanding is that columns of a matrix X could be the rows of another matrix T. That would be T = X' = V S U' (same U,S,V as from svd(X) = U S V') so in essence you have the correlation among columns and rows either way. See earlier video for that hint:
      ua-cam.com/video/WmDnaoY2Ivs/v-deo.html
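      A quick numerical check of the T = X' = V S U' relation mentioned above; the matrix size is arbitrary.

      X = randn(5,3);                 % small random matrix (columns as samples, say)
      [U,S,V] = svd(X,'econ');        % economy SVD: X = U*S*V'
      T = X';                         % same data with samples as rows
      norm(T - V*S*U')                % essentially zero: the SVD of X' reuses the same U, S, V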

  • @happysong4631
    @happysong4631 4 years ago +1

    I think there is something wrong with the xlabel? Since we only have 4 ingredients.

    • @Eigensteve
      @Eigensteve  4 years ago

      Good call. This label would be more accurate if it was "mixture of ingredients".

  • @David-pe2dt
    @David-pe2dt 4 years ago +1

    Can someone explain this: when plotting the significance / correlation of the different attributes, the response vector b has been sorted in the previous section, but A and A2 have not been sorted accordingly prior to performing the new multilinear regression... surely by doing this, the attribute matrix and response vector do not match as intended?

    • @David-pe2dt
      @David-pe2dt 4 years ago

      Another question that I would like to ask concerns computing Pearson or Spearman correlation coefficients between the original attributes matrix A and the response vector b. If the correlation coefficient for a given attribute has opposite sign to the slope of that attribute from multilinear regression, does that imply that the linear model is not a good fit for that particular attribute?

    • @_J_A_G_
      @_J_A_G_ 1 year ago

      > b has been sorted in the previous section, but A and A2 have not been sorted
      Old question, but it seems to be relevant!
      Looking at 9:39, line 19 calls sort(b) but also gets sortind back. This sortind is then used on line 22 to rearrange A for the plot.
      Instead, it would have been good to update A (as was done with b) to make sure it was correctly sorted everywhere.
      Line 32 uses the original A (which is ok) to create A2. The regression on line 39 should then have applied sortind to A2 (or used the original b); a minimal sketch of that reordering is below.
      I've only looked at the code on screen; perhaps it's fixed elsewhere.
      ---
      About the correlation coefficient: does that even happen? If it's close to zero, of course a sign flip doesn't cause much error. If it's big, yes, that sounds like a bad fit; on the other hand it might also be that the feature is useless, but then the correlation should have indicated that.
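      A minimal sketch of the reordering on placeholder data; the variable names only mimic the on-screen code, which I have not re-run.

      A = randn(50,4); b = A*[1;2;3;4] + 0.1*randn(50,1);   % placeholder attribute matrix and response
      [b_sorted, sortind] = sort(b);                        % sort the response, keep the permutation
      A_sorted = A(sortind,:);                              % reorder the rows of A with the same permutation
      x = A_sorted\b_sorted;                                % rows of A and entries of b still match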

    • @PenningYu
      @PenningYu 1 year ago +1

      I believe that is a mistake. It makes no sense to do regression with sorted b

  • @user-ym8rz6mw5r
    @user-ym8rz6mw5r 2 years ago

    So here you use the SVD, which reduces the data to a square matrix of the same size as the number of variables. What happens if you use fewer components than variables? Is that even possible?

    • @_J_A_G_
      @_J_A_G_ 1 year ago

      If you remember, the components are ordered by importance by the SVD. Discarding components gives a less accurate approximation, but sometimes that's fine (perhaps you had lots of noise in the measurement, and getting rid of that is actually a bonus). This is also related to "feature reduction", where you can figure out that some of the data (e.g. shoe size) is only marginally relevant for your target (e.g. house price) and should be excluded. That said, the "features" or components selected by the SVD are rarely physically meaningful or matched to the actual features in your data. A sketch of the truncation is below.
      The other aspect of this was covered in the Linear Systems video. Depending on your matrices, the system may be underdetermined or overdetermined. If you have only a few "variables in X", the degrees of freedom are limited. Again, a very overdetermined system leads to a more approximate solution, and you may find that it "wasn't useful" even if it is possible.
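      To make that concrete, a hedged sketch of rank-truncated least squares on toy data; the rank r and the data are arbitrary.

      A = randn(200,10);                              % toy data: 200 samples, 10 variables
      b = A*randn(10,1) + 0.1*randn(200,1);
      [U,S,V] = svd(A,'econ');
      r = 4;                                          % keep fewer components than variables
      xr = V(:,1:r)*(S(1:r,1:r)\(U(:,1:r)'*b));       % rank-r pseudo-inverse solution
      norm(b - A*xr)                                  % residual grows as r shrinks, but the fit can be more robust to noise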

  • @nasirbudhah3063
    @nasirbudhah3063 3 years ago

    To interpret regression correctly, both x and y must be collected randomly. If x is a series such as a time sequence, then this cannot be called regression; instead it is called least-squares line fitting.

    • @_J_A_G_
      @_J_A_G_ 1 year ago

      I don't think there is such a distinction, but maybe I misunderstood your point. "Linear regression" would usually have the least-squares distance as the objective to minimize. Either way, you have a linear combination of features approximating a known (possibly approximate) target value.

  • @camiloruizmendez4416
    @camiloruizmendez4416 3 years ago

    Sorry to bother you, but housing.data is missing from the webpage.

    • @camiloruizmendez4416
      @camiloruizmendez4416 3 years ago +1

      This code fixes it:
      filename = 'housing.txt';
      urlwrite('https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data',filename);
      inputNames = {'CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT'};
      outputNames = {'MEDV'};
      housingAttributes = [inputNames,outputNames];
      formatSpec = '%8f%7f%8f%3f%8f%8f%7f%8f%4f%7f%7f%7f%7f%f%[^\n\r]';
      fileID = fopen(filename,'r');
      dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'ReturnOnError', false);
      fclose(fileID);
      housing = table(dataArray{1:end-1}, 'VariableNames', {'VarName1','VarName2','VarName3','VarName4','VarName5','VarName6','VarName7','VarName8','VarName9','VarName10','VarName11','VarName12','VarName13','VarName14'});
      % Delete the file and clear temporary variables
      clearvars filename formatSpec fileID dataArray ans;
      delete housing.txt
      housing.Properties.VariableNames = housingAttributes;
      X = housing{:,inputNames};
      y = housing{:,outputNames};
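      With X and y loaded as above, the SVD-based regression from the video can then be run; this is a minimal sketch, and the appended column of ones for the intercept is my own assumption.

      A = [X ones(size(X,1),1)];      % data matrix with an intercept column
      [U,S,V] = svd(A,'econ');        % economy-size SVD
      xtilde = V*(S\(U'*y));          % least-squares coefficients via the pseudo-inverse
      plot(y,'k'); hold on
      plot(A*xtilde,'r')              % compare true median prices with the regression fit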