Note: there's a typo at the end of the video; you also have to premultiply b by your matrix with the 1/2.
That isn't the way I've always done it, but I like this treatment.
And thanks for treating the weighted-data case; that's too often omitted.
This is a topic that goes a lot further than fitting a straight line to coordinate pairs - you can fit all kinds of (families of) functions to a set of points in n dimensions. The end goal is to choose the particular function from the given family that "best fits" the data.
The process is simplest when the chosen family is linear in the parameters (note that it can be totally nonlinear in the data variables, without disturbing that simplicity).
When it isn't, you can sometimes linearize it with a transformation, which usually requires applying weights to the points, just because of the "distortion" imposed by that transformation - that is, the transformation will generally favor some points and disfavor others, which can be compensated for with weights.
The stated goal of minimizing the sum of squared deviations can be shown to be equivalent to your matrix method; I think that would make a nice follow-up to this.
As I conceive of that, it's a matter of writing down the sum of squared deviations, then setting the partial derivative of that sum with respect to each fitting parameter (here those were a and b) to 0, and solving for a and b (a quick sketch of that is written out below).
Of course, things get really interesting (read, "hairy!") when the fit family is nonlinear in the parameters.
A question I sometimes get when explaining the method of least-squares fitting is,
"Why choose the squares of the deviation to minimize? Why not the absolute value or some other function of the deviations?"
So that's another thing it might be helpful to go into.
Fred
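A minimal sketch of the derivation Fred describes, assuming the line is written y = ax + b (swap the roles if your class writes a + bx) and the data points are (x_i, y_i):

S(a, b) = \sum_i (y_i - a x_i - b)^2,
\frac{\partial S}{\partial a} = -2 \sum_i x_i (y_i - a x_i - b) = 0,
\frac{\partial S}{\partial b} = -2 \sum_i (y_i - a x_i - b) = 0.

Rearranging gives
a \sum_i x_i^2 + b \sum_i x_i = \sum_i x_i y_i, \qquad a \sum_i x_i + b\,n = \sum_i y_i,

which is exactly A^T A \binom{a}{b} = A^T \vec{y} when A has rows (x_i, 1). For the weighted case, take S = \sum_i w_i (y_i - a x_i - b)^2 instead; the same steps give A^T W A \binom{a}{b} = A^T W \vec{y}.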
I love how the lights randomly turned off and you kept on explaining what was on the board. For a moment I thought that my screen went full black mode, as you were so consistent, even with the lights off. I guess you don't need to see what you wanna say - you gotta visualize it.
I did this in AP Stats earlier this year, but we didn't derive it with linear algebra. This is amazing! Thanks for the awesome video!
To verify the line is correct, the mean value of x must give back the mean value of y 👉 f(3.6) = 2.2 ✔✔ Excellent as always
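That mean-point check works in general: the least-squares line with an intercept always passes through (mean of x, mean of y). A quick sketch with made-up data, since the video's actual points aren't listed here:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 2.9, 3.6, 5.2, 5.8])
a, b = np.polyfit(x, y, 1)                # least-squares line y = a*x + b
print(a * x.mean() + b, y.mean())         # the two values agree: the line passes through (mean x, mean y)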
Literally just covered this in my linear algebra class! Great video, thank you for your content!
I really like this approach through linear algebra, I'd like to see more about linear regression and statistics in general with this approach. I'd also like to know exactly why it works this way
I've dreamed of this video for so long!!! Thank you so much Dr. Peyamtastic!!
I’ve only ever calculated regression polynomials using calculus and partial derivatives - this is certainly an interesting way to do it; thanks for sharing it! I think I’ll stick with the calculus way because I find it more elegant, though.
Can you explain your method?
@@emperorpingusmathchannel5365
It’s pretty straightforward: Set up an explicit formula for the sum of the square areas, take the partial derivatives with respect to the coefficients of your regression polynomial, and solve the resulting linear system of equations.
@@beatoriche7301 sum of square areas of what?
@@emperorpingusmathchannel5365 The method generally used for regression is called the least-squares method, and the goal is to minimize the total sum of the areas of the squares whose side length is the difference between the y-coordinate of the point and the value of the polynomial at the same input.
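A tiny numerical illustration of that definition, with made-up data and a generic optimizer standing in for the calculus: minimizing the total area of those squares directly lands on the same line as the closed-form least-squares fit.

import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical data
y = np.array([2.1, 2.9, 3.6, 5.2, 5.8])

def total_square_area(params):
    a, b = params                              # candidate line y = a*x + b
    return np.sum((y - (a * x + b)) ** 2)      # sum of the areas of the squares

direct = minimize(total_square_area, x0=[0.0, 0.0]).x   # minimize numerically
closed = np.polyfit(x, y, 1)                             # closed-form least-squares line
print(direct, closed)                                    # both give essentially the same (slope, intercept)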
Thank you, on behalf of statisticians. It's a somewhat different presentation from what's found in statistics books, but it's very interesting. Thanks again.
Sheer brilliance. It's for tricks like this that I keep the notification bell turned on.
Why does multiplying both sides by A-transpose give the least-square fit? I'm not asking for a proof, just a suggestion of why that might be.
I think I explain that in a video called Least Squares
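Not a proof, but one suggestion of why, plus a numerical check with made-up data: Av = y usually has no exact solution, so we ask for the v that makes Av closest to y. That happens when the leftover y - Av is perpendicular to the columns of A, i.e. A^T(y - Av) = 0, which is exactly A^T A v = A^T y. A small NumPy sketch:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # hypothetical data
y = np.array([2.1, 2.9, 3.6, 5.2, 5.8])
A = np.column_stack([x, np.ones_like(x)])        # columns for the model y = a*x + b

v_normal = np.linalg.solve(A.T @ A, A.T @ y)     # premultiply by A^T, then solve
v_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)  # NumPy's least-squares solver for comparison
print(v_normal, v_lstsq)                         # same (a, b) either way

residual = y - A @ v_normal
print(A.T @ residual)                            # essentially zero: residual is orthogonal to A's columns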
Lights out seems to be a common feature of your videos now...
Sublime the first time, and sublime again today when I rewatched it.
You reminded me of my lab reports in experimental physics many years ago, where I frequently did this linear regression :D.
I'm doing this course right now!
lol have fun. I really hated it at that time as a theoretical physicist. :D
@@FunctionalIntegral I get you, but I personally really enjoy it, mostly because it gives me an opportunity to brush up on my explanatory physics skills
Given that line, find the correlation coefficient of the data by hand.
I would love to see topics reflecting mathematical beauty from you. (Not saying you should stop what you are doing now)
What is the motivation behind multiplying by Atranspose?
Because Q^T Q = I (not sure if I mentioned that, but if I didn’t, check out my least-squares video)
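Presumably Q here has orthonormal columns (for instance from the reduced QR factorization A = QR); under that assumption, premultiplying QRv = y by Q^T collapses it to Rv = Q^T y, and a quick NumPy sketch with made-up data shows it matches the normal equations:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical data
y = np.array([2.1, 2.9, 3.6, 5.2, 5.8])
A = np.column_stack([x, np.ones_like(x)])

Q, R = np.linalg.qr(A)                         # reduced QR, so Q^T Q = I
v_qr = np.linalg.solve(R, Q.T @ y)             # R v = Q^T y
v_ne = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations for comparison
print(v_qr, v_ne)                              # same line either way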
loved it!
Great video. Thanks.
Hey Dr. Peyam, why do we find the line that minimizes the vertical distance from the points to the line? Isn't it better to find a line that minimizes the actual distance to the line (the normal line from the regression to the point)?
It’s equivalent; if one is minimized, then so is the other, and vice-versa
Dr Peyam do you consider yourself to be a frequentist or a Bayesian?
I’m a Peyamian
Hi, I really love your videos, but why don't you move the camera and place it right in front of the board? It's much easier on the eyes and really more enjoyable. I really hope you take this into account. Thank you
That would mean your view is blocked by him as he writes, which is a really big issue for a leftie.
Exactly what sugarfrosted said!
Dr Peyam I mean, no? Blackpenredpen does it really well, papa flammy does it really well... you recently made a video on calculating a volume with y = x and it was great!....
2:22 That’s weird. My phone locked but I can still hear him.
Thank you!
But how do we know A^T A has an inverse? Thanks for the nice example.
A^T A does not necessarily have an inverse. For example, if A is skew-symmetric (A^T = -A) and of odd size, then the determinant of A is 0, so the determinant of A^T A is also 0, and thus A^T A is not invertible.
@@chongli297 Is this the only situation or it is just a counter example?
@@mohammedal-haddad2652 It's just a counter-example. (Also note the determinant is only defined for square matrices, and A could be non-square.) In any case, the fact that A^T A is a symmetric matrix does not preclude it from having a determinant of 0, as the counter-example shows.
@@chongli297 Thanks, that was helpful.
A^T A has an inverse iff the columns of A are linearly independent, which is usually the case for these kinds of problems.
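A concrete sketch of the failure case, with hypothetical data: if every point has the same x-value, the column of x's is a multiple of the column of 1's, so the columns of A are dependent and A^T A comes out singular.

import numpy as np

x = np.array([2.0, 2.0, 2.0])                  # all x-values equal (hypothetical)
A = np.column_stack([x, np.ones_like(x)])

print(np.linalg.matrix_rank(A))                # 1, not 2: the columns are dependent
print(np.linalg.det(A.T @ A))                  # 0 (up to rounding): A^T A is not invertible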
a + bx is what we use in our stats class; what is the difference?
There's no difference; a and b are arbitrary labels, and since addition is commutative (the order of the terms doesn't matter), you can attach the x to either of them. In linear algebra I was taught with 'y = mx + c'.
I somehow forgot about being able to weight points.
Thanks a lot Dr. Peyam, السلام عليكم (peace be upon you)
@Yo Ming It means "peace be upon you" in Arabic.
@@OtiumAbscondita Yes, there is an Arabic language option on Google.
@Yo Ming God bless you! I love Algeria and Algerians. I'm Jamal, a Palestinian math teacher living in Jordan. My greetings to you, brother Youssef.