Fitting a Line using Least Squares

  • Published 6 Sep 2024

COMMENTS • 208

  • @virtually_passed
    @virtually_passed  2 years ago +39

    Typo at 11:39: it should be
    || [A]X - b ||^2
    Don't worry, this doesn't affect anything in the video :)

  • @klafbang
    @klafbang 2 years ago +99

    Nice video, but it's a shame you don't give more intuition for the choice of the least squares vs. the other distance measures and the fact that this is just a projection onto a linear function space - that realisation is what really made linear regression click for me and made it possible to trivially generalise it to other functions.

    • @virtually_passed
      @virtually_passed  2 years ago +16

      Glad you liked the video and thanks for the feedback!

    • @red_rassmueller1716
      @red_rassmueller1716 2 years ago +4

      Still, the beginning was just an introduction. He doesn't have to pick it up again if he wants to talk about the least squares method.

    • @merseyless
      @merseyless 1 year ago +3

      @@red_rassmueller1716 Then why mention them in the first place? You can't blame us for being curious about an unresolved comparison.

    • @asthmen
      @asthmen 1 year ago +1

      I agree with this comment - I've always wondered why we don't ever use the other two measures, and this would have been a good opportunity to answer the question. Could you maybe point to any other resources that do?

  • @miguelcerna7406
    @miguelcerna7406 2 years ago +29

    Excellent video. Love the proof behind the parabola and the global min that the squared residuals must eventually attain. Bravo sir.

    • @virtually_passed
      @virtually_passed  2 years ago +3

      Thanks!

    • @leif1075
      @leif1075 1 year ago

      @@virtually_passed And why not just set a to zero all the time? Isn't that easier? Otherwise I don't see how to tell if your line starts close to the origin or not.

  • @navegaming8198
    @navegaming8198 3 months ago +2

    Not using sum notation on the first proof is making it so much easier for me to understand. Brilliant!

  • @johnchessant3012
    @johnchessant3012 2 years ago +13

    Great video. Here's a cool fact: The first row of the matrix equation at 14:27 says that the sum of the residuals must be zero, which (after a bit of algebra) proves that the least-squares line must map the average of x to the average of y.

    • @virtually_passed
      @virtually_passed  2 years ago +6

      Very cool fact! Thanks for sharing! I'd never heard of this before so I decided to prove it for myself:
      r1 + r2 + ... + rn = 0
      (a+bx1-y1) + (a+bx2-y2) + ... + (a+bxn-yn) = 0
      n*a + b(x1+x2+...+xn) - (y1+y2+...+yn) = 0
      divide both sides by 'n'
      a + b*x_avg - y_avg = 0
      y_avg = a + b*x_avg
      Therefore the point P = (x_avg, y_avg) will lie on the line y = a+bx. Very neat!
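      A quick numerical check of this in Python (assuming NumPy; the data values are made up):
      import numpy as np
      x = np.array([1.0, 2.0, 4.0, 7.0])            # made-up data
      y = np.array([2.1, 3.9, 8.2, 13.8])
      A = np.column_stack([np.ones_like(x), x])     # columns [1, x] for y = a + b*x
      a, b = np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares fit
      # the fitted line evaluated at x_avg equals y_avg
      print(np.isclose(a + b * x.mean(), y.mean()))  # True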

    • @shophaune2298
      @shophaune2298 2 years ago +6

      @@virtually_passed ...at which point we could consider that point P to be a 'new' origin, and use coordinates relative to it to find the best fit of the data passing through that point - the simpler 1-dimensional case explored earlier in the video.

    • @morgan0
      @morgan0 2 years ago +1

      Yeah, partway through the video I stopped to remake it in Desmos to see if the horizontal component could be used in some way because I was curious (though I didn't get anywhere with it), and I first offset them all by the x and y averages and did the 1D case.

    • @VitinhoFC
      @VitinhoFC 2 years ago +2

      This is neat indeed!

  • @matveyshishov
    @matveyshishov 2 years ago +14

    Thanks for the visuals!
    When I was learning OLS, I remember that my primary questions were a) why is the sum a good choice, and what other options are there? and b) why squares and not absolute values?
    I see that you just jump over these two questions, but from my experience, for somebody who is trying to understand the method (as opposed to memorizing it) these are the central questions that unlock the understanding. So you may want to add some exposition on that in the future; I'm sure many students will appreciate it.

    • @virtually_passed
      @virtually_passed  2 years ago +4

      Hi, thanks for your kind words and feedback!
      I'm actually in the process of making more videos now so this is really good advice :) thanks! As a short answer to your question:
      1) One of the massive advantages of Ordinary Least Squares (OLS) is that it guarantees convexity (i.e., the parabola has only one global optimum). Convexity is a big deal in the field of optimization. Some other fitting methods don't have this feature, meaning that it's possible to get stuck in local optima, which means that you won't get the best fit.
      2) It's very fast to compute.
      There are downsides to this method though which I haven't talked about. One is that it's highly sensitive to large outliers (since it squares the error). But this is partially resolved by adding a regularization term (basically adding a 1-norm and a 2-norm together in the objective).
      I'll elaborate more in a future video :)
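      As a rough sketch of what such a combined objective could look like (assuming NumPy/SciPy; the data and the weight lam are made up, and this is only one possible setup):
      import numpy as np
      from scipy.optimize import minimize
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
      y = np.array([0.2, 1.1, 1.9, 3.2, 12.0])       # last point is an outlier
      A = np.column_stack([np.ones_like(x), x])
      lam = 1.0                                      # hypothetical weighting between the two terms
      def objective(p):
          r = A @ p - y
          return np.sum(r**2) + lam * np.sum(np.abs(r))   # squared 2-norm plus a 1-norm term
      fit = minimize(objective, x0=np.zeros(2), method="Nelder-Mead")
      print(fit.x)                                   # fitted [a, b]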

    • @matveyshishov
      @matveyshishov 2 years ago

      @@virtually_passed Thank you very much!

    • @virtually_passed
      @virtually_passed  2 years ago

      @@matveyshishov you're welcome!

    • @andrewzhang5345
      @andrewzhang5345 2 years ago

      @@virtually_passed Regarding computing, it's a bit misleading to claim you don't need iterations to find your parameters. Given a small dataset, you can fit the most complex model with the slowest optimization method quickly. Indeed, for least squares, solving the normal equation is trivial when the data set is small, but difficult with a larger dataset, and one resorts to iterative methods to solve least squares.
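      A sketch of the two routes (assuming NumPy/SciPy; the random data is only for illustration):
      import numpy as np
      from scipy.sparse.linalg import lsqr
      rng = np.random.default_rng(0)
      A = rng.normal(size=(10_000, 3))               # made-up tall data matrix
      b = rng.normal(size=10_000)
      # direct route: solve the small normal equations A^T A x = A^T b
      x_direct = np.linalg.solve(A.T @ A, A.T @ b)
      # iterative route: LSQR never forms A^T A, which helps for large or sparse A
      x_iter = lsqr(A, b)[0]
      print(np.allclose(x_direct, x_iter, atol=1e-6))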

    • @virtually_passed
      @virtually_passed  2 years ago +1

      @@andrewzhang5345 I agree, thanks for the comment. I've edited my response.

  • @itamar.j.rachailovich
    @itamar.j.rachailovich 2 years ago +1

    I watched it a few days after you uploaded it, but I was in bed almost asleep. Today I watched it again. It's amazing, and you are an excellent teacher!!! Keep going!!!

  • @koendos3
    @koendos3 2 years ago +12

    Wow, I've been binge-watching the SoME2 videos, and I've been impressed with everyone's effort. This video especially is so sick!

  • @grinreaperoftrolls7528
    @grinreaperoftrolls7528 2 years ago +1

    Hold up, I’ve DONE THIS before. But this is a much better explanation. Thank you.

  • @mymo_in_Bb
    @mymo_in_Bb 1 year ago +1

    The method of least squares was (along with everything else going on at the time) the point when I stopped understanding my linear algebra course at uni. And now I understand it. Thanks a lot!

  • @JKTCGMV13
    @JKTCGMV13 9 months ago +1

    Within seconds of the video playing, I immediately got a better intuitive explanation of the least squares method than I've ever had.

  • @BharmaArc
    @BharmaArc 2 years ago +36

    Great video as always! Great visuals that really give insight into the problem. I also appreciate how you color code things and show every step of the computation. A tiny correction: at 11:40 it should be norm *squared*.

    • @virtually_passed
      @virtually_passed  2 years ago +5

      Thanks for the comment! I really appreciate the kind words. You're absolutely right!
      At 11:40 it should be
      error = ||AX-b||^2
      Thanks for pointing that out :) fortunately it doesn't affect the rest of the video though :)

    • @leif1075
      @leif1075 1 year ago

      @@virtually_passed Wait, just because the equation at 7:50 has a bunch of squared terms does not tell you it's a parabola, so why did you say that??

    • @leif1075
      @leif1075 1 year ago

      @@virtually_passed Oh, and also: if some of the linear b terms xy are negative, then even if this might be a parabola it might not always point up - since the negative linear b terms might be greater than the positive b squared terms... see what I mean??

    • @leif1075
      @leif1075 1 year ago

      @@virtually_passed Hope you can respond when you can. Thanks very much.

    • @virtually_passed
      @virtually_passed  1 year ago

      ​@@leif1075 Sorry for the late reply! Notice that the error has the form of a parabola: e = k1*b^2 -2*k2 * b + k3
      Where the constants k1, k2, and k3 are given by:
      k1 = x1^2 + x2^2 + ...
      k2 = x1y1 + x2y2 + ...
      k3 = y1^2 + y2^2 + ...
      Also note that k1 is always >= 0 because any real number squared is non-negative. It honestly doesn't matter what the values of k2 and k3 are, since whether a parabola opens upward is determined solely by the sign of the coefficient of the squared term. I've created a desmos link for you here to see for yourself why this is true: www.desmos.com/calculator/waagmohtua
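      A quick numerical check of this (assuming NumPy; made-up data):
      import numpy as np
      x = np.array([1.0, 2.0, 3.0, 4.0])
      y = np.array([2.1, 3.9, 6.2, 7.8])
      k1, k2, k3 = np.sum(x**2), np.sum(x*y), np.sum(y**2)
      b_vals = np.linspace(-5, 5, 1001)
      e_vals = k1*b_vals**2 - 2*k2*b_vals + k3       # the error parabola
      print(b_vals[np.argmin(e_vals)], k2/k1)        # numerical minimum matches b = sum(xy)/sum(x^2)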

  • @eriktempelman2097
    @eriktempelman2097 2 years ago +3

    Absolutely wonderful!!!
    Combines linear algebra with calculus. This video is a GREAT "commercial" for both topics.

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks :)

    • @eriktempelman2097
      @eriktempelman2097 2 years ago +1

      You're welcome! Really, after all those years (I'm from 1969) this is the first time I see how both can go hand in hand.

  • @giordano7703
    @giordano7703 2 years ago +9

    Very simple, yet effective, explanation; I come out of this video happy knowing I learned something new which I would have never tackled by myself. Great work!

  • @fire17102
    @fire17102 2 years ago +2

    Holy $#17 this is like a dream come true, I can't believe you made this interactive! I literally just commented I want interactiveness built into #some2 videos! I haven't even gotten to the video yet... Mad respect guys you're awesome

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks for the kind words! We intended to make it more interactive, but we ran out of time. Originally we wanted it to be a "choose your own adventure" thing where you could choose the type of proof, choose whether you wanted to see a proof for 1 unknown (easy version) or 2 unknowns (harder version). Interactivity is still a dream of mine :)

  • @tex1297
    @tex1297 2 years ago +1

    I wish we had all these materials back in school 30 years ago... Nice work.

  • @robertkelly649
    @robertkelly649 1 year ago +3

    This is an absolutely beautiful explanation of least squares and where it came from. The visual and conceptual combined was really wonderful. Wish I had this in college. It would have spared me a lot of pain. 😄

  • @allenadriansolis8032
    @allenadriansolis8032 2 years ago +4

    Great explanation and visualization. Well done.

  • @kayleighlehrman9566
    @kayleighlehrman9566 2 years ago

    I can't remember who originally said it but one of my favourite quotes about proofs is that "you shouldn't set out to prove something that isn't already almost intuitive."

  • @TroyaE117
    @TroyaE117 1 year ago +1

    Good video! Never had to use the multivariable approach, but now I know.

  • @johnathanmonsen6567
    @johnathanmonsen6567 2 years ago +1

    I understood JUST enough linear algebra to understand how clever that is. I started to phase out on the multivariate part (that's where I started flagging in college), but dang, that was a really cool reveal that the 'Jacobian' was just A transpose.

    • @virtually_passed
      @virtually_passed  2 years ago

      Glad you managed to follow it! Linear algebra is very powerful!

  • @JohnBoen
    @JohnBoen 2 years ago +1

    Least squares...
    I had always thought of it as a square root sort of thing. I do statistics. I write queries and do data analysis as a job - for 25 years. I have a bit of a clue...
    But the changing sizes of the squares as the line moved made me go "Ohhhhhhhhhhhh!". It just suddenly became intuitive.
    Great way to explain it - you are getting a comment 30 seconds in. Nice work :)

  • @TheDGomezzi
    @TheDGomezzi 2 years ago +1

    Some recreational mathematics is learning cool stuff you didn’t already know, and some recreational mathematics is re-learning stuff you knew but with a better feel and intuition behind it. I think a lot of people overlook that second one, and this video shows how it can be really cool!
    Would love to see a video where you go over the three methods you suggested and their pros and cons, that would be super cool.

    • @virtually_passed
      @virtually_passed  2 years ago

      Hey thanks for the comment and kind words. A lot of people have requested a summary video like that :) it's on the list :)

  • @ShankarSivarajan
    @ShankarSivarajan 2 years ago +3

    3:01 It _does_ seem subjective when you put it like that. Which is why it's important to point out that the Least Squares method is equivalent to Maximum Likelihood Estimation for normally distributed errors, which makes it objectively superior.

    • @HesderOleh
      @HesderOleh 2 years ago +1

      I don't think it is objective to assume that the MLE is the best estimator. There are plenty of circumstances where you actually want something else.

  • @web2wl00p
    @web2wl00p 2 years ago +4

    Very, very nice! I have been teaching LSQ optimization to undergrads for years; now I will just point them to your video 🙂 Best of luck for #SoME2

  • @gustavom8726
    @gustavom8726 1 year ago +1

    This is awesome!! It represents perfectly the SoME2 spirit, but with a very original way to explain and present. Thank you so much!

  • @movax20h
    @movax20h 2 years ago +1

    Minimizing the perpendicular distance (squared) is also sometimes used, especially when there is uncertainty in both x and y. It is however way more computationally expensive. Most packages for fitting do not support it, but it is possible, and I have used it in the past (including estimating the error of the parameter estimates).

  • @MannISNOR
    @MannISNOR 2 years ago +1

    Great job - This is absolutely fantastic! You are doing us all a favor.

  •  2 years ago +1

    Bro... I love you.
    This was beautiful!!!
    So helpful for understanding the Vandermonde matrix...

  • @arddenouter4553
    @arddenouter4553 1 year ago

    Maybe mentioned already, but I think what you demonstrate is the reduced major axis method, where the error can be in two variables. The least-squares method assumes an input parameter without error (say the x axis) and an output parameter with error (say the y axis). The least-squares method minimizes (in the case of error in y) the vertical distance between the line and the actual points. At least that is how I understood it while using it some time ago.

  • @yusufkor5900
    @yusufkor5900 2 years ago +1

    Whoa! I'm illuminated! Thanks.

  • @aamer5091
    @aamer5091 1 year ago +1

    Words don't appropriately express gratitude, but thanks.

  • @Bunnokazooie
    @Bunnokazooie 1 year ago +1

    Awesome visualization

  • @mackansven3656
    @mackansven3656 6 months ago +1

    This was great, all of it, amazing job.

  • @nehachopra2954
    @nehachopra2954 2 years ago +2

    Thank you so much for making this topic so so so interesting
    Hope to see much more

    • @virtually_passed
      @virtually_passed  2 years ago

      Hey, thanks for the kind words! I've made another video on least squares here and I intend to make a few more:
      ua-cam.com/video/wJ35udCsHVc/v-deo.html

  • @hpp6116
    @hpp6116 2 years ago +1

    Fantastic presentation!

  • @pedrodaccache4026
    @pedrodaccache4026 2 years ago

    Wow, I can't understand how this channel only has 13k subs. Awesome video!

  • @AlexeyMatushevsky
    @AlexeyMatushevsky 2 years ago +1

    Great video, thank you so much for your explanation!

  • @nadavperry2267
    @nadavperry2267 1 year ago

    This is really awesome! Although as a math major I would've liked to see an expansion of the formula for n dimensions (I would assume it uses r_i^n and the Jacobian and shouldn't be very hard to generalize, although I may be wrong).

  • @MaxPicAxe
    @MaxPicAxe 2 years ago +1

    Wow that was such a well-made video

  • @agspda
    @agspda 2 years ago +1

    Love your content, luckily I just got recommended this video, I got lost a bit at the end with the multivariable calculus but I understood the reasoning and that is a lot, thanks!!

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks for the comment. As long as you get the big picture, that's what matters most. The rest are all details :)

  • @braineaterzombie3981
    @braineaterzombie3981 1 year ago +2

    Excellent video. Make more videos on statistics.

  • @pierrebegin9253
    @pierrebegin9253 2 years ago

    A least squares fit is highly sensitive to outlier points, so the fit gets distorted by bad points. A more robust estimate can be obtained by minimizing the median instead of the squared error, which is biased by outliers. Try it!

    • @virtually_passed
      @virtually_passed  2 years ago

      Yes! Which is why sometimes the objective function is the sum of the 2-norm and the 1-norm, to make a more robust fit :)

  • @HesderOleh
    @HesderOleh 2 years ago

    Nice video. It took far too long for me to understand this, because throughout high school and then uni I didn't have the words to articulate my question of why squares instead of the L1 metric, or I would be brushed off with a silly answer like "it is just the best way". A similar question that I had unanswered for a long time is why e is the number raised to i*theta for polar coordinates in the complex plane; it was often dismissed with the fact that sin and cos are connected to e, but not why or how.
    When I did have a good professor who explained it well I was so happy. I wondered if there were ever times when we would want to use higher norms or Lp-spaces, because some of those are easily solved as well, but they told me that it would give undue weight to outliers. I was satisfied with that answer at the time, but now I wonder if there are any applications where you do want the focus to be on outliers, where those data points are actually an important part of telling the story of what the data means.

  • @nivcohen5371
    @nivcohen5371 2 years ago +1

    Amazing video! Very enlightening

  • @TheGoldenFluzzleBuff
    @TheGoldenFluzzleBuff 2 years ago +1

    Wow. You more or less just summarized concisely what I spent weeks learning in 4000-level econometrics courses.
    Could you do one for multivariable (multidimensional) values?

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks for the comment. What do you mean by multidimensional values? Do you mean to teach multivariable calculus? Or teach LS with multiple unknowns? :)

  • @EuphoricPentagram
    @EuphoricPentagram 2 years ago +1

    I'm loving this.
    I was never really good at math in school (only making it to algebra 1/2 and geometry) and I'm already halfway through (wanted to pause it so I don't miss any) and it's amazing.
    I've been able to understand everything very well (some time programming probably helped), but you have made it so accessible and I love how you take a moment to pause and explain what the key points are (like that there's one global minimum) and that we should notice them to remember for later. It's very helpful in keeping track of everything.
    If I'm ever teaching something I'm definitely stealing that idea.
    10/10, will Like and Subscribe.
    Edit: just finished it with the matrices and it was still very understandable (even if I don't fully understand it). I was able to grasp enough to see and understand the power of this.
    And when you coded it, that also helped a lot because it brought it into a language I knew instead of one I'm still learning.
    Still 10/10, would recommend.

    • @virtually_passed
      @virtually_passed  2 years ago

      Hey thanks so much for the kind words. I spent a lot of effort trying to make the video as accessible as possible so I'm glad it worked for you!

  • @bitroix_
    @bitroix_ 2 years ago +2

    Love it!

  • @lcfrod
    @lcfrod 2 years ago +1

    Excellent. Thank you.

  • @jakobr_
    @jakobr_ 2 years ago +3

    Wow, it’s surprising how compact the expression ended up being! Very nice video.
    I wonder about one of the other approaches you showed at the beginning, namely the “minimize perpendicular distance” method. That one appeals to me because it doesn’t seem to care about the rotation of our coordinate axes. If we were to turn that into a sort of “least circles” fit, would the resulting expression be anywhere near as neat or useful?

    • @virtually_passed
      @virtually_passed  2 years ago +3

      Hey, thanks for your comment.
      The method you're referring to is formally called Orthogonal Distance Regression. If you want all the details I'd recommend reading the book Numerical Optimization by Stephen Wright.
      In short, this method is superior in many ways but is generally more computationally expensive because the "Jacobian" matrix shown at 14:45 is no longer a constant in the general case, and so the minimization requires iterations.
      Hope that makes sense :)
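      If you want to experiment, SciPy ships an ODR wrapper; a minimal sketch with made-up data and a straight-line model:
      import numpy as np
      from scipy import odr
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # made-up data with error in both x and y
      y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])
      model = odr.Model(lambda beta, x: beta[0] + beta[1] * x)
      fit = odr.ODR(odr.Data(x, y), model, beta0=[0.0, 1.0]).run()   # iterative, as described above
      print(fit.beta)                                # fitted [a, b]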

    • @jakobr_
      @jakobr_ 2 years ago

      @@virtually_passed Thanks for the detailed answer! It made a lot of sense

  • @algorithminc.8850
    @algorithminc.8850 2 years ago +2

    Nice useful channel. Great stuff ... thanks. Cheers.

  • @maatiger3009
    @maatiger3009 1 year ago +1

    you are incredible ❤❤❤❤❤

  • @gauthierruberti8065
    @gauthierruberti8065 2 years ago +2

    I really like this video

  • @loganreina2290
    @loganreina2290 10 months ago

    The 2-norm at 11:50 or so should be squared. Very nice presentation.

    • @virtually_passed
      @virtually_passed  10 months ago

      Thanks! You're right! I've made a post about this.

  • @hcbotos
    @hcbotos 1 year ago +1

    Very nice video!

  • @vijay1968jadhav
    @vijay1968jadhav 2 years ago +1

    Wonderful video. Need more videos, sir.

  • @avyakthaachar2.718
    @avyakthaachar2.718 1 year ago +1

    Amazing ❤

  • @nikolaimikuszeit3204
    @nikolaimikuszeit3204 1 year ago

    Very nice visual approach, but as a physicist I am missing the motivation for "y errors only" vs "x and y errors". In other words, one could rotate the squares and go back to the ODR that is hinted at in the beginning and still get a least-squares method. (BTW, unlucky choices: vector X and vector b.) A video about ODR and/or SVD would be nice.

  • @gaganaut06
    @gaganaut06 2 years ago +2

    Awesome, thanks. Can you do one on nonlinear curve fitting as well?

    • @virtually_passed
      @virtually_passed  2 years ago +1

      I actually intend to do just that! First I want to make a video on another proof of linear least squares using the column space of A. Then, if I have time, I'll do one on orthogonal fitting using nonlinear least squares.

    • @gaganaut06
      @gaganaut06 2 years ago

      @@virtually_passed awesome, waiting.....

  • @rajasvlog7729
    @rajasvlog7729 2 years ago +1

    Nice video

  • @MurshidIslam
    @MurshidIslam 2 years ago

    Excellent video. Can you do another video explaining the pros and cons of the other methods (i.e., the vertical distance and the perpendicular distance methods) compared to the least squares method?

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Hi thanks for the comment. Quite a few others have requested a video like that. It's on the list :)

  • @egoworks5611
    @egoworks5611 2 years ago +1

    Great video

  • @asterixx6878
    @asterixx6878 2 years ago

    This is so much easier and more elegant to derive using linear algebra alone. There is no need to use Multivariable Calculus.

    • @virtually_passed
      @virtually_passed  2 years ago +1

      I agree it's beautiful and elegant to derive it using linear algebra alone! I actually just made a video doing exactly that :)
      ua-cam.com/video/wJ35udCsHVc/v-deo.html

  • @teaformulamaths
    @teaformulamaths 2 years ago +1

    Very elegant video, great concept to choose! Very 3b1b. Is there another standard to aspire to? 🤔

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks for the kind words. 3b1b is a hero of mine :)

  • @gregorygargioni
    @gregorygargioni 2 years ago +1

    The sad part of this amazing applied math video is that it ends!!!

    • @virtually_passed
      @virtually_passed  2 years ago

      Thanks! I have a follow-up proof video about least squares if you're interested ☺️

  • @kendakgifbancuher2047
    @kendakgifbancuher2047 2 years ago +1

    Virtually Based. Thanks for the video, subscribed

  • @Duiker36
    @Duiker36 2 years ago

    I was really hoping you'd follow through on that promise to explain why Least Squares is better than the other two approaches.

    • @virtually_passed
      @virtually_passed  2 years ago

      I intend to. Meanwhile, I've written quite a bit on this in other people's comments. :)

  • @martinsanchez-hw4fi
    @martinsanchez-hw4fi 2 years ago +3

    Hi! Awesome video! Which tools do you use to create the interactive exercises?

    • @virtually_passed
      @virtually_passed  2 years ago +4

      I collaborated with someone who did most of the heavy lifting regarding the simulation. We used P5.js to make all the simulations. A link to his GitHub and his website is in the description :)

  • @AhmedHan
    @AhmedHan 2 years ago +2

    Great video. Many thanks for the visualization of the problem.
    If I remember correctly, if we increase the length of the X vector, we can fit polynomials as well. Can you confirm this?

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Correct. For example you could try to fit data to the function
      y = a + bx + cx^2 + dx^3
      In this case the vector X would be:
      X = [a,b,c,d]
      The A matrix will also have more columns.
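      A minimal sketch of that cubic case (assuming NumPy; the data is made up):
      import numpy as np
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
      y = np.array([1.0, 1.8, 4.9, 12.1, 24.6, 44.0])
      A = np.column_stack([np.ones_like(x), x, x**2, x**3])   # columns [1, x, x^2, x^3]
      a, b, c, d = np.linalg.pinv(A) @ y                      # X = pinv(A)*b, as in the video
      print(a, b, c, d)
      NumPy's np.polyfit(x, y, 3) does the same least-squares fit in one line (with coefficients ordered from the highest degree down).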

  • @jarikosonen4079
    @jarikosonen4079 2 years ago +1

    What about a method of least circles...

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Cool idea! The answer will actually be the same. Here's why:
      Instead of minimizing:
      r1^2 + r2^2 + ... + rn^2
      You will be minimizing:
      (π/4)*r1^2 + (π/4)*r2^2 + ... + (π/4)*rn^2 (this is because a circle's area is πD^2/4, taking each residual as the diameter D)
      (π/4) * (r1^2 + r2^2 + ... + rn^2)
      Notice this is just a scaled version of the same minimization problem from before, so the parabola will just be a bit less steep but will have the same optimum.

  • @luckabuse
    @luckabuse 2 years ago +1

    How about minimizing areas setwise, discounting the intersections of the graphic squares? It would discount dense parts and should make a better fit.

    • @virtually_passed
      @virtually_passed  2 years ago +3

      What an interesting idea! I don't know of any methods that do that. A consequence of this method is that a bunch of clumped points would have a similar weighting as a single point. That could be quite useful, actually! Interesting idea.

  • @mattgsm
    @mattgsm 1 month ago +1

    Why can't you "divide" A transpose from both sides of that final equation?

    • @virtually_passed
      @virtually_passed  1 month ago

      Good question. When dealing with matrices we can't divide anymore. We need to multiply both sides by the inverse matrix. And this operation is only defined for square matrices. A^T isn't square in general (there could be more rows than columns or vice versa). However, in the very unlikely case that A happens to be square (i.e. there are just as many unique data points as unknowns), then you can invert A^T and the pseudo-inverse will collapse into the regular inverse of A.
      Hope that makes sense.

  • @guardianangel1337
    @guardianangel1337 1 year ago

    I'll just buckle up and do the regression by hand. I guessed the value for b correctly. I don't need scary algorithms and maths.

    • @virtually_passed
      @virtually_passed  1 year ago

      Nothing wrong with eyeballing it for simple cases :) Most programs have this built in under the hood, so you likely don't need to worry about the theory anyway :)

  • @mrinfinity5557
    @mrinfinity5557 2 years ago +1

    Okay, but why would the single-line methods not work? Especially the vertical-line one, which would seem to do the same thing but without the squaring?

    • @virtually_passed
      @virtually_passed  2 years ago

      That's a great question! The short answer is that it does work! The 'vertical lines' method is actually used in some applications! If you go through the math, the objective function we try to minimize there is the "1-norm" of the residual vector, because we minimize the sum of the absolute values of all the residuals.
      In fact, sometimes the least squares method is used in conjunction with the 1-norm method in an attempt to make the fit more robust to outliers. If you want to see more, click on this amazing video by Steve Brunton:
      ua-cam.com/video/GaXfqoLR_yI/v-deo.html&ab_channel=SteveBrunton
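      A rough sketch comparing the two objectives (assuming NumPy/SciPy; made-up data with one outlier):
      import numpy as np
      from scipy.optimize import minimize
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
      y = np.array([0.1, 1.2, 1.9, 3.1, 9.0])        # last point is an outlier
      A = np.column_stack([np.ones_like(x), x])
      # 1-norm objective: sum of |residuals| instead of squared residuals
      lad = minimize(lambda p: np.sum(np.abs(A @ p - y)), x0=np.zeros(2), method="Nelder-Mead")
      ols = np.linalg.lstsq(A, y, rcond=None)[0]
      print("1-norm fit:", lad.x)                    # pulled far less by the outlier
      print("least squares:", ols)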

  • @livedandletdie
    @livedandletdie 1 year ago

    I mean, it's just one more step than the perpendicular lines. After all, if you have a distance there's no need to square it; sure, it keeps all distances positive, but so does the absolute value in 2D...

  • @UshijPatel
    @UshijPatel 3 months ago +1

    Can this idea be extended to fit a polynomial of any degree?

  • @adolf_08
    @adolf_08 2 years ago

    Excellent video!

  • @PhilipSmolen
    @PhilipSmolen 2 years ago +1

    Nice! What are the gray circles that appear in the background for about one frame at a time?

    • @virtually_passed
      @virtually_passed  2 years ago +1

      I use Microsoft OneNote to handwrite all the mathematical equations. Sadly, whenever I press too hard on my touchscreen with my hand, OneNote displays that annoying graphic. I tried to get rid of most of them but sadly I couldn't get rid of them all :(

    • @PhilipSmolen
      @PhilipSmolen 2 years ago +1

      @@virtually_passed Ah. I thought it was an easter egg or a subliminal message. Good luck in the contest.

    • @virtually_passed
      @virtually_passed  2 years ago

      @@PhilipSmolen thanks!

  • @_earlyworm
    @_earlyworm 1 year ago

    This is not for beginners, but for anyone who got a B in statistics this is better than 3b1b.

  • @luanmartins8068
    @luanmartins8068 1 year ago +1

    Do you have any recommendation of a material that connects this topic with QR Factorization?

    • @virtually_passed
      @virtually_passed  1 year ago

      Hi great question! I'm sure there are many resources online, but I use Chapter 10 of the book "Numerical Optimization" by Stephen J Wright. Good luck!

    • @luanmartins8068
      @luanmartins8068 1 year ago +1

      @@virtually_passed Thanks! Also, very good video. I shared it with my university colleagues. I really found it very well done.

    • @virtually_passed
      @virtually_passed  1 year ago

      @@luanmartins8068 thanks!

  • @MattBell
    @MattBell 1 year ago

    How you not gonna name your collaborator at the start?

  • @idjles
    @idjles 2 years ago +1

    You could have completed the square instead of calculating de/db. You would have found b without calculus.

    • @virtually_passed
      @virtually_passed  2 years ago +1

      You're absolutely right!

    • @idjles
      @idjles 2 years ago

      @@virtually_passed And if you replaced all the sums with Sum x^2, Sum xy and Sum y^2, then you could have done two things - solved everything without matrices, and also shown how incredibly efficient this algorithm is, because you can incrementally add and remove points from those sums.

    • @virtually_passed
      @virtually_passed  2 years ago

      @@idjles Indeed the example I showed with 2 unknowns (a and b) can be solved without matrices. However, the method I used to solve it can be applied to a polynomial with 'n' parameters! Deriving a solution for 'n' unknowns without matrices will be very very hard and messy :)

  • @jeffcarey3045
    @jeffcarey3045 1 year ago +1

    Error function: *forms a parabola*
    Me: :o

  • @agustinmartinez8980
    @agustinmartinez8980 1 year ago

    Could this be done with circles, with the points making circles, tangent to the line of best fit?

    • @virtually_passed
      @virtually_passed  1 year ago

      Yes it can! More generally, you can use it to fit ellipses. You just need to do a clever transformation. Hint: let error = x^2+y^2

  • @octavylon9008
    @octavylon9008 2 years ago

    In my textbook and on some other websites the gradient is given by this formula:
    b = S_{xy}/S_{xx} = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x²) - (sum(x))²)
    That is not the same as the formula here (sum(xy)/sum(x²)). Why?

    • @virtually_passed
      @virtually_passed  2 years ago

      Hi thanks for the question. The formula you are referring to finds the value of 'b' that fits the line y=a+bx.
      The formula that I derived at 8:27 finds the value of 'b' that fits the line y=bx. This is why the formula is different.
      However, later on in my video (16:00) I derive an even more general formula for fitting any polynomial with any amount of unknowns (not just lines!). If you were to use that formula for the special case of a line y=a+bx you'll get the same answer as the one you provided.
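      A quick numerical check that the two formulas agree for y = a + bx (assuming NumPy; made-up data):
      import numpy as np
      x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
      y = np.array([2.0, 2.9, 6.2, 7.1, 10.4])
      n = len(x)
      b_textbook = (n*np.sum(x*y) - np.sum(x)*np.sum(y)) / (n*np.sum(x**2) - np.sum(x)**2)
      A = np.column_stack([np.ones_like(x), x])      # general formula X = pinv(A)*b with A = [1, x]
      a_fit, b_fit = np.linalg.pinv(A) @ y
      print(np.isclose(b_textbook, b_fit))           # True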

  • @Zwerggoldhamster
    @Zwerggoldhamster 2 years ago

    What I don't understand: doesn't the line depend on the orientation of the coordinate system?
    I don't know if it does, but I would expect so, and - graphically - that bugs me. I know it makes sense to square the errors (parallel to the y-axis) when dealing with a data set from a measurement.
    But when I draw points on the floor and ask you what the best line through those points is, it shouldn't depend on the coordinate system.

    • @satyampanchal-1016
      @satyampanchal-1016 2 years ago

      I guess once you are IN a coordinate system, the corresponding x, y data will give you unique values of a and b. Changing the coordinate system will change the x, y and also the corresponding a and b... so different coordinate systems will give you different a and b, making the line fit every time. So it is in this sense that you will always get a fit, independent of what coordinates you choose. But you will have to CHOOSE first in order to proceed. The choice itself IS independent.

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Hey, that's a really interesting question! If I understand you correctly, you're claiming that if you have another axis x', y' that's 10 degrees rotated clockwise from the traditional axis x, y then the fitted curve will be slightly different. Is that correct?
      I haven't done the math on it, but I strongly suspect you're right. But consider trying to fit the data with a parabola y = a+bx+cx^2 instead. In this case, the parameters the LS fitting would need to find are (a, b and c). However, in the rotated coordinate system, if you tried to fit the parabola y' = a' + b'x' + c'x'^2, then you'll find there are no values of (a', b' and c') that could ever make these two parabolas look the same! And that's because a rotated parabola has an entirely different equation in the original coordinate system. So when you think about it this way, it seems quite reasonable, in my subjective opinion, that a different coordinate system can produce a slightly different fit. In which case, you would need to define your coordinate system first, and then perform the fit :) Hope that helps :D

    • @Zwerggoldhamster
      @Zwerggoldhamster 2 years ago +1

      @@virtually_passed Haven't done the math either, but that's just what I suspected.
      Maybe squaring the perpendicular distances to the line and minimizing that sum would give you the same line always, independent of where the coordinate system is.

  • @benjaminmiller3620
    @benjaminmiller3620 2 years ago +1

    Does this naturally extend to higher dimensional points? How would one find the best fitting line to a 3d point cloud?

    • @virtually_passed
      @virtually_passed  2 years ago

      That's a great question! This method can indeed be extended to 3D data. Let's say you have n data points:
      (x1,y1,z1), (x2,y2,z2), (x3,y3,z3), .... , (xn,yn,zn)
      And let's say you wanted to fit the plane z = a + bx + cy to these data points. Here the unknowns are X = [a, b, c].
      Just like in the 2D case you can construct a residual vector. But in this case, the residuals would be the error between the z coordinate on the plane and the z coordinate of the data. Ie
      ri = a + b*xi + c*yi - zi
      And so the A matrix will look like this: A =
      [ 1 x1 y1
      1 x2 y2
      1 x3 y3
      1 x4 y4
      ....
      1 xn yn]
      and the b vector will look like this: b =
      [z1
      z2
      z3
      z4
      ...
      zn]
      Then you can use the same formula to find vector X = pinv(A)*b
      Hope that helps :)
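      A minimal sketch of that plane fit (assuming NumPy; synthetic data):
      import numpy as np
      rng = np.random.default_rng(1)
      x, y = rng.uniform(0, 10, 50), rng.uniform(0, 10, 50)
      z = 2.0 + 0.5*x - 1.5*y + rng.normal(scale=0.1, size=50)   # noisy plane
      A = np.column_stack([np.ones_like(x), x, y])   # rows [1, xi, yi]
      a, b, c = np.linalg.pinv(A) @ z                # X = pinv(A)*b, where b here is the z data
      print(a, b, c)                                 # close to 2.0, 0.5, -1.5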

    • @benjaminmiller3620
      @benjaminmiller3620 2 years ago

      @@virtually_passed The plane? So you'd have to subsequently project the points onto the resulting plane and do a 2D "least squares" to get the line? There's no shortcut? Because that's what I was doing already, just the other way around. Project to the XY & XZ planes, least squares, combine into a 3D line.

    • @virtually_passed
      @virtually_passed  2 years ago

      @@benjaminmiller3620 Hey mate, sorry I think I must have explained it poorly before. At no point is it needed to project the data to the XY and XZ planes. It's going to be hard to explain this without an image. Can you send an email to me at virtuallypassed@gmail.com and I'll reply with some images which will make that clearer :)
      In that email can you please provide me more details about the problem too? What is the exact form of the equation of the '3D line' you want to fit the data to? Is it actually a line? Or a surface?

    • @benjaminmiller3620
      @benjaminmiller3620 2 years ago

      @@virtually_passed A line. *r* = *r_0* + _t_ * *v* (I prefer the vector equation.) Not sure where you got "surface" from.

    • @virtually_passed
      @virtually_passed  2 years ago

      @@benjaminmiller3620 Hey Benjamin, I just replied to your email. I suggest using PCA. Details in the email :)

  • @ryanchowdhary965
    @ryanchowdhary965 2 years ago

    Everyone is working hard eh.

  • @pyroMaximilian
    @pyroMaximilian 1 year ago

    You forgot to explain why we chose squares over linear distances.

  • @MekazaBitrusty
    @MekazaBitrusty 1 year ago

    Yep, I got nothing. Absolutely no idea why you use the area of a square rather than just the length of the line. Then when you started using matrices, I was lost.

  • @MCLooyverse
    @MCLooyverse 2 years ago

    You have `invert (transpose A * A) * transpose A`... shouldn't that simplify to `invert A`? The inverse of a product is the product of the inverses, but in the opposite order, then the `invert (transpose A)` would cancel with `transpose A` by associativity.

    • @virtually_passed
      @virtually_passed  2 years ago

      That's a great question!
      If I understand your question correctly you are saying the following, right?
      X = inv(A^T A) A^T b
      =inv(A) inv(A^T) A^T b
      =inv(A) I b
      =inv(A) b
      This can only be true if A is a square matrix! Because the rule inv(AB) = inv(B) inv(A) only applies if A and B are square matrices - the traditional inverse is only defined for a square matrix. Hope that helps! :)
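      A quick numerical check (assuming NumPy):
      import numpy as np
      rng = np.random.default_rng(2)
      A = rng.normal(size=(6, 2))                    # tall, non-square A with full column rank
      left = np.linalg.inv(A.T @ A) @ A.T            # inv(A^T A) A^T
      print(np.allclose(left, np.linalg.pinv(A)))    # True: this equals the pseudo-inverse
      # np.linalg.inv(A) itself is not even defined here, since A is not square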

    • @MCLooyverse
      @MCLooyverse 2 years ago

      @@virtually_passed Ah! I was thinking about that, but I forgot that A^T * A would be square (and possibly invertible), even if A isn't.

    • @virtually_passed
      @virtually_passed  2 years ago

      @@MCLooyverse correct :)

  • @theastuteangler
    @theastuteangler 2 years ago

    How do we know that the curve is a straight line, that the function of the data is linear? It seems like it takes on a logarithmic appearance. Few equations in the real world are linear. Seems like this could be an example of the problem of "lying with statistics".

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Hi, that's a really great question! The form of the equation that you want to fit the data to has to come from some external information about the system you're analyzing. Typically engineers or physicists have a model of the thing they're trying to analyse. For example, if this data was force vs distance for a spring then the model will probably be linear, or a cubic. If it was population vs time then you'll use an exponential.
      You might be tempted to avoid this problem by trying to fit a curve with many many unknown parameters (perhaps by fitting a polynomial of degree 100 or something). But this is a bad idea because then you will just be overfitting.
      If you genuinely know nothing about the data you're measuring, and so you have no model (e.g. you're studying a part of the human brain or something), then there are other things you can do, but that goes beyond least squares.

    • @theastuteangler
      @theastuteangler 2 years ago

      @@virtually_passed awesome, thank you for the detailed and prompt reply! Perhaps my question could be material for your next video? I just found your channel with this video, excited to binge.

  • @Relkond
    @Relkond 2 years ago

    Some random advice - don't tell us that you're manipulating us by telling us that it's a parabola. Instead, just suggest its shape resembles a parabola/hyperbola - get us thinking: 'Huh - that's interesting. Is it a parabola? Is it a hyperbola?' That has us thinking on its shape, and looking for what might be defining its shape -> that engages us in the lesson more than just monologuing at us, and won't anger some of us anywhere near as much as a bold statement of 'I'm manipulating you for your own good'.

    • @virtually_passed
      @virtually_passed  2 years ago

      Ooo thanks for the pedagogy advice!

    • @Relkond
      @Relkond 2 years ago

      @@virtually_passed FWIW, The Action Lab recently did a video that involved putting a superconductor into an induction heater.
      At face value, he appeared puzzled by the outcome; however, if you consider the whole video, he probably expected that outcome before he ever started filming -> it's an example of engaging the audience by presenting them with something unexpected + unexplained.
      He's doing much what you did vis-a-vis the parabola being true, but he put the focus on the subject without calling out that he was selectively feeding information to the audience.
      Good luck with your future ventures.

  • @kristyandesouza5980
    @kristyandesouza5980 1 year ago

    Well, I think I don't have "basic high school calculus".

  • @Kenya_Berry
    @Kenya_Berry 2 years ago

    How did I get here from watching animators

  • @jeffreyblack666
    @jeffreyblack666 2 years ago

    Saying it can be "easily and efficiently implemented in software" and then showing a single library function call is quite misleading.
    A single function call can be incredibly complex and inefficient.
    All that demonstrates is that it can be easily implemented.

  • @farpurple
    @farpurple 2 years ago

    It was understandable until you introduced matrices; then I tried to keep following, but got lost. I need to learn more math...

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Thanks for the comment! Yeah, linear algebra can be quite tough. As long as you understood the first part though (solving for 1 unknown), that's the most important thing! The other half of the video is a way to solve for 'n' unknowns and it's basically the same idea :)

  • @Nathouuuutheone
    @Nathouuuutheone 2 years ago

    Why did you not show the vertical and perpendicular options before spending multiple minutes essentially repeating that the square option was the best?
    Also, why are the squares drawn the way they are and not some other way? Why use the vertical as a basis and not a horizontal or a perpendicular? I'm almost halfway through the video and I feel like I'm getting dragged through the problem and its "best" solution instead of being told about the approach to the problem. I feel like I'm not being allowed to see the steps that get us to the answer; I'm just sitting through long praise of the good answer. Honestly, why are we proving that squares yield parabolas? There is no intuitive reason why we're talking about parabolas by that point. And that's multiple minutes spent listening to maths I had no clue why I was listening to.
    And the rest of the video is more maths that was more like being told how to write an algorithm than why to use that algorithm.

    • @virtually_passed
      @virtually_passed  2 years ago +1

      Hey, thanks so much for your comment. I really appreciate the feedback. I think I'll create another video that will describe the differences in these fitting methods in more detail. In short, there are pros and cons for each of the proposed fitting methods you've proposed. Ultimately, the 'best' method depends on the type of problem you have. However, the point of this video was to explain what the ordinary least squares method is, and to provide just a bit of motivation as to why it's so widely used. It's widely used because it's 1) very computationally efficient 2) simple to implement in software and 3) results in a convex optimization problem (the parabola only has one minimum).
      I hope that helps explain things :)

  • @jodyhensley9796
    @jodyhensley9796 2 years ago

    promosm

  • @ABaumstumpf
    @ABaumstumpf 2 years ago

    Nah, I would say the problem is NOT well-defined, as for that you MUST define what the problem and the data actually are.
    Least-squares regression is one metric that will give you a linear fit. Is it a good fit? Maybe - if your dataset is a simple F(x)=y and x is precise.
    But if you have 2D data (both X and Y have errors) then the 2nd method (orthogonal regression) would offer a more useful result.
    It is not hard to find a "good fit" for some data with a particular method, but it is hard to use the CORRECT fit for the data.
    Given a list of, say, 20 points it is easy to get a least-squares linear fit; it is also easy to get a 19th order polynomial fit. But it might very well be that the data actually comes from a 3rd order phenomenon.

    • @virtually_passed
      @virtually_passed  2 years ago +4

      Hey, thanks so much for your comment. I agree that least squares is not the only method that can be used to fit data, and I also agree there are several downsides to least squares: 1) it's very sensitive to large outliers (since it squares the error), so a 1-norm regularisation term is sometimes added to make the fit more robust, and 2) it's easy to overfit with least squares (as you mentioned, fitting a 19th order polynomial to 20 data points), etc.
      The point of this video was not to provide a rigorous comparison of all of the possible fitting methods - there are many more including nonlinear least squares! The main intention was to show the derivation behind linear least squares and why it's so often used and so computationally efficient. I fully agree with you that there are other fitting methods which are better suited to specific data types :)
      I hope that makes sense :)