Linear Regression in 12 minutes

  • Published May 31, 2024
  • The machine learning consultancy: truetheta.io
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Linear Regression and Ordinary Least Squares ('OLS') are ancient and yet still useful modeling principles. In this video, I introduce these ideas from the typical machine learning perspective - the loss surface. At the end, I explain how basis expansions push this idea into a flexible and diverse modeling world. (A small code sketch of these ideas follows the sources below.)
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    Sources and Learning More
    Over the years, I've learned and re-learned these ideas from many sources, which means there weren't any primary sources I referenced when writing. Nonetheless, I confirmed my definitions with the Wikipedia articles [1][2], and chapter 5 of [3] is an informative discussion of basis expansions.
    [1] Linear Regression, Wikipedia, en.wikipedia.org/wiki/Linear_...
    [2] Ordinary Least Squares, Wikipedia, en.wikipedia.org/wiki/Ordinar...
    [3] Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.
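
    As a quick illustration of the ideas the description mentions (this is not code from the video; the toy data, the cubic polynomial basis, and the noise level are assumptions made purely for demonstration), here is a minimal numpy sketch of OLS on a basis expansion:

```python
# Minimal sketch: ordinary least squares on a polynomial basis expansion (assumed toy data)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)  # toy target, for illustration only

# Basis expansion: map scalar x to features [1, x, x^2, x^3]
X = np.vander(x, N=4, increasing=True)

# OLS: beta minimizes ||y - X beta||^2; lstsq solves this stably
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
print("fitted coefficients:", beta)
```

    The same lstsq call covers plain linear regression too; the basis expansion only changes how the design matrix is built.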

COMMENTS • 51

  • @mCoding
    @mCoding 3 years ago +32

    Just here to tell you that this video is going to EXPLODE! Maybe not immediately, but eventually almost surely :). Keep at it, awesome visuals. Small note: there is a bit of an echo, which can be fixed by putting some padding/blankets on the walls/ceiling.

    • @Mutual_Information
      @Mutual_Information  3 years ago +5

      I hope you’re right! I’m trying not to get ahead of myself here, but, especially with your words, I’m hopeful.
      Also, thanks for the echo advice. I just ordered some surrounding curtains which should help. I’ve already shot several videos without them, but eventually the echo will be gone.
      Thanks again - it’s always nice to hear from you.

  • @stevenstonie8252
    @stevenstonie8252 9 months ago +4

    This video is so underrated. The way you explained it alongside the provided demonstration easily makes it a top 5 among all tutorials on linear regression.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      Thank you - it's one of my less appreciated ones, but you're changing that

  • @WilliamDye-willdye
    @WilliamDye-willdye 3 years ago +7

    In case you're wondering if anyone laughed at the 0:19 joke, I definitely laughed. Dunno if it qualified as a rare honest use of the term "LOL", but it was a clearly audible guffaw, so, a CAG I guess.

    • @Mutual_Information
      @Mutual_Information  3 years ago +3

      Haha good thing I didn’t cut it! I almost did

    • @definesigint2823
      @definesigint2823 3 years ago +3

      Same here; just sitting quietly and couldn't help laughing. Good thing I wasn't in a library; I might've been shush'd😁

  • @j.adrianriosa.4163
    @j.adrianriosa.4163 2 years ago +3

    I was not sure how to comment... as I can hardly find the words to express how good this explanation is. Thanks a lot!

    • @Mutual_Information
      @Mutual_Information  2 years ago

      I think you expressed it well - glad you enjoyed! More coming

  • @suleymanarifcakr3609
    @suleymanarifcakr3609 4 months ago

    Perfect explanation and visualization! Thank you a lot for making this.

  • @Murphyalex
    @Murphyalex 2 years ago +1

    Going back to review your earlier stuff. It's good to see the quality was there from early on.

  • @enknee1
    @enknee1 3 years ago +3

    I hope your channel blows up. This is a clear, concise discussion that keeps things on point without pulling too many punches. Great presentation.
    Can I request random effects next? ;-)

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Thank you! I sure hope so too :)
      And hell yea I’m going to do random effects. May take a little while to get to, but those hierarchical models are def on the list.

    • @definesigint2823
      @definesigint2823 3 years ago +1

      Honestly, I feel a little privileged having found a good-content channel this early. I hope to see it grow too 😊

    • @markmiller7133
      @markmiller7133 3 years ago

      @define SIGINT I echo this. I have yet to find a good resource on mixed models...just "pretty good" ones.

  • @Capitalust
    @Capitalust 1 year ago +2

    Bro, the visualizations in the last half of the video were fantastic. Amazing work man. Keep it up!

  • @NachoSchips
    @NachoSchips 1 year ago +1

    Man, I love this channel

  • @gustavojuantorena
    @gustavojuantorena 1 year ago +1

    Great video! Thank you.

  • @daigakunobaku273
    @daigakunobaku273 10 months ago +2

    AFAIK the LSE was developed by Gauss to estimate the parameters of the orbits of comets, and, if the errors of observation have the normal distribution (which is sometimes named after Gauss for a good reason: IIRC, he researched the distribution to solve this very problem), the LSE is actually the maximum likelihood estimate as well. That was the original reasoning behind the method, not computational feasibility per se. A damn good explanation tho, thank you!
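
    To illustrate the commenter's point numerically (a hedged sketch with made-up data; the design matrix, coefficients, and noise scale are assumptions for demonstration only): under i.i.d. Gaussian noise, maximizing the likelihood over the coefficients is the same as minimizing the squared error, so the numerical MLE lands on the closed-form OLS solution.

```python
# Check that the Gaussian MLE coincides with the closed-form OLS estimate (toy data assumed)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # assumed design matrix
beta_true = np.array([2.0, -1.5])
sigma = 0.5
y = X @ beta_true + sigma * rng.normal(size=n)          # Gaussian noise

# Closed-form OLS: beta = (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gaussian negative log-likelihood in beta (sigma held fixed; additive constants dropped)
def neg_log_lik(beta):
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2) / sigma ** 2

beta_mle = minimize(neg_log_lik, x0=np.zeros(2)).x
print(np.allclose(beta_ols, beta_mle, atol=1e-4))       # True: the two estimates agree
```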

    • @Mutual_Information
      @Mutual_Information  10 months ago

      Yea, good point - I didn't research the truly original use of OLS, but I've heard this story. I'm just presenting the thinking in a familiar, modern context

  • @felinetech9215
    @felinetech9215 2 years ago +2

    That was some high quality explanation you managed to put in there! The math seemed a little fast but hey, we can always rewind and watch. Cheers!

  • @hugosetiawan8928
    @hugosetiawan8928 10 months ago

    I'm an economics student, and your content will help me a lot. Thank you!

  • @siquod
    @siquod 10 months ago

    Actually, it is not just for ease of solution that we minimize the squared error. It corresponds to the often reasonable assumption that the noise is Gaussian. And minimizing absolute differences corresponds to Laplace-distributed noise.
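
    A rough sketch of that correspondence (toy data with one deliberate outlier; everything here is an assumption for illustration, not material from the video): the squared-error fit is the Gaussian-noise MLE and gets pulled by the outlier, while the absolute-error fit is the Laplace-noise MLE and largely ignores it.

```python
# Contrast the Gaussian-noise loss (squared error) with the Laplace-noise loss (absolute error)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(30)
y[-1] += 5.0                                    # one gross outlier
X = np.column_stack([np.ones_like(x), x])

sq_loss = lambda b: np.sum((y - X @ b) ** 2)    # Gaussian-noise loss (OLS)
abs_loss = lambda b: np.sum(np.abs(y - X @ b))  # Laplace-noise loss (least absolute deviations)

b_sq = minimize(sq_loss, np.zeros(2)).x
b_abs = minimize(abs_loss, np.zeros(2), method="Nelder-Mead").x
print("squared-error fit:  ", b_sq)   # noticeably pulled toward the outlier
print("absolute-error fit: ", b_abs)  # stays close to the true (1, 2)
```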

  • @antoinestevan5310
    @antoinestevan5310 3 years ago +3

    As others have already said, the visuals were really great this time! I've never seen least-squares errors, nor these basis functions, visualized like that :-) Appreciate it a lot!
    So even though linear regression is well known, it can still be fun to learn new things about it ;-)

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Really appreciate it. My goal was to introduce the idea of a basis expansion for the Bernstein basis video, and that's a little less known.

  • @nikoskonstantinou3681
    @nikoskonstantinou3681 3 years ago +4

    "All my videos are about math. Non of them are cool". The only wrong sentence in this whole video 😁

  • @davusieonus
    @davusieonus 2 years ago +1

    your vids are cool, thanks for the effort and I love watching these

  • @xy9439
    @xy9439 3 years ago +1

    This was an awesome video man 😳👌 keep it up

  • @manueltiburtini6528
    @manueltiburtini6528 1 year ago +1

    Hi! Thank you a lot for your content! It's a pleasure to watch even for a non-math guy. The issue during model fitting is indeed understanding the data you're modeling. It seems easy in 2D or 3D, but real data are a completely different story… I hope to improve my regression skills!

  • @NoNTr1v1aL
    @NoNTr1v1aL 2 years ago +1

    Amazing video!

  • @yunusd8167
    @yunusd8167 1 year ago +1

    This is a great channel.

  • @markmiller7133
    @markmiller7133 3 years ago +4

    Do you use the same package as 3B1B? This channel has that feel. Also, you should consider starting a Patreon. Your content is quite good, and I could see it garnering a considerable following. There is such a huge stats/ML community out there that is lacking a 3B1B level of content contribution...this is a huge opportunity. Thanks for publishing!

    • @Mutual_Information
      @Mutual_Information  3 years ago +3

      Thank you! I do not use Manim (but eventually I should explore it). Instead, I use a personal library that heavily uses the Python plotting library Altair.
      And I hope you’re right. It would be nice to grow, but I’m not necessarily optimizing for that directly. I think there is a trade-off between the size of the audience and the level of technical detail, and I’d like to keep a high level of that detail. But we’ll see!
      And I actually do have a Patreon (patreon.com/MutualInformation), but I haven’t advertised it, just because the channel is so small and I haven’t figured out how I’d offer extra content to the patrons. But it’s certainly in the works.
      Thanks again for your advice and appreciation. Really helps out a lot, especially when I’m just starting.

    • @markmiller7133
      @markmiller7133 3 years ago +1

      @@Mutual_Information I'm getting a 404 error from the link and no luck direct searching.

    • @Mutual_Information
      @Mutual_Information  3 years ago +2

      Oh oops, a parenthesis got caught in there. Should be good now. Thanks!

    • @AICoffeeBreak
      @AICoffeeBreak 2 years ago +2

      Thanks for asking! The question about Manim was exactly my question too.

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Ah nice to hear from you Letitia!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 2 years ago +1

    So many new insights here. This explanation connected so many dots.
    Is this leading to Gaussian Processes?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      That’s crazy you mention that. This video wasn’t intended for that, but GPs are my next vid. It’s coming out in about 2 weeks.

  • @sridharbajpai420
    @sridharbajpai420 1 year ago +1

    Linear regression makes the assumption that the errors are Gaussian distributed; that's why we minimize the mean squared error.

  • @BabaNoami
    @BabaNoami 1 year ago +1

    haha. came for the math, stayed for the jokes.

  • @steffenmuhle6517
    @steffenmuhle6517 1 year ago +1

    Wow! I never realized linear regression is only about linearity in beta! Given your expertise in exponential families, I'd love to see you make a video about GLMs! I still don't understand why a GLM needs its errors to be distributed according to an exponential family - maybe you'll make that clear!

    • @Mutual_Information
      @Mutual_Information  1 year ago +2

      Ah yes! GLM is right there on my list. I have some RL stuff to finish up but I want to get to that. That and causal inference - lots of cool shit in the pipeline

    • @taotaotan5671
      @taotaotan5671 1 year ago

      @@Mutual_Information Excited to see these topics! Would love to hear your explanations of the connections between Bayesian networks, counterfactuals, and linear regression.

  • @PraecorLoth970
    @PraecorLoth970 2 years ago

    What's the relationship between using a basis function and doing what I've been taught is called a linearization? For example, for an exponential, you apply the natural log to your data and then fit a linear model. Also, does using either method mess with the assumptions of OLS? I've read that doing a linearization is worse than iteratively fitting a non-linear model, because you can't assume a normal distribution of errors around your data after the transformation.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Hm, I'm not sure what the term "linearization" means here. It could mean what I showed, where you transform x into f(x) and then regress y on f(x). In that case, the major assumptions aren't violated, since most lin-reg assumptions have to do with the distribution of y given x, and they don't say anything about how x is distributed (though you need whatever you regress on to be linearly independent - I guess that's an assumption).
      It sounds like what you might be referring to is a transformation on y, say g(y), and then regressing g(y) on x (or f(x)). That can certainly be a bad idea sometimes. Lin-reg wants p(y|x) to be approximately normal. If p(y|x) is approximately normal, then p(g(y)|x) may not be (or vice versa)! In general, in a case where you want to transform y, I would say either 1) use generalized linear models - they are designed exactly for this and, as you suggest, use an iterative fitting procedure - or 2) be sure that p(g(y)|x) looks normal.
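
      Here is a small sketch of the two cases in that reply (the exponential toy data and multiplicative noise are assumptions for illustration only). Transforming x, i.e. regressing y on f(x), leaves the usual assumptions about p(y|x) alone; transforming y, i.e. regressing log(y) on x, silently swaps in a different noise model for y (log-normal here), which is exactly where the caution about "linearization" comes from.

```python
# Two ways to handle a curved relationship: expand x, or transform y (toy data assumed)
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.1, 2.0, 40)
y = np.exp(1.0 + 0.8 * x) * (1 + 0.05 * rng.standard_normal(40))  # multiplicative noise

# Case 1: basis expansion of x -- regress y on f(x) = [1, x, x^2]; still OLS in the betas
F = np.column_stack([np.ones_like(x), x, x ** 2])
beta_fx, *_ = np.linalg.lstsq(F, y, rcond=None)

# Case 2: "linearization" -- transform y and regress log(y) on [1, x].
# This is the Gaussian-noise OLS model for log(y), i.e. log-normal noise for y.
Xl = np.column_stack([np.ones_like(x), x])
beta_logy, *_ = np.linalg.lstsq(Xl, np.log(y), rcond=None)

print("basis-expansion fit:", beta_fx)
print("log-transform fit (intercept, slope):", beta_logy)  # roughly (1.0, 0.8) here
```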