The Bias Variance Trade-Off

  • Published 20 Jul 2024
  • The machine learning consultancy: truetheta.io
    Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Article on the topic: truetheta.io/concepts/machine...
    The Bias Variance Trade-Off is an essential perspective for developing models that will perform well out-of-sample. In fact, it's so important for modeling that most hyperparameters are designed to move you between the high-bias/low-variance and low-bias/high-variance ends of the spectrum. In this video, I explain what it says exactly, how it works intuitively, and how it's typically used. (A minimal simulation sketch of the trade-off follows the sources below.)
    SOCIAL MEDIA
    LinkedIn: / dj-rich-90b91753
    Twitter: / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    TIMESTAMPS
    0:00 The Importance and my Approach
    0:46 The Bias Variance Trade-Off at a High Level
    2:06 A Supervised Learning Regression Task and Our Goal
    3:41 Evaluating a Learning Algorithm
    5:39 The Bias Variance Decomposition
    7:19 An Example True Function
    8:07 An Example Learning Algorithm
    9:41 Seeing the Bias Variance Trade-Off
    12:59 Final Comments
    SOURCES
    The explanation I've reviewed the most is in section 2.9 of [1]. I also found Kilian Weinberger's excellent lecture [2] useful. If you'd like to learn how this concept generalizes beyond a regression model's squared error, see [3].
    [1] Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.
    [2] Weinberger, K. (2018). Machine Learning Lecture 19: Bias Variance Decomposition - Cornell CS4780 SP17. UA-cam, • Machine Learning Lectu...
    [3] Tibshirani, R. (1996). Bias, Variance and Prediction Error for Classification Rules. Department of Preventive Medicine and Biostatistics and Department of Statistics, University of Toronto, Toronto, Canada.
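
    To make the decomposition above concrete, here is a minimal Python simulation sketch of the trade-off. The sinusoidal true function, noise level, test point, and polynomial learners are illustrative assumptions on my part, not the video's actual setup:

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        # The true function, normally unknown to the modeler (assumed here).
        return np.sin(2 * np.pi * x)

    def sample_training_set(n=20, noise_sd=0.3):
        x = rng.uniform(0, 1, n)
        return x, f(x) + rng.normal(0, noise_sd, n)

    x0 = 0.25  # test point at which we evaluate the learning algorithm
    for degree in (1, 9):  # low model complexity vs. high model complexity
        preds = []
        for _ in range(2000):  # re-sample the data, re-fit, predict at x0
            x, y = sample_training_set()
            preds.append(np.polyval(np.polyfit(x, y, degree), x0))
        preds = np.array(preds)
        bias_sq = (preds.mean() - f(x0)) ** 2  # squared bias at x0
        variance = preds.var()                 # variance at x0
        print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")

    The low-degree fit should show high bias and low variance, and the high-degree fit the reverse.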

COMMENTS • 80

  • @superman39756
    @superman39756 1 year ago +4

    This channel will explode soon - the quality of content is too good, thank you!

  • @karanshah1698
    @karanshah1698 3 years ago +23

    The moment you flashed the decomposed equation, it clicked for me that this looks a lot like the Epistemic and Aleatoric Uncertainty components. P.S.: We need much more quality content like this on high-end academic literature, please keep going full throttle. You've earned my subscription!

    • @Mutual_Information
      @Mutual_Information  3 years ago +2

      Thank you very much! I’m not familiar with those components, but I’m glad to hear you are seeing relationships I don’t :) and will do, I have 4-5 videos in the pipeline. New one every 3 weeks!

  • @charudattamanwatkar8340
    @charudattamanwatkar8340 1 year ago +12

    Your videos have the perfect balance between rigor and simplicity. Kudos to you! Keep making such great content. You're destined to be really successful. 🎉

  • @winoo1967
    @winoo1967 3 years ago +9

    Great video!! Starting out as a creator on YT is pretty hard, so don't give up

  • @kellsierliosan4404
    @kellsierliosan4404 2 years ago +6

    I am studying for an MSc in Stats at a decent uni and I have to say that your channel is damn amazing. Good job there, the intuition that you manage to put in your videos is mindblowing. You gained a subscriber :)

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thank you! Very happy to have you. More good stuff coming soon :)

    • @Mutual_Information
      @Mutual_Information  2 years ago

      And if you think it'd be helpful to your classmates, please share it with them 😁

  • @hspadim
    @hspadim 3 years ago +8

    Incredible work man! I'm truly looking forward to more content!

  • @ConnectinDG
    @ConnectinDG 2 years ago +4

    I have been reading a lot on the bias-variance trade-off and have been using it for some time now. But the way you explained it with amazing visuals was mind-blowing and very intuitive. I totally like your content and will keep waiting for more like this in the future.

  • @Boringpenguin
    @Boringpenguin 1 year ago +3

    This is probably the best take on the Bias Variance Trade-Off I have ever seen on UA-cam; the one from ritvikmath is a close second.
    Please don't ever stop making videos like this, great stuff :)

  • @taotaotan5671
    @taotaotan5671 2 years ago +2

    Wow, this video truly opened my mind.
    I have heard this term from ML people many, many times, but it remained vague until I watched this video!

  • @arongil
    @arongil 1 year ago +1

    I love the humor at the end ("if you make the heroic move of checking my sources in the description"). I'm learning so much from you, thank you!

  • @akhilezai
    @akhilezai 3 years ago +7

    This is GOLD

  • @juanvelez3889
    @juanvelez3889 2 years ago +3

    This is sooooo good. Thanks a lot for sharing your knowledge with such an amazing explanation!

  • @AbhishekJain-bv6vv
    @AbhishekJain-bv6vv 2 years ago +2

    Until 7:22, I thought this was very theoretical, but as soon as you started the animations, everything made more sense and became clear. Truly incredible, amazing work. Lots of love from India, and please keep up the good work. You are the 3blue1brown of data science.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thank you, encouragement like this means a lot. I’ll make sure to keep the good stuff coming :)

    • @AbhishekJain-bv6vv
      @AbhishekJain-bv6vv 2 years ago

      @@Mutual_Information I am a student at IIT Kanpur (one of the premier institutes of India), and I am currently taking a course,
      Statistical Methods for Business Analytics.
      Here is the link to the playlist (the lecture slides are in the description).
      ua-cam.com/play/PLEDn2e5B93CZL-T8Srj_wz_5FIjLMMoW-.html
      Just play any video in it and tell me whether you would be willing to learn from these videos. The way of teaching is lagging far behind in our country.

  • @vladvladislav4335
    @vladvladislav4335 3 years ago +1

    Hey DJ, the quality of your videos is mindblowing, I subscribed even before watching the video till the end. I'm 100% sure your channel will blow up in the near future!

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Thank you brother! I’m very happy to hear you like them and excited to have you as a sub. More to come!

  • @tirimula
    @tirimula 1 year ago +1

    Awesome graphical visualization.

  • @akhaita
    @akhaita 2 years ago +1

    Beautifully done!

  • @kashvinivini2264
    @kashvinivini2264 2 years ago +1

    Highly underrated video! Great work

  • @cerioscha
    @cerioscha 1 year ago +1

    Great video, thanks! I've never seen this explained in a regression context, only for classification in terms of VC dimension.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Glad you appreciate it. This is an old video and I've since learned to lighten up on the on-screen text, but I'm glad it still works for some

  • @FarizDarari
    @FarizDarari 1 year ago +1

    Simply awesome explanation!

  • @Lucifer-wd7gh
    @Lucifer-wd7gh 2 years ago +2

    I can see a bright future for this channel. Good job man. Keep uploading ❤️
    From United States Of India 🇮🇳😆

  • @revooshnoj4078
    @revooshnoj4078 1 year ago +1

    Clearly explained, thanks!

  • @NoNTr1v1aL
    @NoNTr1v1aL 2 years ago +1

    Amazing video!

  • @orvvro
    @orvvro 3 years ago +3

    Thank you, very clear video

  • @murilopalomosebilla2999
    @murilopalomosebilla2999 2 years ago +1

    Well explained! Thanks!!

  • @arminkashani5695
    @arminkashani5695 2 years ago +1

    Great explanation! Thanks so much.

  • @MP-if2kf
    @MP-if2kf 2 years ago +1

    Great video!

  • @antoinestevan5310
    @antoinestevan5310 3 years ago +1

    really nice content and intuitions, liked it a lot!

  • @Kopakabana001
    @Kopakabana001 3 years ago +1

    Awesome info!

  • @navanarun
    @navanarun 11 months ago

    Thank you! this is amazing content.

  • @JoseManuel-pn3dh
    @JoseManuel-pn3dh 8 months ago +1

    Thanks, you're carrying my MSc

  • @loukafortin6225
    @loukafortin6225 3 years ago +1

    this needs more views

  • @sathyakumarn7619
    @sathyakumarn7619 2 years ago

    This video clearly deserves a lot more views than this. Keep up the good work.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thanks! Slowly things are improving. I think eventually more people will come to appreciate this one.

  • @eulefranz944
    @eulefranz944 2 years ago +1

    Excellent.

  • @wexwexexort
    @wexwexexort 6 months ago

    fantastic visualizations

  • @peterkonig9537
    @peterkonig9537 8 months ago +1

    Super cool stuff.

  • @partyhorse420
    @partyhorse420 1 year ago +1

    Have you seen recent results in deep learning that show larger neural networks have both lower bias and lower variance than smaller models? Past a point, more parameters give less variance, which is amazing! See "Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition", Adlam et al.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      I hadn't seen this before but now that I've read some of it, it's quite an interesting idea. Maybe it explains some of the weird behavior observed in the Grokking paper? I'm still mystified by how these deep NNs sometimes defy the typical U shape of test error.. wild! Thanks for sharing

  • @manueltiburtini6528
    @manueltiburtini6528 1 year ago +1

    A masterpiece of YT

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      I'm glad you think so.. I was actually thinking about re-doing this one

  • @LvlLouie
    @LvlLouie 2 years ago +1

    Subscribed! I want to learn this stuff but I'm not sure where to start!

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Well I may be biased, but I think this channel is a fine place to start :)

  • @bajdoub
    @bajdoub 3 years ago +2

    Excellent video! One question: in practice, what is the relationship between EPE and the mean squared error (MSE) loss we usually optimize in a regression problem? Is EPE an expected value of MSE? Or is MSE only related to the bias term in EPE? Or are they completely unrelated?

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Glad you enjoyed it! They are certainly related :) To make MSE and EPE comparable, the first thing we'd have to do is integrate EPE(x_0) over the domain of x, which we can call EPE, as you do. In that case, MSE is a biased estimate of EPE (to answer your question, it's an estimate of the whole of EPE - not any one of the terms). The MSE is going to be more optimistic/lower than EPE. This is because when fitting, you chose parameters to make MSE low.. if you had many parameters, you could make MSE really low (overfitting!). But EPE measures how good your model is relative to p(x, y) - more parameters don't necessarily mean a better model! To get a better estimate, you could look at MSE out of sample. And that's what we do to determine those hypers. (A small simulation sketch of this in-sample vs. out-of-sample gap follows this thread.)

    • @bajdoub
      @bajdoub 3 years ago +2

      @@Mutual_Information Thanks so much for taking the time to reply! I will need some time, and probably another pass of the video and putting things on paper, before I digest it all :-D But you have given me all the elements of an explanation. Keep up the good work, your videos are some of the best out there, you set the bar very high! :-)

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      @@bajdoub thanks! It means a lot. I’ll try to keep the standard high :)
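
    A small Python sketch of the in-sample vs. out-of-sample gap discussed in this thread. The toy setup (a sinusoidal true function and a degree-9 polynomial fit) is an assumption for illustration, not the video's:

    import numpy as np

    rng = np.random.default_rng(1)

    def draw(n=30, noise_sd=0.3):
        x = rng.uniform(0, 1, n)
        return x, np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n)

    x_train, y_train = draw()
    coefs = np.polyfit(x_train, y_train, 9)  # flexible fit: parameters chosen to make training MSE low

    mse_in = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)

    x_test, y_test = draw(n=100_000)  # a large fresh sample approximates EPE
    mse_out = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)

    print(f"in-sample MSE:     {mse_in:.4f}")   # optimistic, as the reply explains
    print(f"out-of-sample MSE: {mse_out:.4f}")  # honest estimate of expected prediction error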

  • @theleastcreative
    @theleastcreative 5 months ago +1

    It feels like you're reading out of that textbook on the table behind you

    • @Mutual_Information
      @Mutual_Information  5 months ago

      The whole channel started b/c I actually wanted to write a book on ML.. but then I figured few people would read it, so I might as well communicate the same material on a YT channel, where it had a better chance. Literally, I'd say "It's a textbook in video format". But then I realized it can make the videos very dense and a little dry. So I've evolved a bit since.

  • @jadtawil6143
    @jadtawil6143 2 years ago +1

    Subscribed. Would you mind sharing how to quickly make the visuals with the math equations? I'd love to use a similar resource for my students.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Hey Jad. I have plans to open source my code for this, but it's not ready yet. I'll make an announcement when it's ready.

  • @Throwingness
    @Throwingness 2 years ago

    I love the channel. I have a few topic requests... KL Divergence. Diffusion Networks. Policy Gradient RL models.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Policy Gradient RL methods will be out this summer! Diffusion.. that's a whole beast I don't have plans for right now. I'd need to learn quite a bit to get up to speed. KL Divergence, for sure I'll do that. Possibly later this year.

    • @Throwingness
      @Throwingness 2 years ago

      @@Mutual_Information
      Diffusion.
      Did you see DALL-E 2? It's a milestone. I can't wait for the music and videos a system like this will create.

  • @melodyparker3485
    @melodyparker3485 3 years ago +3

    Do you use the Manim Python Library for your animation?

    • @Mutual_Information
      @Mutual_Information  3 years ago +3

      No, though I should explore that one day. I use a personal library that leans heavily on Altair, a Python static plotting library built on Vega-Lite (from the D3 ecosystem).

    • @melodyparker3485
      @melodyparker3485 3 years ago +2

      @@Mutual_Information Cool!

  • @MegaSesamStrasse
    @MegaSesamStrasse 1 year ago

    So can I understand bias and variance in terms of a sampling distribution from which my specific model is taken? If the variance is high, the mean of this sampling distribution will be quite close to the true value, but since the variance of this distribution is so large, it is unlikely (though not impossible?) that my specific model represents the true value. And if the model is very low in complexity, the variance of the sampling distribution will be quite small, but since the expected value of the sampling distribution is far from the true value, it is again very unlikely that my specific model represents the true value?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      That sounds about right. Think of it this way. There is some true data generating mechanism that is unknown to your model. A complex model is more likely to be able to capture it. In doing so, if you re-sample from the true data generating process, fit the model, and look at the average of those fits, then those will equal the average of the true distribution. This is what I mean when I say "the complex model can 'capture' the true data generating mechanism". Aka, the model is low bias. However, the cost of such flexibility is that the model produces very different ("high variance") fits over different re-samplings of the data.
      Does that make sense? (A small simulation sketch of this picture follows this thread.)
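
    A Python sketch of the "sampling distribution of fits" picture from this thread, again under an assumed toy setup (sinusoidal truth, polynomial learners): refit a simple and a complex model on many re-sampled training sets, then compare the mean fit to the truth (bias) and the spread across fits (variance).

    import numpy as np

    rng = np.random.default_rng(2)
    grid = np.linspace(0, 1, 101)  # points at which we inspect the fitted curves
    truth = np.sin(2 * np.pi * grid)

    def refit_curves(degree, n_fits=1000):
        # Fit the same model class to many re-sampled training sets.
        curves = []
        for _ in range(n_fits):
            x = rng.uniform(0, 1, 20)
            y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 20)
            curves.append(np.polyval(np.polyfit(x, y, degree), grid))
        return np.array(curves)

    for degree in (1, 9):
        curves = refit_curves(degree)
        gap = np.abs(curves.mean(axis=0) - truth).mean()  # mean fit vs. truth (bias)
        spread = curves.std(axis=0).mean()                # spread across refits (variance)
        print(f"degree={degree}: |mean fit - truth|={gap:.3f}, spread={spread:.3f}")

    The simple model's mean fit should sit far from the truth with little spread; the complex model's mean fit should track the truth closely, with individual fits scattered widely around it.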

  • @AlisonStuff
    @AlisonStuff 3 years ago +2

    Woo!

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Haha thank you sister

    • @AlisonStuff
      @AlisonStuff 3 years ago +2

      @@Mutual_Information You're welcome brother. How are you? How was your day?

  • @ilyboc
    @ilyboc 2 years ago +1

    😮😮😯❤️

  • @ClosiusBeg
    @ClosiusBeg 2 years ago +1

    Man, please more pictures..

  • @kashvinivini2264
    @kashvinivini2264 2 years ago +1

    Please provide subtitles for foreign language speakers!

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      I have a list of outstanding changes I need to make and this is one of them. I'll make it a priority! Thanks for the feedback