8.3 Bias-Variance Decomposition of the Squared Error (L08: Model Evaluation Part 1)

  • Published 5 Sep 2024
  • Sebastian's books: sebastianrasch...
    In this video, we decompose the squared error loss into its bias and variance components.
    -------
    This video is part of my Introduction to Machine Learning course.
    Next video: • 8.4 Bias and Variance ...
    The complete playlist: • Intro to Machine Learn...
    A handy overview page with links to the materials: sebastianrasch...
    -------
    If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
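
    A minimal simulation sketch of the idea covered in the lecture (this is not the lecture's own code; the target function, model, and noise level below are arbitrary choices): fit the same model class on many training sets drawn from the same process, then estimate the squared bias and the variance of the predictions at a fixed test point.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)

    def true_fn(x):                  # hypothetical noise-free target function
        return np.sin(x)

    x_test = np.array([[1.5]])       # fixed test point
    y_true = float(true_fn(x_test).ravel()[0])

    preds = []
    for _ in range(200):             # 200 simulated training sets
        X = rng.uniform(0, 5, size=(50, 1))
        y = true_fn(X).ravel() + rng.normal(0, 0.3, size=50)  # noisy targets
        model = DecisionTreeRegressor(max_depth=3).fit(X, y)
        preds.append(model.predict(x_test)[0])

    preds = np.array(preds)
    bias_sq = (y_true - preds.mean()) ** 2   # squared bias at x_test
    variance = preds.var()                   # variance of the predictions
    print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")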

COMMENTS • 37

  • @bluepenguin5606
    @bluepenguin5606 2 years ago +5

    Hi Professor, thank you so much for the excellent explanation!! I learned the bias-variance decomposition a long time ago but never fully understood it until I watched this video! The detailed explanation of each definition helps a lot. Also, the code implementation helps me not only understand the concepts but also implement them in a real application, which is the part I always struggle with! I'll definitely find time to watch other videos to make my ML foundation more solid.

  • @elnuisance
    @elnuisance 2 years ago +4

    This was life-saving. Thank you so much, Sebastian, especially for explaining why 2ab = 0 while deriving the decomposition.
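
    For readers following along, here is a sketch of that step in LaTeX (assuming, as in the video, that y is treated as a fixed target and the expectation is taken over training sets, so y - E[\hat{y}] is a constant):

    \begin{aligned}
    E\big[(y-\hat{y})^2\big]
      &= E\big[\big((y-E[\hat{y}]) + (E[\hat{y}]-\hat{y})\big)^2\big] \\
      &= (y-E[\hat{y}])^2
         + 2\,(y-E[\hat{y}])\,\underbrace{E\big[E[\hat{y}]-\hat{y}\big]}_{=\,0}
         + E\big[(\hat{y}-E[\hat{y}])^2\big] \\
      &= \underbrace{(y-E[\hat{y}])^2}_{\text{Bias}^2}
         + \underbrace{E\big[(\hat{y}-E[\hat{y}])^2\big]}_{\text{Variance}}
    \end{aligned}

    The cross term vanishes because E\big[E[\hat{y}]-\hat{y}\big] = E[\hat{y}] - E[\hat{y}] = 0.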

  • @PriyanshuSingh-hm4tn
    @PriyanshuSingh-hm4tn 2 years ago +1

    The best explanation of bias & variance I've encountered so far.
    It would be helpful if you could include the "noise" too.

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      Thanks! Haha, I would defer the noise term to my statistics class but yeah, maybe I should do a bonus video on that. A director's cut. :)

  • @whenmathsmeetcoding1836
    @whenmathsmeetcoding1836 2 years ago +1

    This was wonderful, Sebastian. After looking around, I found no other video on YouTube with such an explanation.

  • @user-st6sl4zo7p
    @user-st6sl4zo7p 9 months ago

    Thank you so much for the intuitive explanation! The notations are clear to understand and it just instantly clicked.

  • @kairiannah
    @kairiannah 1 year ago

    This is how you teach machine learning; respectfully, the prof at my university needs to take notes!

  • @khuongtranhoang9197
    @khuongtranhoang9197 3 years ago +2

    Do you know that you are doing truly good work! Clear down to every single detail.

  • @gurudevilangovan
    @gurudevilangovan 2 years ago +2

    Thank you so much for the bias variance videos. Though I intuitively understood it, these equations never made sense to me before I watched the videos. Truly appreciated!!

    • @SebastianRaschka
      @SebastianRaschka  2 years ago

      Awesome, I am really glad to hear that I was able to explain it well :)

  • @ashutoshdave1
    @ashutoshdave1 2 years ago +1

    Thanks for this! Provides one of the best explanations👏

    • @SebastianRaschka
      @SebastianRaschka  2 years ago

      Thanks! Glad to hear!

    • @ashutoshdave1
      @ashutoshdave1 2 years ago

      @@SebastianRaschka Hi Sebastian, visited your awesome website resource for ML/DL. Thanks again. Can't wait for the Bayesian part to be completed.

  • @krislee9296
    @krislee9296 2 years ago +1

    Thank you so much. This helps me understand the bias-variance decomposition perfectly, mathematically.

  • @imvijay1166
    @imvijay1166 2 years ago +2

    Thank you for this great lecture series!

  • @andypandy1ify
    @andypandy1ify 3 years ago +3

    This is an absolutely brilliant video Sebastian - thank you.
    I have no problem deriving the Bias-Variance Decomposition mathematically, but no one seems to explain what the variance or expectation is with respect to - is it just on one value? over multiple training sets? different values within one training set? You explained it excellently.
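
    For reference, one common way to write that convention out explicitly (consistent with the replies further down that treat y as a fixed target): with \hat{y}_D(x) the prediction of a model trained on dataset D, both the expectation and the variance are taken over training sets D at a fixed query point x:

    E_D\big[(y - \hat{y}_D(x))^2\big]
      = \big(y - E_D[\hat{y}_D(x)]\big)^2
      + E_D\big[\big(\hat{y}_D(x) - E_D[\hat{y}_D(x)]\big)^2\big]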

  • @justinmcgrath753
    @justinmcgrath753 7 months ago

    At 10:20, the bias comes out backward because the error should be y_hat - y, not y - y_hat. The "true value" in an error is subtracted from the estimate, not the other way around. This is easily remembered by thinking of a simple random variable with mean mu and error e: y = mu + e. Thus, e = y - mu.
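
    A brief note on the sign convention (a general property, not something specific to this lecture): the squared bias, and hence the decomposition, is the same under either definition, since

    (y - E[\hat{y}])^2 = (E[\hat{y}] - y)^2,

    so defining the error as \hat{y} - y only flips the sign of the unsquared bias.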

  • @siddhesh119369
    @siddhesh119369 11 months ago

    Hi, thanks for teaching, really helpful 😊

  • @Rictoo
    @Rictoo 3 months ago

    I have a couple of questions: Regarding the variance, is this calculated across different parameter estimates given the same functional form of the model? Also, these parameter estimates depend on the optimization algorithm used, right, i.e., implying the model predictions are 'empirically-derived models' vs. some sort of theoretically optimal parameter combinations, given a particular functional form? If so, would this mean that _technically speaking_, there is an additional source of error in the loss calculation, which could be something like 'implementation variance' due to our model likely not having the most optimal parameters, compared to some theoretical optimum? Hope this makes sense, I'm not a mathematician. Thanks!

  • @tykilee9683
    @tykilee9683 2 years ago +1

    So helpful😭😭😭

  • @bashamsk1288
    @bashamsk1288 1 year ago

    When you say bias^2 + variance, is that for a single model?
    In the beginning you said bias and variance are for different models trained on different datasets; which one is it?
    If we consider a single model, then is the bias nothing but the mean error and the variance the mean squared error?

  • @jayp123
    @jayp123 3 months ago

    I don’t understand why you can’t multiply ‘E’, the expectation, by ‘y’, the constant.

  • @kevinshao9148
    @kevinshao9148 3 years ago

    Thanks for the great video! One question: at 8:42, why is y constant? y = f(x) here also has a distribution, i.e., it is a random variable, correct? And when you say "apply the expectation on both sides," is this expectation over y or over x?

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +1

      Good point. For simplicity, I assumed that y is not a random variable but a fixed target value instead

    • @kevinshao9148
      @kevinshao9148 3 years ago

      @@SebastianRaschka Thank you so much for the reply! Yeah, that's where my confusion lies. So what do you take the expectation over? If you take the expectation over all the x values, then you cannot make this assumption, right?
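
    A small note consistent with the reply above: once y is treated as a fixed constant, it passes through the expectation (linearity), and the remaining randomness is only in \hat{y}, i.e., in the training set used to fit the model:

    E[y] = y, \qquad E[y\,\hat{y}] = y\,E[\hat{y}] \qquad (y \text{ constant})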

  • @DeepS6995
    @DeepS6995 3 years ago

    Professor, does your bias_variance_decomp work in Google Colab? It did not for me. It worked just fine in Jupyter. But the problem with Jupyter is that bagging is way slower (that's my computer) than what I could get in Colab.

    • @SebastianRaschka
      @SebastianRaschka  3 years ago

      I think Google Colab has a very old version of MLxtend as the default. I recommend the following:
      !pip install mlxtend --upgrade

    • @DeepS6995
      @DeepS6995 3 years ago

      @@SebastianRaschka It works now. Thanks for the prompt response
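
      For readers following this thread, here is a minimal usage sketch of bias_variance_decomp from MLxtend (the dataset and model are placeholders, and argument defaults may differ across MLxtend versions):

      from mlxtend.evaluate import bias_variance_decomp
      from sklearn.datasets import fetch_california_housing
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeRegressor

      X, y = fetch_california_housing(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.3, random_state=1)

      model = DecisionTreeRegressor(max_depth=4, random_state=1)

      # Decompose the expected squared-error loss into bias^2 and variance
      avg_loss, avg_bias, avg_var = bias_variance_decomp(
          model, X_train, y_train, X_test, y_test,
          loss='mse', num_rounds=50, random_seed=1)

      print(f"loss={avg_loss:.3f}, bias^2={avg_bias:.3f}, variance={avg_var:.3f}")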

  • @1sefirot9
    @1sefirot9 3 years ago

    Any good sources or hints on dataset stratification for regression problems?

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +1

      Not sure if this is the best way, but personally I approached that by manually specifying bins for the target variable and then proceeding with stratification like for classification. There may be more sophisticated techniques out there, though, e.g., based on KL divergence or so.

    • @1sefirot9
      @1sefirot9 3 years ago

      @@SebastianRaschka Hm, given a sufficiently large number of bins this should be a sensible approach, and easy to implement. I will play around with that. I am trying some of the things taught in this course on the Walmart Store Sales dataset (available from Kaggle); a naive training of LGM already returns marginally better results than what the instructor on Udemy got (he used XGBoost with hyperparameters returned by the AWS SageMaker auto tuner).
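
      A minimal sketch of the binning approach described in the reply above (the synthetic data and the choice of 10 quantile bins are arbitrary placeholders):

      import numpy as np
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 5))                       # placeholder features
      y = X @ rng.normal(size=5) + rng.normal(size=1000)   # placeholder continuous target

      # Bin the continuous target into quantile bins, then stratify on the bin labels
      bin_edges = np.quantile(y, np.linspace(0, 1, 11)[1:-1])  # 9 interior edges -> 10 bins
      y_binned = np.digitize(y, bin_edges)

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, stratify=y_binned, random_state=0)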