Why do we divide by n-1 to estimate the variance? A visual tour through Bessel's correction

  • Published 27 Jun 2024
  • Correction: At 30:42 I write "X = Y". They're not equal; what I meant to say is "X and Y are identically distributed".
    The variance is a measure of how spread out a distribution is. In order to estimate the variance, one takes a sample of n points from the distribution and calculates the average squared deviation from the mean.
    However, this doesn't give a good estimate of the variance of the distribution. A better estimate is obtained by dividing by n-1 instead of n (a quick numerical check of this appears after the chapter list below).
    WHY!?!?!?!?!?!?!?
    In this video, we dig deeper into why the variance calculation should be divided by n-1 instead of by n. For this, we use an alternate definition of the variance, which doesn't use the mean in its calculation.
    [0:00] Introduction and Bessel's Correction
    - Introducing Bessel's Correction and why we divide by \( n-1 \) instead of \( n \) to estimate variance.
    [0:12] Introduction to Variance Calculation
    - Explaining the premise of calculating variance and introducing the concept of estimating variance using a sample instead of the entire population.
    [1:01] Definition of Variance
    - Defining variance as a measure of how much values deviate from the mean and outlining the basic steps of variance calculation.
    [1:52] Introduction to Bessel's Correction
    - Discussing why we divide by \( n-1 \) when calculating variance and introducing Bessel's Correction.
    [2:35] Challenges of Bessel's Correction
    - Sharing personal challenges in understanding the rationale behind Bessel's Correction and discussing my research process on the topic.
    [3:20] Alternative Definition of Variance
    - Presenting an alternative definition of variance to aid in understanding Bessel's Correction and expressing curiosity about its presence in the literature.
    [4:45] Quick Recap of Mean and Variance
    - Briefly revisiting the concepts of mean and variance, demonstrating how they are calculated with examples, and explaining how variance reflects different distributions.
    [7:05] Sample Mean and Variance Estimation
    - Explaining the challenges of estimating the mean and variance of a distribution using a sample and discussing why sample variance is not a good estimate.
    [8:49] Bessel's Correction and Why \( n-1 \) is Used
    - Explaining how Bessel's Correction provides a better estimate of variance and why we divide by \( n-1 \) instead of \( n \). Emphasizing the importance of making a correct variance estimate.
    [10:51] Why Better Estimation Matters
    - Discussing why the original estimate is poor and why making a better estimate is crucial. Explaining the significance of sample mean as a good estimate.
    [13:02] Issues with Variance Estimation
    - Illustrating the problems with variance estimation and demonstrating with examples why using the correct mean is essential for accurate estimates. Explaining the accuracy of estimates made using \( n-1 \).
    [15:04] Introduction to Correcting the Estimate
    - Discussing the underestimated variance and the need for correction in estimation.
    [15:57] Adjusting the Variance Formula
    - Explaining the adjustment in the variance formula by changing the denominator from \( n \) to \( n - 1 \).
    [16:22] Calculation Illustration
    - Demonstrating the calculation process of variance with the adjusted formula using examples.
    [16:57] Better Estimate with Bessel's Correction
    - Discussing how the corrected estimate provides a more accurate variance estimation.
    [18:24] New Method for Variance Calculation
    - Introducing a new method for calculating variance without explicitly calculating the mean.
    [20:06] Understanding the Relation between Variance and Bariance
    - Explaining the relationship between the variance and the pairwise "bariance", and how they are related mathematically.
    [21:52] Demonstrating a Bad Calculation
    - Illustrating a flawed method for calculating variance and explaining the need for correction.
    [23:37] The Role of Bessel's Correction
    - Explaining why removing unnecessary zeros in variance calculation leads to better estimates, equivalent to Bessel's Correction.
    [25:08] Summary of Estimation Methods
    - Summarizing the difference between the flawed and corrected estimation methods for variance.
    [26:02] Importance of Bessel's Correction
    - Emphasizing the significance of Bessel's Correction for accurate variance estimation, especially with smaller sample sizes.
    [30:19] Mathematical Proof of the Variance-Bariance Relationship
    - Providing two proofs of the relationship between variance and bariance, highlighting their equivalence.
    [35:24] Acknowledgments and Conclusion
    Thanks @mcan543 for the summary!
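    A quick way to see the divide-by-n vs divide-by-(n-1) claim in numbers: the minimal Python sketch below (not from the video, made-up parameters) draws many small samples from a distribution with known variance and averages the two estimates.

    ```python
    # Minimal simulation: bias of the divide-by-n variance estimate vs Bessel's correction.
    import numpy as np

    rng = np.random.default_rng(0)
    true_var = 4.0                       # variance of a Normal(0, 2) distribution
    n, trials = 5, 100_000               # small samples make the bias obvious

    samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
    dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2

    divide_by_n   = dev2.sum(axis=1) / n        # naive estimate
    divide_by_nm1 = dev2.sum(axis=1) / (n - 1)  # Bessel's correction

    print("true variance          :", true_var)
    print("average of /n estimate :", divide_by_n.mean())    # roughly (n-1)/n * 4 = 3.2
    print("average of /(n-1)      :", divide_by_nm1.mean())  # roughly 4.0
    ```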
  • Science & Technology

COMMENTS • 52

  • @tandavme
    @tandavme 27 days ago +3

    Thank you, your videos are always deep and easy to follow!

  • @Meine_Rede
    @Meine_Rede 15 days ago

    Very intuitive examples and well explained. Thanks a lot.

  • @wiwaxiasilver827
    @wiwaxiasilver827 24 days ago +3

    The degrees of freedom explanation is similar. It actually also comes up in regression, the chi-squared function and the F-distribution. Because the variance is the average of differences from the sample mean, the mean itself counts as an estimated quantity, and so it takes away a degree of freedom. Technically, it's always n - 1, but the distribution is assumed to have infinite sample size when we cover the whole population, at which point n - 1 is equivalent to n. Still, it was very nice to learn about Bar(x) and how it comes out as 2*variance :) It's also interesting how it looks a bit similar to covariance, hence why the n - 1 comes up again in regression :)

  • @AJoe-ze6go
    @AJoe-ze6go 22 days ago +2

    Now you have me questioning whether I really understand Bessel's correction! I always heard the argument from degrees of freedom, and to me that meant that the first sample point contributes nothing to the variance. For example, if your first sample yields a value of 1.5, then the average value is just 1.5/1, which is still 1.5, so the difference is zero. It's not until your second sample (and subsequent samples) that difference becomes meaningful (i.e., there is a value that can be different from the true mean). That has always been my understanding of why you divide by n-1; no matter how many samples you take, the variance can only be a function of n-1 of them, because a single point can contribute no difference.

  • @robharwood3538
    @robharwood3538 29 days ago +6

    A while back I came across an explanation of the (n-1) correction term from a Bayesian perspective. (It might have been E. T. Jaynes' book _Probability Theory: The Logic of Science,_ but I can't recall for certain.)
    I was hoping you might go over it in this video, but I guess you didn't come across it in your search for the answer.
    One thing that is relevant -- and illuminated by the Bayesian perspective -- is that the Bessel correction for *estimated* variance implicitly assumes a particular sampling method from the population. In particular, I believe it assumes you are performing sampling _with replacement_ (or that the population is so large that sampling without replacement is nearly identical to with replacement).
    But in some non-trivial cases, that may not actually be the case, and so the Bessel correction may not be the appropriate estimator in such cases. For example, if the entire population is a small number, like 10 or 20 or so, then if you sample *without replacement* then the distribution would behave differently. In the same way that a hypergeometric distribution is sometimes better than a binomial distribution, for example.
    As an extreme manifestation of this, suppose you sample (without replacement) all 10 items from a population of just 10 items. Then using the Bessel correction would obviously give the wrong 'estimate' of the true variance, which should be divided by n, not (n-1).
    A Bayesian approach (supposing that the population size, N=10 is a 'given' assumption) would correctly adjust the 'posterior variance' estimate to the *real* best estimate for sample sizes all the way up to 10, at which point it would be equivalent to the true variance.
    Unfortunately, I don't remember how to derive the Bayesian estimate of Variance. But maybe if you found it it might shed even more light on your ultimate question of 'why (n-1)?' and perhaps you could do a follow up video? Just an idea!
    Cheers!
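    A tiny numeric sketch of the extreme case described above (not from the video; it assumes a made-up population of N = 10 values that is "sampled" exhaustively without replacement). Here the quantity of interest is the population variance, which divides by N, so the n-1 formula overshoots it:

    ```python
    # Exhaustive sample of a tiny population: divide-by-N is the true variance,
    # and the Bessel-corrected formula overshoots it by a factor of N/(N-1).
    import numpy as np

    population = np.array([3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0, 5.0, 10.0])
    N = len(population)
    dev2 = (population - population.mean()) ** 2

    print("population variance (/N)       :", dev2.sum() / N)
    print("Bessel-corrected value (/(N-1)):", dev2.sum() / (N - 1))
    ```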

    • @SerranoAcademy
      @SerranoAcademy  29 days ago

      Thanks! Ah, this is interesting. I think I'm somewhat looking at sampling without replacement, but in a different light. I like your Bayesian argument, I need to take a closer look and get back.

  • @ekarpekin
    @ekarpekin 29 days ago +4

    Thank you Luis for the video.
    I also had a long-time obsession with why the heck it is (n-1) instead of n. Well, having watched your video I now explain it to myself as follows:
    1) When we calculate the mean value 'mu' of, say, 10 numbers making up a sample, these 10 numbers are independent for the mean calculation. But once we know the mean 'mu' of the sample, the 10 numbers are no longer all independent: if we know 'mu', we can compute any single one of the 10 numbers from the other 9.
    2) Now, when we come to variance, we take the difference between 'mu' and each of the 10 numbers, so we have 10 deltas. Yet, out of these 10 deltas, only 9 are independent, because the remaining delta can be calculated provided we know the other 9 and 'mu'. Hence, for the variance we divide the total sum (of squared deltas) by (n-1), the count of independent deltas (or differences)...
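    The "one value is forced" step above is easy to check directly; a small sketch (not from the video, made-up numbers):

    ```python
    # Knowing the sample mean and 9 of the 10 values pins down the 10th one.
    import numpy as np

    rng = np.random.default_rng(1)
    sample = rng.normal(5.0, 2.0, size=10)
    mu = sample.mean()

    reconstructed = 10 * mu - sample[:9].sum()   # uses only mu and the first 9 values
    print(sample[9], reconstructed)              # identical up to floating-point rounding
    ```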

    • @tamojitmaiti
      @tamojitmaiti 28 days ago +1

      This exact same reasoning seamlessly transitions into ANOVA calculations as well. I personally think the widely accepted proof of unbiasedness of the estimator is intuitive enough. Math doesn't always have to cater to the logical faculties of what a 5-year-old can comprehend. I'm a big fan of Luis' content, but this video came off as a bit weak in the math intuition part, not to mention super tedious.

    • @wiwaxiasilver827
      @wiwaxiasilver827 24 days ago

      Ah yes, the degrees of freedom explanation

  • @juancarlosrivera1151
    @juancarlosrivera1151 29 days ago +4

    I would use \( \bar{x} \) instead of mu in the right-hand-side equation (mins 9 or 10)

  • @prof.nevarez2656
    @prof.nevarez2656 1 month ago

    Thank you for this breakdown Luis!

  • @michaelzumpano7318
    @michaelzumpano7318 21 days ago

    I wasn’t too sure after the first two minutes, but you started winning me over. I thought, ok, I’ll watch five minutes. I watched to the end. Good job on this video! Subscribed!

  • @dragolov
    @dragolov 24 days ago

    Deep respect, Luis Serrano!

  • @epepchuy
    @epepchuy 6 days ago

    Excellent explanation!!!

  • @jbtechcon7434
    @jbtechcon7434 29 days ago +7

    I once got in a shouting match at work over this. I was right.

  • @uroy8665
    @uroy8665 29 days ago

    Thank you for the detailed explanation, I learned about the new BAR function from it. About (n-1), at first I thought of it this way: say we have 100 people and we want to find the variance of their height. Suppose one person has exactly the mean height; then the mean is fine when dividing by 100, but the variance is not, since one term will be zero and that lowers the variance. If nobody has exactly the mean height, then dividing by 100 seems fine for the variance. But then I thought: what if 2, 3, 4, etc. people have exactly the mean height? Then that reasoning would not work. Anyway, after watching this video my thinking has changed for the better, as I am not from a STAT background.

    • @SerranoAcademy
      @SerranoAcademy  29 days ago

      Thanks, that’s a great argument! Yeah at some point I was thinking about it in a similar way, or considering having an extra person with the mean height. I couldn’t finish the argument but I believe that’s an alternate way to obtain the n-1.

  • @cc-qp4th
    @cc-qp4th 29 days ago +6

    The reason for dividing by n-1 is that by doing so the sample variance is an unbiased estimator of the population variance.

  • @Fractured_Scholar
    @Fractured_Scholar 17 days ago

    The degrees of freedom / n-1 thing has to do with the properties of linear equations. Take a standard-form linear equation ax+by=c, where {a,b,c} are known constants and {x,y} are variables. The *instant* you choose a specific x, there is only one possible y that satisfies that equation. This same property is true for linear equations with n variables. The instant you choose specific values for the first n-1 variables, the nth variable is "forced" -- it is no longer a "free choice."

  • @TheStrings-83639
    @TheStrings-83639 19 days ago

    I think I get the main idea of why we do this for the variance, but how does it work for the various types of statistical tests? For example, I can see why we'd subtract more than one when doing a t-test, because each coefficient would be like a mean value with its own sample variance, but how do you derive this fact using this Bariance method?

  • @archangecamilien1879
    @archangecamilien1879 29 days ago +1

    I know someone who was obsessed with knowing that, lol, back in the day...didn't manage to find a good explanation...there are other things he tried to understand the reasons for, lol...he wasn't sure that many others cared...well, lol, eventually he didn't care much himself, but at a time he did...

    • @archangecamilien1879
      @archangecamilien1879 29 days ago

      Lol...2:35...I hadn't reached that part before I made that comment...so, lol, the person in question wasn't the only one who would obsess over things like that in math...they often just feed you something without explanation, lol...even if you are a math major, I suppose they are thinking "They'll get it later"...they just tell you "You do this", lol...they also just fed the Jacobian to the students in the person's class, without explanation...well, lol, I suppose the student in question didn't have a textbook, but he doubts they explained where the Jacobian comes from in the textbook...

    • @archangecamilien1879
      @archangecamilien1879 29 days ago

      ...he would search online, lol...I don't think there were many math videos back then, or perhaps there were and he didn't notice them...

    • @archangecamilien1879
      @archangecamilien1879 29 days ago

      Lol...the person in question didn't even really understand what was meant by "degrees of freedom"...I mean, lol...they would just throw the term around..."if you have a sample of 6 elements, there are 5 degrees of freedom", I think it could get more complicated than that, forgot, like the product of two sample spaces or something?...Not sure, lol...they would then do some other gymnastics...but they would just throw in the word "degrees of freedom" like it was a characteristic, like height, eye color, hair color, etc, lol...like, there would be tables, and they would inform you how many degrees of freedom there were, and that's the only times I would see the term ever appear...and everyone else seemed fine with it, lol, or maybe the student in question was just an idiot and everyone else had an intuitive sense of what was going on (he says he doubts it, lol)...

    • @SerranoAcademy
      @SerranoAcademy  29 days ago +1

      lol! Your friend sounds a lot like me 😊

  • @f1cti
    @f1cti 23 days ago

    Great video!! Thanks so much, Luis! One question, though: when calculating the BARIANCE, why is the distance between two points calculated twice? Say we have only two points A and B... I haven't been able to wrap my head around why we need to calculate A - B and B - A. Thanks!

    • @SerranoAcademy
      @SerranoAcademy  23 days ago

      Thank you! Great question! If we're taking distinct points, it's exactly the same to do:
      A-B and divide by 1
      as to do
      A-B and B-A and divide by 2.
      This is because when you take pairs of distinct points, taking them ordered or unordered gives the same result, up to a factor of 2.
      However, things change when we allow for repetition. If we allow A-A and B-B, then we have to consider all the possibilities for the first point and for the second. So we need to throw in A-B and B-A, and then divide by 4. If we were to only take A-B and not B-A, and still take the two differences that are 0, then we end up not counting all the cases.
      It's a subtle point, but please let me know if something is not clear or if you have any further questions!

    • @f1cti
      @f1cti 23 days ago

      @@SerranoAcademy I greatly appreciate your response Luis!! Your explanation does clear things up a bit more, but I still wonder why we allow A-A and B-B in the first place: by definition a point has no distance from itself, so why allow it?

    • @SerranoAcademy
      @SerranoAcademy  23 days ago +1

      @@f1cti Thank you! Yes, exactly, it's a bad idea to pick A-A and B-B. So here's the thing:
      If you pick A-A and B-B, you have a bad estimate. In this estimate, you divide by n*n.
      If you don't pick A-A and B-B, you get a good estimate. In this estimate, you divide by n*(n-1) (because you're removing the n zero pairs).
      So you replaced an n with an n-1 in the denominator to get from a bad to a good estimate. And that's exactly what Bessel's correction says!
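      To make the exchange above concrete, here is a small numeric sketch (not from the video, made-up numbers) of both bookkeeping choices: keeping the A-A, B-B pairs and dividing by n*n, versus dropping them and dividing by n*(n-1).

      ```python
      # Average squared difference over ordered pairs, with vs without the i = j pairs.
      import numpy as np

      x = np.array([1.0, 4.0, 6.0, 9.0, 10.0])
      n = len(x)
      diffs2 = (x[:, None] - x[None, :]) ** 2        # n x n matrix of (x_i - x_j)^2

      keep_zero_pairs = diffs2.sum() / (n * n)        # includes the n zero pairs A-A, B-B, ...
      drop_zero_pairs = diffs2.sum() / (n * (n - 1))  # drops them -> divide by n*(n-1)

      var_n   = ((x - x.mean()) ** 2).sum() / n        # divide-by-n variance
      var_nm1 = ((x - x.mean()) ** 2).sum() / (n - 1)  # Bessel-corrected variance

      print(keep_zero_pairs, 2 * var_n)    # equal
      print(drop_zero_pairs, 2 * var_nm1)  # equal
      ```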

  • @levi-civita1360
    @levi-civita1360 28 days ago +2

    I have read a book on statistics, "Introduction to Probability and Statistics for Engineers and Scientists" by Sheldon M. Ross, and there he uses this definition: let d = d(X) be an estimator of the parameter θ. Then
    b_θ(d) = E[d(X)] − θ is called the bias of d as an estimator of θ. If b_θ(d) = 0 for all θ, then we say that d is
    an unbiased estimator of θ.
    And he proves that if we use the sample-variance formula with (n-1) then we get an unbiased estimator, and otherwise we do not.
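    For reference, the standard calculation behind that proof (a sketch, assuming i.i.d. samples with \( \mathbb{E}[\bar{X}] = \mu \) and \( \operatorname{Var}(\bar{X}) = \sigma^2/n \)):
    \[
    \mathbb{E}\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
    = \mathbb{E}\left[\sum_{i=1}^{n}(X_i-\mu)^2\right] - n\,\mathbb{E}\left[(\bar{X}-\mu)^2\right]
    = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 ,
    \]
    so dividing the sum by \( n-1 \) gives expected value \( \sigma^2 \) (zero bias), while dividing by \( n \) gives \( \tfrac{n-1}{n}\sigma^2 \).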

  • @TruthOfZ0
    @TruthOfZ0 22 days ago +1

    The Variance of the world n-1 to exclude the observer that calculates that xD

  • @sahhaf1234
    @sahhaf1234 22 days ago

    This is a superb explanation.

  • @AbhimanyuKumar-ke3qd
    @AbhimanyuKumar-ke3qd 28 days ago

    5:54 Can you please explain why we square in order to remove negative values... we could have taken absolute values as well, i.e., |x1 - u| + |x2 - u| + ...
    Same doubt in the case of linear regression / least squares...

    • @SerranoAcademy
      @SerranoAcademy  28 days ago

      Great question! We can square or take absolute values. Same thing for regression: when you use absolute values for regression, it's called L1, and when you use squares it's called L2.
      I think the reason squares are more common is that a sum of squares is easier to differentiate. The derivative of an absolute value has a discontinuity at zero because the function y = |x| has a sharp corner, while the function y = x^2 is smooth.
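      A small numeric illustration of this point (a sketch, not from the video, made-up numbers): the sum of squared deviations is a smooth parabola minimized at the mean, while the sum of absolute deviations is piecewise linear, has kinks, and is minimized at the median.

      ```python
      # Compare the L2 (squared) and L1 (absolute) deviation objectives on a tiny dataset.
      import numpy as np

      data = np.array([1.0, 2.0, 2.0, 3.0, 10.0])
      grid = np.linspace(0.0, 11.0, 1101)            # candidate "centers" c

      l2 = ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)  # smooth in c
      l1 = np.abs(data[None, :] - grid[:, None]).sum(axis=1)   # kinked at each data point

      print("L2 minimizer ~", grid[l2.argmin()], " mean   =", data.mean())
      print("L1 minimizer ~", grid[l1.argmin()], " median =", np.median(data))
      ```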

    • @AbhimanyuKumar-ke3qd
      @AbhimanyuKumar-ke3qd 28 days ago

      @@SerranoAcademy wow! Never thought about it in terms of differentiability...
      Thank you so much!
      If you can make a video on it, it would be very helpful

    • @vikraal6974
      @vikraal6974 24 days ago +3

      Gauss proved that least squares give the best estimators for regression analysis. We could artificially create other estimators, such as one based on (x-mu)^4, which would behave much like the squared variant but would be a worse estimate. Least squares connects many different areas of mathematics, such as Linear Algebra, Functional Analysis, Measure Theory and Statistics.

    • @AbhimanyuKumar-ke3qd
      @AbhimanyuKumar-ke3qd 24 days ago

      @@vikraal6974 Thanks ✨

  • @santiagocamacho264
    @santiagocamacho264 29 days ago

    @ 8:20 you say "Calculating the correct mean". Did you perhaps mean (no pun intended) "calculating (estimating) the correct variance"?

    • @SerranoAcademy
      @SerranoAcademy  29 days ago +1

      Ah good catch! Yes that’s what I **mean**t, 🤣 gracias Santi! 😊

  • @Tom-sp3gy
    @Tom-sp3gy 26 days ago

    Me too! I was always mystified by it

  • @weisanpang7173
    @weisanpang7173 29 days ago +2

    The algebraic explanation of bariance vs variance was somewhat sloppy.

    • @numeroVLAD
      @numeroVLAD 25 days ago

      Yeah, it's missing the formula that relates the mean and the variance. But I think the author's intended audience is one that already knows that formula well.

  • @PT-dz3iv
    @PT-dz3iv 25 days ago +1

    By the picture at 15:11, I think you conflated two different sampling models: 1) a sample of 2 i.i.d. randomly chosen elements, which should allow repetitions; 2) a sample without replacement. Your calculation here eliminates the repetitions, so you are really dealing with the 2nd model at this point, while the whole video is intended to explain the i.i.d. case. That is not correct. In fact, in the i.i.d. case, the original definition of variance (7.5 in your example) is correct, but you need to use the Bessel correction when you estimate. So the estimates corresponding to your picture at 15:11 would be 0.5, 8, 24.5, 4.5, 18, 4.5. The average is 120/16 = 7.5, which is the real variance.

  • @ASdASd-kr1ft
    @ASdASd-kr1ft 23 days ago

    I don't fully understand why Bar(x) = 2*Var(x)

    • @SerranoAcademy
      @SerranoAcademy  23 days ago

      Yes good point. I don't fully understand it either. I can see why it's bigger, since you're looking at distances from two different points, instead of a point in the middle. As for why it's twice, I have the mathematical proof which happens to work out if you expand it, but I'm still looking for a good intuitive visual explanation. If anything comes to mind, please let me know!
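      One quick way to convince yourself of the factor of 2 numerically (a sketch, not from the video): for independent, identically distributed X and Y with the same mean, Var(X - Y) = Var(X) + Var(Y), so E[(X - Y)^2] = 2*Var(X).

      ```python
      # Monte Carlo sanity check that E[(X - Y)^2] = 2 * Var(X) for i.i.d. X and Y.
      import numpy as np

      rng = np.random.default_rng(2)
      var_x = 4.0
      x = rng.normal(0.0, np.sqrt(var_x), size=1_000_000)
      y = rng.normal(0.0, np.sqrt(var_x), size=1_000_000)

      print(((x - y) ** 2).mean())   # close to 8.0 = 2 * var_x
      ```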

  • @layt01
    @layt01 25 days ago

    Fun fact: in Spanish "V" and "B" are pronounced the same (namely as "B"). Bery good bideo, one of the vest eber!

  • @KumR
    @KumR 20 days ago

    Whoa.....

  • @mcan543
    @mcan543 1 month ago +4

    **[0:00] Introduction and Bessel's Correction**
    - Introducing Bessel's Correction and why we divide by \( n-1 \) instead of \( n \) to estimate variance.
    **[0:12] Introduction to Variance Calculation**
    - Explaining the premise of calculating variance and introducing the concept of estimating variance using a sample instead of the entire population.
    **[1:01] Definition of Variance**
    - Defining variance as a measure of how much values deviate from the mean and outlining the basic steps of variance calculation.
    **[1:52] Introduction to Bessel's Correction**
    - Discussing why we divide by \( n-1 \) when calculating variance and introducing Bessel's Correction.
    **[2:35] Challenges of Bessel's Correction**
    - Sharing personal challenges in understanding the rationale behind Bessel's Correction and discussing my research process on the topic.
    **[3:20] Alternative Definition of Variance**
    - Presenting an alternative definition of variance to aid in understanding Bessel's Correction and expressing curiosity about its presence in the literature.
    **[4:45] Quick Recap of Mean and Variance**
    - Briefly revisiting the concepts of mean and variance, demonstrating how they are calculated with examples, and explaining how variance reflects different distributions.
    **[7:05] Sample Mean and Variance Estimation**
    - Explaining the challenges of estimating the mean and variance of a distribution using a sample and discussing why sample variance is not a good estimate.
    **[8:49] Bessel's Correction and Why \( n-1 \) is Used**
    - Explaining how Bessel's Correction provides a better estimate of variance and why we divide by \( n-1 \) instead of \( n \). Emphasizing the importance of making a correct variance estimate.
    **[10:51] Why Better Estimation Matters**
    - Discussing why the original estimate is poor and why making a better estimate is crucial. Explaining the significance of sample mean as a good estimate.
    **[13:02] Issues with Variance Estimation**
    - Illustrating the problems with variance estimation and demonstrating with examples why using the correct mean is essential for accurate estimates. Explaining the accuracy of estimates made using \( n-1 \).
    **[15:04] Introduction to Correcting the Estimate**
    - Discussing the underestimated variance and the need for correction in estimation.
    **[15:57] Adjusting the Variance Formula**
    - Explaining the adjustment in the variance formula by changing the denominator from \( n \) to \( n - 1 \).
    **[16:22] Calculation Illustration**
    - Demonstrating the calculation process of variance with the adjusted formula using examples.
    **[16:57] Better Estimate with Bessel's Correction**
    - Discussing how the corrected estimate provides a more accurate variance estimation.
    **[18:24] New Method for Variance Calculation**
    - Introducing a new method for calculating variance without explicitly calculating the mean.
    **[20:06] Understanding the Relation between Variance and Bariance**
    - Explaining the relationship between the variance and the pairwise "bariance", and how they are related mathematically.
    **[21:52] Demonstrating a Bad Calculation**
    - Illustrating a flawed method for calculating variance and explaining the need for correction.
    **[23:37] The Role of Bessel's Correction**
    - Explaining why removing unnecessary zeros in variance calculation leads to better estimates, equivalent to Bessel's Correction.
    **[25:08] Summary of Estimation Methods**
    - Summarizing the difference between the flawed and corrected estimation methods for variance.
    **[26:02] Importance of Bessel's Correction**
    - Emphasizing the significance of Bessel's Correction for accurate variance estimation, especially with smaller sample sizes.
    **[30:19] Mathematical Proof of the Variance-Bariance Relationship**
    - Providing two proofs of the relationship between variance and bariance, highlighting their equivalence.
    **[35:24] Acknowledgments and Conclusion**

    • @SerranoAcademy
      @SerranoAcademy  1 month ago +1

      Thank you so much! @mcan543

    • @SerranoAcademy
      @SerranoAcademy  1 month ago +2

      I pasted it into the description, it's a really good breakdown. :)