Wait... what? So the 'sample variance' is not a measure of the sample's variance? It's an estimate of the population variance? I get it - things in science get silly names sometimes, but why on earth don't stats teachers ever mention that? I'm pretty sure this is what trips everyone up - it would save millions of wasted man-hours if someone just kinda like... said.
I've got to say, though - your degrees of freedom explanation seemed a bit backwards to me. Surely you take the sample and then the numbers are just what they are. You calculate the sample mean from your sample values, not the other way around.
That said, it was the most helpful video I've seen on this - and I've seen a fair few by now. I'm looking forward to checking out the mathematical proof.
Thanks for the compliments. Yes, you calculate the sample mean from the sample values, but that doesn't negate anything I say in this video. The use of the sample mean (in place of the population mean) in the formula for the sample variance results in a loss of 1 degree of freedom. Cheers.
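A minimal sketch of the point in this reply. It uses a hypothetical sample of three values (8, 2, 5 are illustrative, not the video's numbers) and a hypothetical helper `last_value_from_mean`. The sample mean is computed from all n values, as the previous comment says. Once it is fixed, though, the deviations from it must sum to zero, so any n-1 of the values determine the last one. That constraint is one informal way to read the "lost" degree of freedom. It has nothing to do with the order in which values are observed.

```python
from fractions import Fraction

def last_value_from_mean(first_values, mean, n):
    """Hypothetical helper: recover the n-th observation from the other n-1
    observations and the sample mean, using sum(x_i) = n * mean."""
    return n * mean - sum(first_values)

# Hypothetical sample of size 3 (illustrative values only).
sample = [Fraction(8), Fraction(2), Fraction(5)]
n = len(sample)
x_bar = sum(sample) / n  # sample mean, computed from all n values

# Deviations from the sample mean always sum to zero...
deviations = [x - x_bar for x in sample]
print(sum(deviations))  # 0

# ...so x_bar together with any n-1 values pins down the remaining one.
print(last_value_from_mean(sample[:-1], x_bar, n))  # 5
```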
Apparently so, but I can't understand that part. I can't understand where any degrees of freedom enter into the explanation for using n-1. Losing a degree of freedom seems like losing the right to fly unassisted. As far as I can see, you have your population mean, which is fixed, your population variance, which is fixed, your sample values, which are fixed, your sample mean, which is fixed... I can't see any freedom anywhere. We're just crunching numbers. Now that I know the sample variance is not a sample variance, it makes perfect sense that we need to compensate for the smaller variance that we get from a sample compared to the population, and I've watched the other video you mentioned and I can see it all works out algebraically... I fully (as far as I am aware) understand that if you stipulate a parameter, such as a mean, then you lose a degree of freedom in your set of numbers... but how that applies in an explanatory way to this matter is completely lost to me.
I still don’t understand why the standard deviation of the observation is SMALLER than that of the whole population. Assume the population is 1,2,3,4,5,6,7 (N=7) and the observed sample is 4,5,6 (n=3) then it’s clear that sample mean (=5) is actually LARGER than population mean (=4). Could u answer that??
If we use n as the divisor, then *on average* the sample standard deviation will be less than the population standard deviation. In your example, you start off talking about the standard deviation then flip to the mean, so I'm not sure what you're getting at there. The sample standard deviation might be greater than the population standard deviation, and it might be less. If we use n-1 as the divisor, then *on average*, the sample standard deviation equals the population standard deviation. Cherry picking specific examples isn't meaningful here. A randomly selected 8 year old girl might weigh more than a randomly selected 40 year old male, but that doesn't imply that 8 year old girls weigh more than 40 year old males on average.
Oops, I was careless. I posed the wrong example. The right example should be: assume the population is 1,2,3,4,5,6,7 (N=7) and the observed sample is 1,4,7 (n=3); then it’s clear that the sample variance (=9) is actually LARGER than the population variance (=4). But u surmised correctly about what I meant. I omitted that it’s the AVERAGE variance/AVERAGE standard deviation that we are talking about. Thank you!!
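A quick check of the arithmetic in this example, using Python's standard `statistics` module (`statistics.variance` divides by n-1, `statistics.pvariance` divides by n). The sample 1, 4, 7 has squared deviations summing to 18, so its sample variance is 18/2 = 9 (or 18/3 = 6 with divisor n). The population 1..7 has variance 28/7 = 4. As the reply above notes, a single sample like this says nothing about what happens on average.

```python
import statistics

population = [1, 2, 3, 4, 5, 6, 7]
sample = [1, 4, 7]

pop_var = statistics.pvariance(population)  # divisor N: 28 / 7
s2 = statistics.variance(sample)            # divisor n - 1: 18 / 2
s2_n = statistics.pvariance(sample)         # divisor n: 18 / 3

print(pop_var, s2, s2_n)
```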
@@jbstatistics Thanks for the explanation. I think the reason is "Don't ask why the formula looks like this." Funny that, so many years after graduating from college, I am interested to know why the formula divides by n-1!!
I am looking for an explanation of why we divide by N-1 in the population variance when proving the unbiasedness of the sample variance estimate, which is divided by n-1. The problem is not n - 1 but N - 1. There are no degrees of freedom in a population, sir. So dividing by n-1 does not make the variance estimator unbiased by itself. Just take a population of {1,2,3,4}, N=4, draw all samples of size n=3, calculate the variances using n-1, and calculate the expected value. It will not be equal to the population variance computed with N, but to the one computed with N-1. I can't find an answer to this. As far as I know, the population variance should be calculated by dividing by N.
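This calculation can be reproduced exactly. The sketch below assumes simple random sampling without replacement from the finite population {1, 2, 3, 4}, and `var` is a helper written only for this illustration. Averaged over all samples of size 3, S² equals the population variance computed with divisor N-1, not N. That matches the standard finite-population result E(S²) = N/(N-1) σ², where σ² uses divisor N. The familiar E(S²) = σ² assumes independent draws, for example sampling with replacement or from an infinite population.

```python
from fractions import Fraction
from itertools import combinations

def var(values, ddof):
    """Illustrative helper: variance with divisor len(values) - ddof
    (ddof=0 divides by the count, ddof=1 divides by count - 1)."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

population = [1, 2, 3, 4]
n = 3

# Every size-3 sample drawn WITHOUT replacement is equally likely.
samples = list(combinations(population, n))
expected_s2 = sum(var(s, ddof=1) for s in samples) / len(samples)

print(expected_s2)              # 5/3
print(var(population, ddof=0))  # 5/4  (divisor N)
print(var(population, ddof=1))  # 5/3  (divisor N-1)
```

Swapping `combinations(population, n)` for `itertools.product(population, repeat=n)` (all ordered samples drawn with replacement, i.e. independent draws) makes the average of S² come out to 5/4 instead, the divisor-N population variance.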
Sir, one question: why does a data set with known mean 'mu' have a non-zero standard deviation, while a data set with unknown mean 'x bar' has a standard deviation of zero, when after all they are the same set of data?
You're melding together a few different concepts. Standard deviation is the square-root of variance, which is not the same as the sum of distances from the mean. The explanation in the video is referring to the sum of distances for each data point from the mean. Next, "data set" describes the exact data that you have. If you know all of the data points for the true population, you would have a known population mean and known population variance (your data set is the entire population). If you were to sum up the distance of every *population* data point from the *population* mean, then you would get 0. It is rare in real life to know the true population parameters, and often that is a moving target in biology. So instead, we estimate those numbers based on a sample of the population (your data set is the sample population). If we knew the true population mean, we could calculate the distance of each *sample* point from the *population* mean and the sum of those differences would probably not be zero, because it is very unlikely that the sample mean is exactly the same as the population mean. But if you sum the distance of each *sample* point from the *sample* mean, that would have to be zero (because of the way we calculate mean - you can test that yourself by creating your own mini data sets and finding the mean and summed distances from the mean).
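Following the suggestion to test this on mini data sets, here is a minimal sketch that reuses the population 1..7 and the sample 4, 5, 6 from the example earlier in the thread. Exact fractions avoid rounding noise. Deviations of the population from the population mean sum to zero, and so do deviations of the sample from the sample mean. Deviations of the sample from the population mean generally do not.

```python
from fractions import Fraction

population = [Fraction(v) for v in [1, 2, 3, 4, 5, 6, 7]]
mu = sum(population) / len(population)  # population mean = 4

sample = [Fraction(v) for v in [4, 5, 6]]
x_bar = sum(sample) / len(sample)  # sample mean = 5

print(sum(x - mu for x in population))  # 0: population points about the population mean
print(sum(x - x_bar for x in sample))   # 0: sample points about the sample mean
print(sum(x - mu for x in sample))      # 3: sample points about the population mean
```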
When we know all the values we know their mean, of course. But the deal is there are only n pieces of information. When we're using the sample mean, based on those n pieces of information, there are only n - 1 pieces of information left.
Yes, sure. If we have a population consisting of N values, x_1, x_2, ..., x_N, then mu = sum(x_i) / N and sum (x_i - mu) = 0 (where the sums are from i=1 to N). But I have a feeling you have a follow-up question :)
But when people ask with n-1 they truly understand that dividing by n is underestimating the population. They want to understand why n-1 and not n-1/2 or n-pi/4. Just why the -1 and not something else. Nobody is able to respond to the question. Do we need a PhD in mathematics to understand the proof or is it possible to just explain why ?
First, this statement of yours is far from the truth: "But when people ask with n-1 they truly understand that dividing by n is underestimating the population." Do you think that everyone, when looking at that formula, thinks something like "Well, sure, of course we don't just take a regular average, as that would result in an estimator that is too small on average. But just what should we divide by?" The proportion of people who would think that intuitively is tiny. Second, the description of this video is very accurate. In it, I describe what this video is about, and then state this: I have another video with a mathematical proof that dividing by n-1 results in an unbiased estimator of the population variance, available at ua-cam.com/video/D1hgiAla3KI/v-deo.html. And if you watch that you'll note that no, you don't need a PhD to understand the proof. But you do need to know some information beyond what would be the typical background of students when the sample variance is introduced. So it would be silly for me to go into that as *the* explanation. Hence the two videos: 1) "An informal discussion...", 2) "Proof that...".
@@jbstatistics I mean a step-by-step proof, beginning from the formal definition with all the propositions demonstrated, so it's possible to understand why. But I recognise that I made the mistake. Maybe I should find other videos.
@@Toto-cm5ux If my video proof isn't what you're looking for, then I'm really not sure what you're looking for and you're welcome to check out other stuff. But that's a video proof showing that E(S^2) = sigma^2. If you need all those earlier elements (linearity of expectation, the variance of a random variable, variance of sample mean, etc.) explained and demonstrated in a single video, starting from scratch, and then working up to showing E(S^2) = sigma^2 in that video, that's simply not going to happen (from anybody on Earth, ever). It simply doesn't make sense to start a video from those basics and work all the way through that proof. Learning those things earlier, then coming up to this topic, is the natural progression of events. Jumping to a proof of this after first encountering the sample variance is not. So everyone reasonably assumes a certain background knowledge when working through that proof.
"On average, this estimator equals the population variance sigma squared" On average these statistics are the right statistics We took statistics about your statistics and your statistics check out Yo Dawg, we heard you liked statistics...
"We took statistics about your statistics and your statistics check out" It's more like: We investigated the theoretical properties of your statistical methods, and they check out.
@@bivanchakraborty8203 I don’t know what you mean. The estimate might be greater than the value of the parameter or less than it. Since in practical cases we don’t know the value of the parameter, we cannot possibly know whether our estimate is greater or less than the parameter’s true value, and thus there is no adjustment we can possibly make to compensate for any overestimation or underestimation. Often we know what will happen *on average* (with certain assumptions), and here we know that if we divide by n-1 then on average the sample variance equals the population variance. What happens for any given sample, who knows. Maybe the sample variance is close to the population variance, maybe it’s not, that’s up to the fates.
I don’t like the language you’re using. You said using n-1 “ properly compensates for the problem”.... what!? What does it mean to “properly compensate for the problem” !? Does it compensate for it perfectly? A little bit? More than not doing it? The estimator does NOT equal the population variance sigma squared. Where did you get that? Demonstrate why that is. Don’t just say “it turns out that...” I don’t want to take your word for it, I want to see proof.
If you're keen on seeing the proof, and not the "informal discussion" that this video states it is in the description, then perhaps you should see my proof that I reference in the description. I mean, c'mon man. And at no point did I say the estimator equals the population variance sigma squared. "Where did you get that?"
Just a doubt, even when we have the population mean (mu), the sum of the deviations from the mean must be equal to zero right (in this case the third observation should be 11 making the sum zero). Why is it that we apply this logic only while considering a sample?
I think we always deal with a sample to predict the population. So let us take a sample and calculate the variance (s^2). As explained, if we make predictions using the sample, the estimate should be unbiased; that is why we divide by n-1.
I have been looking for this explanation for a long time. It's surprisingly rare given the large number of statistics resources, so I'm especially grateful for your video. Thank you!
You are very welcome! I'm glad you've found this video helpful!
Perfect - as always! Your explanations zero in on the topic with crystal-clear clarity!
Thanks!
You are great in your videos. A simple foundation can help explain the toughest ideas. I am taking a graduate business statistics class and honestly I wish professors used your approach. Instead of getting into technical questions that people don't want or can't understand, it's better to keep the basics solid and then present the outcomes.
There is a logical flaw in the 'degrees of freedom' thing. It requires you to know the sample mean before knowing all the sampled values, which is impossible. So you have to sample the very last value to calculate the sample mean (5 in the video). Am I missing something?
2 years and no one's responded to this...
The 'degree of freedom' thing assumes that the sample mean is known while the population mean is unknown. I think it's basically the minimum number of observations from the whole sample that you necessarily need in order to have the full sample, which again assumes that the sample mean is known. It is an inherent property of the data regardless of whether the sample mean is actually known or unknown, as far as my understanding goes.
I'm replying to a comment made 2 years ago so I guess you might already know the answer and please enlighten me if I'm wrong. Thanks :)
Answer here, maybe: en.wikipedia.org/wiki/Bessel%27s_correction
Degrees of freedom requires you to know the sample's size, not the sample's mean.
@@andrewlachance2062 I have responded
What an explanation of the lost degree of freedom when inferring the population mean from the sample mean. In other words, why the calculation of the sample variance divides by n-1 and not n. Hats off.
It had been killing me that I did not have a good understanding of n-1. This helped A LOT. Thank you SO SO MUCH
JB > Khan Academy
Thank you very much, sir. Great explanation
Thanks!
You're smart 👌
they are best in their way
4 years of uni and 1 year later I finally understand why it is divided by n-1 and have an idea of what this "degree of freedom" means. Thanks for the laugh; it finally makes a lot of sense.
This channel helped me get my Statistics degree. Thank you.
Better explanation than what I've seen in previous videos! Thank you!
Just a doubt, even when we have the population mean (mu), the sum of the deviations from the mean must be equal to zero right (in this case the third observation should be 11 making the sum zero). Why is it that we apply this logic only while considering a sample?
I think the idea is that the sample mean was found from the 3 observations. The population mean is the real mean and is not from that sample so in that case the three observations wouldn't have deviations that have to sum up to zero.
I have the same question too. The population mean is calculated with sum(xi)/N, so the sum of its deviations from the mean will be equal to zero.
This is the best definition for a layman of what are degrees of freedom.
Thanks, take care. ✌😷
Thank you sir!
It is actually just a kludge that provides a more conservative estimate of sd because, in general, it is customary to provide an overestimate of deviation (or variance) from the mean so that those benefiting from the study are not surprised when they encounter outliers. I used to think this was just my historical and research driven opinion, but I recently heard that n-1 is not acceptable any longer in IB mathematics for secondary. Anyone else?
We have a custom of overestimating the variance so that people are not surprised by outliers? If that were true, it would be a very strange custom and rationale for it.
If the population mean were known, then nobody would be making an argument for dividing by n-1 (rather than n) in the sample variance formula. (Nobody would be saying “Hey, let's nudge this up a bit so that we tend to overestimate the variance.”) When we're estimating the population variance, the population mean is, for all practical intents and purposes, always unknown. So we need to replace the population mean with the sample mean in the sample variance formula. But, as I discuss in this video, this tends to make the sample variance too small, so we adjust by dividing by n-1. This has the result of making the sample variance an unbiased estimator of the population variance (as I prove in another video).
There are arguments put forth for dividing by n rather than n-1, varying from simple arguments like “it is simpler and more understandable, and it really doesn't make much of a difference” to more advanced arguments based on maximum likelihood or mean square error. In any event, the standard method of dividing by n-1 was not motivated by a desire to overestimate the variance; it results in an unbiased estimator of the population variance, and is, at worst, a very reasonable way of going about our business.
Cheers.
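The "too small on average" claim can be checked exactly on a toy example. This is a minimal sketch that assumes an arbitrary discrete distribution, equally likely on 1, 2, 3, 4 (so sigma^2 = 5/4), with independent draws (sampling with replacement). It enumerates every possible sample of size 3 rather than simulating. Replacing mu with x bar makes the divisor-n version too small on average, by the factor (n-1)/n. The divisor n-1 removes that bias.

```python
from fractions import Fraction
from itertools import product

def sum_sq_dev(xs):
    """Sum of squared deviations about the sample mean x bar."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs)

# Assumed toy distribution: X is equally likely to be 1, 2, 3 or 4.
support = [1, 2, 3, 4]
mu = Fraction(sum(support), len(support))
sigma2 = sum((x - mu) ** 2 for x in support) / len(support)  # 5/4

n = 3
# All 4**3 ordered samples of n independent draws are equally likely.
samples = list(product(support, repeat=n))
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)      # 5/4
print(avg_div_n)   # 5/6, i.e. (n-1)/n * sigma^2: too small on average
print(avg_div_n1)  # 5/4, i.e. equal to sigma^2 on average
```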
I think you essentially have the entire historical development of this backwards. It is not that dividing by n makes more sense, but rather that it is where we start from, i.e., the mean "absolute" deviation, indeed that is why the word absolute is being used in this context. Dividing by n-1 was done ad-hoc in the development of statistics and explanations and so-called proofs for why it is in fact more desirable, including df arguments, etc., are post-facto. You claim this makes the sample variance too small, so my question to you is "why?" Why do you think that historically this value is considered too small? Is there some reason inherent in the operation of the mathematics? Answer: no. It is an arbitrary convenience and explanations came later. The most typical explanation, i.e., that one should use n for the population (if obtainable) and n-1 for the sample is also designed upon the same vanity, i.e., that since we do not know the behavior of the population, and only have a sample to test with, that we should therefore use a conservative estimate, i.e., we should provide ourselves with more freedom with which to assign samples to the null hypothesis, in order to thereby limit samples from nullifying the prevailing hypothesis. This makes the statistician a deliverer of proposals, ideas, hypotheses, etc., that are less admissible of change, or more conservatively changed. This then is why the typical textbook understanding argues that the variance is too small unless n-1 is used.
Your viewpoint on this is so radically different from mine that we don't have a starting point and I see no reason to continue this discussion. I stand by everything I stated in the video and my earlier response.
This video and the next complement each other! Thanks for sharing your knowledge with the world, in such a clear and direct way!
You guys have done a great job making these kinds of useful videos for statistics students.
Finally I found an amazing site for learning statistics!
I'm glad to be of help! (And "you guys" is just me -- this is a one-man show!)
Clearest explanation of degrees of freedom ever! Thank you
Thanks!
@@jbstatistics Can you please say if we need to do n-1 even if my sample size is greater than 30?
@@sanchitakanta1018 I'm not sure why you feel a sample size of 30 has any relevance here. The formula is the formula, regardless of the sample size. For large sample sizes, the difference between dividing by n and dividing by n-1 is minimal, but we don't change the formula based on the sample size. It's better to completely forget that you ever heard of an "n > 30" rule than to try to apply it everywhere.
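A small illustration of this reply. It assumes Python's standard `statistics` module (`statistics.variance` uses divisor n-1, `statistics.pvariance` uses divisor n) and a made-up data set. The two versions always differ by the same factor n/(n-1), which shrinks smoothly toward 1 as n grows. Nothing special happens at n = 30.

```python
import statistics

# The n-1 and n versions differ only by the constant factor n / (n - 1).
for n in (5, 30, 31, 1000):
    print(n, n / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, n = 8
s2 = statistics.variance(data)    # divisor n - 1: 32 / 7
v_n = statistics.pvariance(data)  # divisor n: 32 / 8
print(s2, v_n, s2 / v_n)          # ratio is 8 / 7
```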
@@jbstatistics we were taught we need to divide by n-1 in the formula for S.D. only when the sample size is small, that is, less than or equal to 30.
We need not follow that right?
We will do n-1 regardless of the sample size?
Excellent explanation! I couldn’t understand it this clearly previously in undergrad.
Only by dividing by n-1 instead of n is s^2 an UNBIASED estimator of the population parameter. It is just a mathematical fact. Good explanation.
Hey guys, if you can't understand the relation to the degree of freedom then just ignore it, and I suggest you all watch the video mentioned in the info for the proof.
hmm
I have been looking for this and the explanation of degrees of freedom...... Thank you.
@1:39 why would 'x bar' minimize the value if it's an estimate, as it could be smaller or larger than the true mean? Do you have a link for the explanation of this idea? Whenever you see this, I would appreciate a reply.
(There is a calculus explanation below this hand-waving argument.)
I think it stands to reason that the value of c that minimizes the summation sum(x_i - c)^2 would lie somewhere in the “middle” of the x_i values (somewhere between the minimum and maximum). If we chose a value outside of that, the further outside we went (on one side or the other), the greater this sum would be.
That said, I don’t think it’s obvious (to most) why the choice of the sample mean as c would minimize this quantity, so here’s a full explanation using calculus:
d/dc sum(x_i - c)^2 = sum 2(x_i - c)(-1) = -2 sum(x_i - c). Setting this equal to 0, sum(x_i - c) = 0, then sum(x_i) - nc = 0, and c = sum(x_i)/n = x bar. This yields a minimum, as the second derivative is positive: d/dc [-2 sum(x_i - c)] = 2n > 0.
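A numerical companion to the calculus argument. It assumes a hypothetical sample 8, 2, 5, 9 and an illustrative helper `sum_sq`. A grid search over candidate centres c lands on x bar. The algebraic identity sum((x_i - c)^2) = sum((x_i - x bar)^2) + n(x bar - c)^2 shows why: any c other than x bar adds a positive term. In particular, centring at x bar instead of the unknown mu can never give a larger sum of squares than centring at mu itself. That is one informal way to see why this sum, divided by n, tends to come out too small.

```python
from fractions import Fraction

def sum_sq(xs, c):
    """Illustrative helper: sum of squared deviations of xs about a centre c."""
    return sum((x - c) ** 2 for x in xs)

xs = [Fraction(v) for v in [8, 2, 5, 9]]  # hypothetical sample
n = len(xs)
x_bar = sum(xs) / n                       # 6

# Compare x bar against a grid of candidate centres 0, 0.25, ..., 12.
candidates = [Fraction(k, 4) for k in range(0, 49)]
best = min(candidates, key=lambda c: sum_sq(xs, c))
print(x_bar, best)  # 6 6

# Decomposition: moving the centre away from x bar adds n * (x_bar - c)^2.
c = Fraction(7)
print(sum_sq(xs, c) == sum_sq(xs, x_bar) + n * (x_bar - c) ** 2)  # True
```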
@@jbstatistics thank you!
Finally I got a more intuitive understanding of df because of your tutorial... Appreciated
Thanks for this video! Is the main distinction here that the calculation for the population mean is based on three observations that are not the sum total of all observations - so they can be anything. With the sample mean, the three observations here are all observations, leaving all except one observation to vary? I'll look at your other vids to figure out why this makes mathematical sense...thanks again.
Finally I got this concept. Thank you so much for the clear explanation.
Great channel! Very clear on everything. Thanks a bunch
Thanks a lot, sir. But I would be glad if I could also watch the proof that the pooled variance is an unbiased estimator of the population variance (sigma squared).
+Emmanuel Muyiwa Once we have that E(S^2) = sigma^2, then proofs for some types of pooled variance fall right out. As an example, consider sampling independently from 2 groups, and using the pooled variance Sp^2 = (S1^2(n1-1) + S2^2(n2-1))/(n1+n2-2). Then E(Sp^2) = E(S1^2(n1-1) + S2^2(n2-1))/(n1+n2-2) = (1/(n1+n2-2))E((n1-1)S1^2 + (n2-1)S2^2) = (1/(n1+n2-2))((n1-1)sigma^2 + (n2-1)sigma^2) = (1/(n1+n2-2))((n1+n2-2)sigma^2) = sigma^2.
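A brute-force check of this argument. It assumes both groups are drawn independently (with replacement) from the same toy distribution, equally likely on 1, 2, 3, 4, so the common sigma^2 is 5/4. The group sizes n1 = 2 and n2 = 3 and the helper `s2` are illustrative choices. Every possible pair of samples is enumerated, so the average of Sp^2 is exact rather than simulated.

```python
from fractions import Fraction
from itertools import product

def s2(xs):
    """Sample variance with divisor n - 1."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

support = [1, 2, 3, 4]  # assumed common distribution for both groups
n1, n2 = 2, 3

total = Fraction(0)
count = 0
for g1 in product(support, repeat=n1):      # all equally likely group-1 samples
    for g2 in product(support, repeat=n2):  # all equally likely group-2 samples
        sp2 = ((n1 - 1) * s2(g1) + (n2 - 1) * s2(g2)) / (n1 + n2 - 2)
        total += sp2
        count += 1

print(total / count)  # 5/4, the common variance sigma^2
```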
Praise the Lord. Thank you again.
I don't see how this distinguishes between the population variance and the sample variance, because even if you have data for a whole population, if you know the population mean then you only need n - 1 numbers to determine the final number. Like if you have a population of 100, and you know the population mean, you only need 99 numbers to determine the final number in the population.
Great way to explain the intuition behind! Thanks!
Why do we divide by the degrees of freedom instead of the number of observations, now that we know what degrees of freedom are? What's the logic or theory behind dividing by df instead of n?
Statistics textbooks are some of the most inconsistent things in the world. It's called the sample mean square in the textbook that I have.
Hello, thank you for this video! But I don't quite understand the contrast at the end. For instance, in the population mean example, why can all three numbers be any value, while in the sample example, one number does not have the "freedom" to be any value? Thanks! :)
Did you go to Concordia University by any chance?
No, I've never attended or taught at Concordia.
What would a 'technical' discussion of the d.f. entail?
Nicely explained, very easy to digest. Thanks and keep it up
Thanks!
No words. It's amazing.
I would like you to enlighten me on this theorem... "In the population of size N, if sampling is without replacement, and if the sample size is n
Thank you for introducing the degree of freedom. I get the n - 1 now.
Brilliant video!
Thanks
You are very welcome, and thanks for the compliment!
great explanation
Thank you so much for this! I finally understand the intuition :)
You're very welcome!
Why is the sample mean smaller than the true population mean?
I want to learn stats from your videos but I'm confused about which one to watch first... kindly help
It would depend on what topic you wish to learn. I've created YouTube playlists on the major topics that I've covered so far, and I have a full list of videos (in a reasonable watching order) on jbstatistics.com.
I want to learn MLE, method of moments, chi square distribution...
Good video. Very clear and useful.
Wait... what? So the 'sample variance' is not a measure of the sample's variance? It's an estimate of the population variance? I get it, things in science get silly names sometimes, but why on earth don't stats teachers ever mention that? I'm pretty sure this is what trips everyone up. It would save millions of wasted man-hours if someone just kinda like... said.
I've got to say, though - your degrees of freedom explanation seemed a bit backwards to me. Surely you take the sample and then the numbers are just what they are. You calculate the sample mean from your sample values, not the other way around.
That said, it was the most helpful video I've seen on this - and I've seen a fair few by now. I'm looking forward to checking out the mathematical proof.
Thanks.
Thanks for the compliments. Yes, you calculate the sample mean from the sample values, but that doesn't negate anything I say in this video. The use of the sample mean (in place of the population mean) in the formula for the sample variance results in a loss of 1 degree of freedom. Cheers.
Apparently so, but I can't understand that part. I can't understand where any degrees of freedom enter into the explanation for using n-1. Losing a degree of freedom seems like losing the right to fly unassisted. As far as I can see, you have your population mean, which is fixed, your population variance, which is fixed, your sample values, which are fixed, your sample mean, which is fixed... I can't see any freedom anywhere. We're just crunching numbers.
Now that I know the sample variance is not a sample variance, it makes perfect sense that we need to compensate for the smaller variance that we get from a sample compared to the population, and I've watched the other video you mentioned and I can see it all works out algebraically... I fully (as far as I am aware) understand that if you stipulate a parameter, such as a mean then you lose a degree of freedom in your set of numbers... but how that applies in an explanatory way to this matter is completely lost to me.
Addictive and lovely channel⭐
Excellent explanation
Thanks!
Perfect explanation
I still don’t understand why the standard deviation of the observations is SMALLER than that of the whole population. Assume the population is 1,2,3,4,5,6,7 (N=7) and the observed sample is 4,5,6 (n=3); then it’s clear that the sample mean (=5) is actually LARGER than the population mean (=4).
Could u answer that??
If we use n as the divisor, then *on average* the sample standard deviation will be less than the population standard deviation. In your example, you start off talking about the standard deviation then flip to the mean, so I'm not sure what you're getting at there.
The sample standard deviation might be greater than the population standard deviation, and it might be less. If we use n-1 as the divisor, then *on average*, the sample standard deviation equals the population standard deviation.
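The "on average" claim can be checked exactly with the commenter's population 1,2,...,7. This sketch assumes i.i.d. sampling, modelled as drawing n = 3 values with replacement, and takes sigma^2 with the N divisor. It averages each divisor's estimate over all 343 equally likely ordered samples. The average with divisor n should come out to ((n-1)/n) sigma^2, and the average with divisor n - 1 should come out to exactly sigma^2.

```python
from fractions import Fraction
from itertools import product

def sum_sq(xs):
    """Sum of squared deviations from the sample mean, exactly."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs)

pop = [1, 2, 3, 4, 5, 6, 7]
N = len(pop)
mu = Fraction(sum(pop), N)
sigma2 = sum((x - mu) ** 2 for x in pop) / N   # population variance, N divisor

n = 3
samples = list(product(pop, repeat=n))  # all ordered with-replacement samples

avg_div_n = sum(sum_sq(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```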
Cherry picking specific examples isn't meaningful here. A randomly selected 8 year old girl might weigh more than a randomly selected 40 year old male, but that doesn't imply that 8 year old girls weigh more than 40 year old males on average.
Oops, I was careless. I posed the wrong example. The right example should be:
Assume the population is 1,2,3,4,5,6,7 (N=7) and the observed sample is 1,4,7 (n=3) then it’s clear that sample variance (=22) is actually LARGER than population variance (=4).
But u surmised correctly about what I meant. I omitted that it’s AVERAGE variance/AVERAGE standard deviation that we are talking about.
Thank you!!
@@梦醒红楼 sample variance = 22?
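For the record, the numbers in this exchange can be computed with Python's standard `statistics` module. Here `variance` uses the n - 1 divisor and `pvariance` uses the n (or N) divisor. For the sample 1, 4, 7 the mean is 4 and the squared deviations sum to 18. The sample variance is therefore 9 with the n - 1 divisor, or 6 with the n divisor, rather than 22. Either way it exceeds the population variance of 4, so the commenter's point stands.

```python
from statistics import pvariance, variance

population = [1, 2, 3, 4, 5, 6, 7]
sample = [1, 4, 7]

population_var = pvariance(population)   # N divisor
sample_var_n_minus_1 = variance(sample)  # n - 1 divisor
sample_var_n = pvariance(sample)         # n divisor

print(population_var, sample_var_n_minus_1, sample_var_n)
```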
How did I get through Business Stats class with a pure A without knowing this, LOL.
I suspect that most students who get an A still don't understand this concept fully. It's a bit tricky!
@@jbstatistics Thanks for the explanation. I think the reason is "Don't ask why the formula looks like this." Funny that so many years after graduating from college, I am interested to know why the formula divides by n-1!!
Awesome explanation!
I am looking for the explanation of why we divide by N-1 in the population variance when proving unbiasedness of the sample variance estimate, which is divided by n-1. The problem is not n - 1 but N - 1. There are no degrees of freedom in a population, sir. So dividing by n-1 does not make the variance estimator unbiased by itself. Just take a population of {1,2,3,4}, N=4, draw all samples of size n=3, calculate their variances using n-1, and calculate the expected value. It will not be equal to the population variance with N, but with N-1. I can't find an answer to this. As far as I know, the population variance should be calculated by dividing by N.
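The commenter's arithmetic matches a standard result. When samples of size n are drawn without replacement from a finite population of size N, E(S^2) equals the population variance computed with the N - 1 divisor. E(S^2) equals the N-divisor sigma^2 only when the draws are independent, either through sampling with replacement or from an effectively infinite population. The sketch below enumerates both schemes for {1,2,3,4} with n = 3, using exact fractions. The helper name is illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def sample_var(xs):
    """Sample variance S^2 with the n - 1 divisor, as an exact Fraction."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

pop = [1, 2, 3, 4]
N, n = len(pop), 3
mu = Fraction(sum(pop), N)
ss_pop = sum((x - mu) ** 2 for x in pop)
var_N = ss_pop / N                # population variance with divisor N
var_N_minus_1 = ss_pop / (N - 1)  # population variance with divisor N - 1

# Without replacement: each of the C(4, 3) = 4 subsets is equally likely.
wor = list(combinations(pop, n))
e_wor = sum(sample_var(s) for s in wor) / len(wor)

# With replacement (independent draws): each of the 4**3 = 64 ordered samples is equally likely.
wr = list(product(pop, repeat=n))
e_wr = sum(sample_var(s) for s in wr) / len(wr)

print(e_wor, var_N_minus_1)  # without replacement matches the N - 1 version
print(e_wr, var_N)           # with replacement matches the N version
```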
Excellent explanation!!!
Sir, one question:
why does a data set with known mean 'mu' have a nonzero standard deviation, while a data set with unknown mean 'x bar' has standard deviation zero, when after all they are the same set of data?
You're melding together a few different concepts. Standard deviation is the square-root of variance, which is not the same as the sum of distances from the mean. The explanation in the video is referring to the sum of distances for each data point from the mean. Next, "data set" describes the exact data that you have. If you know all of the data points for the true population, you would have a known population mean and known population variance (your data set is the entire population). If you were to sum up the distance of every *population* data point from the *population* mean, then you would get 0. It is rare in real life to know the true population parameters, and often that is a moving target in biology. So instead, we estimate those numbers based on a sample of the population (your data set is the sample population). If we knew the true population mean, we could calculate the distance of each *sample* point from the *population* mean and the sum of those differences would probably not be zero, because it is very unlikely that the sample mean is exactly the same as the population mean. But if you sum the distance of each *sample* point from the *sample* mean, that would have to be zero (because of the way we calculate mean - you can test that yourself by creating your own mini data sets and finding the mean and summed distances from the mean).
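The "test that yourself" suggestion takes only a few lines. The sketch uses a hypothetical population mean mu = 10 and a hypothetical sample [8, 13, 15]. Deviations from mu need not sum to zero, and here they sum to 6. Deviations from the sample's own mean x bar = 12 always sum to exactly zero, because of how x bar is defined.

```python
from fractions import Fraction

mu = 10                # hypothetical known population mean
sample = [8, 13, 15]   # hypothetical sample
xbar = Fraction(sum(sample), len(sample))

dev_from_mu = [x - mu for x in sample]
dev_from_xbar = [x - xbar for x in sample]

print(sum(dev_from_mu))    # generally not zero
print(sum(dev_from_xbar))  # always exactly zero
```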
Thanks, sir..
now I understand what you mean....
with regards from Sonam
Aha! This had bugged me for a long time until now! :)
How is knowing the sample mean different from knowing each of the sample values?
When we know all the values we know their mean, of course. But the deal is there are only n pieces of information. When we're using the sample mean, based on those n pieces of information, there are only n - 1 pieces of information left.
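The "n - 1 pieces of information left" idea can be made concrete. Once x bar and any n - 1 of the values are known, the last value is forced to be n * x bar minus the sum of the others. The sketch uses a hypothetical sample [8, 5, 2], whose three values add up to 15. The function name `last_value` is illustrative only.

```python
from fractions import Fraction

def last_value(first_values, xbar, n):
    """Recover the n-th observation from the other n - 1 values and the sample mean."""
    return n * xbar - sum(first_values)

sample = [8, 5, 2]   # hypothetical sample
n = len(sample)
xbar = Fraction(sum(sample), n)

# Only the first n - 1 values and xbar are passed in; the last one is determined.
recovered = last_value(sample[:-1], xbar, n)
print(recovered)
```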
Super Useful !!!! Super Amazing !!!
Thankyou so much!
Hi sir,
I have a question. Shouldn't the sum of deviations from the mean also be 0 for the population?
Yes, sure. If we have a population consisting of N values, x_1, x_2, ..., x_N, then mu = sum x_i / N and sum (x_i - mu) = 0 (where the sums are from i=1 to N). But I have a feeling you have a follow up question :)
But when people ask with n-1 they truly understand that dividing by n is underestimating the population. They want to understand why n-1 and not n-1/2 or n-pi/4. Just why the -1 and not something else.
Nobody is able to respond to the question. Do we need a PhD in mathematics to understand the proof or is it possible to just explain why ?
First, this statement of yours is far from the truth:
"But when people ask with n-1 they truly understand that dividing by n is underestimating the population."
Do you think that everyone, when looking at that formula, thinks something like "Well, sure, of course we don't just take a regular average, as that would result in an estimator that is too small on average. But just what should we divide by?" The proportion of people who would think that intuitively is tiny.
Second, the description of this video is very accurate. In it, I describe what this video is about, and then state this: I have another video with a mathematical proof that dividing by n-1 results in an unbiased estimator of the population variance, available at ua-cam.com/video/D1hgiAla3KI/v-deo.html.
And if you watch that you'll note that no, you don't need a PhD to understand the proof. But you do need to know some information beyond what would be the typical background of students when the sample variance is introduced. So it would be silly for me to go into that as *the* explanation. Hence the two videos: 1) "An informal discussion...", 2) "Proof that...".
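The block below is a compressed sketch of the key step behind "why exactly -1". It is not a substitute for the full proof referenced above. It assumes X_1, ..., X_n are independent with mean mu and variance sigma^2, so that Var(X bar) = sigma^2/n. The cross term simplifies because sum(X_i - mu) = n(X bar - mu). Exactly one sigma^2 is lost in expectation, which is why the divisor is n - 1 and not some other adjustment.

```latex
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar{X})^2
  &= \sum_{i=1}^{n}\bigl[(X_i-\mu)-(\bar{X}-\mu)\bigr]^2
   = \sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar{X}-\mu)^2,\\
E\Bigl[\sum_{i=1}^{n}(X_i-\bar{X})^2\Bigr]
  &= n\sigma^2 - n\,\mathrm{Var}(\bar{X})
   = n\sigma^2 - n\cdot\frac{\sigma^2}{n}
   = (n-1)\sigma^2 .
\end{aligned}
```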
@@jbstatistics I mean a step by step proof. Beginning from the formal definition with all the propositions demonstrated so it's possible to understand why.
But I recognise that I made the mistake. Maybe I should find other videos.
@@Toto-cm5ux If my video proof isn't what you're looking for, then I'm really not sure what you're looking for and you're welcome to check out other stuff. But that's a video proof showing that E(S^2) = sigma^2. If you need all those earlier elements (linearity of expectation, the variance of a random variable, variance of sample mean, etc.) explained and demonstrated in a single video, starting from scratch, and then working up to showing E(S^2) = sigma^2 in that video, that's simply not going to happen (from anybody on Earth, ever). It simply doesn't make sense to start a video from those basics and work all the way through that proof. Learning those things earlier, then coming up to this topic, is the natural progression of events. Jumping to a proof of this after first encountering the sample variance is not. So everyone reasonably assumes a certain background knowledge when working through that proof.
@@jbstatistics I understand your point of view. However, I found it on the internet. I will present you what I would have
@@jbstatistics I am sorry. I didn't want to bother you.
Thanks a lot, sir.
INCREDIBLY USEFUL!!! THANK U SO MUCH
You are very welcome!
Very well explained.
thank you so much for this well explained video
Mean absolute deviation folks. That’s why it’s concluded to n-1
"On average, this estimator equals the population variance sigma squared"
On average these statistics are the right statistics
We took statistics about your statistics and your statistics check out
Yo Dawg, we heard you liked statistics...
"We took statistics about your statistics and your statistics check out"
It's more like: We investigated the theoretical properties of your statistical methods, and they check out.
thx -finally found a good explanation!!!
I love you. Great explanation!!!
Awesome video! Thank you!
good video .....very helpful
Thanks!
Can anyone explain whether the sample variance can be greater than the population variance (overestimation) or not??
Sure it can. Like almost every estimator, it can be less than, greater than, or (in a strange twist of fate) equal to the parameter it estimates.
@@jbstatistics okk... Then is there any way to rectify such overestimation??
@@bivanchakraborty8203 I don’t know what you mean. The estimate might be greater than the value of the parameter or less than it. Since in practical cases we don’t know the value of the parameter, we cannot possibly know whether our estimate is greater or less than the parameter’s true value, and thus there is no adjustment we can possibly make to compensate for any overestimation or underestimation.
Often we know what will happen *on average* (with certain assumptions), and here we know that if we divide by n-1 then on average the sample variance equals the population variance. What happens for any given sample, who knows. Maybe the sample variance is close to the population variance, maybe it’s not, that’s up to the fates.
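The point that individual estimates scatter while the average is right can be seen by enumeration. This sketch again assumes i.i.d. sampling, modelled as drawing n = 3 values with replacement from the hypothetical population 1,...,7, with sigma^2 using the N divisor. It counts how many of the 343 equally likely samples give S^2 above, below, or exactly equal to sigma^2, then averages all the estimates. Some samples overestimate and some underestimate, yet the average equals sigma^2.

```python
from fractions import Fraction
from itertools import product

def sample_var(xs):
    """Sample variance S^2 with the n - 1 divisor, as an exact Fraction."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

pop = [1, 2, 3, 4, 5, 6, 7]   # hypothetical population
N = len(pop)
mu = Fraction(sum(pop), N)
sigma2 = sum((x - mu) ** 2 for x in pop) / N

estimates = [sample_var(s) for s in product(pop, repeat=3)]
over = sum(1 for v in estimates if v > sigma2)
under = sum(1 for v in estimates if v < sigma2)
exact = sum(1 for v in estimates if v == sigma2)
average = sum(estimates) / len(estimates)

print(over, under, exact, average)
```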
@@jbstatistics OK... I understand... Thank you so much ❤️😇
Thank you very much 👍🏻 this video helped me a lot.
Wow! It's perfect, sir! I'm really grateful, sir.
+Emmanuel Muyiwa I'm glad I could help!
If this video's volume were larger......
Great explanation! Thanks! :)
+Kaysar777 You are welcome!
+jbstatistics phew! I can't believe that today I understood the concept of degrees of freedom. Love, love, love
+jbstatistics I also want to learn about the derivation of the second formula for the standard deviation and Bessel's correction.
Thanks in advance.
Perfect intuition
Oh you knooooow you're in the brain-melting classes when "I won't show you this here right now" is said 3 times in a minute.
Made everything click. Thanks!
I'm glad to be of help!
Great! Thank you so very much!
You are very welcome!
he is my professor
excellent and very helpful
I don’t like the language you’re using.
You said using n-1 “ properly compensates for the problem”.... what!?
What does it mean to “properly compensate for the problem” !? Does it compensate for it perfectly? A little bit? More than not doing it?
The estimator does NOT equal the population variance sigma squared. Where did you get that?
Demonstrate why that is. Don’t just say “it turns out that...” I don’t want to take your word for it, I want to see proof.
If you're keen on seeing the proof, and not the "informal discussion" that this video states it is in the description, then perhaps you should see my proof that I reference in the description. I mean, c'mon man. And at no point did I say the estimator equals the population variance sigma squared. "Where did you get that?"
You are amazinggg!
Me at 6:10
Woaaahh * mind=blown *
I hope that's a good thing! :)
it sure is, I saw this in two of my courses but couldn't understand it until now, so thanks :D
You are very welcome!
I followed it until he says we know the next will be 2. He knows, I don't. How do you get 2?... Why... I don't get it😅
If you know 3 things add up to 15, and you know the first two add up to 13, what is the value of the third thing?
@@jbstatistics thanks mate
Roughly speaking.... like if you know the jbp reference
Makes no sense
Amazing
Anyone here from Guelph lol
🎉
nuff with all the lies!!
Just a doubt, even when we have the population mean (mu), the sum of the deviations from the mean must be equal to zero right (in this case the third observation should be 11 making the sum zero). Why is it that we apply this logic only while considering a sample?
I think we always deal with a sample to predict the population. So let us take a sample and calculate its variance (s^2). As explained, if we make predictions using samples, the estimator should be unbiased, which is why we take n-1.
When the sample is large, there is a negligible difference between dividing the sum of squares by n and by n-1, for instance 1/10000 versus 1/9999.
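The size of that difference is easy to quantify. SS/(n - 1) is larger than SS/n by the factor n/(n - 1), so the relative gap is exactly 1/(n - 1). That is about 3.3% at n = 31 and about 0.01% at n = 10000. Under i.i.d. sampling, E(SS/n) = ((n-1)/n) sigma^2 for every n, so the n divisor is always biased low. Only n - 1 removes the bias exactly; the bias just becomes small as n grows. The function name `relative_gap` is illustrative.

```python
from fractions import Fraction

def relative_gap(n):
    """Relative amount by which SS/(n - 1) exceeds SS/n.

    (SS/(n - 1)) / (SS/n) = n/(n - 1), so the gap is n/(n - 1) - 1 = 1/(n - 1).
    """
    return Fraction(n, n - 1) - 1

for n in (5, 31, 100, 10000):
    print(n, relative_gap(n), float(relative_gap(n)))
```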
@@praja110 Can you please say whether we need to use n-1 even if the sample size is greater than 30?