Beautiful, thanks for the clear explanation.😍
This is an absolutely incredible video. Wow. Thank you!
I found this to be helpful and informative. Thank you for taking the time to make this video and help me to better understand confidence intervals.
actually makes sense now. 👍
I'm not sure if you read your comments, but I'd really like some help understanding the interpretation of confidence intervals better.
Let's say I have 6 apples, of which 4 are rotten. Therefore 66.7% of my apples are rotten. So if I randomly pick up an apple, the probability that that specific apple is rotten is 66.7%, right?
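To make this concrete, here's a tiny Python sketch (purely illustrative, using the same 4-out-of-6 numbers) of that long-run frequency:

```python
# Minimal sketch: 6 apples, 4 rotten, as in the example above.
# The long-run frequency of drawing a rotten apple approaches 4/6 ≈ 66.7%.
import random

random.seed(42)
apples = [True] * 4 + [False] * 2   # True = rotten
draws = [random.choice(apples) for _ in range(100_000)]
print(sum(draws) / len(draws))      # ≈ 0.667
```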
Now for confidence intervals. I draw a random sample from some population and construct a 95% CI for the mean. The correct interpretation is that 95% of similarly constructed intervals will contain the true population mean, but it's not possible to make a claim about whether (or with what probability) my particular CI contains the true value. Therefore, it is incorrect to say that my particular sample's CI has a 95% probability of containing the true mean.
But if I draw a parallel to the apple example, why is this not true? I chose a random apple and the chance it was rotten was 66.7%. So why can't I say that, since I chose a random sample (which I did), the chance that its CI contains the true mean is 95%?
Is there something about conditional probability (i.e., the fact that I've already chosen the apple/sample) and how it changes things that I'm missing here? I'd be very grateful for any help understanding this. Thank you!
Lol, after pondering this for a long time, I stumbled upon another video right after typing this comment and found my answer, which I hope is correct.
From the frequentist point of view, the probability of my particular apple being rotten is not 66.7%. It's either 0% or 100%, because the apple either is or isn't rotten. Whether or not I know if it's rotten (e.g., if I'm blindfolded) doesn't affect the objective truth about whether it is rotten. So there's no meaning to asking what the probability of it being rotten is. Similarly, within the frequentist framework in which the confidence interval is constructed, the CI either does or does not contain the mean. There's no meaningful "probability" of it containing the mean, and the fact that we don't know whether it does doesn't change the truth.
In contrast, what I described in my original comment was a Bayesian POV. Even after I've picked the apple, if I have a blindfold on and don't know whether it's rotten, then the probability to me is still my prior (66.7%). However, I cannot apply similar Bayesian reasoning to the confidence interval, because it was constructed with a frequentist method. So I cannot ascribe a probability to it containing the true mean. I cannot use a "prior" of 95% to imply that my particular CI has a 95% chance of containing the true mean.
I really hope this is right. Please do correct me if not. Thank you!
@jayashrishobna There are some really helpful examples here: stats.stackexchange.com/q/26450/176202
Your interpretation is largely correct, but a purist frequentist view goes even further: you can't even say it has a 0% or 100% chance; there is simply no probability involved anymore.
A recent example someone asked on CV was: "What if I compute a confidence interval many times, write each one down on a piece of paper, and then randomly pick a piece of paper?" In this example, of course, 95% of the pieces of paper contain the true parameter, but an individual piece of paper need not have a 95% chance of containing it.
Why? Here is an absurd but technically valid confidence procedure: on 95% of the pieces of paper, I write (-infinity, +infinity), and on the remaining 5%, I write nothing... This still results in 95% of the papers containing the true parameter, so it is a 95% confidence procedure. But hopefully you'll agree that the individual pieces of paper cannot each have a 95% chance of containing the true parameter.
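If it helps, here is a minimal Python sketch of that degenerate procedure (the "true parameter" of 3.14 is an arbitrary placeholder; the procedure never even looks at data):

```python
# Sketch of the absurd-but-valid procedure: 95% of papers say
# (-inf, +inf), 5% say nothing. Long-run coverage is 95%, yet each
# individual paper contains the parameter with probability 1 or 0.
import random

random.seed(1)
true_param = 3.14       # arbitrary fixed value, ignored by the procedure
n_papers = 100_000

covered = 0
for _ in range(n_papers):
    if random.random() < 0.95:
        interval = (float("-inf"), float("inf"))  # always covers true_param
    else:
        interval = None                           # empty: never covers it
    if interval is not None and interval[0] < true_param < interval[1]:
        covered += 1

print(covered / n_papers)  # ≈ 0.95, so this is a valid 95% procedure
```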
@Frans_Rodenburg I am so so grateful for your reply. That's an awesome illustrative example, and thanks for sharing the link too!
You talk a lot about wording, but what I'm missing is one clear sentence that describes the confidence interval I compute for my research. What does it tell me?
That's a great question, because not every statistician will agree on this, but personally, I think it is very reasonable to use the definition in the second quiz answer here: fransrodenburg.github.io/Applied-Statistics/simplelinearregression.html#quiz
"A range of plausible values for the estimate."
This circumvents the difficulty of succinctly expressing that the 95% is not actually about the interval itself.
A correct definition of the confidence *procedure*, that does include the percentage, would be:
"In repeated experimentation, at least 95% of intervals contain the true value."
This fallacy is unbelievably annoying.