Wish I could like this video twice! You are single handedly getting me through my Biostats module in my masters. Thank you!!!!
This video helped a little but it's all over the place, definitely needs more organization.
I owe Khan Academy 50% of my academic career
Thought I'd just add a tidbit here since I find the terminology a bit confusing, and find some of the setup, especially the conclusion, unexplained or poorly explained (this results in some of the confusion in the comments). Anyone, please correct me if I'm wrong.
The intention of determining the confidence interval is to see 1) Is there weight loss between the 2 diets? and 2) How much? To that end, we need to determine, on average, what the weight difference between the 2 groups is (this is why you take the difference of the means). So now you have a new distribution, the distribution of the weight differences between group 1 and group 2. The next step is to determine, within 95% confidence (i.e. 95% chance if I were to pick a random person between group 1 and group 2, they would have lost this amount of weight), how much weight is lost between group 1 and group 2. You do all the math, and you arrive at a confidence interval of 0.7 to 3.12 lbs. This means if you were to pick 2 random people between group 1 and group 2, there is a 95% chance the weight lost between the 2 would be between 0.7 lbs and 3.12 lbs. Now to answer the 2 initial questions.
Is there weight loss between the 2 diets? Yes, because even the lowest value (0.7 lbs) is above 0 lbs. Again, this is not 100% certain (maybe if you use 99.9999% your range will be -1 lbs to 5 lbs, in which case there are people who haven't lost weight), but you are 95% confident (at least that's how I look at it). As to how much? Again, not 100% certain, but I am 95% confident, with the data given, that it is between 0.7 and 3.12 lbs.
Thank you for helping me through Biostats.
All I can say is may the good Lord bless you with more wisdom 😊😊
Thank you for being the goat of explaining stuff that my teachers cannot
Thank you for the great video Sal! :D
I was thinking, maybe I am mistaken, but I don't believe that a 95% confidence interval means that there is a 95% chance that the confidence interval contains the population mean. Because if that was true, then, no matter how "far off" your sample mean was, there would always be a 95% chance that the confidence interval around it contains the population mean. This makes it seem as if the population mean is "moving around".
A 95% confidence interval means that, across all the SAMPLES you could take, 95% of the resulting intervals will contain the population mean. Not that one sample has a 95% chance of containing the population mean.
+Ihatenicknames1 yeah I was wondering the same thing.
Yes...the probability that the true difference between the population means lies within the confidence interval calculated from the difference between his sample means is either 0% or 100%.
I believe the correct wording is "we are 95% confident that this interval overlaps the true population mean"
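For anyone who wants to check this "95% of the samples" reading numerically, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population mean and standard deviation, the sample size, and the helper name `coverage` do not come from the video. It draws many samples from a known normal population, builds a z-based 95% interval around each sample mean, and counts how often the interval contains the true mean.

```python
import random
import statistics

def coverage(true_mean=10.0, true_sd=4.0, n=100, trials=2000, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the known population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Standard error of the sample mean, using the sample standard deviation.
        se = statistics.stdev(sample) / n ** 0.5
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should land close to 0.95. The population mean never moves; what changes from sample to sample is the interval, which is the point made in the comments above.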
Question, since the sample standard deviations of both samples were used (as opposed to population standard deviations) why was the z-distribution (and thus z-table) used to estimate the critical value for the confidence interval? Wouldn't a t-distribution and t-table be more appropriate? Thanks!
Because the sample size is 100, which is greater than 30, meaning that it would be appropriate to use the normal distribution instead of the t-distribution. Thus, we look up the z-table.
Exactly! I also felt like they are not correct, so I used the t-distribution and found that the difference between the population means is between 0.69 and 3.13
The sample variance is an unbiased estimator of the population variance, that's why s is used to "replace" the σ of the population X1
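To see how little the choice between z and t matters at this sample size, here is a rough Python sketch that uses only numbers quoted in these comments: the point estimate 1.91 and the z-based interval 0.7 to 3.12. The standard error is backed out from that interval rather than taken from the video. The t critical value 1.984 is an assumption: it is the usual table entry for a two-sided 95% interval with 100 degrees of freedom, and the appropriate degrees of freedom for two samples of 100 may differ.

```python
from statistics import NormalDist

point_estimate = 1.91        # difference of sample means quoted in the comments
z_low, z_high = 0.70, 3.12   # z-based 95% interval quoted in the comments

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
se = (z_high - z_low) / 2 / z_crit     # standard error implied by that interval

# Assumption: table value for a two-sided 95% interval at 100 degrees of freedom.
t_crit = 1.984

def interval(center, crit, std_err):
    """Return (lower, upper) for center ± crit * std_err, rounded to 2 decimals."""
    return (round(center - crit * std_err, 2), round(center + crit * std_err, 2))

print(interval(point_estimate, z_crit, se))  # (0.7, 3.12)
print(interval(point_estimate, t_crit, se))  # (0.69, 3.13)
```

Under these assumptions the t-based interval comes out as 0.69 to 3.13, which agrees with the figures reported in the reply above and differs from the z-based interval only in the second decimal place.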
It's crazy how he talks so fast when he constantly stumbles over words, repeats himself, and is always correcting himself. Makes it hard to learn
I want to ask... should the difference between two means always be positive? I mean, should X1 be the bigger one so the result is positive?
Thank you SO much for all of your videos Sal!! They have helped me SO much!
Please include the formulas in your calculations
Hi khan, I think you might have a mistake here... You assumed the sample means are the true means of the population. Isn't it better if you can calculate a 95% confidence interval of the distribution of sample means of x1 and the control, and then say that the mean of their differences is 95%*95% between the differences of the calculated confidence intervals?
Don't we need to calculate pooled variance?
I don't get the intuition behind why amalgamating the two sample means tells us anything?
I love your videos. It is annoying though that you repeat everything you say as soon as you say it.
Sometimes it helps getting the point across to some dumb students like myself. :-( . But okay.
Isn't the Z score = (X - mean) / std dev? So at 06:20 it should be 1.96 * std dev + mean, right??
You are right, but at 06:20 he was just calculating how many standard deviations (X - mean) is. Not until 13:20 did he calculate X (the actual mean), which is within (1.96 * std dev + mean) with 95% confidence.
We just have the mean of the sample distributions, and not the actual mean of the population. x1(bar)-x2(bar) doesn't actually represent the population mean value. Which is why we cannot exactly use the formula you mentioned. Sal is trying to find the CI for the actual mean around the calculated sample mean.
Is it 1.96 * SD or 1.96 * SE (standard error)?
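For reference, the "std dev" in this thread is the standard deviation of the sampling distribution of the difference of sample means, which is what is usually called the standard error. A sketch of the standard formulas, assuming independent samples; the symbols s_1, s_2 (sample standard deviations) and n_1, n_2 (sample sizes) are notation introduced here, not taken from the video:

```latex
\sigma_{\bar{x}_1-\bar{x}_2} \approx \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},
\qquad
(\bar{x}_1-\bar{x}_2) \pm 1.96\,\sigma_{\bar{x}_1-\bar{x}_2}
```

So the 1.96 multiplies the standard error of the difference, not the standard deviation of the individual weight losses.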
thank you for all !!!!!!!!!
Is this course, Probability and Statistics, available in PDF, Sal?
very explicit. Thanks
How come the true mean is x1-x2 and not (x1+x2) divided by 2?? Thanks Sal for the video(s)
would the difference of the sample means, 1.91, be the point estimate?
That's right
What if x bar - x bar = 0???
I thought 95% is mean ± 2*stdev ??
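The "± 2 standard deviations" rule is a rounded version of the same idea. A small Python sketch, using only the standard library's `statistics.NormalDist` (the helper name `central_probability` is made up for this example), shows how close the two multipliers are:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_probability(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(central_probability(2.0), 4))    # 0.9545
print(round(central_probability(1.96), 4))   # 0.95
print(round(std_normal.inv_cdf(0.975), 2))   # 1.96
```

So ± 2 covers about 95.45% of a normal distribution, while ± 1.96 covers 95% almost exactly, which is why the z-table value is 1.96.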
please - please- please number these :) thanks for the help cheers
I'm confused.... what makes this a z test rather than a t test?
if a sample size is 30 or more we use z, if not we use t.
Okay thank you so much!
Only if you are fairly certain that the samples are from a normally distributed population. If you look, 100 degrees of freedom on the Student's t table for a 95% confidence interval is still significantly off from the relative Z score.
Idk, but the way he narrates is repetitive and confusing because of so much repetition. It's so hard to understand with the way it is explained.
But why did we choose 95% confidence interval... why not 99% or some other value??
Because we are taking samples, the closest percentage we can take is 95% in this subject. I think that's what my lecturer said 🤣
You can choose whatever confidence level you want to, depending on how confident you want to be in your results
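To illustrate the trade-off, here is a hedged Python sketch that recomputes the interval at a few confidence levels. It uses only the point estimate (1.91) and the 95% interval (0.7 to 3.12) quoted in these comments; the standard error is backed out from that interval, and the function name `z_interval` is made up for this example.

```python
from statistics import NormalDist

point_estimate = 1.91  # difference of sample means quoted in the comments
# Standard error implied by the quoted 95% interval of 0.7 to 3.12.
se = (3.12 - 0.70) / 2 / NormalDist().inv_cdf(0.975)

def z_interval(confidence):
    """z-based interval for the difference of means at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return (round(point_estimate - z * se, 2), round(point_estimate + z * se, 2))

for level in (0.90, 0.95, 0.99):
    print(level, z_interval(level))
```

A higher confidence level gives a wider interval, so 95% is a convention that balances confidence against precision rather than a requirement.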
What does that 95% mean in plain English?
Across repeated samples, 95% of the intervals contain the population mean