Support StatQuest by buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Thanks for helping us, and for sure you are going to get more subscribers. You have an amazing voice. Please update videos frequently.
You're welcome! Working as fast as I can, I add 2 or 3 videos a month. :)
2 sided because we need to decide what test BEFORE we experiment (otherwise p hacking). Perfect explanation!
bam! :)
Hi, Josh. Thanks for the great video! One minor problem at 3:45: it's a probability density function, so the y-axis doesn't mean the probability.
You are correct, which is why I did not say "probability". I said "likelihood", which is different. Likelihood refers to the y-axis coordinate for a specific x-axis coordinate.
@@statquest sorry for the wrong location, 3:22, "probability".
@@hq9248 Yeah, I should have said "related to" or something like that.
The one tailed p-value test here tests the null hypothesis that the new measurement is better, and we get a p value of 0.03 which is smaller than 0.05, that means the null hypothesis is likely to be wrong. Am I understanding it right? P is low, null must go! Right?
Yes.
Amazing videos! Just a quick question, how did the number of false positives increase from ~500 to ~800 when a one-tailed test was used moving forward? 05:52
This is explained at 5:13
@@statquest Ahh I see. I rewatched it. Got it now. Thank you!
Hmmmm, I'm confused. If the 2-sided p-value only tells you if you can reject the null hypothesis, and the null hypothesis in the video is "no difference between the new and standard treatments", how does it tell you if the new treatment is better or worse? I thought it only tells you if these 2 treatments are significantly different from each other.
The p-value only tells you if you can reject the null that there is no difference. To know if the treatment is better or worse, you just look at the means.
@@statquest 😯that makes sense, thanks a lot
@@statquest This is kind of confusing. If we can look at the means and see that meanA is greater than meanB, it seems logical to t-test for statistical significance of this difference using one-tail test. If we can infer the direction of the effect this way, it seems strange to me to consider a second tail.
Nevertheless, thank you for the great content!
@@nerd0us69 The key to understanding what's going on here is to know that we have to decide on significance (is the p-value less than some threshold (usually 0.05)?) before we look at the means. Otherwise we will increase the probability of getting a false positive (getting a significant p-value and rejecting the null hypothesis when we shouldn't). To learn more about this problem, see: ua-cam.com/video/HDCOUXE3HMM/v-deo.html
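A rough sketch of why picking the tail after looking at the means is a problem. Both groups are drawn from the same standard normal distribution, so every "significant" result is a false positive. To stay within the Python standard library, this uses a z-test with a known standard deviation of 1 rather than the t-test discussed in the video. The group size of 3, the 10,000 repetitions, and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean


def z_statistic(a, b, sigma=1.0):
    """z statistic for the difference of two group means (sigma assumed known)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def false_positive_rates(n_tests=10_000, n=3, alpha=0.05, seed=1):
    """Fraction of null experiments declared significant under three strategies."""
    rng = random.Random(seed)
    norm = NormalDist()
    two_sided = one_sided_planned = one_sided_peeked = 0
    for _ in range(n_tests):
        # Both groups come from the same distribution, so the null is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = z_statistic(a, b)
        upper_tail = 1 - norm.cdf(z)  # one-sided p-value for "A > B"
        lower_tail = norm.cdf(z)      # one-sided p-value for "A < B"

        # Strategy 1: two-sided test.
        if 2 * (1 - norm.cdf(abs(z))) < alpha:
            two_sided += 1
        # Strategy 2: one-sided test, direction fixed before seeing the data.
        if upper_tail < alpha:
            one_sided_planned += 1
        # Strategy 3: look at the means first, then test the tail that looks good.
        if min(upper_tail, lower_tail) < alpha:
            one_sided_peeked += 1
    return {
        "two_sided": two_sided / n_tests,
        "one_sided_planned": one_sided_planned / n_tests,
        "one_sided_peeked": one_sided_peeked / n_tests,
    }


if __name__ == "__main__":
    for name, rate in false_positive_rates().items():
        print(f"{name}: {rate:.3f}")
```

In this setup the two-sided test and the pre-planned one-sided test should both come out near the 0.05 threshold, while choosing the tail after peeking at the means should come out near twice that, because either direction can now trigger a rejection.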
Could someone explain to me why we have a more or less equal number of tests in each p-value "basket" (4:50 in the video)? Since the p-value is related to the probability itself, shouldn't we have more tests with bigger p-values? Sorry for (probably) a silly question.
95% of the tests will have p-values > 0.05, so your intuition is correct. However, only 5% will have p-values between 0.05 and 0.1. And only 5% will have p-values between 0.1 and 0.15, etc. This is pretty easy to show using simulations in R. You just select values from a standard normal curve and call that group A, then select 3 values from a standard normal curve and call that group B, and then do the t-test comparing A and B. Do that a lot of times and plot a histogram of the p-values.
@@statquest Thank you for the answer! That's quite an encouragement for me, a beginner :D Common sense saves me. But why isn't your histogram in the video skewed to the right, then? Each of the "baskets" has more or less the same number of tests, even though you made 10,000 of them, which is a lot.
@@hepcat93 It isn't skewed to the right because the probability of getting a p-value between 0.95 and 1 is still only 5 percent. I encourage you to do the simulation.
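For anyone who wants to try the simulation without R, here is a minimal sketch in Python using only the standard library. It is not the code behind the video: it swaps the t-test for a z-test with a known standard deviation of 1 (the standard library has no t-distribution), and the group size of 3 for both groups, the 20 bins, and the function names are assumptions.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(a, b, sigma=1.0):
    """Two-sided z-test p-value for the difference of two means (sigma assumed known)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(n_tests=10_000, n=3, seed=0):
    """Repeatedly compare two groups drawn from the same standard normal curve."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_values.append(two_sided_p(a, b))
    return p_values


def histogram(p_values, n_bins=20):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # min() keeps a p-value of exactly 1.0 in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


p_values = simulate_p_values()
counts = histogram(p_values)
for i, count in enumerate(counts):
    print(f"{i / 20:.2f}-{(i + 1) / 20:.2f}: {count}")
```

With 10,000 tests and 20 bins, each bin should hold roughly 500 tests, including the 0.00-0.05 bin (the false positives) and the 0.95-1.00 bin, which is why the histogram looks flat rather than skewed.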
Thank you very much Josh
You bet!
I have to say, you are an AWESOME teacher!!
Thank you! 😃
Thanks for the video! I can't get the point of the experiment with a normal distribution. Why do you call non-overlapping samples a false positive, and why does it have a probability of 5%?
These are great questions. Let me answer them. 1) The reason why I am doing experiments with a normal distribution is that a lot of "real" experiments use data that comes from normal distributions. For example, the height of a plant is normally distributed. So if I compared the heights of two types of plants, I would be comparing data from a normal distribution. Thus, in this video I use the normal distribution because it accurately represents a lot of potential experiments that people might do. For more details, check out my StatQuests on statistical distributions: ua-cam.com/video/oI3hZJqXJuc/v-deo.html and the normal distribution: ua-cam.com/video/rzFX5NWojp0/v-deo.html 2) To answer your second question... both samples come from the same distribution - so if we do a t-test on those samples and the p-value is < 0.05, then the t-test is suggesting that the samples do not come from the same distribution. This is called a "false positive." The t-test is designed so that if you take 2 random samples from the same distribution, 5% of the time it will report a false positive. For more details, check out the StatQuest on P-hacking: ua-cam.com/video/UFhJefdVCjE/v-deo.html
Does a two-tailed t-test tell me that the mean of distribution A is greater than the mean of distribution B, or does it just tell me that they are different?
The p-value just tells you that they are different, but you can then look at the means to tell in which direction they differ.
Wonderful as always, one question though: In the starting example, when we perform a t-test, in order to determine the p-value we would calculate the f-value then randomly generate data (I assume within the max and min values) repeatedly and determine the f-values and plot them in histogram to get the f-distribution and use the original f-value to determine the p-value. In this context, what does it mean to have a 1-sided vs 2-sided test? Confused due to the shape of the f-distribution. Thanks : )
So, the "one-sided" vs "two-sided" nomenclature really only applies to symmetric distributions. For non-symmetric distributions, maybe it would be better to say "equal to" or "greater/less than" p-values. The "equal to" would be equivalent to a "two-sided" p-value and the "greater/less than" would be equivalent to a "one-sided" p-value. And, by default, an F-test calculates "equal to" (aka "two-sided") p-values.
@@statquest Thank you very much : )
good to know about the p hacking
After watching the p-value video I watched this one. I can't find videos introducing the t-test on your list. Am I missing something?
You should watch my videos on linear models. These sound really fancy, but they are simple. Part 1 introduces the concept, and part 2 covers t-tests and anova:
ua-cam.com/video/nk2CQITm_eo/v-deo.html
ua-cam.com/video/NF5_btOaCig/v-deo.html
Could you tell me an experiment in which it's correct to choose a 1-tailed t-test a priori?
I've been doing statistics for 17 years and I've never been in a situation when a 1 tailed t-test was appropriate. So I can't give you an example from my own experience. I can safely say that 1 tailed t-tests are never appropriate for academic research. However, in a commercial setting there may be a justification for it. Imagine you had 2 different drugs to cure a disease. One is very expensive, one is very cheap. You could use a 1 tailed t-test to show that the cheap drug is no worse than the expensive one. However, even this situation is somewhat artificial. If you had a competing drug, you would still want to know if the cheaper drug was better than the expensive one. So, ultimately, you would still want a 2 tailed test.
What I understand is that when you use a 1-tailed test, the Z-score must be above 1.64, while if you use a 2-tailed test, the Z-score must be above 1.96, and in this sense the two-tailed test is more skeptical of H1, so it's good. But how can the same data result in 2 p-values while X, the mean, sd and n have the same values?
For more details about how to calculate p-values, check out: ua-cam.com/video/vemZtEM63GY/v-deo.html
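To make the 1.64 versus 1.96 question concrete, here is a small sketch using Python's standard library. The data and the Z-score never change; the only difference is how much of the curve is counted as "at least this extreme". The example Z-score of 1.8 is a made-up value for illustration, not one from the video.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal curve


def p_values_from_z(z):
    """Return (one-sided, two-sided) p-values for the same Z-score."""
    one_sided = 1 - norm.cdf(z)             # area in the upper tail only
    two_sided = 2 * (1 - norm.cdf(abs(z)))  # area in both tails
    return one_sided, two_sided


# Critical values for a 0.05 threshold.
critical_one_sided = norm.inv_cdf(0.95)   # all 5% in one tail
critical_two_sided = norm.inv_cdf(0.975)  # 2.5% in each tail
print(round(critical_one_sided, 2), round(critical_two_sided, 2))  # prints 1.64 1.96

# Hypothetical Z-score that falls between the two critical values.
one_sided, two_sided = p_values_from_z(1.8)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

For a positive Z-score the two-sided p-value is exactly twice the one-sided one, so a Z-score between 1.64 and 1.96 is "significant" with a one-tailed test and not with a two-tailed test.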
Thanks for the video, sir. So the lesson from this video is that when we do a 1-tailed test, the chance of reporting a 'false positive' is higher than with a 2-tailed test, hence it's always recommended to go with a 2-tailed test whenever possible. Is that the correct understanding? Also, in this video we didn't conclude whether the new treatment is better than the standard one; it was just an example to start with, correct? I just got confused about whether a conclusion was drawn or not. Please help to clarify. Thanks in advance.
Here's a video that should answer all of your questions: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html
@@statquest Tks a lot sir, will watch....
Hello Josh, do we have some videos for statistical tests or t-test?
Yes. However, I discuss t-tests in an unusual way - in the context of general linear models, which I think is a better approach. For details, see: ua-cam.com/play/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU.html and all of my other videos can be found here: statquest.org/video-index/
@@statquest thanks for your reply. Will watch that series
Hi Josh. Thank you for your amazing explanations.
I have one question. Let's say I want to know whether treatment A is better than treatment B. First, I test it using a two-tailed test with the null hypothesis "treatment A is not significantly different than treatment B" and then (let's say) from the result I reject the null hypothesis. Second, because I rejected the null hypothesis in the first test, the possibilities are either "treatment A is better than treatment B" or "treatment B is better than treatment A". I test it again using a one-tailed test with the null hypothesis "treatment A is no worse than treatment B" and then (let's say) from the result I fail to reject the null hypothesis. So, based on these two results, I conclude that "treatment A is better than treatment B".
Is that the right way? If not, what is the better way to know whether treatment A is better than treatment B?
Thank you
You can just do the 2-tailed test without the follow up 1-tailed test. Once you establish that there is a difference, just look at the means. If A is better than B, then you conclude that A is better than B.
@@statquest okay. thank you very much for the answer :)
@@statquest But a 2-tailed test is said to be non-directional; however, you are actually identifying the direction this way. Or am I missing something? Actually, after reading several papers on 1- and 2-tailed tests, the whole story is more unclear to me now than before :-(. See, e.g., www.onesided.org/.
@@svozild I have a newer video on p-values that may help you understand the concept of two-sided p-values better: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html
@@statquest Thanks a lot! I missed this one, going to watch it carefully. And thank you very much, Josh, such a great work for the community!
Hello Josh, can you mention an example of a test where it only makes sense to do a 1 tailed test?
I've never used a one tailed t-test.
You’re amazing
Bam! :)
please provide videos on the hypothesis testing with examples!
I plan on doing that soon.
hope this statistics series will be more complete and systematic
It is getting there. Originally, the purpose of these videos was just to answer questions that my coworkers had. So they would ask a question, and I would make a video. So the videos did not have any systematic approach - they were just individual solutions to individual problems. However, now I've got a whole selection of videos that are relatively comprehensive of most of the basic topics in Stats. Here's the link: statquest.org/video-index/
I am confused... So why do the two tests have contradictory results?
I have a new video that does a better job explaining this concept here: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html
If your hypothesis beforehand is that your medicine will improve the condition, and you state beforehand that you are going to use a one-sided t-test, I do not see the problem.
Sure, you could do that. However, if you do, you should also explicitly say that you did not test to see if the medicine makes things worse and that that remains a possibility since it was not tested.