Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
You guys have accompanied me all throughout my degree so I just wanted to thank you.
Hooray!!! Thank you so much for supporting StatQuest!!! TRIPLE BAM! :)
I've taken statistical methods at a master's level but only knew how to get to the answer as I was told to and had never understood the small concepts' subtle and intrinsic meanings. Thanks for the video!
Happy to help!
I passed this subject with flying colors in undergrad but my true foundations are getting built through these videos, thank you so much!
Bam! :)
Not only are you teaching me much-needed stats concepts, but your data examples from the last few sessions are also motivating me to get back into exercising.
Double BAM! :)
Haha 😂. Don’t be that outlier!
I'm a data science grad student brushing up on the basics. I found your entire playlist to be the perfect thing. I even went to your store and bought some study guides. Thank you so much for what you do!
Wow! Thank you for your support!!! :)
You're way better at explaining this than some profs at one of the most prestigious universities in the world. I can personally confirm.
Thanks!
The explanation of why we don’t accept alternative hypotheses and instead just fail to reject the null is GREAT!
Thank you! :)
Yeah, my teacher failed to explain it. This video is great.
Your teaching method is amazing, you break down complex concepts into simple bite-sized chunks. Super Teacher, Triple BAM!
Wow, thanks!
Bam-sized chunks, if you will.
LOL! You have the weirdest best teaching method ever! So unique and helpful
Thank you! 😃
Sir, I really appreciate your effort. It is the best channel for studying, for fun, for easy understanding, and for learning new things. Thanks a lot for making it easier. 😊 I have tried to make these types of videos in the past, and it takes a lot of time in that software; on top of that, you are delivering the best-quality content and examples. So thank you soooooo much, sir!
Thank you very much! :)
@@statquest No sir, thanks to you, sir... You are really doing great work... 😇
StatQuest is the best everrrr! Thanks a million Josh :) ♡
Thank you! :)
Wow! Thank you for clarifying the idea of why we can't "accept" the alternative hypothesis, a concept that was always murky for me... until now :P
You're the best Josh, thank you :)
Hooray!!!!
My favorite part was realizing that failing to reject is the same thing as realizing that you've over fit the data.
@@statquest I agree !
@@statquest Hi Josh, I'm still not sure I get why failing to reject = overfitting. I get that a "sample" in stats is analogous to "training data" in machine learning, and how failing to reject means you cannot generalize your results to the population. But how is it similar to the statistical model "fitting" the sample dataset too well?
I really appreciate this tutorial. It’s like a saving grace for my introduction to AI course
Thank you so much Josh
Glad it was helpful!
Finally understood it!! Thank you, Josh.
Hooray!!!!
Thanks Josh, I have learned a lot from you. You definitely deserve at least a triple BAM.
Thanks! :)
Your song is exactly what I am doing. Definitely a lot better than watching netflix and wasting our lives!
BAM! :)
@@statquest :D
Thanks Josh, you are saving my stats grade. Can't wait to go triple BAM when I have my exam this Monday!
Good luck! :)
Clicked on the Like button only because of the song in the beginning. StatQuest!!!
Thank you! :)
Your channel is truly amazing! Triple BAM!
Wow, thanks!
Josh, one of the best on earth.
Thank you!
Thank you very much for the brilliant videos. All concepts are explained in a very fun way.
Glad you like them!
Learning with StatQuest on a rainy morning
Bam! :)
Hey Josh, thanks for the great explanation!
I have a question at 5:14 - I'm not quite sure how the alternative hypothesis results in overfitting the data. Is it because we are doing a drug-specific mean calculation, as opposed to calculating a single mean in the null hypothesis? Or is it for some other reason?
If we insist on using two separate means to fit the data, we are over fitting the data because, statistically, there is no difference between the two means.
I am a fan of your videos. Amazing explanation. I pledge to contribute on Patreon for a long time.
Thank you very much! :)
Your videos are enjoyable and helpful. Thank you very much.
Glad you like them!
Josh, could you elaborate on the idea on 8:35 - you said «And depending on which one (of alternative hypotheses) we use in the statistical test we can end up making a different decision about the Null». I didn’t get how varying alternative hypothesis can affect rejecting/not rejecting the Null hypothesis.
If you look, you'll see that I have illustrated two different alternative hypotheses. Since we compare the null to one of the two alternatives, we can get different results since the alternatives are different.
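The point that different alternatives can lead to different decisions about the same null can be sketched in Python. This is a minimal sketch, not the test used in the video: it assumes an F-statistic for nested "one shared mean vs. more means" models and gets p-values from an exact permutation test (a swapped-in technique, chosen so the example needs only the standard library). The three drugs' recovery times and the names `ssr`, `f_stat`, and `permutation_p` are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def ssr(groups):
    """Sum of squared distances from each point to the mean of its own group."""
    total = 0.0
    for g in groups:
        m = fmean(g)
        total += sum((x - m) ** 2 for x in g)
    return total


def f_stat(a, b, c, pool_ab):
    """F statistic for the null (one shared mean) against one alternative.

    pool_ab=False -> alternative: A, B and C each have their own mean.
    pool_ab=True  -> alternative: A and B share a mean, C has its own.
    Assumes the alternative leaves some spread (ssr_alt > 0).
    """
    n = len(a) + len(b) + len(c)
    ssr_null = ssr([a + b + c])                  # 1 mean for everything
    alt = [a + b, c] if pool_ab else [a, b, c]
    ssr_alt = ssr(alt)
    k = len(alt)                                 # number of means in the alternative
    return ((ssr_null - ssr_alt) / (k - 1)) / (ssr_alt / (n - k))


def permutation_p(a, b, c, pool_ab):
    """Exact permutation p-value: relabel the data every possible way
    (keeping the group sizes) and count how often F is at least as large."""
    data = a + b + c
    idx = range(len(data))
    observed = f_stat(a, b, c, pool_ab)
    hits = total = 0
    for ia in combinations(idx, len(a)):
        rest = [i for i in idx if i not in ia]
        for ib in combinations(rest, len(b)):
            ic = [i for i in rest if i not in ib]
            f = f_stat([data[i] for i in ia], [data[i] for i in ib],
                       [data[i] for i in ic], pool_ab)
            total += 1
            if f >= observed - 1e-9:             # small tolerance for floating-point ties
                hits += 1
    return hits / total


# Hypothetical recovery times for three drugs.
A = [10, 11, 12]
B = [20, 21, 22]
C = [15, 16, 17]

for pool_ab in (False, True):
    print(pool_ab, f_stat(A, B, C, pool_ab), permutation_p(A, B, C, pool_ab))
```

With these made-up numbers, the "all three means differ" alternative gives F = 75 and an exact p-value of 6/1680 (about 0.004), so we would reject the null. The "A and B share a mean, C differs" alternative gives F = 0 and p = 1, because the pooled A+B mean (16) equals C's mean, so we would fail to reject the very same null.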
It's winter and it's raining, too cold to go outside, so I'm watching StatQuest.
Fortunate
bam!
So love StatQuest!! Bam Bam Bam! Trifecta.
Triple BAM! :)
You deserve a Medal of Freedom
That would be a triple bam! :)
Now it's time for SHAMELESS SELF PROMOTION! LoL! You got me here! ;D
bam! :)
Hi, I'm unclear on why failing to reject the null hypothesis is the same thing as saying that using two averages means "overfitting" in ML. Does that mean we don't need to use two averages when estimating the overall average?
As I understand it, "overfitting" means that the model doesn't represent the population (only the training set).
Could you explain more clearly?
Failing to reject the null hypothesis can mean one of two things: 1) we are over fitting the model to the data by assuming that there are two separate groups when, in reality, there is only one or 2) we were not able to collect enough data to determine a difference. Those are two very different statements. However, we can rule out the second one by doing something called a "power analysis" before we collect the data. I explain power analyses here: ua-cam.com/video/VX_M3tIyiYk/v-deo.html
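A power analysis can be sketched with only the Python standard library. This is a minimal sketch under loudly stated assumptions: a two-sided, two-sample z-test with a known, common standard deviation (a real study would more likely plan around a t-test), plus hypothetical planning numbers. The names `power` and `sample_size` are illustrative, not from the video.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def power(diff, sd, n_per_group, alpha=0.05):
    """Chance of rejecting the null when the true difference in means is `diff`.

    Assumes a two-sided, two-sample z-test with a known common standard deviation `sd`.
    """
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = Z.inv_cdf(1 - alpha / 2)    # about 1.96 when alpha = 0.05
    shift = diff / se                    # where the test statistic is centered if `diff` is real
    # Reject when the statistic lands in either tail.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def sample_size(diff, sd, target_power=0.8, alpha=0.05):
    """Smallest number of measurements per group that reaches `target_power`."""
    if diff == 0:
        raise ValueError("power never exceeds alpha when the true difference is 0")
    n = 2
    while power(diff, sd, n, alpha) < target_power:
        n += 1
    return n


# Hypothetical planning numbers: we care about a 5-hour difference and expect sd = 5 hours.
print(sample_size(diff=5, sd=5))
```

With these hypothetical numbers the sketch reports 16 measurements per group. If an experiment of that size still fails to reject the null, "we didn't collect enough data" becomes a much less likely explanation.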
First of all, great video. I'm really thankful. I just didn't realize that, when there are 3 different populations, the objective is still to reject or fail to reject the null hypothesis. How do I choose the best alternative hypothesis if it isn't supposed to be accepted anyway?
You select the alternative based on what difference you are interested in providing evidence for.
Bam! This was a good explanation of hypotheses. If you don't mind, could you please cover A/B testing, like its use cases and all? That would really help and strengthen this video even more.
Thanks! I'll keep that in mind.
your music is amazing :)
:)
Hi Josh! Suppose I want to check whether the real (population) mean recovery time of people taking (new, improved) drug D is 10 hours less than that of people taking (old) drug C. How would I formulate H0 and H1?
That's not usually done (to learn why, see my video on the null hypothesis): ua-cam.com/video/0oc49DyA3hU/v-deo.html
Thanks. May I suggest you make a video that shows, in a tree-like structure, the sequence of the videos, like the one before and the one after each?
That's not a bad idea. For the time being, I have the videos organized by topic and from simple to complex here: statquest.org/video-index/
Does rejecting the null hypothesis mean automatic acceptance of the alternative hypothesis?
No.
Thank you!!
Double :)
Can you explain how failing to reject the null hypothesis is related to overfitting in machine learning?
In machine learning, the more variables and parameters in your model, the easier it is to overfit the training data. So, in the example here, where we compare a simple model to one with more parameters and fail to see a difference, then the model with more parameters is probably over fitting the data. If, in contrast, we had rejected the null hypothesis, then we would have justification in using more parameters in our model and confidence that we had not overfit the data.
H0: whenever Josh says BAM, pause, reflect, and take notes for better learning (exam score)
H1: Keep watching without pause for better learning (exam score) 😀
bam!
Great fan of your channel. Thank you so much for these wonderful videos. Can we have a video on degrees of freedom too?
I hope to do that one day.
Can you explain why failing to reject the Null Hypothesis means that we have overfit the data?
I am familiar with Machine Learning lingo, but I guess not enough to connect the dots 😆
In machine learning, the more variables and parameters in your model, the easier it is to overfit the training data. So, in the example here, where we compare a simple model to one with more parameters and fail to see a difference, then the model with more parameters is probably over fitting the data. If, in contrast, we had rejected the null hypothesis, then we would have justification in using more parameters in our model and confidence that we had not overfit the data.
If there are only 2 groups of data (A & B) and we reject the null hypothesis does that mean we accept the alternative hypothesis?
No. To understand why, see my video on the null hypothesis: ua-cam.com/video/0oc49DyA3hU/v-deo.html
@@statquest Sorry if I'm being stupid, but in the video on the null hypothesis at 13:21 you say "Then we could reject the null hypothesis. And then we know that there is a difference between drug C and D." Isn't that accepting the alternative hypothesis?
@@Marcelofer94 I'm sorry I was sloppy with my wording. Even when we reject the null hypothesis, we are still not 100% certain that we are correct. It's possible that the small p-value is the result of a false positive. So we only say that we reject the null hypothesis.
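The point that a small p-value can still be a false positive can be checked with a quick simulation. A minimal sketch, assuming normally distributed measurements, a two-sided z-test with a known standard deviation, and hypothetical group sizes; both groups are drawn from the same distribution, so the null is true by construction and every rejection is a false positive.

```python
import random
from statistics import NormalDist, fmean

Z = NormalDist()


def z_test_p(x, y, sd):
    """Two-sided p-value for a difference in means, assuming a known common sd."""
    se = sd * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - Z.cdf(abs(z)))


def false_positive_rate(n_experiments=4000, n=6, sd=5.0, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME distribution, so the null is true.
        x = [rng.gauss(30, sd) for _ in range(n)]
        y = [rng.gauss(30, sd) for _ in range(n)]
        if z_test_p(x, y, sd) < alpha:
            hits += 1  # rejected a true null: a false positive
    return hits / n_experiments


print(false_positive_rate())
```

Roughly 5% of these simulated experiments reject the null even though there is no real difference, which is why rejecting the null is not the same as being certain the alternative is true.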
Bam 3000! Amazing Channel
Thank you! :)
BAM
- There are too many possibilities to test to know if we have accepted the correct one. And this is why we only **reject** or **fail to reject** the **null** or **primary hypothesis**.
Double BAM
- When we only have two groups of data, the **Alternative Hypothesis** is super obvious because it is just the opposite of the **Null Hypothesis**. But when we have 3 or more groups we have options for the Alternative Hypothesis, and depending on which one we use on the statistical test, we can end up making a different decision about the **Null Hypothesis**
You got it! :)
Coming back from the machine learning StatQuest, I still cannot understand "Fail to reject the null hypothesis is the same thing as realizing that using two averages means that you have overfit the data." Can you elaborate on it? Thank you very much.
It just means that the simpler model, with just one mean, is probably better.
Overfitting means that the predictive model is too flexible: it works well on the training data but fails on new data. That is the case here: your model tries to split the data into two _different_ distributions (according to their averages), but in reality there is only one distribution with one mean.
So if you fail to reject the null hypothesis, you realise that your model should predict only one distribution.
Can you please explain why is rejecting a null hypothesis the same thing as realising that using two averages means that you overfit the data?
At 5:03 I say that "failing to reject the null...is the same thing as realizing that we overfit the data". The "failing to reject" part is key. When we fail to reject, we are not confident that the means are different. Thus, it would be better to use a single mean value to model all of the data. (p.s. if you're not familiar with overfitting data in machine learning, see: ua-cam.com/video/EuBBz3bI-aA/v-deo.html)
Thank you, you really simplify and help me learn these difficult concepts! I want to buy a T-shirt, but it looks like they’re only extra large. How do I get a medium? Or small?
Huh. I just tried it out and was able to select 'S' for small and 'M' for medium. Maybe try it again? If you continue to have trouble, contact me through my website: statquest.org/contact/
If the hypothesis was to prove that there is no difference between two variables, do we prove the null hypothesis, or does the null hypothesis change to be the opposite depending on our hypothesis?
We can't make a strong argument that there is no difference between two things. The best we can do is "fail to reject the idea that they are the same". I explain why this is, in detail, in my video on the null hypothesis here: ua-cam.com/video/0oc49DyA3hU/v-deo.html
good job!
Thanks!
5:09 Maybe I am feeling slow mentally today but I could not quite grasp the overfitting part. Can someone explain this?
In machine learning, the more variables and parameters in your model, the easier it is to overfit the training data. So, in the example here, where we compare a simple model to one with more parameters and fail to see a difference, then the model with more parameters is probably over fitting the data. If, in contrast, we had rejected the null hypothesis, then we would have justification in using more parameters in our model and confidence that we had not overfit the data.
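The "more parameters makes overfitting easier" argument can be illustrated with a small simulation. A minimal sketch, assuming both hypothetical groups are really drawn from one normal distribution (so the null is true): the two-means model always fits the training data at least as well as the single mean, yet on fresh data it tends to do slightly worse, which is what overfitting looks like.

```python
import random
from statistics import fmean


def mean_sq_error(values, prediction):
    """Average squared distance between the values and a single predicted value."""
    return fmean([(v - prediction) ** 2 for v in values])


def simulate(reps=2000, n=5, sd=1.0, seed=0):
    """Compare a 1-mean model with a 2-means model when there is NO real group difference."""
    rng = random.Random(seed)
    totals = {"train_one": 0.0, "train_two": 0.0, "test_one": 0.0, "test_two": 0.0}
    for _ in range(reps):
        # Hypothetical drug C and drug D measurements, all drawn from one distribution.
        c_train = [rng.gauss(0, sd) for _ in range(n)]
        d_train = [rng.gauss(0, sd) for _ in range(n)]
        c_test = [rng.gauss(0, sd) for _ in range(n)]
        d_test = [rng.gauss(0, sd) for _ in range(n)]

        one = fmean(c_train + d_train)                   # simple model: 1 parameter
        c_mean, d_mean = fmean(c_train), fmean(d_train)  # bigger model: 2 parameters

        totals["train_one"] += mean_sq_error(c_train, one) + mean_sq_error(d_train, one)
        totals["train_two"] += mean_sq_error(c_train, c_mean) + mean_sq_error(d_train, d_mean)
        totals["test_one"] += mean_sq_error(c_test, one) + mean_sq_error(d_test, one)
        totals["test_two"] += mean_sq_error(c_test, c_mean) + mean_sq_error(d_test, d_mean)
    return {k: v / reps for k, v in totals.items()}


results = simulate()
print(results)
```

The design choice here is to score both models on data they never saw; the extra mean only "helps" on the training data, so when a test fails to show a real difference, the simpler one-mean model is the safer choice.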
What is a single mean? How did you calculate it using data from different drugs?
"A single mean" is the average value from all of the measurements collected, regardless of which group or category they originally came from.
I wish you were my instructor.
Thanks!
So the alternative hypothesis is somewhat artificial in the case of two groups and makes more sense with more than 2 groups, correct?
With two groups, the alternative hypothesis is always the opposite of the null, or primary hypothesis. With 3 or more groups, the alternative hypothesis can take on different meanings.
So is there a way of proving the Alternative hypothesis to be definitely true?
In statistics (as in life) we can never prove that something is 100% true. However we can be reasonably confident when we reject the null hypothesis. Unfortunately, accepting the alternative hypothesis is a much weaker proposition.
@@statquest Neat. Thanks!
Viral Load is also a very important parameter for the infection which should not be named.
:)
If there are only two groups of data and we reject the null hypothesis, then we ACCEPT THE ALTERNATIVE HYPOTHESIS?
Technically, we only reject the null.
@@statquest Thank you!
I love you BAM
:)
Josh, check 4:03. It seems twisted to me; maybe it should be much longer, not shorter. Thanks, good videos!
The video is correct. If the distances around the 2 means are significantly shorter than the distances around a single mean, then we should reject the null hypothesis.
@@statquest Thanks for the reply. I'm still confused about the "distance around a single mean". Do you mean the distance between each data point and the single mean?
@@mpn6362 At 4:03 we have 2 graphs of the same data, one on the left and one on the right. In the graph on the left, we have drawn the mean of all 6 data points (the mean of both the green and pink data points) at a little less than 30. Also in the graph on the left, we have blue dotted lines that illustrate the distances from the data points to the mean line. In contrast, in the graph on the right, we drew two means, one for just the green data points and one for just the pink data points. We also drew the distances from the green points to the mean for the green points with blue dotted lines, and the distances from the pink points to the mean for the pink points with blue dotted lines. The blue dotted lines in the graph on the right tend to be shorter than the blue dotted lines in the graph on the left. Does that make sense?
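The comparison between one mean for all points (left graph) and a separate mean per group (right graph) can be sketched as sums of squared distances. A minimal sketch with hypothetical recovery times (the video's actual values are not reproduced here); squaring the distances and forming an F-style ratio follow the usual least-squares approach and are assumptions, not a transcription of the video.

```python
from statistics import fmean

# Hypothetical recovery times (hours) for two drugs.
drug_c = [24.0, 26.0, 28.0]
drug_d = [29.0, 31.0, 33.0]


def sum_sq_distances(points, center):
    """Sum of squared vertical distances from each point to a horizontal mean line."""
    return sum((p - center) ** 2 for p in points)


# Left graph: a single mean for all 6 points, ignoring which drug each came from.
all_points = drug_c + drug_d
single_mean = fmean(all_points)
ss_one = sum_sq_distances(all_points, single_mean)

# Right graph: a separate mean for each drug.
ss_two = (sum_sq_distances(drug_c, fmean(drug_c))
          + sum_sq_distances(drug_d, fmean(drug_d)))

# F-style ratio: how much the extra mean shrinks the distances (1 extra parameter),
# relative to the spread left over around the two means (n - 2 degrees of freedom).
n = len(all_points)
f_ratio = ((ss_one - ss_two) / 1) / (ss_two / (n - 2))

print(single_mean, ss_one, ss_two, f_ratio)
```

Here the distances around the two means (16.0) are much smaller than those around the single mean of 28.5 (53.5). Whether that shrinkage counts as "significantly shorter" depends on comparing the ratio to an F distribution with 1 and 4 degrees of freedom, a step this sketch leaves out.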
I don't understand the meaning of the statistical test and its relationship with the p-value, if there's any at all.
To understand more about what's going on here, see: ua-cam.com/video/0oc49DyA3hU/v-deo.html ua-cam.com/video/vemZtEM63GY/v-deo.html ua-cam.com/video/JQc3yx0-Q9E/v-deo.html
Failing to reject the null hypothesis is the same thing as realizing that using two averages means that you have overfit the data ===> Josh, I know some machine learning but I don't get this point....
For more details on what it means to over fit data in machine learning see: ua-cam.com/video/EuBBz3bI-aA/v-deo.html
00:02 it's actually raining here
bam!
And then we said Triple Bam!
YES! :)
I disagree with referring to lifestyle as "random" things. We can try to assign treatments randomly, but it would be better to use the wording "other factors" or something like that.
Noted!
I guess you're objecting to the word "chaotic" rather than "random". I might be wrong though.
Everything is temporary except 'BAM' 😀😀😀
Ha! :)
I am Abhishek,
BAM!
BAM BAM BAM
Triple bam! :)
Quadruple BAM
YES!
Hypothesis*
Since we're talking about multiple hypotheses, I use the plural.
@@statquest Why is it that you are never wrong? The best
My man.👞
:)
BAAM !!
:)
I dare someone to dislike it.
Nice! :)
It's alternative hypothesis after all!
:)
Kind of monotonous :S
Sorry. :(
I think this part is so boring... the YouTuber was so verbose.
Noted