I want to thank you for these videos as I'm struggling in my college data science course. This has helped me massively!
You explained this concept in the simplest way I have ever seen.
Thank you very much!!! I couldn't find this test explained as well anywhere else. Such an accurate explanation. Thank you! A+!
Thanks for this video. I knew the theory of hypothesis testing and can do it on paper but it wasn't easy to do the same in python until I saw your video. Thanks for the simplicity.
This is exactly what I needed, thank you.
You are so awesome!! You explained it so well...
Which statistical test can be used to find the difference between two groups' percentage values?
Thank you so much. This is very helpful.
Thank you, very useful video. Just wondering: for two-sample or paired tests, is there a way to test a null hypothesis that is not 0 but some non-zero value? For example, if S1 is the first sample and S2 is the second sample, how do we test the hypothesis that S1 - S2 > 1?
What do we do to our model if we accept the alternative hypothesis?
How did you learn statistics? Please mention some good resources to learn from.
Shouldn't normality testing be done before performing t-tests?
(Otherwise, great video, thanks 👍🏻)
That is a good point, Valda. The distributions should be normal for the t-test, which can be checked informally by inspecting a histogram or a normal Q-Q plot, or more formally with a test like scipy.stats.shapiro(). If the sample is large enough though, say 50+, that may be adequate due to the normality of the sampling distribution via the central limit theorem, but I'm not sure there's a good hard-and-fast rule as to when things are "not normal enough." It is probably a good idea to also run a non-parametric test, like the Mann-Whitney test for independent samples or the Wilcoxon signed-rank test for paired samples, if normality is questionable.
@@DataDaft thanks for the response! When I run these types of statistical tests, I always do a normality test first (scipy.stats.shapiro). Based on the result of the normality test, I choose either a parametric t-test or a non-parametric test (like Wilcoxon or Mann-Whitney).
@@valda313 Thanks for the input! It is helpful to have knowledgeable viewers fill in gaps (or make me aware of errors). It helps everyone learn.
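The workflow described in this thread (check normality first, then choose a parametric or non-parametric test) can be sketched roughly like this; the data is simulated here just for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10, scale=2, size=40)  # simulated sample A
group_b = rng.normal(loc=11, scale=2, size=40)  # simulated sample B

# Formal normality check (Shapiro-Wilk); small p suggests non-normality
_, p_a = stats.shapiro(group_a)
_, p_b = stats.shapiro(group_b)

alpha = 0.05
if p_a > alpha and p_b > alpha:
    # Both samples look plausibly normal: independent-samples t-test
    stat, p = stats.ttest_ind(group_a, group_b)
else:
    # Normality questionable: non-parametric Mann-Whitney U test
    stat, p = stats.mannwhitneyu(group_a, group_b)

print(f"test statistic = {stat:.3f}, p-value = {p:.4f}")
```

For paired data you would swap in stats.ttest_rel() and stats.wilcoxon() in the same way.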
quality material!!
Do I need hypothesis testing in machine learning modeling? Or let's say, when should I do hypothesis testing on a dataset as a data scientist?
Hypothesis testing is a core statistical idea that plays a role in many other concepts in data science and machine learning. Basically, any time you want to investigate whether one sample of data differs from another (or from a population), hypothesis testing is something to consider. For example, it is at the core of A/B testing, which is used to choose between two different options, like which version of an ad or website attracts more clicks.
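A minimal sketch of the A/B testing idea mentioned above, using simulated click data (the click rates and names here are made up; for 0/1 click data a proportions test such as a chi-square test is a common alternative to the t-test):

```python
import numpy as np
from scipy import stats

# Hypothetical click data: 1 = clicked, 0 = did not click
rng = np.random.default_rng(42)
ad_a = rng.binomial(1, 0.10, size=1000)  # version A, ~10% click rate
ad_b = rng.binomial(1, 0.13, size=1000)  # version B, ~13% click rate

# Two-sample t-test: is the difference in click rates plausibly zero?
t_stat, p_val = stats.ttest_ind(ad_a, ad_b)
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
```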
Please, what if I have a different number of records in each testing group?
For instance, 2000 records for the control group and 2050 for the test group. Can I use the Python function:
t_stat, p_val = ss.ttest_ind(df_cnt.exp_rev, df_trt.exp_rev)?
I got this result:
T-score = 0.16434444604672976
# There is a 16% deviation from the H0 mean
# p-value = 0.8694662602367074
# p-value is > the significance level, i.e. 0.05
# Therefore I am rejecting H1: the treatment did not perform better than the control
Can I interpret it like this? Thank you very much in advance.
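For reference, ttest_ind handles unequal sample sizes fine, but two points in the interpretation above need care: the t-score is not a percent deviation, and p > 0.05 means "fail to reject H0," not "reject H1." A minimal sketch with simulated revenue data (group sizes matching the question; distributions are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=100, scale=15, size=2000)    # simulated control revenue
treatment = rng.normal(loc=100, scale=15, size=2050)  # simulated treatment revenue

# Unequal group sizes are allowed by ttest_ind
t_stat, p_val = stats.ttest_ind(control, treatment)

alpha = 0.05
if p_val < alpha:
    conclusion = "reject H0: the group means differ"
else:
    conclusion = "fail to reject H0: no evidence the means differ"
print(conclusion)
```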
Can I say that the p-value equals the false positive probability?
You are damn good. I'm loving studying with you.
Amazing!!!
Awesome
Why do you set the degrees of freedom to 49?
Did you get an answer?
df = n - 1...
The sample size is n = 50...
so 50 - 1 = 49.
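The df = n - 1 rule above, plus the critical value it implies, can be checked directly with scipy (alpha = 0.05 two-tailed is an assumption for the example):

```python
from scipy import stats

n = 50
df = n - 1  # a one-sample t-test spends one degree of freedom estimating the mean

# Two-tailed critical t value at alpha = 0.05
t_crit = stats.t.ppf(1 - 0.05 / 2, df=df)
print(df, round(t_crit, 3))  # df = 49, t_crit ≈ 2.01
```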