Dear M. Jbstatistics,
I'd been struggling with a problem I had given up hope of solving, and your resources helped me solve it.
I wanted to tell you:
A BIG THANK YOU
You're the boss! Big thanks
You are very welcome!
I just ended up here before my Statistics 1 final exam. I just wanted to thank you with all my heart. Thanks and greetings from Istanbul!
You are very welcome! I hope your exam went well!
It went really well, sir. Thanks a lot.
Thank you for the video - a fantastic explanation of pooled variance t-tests - this has really helped me!
There is no real advantage to doing a pooled-variance t test as a regression. You get exactly the same information using both methods. I made that video to illustrate 2 things:
1) As a brief intro to including categorical variables in regression, which can be very useful in a multiple regression setting.
2) To illustrate the relationship between the two methods, which may help students understand the model, the assumptions, and the proper interpretation of results.
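To make point 2 concrete, here is a minimal sketch with made-up data (not the data from the video): it computes the pooled-variance t statistic directly, then fits a least-squares line of y on a 0/1 group indicator and computes the t statistic for the slope. Under these assumptions the two numbers agree, since both rest on the same pooled error variance and the same n1 + n2 - 2 degrees of freedom.

```python
import math
from statistics import mean

# Hypothetical data for two independent groups (not from the video).
group0 = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9]
group1 = [14.2, 12.8, 15.1, 13.3, 12.9, 14.8, 13.7]

def pooled_t(y0, y1):
    """Pooled-variance two-sample t statistic for mean(y1) - mean(y0)."""
    n0, n1 = len(y0), len(y1)
    m0, m1 = mean(y0), mean(y1)
    ss0 = sum((y - m0) ** 2 for y in y0)
    ss1 = sum((y - m1) ** 2 for y in y1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)  # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

def slope_t(x, y):
    """t statistic for the slope in a simple least-squares regression of y on x."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, n - 2 df
    return b1 / (s / math.sqrt(sxx))

# Indicator coding: X = 0 for group0, X = 1 for group1.
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

t_pooled = pooled_t(group0, group1)
t_reg = slope_t(x, y)
print(f"pooled t = {t_pooled:.6f}, regression slope t = {t_reg:.6f}")
```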
There is no need to do it as a regression. My "pooled-variance t test as a regression" video is part of my regression playlist, and it's a topic best discussed in regression. In regression, it's part of the bigger picture of including categorical explanatory variables in regression analysis. We may wish to include both quantitative and categorical variables as explanatory variables in a regression, and we include categorical variables by declaring appropriate indicator variables.
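As a rough illustration of "declaring appropriate indicator variables", the sketch below turns a hypothetical three-level categorical variable into 0/1 columns, using one level as the reference (coded as all zeros). The function name `indicator_columns` and the data are illustrative assumptions, not anything from the video; with an intercept in the model, a k-level variable needs k - 1 such columns.

```python
def indicator_columns(labels, reference):
    """Return one 0/1 indicator column per non-reference level.

    The reference level is coded as all zeros, so a model with an
    intercept uses k - 1 indicators for a variable with k levels.
    """
    levels = sorted(set(labels))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    return {
        level: [1 if lab == level else 0 for lab in labels]
        for level in levels
        if level != reference
    }

# Hypothetical categorical explanatory variable with three levels.
diet = ["A", "B", "C", "A", "C", "B", "A"]
cols = indicator_columns(diet, reference="A")
for name, col in cols.items():
    print(f"X_{name}: {col}")
```

With two groups this reduces to the single X = 0 / X = 1 indicator used in the pooled-variance setting.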
This is so useful. Thank you!
Although one can do a pooled-variance t test as a regression (and I have a video outlining that), here I've simply expressed this as an ordinary test of a difference in means between two groups. The number of groups has nothing to do with the sample size.
If we did this as a regression, and coded our groups as X=0 and X=1, then values of X between 0 and 1 would be completely meaningless. The explanatory variable is categorical, and it does not make sense to discuss values between 0 and 1.
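A small sketch of why only X = 0 and X = 1 carry meaning: with made-up data and indicator coding, the least-squares intercept equals the mean of the X = 0 group, and the slope equals the difference in group means. So the fitted line passes through the two group means, and points in between correspond to no group at all.

```python
from statistics import mean

# Hypothetical data (not from the video).
y0 = [5.1, 4.7, 5.6, 5.0]        # group coded X = 0
y1 = [6.3, 6.9, 6.0, 6.6, 6.2]   # group coded X = 1

x = [0] * len(y0) + [1] * len(y1)
y = y0 + y1

# Ordinary least squares with a single explanatory variable.
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

print(f"intercept b0 = {b0:.4f}  (mean of X = 0 group: {mean(y0):.4f})")
print(f"slope     b1 = {b1:.4f}  (difference in means: {mean(y1) - mean(y0):.4f})")
# The line is only meaningful at X = 0 and X = 1; X = 0.5 is not a group.
```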
Since the sample size is large here, why don't we use a z statistic instead of a t statistic?
When sampling from normally distributed populations, the difference between z tests and t tests depends on whether the population standard deviations are known or not. Since the population standard deviations are pretty much never known, in real world situations tests like these are done as t tests. Sure, the t distribution gets very close to the standard normal distribution as the degrees of freedom increase, but that doesn't mean we should just jump to the approximation when it works reasonably well. If it's a t test, it's a t test, regardless of sample size.
Your instructor (and other sources, including some texts) may say something different, so if you are in a stats course, listen to your instructor to find out what you should do in your particular course.
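To see how close the t critical values get to the z value, here is a stdlib-only sketch. It is an assumption-laden stand-in for proper software (it integrates the t density numerically with Simpson's rule and finds quantiles by bisection) rather than how any particular package does it. The 117 df case matches the 1.980 used in the video; the gap to 1.96 is small but not zero.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry and Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Solve t_cdf(q, df) = p for q by bisection (intended for 0.5 < p < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = 1.959964  # standard normal 0.975 quantile
for df in (10, 30, 117, 1000):
    print(f"df = {df:5d}: t* = {t_quantile(0.975, df):.4f}   (z* = {z:.4f})")
```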
@jbstatistics Thank you so much for your reply!
OK, the whole comment was meant for the video "The Pooled Variance t Test as a Regression". Sorry for the trouble; I've downloaded the videos since it's easier to pause and go back, and that's why the mix-up. Is it possible to give me a better understanding of why we would use the regression line, since, as you mentioned yourself, the points between 0 and 1 don't express anything? Besides what you mentioned in that video, is there any other practical meaning to plotting the regression line? Sorry for all the questions.
Hi, thanks for this video.
A quick question:
1) Suppose we are working manually, getting the t critical value from a t table instead of using Excel or other software.
df = 117,
but the highest df in the table is 1000,
the second highest df is 120,
and the third highest df is 60.
Which one should I choose, or how should I choose,
to get the t critical value?
2) I understood how to get the confidence interval.
Can I say as an analyst, "95% of sample means of a particular population lie between the lower and upper limits"?
And for the remaining 5%, is the analyst taking the risk of saying that "5% of sample means are not from the same population"?
Please clarify.
1) It's best to use software. If you're using a table, then either take the conservative approach (go down to 60 DF), linearly interpolate, or say "meh, 120 is close enough to 117 for me". There are pros and cons to each one of those, and an instructor might recommend any of them. Relying on software is best.
2) No, we can't say that. Keep in mind that interpretations of confidence intervals always relate to the value of *parameters* and never sample statistics.
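One way to see that the "95%" is about capturing a *parameter* is a quick simulation: draw many samples from a population whose mean we know, build an interval from each sample, and count how often the interval contains that mean. For simplicity this sketch assumes a one-sample t interval from a normal population with made-up values (mu = 50, sigma = 8, n = 20, and t* = 2.093 for 19 df); the same logic applies to the two-sample interval in the video.

```python
import math
import random
from statistics import mean, stdev

# Assumed setup for illustration only.
MU, SIGMA, N = 50.0, 8.0, 20
T_STAR = 2.093  # 0.975 quantile of the t distribution with N - 1 = 19 df

def ci_covers_mu(rng):
    """Draw one sample, build a 95% t interval, and report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    m = mean(sample)
    half_width = T_STAR * stdev(sample) / math.sqrt(N)
    return m - half_width <= MU <= m + half_width

rng = random.Random(2024)
reps = 10_000
coverage = sum(ci_covers_mu(rng) for _ in range(reps)) / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The intervals move from sample to sample while mu stays fixed; roughly 95% of them capture it.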
Thanks for the feedback. One more question regarding this: practically, what does the regression line accomplish in this situation? You found the same results as with your hypothesis test, so why would I want to make a regression line with a categorical variable in the first place? Wasn't the whole point of fitting one so that I can estimate values on the line for hypothetical x values?
One of the assumptions here is "normally distributed populations". I don't get what that means exactly. If we have an unknown population distribution (which may not be normal) and take a large enough sample, then by the CLT the sampling distribution of the sample mean will come close to a normal distribution as the sample size increases. So the sampling distribution is approximately normal, but the population distribution is not. My doubt is: can I actually apply everything said in this video to find the confidence interval? Someone please help.
OK, I think overall I understand what you did. I have the following questions: basically, you coded X numerically and then fit your regression line. But is this regression line trustworthy? It's built on only two x values, so my sample seems really small. Or, because these x values come from a bigger initial sample, we don't mind? Second question: what do the other x values between 0 and 1 express in this example when you fit your regression line?
How did you find this p-value in R? Is it not 1-pt(4.267,117), then doubled?
Yes, that'll do it. (It'll also be part of the t.test output, if that's being used.)
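For anyone without R, here is a stdlib-only Python sketch of the same calculation, 2 * (1 - pt(4.267, 117)). It assumes numerical integration of the t density (Simpson's rule) is accurate enough here, which it is for moderate t values; it is not how R computes `pt` internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_obs, df, steps=2000):
    """Two-sided p-value 2 * P(T >= |t_obs|), like 2 * (1 - pt(abs(t_obs), df)) in R."""
    a = abs(t_obs)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t_obs|) by Simpson's rule
    return 2 * (0.5 - central)

p = two_sided_p(4.267, 117)
print(f"two-sided p-value: {p:.2e}")
```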
About the t value: I think you are looking it up in a t table, but why is the t value 1.980 when the 95% column of the t table gives 1.984 for df = 100 and 1.96 for infinite df? I was going to use 1.96, but can someone tell me why it is 1.980?
I used software to get the actual value that we need, the one corresponding to 117 degrees of freedom. If you drop down to 100 DF in a table, you'll end up with a slightly larger value.
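If software isn't available, the table options mentioned earlier in the thread (use the 60 df row, use the nearest 120 df row, or linearly interpolate) can be compared directly. This sketch uses the two-sided 95% values printed in common t tables (2.000 for 60 df, 1.984 for 100 df, 1.980 for 120 df); the helper name `interpolate_critical` is just for illustration.

```python
def interpolate_critical(df, df_lo, t_lo, df_hi, t_hi):
    """Linearly interpolate a t critical value between two t-table rows."""
    frac = (df - df_lo) / (df_hi - df_lo)
    return t_lo + frac * (t_hi - t_lo)

# Two-sided 95% (upper-tail 0.025) values from a standard t table.
table = {60: 2.000, 100: 1.984, 120: 1.980}

print(f"conservative (60 df row): {table[60]:.3f}")
print(f"nearest row (120 df):     {table[120]:.3f}")
print(f"interpolated 100 to 120:  {interpolate_critical(117, 100, table[100], 120, table[120]):.4f}")
print(f"interpolated 60 to 120:   {interpolate_critical(117, 60, table[60], 120, table[120]):.4f}")
```

All of these land close to the software value of about 1.980 for 117 df, while 1.96 (the infinite-df row) is slightly too small.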
@jbstatistics Ohhh, OK, OK. Thank you very much for the fast response!
As mentioned, this is equivalent to regressing the outcome on a dummy variable. But in a regression context, we also assume equal variances. Correct?
Also, is there an advantage to conducting a t-test in this fashion? I would think it is because you can invoke the Welch procedure. What least-squares regression methods are available when group variances are unequal?
Thank you for any insight. I hope my questions make sense!
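For reference, the Welch procedure mentioned above drops the equal-variance assumption: it uses separate variance estimates in the standard error and the Welch–Satterthwaite approximation for the degrees of freedom. This is a minimal sketch on made-up data, not a full test (no p-value), and it does not address the regression-side question.

```python
import math
from statistics import mean, variance

def welch_t(y1, y2):
    """Welch t statistic and Welch–Satterthwaite degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1) / n1, variance(y2) / n2  # squared standard errors
    t = (mean(y1) - mean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with visibly different spreads.
a = [20.1, 22.4, 19.8, 21.0, 20.6, 21.9]
b = [25.3, 18.2, 30.1, 22.7, 27.9, 16.4, 28.8, 24.0]
t, df = welch_t(a, b)
print(f"Welch t = {t:.3f} on {df:.1f} df (pooled test would use {len(a) + len(b) - 2} df)")
```

The Welch df always falls between min(n1, n2) - 1 and n1 + n2 - 2, and it usually isn't a whole number.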
Is this a paired t test?
No. I work through an example of a paired t test here: ua-cam.com/video/upc4zN_-YFM/v-deo.html
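For contrast with the independent-samples test in this video: a paired t test reduces each pair to a single difference and then runs a one-sample t test on those differences, with n - 1 degrees of freedom. The before/after numbers below are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the within-pair differences."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

# Hypothetical measurements on the same 5 subjects before and after a treatment.
before = [140, 152, 138, 145, 150]
after = [135, 147, 136, 139, 146]
t, df = paired_t(before, after)
print(f"paired t = {t:.3f} on {df} df")
```

Because there is only one set of differences, no pooling of two variances is involved.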
I am confused by these terms. There is a t test. For two samples, the t test has subtopics: paired and unpaired. Does the paired t test then have subtopics: pooled and non-pooled?
How the fuck did you calculate s1 and s2? That's all I am looking for.
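If you have the raw data for each group, s1 and s2 are the ordinary sample standard deviations (dividing by n - 1), and the pooled standard deviation combines them weighted by each group's degrees of freedom. This sketch uses made-up numbers, since the thread doesn't show the raw data behind the video's values.

```python
import math
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor, written out by hand."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pooled_sd(y1, y2):
    """Pooled standard deviation: variances weighted by each group's df."""
    n1, n2 = len(y1), len(y2)
    s1, s2 = sample_sd(y1), sample_sd(y2)
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

# Hypothetical raw data for two groups.
group1 = [8.2, 7.9, 9.1, 8.6, 7.5]
group2 = [6.8, 7.4, 6.1, 7.0, 6.6, 7.7]
print(f"s1 = {sample_sd(group1):.4f}, s2 = {sample_sd(group2):.4f}, sp = {pooled_sd(group1, group2):.4f}")
print(f"check with statistics.stdev: {stdev(group1):.4f}, {stdev(group2):.4f}")
```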