We got two examples of how to use ancova here, but no real explanation of what ancova actually *does*. :(
An ancova is a regression where you try to fit multiple lines at once. The covariate (e.g. the baby's age) is your X variable, and the Y variable is your response as usual (baby weight gain). In the simplest case, each group gets its own line [y=bx+a] with the same *slope* (b), but different *intercepts* (a1, a2, etc.); in this case, one intercept for each baby formula type. Lines that have the same slope but differ in intercept will appear parallel, but at different "heights" on the plot: this would mean, in this case, that age has the same overall effect on weight gain, but one formula gives a higher "baseline" weight gain at any given time point than the other.
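A pure-Python sketch of that "parallel lines" fit (the data are invented for illustration, not from the video): the shared slope is the pooled within-group slope, and each formula then gets its own intercept.

```python
# Invented (age_in_days, weight_gain, formula) data for illustration only.
data = [
    (10, 1.2, "A"), (20, 2.1, "A"), (30, 3.2, "A"), (40, 4.0, "A"),
    (10, 1.7, "B"), (20, 2.6, "B"), (30, 3.6, "B"), (40, 4.6, "B"),
]

groups = {}
for x, y, g in data:
    groups.setdefault(g, []).append((x, y))

# Shared slope b = pooled within-group Sxy / pooled within-group Sxx,
# where each group is centred on its own means.
num = den = 0.0
means = {}
for g, pts in groups.items():
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    means[g] = (xbar, ybar)
    num += sum((x - xbar) * (y - ybar) for x, y in pts)
    den += sum((x - xbar) ** 2 for x, _ in pts)
b = num / den  # one slope for all groups

# Each group's own intercept: a_g = ybar_g - b * xbar_g.
intercepts = {g: ybar - b * xbar for g, (xbar, ybar) in means.items()}
```

With these made-up numbers the two lines come out parallel (same `b`) with formula B sitting higher at every age, which is exactly the intercept difference described above.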
So the p value for your covariate answers the question "is there a significant effect, i.e. a nonzero slope, of X (age) on Y (weight gain)?", while the p value for your grouping variable answers the question "is there a significant difference in intercept between the regression line for one type of formula and the regression line for the other?" In this case, there was a significant difference - and organizing the data along an X variable (age) enabled us to detect it, whereas when age was not considered, the data that would have fallen neatly around each regression line was smushed together around a single mean per group, obscuring the effect of formula with a whole lot of noise. This is why the p value changed when age was added.
If an interaction term is included, it tests for a difference in *slope* between the groups. So an age-by-formula interaction might mean, for example, that weight gain is similar shortly after birth, but it makes a difference which formula you use later on in the baby's life. This would appear in the regression plot as two lines (representing the formula types) that are close together near X=0, but move apart farther to the right on the X-axis (as age increases) because one line is steeper than the other.
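A sketch of what that interaction looks like numerically (again with invented data): allowing an interaction is equivalent to letting each formula have its own slope, so fitting each group's line separately shows the divergence.

```python
# Invented data: both lines start near y = 1 at x = 0, then diverge with age.
def fit_line(pts):
    """Ordinary least squares for one group: returns (slope, intercept)."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - xbar) * (y - ybar) for x, y in pts) / \
        sum((x - xbar) ** 2 for x in xs)
    return slope, ybar - slope * xbar

formula_a = [(0, 1.0), (10, 2.0), (20, 3.0)]
formula_b = [(0, 1.1), (10, 3.1), (20, 5.1)]

slope_a, int_a = fit_line(formula_a)
slope_b, int_b = fit_line(formula_b)
# Similar intercepts (close near x = 0) but different slopes:
# that slope difference is what the interaction term tests.
```

Here the intercepts are nearly equal (1.0 vs 1.1) while the slopes differ (0.1 vs 0.2), so the gap between the formulas grows as age increases.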
Life Happens Feels bad lol!
Thank you for the explanation!
Do you also understand repeated measures ANOVA and the example of the effects of music on running? Does she mean that we should normalize/standardize the values of running time with respect to each individual?
If only I were addicted to stats in the same way
So much Tetris content from the Green brothers this week! Love it!
2:34 i see this every time i recall sums of square variation from mean. so helpful. 👻🌩💜
Are you going to cover multivariate analysis? I’d love to watch it 😍, as much as I love the way you explain every topic :)
this is a gr8 ep. talk about application of moving parts finally. perfect. it is not just memorize these buttons in R
Great content and timing! Have an RPM assignment due tomorrow, and it's always good to have a refresher
Didn't expect to see ANCOVA in a stats intro series! Now I am expecting the multivariate analysis...
I'm just saying, y'all should make a music theory series. :)
My test is tomorrow morning!
Good luck bro! :)
Boom! Tetris for Adriene!
Is it a coincidence or was it planned for this to come out the same week as the Classic Tetris World Championship? :D
B O O M ! T E T R I S 4 J E F F
Will you cover the various means comparisons methods?.. Bonferroni, Tukey, Sidak, Scheffe, Fisher, Holm-Bonferroni, and Holm-Sidak...
8:40 why isn't that done through a simple linear regression?
How did the p-value change before and after you added "age in days" variable in 6:00 ?
The F-test in ANOVA is, in its simplest form, the ratio of explained variance to unexplained variance. If we omit age, the denominator (unexplained variance) is large; it becomes smaller when age is added to the model, because age soaks up variance that was previously unexplained. Hence the value of the F statistic is higher and more likely to exceed the critical value, and the p-value - the probability of a result at least this extreme if the null were true - is smaller in the second model.
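A toy numeric version of that argument (all sums of squares invented, and degrees of freedom omitted to keep the ratio bare): the variance explained by formula stays the same, but adding age shrinks the error term, so the F ratio grows.

```python
# All numbers are made up for illustration; dfs are left out of the ratio.
ss_formula = 30.0              # variance explained by the grouping variable
ss_error_without_age = 120.0   # unexplained variance, age omitted
ss_age = 90.0                  # variance the covariate soaks up
ss_error_with_age = ss_error_without_age - ss_age

# F is (roughly) explained / unexplained:
f_without = ss_formula / ss_error_without_age
f_with = ss_formula / ss_error_with_age
# The numerator is unchanged, but the shrunken denominator makes
# f_with four times larger here, hence a smaller p value.
```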
Does anyone understand repeated measures ANOVA and the example of the effects of music on running? Does she mean that we should normalize/standardize the values of running time with respect to each individual?
Pretty much yes, using a method known as random effects. It's the same logic that you would apply, for example, if you've got a sample of twenty lab mice from five different mothers, and you want to account for the fact that mice sharing a parent might be more similar on average, meaning they don't quite constitute twenty independent measurements.
In this case, instead of grouping mice by mother, you group running time measurements by person. We say that you "include person in the model as a random effect".
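The core intuition can be sketched as centring each runner's times on their own mean (a real repeated-measures ANOVA or random-effects model does more than this, and all the run times below are invented):

```python
# Invented run times (seconds) per person, per condition.
runs = {
    "ann":  {"music": 25.0, "silence": 27.0},
    "ben":  {"music": 31.0, "silence": 33.5},
    "cara": {"music": 22.0, "silence": 23.0},
}

# Subtract each person's own mean, so slow and fast runners
# become comparable and only the within-person effect remains.
centred = {}
for person, times in runs.items():
    mean = sum(times.values()) / len(times)
    centred[person] = {cond: t - mean for cond, t in times.items()}
```

After centring, every music run is below that person's average and every silent run above it, even though Ben's slowest time is faster than no one's - the between-person noise is gone.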
1) Why bother with an ANCOVA when you can run a regression with both variables? Redhead / not redhead is binary, so it will work just fine in a regression model. If you wanted to expand it to include redheads compared to other hair colors, we could just dummy code each category and drop one from the model as the reference variable.
2) What was up with the ~ in the error for the RMA after accounting for the base run time? Does ~ represent an approximate value, even though your decimal was wicked long? At first glance I thought it was a - (negative) instead of ~, but that wouldn't make sense. So...?
A regression model with two explanatory variables, one continuous and one categorical or binary, is exactly what an ancova is. It works the same way whether the categorical variable has two or more possible values, and no manual dummy coding is required - the software builds the dummy variables for you under the hood.
No idea what's up with that tilde. I think it's either a mistake, or it's supposed to convey that it's a mixed model (so the error term is calculated in a somewhat different way).
Ok then, if it is the same, what's the advantage of the ANCOVA option if we consider the outputs provided by various software? I can run a regression and get the F results table, but I get the added benefit of the regression coefficients table and all the fun stuff that comes with it, such as residual diagnostics. Sounds like running an ANCOVA would give a light touch analysis rather than taking a deeper look.
Mathematically, the same thing is happening under the hood. If you've got a piece of statistical software that gives you different output when picking the "ancova" option and the "regression" option, that's because it's being selective about what part of the results to show you. But even if you just picked "regression", entered one categorical and one continuous explanatory variable... you just ran an ancova.
An ancova isn't *definitionally* more reductive than a regression analysis. It's just another name for this particular case of a linear model, much like the t-test is really just a linear model with a single binary x-variable. Case in point: if you're using R, the same function ("lm") can be used to run any of them, short of a GLM or a mixed-effects model.
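That t-test equivalence is easy to check by hand (invented data): regress y on a 0/1 group indicator, and the slope comes out exactly equal to the difference in group means, which is the quantity a two-sample t-test examines.

```python
# Invented measurements for two groups, coded x = 0 and x = 1.
ys0 = [4.1, 3.9, 4.0]   # group coded 0
ys1 = [5.0, 5.2, 5.1]   # group coded 1
xs = [0] * len(ys0) + [1] * len(ys1)
ys = ys0 + ys1

# Ordinary least squares slope on the binary predictor.
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)

mean_diff = sum(ys1) / len(ys1) - sum(ys0) / len(ys0)
# slope == mean_diff: the "regression" and the "t-test" are
# looking at the very same number.
```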
Life Happens Thanks for the insights!
@@RUJedi Happy to help. :D
What decades long argument? Nobody has ever argued that the best Tetris piece is anything other than the straight
*laughs in T-shape*
The 5 people who disliked this content
*FLAT EARTHERS* 😒
Glad to hear Tetris has educational value.
+Flaming Basketball Club He doesn't watch any of the videos before making a general statement. It's hilarious. 😂
I thought that you were talking about female models
I'm a bit late to this crash course, but as someone who works with Excel (statistics or otherwise), the tables in this video look weird to me. I recommend right-aligning all numbers, adding thousands separators, and aligning the decimal points of numbers within the same column. It will greatly improve the readability.
There's a terrible (60hz electrical?) background hum in this video :(
Shouldn't it be Anocova?
Y
Well done!
When you speak this fast I feel like I'm watching an episode of "Gilmore Girls" and I'm not laughing.
A Spoiler Alert for which Tetromino is best would have been nice. Now the whole game is ruined for me!
First
3rd comment