ANCOVA - How to do it in R, interpretation

  • Published 27 Jul 2024
  • In this lecture, I explain the meaning of ANCOVA and interaction in ANCOVA.
    The diagnostic plots are discussed in greater detail here: • Lecture: Model Diagnos...
    When you use multiple linear regression (including ANCOVA), the question arises of which variables and/or interactions to include. For a more definitive discussion of model selection, see: • Lecture: Model Selection
    If you use multiple linear regression (including ANCOVA) for hypothesis testing (i.e. do my variables have a significant effect?), you should use a form of multiple testing correction: • Mini Lecture: Multiple...
    0:00 Introduction
    1:28 Recap of simple linear regression
    4:29 Simple linear regression in R
    7:55 Multiple linear regression
    10:57 Dummy variables (ANCOVA)
    14:42 ANCOVA in R
    18:00 ANCOVA in R: Interpreting the output
    19:23 ANCOVA in R: Changing the intercept (relevel)
    22:07 Interaction
    24:32 Interaction in R
    25:09 Interaction in R: Interpreting the output
    28:08 Comparing a model with and without interaction
    31:30 Summary: Interaction
    32:31 Summary: All models discussed so far
    33:24 ANCOVA can be more powerful than ANOVA

COMMENTS • 29

  • @luiseramossoto4061 2 years ago +11

    You should be proud of this lecture. I have looked for weeks, and this is the first video that feels complete and clear.

    • @Frans_Rodenburg 2 years ago +5

      That's so nice of you to say. I'm glad you found it useful!

    • @gordonlim2322 7 months ago

      Very true. I watched several other videos that only covered bits of what this video covers. This truly is complete. I do wish, however, that you had covered the other statistics in the output, e.g. what the standard error means.

  • @genevieveemefaasare8352 2 years ago +4

    Thanks a lot for this masterpiece.

  • @shantanutamuly6932 11 months ago +1

    Excellent explanation

  • @yoyohu6522 1 year ago +1

    Thanks! super clear!

  • @BapakBuayah 2 years ago +2

    This video saved my semester final! 😭

  • @lorenzoplaserrano8734 5 months ago

    This is amazing ❤

  • @Frans_Rodenburg 3 years ago +1

    Chapters have now been included.

  • @marziehsafari2440 2 years ago +3

    That was great! I really liked it. However, I am wondering: in the case of a multifactorial mixed model, how could you apply ANCOVA? When I try with "lmer", including my random effects in the formula along with the covariate I am looking at, it gives me this error: "boundary (singular) fit: see ?isSingular"

    • @Frans_Rodenburg 2 years ago +2

      Hi Marzieh, a singularity means that one of the estimated variance components was exactly 0, meaning it could not be estimated. This usually happens because (A) the variance component is very small, or (B) you do not have enough data to estimate it. The next step would be to simplify the random effects structure of your model (e.g., switch to a random intercept instead of a slope, get rid of nested random effects). When all else fails, you can resort to a fixed effects model with a dummy variable for the random effect. I explain this to some extent here: ua-cam.com/video/Z1sA5ZGzVJI/v-deo.html
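      A minimal base-R sketch of the steps described in this reply. The lmer lines are shown only as comments, since lme4 is a separate package; the fixed-effects fallback is illustrated with the built-in iris data, with Species standing in for a hypothetical grouping factor:

      ```r
      # Simplifying a singular random-effects structure, sketched in lme4 syntax:
      #   lmer(y ~ x + (x | group), data = d)  # random intercept and slope (may be singular)
      #   lmer(y ~ x + (1 | group), data = d)  # simpler: random intercept only
      # Last resort: a fixed-effects model with a dummy variable for the grouping
      # factor, which needs only base R:
      m <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
      coef(m)  # one intercept offset per level of the former random effect
      ```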

  • @violetaroggenbauch157 1 year ago +2

    Thanks for the video! What is exactly the difference between a multiple linear regression and an ANCOVA? Is it only the dummy coding?

    • @Frans_Rodenburg 1 year ago +1

      Yes, that's right! ANCOVA isn't a different kind of model than multiple linear regression. It is just a name commonly found in the literature for a multiple linear regression involving both numeric and categorical explanatory variables.
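      A quick way to see the dummy coding R applies behind the scenes, using the built-in iris data:

      ```r
      # model.matrix() shows the design matrix that lm() actually fits:
      X <- model.matrix(~ Petal.Length + Species, data = iris)
      colnames(X)
      # "(Intercept)" "Petal.Length" "Speciesversicolor" "Speciesvirginica"
      # The factor Species has been expanded into two 0/1 dummy columns.
      ```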

  • @EduardoRojasacubens 10 months ago +1

    Thanks for the great explanation! Just one quick question: the setosa variant has the smallest sepal lengths of them all (looking at a boxplot or scatterplot, it is obvious), so shouldn't the interpretation of the estimates at 18:00 be the other way around? That is: I. versicolor has 1.6 mm larger sepals than I. setosa on average; I. virginica has 2.1 mm larger sepals than I. setosa on average.

    • @Frans_Rodenburg 10 months ago +2

      Hi Eduardo, nice question!
      Setosa definitely has lower sepal length on average, if we do not look at petal length. But an ANCOVA does not compare group means, it compares regression lines. And if you run the following, you can see that the line estimated for setosa has a higher starting point:
      plot(Sepal.Length ~ Petal.Length, iris, col = Species)
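      The fitted coefficients make the same point numerically, again with the built-in iris data:

      ```r
      # Fit the ANCOVA model and inspect the estimated coefficients:
      fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
      coef(fit)
      # The Speciesversicolor and Speciesvirginica offsets are negative:
      # at equal petal length, both lines sit below the setosa line.
      ```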

    • @EduardoRojasacubens 10 months ago +1

      @@Frans_Rodenburg I get it! The complete picture is: when petal length is 0, versicolor is 1.6 mm smaller than setosa. Actually, you explained it at 12:20. Sorry about that. Again, thanks for the videos.

    • @Frans_Rodenburg 10 months ago

      No problem, I'm glad it was useful!

  • @someonedoe9591 1 year ago +2

    What exactly is the difference between the calls:
    "lm(Sepal.Length ~ Petal.Length + Species, data = iris)" and
    "lm(Sepal.Length ~ Petal.Length + Petal.Length:Species, data = iris)"?
    Honestly I find this confusing so I'm trying my best to figure this out.
    My interpretation is I want to perform an ANCOVA of Sepal Length by Petal Length, but accounting for how the effect is different depending on the species. Is that not an interaction, as the effect of Petal.Length depends on the Species?
    Where "+ Species" adds species as an effect, effectively creating a coefficient for petal length and species separately, but assuming the coefficient for sepal ~ petal is the same across species, with a separate coefficient accounting for the difference between species.
    "Petal.Length:Species" renders the coefficients between Petal length and sepal length different depending on species.
    Wouldn't the former be more accurate?
    I suppose it begs the question when should you add as a categorical variable vs as an interaction?

    • @Frans_Rodenburg 1 year ago +1

      Your interpretation of interaction in a regression model is correct.
      The first model gives you three different starting points for each species, whereas the second model gives you one starting point for every species, but three different slopes.
      I have rarely used the latter; a more realistic way to model interaction (differences in slopes) is to allow both the intercepts and the slopes to vary.
      The shorthand notation for this is: lm(Sepal.Length ~ Petal.Length * Species, iris)
      And written in full it would be: lm(Sepal.Length ~ Petal.Length + Species + Petal.Length:Species, iris)
      Adding an interaction or not depends on your prior belief about the process as a domain expert (meaning you base the decision on your knowledge of biology for example, instead of statistics), or you can try fitting a model with and without an interaction to see which fits best. I explain that to some extent here: ua-cam.com/video/n2kWXqR5nnw/v-deo.html
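      The with/without comparison mentioned here can be done with an F-test via anova(), using the built-in iris data:

      ```r
      # Common slope across species vs. a separate slope per species:
      m0 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)  # no interaction
      m1 <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)  # with interaction
      anova(m0, m1)  # F-test of whether the separate slopes improve the fit
      ```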

    • @someonedoe9591 1 year ago

      @@Frans_Rodenburg Ooooh, I see. That makes sense. I'm really wondering why my university courses never clarified what the different RStudio call combinations did.

    • @Taricus 1 year ago +1

      You want to be sure that if you include an interaction term, you use both the variables in the model. For instance,
      lm(Sepal.Length ~ Petal.Length + Petal.Length:Species, data = iris)
      should be:
      lm(Sepal.Length ~ Petal.Length + Species + Petal.Length:Species, data = iris)
      or the shorthand:
      lm(Sepal.Length ~ Petal.Length * Species, data = iris)

    • @Taricus 1 year ago

      The reason is that it doesn't really make sense to include an interaction term for a variable without including the variable it is interacting with. You want to have both variables and the interaction between them.

    • @Taricus 1 year ago

      Also, another quick shorthand if you have a lot of variables that you want interaction terms for... say, it was between addictions to different drugs:
      lm(patient_retention ~ (opioid + alcohol + stimulants)^2, data = addiction)
      ^^^ This would be the same as:
      lm(patient_retention ~ opioid + alcohol + stimulants + opioid:alcohol + opioid:stimulants + alcohol:stimulants, data = addiction)
      The (........)^2 just means you want interactions between each pair of variables in the parentheses.
      If you did this one:
      lm(patient_retention ~ (opioid + alcohol + stimulants)^3, data = addiction)
      It would also add opioid:alcohol:stimulants, because you said you wanted all interactions of up to 3 variables.
      It's just a quick way to not have to type out all the interaction terms ^~^
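      That expansion can be checked directly in R; since the addiction data above is hypothetical, the built-in iris data is used here instead:

      ```r
      # The ^2 shorthand and the fully written-out formula produce the same
      # design matrix columns:
      f1 <- Sepal.Length ~ (Sepal.Width + Petal.Length + Petal.Width)^2
      f2 <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
        Sepal.Width:Petal.Length + Sepal.Width:Petal.Width + Petal.Length:Petal.Width
      setequal(colnames(model.matrix(f1, iris)), colnames(model.matrix(f2, iris)))
      ```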

  • @melissachisholm7107 3 years ago +2

    Great content. Too quiet!

    • @Frans_Rodenburg 3 years ago +1

      Hi Melissa, thank you for the feedback! I will try to increase the gain for the next video.

  • @someonedoe9591 1 year ago +3

    My god. So my new religion is sacrificing a goat every night to you for blessings of statistical education. My university lecturers are, ironically, incompetent at teaching a course on statistics and scientific communication.
    I just have one question. Is performing ANCOVA in RStudio literally the identical process to making a multiple linear regression?

    • @Frans_Rodenburg 1 year ago +3

      Thank you, but one goat every two nights should suffice.
      ANCOVA is indeed identical to multiple linear regression with dummy variables. In fact, so is ANOVA. You can try this yourself by fitting a model with both aov and lm and observing that you obtain identical estimates. (In fact, aov is just a wrapper that calls lm under the hood and returns an ANOVA table.) There are a couple of nice write-ups about that here: stats.stackexchange.com/q/175246/176202

    • @someonedoe9591 1 year ago +1

      @@Frans_Rodenburg Thank you Deus.