Multilevel models in R

  • Published 18 Jan 2025

COMMENTS • 37

  • @zhengzhang127
    @zhengzhang127 4 years ago

    Thanks for recapping all the modelling packages at the end. It helps a lot.

  • @gma7205
    @gma7205 10 months ago

    Amazingly well-explained, thanks! Please, make more videos. Nonlinear models, Bayesian... some extra content would be nice!

  • @ryanmann799
    @ryanmann799 3 years ago +1

    I hope you continue to make content. Excellent work!

    • @kasperwelbers
      @kasperwelbers 3 years ago +1

      Thanks! It's currently mainly just stuff that I happened to need for online teaching/workshops, but I'll definitely try to keep this up in general.

    • @jukazyzz
      @jukazyzz 3 years ago

      @kasperwelbers Indeed, great work. Hope you plan to cover glmer in detail in the future.

  • @zafarnasim9267
    @zafarnasim9267 1 year ago

    Great video, nicely explained

  • @harismuhammad3279
    @harismuhammad3279 2 years ago

    Thanks for the video, helpful for my Data science project

  • @hamza-fk20
    @hamza-fk20 2 years ago

    This is awesome!

  • @drdilsad1
    @drdilsad1 1 year ago

    Hello Kasper, thanks for this great video. Just wondering where I can find the document/chapter that contains all the code, I mean the document you copied the code from and pasted into R. Please let me know.

    • @kasperwelbers
      @kasperwelbers 1 year ago

      Hi @Dr Dilsad. Sorry, it seems I only included the link in the first video (about GLMs). More generally, we maintain some R tutorials that we regularly use in education on this GitHub page: github.com/ccs-amsterdam/r-course-material . The multilevel one is under frequentist statistics. There is a short version in the "Advanced statistics overview" that I think is the one from this video, and also a slightly more elaborate one in the "Multilevel models" tutorial.

  • @Maicolacola
    @Maicolacola 1 year ago

    Thanks for the video. Very clear explanations. I was wondering why you used days as a random effect slope, and didn’t add day:subject as an interaction term in your model?

    • @kasperwelbers
      @kasperwelbers 1 year ago +1

      Hi Michael. Good question. You could indeed fit something very similar to a multilevel model using dummies and interaction terms. Specifically, instead of random intercepts, we could have included fixed effects for every subject using dummy variables. And instead of random slopes, we could then have added interaction terms for these dummy variables with the 'days' variable. So more generally speaking, we could indeed have used fixed effects instead of random effects to model differences between subjects.
      There are some benefits to using random effects though. Aside from not having your model cluttered with many dummies and interactions, using fixed effects eats up degrees of freedom. For example, if you used dummies for subjects, you could not add something like gender to the model, because there are no df left at the subject level.
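
A minimal sketch of the trade-off described above, assuming the data in the video is the sleepstudy dataset that ships with lme4 (columns Reaction, Days, Subject). The fixed-effects version spends one coefficient on every subject dummy and one on every dummy-by-Days interaction, whereas the multilevel version estimates only two fixed effects plus variance components. The object names are illustrative, not taken from the video.

```r
library(lme4)  # also provides sleepstudy (assumed to be the data used in the video)

# Fixed-effects alternative: subject dummies plus their interactions with Days
m_fixed <- lm(Reaction ~ Days * Subject, data = sleepstudy)

# Multilevel version: random intercepts and random Days slopes per subject
m_random <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Count the estimated regression coefficients in each model
n_fixed <- length(coef(m_fixed))     # intercept + Days + subject dummies + interactions
n_random <- length(fixef(m_random))  # intercept + Days
c(fixed = n_fixed, random = n_random)
```

With the dummies in place, a subject-level predictor such as gender would be perfectly collinear with the subject dummies, which is the degrees-of-freedom problem mentioned above.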

    • @Maicolacola
      @Maicolacola 1 year ago

      @kasperwelbers Ty for the reply, that clears up some of my confusion. I'm new to multi-level models, but have experience with multiple linear regression. I think I'm a bit confused about when one would choose to use a covariate as a random effect versus an interaction term as a fixed effect, as well as all the other possibilities.
      I kind of wish there was a complex enough dummy dataset where each possible scenario could be displayed with figures. For example: 1) response ~ var_1; 2) response ~ var_1:var_2, 3) response ~ var_1 + (1 | var_2); etc. etc...

    • @kasperwelbers
      @kasperwelbers 1 year ago +1

      @Maicolacola A great way to learn about how these models work is actually to create your own dummy data. You can think of fitting models as trying to find the data generating process. So if you understand the model, you can generate data that will fit in a certain way. I should perhaps do a video on this, but here's a quick example:
      ## simulate data about the effect of doing homework on grade.
      ## students are nested in classes, which have different average grades, and
      ## a different effect (random slope) of doing homework.
      n = 10000 ## students
      groups = 100 ## classes
      homework = rnorm(n, mean=10, sd=5) ## simulate time spent on homework
      group = sample(1:groups, n, replace=TRUE) ## assign random groups
      ## generate the random parts of the model
      ri = rnorm(groups, mean=10, sd=3) ## random intercepts (group level)
      rs = rnorm(groups, mean=2, sd=1) ## random slopes (group level)
      e = rnorm(n, mean=0, sd=10) ## individual-level residual error
      ## simulate the grade. Note that we can use the 'group' integers as index of ri and rs
      grade = ri[group] + rs[group] * homework + e
      ## put the data together
      d = data.frame(homework, grade, group)
      ## Now run a random intercepts+slopes model to see if we can recover the parameters
      ## we plugged in above (mean intercept is 10, mean homework slope is 2)
      library(lme4)
      library(sjPlot)
      m = lmer(grade ~ homework + (1 + homework | group), data=d)
      tab_model(m)
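
To check the recovery without sjPlot, the estimates can also be read straight from the fitted model. This is a sketch that repeats the simulation above with a fixed seed (the seed value is an arbitrary assumption, added only for reproducibility) and uses lme4's fixef() and VarCorr().

```r
library(lme4)

set.seed(42)  # arbitrary seed, not part of the original example
n <- 10000    # students
groups <- 100 # classes
homework <- rnorm(n, mean = 10, sd = 5)
group <- sample(1:groups, n, replace = TRUE)

ri <- rnorm(groups, mean = 10, sd = 3)  # random intercepts (group level)
rs <- rnorm(groups, mean = 2, sd = 1)   # random slopes (group level)
e <- rnorm(n, mean = 0, sd = 10)        # individual-level residual error
grade <- ri[group] + rs[group] * homework + e
d <- data.frame(homework, grade, group)

m <- lmer(grade ~ homework + (1 + homework | group), data = d)

# Fixed effects: should be close to the simulated means (10 and 2)
est <- fixef(m)
est

# Random-effect standard deviations: compare with sd = 3, 1 and 10 above
as.data.frame(VarCorr(m))
```

The estimates will not match the plugged-in values exactly, because the 100 group-level draws are themselves a sample.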

    • @Maicolacola
      @Maicolacola 1 year ago

      @kasperwelbers Great idea about simulating with dummy data. I'll give that a try. Cheers!

  • @MrJegerjeg
    @MrJegerjeg 1 year ago

    What if you have combinations of two different groups. For example, you measure blood pressure from volunteers after drinking a certain number of units of alcohol. You do that in two different locations. So you want to fit a line per individual, but you also want to control for the location effect. Right?

    • @kasperwelbers
      @kasperwelbers 1 year ago +1

      You can certainly have multiple groups. First, you could have groups nested in groups. If you perform the same experiment in many countries across the world, your units would be observations nested in people (group 1) nested in countries (group 2). Second, you could have cross-nested (or cross-classified) groups. For example, say we want to study if the effect of more alcoholic beverages on blood pressure differs depending on the type of alcoholic beverage (beer, wine, etc.). In that case, each person could have observations for multiple beverages, and each beverage could have observations for multiple people.
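
As a rough illustration of the crossed case, here is a sketch on simulated data. All variable names (bp, units, person, beverage) and effect sizes are hypothetical, chosen only to mirror the blood pressure example. In lme4, crossed groups get separate random-effect terms, while nested groups are usually written as (1 | country/person), which expands to (1 | country) + (1 | country:person).

```r
library(lme4)

set.seed(1)  # arbitrary seed for reproducibility
n_people <- 30
n_beverages <- 6

# Crossed design: every person is observed with every beverage at 0-4 units
d <- expand.grid(person = factor(1:n_people),
                 beverage = factor(1:n_beverages),
                 units = 0:4)

person_eff <- rnorm(n_people, mean = 0, sd = 5)       # person-level intercept shifts
beverage_eff <- rnorm(n_beverages, mean = 0, sd = 3)  # beverage-level intercept shifts

d$bp <- 120 + 2 * d$units +
  person_eff[as.integer(d$person)] +
  beverage_eff[as.integer(d$beverage)] +
  rnorm(nrow(d), mean = 0, sd = 4)

# Crossed random intercepts: one term per grouping factor
m_crossed <- lmer(bp ~ units + (1 | person) + (1 | beverage), data = d)
slope_units <- fixef(m_crossed)[["units"]]
slope_units
```

To let the effect of units differ per beverage, as in the question above, the beverage term could become (1 + units | beverage), though six beverages is few for estimating a slope variance.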

    • @MrJegerjeg
      @MrJegerjeg 1 year ago

      @kasperwelbers I see, thanks. I can imagine that having all these nested and cross-nested groups can complicate the model and its interpretation quite a lot.

  • @ekaterinapronizius5955
    @ekaterinapronizius5955 3 years ago

    An amazing video!

  • @bobmany5051
    @bobmany5051 1 year ago

    Hello Kasper, I appreciate your great video. I have a question. Regarding your example data, what if there are two or more data points for each day for each person? Let's assume that you measure reaction time 4 times each day across participants. Do you need to average those data points and make one data point for each day? or do you use all data points?

    • @kasperwelbers
      @kasperwelbers 1 year ago

      Interesting question. We can actually add more groups to the model instead of aggregating, but it depends on your question. In the example, we used days as a continuous variable, because we wanted to test if there was a linear effect on reaction time. If you also want to consider the time of day as a continuous variable, then it indeed becomes awkward to combine the two.
      However, maybe your reason for the four measurements is just to get more data points, so you think of them as factors rather than continuous. While aggregating might be viable, you could also consider adding another level to your model, for whether the measurement was in the (1) morning, (2) afternoon, (3) evening, or (4) night. You could then have random intercepts, for instance to take into account that people might on average have slower reaction times in the evening due to their after-dinner dip (though note that with just 4 groups you might rather want to use fixed effects with dummy variables).
      Perhaps more generally, what you're interested in is multilevel models with more than one group level. This is possible and very common/powerful. Groups can then either be crossed or nested, for instance people living in cities.
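
A sketch of the fixed-effects variant suggested above, using simulated data with four measurements per subject per day. The column name TimeOfDay and all effect sizes are assumptions made for illustration; they do not come from the video.

```r
library(lme4)

set.seed(2)  # arbitrary seed for reproducibility
tod_levels <- c("morning", "afternoon", "evening", "night")

# 18 subjects x 10 days x 4 measurements per day, no aggregation
d <- expand.grid(Subject = factor(1:18),
                 Days = 0:9,
                 TimeOfDay = factor(tod_levels, levels = tod_levels))

subj_int <- rnorm(18, mean = 0, sd = 25)    # subject-specific intercept shifts
subj_slope <- rnorm(18, mean = 10, sd = 5)  # subject-specific Days slopes
tod_eff <- c(morning = 0, afternoon = 5, evening = 15, night = 10)

s <- as.integer(d$Subject)
d$Reaction <- 250 + subj_int[s] + subj_slope[s] * d$Days +
  unname(tod_eff[as.character(d$TimeOfDay)]) +
  rnorm(nrow(d), mean = 0, sd = 20)

# Time of day enters as dummy variables (only 4 categories), while subjects
# keep their random intercepts and random Days slopes
m_tod <- lmer(Reaction ~ Days + TimeOfDay + (1 + Days | Subject), data = d)
coefs <- fixef(m_tod)
coefs
```

All 720 observations are used, and the TimeOfDay coefficients absorb systematic differences between times of day.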

  • @OriginalJoseyWales
    @OriginalJoseyWales 1 month ago

    reaction time should increase with sleep deprivation, no?

  • @bignatesbookreviews
    @bignatesbookreviews 1 year ago

    god bless you

  • @tayebehsaghapour6505
    @tayebehsaghapour6505 2 years ago

    Hi Kasper, thanks so much for the very useful video on multilevel modelling. I am trying to run a multilevel nested logit model, just wondering if you are aware of any R packages that can be used for this modelling? most packages I know haven't gone beyond the multinomial logistic model; however, in nested logit, the outcome measure itself has a hierarchical structure which makes the model even more complicated. I appreciate your advice on this. Thanks

  • @kar2194
    @kar2194 2 years ago

    Hi Kasper, first thank you for the wonderful video! A question from me and I would really appreciate if you could help.
    What is the data type of Reactions, Days, and Subject? Are they num, num, and fct?
    I am actually confused about data types for multilevel modeling. My dataset has 4 columns, for example:
    Countries: Australia, Australia, Malaysia, Malaysia, ...
    Status: Developed, Developed, Developing, Developing
    Year: 2000, 2001, 2002, 2003, 2004
    Life expectancy: 87, 76, 69, 64
    Should I just convert Year to numeric, and having
    2000 as 0,
    2001 as 1,
    2002 as 2,
    2004 as 4 ?
    Many thanks mate

    • @kasperwelbers
      @kasperwelbers 2 years ago

      I indeed use num, num, fct, but I think it would also work if subject is a string/character. I think R (or in this case lme4) automatically considers non-numeric values as factor in linear models.
      For your case, I think it indeed makes sense to use year as numeric and not as a factor, but note that this is only if you want to include year as a fixed effect. If you include it as a group, it will still be treated as a factor. This could be what you want, but I'm not sure what you want to model. Going by the data I suspect maybe something like a pooled time-series analysis, in which case you indeed want year as a (numeric) fixed effect, and your group would be Countries. Converting 2000 to 0, 2001 to 1 etc. seems like a good solution (you could also use 2000, 2001 etc. directly, but large differences in scale between variables could lead to convergence issues).
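
A sketch of that setup on simulated data. The column names (Country, Year, LifeExp) and the derived Year0 are hypothetical stand-ins for the dataset described in the question, and the numbers are made up.

```r
library(lme4)

set.seed(3)  # arbitrary seed for reproducibility
d <- expand.grid(Country = factor(paste0("C", 1:20)), Year = 2000:2015)

# Recode year so that 2000 becomes 0, 2001 becomes 1, and so on
d$Year0 <- d$Year - min(d$Year)

country_int <- rnorm(20, mean = 70, sd = 5)  # country-specific baseline
d$LifeExp <- country_int[as.integer(d$Country)] + 0.3 * d$Year0 +
  rnorm(nrow(d), mean = 0, sd = 1)

# Year as a numeric fixed effect, countries as the grouping level
m_year <- lmer(LifeExp ~ Year0 + (1 | Country), data = d)
year_slope <- fixef(m_year)[["Year0"]]
year_slope
```

With this recoding the intercept is the expected life expectancy in 2000, and the Year0 coefficient is the average change per year.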

  • @siddhft3001
    @siddhft3001 3 years ago

    Great video! One of the clearest explanations I've seen on YouTube. Can you please provide the link to the website you're using?

    • @kasperwelbers
      @kasperwelbers 3 years ago +1

      Hi Siddh, Thanks! Did you mean our github page with the material? If so: github.com/ccs-amsterdam/r-course-material

  • @mutedr524
    @mutedr524 2 years ago

    Hello Kasper, thanks a lot for your awesome explanations. You help a lot in many ways. By the way, I have a general query about considering "Days" as a numeric vector in the context of sleep deprivation. I think sleep deprivation for 1,2, 3, ....., etc days should not carry the same weight in reaction. Also 1.5, 2.5, 3.5, or any other days as fraction doesn't make any sense to me. So, "Days" shouldn't be numeric values. Maybe I'm sleep-deprived as well--not quite getting the logic. :) Would you mind explaining this a bit? Thanks again for your great help.

    • @kasperwelbers
      @kasperwelbers 2 years ago +1

      Hi @MutedR, that's a good question! There are good reasons to use days as numeric, but I understand the concerns. If I may reformulate your question, you point out two things. One is that days is actually discrete (a count variable) and not continuous. The other is that the relation between 'days' (of sleep deprivation) and 'reaction speed' might not be linear.
      That our days variable is discrete is not a big problem. Both discrete and continuous measures can be used as interval or ratio variables. Note that we also don't have to (and in this case shouldn't) interpret what 1.5 days means. We can limit ourselves to interpreting coefficients as the effect on reaction speed for every additional day of sleep deprivation. (I should have made this clearer by not including the fractions in the x-axis labels).
      Regarding treating the relation as linear. There is merit to this, because we would expect that as the number of sleep-deprived days increases, reaction speed is more likely to go down. However, it is indeed possible that the impact on reaction speed decays (or increases) at some point, and after a certain number of days reaction speed might stop decreasing at all (case in point, people die). Am I right to assume that this decaying effect is what you mean by days not having 'the same weight'? There are ways to account for non-linear relations that could model a decaying effect, or an even weirder effect where reaction speed would somehow go up again after a certain number of days. In this example I just kept it simple by assuming the relation is linear 'in this data' (even if the relation is not perfectly linear, a linear model might still be good enough if the relation for the range of days we're looking at is roughly linear).
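
One simple way to probe the linearity assumption is to add a quadratic term and compare the two models. This is a sketch assuming the data is lme4's sleepstudy; a quadratic term is only one of several options (splines, or Days as a factor, would also work), and it is not something done in the video.

```r
library(lme4)

# Fit with maximum likelihood (REML = FALSE) so the models are comparable
m_lin <- lmer(Reaction ~ Days + (1 + Days | Subject),
              data = sleepstudy, REML = FALSE)
m_quad <- lmer(Reaction ~ Days + I(Days^2) + (1 + Days | Subject),
               data = sleepstudy, REML = FALSE)

# Likelihood ratio test: does the curved trend fit better than the straight line?
cmp <- anova(m_lin, m_quad)
cmp
```

If the quadratic term adds little, the linear model is arguably good enough for the observed range of days.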

    • @mutedr524
      @mutedr524 2 years ago

      @kasperwelbers, Thanks a lot for the nice explanation! You are spot on and I appreciate your kind reply. Have a nice time!

    • @kasperwelbers
      @kasperwelbers 2 years ago +1

      @mutedr524 no problem! Glad if this cleared things up. Have a good day!

  • @vanessafrei4052
    @vanessafrei4052 3 years ago

    Thank you so much - these videos are some of the best tutorials on GLMs that I've come across.... Really helpful for my own analysis.
    One question: Is there also a tutorial / overview on multilevel logistic models? I've noticed that you mention it at the end of this video - my main struggle is the output - I have a slightly more complex model than the one you used for the last tutorial on GLMs, but would you say that the interpretation is the same for multilevel GLMs? The thing is I have a binary response variable as well as mostly categorical predictors....
    Or do you have any source on it maybe? Thanks so much!

    • @kasperwelbers
      @kasperwelbers 3 years ago +1

      Thanks! Always nice to hear that these videos are of use.
      I'm sure there are good R tutorials for using glmer, but I myself haven't recorded one. The interpretation does follow quite naturally from combining the ideas of GLMs and multilevel models. The coefficients for logistic regression are still odds ratios (or actually still log odds ratios, but if you use sjPlot as we do here, it's automatically transformed to odds ratios).
      Interpretation of effects for categorical predictors is also still the same. What might help for interpretation is to calculate the predicted probabilities. This way you can see for each category what the probability of y==1 is. This is easy with sjPlot's plot_model(model, type='pred'). This function generates a list of plots for each variable, that shows the marginal effects. For a categorical variable this basically shows the probability of y==1 for each category (other stuff held constant).
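
A sketch of a multilevel logistic model with glmer on simulated data. The variable names (y, condition, group) and effect sizes are made up. Instead of sjPlot's plot_model(), this version computes the predicted probabilities with predict(), setting re.form = NA so that the random effects are left out of the prediction.

```r
library(lme4)

set.seed(4)  # arbitrary seed for reproducibility
n_groups <- 40
n_per_group <- 50

d <- data.frame(
  group = factor(rep(1:n_groups, each = n_per_group)),
  condition = factor(sample(c("a", "b", "c"), n_groups * n_per_group,
                            replace = TRUE))
)

group_int <- rnorm(n_groups, mean = 0, sd = 1)  # group-level intercept shifts
beta <- c(a = 0, b = 0.8, c = -0.5)             # log odds relative to category a
eta <- -0.2 + group_int[as.integer(d$group)] +
  unname(beta[as.character(d$condition)])
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

m_logit <- glmer(y ~ condition + (1 | group), data = d, family = binomial)

# Coefficients are log odds ratios; exponentiate them to get odds ratios
exp(fixef(m_logit))

# Predicted probability of y == 1 per category for a typical group
newd <- data.frame(condition = factor(c("a", "b", "c"),
                                      levels = levels(d$condition)))
newd$prob <- predict(m_logit, newdata = newd, re.form = NA, type = "response")
newd
```

The prob column plays the same role as the per-category values that plot_model(model, type='pred') displays.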

  • @shahrzadshahabzi1007
    @shahrzadshahabzi1007 3 years ago

    hi kasper! do you do tutoring by any chance :)

    • @kasperwelbers
      @kasperwelbers 3 years ago

      Hi Shahrzad! I'm afraid I really wouldn't be able to find the time for that, but I appreciate the consideration. Good luck finding a good tutor!