GLM in R

Kasper Welbers


Published 17 Oct 2024

COMMENTS • 37

@randomdude4411 4 months ago
This is a brilliant tutorial on GLM in R, with a very good step-by-step breakdown of all the information that is understandable for a beginner.
@ammarparmr 3 years ago ⁺²
Very well explained! However, in my opinion, using the coefficients in the summary is far easier to understand than the approach with tab_model.
@kasperwelbers 2 years ago ⁺²
Hi Ammar, sorry I missed this comment, but I would like to break a lance for odds ratios ;). The benefit of the log odds ratios is, I think, only that the sign corresponds to the effect direction. But the values are very hard to interpret. With odds ratios you can say things like "for a unit increase in x, the odds of y increase by a factor of 2 (aka twice the odds)". Is there a benefit of using the log odds ratios that I'm overlooking?
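To make the comparison concrete, here is a minimal sketch of how a log odds ratio maps to an odds ratio. The coefficient value log(2) and the starting odds of 1-to-1 are made-up numbers for illustration, not values from the video.

```r
# Hypothetical coefficient on the log odds scale (about 0.69)
b_log_odds <- log(2)

# Exponentiating gives the odds ratio: the odds double per unit of x
odds_ratio <- exp(b_log_odds)

# Assume odds of 1-to-1 at x = 0, then follow the odds as x increases
x <- 0:3
odds <- 1 * odds_ratio^x
prob <- odds / (1 + odds)

# The log odds grow additively, the odds grow multiplicatively
data.frame(x, log_odds = log(odds), odds, prob)
```

The sign of the log odds ratio shows the direction of the effect, while the odds ratio is the one that reads as "x times the odds".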
@rubyanneolbinado95 6 months ago
Hi, why is RStudio producing different results even though I am using the same call and data?
@kasperwelbers 6 months ago
Hi! Do you mean vastly different results, or very small differences? I do think some of the multilevel stuff could potentially differ due to random processes in converging the model, but if so it should be really minor.
@954giggles 2 years ago
Do you need to install any packages to run the glm code?
@kasperwelbers 2 years ago ⁺²
The glm function is in the stats package, which comes shipped with the basic R installation. So you don't necessarily need other packages. But in the tutorial I do use some packages for convenience, such as the sjPlot package for making a regression table. If you run this without sjPlot the results are the same, but you'll need to do some calculations yourself. For instance, logistic regression gives log odds ratio coefficients, so you'd need to exponentiate them (exp function) to get the odds ratios. TL;DR: you don't need to install packages, but it does make life easier.
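As a rough illustration of doing those calculations yourself, the sketch below fits a logistic regression on simulated data and builds an odds ratio table with base R only. The data, the variable names and the choice of Wald intervals (confint.default) are assumptions made for this example; a table produced by a package may use a different interval method and differ slightly.

```r
# Simulated data, purely for illustration
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.7 * x))

# glm() comes from the stats package, so no installation is needed
m <- glm(y ~ x, family = binomial)

# Coefficients on the log odds scale, as shown by summary(m)
coef(m)

# Exponentiate the coefficients and their Wald confidence limits
# to get odds ratios with 95% intervals
or_table <- exp(cbind(OR = coef(m), confint.default(m)))
or_table
```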
@Gravelbiken 1 year ago
Hi Kasper, what/how much does the intercept tell us in this case?
@kasperwelbers 1 year ago ⁺¹
Good question! It's similar to ordinary regression, in that it just means: the expected value of y if x (or all x-es in a multiple regression) is zero. This is mainly interpretable if there is a clear interpretation of what x=0 means. For instance, say your model is: having_fun = intercept + b*beers_drank. In that case, the intercept is the expected fun you have if you haven't had any beers.
Now say we have a binomial model. Our dependent variable is binary, namely whether or not a person had a hangover the day after a party. This time, the effect is more like (but not exactly, I'm ignoring the link function): hangover = intercept * b^beers_drank. Notice the ^ in b^beers_drank. That's the multiplicative part: we expect that the odds of having a hangover increase by a 'factor of b' for every unit increase in beers. But what's most relevant for us now is that any number raised to the power of zero is 1! So b^0 (zero beers) is 1. So here as well, when x is zero, the intercept is just our expected value.
If we've transformed our coefficients to odds ratios, then the intercept represents the odds that someone who hasn't had any beers had a hangover. So if the intercept is 2, it would mean that the odds that someone who didn't have any beers has a hangover are 2-to-1, so a probability of about 0.67 (odds of 2-to-1 means 2 people out of 3). That sounds weird, but they probably had whisky instead.
I don't know how much that helped. The key takeaway is that, like with ordinary regression, it's mainly interpretable if you have a clear idea of what x=0 means.
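The hangover example can be sketched in a few lines of R. The data here are simulated and the true coefficients (-1 and 0.5) are arbitrary choices for illustration; the point is only that the exponentiated intercept gives the odds at zero beers, which convert to a probability as odds / (1 + odds).

```r
# Simulated party data (hypothetical numbers)
set.seed(42)
n <- 2000
beers <- rpois(n, lambda = 3)
hangover <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * beers))

m <- glm(hangover ~ beers, family = binomial)

# Exponentiated intercept: the odds of a hangover at zero beers
odds_at_zero <- exp(coef(m)[["(Intercept)"]])

# Convert the odds to a probability
prob_at_zero <- odds_at_zero / (1 + odds_at_zero)

# The same probability, obtained from the model's prediction at beers = 0
p_pred <- predict(m, newdata = data.frame(beers = 0), type = "response")

# The worked example from the comment: odds of 2-to-1 are 2 out of 3
2 / (1 + 2)
```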
@audreyq.nkamngangk.7062 1 year ago
Thank you for the tutorial. Is it possible to create a glm model where the variable to explain has 3 modalities?
@kasperwelbers 1 year ago ⁺¹
If I understand you correctly, I think it's indeed possible to model a dependent variable with a tri-modal distribution with glm. Actually, you might not even need glm for that. Whether a distribution is multimodal is a separate matter from the distribution family. A tri-modal distribution might be a mixture of three normal distributions, three binomial distributions, etc. Take the following simulation as an example. Here we create a y variable that is affected by a continuous variable x, and a factor with three groups. Since there is a strong effect of the group on y, this results in y being tri-modal.
## simulate 3-modal data
n = 1000
x = rnorm(n)
group = sample(1:3, n, replace=TRUE)
group_means = c(5,10,15)
y = group_means[group] + x*0.4 + rnorm(n)
hist(y, breaks=50)
m1 = lm(y ~ x)
m2 = lm(y ~ as.factor(group) + x)
summary(m1) ## bad estimate of x (should be around 0.4)
plot(m1, 2) ## error is non-normal
summary(m2) ## good estimate after controlling for group
plot(m2, 2) ## error is normal after including group
@朝に弱い人 2 years ago
Hi Kasper, thank you for the wonderful video. I have a question about R2 and adjusted R2 for GLM models in R. How can we get R2 and adjusted R2 in the R console? In my console, I cannot find these values when I run summary(). Is there any specific code to get them in the console?
@kasperwelbers 2 years ago ⁺¹
Hi, great question! The thing is, there actually isn't an R2 or adjusted R2 for GLM. Instead, to evaluate model fit, it is more common to compare models (in the second link in the description, see logistic regression -> interpreting model fit and pseudo R2). There ARE, however, also some 'pseudo R2' measures, such as the R2 Tjur seen in the video. These measures try to imitate the property of R2 as a measure of explained variance. You'll never get these scores in the basic glm output though, because there are many possible pseudo R2 measures. But there are packages that implement them. For instance, the 'performance' package has an r2() function which calculates a (pseudo) r2 for different types of models.
I'd also recommend reading about the model comparison approach though (if you don't know about it already), because journals often like to see this rather than or in addition to some pseudo R2.
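For readers who want to see what such a measure looks like without installing anything, here is a minimal base R sketch of two common pseudo R2 measures for a logistic regression. It is an illustration on simulated data, not the code used in the video: Tjur's R2 is computed as the difference in mean predicted probability between the two outcome groups, and McFadden's pseudo R2 (not mentioned above, added here only for comparison) as one minus the ratio of the model's log-likelihood to that of an intercept-only model.

```r
# Simulated data, purely for illustration
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 1.2 * x))

m <- glm(y ~ x, family = binomial)

# Tjur's R2: mean predicted probability for the y = 1 cases
# minus mean predicted probability for the y = 0 cases
p_hat <- fitted(m)
r2_tjur <- mean(p_hat[y == 1]) - mean(p_hat[y == 0])

# McFadden's pseudo R2: compare against an intercept-only model
m0 <- glm(y ~ 1, family = binomial)
r2_mcfadden <- 1 - as.numeric(logLik(m)) / as.numeric(logLik(m0))

c(tjur = r2_tjur, mcfadden = r2_mcfadden)
```

The two numbers generally differ, which is one reason the basic glm output does not pick a single pseudo R2 for you.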
@朝に弱い人 2 years ago
@kasperwelbers Thank you so much for the quick reply! It was really helpful and easy to understand :)
One more question! I will be using GLM in my master's thesis. Which one would you recommend?
1. Report the AIC value (and I would write something like "this model had the smallest AIC value")
2. Try calculating pseudo R2 measures and report them
@kasperwelbers 2 years ago
@朝に弱い人 I'd actually recommend reporting deviance AND some pseudo R2. The pseudo R2 is nice to help along interpretation, but deviance is more appropriate, and also provides a nice test to see if adding variables to a model provides a significant increase in fit. Say you have models of increasing complexity (i.e. adding variables): m0, m1 and m2. For glm's, you can then use: anova(m0, m1, m2, test = "Chisq"). In the output, the Deviance column for the m1 row tells you how much the deviance decreased compared to m0, and the Pr(>Chi) column tells you whether this improvement in fit was significant (and the same for m2 compared to m1). Alternatively, you could use sjPlot's tab_model and just add the AIC and/or deviance directly to the table: tab_model(m0, m1, m2, show.aic = T, show.dev = T).
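A minimal sketch of that model comparison, using simulated data so that it runs on its own. The variables x1 and x2 and their effect sizes are made up for the example; only the anova() call itself comes from the comment above.

```r
# Simulated data with two predictors (hypothetical effect sizes)
set.seed(123)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.8 * x1 + 0.5 * x2))

# Nested models of increasing complexity
m0 <- glm(y ~ 1, family = binomial)
m1 <- glm(y ~ x1, family = binomial)
m2 <- glm(y ~ x1 + x2, family = binomial)

# Each row is tested against the previous one: the Deviance column is the
# drop in residual deviance, Pr(>Chi) the p-value of the chi-squared test
comparison <- anova(m0, m1, m2, test = "Chisq")
comparison

# AIC for the same models, if you also want to report it
AIC(m0, m1, m2)
```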
@朝に弱い人 2 years ago
@kasperwelbers Thank you so much, Kasper! I will try calculating deviance and pseudo R2 using the code you suggested :) Can I ask another question via email or something? I'm sorry to be a pain, but I think you can answer another big question I have 🙇‍♂️
@kasperwelbers 2 years ago
@朝に弱い人 No problem! I do however prefer to keep questions based on these videos confined to YouTube (and not too big). Especially at the moment, with the whole corona teaching situation, I'm swamped with emails, and I do need to prioritize my direct students. For bigger questions, I also do think it's best to find someone at your uni (ideally your supervisor or someone in the same department). Not only because they can presumably invest more time, but also because for more specific problems there tend to be differences across disciplines / traditions in how to do statistics.
@draprincesa01 2 years ago
How can I visualize this if some variables are factors, like yes or no?
@kasperwelbers 2 years ago
I think sjPlot handles those pretty nicely! There are some great explanations on the website, under the regression plots tab: strengejacke.github.io/sjPlot/
@JT-ph3hk 1 year ago
Use the function str(yourbasename). If the variable is not yet a factor, you can transform it using the following: yourbasename$nameof the factor
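The comment above stops before showing the conversion itself. The sketch below is one common way to do it, assuming as.factor() is what was meant; the data frame d and its columns are hypothetical stand-ins for "yourbasename".

```r
# Hypothetical data frame standing in for "yourbasename"
d <- data.frame(
  smoker = c("yes", "no", "no", "yes"),
  age = c(31, 45, 27, 52),
  stringsAsFactors = FALSE
)

# Inspect the column types: smoker is a character column at this point
str(d)

# Convert the yes/no column to a factor (assumed to be the intended step)
d$smoker <- as.factor(d$smoker)

# Check the result: levels are sorted alphabetically by default
str(d)
levels(d$smoker)
```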
@djyi2174 2 years ago
Thank you so much for the tutorial.
@philip_che 3 years ago
Thank you for these videos!
@michellelaurendina 7 months ago
THANK. YOU.
@kariiamba7324 3 years ago
Thank you for this helpful video
@hm.91 2 years ago
Thank you!
@DavidKoleckar 9 months ago
nice audio bro. you record in bathroom?
@kasperwelbers 9 months ago
Ahaha, not sure whether that's a question or a burn 😅. This is just a Blue Yeti mic in the home office I set up during the COVID lockdowns. The room itself has pretty nice acoustic treatment, but I was still figuring out in a rush how to make recordings for lectures/workshops, and it was hard to get clear audio without keystrokes coming through.
@gotnolove923 10 days ago
tab_model doesn't work 😮
@Whycantijustdeletethis 10 days ago
Surely we can make it work. What error do you get?
@kasperwelbers 10 days ago
@gotnolove923 ah haha, that was me on another account that I was trying to delete.
