Understanding Generalized Linear Models (Logistic, Poisson, etc.)
- Published 31 May 2024
- Do you want to take a class with me? Visit simplistics.net to register for a class. You can either do "live" classes, where you'll learn from me directly via zoom. Or you can register for "self-guided" courses, complete with a schedule, discussion boards, quizzes, readings, etc.
See my original video on GLMs here: • Become a generalized l...
Probability density functions: • Probability, Part 3: P...
Poisson does mean fish, but the distribution was named after a mathematician: en.wikipedia.org/wiki/Poisson...
Nested versus non-nested models: • Model Comparisons in R
Learning Objectives:
#1. Understand when to use GLMs
#2. Know the three components of a GLM
#3. Understand the difference between a transformation and a link function
#4. Know when to use logistic, Poisson, gamma, etc.
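For reference, objectives #2 and #3 in standard textbook notation (the video's own notation may differ). This shows the random component, the systematic component, and the link that connects them, plus the contrast with transforming the outcome:

```latex
\begin{aligned}
&\text{Random component:} && Y_i \sim \text{exponential family (e.g. normal, binomial, Poisson, gamma)}, \quad \mu_i = \mathrm{E}[Y_i] \\
&\text{Systematic component:} && \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \\
&\text{Link function:} && g(\mu_i) = \eta_i \quad\Longleftrightarrow\quad \mu_i = g^{-1}(\eta_i) \\
&\text{Transformation instead:} && \mathrm{E}[g(Y_i)] = \eta_i, \quad \text{and in general } \mathrm{E}[g(Y_i)] \neq g(\mathrm{E}[Y_i])
\end{aligned}
```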
This video is part of my multivariate playlist: • Multivariate Statistics
And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/
Undergraduate curriculum playlist (GLM-based approach): ua-cam.com/users/playlist?list...
Graduate curriculum playlist (also GLM-based approach): ua-cam.com/users/playlist?list...
Exonerating EDA paper: psyarxiv.com/5vfq6/
Download JASP (and visual modeling module): www.jasp-stat.org
That introduction though 😂 I have never seen someone so excited to be asked about GLMs.
I've encountered GLMs for years, this was the best explanation I've ever seen. Well done and thank you for your service! 👏🙇♂️
You are a fabulous professor, ur students are lucky
Honestly, thank you so much for this explanation!! It's super super helpful to have someone actually explain the different types of GLMs in an easy-to-understand way. I had no idea what they were or when to use them, and now I don't have to keep bashing my head against a wall trying to understand the world of statistics :)
You are so good at holding attention, which I think is so important for people teaching! Keep up the good work!
I spent hours and hours trying to understand GLMs from textbooks and still came out confused. Your 20-minute video cleared everything up. THANK YOU!
I'm an actuary and we work with GLMs every day! Great explanation.
Thank you for the brief but clear explanation about different "distributions".
You are awesome. It took only a few minutes for me to understand why GLMs are so important. Love your lecture.
Seriously good, you are demystifying many issues I have struggled to understand
This is the best video I have ever watched on the Internet. Thank you so much for sharing your insights with the research community. God bless you, sir!!!
Amazing video. I finally understood GLMs, after failing to understand them from books and web pages. I was assigned to teach this topic in class and you just saved the day. Thank you Dustin!
Great explanation on the GLMs. It gave me some new insights for sure. May you keep growing! Thanks for the video. I guess I'm gonna land at your channel quite often :)
Honestly the best content on UA-cam
this is true!!
The negative binomial distribution arises as a compound (mixture) distribution: a Poisson distribution whose rate is itself gamma-distributed. It generalizes the Poisson distribution to allow over-dispersion (i.e. a variance greater than the mean). The negative binomial cannot produce underdispersion, where the variance is less than the mean, but this can be achieved using the generalized Poisson distribution.
I use a generalization of Poisson regression called inhomogeneous Poisson point process regression. It is useful for modelling arrivals of discrete units into a system over time.
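The gamma-Poisson construction can be checked by simulation. A minimal pure-Python sketch, assuming a mean count of 4 and a gamma shape of 2 (arbitrary illustration values): each count is drawn from a Poisson distribution whose rate was first drawn from a gamma distribution with that mean. Theory gives a variance of mu + mu^2/k = 4 + 16/2 = 12, three times the mean, whereas a plain Poisson with the same mean has variance 4.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_gamma_poisson(mean, shape, n, seed=1):
    """Draw n counts whose Poisson rate is Gamma(shape, scale=mean/shape), so E[rate] = mean."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        rate = rng.gammavariate(shape, mean / shape)  # gammavariate(alpha, beta): beta is the scale
        draws.append(sample_poisson(rate, rng))
    return draws

def mean_var(xs):
    """Sample mean and (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

counts = sample_gamma_poisson(mean=4.0, shape=2.0, n=20000)
m, v = mean_var(counts)
print(f"mean={m:.2f} variance={v:.2f} theoretical variance={4.0 + 4.0 ** 2 / 2.0:.2f}")
```

The sample variance lands well above the sample mean, which is the over-dispersion a plain Poisson model cannot represent.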
Thanks so much for these videos! You're an amazing teacher.
Thank you, I loved this, I was smiling during the whole video and - most importantly - understood what generalized linear models are about!
It’s so much fun and so informative to listen to you. And you're talking about generalized linear models!
Thank you so much for your videos- I'm so grateful for the explanations, and feel they've been clarifying sticking points for me left and right!
Question: A sticking point I'm still struggling through is the relationship between the shape of your data, the shape of your residuals, and what this means for your choices in building a GLM.
1. You mention that if your data isn't normal, you should use a GLM. If it's the residuals that really matter here, is that because if your data isn't normal your residuals are likely also not normal?
2. Following up on the above: if your data are not normal, but your residuals are normal, does that mean you can just proceed with the model you've got as is? Or might you still run into problems?
3. Are normal residuals a sign of a decent model fit? So if they aren't normal, is that a sign you should use a GLM for a better fit? And once you've done so, do your residuals hopefully become normal as a result? In other words, does a GLM "fix" your model to give you normal residuals, or does a GLM handle non-normal residuals such that it gives accurate estimates of, e.g., "95% confidence" for a non-normal distribution that fits your residuals?
Hope those questions even make sense, and thank you so much again!!! I teach and know how much work it takes to put together things like this and answer so many questions- grateful for your time!
Great video, thank you!!! Could I ask please - what would you choose for dependent variables that are measured using a Likert scale with 5 levels? Would that be ordered logistic (for ordinal variables)? Thank you!!
Thanks for your presentation. When we use ordinal logit, should we report the Pearson correlation and the omnibus test value? If so, how do we interpret them (for example, what does it mean when the p-value is below or above the threshold)? Also, should we use the significance level from the 'Tests of Model Effects' table or the 'Parameter Estimates' table to say that a relationship between the predictor and the outcome variable is significant? I am really looking for ways to interpret these; your answer will really help me. Thanks.
Thank you very much for the video. It's very helpful. However, I have a few questions.
1. How do I find out if my data follow a Gaussian or a gamma distribution? I did a Shapiro-Wilk test to check for normality and the data are not normal, but I am not sure whether they follow a gamma distribution.
2. How does the prediction change based on the family and link function? Suppose I have the same gamma distribution but different link functions: how will that affect the model fit? Or rather, how can I choose the link function?
3. Is there any method to check the goodness of fit?
You are great! And I love the music in the background; it gives a crazy feeling that somehow makes the information easier to take in.
Your work is appreciated, Thank you very much!!
Your value is more than your appearance
You are amazing.
Thanks for rapping me to the point of the truth regarding GLM
The residuals from the conditional mean from a gamma generalized linear model will not be gamma-distributed. A quick way to confirm this is to realize that the outcome variable is sometimes less than the predicted mean value, resulting in a negative residual. But a gamma distribution has non-negative support, and therefore cannot be the distribution of the residuals. In general the residuals do not follow the same distribution as the likelihood.
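A quick simulation makes the argument concrete. A sketch assuming a conditional mean of 10 and a gamma shape of 3 (hypothetical values): every simulated outcome is positive, yet the raw residuals y - mu include many negative values, so they cannot be gamma-distributed.

```python
import random

rng = random.Random(42)
mu = 10.0     # hypothetical conditional mean from a fitted gamma GLM
shape = 3.0   # hypothetical gamma shape parameter
ys = [rng.gammavariate(shape, mu / shape) for _ in range(1000)]  # gamma draws with mean mu
residuals = [y - mu for y in ys]                                 # raw (response) residuals

print("all outcomes positive:", min(ys) > 0)
print("negative residuals:", sum(r < 0 for r in residuals), "of", len(residuals))
```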
I'm trying to understand parameter estimates in the generalized linear model output from SPSS. For example, if I have a categorical predictor with levels A, B, and C and level A has an estimate of 0.50 and level B has an estimate of 1.2, and C is the reference category so its estimate is 0, how do I interpret the impact of A, B and C on the outcome variable? Does level A have half the impact of C, and B has 1.2 times the impact of C? Or is it better to just use indicators for A, B and C?
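The answer depends on the link function, which the question does not state. A minimal sketch, assuming the usual dummy coding with C as the reference level (so C's estimate is 0): under an identity link each estimate is a difference in the mean outcome relative to C; under a log link, exp(estimate) is a ratio of means (A about 1.65 times C, B about 3.32 times C); under a logit link, exp(estimate) is an odds ratio. In none of these cases does 0.50 mean "half the impact of C". The function name `relative_effects` is hypothetical.

```python
import math

# Estimates from the question; C is the reference level, so its estimate is 0.
estimates = {"A": 0.50, "B": 1.20, "C": 0.0}

def relative_effects(estimates, link):
    """Express each level relative to the reference level under an assumed link."""
    if link == "identity":
        # Additive: difference in the mean outcome versus level C.
        return dict(estimates)
    if link in ("log", "logit"):
        # Multiplicative: ratio of means (log link) or odds ratio (logit link) versus level C.
        return {level: math.exp(b) for level, b in estimates.items()}
    raise ValueError(f"unsupported link: {link}")

for link in ("identity", "log"):
    print(link, {k: round(v, 3) for k, v in relative_effects(estimates, link).items()})
```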
I wish every professor was like you. How you kept my attention was amazing.
Thanks! 😃
You first spoke of data being normally distributed and then residuals being normally distributed. Could you please distinguish between the two?
Really great explanation and I love your style! One question: at 18:10 you show us the R code for an ordered logistic regression with something like Y = Health rating and X = Mi. However, the following graph shows X = Agility and Y = Injuries again. Am I missing something? Cheers!
Great explanation, it put so many things I had in mind in the right order. Sub. Thank you!
Great video! May I suggest a short blog post summarising this content? It would be very helpful as well!
Thank you for your work, your videos are great. :)
Thanks for your explanation! If you have some examples of how to apply them, that would be extremely helpful! Thanks a lot.
Hi and thanks very much for this video. I've been using GLMs for a while, but now lots of things are clearer!
Anyway, I've a little question for you: what distribution and link function would you suggest for a bimodal distribution?
My DV is the score given on a 100-point slider. The slider initially sat at 50, so the participants' heuristic was something like "go below or go above, never stay at 50"; this produced a dip in the 50ish zone of the distribution. What do you think?
Cheers,
Alessandro
In my experience, residuals are rarely bimodal. The raw distribution might be, but generally when I include my predictors, it will explain the shift in modes, rendering the residuals normal.
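The reply's point can be illustrated by simulation, under the assumption that a categorical predictor is what separates the two modes (which may not hold for a slider-anchoring effect like the one described above). Hypothetical data: two groups centred at 25 and 75 with SD 8. The raw scores are bimodal with an SD near 26, while the residuals from a model with the group predictor have an SD near 8.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical data: two groups whose means sit far apart.
groups = [0] * 500 + [1] * 500
scores = [rng.gauss(25 if g == 0 else 75, 8) for g in groups]

# Model with one categorical predictor: the fitted value is each group's mean.
group_means = {
    g: statistics.mean(s for s, gg in zip(scores, groups) if gg == g) for g in (0, 1)
}
residuals = [s - group_means[g] for s, g in zip(scores, groups)]

print("raw SD:", round(statistics.stdev(scores), 1))
print("residual SD:", round(statistics.stdev(residuals), 1))
```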
Why am I only NOW finding you? Love the style! 2:20 is my style.
Is it normal to convert a continuous variable to a count variable and then use a GLM on it? My dependent variable is "time spent fast walking" in minutes and it is not normal. The statistician told me to remove the decimals and treat it as a count variable. Is this approach correct?
Thank you for the video! The explanation is clear.
Very clearly explained!! Thank you sir
Man u are an amazing teacher
This is the most helpful video I've ever found
Alright, let me comment on your video!
The moment I started the video, for the first few seconds I thought I wouldn't be able to make it to the end, maybe because of the way you spoke (it's not your problem, but mine: I am a little too sensitive and can't bear loud noise. My sincere apologies for writing this).
BUT, after a minute, my brain started enjoying it because of the simplicity in your explanation, your deep knowledge of the subject and your power to connect with your students (people watching this video).
I am so grateful to you 🙏😊
(subscribed, clicked on the bell icon, and going to be regular visitor to your channel 😄)
Quick question for you, if you're still checking these comments! When taking the next step and moving up to GLMMs because of the requirements of data structure, is it a necessity to still use a link function in your code? Thanks, love your videos
So if I have only one Poisson-distributed independent variable and one Poisson-distributed dependent variable, and they have a linear relationship, should I be using 'poisson' as the distribution for the random component and 'identity' as my link function? In MATLAB: glmfit(x, y, 'poisson', 'link', 'identity');
Great video. One remark: at 9:55 the link function of linear regression is not 1; it is the identity function f(x) = x.
This was so great, thanks!!
Thank you! I’m waiting for a gamma distribution example; it will be useful in my research.
Amazing hahaha it helped me more than I expected. Thanks
I love your craziness, and you are doing us a great service. Going forward, I’m going to scream “Generalised Linear Model!!!” At people who need it.
Can you do a full course on GLMs, the math behind them, and I guess any other regression analysis theory? I think that would be awesome; or if you have already done this, I couldn't find it 🙁
I have a couple playlists related to what you're asking for. I tend not to get mathy (because it scares my students :))
Are link functions a special case of activation functions (in the context of NNs)?
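For the logit link the correspondence is exact: its inverse is the logistic sigmoid, so a single sigmoid output unit applied to a weighted sum of inputs and trained with binary cross-entropy is a logistic regression. The analogy is looser for other activations; a GLM link must be invertible, which ReLU, for example, is not. A minimal sketch with hypothetical coefficients:

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def sigmoid(eta):
    """Inverse logit; the same function neural networks call the sigmoid activation."""
    return 1 / (1 + math.exp(-eta))

# Logistic regression mean: the inverse link applied to a linear predictor.
b0, b1, x = -1.0, 0.5, 3.0   # hypothetical coefficients and input
p = sigmoid(b0 + b1 * x)
print("predicted probability:", round(p, 4), "back on the link scale:", round(logit(p), 4))
```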
your videos are brilliant, thank you so much
Wow. Fun. Thanks learned a lot without getting bored
Glad you enjoyed it!
I was surprised at how complex problems can be solved with a simple two-layer feedforward binary classification neural network. With a single hidden layer with a ReLU activation function, followed by an output layer with a sigmoid activation, it is able to learn very complex binary classifications (Such as learning financial signals). Unfortunately, I did not see any tutorials on financial data modeling using linear layers - most are using CNN, LSTM, and GRU model types. Those model types just don't seem to learn my dataset as well as this two-layer feedforward binary classification neural network does.
Fun topic!
amazing vid, thank you so much, subscribed
as an ecologist in progress I can say, in ecology EVERYONE is using GLMs all the time even when they could be using other simpler methods, so here I am trying to actually understand them ahjhahaha
Thank you for the video and all the work behind! You really made a complicated topic (at least in my head) look very easy. Two questions I'd appreciate if you could reply:
1. When checking whether to use gaussian or gamma GLMM, should I check distributions of the original data or of the residuals? (I often see people checking the original data while it is often said we should check the residuals)
2. Can I blindly trust AIC or BIC to quickly determine whether to use gaussian or gamma GLMM? i.e., without needing to plot the data.
Thanks in advance!
1- You are right. We look at the *residuals*.
2-I wouldn't trust anything without plotting the data :)
@@QuantPsych To clarify #1, is that the residuals of a linear regression fit?
What is the "E" he is referring to when talking about the difference between a transformation and a link function at 10:50?
Error term
How about inverse binomial and Tweedie distributions? Can you make a video?
This was very nice, had a nice laugh but very educational too, lmao
The links to the graduate and undergraduate playlists are broken; could you please post them in a comment?
The video plots a density for a Poisson distribution, but a Poisson distribution is discrete. Thus such a density plot is just a rough approximation of the probability mass function of a Poisson distribution.
The plot is kinked, so it is discrete. But he def should have made a histogram instead
@@qwerty11111122 I might not be understanding what you mean by "kink".
If by "kink" we mean a point where the density is not differentiable, then consider the counterexample found in the Laplace distribution. The density function of a Laplace distribution is continuous but non-smooth at its mode, which for this distribution also equals the median and mean. Even though it isn't smooth everywhere (it has a "kink"), it is not a discrete probability distribution. Fortunately the density still has a weak derivative even though the ordinary derivative does not exist at the mode, so many of the same results can be obtained almost surely (i.e. up to a set of measure zero).
Excellent video! The first time I saw it I thought you were really, really annoying with your voice and impressions, but the second time I watched it everything became clear :) Still, I have a question: when we use OLS, we assume that our residuals follow a normal distribution, and if they don't, we can either try to find a better model (more variables, transformations, whatever) or switch from linear regression to, let's say, Poisson regression (a GLM of the Poisson family). My doubt is this: is there any chance that our residuals will not resemble a Poisson distribution and it's our coefficients that go crazy, or, on the other hand, that we fit good coefficients with nice p-values but our residuals follow a normal distribution rather than a Poisson one? I don't know how clear this question is, but I'm guessing my doubt is about how to validate that my Poisson fit is actually the best model, given the p-values and the residual distribution.
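The Laplace counterexample is easy to check numerically. A minimal sketch assuming location 0 and scale b = 1: the density has no jump at the mode, but the one-sided slopes there are +1/(2b^2) and -1/(2b^2), so it is continuous yet not differentiable, and it is still a continuous (not discrete) distribution.

```python
import math

def laplace_pdf(x, mu=0.0, b=1.0):
    """Laplace density: continuous everywhere, but not differentiable at x = mu."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

h = 1e-6
left = (laplace_pdf(0.0) - laplace_pdf(-h)) / h    # one-sided slope from the left, about +1/(2 b^2)
right = (laplace_pdf(h) - laplace_pdf(0.0)) / h    # one-sided slope from the right, about -1/(2 b^2)
continuity_gap = abs(laplace_pdf(h) - laplace_pdf(-h))  # no jump across the mode

print(round(left, 4), round(right, 4), continuity_gap)
```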
Kind Regards, you are the best
Do you teach at Rowan University in New Jersey?
Thank you. Clear explanation. Can we use GLMs when observations are dependent or correlated? Or is that a situation where GLMs are not applicable?
You cannot. You'll have to use mixed models (or time-series models).
this was amazing! thank you :)
I cannot believe that you have only 3.7 k subscribers.
Heyy, thank you for your great video!! I have a question on the difference between transformations and link functions.
Is it right that this shouldn't be the same?
mean(log(y))
log(mean(y))
And this should be the same?
mean(log(predict(mod)))
log(mean(predict(mod)))
If yes, why is this the case?
Thank you a lot!
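The first pair differs because log is concave (Jensen's inequality): for positive y that is not constant, mean(log(y)) < log(mean(y)). That gap is the core of the transformation versus link distinction: regressing log(y) models E[log y], while a log link models log E[y]. The second pair depends on what predict() returns (in R, predict() on a glm defaults to the link scale), so this sketch covers only the first pair, using made-up values.

```python
import math
import statistics

# Hypothetical right-skewed outcome values.
y = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 20.0]

mean_of_log = statistics.mean(math.log(v) for v in y)  # what a log *transformation* models
log_of_mean = math.log(statistics.mean(y))             # what a log *link* models
print("mean(log(y)) =", round(mean_of_log, 3), " log(mean(y)) =", round(log_of_mean, 3))
```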
Really well explained
How does one actually test for significance with these models?
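One common approach is a likelihood-ratio test: fit the model with and without the predictor, take twice the difference in log-likelihoods (equivalently, the drop in deviance), and compare it with a chi-square distribution whose degrees of freedom equal the number of dropped parameters; Wald z-tests on individual coefficients are the other usual output. A minimal pure-Python sketch for a Poisson model with one binary predictor and a log link, where the maximum-likelihood fitted values are simply the group means, so no iterative fitting is needed. The counts are made up.

```python
import math

def poisson_loglik(ys, mus):
    """Poisson log-likelihood of counts ys given fitted means mus."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))

# Hypothetical counts for two groups (a binary predictor x).
group0 = [2, 3, 1, 4, 2, 3, 2, 1]
group1 = [5, 4, 6, 3, 7, 5, 4, 6]
ys = group0 + group1

# Null model (intercept only): every fitted mean is the overall mean.
overall = sum(ys) / len(ys)
ll_null = poisson_loglik(ys, [overall] * len(ys))

# Intercept + dummy model: with a log link the fitted means are the group means.
m0, m1 = sum(group0) / len(group0), sum(group1) / len(group1)
ll_full = poisson_loglik(ys, [m0] * len(group0) + [m1] * len(group1))

lr_stat = 2 * (ll_full - ll_null)             # likelihood-ratio statistic (deviance difference)
p_value = math.erfc(math.sqrt(lr_stat / 2))   # upper tail of a chi-square with 1 df
print("LR statistic:", round(lr_stat, 3), "p-value:", p_value)
```

The erfc expression is exact only for 1 degree of freedom; dropping more parameters needs a general chi-square tail function.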
Thank you for awesome videos (and flexplot), suddenly statistics are not so boring anymore! Could I ask you a question? I’m an ecologist who wants to break the habit of using Wilcoxon, Kruskal-Wallis and similar tests. I’ve been trying to use GLMs to analyze seed germination data, but ca. 50% of my values are zeros (germination was poor) and the rest are ratios between 0 and 1, for each plant tested. With a non-integer dependent variable, I cannot use zero-inflated count models, but the GLMs I’ve tried all have bad-looking QQ-plots and questionable results (I suspect it’s because of all the zeros). My main independent variables are factors (testing the difference between years and sites + interaction), so GAM and gamlss (zero-inflated beta regression) don’t seem to work well either. I’m out of ideas, could you help me find a model that doesn’t suck? :) Thanks!
Can you model it as a count instead of a proportion? Then you can fit either a Poisson or a zero-inflated Poisson model. Also, I don't think QQ plots are going to help. They are for assessing normality, but you're not going to get normality, so they are not needed. (See this link: stats.stackexchange.com/questions/298197/interpreting-qq-plot-of-poisson-regression)
@@QuantPsych Thank you for responding so quickly. I tried your suggestion, the results made sense and the distribution of residuals was similar across factors! :D Thanks a lot. I thought this transformation would violate ZIP/ZINB’s assumptions, given that percentages have a fixed range and in my case I tested 20 seeds for each tree, giving 5, 10, 15%… but since nothing ever came close to 100% germination, perhaps we’re good. :) That being said, I wish I knew of user-friendly methods for modelling percentage data that are zero-inflated and overdispersed with mostly factorial independent variables, because almost all my data are like that.
This is what I always need, someone explaining things with some fun and at the same time in dummy terms xD
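What a zero-inflated Poisson adds can be seen from its probability mass function: with probability pi an observation is an extra ("structural") zero, and otherwise it is a Poisson count, so P(0) = pi + (1 - pi)e^(-lambda) and P(k) = (1 - pi) times the Poisson probability for k > 0. A minimal sketch with hypothetical values (lambda = 3 germinated seeds among units that can germinate, pi = 0.4); note that, as the comment above points out, this model ignores the upper bound of 20 seeds.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi an extra zero, else a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson

lam, pi = 3.0, 0.4  # hypothetical Poisson rate and zero-inflation probability
print("P(0) under ZIP:", round(zip_pmf(0, lam, pi), 4), "under plain Poisson:", round(math.exp(-lam), 4))
print("total probability:", round(sum(zip_pmf(k, lam, pi) for k in range(40)), 6))
```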
This was great
Thank you, this is excellent. I did find the music distracting, however. :)
Thank you so much for your videos, greetings from Mexico
You're a legend, thanks a lot
An unbelievably good video! I can feel that he really understands this.
Rowan University! I was in the first freshman class to go all 4 years majoring in bioinformatics!!
Edit: negative binomial mentioned 15:15
A fellow prof!
It's a great video, thank you. But can I ask you a question: if I use Poisson with 2 predictors, can I make a plot of it? Sorry for my bad English, I'm from Indonesia.
With flexplot you can.
At 12:10 it says log, but the systematic components seem to be exponentiated. Which one is correct?
The link function is applied to the mean of y, so you get g(E[y]) = systematic component; that's why you pass the systematic component through the inverse of the link function to get predictions. Note that for the identity and 1/x links, the link function is its own inverse; that's why you only spotted it for Poisson.
@@RomainPuech Thanks, I got it now!
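The reply's point can be checked directly. A minimal sketch of three links as (g, g^-1) pairs, using a hypothetical linear-predictor value of 0.8: the predicted mean is g^-1(eta), and only the identity and reciprocal (1/x) links give the same value as their inverses, while the log link's inverse is exp.

```python
import math

# (link g, inverse link g^-1) pairs for three common GLM links.
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),  # 1/x, the default link of R's Gamma() family
    "log": (math.log, math.exp),                          # canonical link for Poisson
}

eta = 0.8  # hypothetical value of the systematic component (linear predictor)
for name, (g, g_inv) in links.items():
    mu = g_inv(eta)  # predicted mean = inverse link applied to the linear predictor
    same = math.isclose(g(eta), g_inv(eta))
    print(f"{name:8s} mu={mu:.4f} g(mu)={g(mu):.4f} self-inverse={same}")
```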
Not me giggling about your "Poisson" pronunciation in my office. Didn't know GLMs could be so funny.
Great Video!!!!
What do you mean by residuals, and are you referring to 'y' as the dependent variable?
See this video for an explanation of residuals: ua-cam.com/video/AK1iZyY6lMo/v-deo.html
And yes, y = dependent variable
@@QuantPsych I understand the concept of GLMs now, but how do I use them in Python?
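In practice a library handles this (statsmodels, for example, offers `statsmodels.api.GLM` with a `sm.families.Poisson()` family). To show what such a fit does under the hood without assuming any package is installed, here is a minimal pure-Python sketch of iteratively reweighted least squares (IRLS) for a Poisson GLM with a log link and one predictor. The data and the function name `fit_poisson_glm` are hypothetical.

```python
import math

def fit_poisson_glm(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1*x by iteratively reweighted least squares (IRLS)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link (the weight equals mu for Poisson).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted least-squares normal equations for (b0, b1).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 1, 2, 3, 5, 6, 10, 14]
b0, b1 = fit_poisson_glm(x, y)
print(f"b0={b0:.3f} b1={b1:.3f} rate ratio per unit of x={math.exp(b1):.3f}")
```

At convergence the fitted means satisfy the Poisson score equations (residuals sum to zero overall and when weighted by x), which is a handy sanity check against any library's output.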
Thank you so much!
Thank you!
i love this video so much
very concise video
very.
concise
Thank you from Nepal
Thanks a lot man
What happens if you have a mixture of variable types: continuous, discrete, etc.?
Fit a mixture model. I haven't used them often, except for zero-inflated models.
Bro you're the best
Awesome 😎👍
i love this teacher
Bravo, sir.
Congratulations! Nice video, I have two questions
1. What if the data and residuals behave normally even though they are counts: should I still apply a GLM? And
2. What if the gamma- or Poisson-distributed data contain zero values?
I would be very grateful if you would help me with these questions, and again congratulations! Greetings from Mexico
1-You can do normal, if you wish.
2-Just add 1 to each value.
@@QuantPsych Thank you so much!!
What an interesting host, full of statistics!
Good Video
“No one uses these models”
Cries in ecologist
Lots of useful info. For me, it would be better if the desk weren't slammed, actually. Or maybe the microphone is too sensitive.
My hand thanks you for the comment.
The Gordon Ramsay of statistics