This is gold. I searched everywhere and this is the best video I have seen on this topic. Thank you!
Thank you very much. I am from Uzbekistan. I am learning R. This video is very useful to me.
This is by far one of the best videos addressing the validation of the regression assumption. Thank you so much for making it easy and simple to understand!
your explanation is INCREDIBLE, exactly how it's supposed to be.
You have a new sub.
Great job!
Thank you for the excellent video. Great explanation. I have a question. You ran diagnostic tests for an entire multiple regression model, which has multiple IVs. Aren't we supposed to run separate diagnostic tests for each IV-DV relationship in the model, instead of running diagnostic tests for the entire set of variables at once? I have checked other tutorials online, and they applied the former approach. Or is it a matter of preference?
When it comes to assumptions concerning the residuals (normality, homoskedasticity), you have to test the entire multiple regression model that you ultimately use for your hypothesis tests. For linearity, bivariate scatterplots can make sense.
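For illustration, a minimal R sketch of checking the residuals of the full model and the linearity of each predictor. The variable names (dv, iv1, iv2, control1, mydata) are hypothetical, and the Breusch-Pagan test is only one of several possible homoskedasticity checks; your own workflow may use different tests.

# Residual checks on the FULL multiple regression model (hypothetical names)
reg.fit <- lm(dv ~ iv1 + iv2 + control1, data = mydata)
res <- residuals(reg.fit)

# Normality of residuals: visual check plus Shapiro-Wilk test
hist(res, main = "Residuals", xlab = "Residual")
shapiro.test(res)

# Homoskedasticity: residuals vs. fitted values, plus Breusch-Pagan test
plot(fitted(reg.fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
library(lmtest)
bptest(reg.fit)

# Linearity: bivariate scatterplots, one predictor at a time
plot(mydata$iv1, mydata$dv)
plot(mydata$iv2, mydata$dv)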
Thank you for the quick reply. My multivariate model (including control variables) doesn't violate any of the assumptions. But the moment I regress the DV on the individual IVs from the multivariate model and run diagnostics for those, the results come out awful: the residuals are not normal, and the variance is not constant. This work is for my graduation thesis. Do you think inferential analysis would still be reliable in such a case?
@@ahmetjeyhunov4435 One crucial implicit assumption of a regression analysis is that you have included all relevant predictors. So it is quite possible that with a single regression you get a violation of the assumptions.
Example:
Gender (m/f) as a covariate has a significant influence. If you don't control for gender, then it is quite likely that the distribution of the residuals will be nonnormal, because this could lead to a bimodal distribution of the residuals (one peak for males, the other for females).
I haven't seen any academic literature yet that suggests you should test the assumptions of a multiple regression by looking at single regressions using only one IV each.
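To make the gender example concrete, here is a small simulation sketch with made-up numbers. It only illustrates the mechanism described above: omitting a relevant binary covariate can push the residuals of the reduced model into a bimodal, nonnormal shape.

# Illustrative simulation (made-up numbers)
set.seed(1)
n      <- 500
gender <- rbinom(n, 1, 0.5)         # 0/1 dummy, hypothetical coding
x      <- rnorm(n)
y      <- 2 * x + 3 * gender + rnorm(n)

res_full    <- residuals(lm(y ~ x + gender))   # assumptions hold
res_reduced <- residuals(lm(y ~ x))            # gender omitted

hist(res_full,    main = "Residuals, full model")
hist(res_reduced, main = "Residuals, gender omitted")   # two peaks
shapiro.test(res_reduced)                                # typically significant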
@@RegorzStatistik Awesome! Thank you so much!
Very helpful and well explained video! Dankeschön :)
Very useful video! Thanks! I have one question: I am running a moderated mediation model using Hayes's PROCESS method (model 7). Do I have to check these assumptions as well? And if so, do I make a reg.fit model using lm with (DV ~ IV1 + IV2 ... + W + M1 + M2 + M3)? So, would I add my mediators and moderators as independent variables and run lm so that I can check all the assumptions? Many thanks in advance!
Here is a video about assumption checks with a PROCESS model:
ua-cam.com/video/D6t5TZ0-j6g/v-deo.html
So, yes, you could replicate the different regression models PROCESS calculates in R and check the regression assumptions there. With 3 mediators I think you would get 4 regression models (one for the moderated prediction of each mediator and one for the prediction of the DV), and you would have to check each of those against the regression assumptions.
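A minimal R sketch of what those four regressions could look like for model 7, with hypothetical variable names (iv, w for the moderator, m1-m3 for the mediators, dv for the outcome). Depending on your PROCESS settings (mean-centering, covariates), the exact specifications may differ.

# One moderated model per mediator, plus one model for the DV
fit_m1 <- lm(m1 ~ iv * w, data = mydata)
fit_m2 <- lm(m2 ~ iv * w, data = mydata)
fit_m3 <- lm(m3 ~ iv * w, data = mydata)
fit_dv <- lm(dv ~ iv + m1 + m2 + m3, data = mydata)

# Each of these four models can then be checked against the regression
# assumptions, e.g. starting with base R diagnostic plots:
plot(fit_m1)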
@@RegorzStatistik Many thanks! That video is very useful as well! One thing I still wonder about is the linearity assumption. Is this supposed to be checked by plotting the IV on the x-axis and the DV on the y-axis, or do you have to look at the residuals? On some websites I see the first, but sometimes it's only about the residuals.
@@lisakrijnen2453 For a simple regression you could look at the residuals, too. But for a multiple regression I prefer bivariate scatterplots (one predictor + criterion), because with the residuals alone it's hard to spot which predictor has a nonlinear relationship with the criterion variable.
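One way to produce such bivariate scatterplots in base R (hypothetical variable names again), with a lowess smoother added to make curvature easier to see:

# One bivariate scatterplot per predictor
plot(mydata$iv1, mydata$dv, xlab = "IV 1", ylab = "DV")
lines(lowess(mydata$iv1, mydata$dv))   # smoother helps judge linearity

plot(mydata$iv2, mydata$dv, xlab = "IV 2", ylab = "DV")
lines(lowess(mydata$iv2, mydata$dv))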
@@RegorzStatistik Ok, thanks! I have some trouble interpreting my scatterplots, as they are not clearly linear. Do you know whether there are tests / other ways to assess whether a relationship is linear?
There is the rainbow test (0:05:47 in this video), a significance test for linearity. It is part of the lmtest package in R.
cran.r-project.org/web/packages/lmtest/lmtest.pdf
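A minimal usage sketch of that test; the model here is hypothetical, raintest() is the rainbow-test function in the lmtest package.

library(lmtest)
reg.fit <- lm(dv ~ iv1 + iv2, data = mydata)   # hypothetical model
raintest(reg.fit)   # a significant p-value suggests a violation of linearity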
Terrific video! Thanks!! Do you know how to label the outliers in the plots? I ran the same code, but the outliers in my plots are not labelled as they are in the video.
I don't know why you don't get labels.
Excellent! Thank you.
Hey, great video and everything basically worked. However, I have a model with gender as the independent variable (0 and 1) and rapport level as the dependent variable (scale data). That makes some of the graphs look kind of weird, since all the dots are exclusively at 0 and 1, and this code does not work with a categorical IV:
ols_vif_tol(reg.fit)
Do you have advice on that?
For a binary IV you don't have to check the linearity assumption. Nonlinearity can only be an issue with more than two values for the IV.
Sir, this video runs on a data set? Where is the data, sir?
Unfortunately, that data set is not publicly available.
I want to write an article for Scopus. If someone knows Econometrics/analysis well, we will write together. I will write the rest.