Testing the Assumptions of Linear Regression in R

  • Published 25 Oct 2021
  • How do you test the assumptions for linear regression or multiple regression in R? This video tutorial shows you how to test the necessary regression assumptions in R using R commands based on various packages.
    CONTENT:
    The following procedures for testing regression assumptions are shown here:
    Homoscedasticity
    Homoscedasticity in a linear regression with R can be tested both graphically (scatter plot of predicted values and residuals) and with a hypothesis test, e.g. the Breusch-Pagan test.
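A minimal sketch of both checks, assuming the built-in mtcars data and an illustrative model (mpg ~ wt + hp) in place of the video's data set. The hand-computed statistic is the studentized (Koenker) variant of the Breusch-Pagan test, which should match the default of bptest() in the lmtest package; the lmtest call only runs if that package is installed.

```r
# Illustrative model on built-in data (an assumption; not the video's data set)
reg.fit <- lm(mpg ~ wt + hp, data = mtcars)

# Graphical check: residuals against predicted values.
# A funnel or fan shape would suggest heteroscedasticity.
plot(fitted(reg.fit), resid(reg.fit),
     xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Studentized Breusch-Pagan statistic by hand:
# regress the squared residuals on the predictors and use n * R^2,
# which is compared against a chi-squared distribution (df = number of predictors).
aux <- lm(I(resid(reg.fit)^2) ~ wt + hp, data = mtcars)
bp_stat <- nobs(reg.fit) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_p))

# The same test via lmtest, if available
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(reg.fit))
}
```

A small p-value speaks against homoscedasticity; the plot helps to judge how strong the violation is.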
    Normal distribution
    Normality, or more precisely the normality of the residuals, can be checked in R in several graphical and numerical ways.
    Graphical tests:
    - Histogram for normal distribution
    - QQ plot
    Numerical tests:
    - Shapiro-Wilk test
    - Skewness and Kurtosis
    - D'Agostino test and Anscombe-Glynn test (for skewness and kurtosis, respectively)
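The checks listed above can be sketched as follows, again assuming an illustrative model on the built-in mtcars data. Histogram, QQ plot and Shapiro-Wilk test are base R; skewness, kurtosis and the two related tests are taken here from the moments package (an assumption about which package is meant) and only run if it is installed.

```r
# Illustrative model (an assumption; not the video's data set)
reg.fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- resid(reg.fit)

# Graphical checks
hist(res, breaks = 10, main = "Histogram of residuals", xlab = "Residuals")
qqnorm(res)
qqline(res)

# Numerical check: Shapiro-Wilk test (null hypothesis: normality)
sw <- shapiro.test(res)
print(sw)

# Skewness, kurtosis and the corresponding tests from the moments package
if (requireNamespace("moments", quietly = TRUE)) {
  print(moments::skewness(res))       # about 0 for a normal distribution
  print(moments::kurtosis(res))       # about 3 for a normal distribution
  print(moments::agostino.test(res))  # D'Agostino test of skewness
  print(moments::anscombe.test(res))  # Anscombe-Glynn test of kurtosis
}
```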
    Linearity
    When testing linearity in R, you can look at scatterplots. You can also run a hypothesis test for linearity, e.g. the Rainbow test.
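A minimal sketch of both approaches, assuming the built-in mtcars data and an illustrative model. The lowess smoother added to each scatterplot is an optional extra, not something the description prescribes; the Rainbow test is raintest() from the lmtest package and only runs if that package is installed.

```r
# Illustrative model (an assumption; not the video's data set)
reg.fit <- lm(mpg ~ wt + hp, data = mtcars)
predictors <- c("wt", "hp")

# One scatterplot per predictor against the criterion,
# with a lowess curve to make curvature easier to see
for (v in predictors) {
  plot(mtcars[[v]], mtcars$mpg, xlab = v, ylab = "mpg")
  lines(lowess(mtcars[[v]], mtcars$mpg), col = "red")
}

# Rainbow test (null hypothesis: the linear specification fits)
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::raintest(reg.fit))
}
```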
    Multicollinearity
    Checking the absence of strong multicollinearity using Variance Inflation Factors (VIF).
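As a sketch of what a VIF measures: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R^2) of that auxiliary regression. The example assumes the built-in mtcars data and an illustrative three-predictor model. The packaged versions, vif() from car and ols_vif_tol() from olsrr, only run if those packages are installed.

```r
# Illustrative model (an assumption; not the video's data set)
reg.fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
predictors <- c("wt", "hp", "disp")

# VIF by hand: regress each predictor on the remaining ones
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(vif_manual)

# Packaged alternatives, if available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(reg.fit))
}
if (requireNamespace("olsrr", quietly = TRUE)) {
  print(olsrr::ols_vif_tol(reg.fit))
}
```

Which VIF value counts as too high is a convention; thresholds such as 5 or 10 are commonly cited rules of thumb.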
    Outliers
    There are many diagnostic procedures for detecting outliers in R. I mainly use the following:
    - Studentized residuals
    - Cook's distance
    - Leverage values
    - DFBETAS
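All four diagnostics are available in base R. The sketch below assumes the built-in mtcars data and an illustrative model; the cutoffs are common rules of thumb chosen for illustration, not necessarily the ones used in the video.

```r
# Illustrative model (an assumption; not the video's data set)
reg.fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(reg.fit)
k <- length(coef(reg.fit)) - 1  # number of predictors

# Case-wise diagnostics
infl <- data.frame(
  stud_res = rstudent(reg.fit),        # (externally) studentized residuals
  cooks_d  = cooks.distance(reg.fit),  # Cook's distance
  leverage = hatvalues(reg.fit)        # leverage (hat) values
)
dfb <- dfbetas(reg.fit)                # one column per coefficient

# Flag cases using rule-of-thumb cutoffs (assumed, not hard limits)
flagged <- infl[abs(infl$stud_res) > 3 |
                infl$cooks_d > 4 / n |
                infl$leverage > 2 * (k + 1) / n, ]
print(flagged)

# Cases with a large standardized influence on any coefficient
print(dfb[apply(abs(dfb) > 2 / sqrt(n), 1, any), , drop = FALSE])
```

Flagged cases are candidates for a closer look, not automatic exclusions.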
    Two further assumptions, uncorrelated (independent) residuals and appropriate scale levels (metric variables; predictors may also be binary), have to be checked by looking at the study design.
    Companion webpage to this tutorial with all the R code examples used in the video:
    www.regorz-statistik.de/en/r_t...

COMMENTS • 25

  • @timmytesla9655 4 months ago +2

    This is gold. I searched everywhere and this is the best video I have seen on this topic. Thank you!

  • @bepultalimschool5171 5 months ago +1

    Thank you very much. I am from Uzbekistan. I am learning R. This video is very useful to me.

  • @salmaaita8967 1 year ago +1

    This is by far one of the best videos addressing the validation of the regression assumption. Thank you so much for making it easy and simple to understand!

  • @matteoframba1854 2 years ago +1

    Your explanation is INCREDIBLE, exactly how it's supposed to be.
    You have a new sub.
    Great job!

  • @hannahantonia2540 2 years ago +1

    Very helpful and well explained video! Dankeschön :)

  • @jacekbuczny4567 2 years ago +1

    Excellent! Thank you.

  • @haoranxi3319 2 years ago

    Terrific video! Thanks!! Do you know how to label those outliers in the plots? I ran the same code, but the outliers in my plot are not labelled as shown in this video.

  • @popps6402 1 year ago

    Hey, great video and everything basically worked. However, I have a model with gender as the independent variable (0 and 1) and rapport level as the dependent variable (scale data). That makes some of the graphs look kinda weird, since all the dots are exclusively on 0 and 1, and this code does not work with a categorical IV:
    ols_vif_tol(reg.fit)
    Do you have advice on that?

    • @RegorzStatistik 1 year ago

      For a binary IV you don't have to check the linearity assumption. Nonlinearity can only be an issue with more than two values for the IV.

  • @lisakrijnen2453 2 years ago +1

    Very useful video! Thanks! I have one question: I am running a moderated mediation model using the Hayes PROCESS method (model 7). Do I have to check these assumptions as well? And if so, do I make a reg.fit model using lm with (DV ~ IV 1 + IV 2....+ W + M1 +M2 +M3)? So, would I add my mediators and moderators as independent variables and run lm so that I can check all assumptions? Many thanks in advance!

    • @RegorzStatistik 2 years ago +1

      Here is a video about assumption checks with a PROCESS model:
      ua-cam.com/video/D6t5TZ0-j6g/v-deo.html
      So, yes, you could replicate the different regression models PROCESS calculates in R and check the regression assumptions there. With 3 Mediators I think you would get 4 regression models (one for the moderated prediction of each mediator and one for the prediction of the DV), you would have to check those regarding the regression assumptions.
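A rough sketch of that idea, assuming simulated data and hypothetical variable names (X, W, M1 to M3, Y) with a single independent variable. It also assumes the usual layout of PROCESS model 7, in which the moderator W enters only the mediator equations. The four models are fitted with lm() and can then be passed to the same assumption checks one by one.

```r
# Simulated data with hypothetical variable names (assumptions for illustration)
set.seed(1)
n <- 200
dat <- data.frame(X = rnorm(n), W = rnorm(n))
dat$M1 <- 0.5 * dat$X + 0.3 * dat$X * dat$W + rnorm(n)
dat$M2 <- 0.4 * dat$X + rnorm(n)
dat$M3 <- 0.2 * dat$W + rnorm(n)
dat$Y  <- 0.3 * dat$X + 0.4 * dat$M1 + 0.2 * dat$M2 + rnorm(n)

# One moderated regression per mediator, plus one regression for the DV
fits <- list(
  M1 = lm(M1 ~ X * W, data = dat),
  M2 = lm(M2 ~ X * W, data = dat),
  M3 = lm(M3 ~ X * W, data = dat),
  Y  = lm(Y ~ X + M1 + M2 + M3, data = dat)
)

# Example: run the same residual check on each of the four models
sw_p <- sapply(fits, function(f) shapiro.test(resid(f))$p.value)
print(sw_p)
```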

    • @lisakrijnen2453 2 years ago

      @RegorzStatistik Many thanks! That video is very useful as well! One thing I still wonder about is the linearity assumption. Is this supposed to be checked by plotting the IV on the x-axis and the DV on the y-axis, or do you have to look at the residuals? On some websites I see the first, but sometimes it's only about the residuals.

    • @RegorzStatistik 2 years ago +1

      @lisakrijnen2453 For a simple regression you could look at the residuals, too. But for a multiple regression I prefer bivariate scatterplots (one predictor + criterion), because with the residuals alone it's hard to spot which predictor has a nonlinear relationship to the criterion variable.

    • @lisakrijnen2453 2 years ago +1

      @RegorzStatistik Ok thanks! I have some trouble interpreting my scatterplots, as they are not clearly linear. Do you know whether there are tests / other ways to assess whether it is a linear relationship?

    • @RegorzStatistik 2 years ago +1

      There is the Rainbow test (0:05:47 in this video), a significance test for linearity. It is part of the lmtest package in R.
      cran.r-project.org/web/packages/lmtest/lmtest.pdf

  • @ahmetjeyhunov4435 9 months ago

    Thank you for the excellent video. Great explanation. I have a question. You ran diagnostic tests for an entire multiple regression model, which has multiple IVs. Aren't we supposed to run separate diagnostic tests for each IV-DV relationship in a model, instead of running diagnostic tests for the entire set of variables in a model at once? I have checked the other tutorials online, and they applied the former approach. Or is it a matter of preference?

    • @RegorzStatistik 9 months ago +1

      When it comes to assumptions concerning the residuals (normality, homoskedasticity), you have to test the entire multiple regression model that you use, in the end, for your hypothesis tests. For linearity, however, bivariate scatterplots can make sense.

    • @ahmetjeyhunov4435 9 months ago

      Thank you for the quick reply. My multivariate model (including control variables) doesn't violate any of the assumptions. The moment I regress the DV on the individual IVs from the multivariate model and conduct diagnostics for them, the results come out awful. The residuals are not normal, and the variance is not constant. This work is for my graduation thesis. Do you think an inferential analysis from such a case would still be reliable?

    • @RegorzStatistik 9 months ago +1

      @ahmetjeyhunov4435 One crucial implicit assumption for a regression analysis is that you have included all relevant predictors. So, it is quite possible that with a single regression you get a violation of assumptions.
      Example:
      Gender (m/f) as a covariate has a significant influence. If you don't control for gender, then it is quite likely that the distribution of the residuals will be nonnormal because this could lead to a bimodal distribution of the residuals (one peak for males, the other for females).
      I haven't seen any academic literature yet that suggests that you should test the assumptions for a multiple regression by looking at the single regressions using only one IV each.
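The gender example can be illustrated with a small simulation. This is a sketch under assumed values (a large gender effect of 4 residual standard deviations, 200 cases per group), not an analysis of real data: the same outcome is modelled once without and once with the gender covariate.

```r
# Simulated data (all values are assumptions chosen for illustration)
set.seed(42)
n <- 400
gender <- rep(c(0, 1), each = n / 2)       # binary covariate
x <- rnorm(n)                              # metric predictor
y <- 1 + 0.5 * x + 4 * gender + rnorm(n)   # strong gender effect

fit_without <- lm(y ~ x)            # gender omitted
fit_with    <- lm(y ~ x + gender)   # gender included

# Residual histograms side by side: two peaks are expected when gender is omitted
par(mfrow = c(1, 2))
hist(resid(fit_without), breaks = 30, main = "Gender omitted", xlab = "Residuals")
hist(resid(fit_with), breaks = 30, main = "Gender included", xlab = "Residuals")
par(mfrow = c(1, 1))

# Shapiro-Wilk p-values for both sets of residuals
p_without <- shapiro.test(resid(fit_without))$p.value
p_with    <- shapiro.test(resid(fit_with))$p.value
print(c(omitted = p_without, included = p_with))
```

With gender omitted, its effect ends up in the residuals, which is why the assumption checks belong to the full model rather than to single-predictor regressions.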

    • @ahmetjeyhunov4435 9 months ago +1

      @RegorzStatistik Awesome! Thank you so much!

  • @bepultalimschool5171 5 months ago

    I want to write an article for Scopus. If someone knows Econometrics/analysis well, we will write together. I will write the rest.

  • @krishnaiyer2556 2 years ago

    Sir, this video runs on a data set? Where is the data, sir?

    • @RegorzStatistik 2 years ago

      Unfortunately, that data set is not publicly available.