What is Multicollinearity | Data Science Interview Questions and Answers | Thinking Neuron

Поділитися
Вставка
  • Опубліковано 7 вер 2024
  • thinkingneuron...
    Collinearity means a linear relationship between two variables.
    Two variables are perfectly collinear if there is an exact linear relationship between them. For example, V2 = a* V1 +b. If there is such a relation, then V1 and V2 are collinear.
    Multicollinearity refers to a situation in which two or more explanatory variables in a Multiple Regression model are highly linearly related.
    More commonly, the issue of multicollinearity arises when there is an approximately linear relationship between two or more independent variables(Predictors).
    In simple terms Two Predictor variables have a high correlation value will generate Multicollinearity.
    It is bad for R2 value since it inflates it. This happens because the model thinks it is explaining a lot of variances, but it is actually explaining the same variance twice (High Correlation between Predictors).
    Q. How to remove Multicollinearity in Data?
    1. Check the VIF of all the Predictor variables using vif() function from the library(car) in R. OR the variance_inflation_factor() function present in statsmodel lib in python
    2. If any variable Has VIF greater than 5 then remove it from the regression equation.
    3. Re-check the VIF
    4. Repeat Steps 1-3 till all variables have VIF less than 5

КОМЕНТАРІ • 1

  • @VISHALHS-qw1hc
    @VISHALHS-qw1hc 6 місяців тому

    What explain with confidence what is vif other than the abbreviation