Multicollinearity with R

  • Published 6 Oct 2024
  • Includes:
    What is multicollinearity?
    What problems does it create?
    How to assess its presence or absence?
    What is the solution?
    Use of the variance inflation factor (VIF)
    Example with R
    R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
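
The topics listed above can be tried out with a minimal sketch in R that computes each variance inflation factor from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It assumes the built-in longley data set as a stand-in (the video itself uses divusa from the faraway package), and the name vif_manual is illustrative only.

```r
# Assumption: longley (shipped with base R) stands in for the video's data.
data(longley)
predictors <- setdiff(names(longley), "Employed")

# VIF_j = 1 / (1 - R_j^2): regress each predictor on all the others
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit_p <- lm(reformulate(others, response = p), data = longley)
  1 / (1 - summary(fit_p)$r.squared)
})

print(round(vif_manual, 1))
```

A VIF is never below 1. Common rules of thumb, such as the 5 and 10 mentioned in the comments, flag larger values as a sign that a predictor is largely explained by the others.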

COMMENTS • 98

  • @jbracing6885
    @jbracing6885 3 years ago +1

    Thank you very much, this is the only video on this topic that actually makes sense to me!

    • @bkrai
      @bkrai 3 years ago

      You're very welcome!

  • @mariamadam9626
    @mariamadam9626 1 year ago +1

    Thank you so much, Dr. Rai. Your explanation is clear and simple.

    • @bkrai
      @bkrai 1 year ago

      You are welcome!

  • @Nairanugrah
    @Nairanugrah 4 years ago +1

    Thank you so much, Sir. Very rarely is there such a crisp explanation of a topic on UA-cam.

    • @bkrai
      @bkrai 4 years ago +1

      You are most welcome!

  • @reuberlanaantoniazzijunior9069
    @reuberlanaantoniazzijunior9069 5 years ago +3

    Thanks for your nice video. With my data set, the "vif" function did not work at first; however, when I switched to the "vif" function from the "car" package, it ran well! Cheers!

    • @bkrai
      @bkrai 5 years ago

      Thanks for the update!

  • @sudiptapaul2919
    @sudiptapaul2919 4 years ago +1

    Very nice explanation. Thank you, Professor.

    • @bkrai
      @bkrai 4 years ago

      You are welcome!

  • @berfintas931
    @berfintas931 4 years ago +1

    Thank you so much for the video, Mr. Rai!

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @abhi250482
    @abhi250482 5 years ago +1

    Excellent explanation, Bharatendra Rai Ji. I would love to watch more of your R tutorial videos with such easy and simple explanations.

    • @bkrai
      @bkrai 5 years ago

      Thanks for your comments!

  • @devyaninitturkar8662
    @devyaninitturkar8662 3 years ago +1

    Thank you so much for this information.

    • @bkrai
      @bkrai 3 years ago

      You are welcome!

  • @franciscojavieralvarezapar8894
    @franciscojavieralvarezapar8894 5 years ago +2

    Thank you, Sir, for your very nice explanation. F.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @thabangsetlalekgomo8947
    @thabangsetlalekgomo8947 5 years ago +2

    Thank you very much for the video. I just wish you had shown how to proceed if multicollinearity is identified in our data. Thank you very much.

    • @bkrai
      @bkrai 5 years ago +2

      You can use PCA for that. Here is the link:
      ua-cam.com/video/OowGKNgdowA/v-deo.html

  • @Boyzzzzzzzzzzz
    @Boyzzzzzzzzzzz 4 years ago +1

    Please, sir, make one video on the use of adjusted R-squared, with an example of a model that has heteroscedasticity and multicollinearity and how to overcome them. Your videos are very helpful for students.

    • @bkrai
      @bkrai 4 years ago

      If there is a multicollinearity problem, use this link:
      ua-cam.com/video/_3xMSbIde2I/v-deo.html

  • @prakritisingh3798
    @prakritisingh3798 3 years ago +1

    Thank you, sir.

    • @bkrai
      @bkrai 3 years ago

      You are welcome!

  • @johannilsson410
    @johannilsson410 3 years ago +1

    Thanks doc!

    • @bkrai
      @bkrai 3 years ago

      Welcome!

  • @navdeepagrawal7819
    @navdeepagrawal7819 2 years ago +1

    Thank you very much, sir. You really saved my day! I have one query:
    Do we have to develop the desired model before estimating VIF, or is a linear regression model enough even if one is not going to use that model further?

    • @bkrai
      @bkrai 2 years ago

      Refer to this playlist for detailed coverage of regression:
      ua-cam.com/play/PL34t5iLfZddsiQ9PK2s3cd7LVd2FjOmIp.html

  • @esperanzazagal7241
    @esperanzazagal7241 3 years ago +2

    If the p-values of unemployed and military are not significant, do you keep them in the model even if the overall F-statistic is significant? Would love to hear more about this. Great video!

    • @bkrai
      @bkrai 3 years ago

      If the p-values are not significant, they can be dropped.

  • @Jayy_Arra
    @Jayy_Arra 4 years ago +1

    Super helpful!

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @yifengli482
    @yifengli482 7 years ago +1

    Really helpful, thanks a lot.

    • @bkrai
      @bkrai 3 years ago

      Welcome!

  • @tmuffly1
    @tmuffly1 5 years ago

    Hi Dr. Rai, I really enjoy your videos. Thank you. I have two continuous variables, rcs(Age, 5) and rcs(GRE_score, 6), that I fitted with restricted cubic splines, and now I am getting huge VIF values for each of those variables. Does VIF work with variables that have restricted cubic splines, please? Thank you for your important work.

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +2

    Great video, Sir. When can we expect your new video on R? Sir, please cover the Naive Bayes classifier this time.

    • @bkrai
      @bkrai 6 years ago

      Ok, sure.

    • @flamboyantperson5936
      @flamboyantperson5936 6 years ago +1

      Thank you so much, Sir.

    • @bkrai
      @bkrai 6 years ago

      Here is the one you were looking for:
      ua-cam.com/video/RLjSQdcg8AM/v-deo.html

    • @flamboyantperson5936
      @flamboyantperson5936 6 years ago +1

      Thank you so much Sir. Thank you very much. Thank thank you :-)

  • @kdquran3709
    @kdquran3709 2 years ago +1

    Hi doctor, thank you for this wonderful explanation. I would like to know if it's possible to get the Excel sheet used in this presentation.

    • @bkrai
      @bkrai 2 years ago

      It was available within R.

  • @mauanu100
    @mauanu100 5 years ago +1

    Good one.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @tiannadermody4761
    @tiannadermody4761 2 years ago +1

    I have a dataset that has a lot of variables. When I use the vif() command, I get around 4 variables with a VIF > 5. Do I remove all 4 variables with a high VIF before simplifying my model (by removing insignificant variables)? Or do I remove the variable with the highest VIF > 5, refit the model, test VIF again, remove the variable with the highest VIF > 5, refit the model, and repeat until every VIF < 5?
    I've tried both methods and get very different results once I simplify the models.

    • @bkrai
      @bkrai 2 years ago

      You can refer to this more detailed coverage:
      ua-cam.com/video/ICi8MqvE_40/v-deo.html
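
The second approach described in the question (drop the predictor with the highest VIF, recompute, repeat) can be sketched in base R as follows. This is only an illustration under stated assumptions: the function names vif_values and drop_high_vif are hypothetical, the built-in longley data stands in for the commenter's data, all predictors are numeric, and the threshold of 5 is taken from the question. For numeric predictors, the VIFs equal the diagonal of the inverse of the predictors' correlation matrix.

```r
# Hypothetical helper: VIFs as the diagonal of the inverse correlation matrix
# (valid for numeric predictors only).
vif_values <- function(data, predictors) {
  X <- data[, predictors, drop = FALSE]
  setNames(diag(solve(cor(X))), predictors)
}

# Hypothetical helper: drop the highest-VIF predictor one at a time,
# recomputing all VIFs after every removal.
drop_high_vif <- function(data, predictors, threshold = 5) {
  while (length(predictors) > 1) {
    v <- vif_values(data, predictors)
    if (max(v) < threshold) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

data(longley)  # assumption: stand-in data set
kept <- drop_high_vif(longley, setdiff(names(longley), "Employed"))
print(kept)
print(round(vif_values(longley, kept), 2))
```

Because the VIFs of the remaining predictors change every time one is removed, dropping all flagged variables at once and dropping them one by one can lead to different final models, which is consistent with what the commenter observed.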

  • @homicide58halo
    @homicide58halo 6 years ago +1

    This was very helpful!!

    • @bkrai
      @bkrai 3 years ago

      Thanks!

  • @sumanghorai265
    @sumanghorai265 5 years ago +1

    👌👌👌👌👌👌

    • @bkrai
      @bkrai 3 years ago

      Thanks!

  • @mohamedaiman
    @mohamedaiman 4 years ago +1

    Best video.

    • @bkrai
      @bkrai 4 years ago

      Many many thanks

  • @soumyendupaul9556
    @soumyendupaul9556 3 years ago +1

    Sir, can you make a video on the Durbin-Watson test for autocorrelation?

    • @bkrai
      @bkrai 3 years ago

      Thanks for the suggestion!

  • @ankurrattan6406
    @ankurrattan6406 7 years ago +1

    Very helpful

    • @bkrai
      @bkrai 3 years ago

      Thanks!

  • @devyaninitturkar8662
    @devyaninitturkar8662 3 years ago +1

    Sir, suppose we get VIF > 10 here. Then we have to use only one of these two variables, right? And then we have to write the model?

    • @bkrai
      @bkrai 3 years ago +1

      Yes, you can use one of them and re-run the model. For more details, refer to:
      ua-cam.com/video/ICi8MqvE_40/v-deo.html

    • @devyaninitturkar8662
      @devyaninitturkar8662 3 years ago +1

      @@bkrai thank you so much sir

    • @bkrai
      @bkrai 3 years ago

      You are welcome!

  • @drgones545
    @drgones545 5 years ago +1

    Great

    • @bkrai
      @bkrai 5 years ago +1

      Thanks!

  • @dr.naeemhaider4747
    @dr.naeemhaider4747 7 years ago

    Sir, can you please do a video on panel data or longitudinal data analysis?

  • @nez01
    @nez01 1 year ago +1

    Do you know why vif() doesn't work? I downloaded the car package but it's still not working.

    • @bkrai
      @bkrai 1 year ago +1

      After installing the package, make sure to run the library() line.

    • @nez01
      @nez01 1 year ago +1

      @@bkrai thank you!
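
The point made in this reply, that installing a package does not attach it, can be illustrated with a small sketch. It assumes the car package and uses a model on the built-in mtcars data purely as a placeholder; neither the model nor the variable names come from the video.

```r
# Installing is a one-time step; uncomment if car is missing.
# install.packages("car")

pkg <- "car"
car_available <- requireNamespace(pkg, quietly = TRUE)

if (car_available) {
  library(car)                       # must be run in every new R session
  fit <- lm(mpg ~ disp + hp + wt, data = mtcars)  # placeholder model
  print(vif(fit))                    # vif() is found once car is attached
} else {
  message("Package 'car' is not installed; run install.packages(\"car\") first.")
}
```

Both car and faraway define a function named vif, so when both are attached it can be safer to call car::vif(fit) or faraway::vif(fit) explicitly.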

  • @iamyhk850
    @iamyhk850 4 years ago +1

    Why is year removed?

    • @bkrai
      @bkrai 4 years ago

      You can try running with year too.

  • @evansumido6191
    @evansumido6191 3 years ago +1

    Hi. Your video has helped me a lot, but I would just like to ask a question. What if all independent variables in the model are highly significant, with 3 asterisks each, but the multiple R-squared is very low, like 0.03813? Is the model still acceptable? Thanks.

    • @bkrai
      @bkrai 3 years ago

      It can happen. For detailed coverage, see this playlist:
      ua-cam.com/video/s23CMIjfwHk/v-deo.html

  • @rahulbansal7208
    @rahulbansal7208 5 years ago +1

    Please clear my doubt.
    There is a 0.67 correlation between birth and marriage, which is quite high. Why is the VIF not coming out high for them?

    • @bkrai
      @bkrai 5 years ago

      Usually, correlation coefficients of 0.95 or more are high enough to cause multicollinearity issues.
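
A short worked calculation may help connect the correlation in the question to the VIF. In the simplest case of a model with only two predictors, R_j^2 is just the squared pairwise correlation, so VIF = 1 / (1 - r^2). The function name vif_from_r below is illustrative, and the two-predictor setting is an assumption; with more predictors the VIF depends on the multiple R^2 and can be larger.

```r
# Two-predictor case only: VIF as a function of the pairwise correlation r
vif_from_r <- function(r) 1 / (1 - r^2)

r <- c(0.67, 0.90, 0.95)
print(round(vif_from_r(r), 2))
# 1.81  5.26 10.26
```

On this scale, a correlation of 0.67 gives a VIF below 2, while a correlation of about 0.95 is needed to pass the common threshold of 10.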

  • @arunshowri7829
    @arunshowri7829 6 years ago +1

    Thank you Sir, very nice explanation.
    I have a question, Sir. For example, suppose my data has 5 independent variables, x1 to x5, and I got VIF values of x1 (1.9), x2 (34.25), x3 (12.75), x4 (7.6) and x5 (10.85).
    So I have to choose x1, x2 and x4, is that correct? Can you please guide me, Sir?

    • @bkrai
      @bkrai 6 years ago +2

      You can check which variable has a high correlation with x2. You may decide to drop x2 or the variable that's highly correlated with x2.

    • @arunshowri7829
      @arunshowri7829 6 years ago +1

      Thanks Sir. I will get back to you if I have any queries. Thanks once again
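
The suggestion in the reply above (look at what the high-VIF variable is correlated with) can be sketched as follows. The commenter's x1 to x5 are not available, so this assumes the built-in longley data, with GNP playing the role of x2. The names target and partners are illustrative.

```r
data(longley)  # assumption: stand-in for the commenter's data
X <- longley[, setdiff(names(longley), "Employed")]

cors <- cor(X)
target <- "GNP"  # stand-in for the high-VIF variable (x2 in the question)

# Absolute correlations of the target with every other predictor, largest first
partners <- sort(abs(cors[target, colnames(cors) != target]), decreasing = TRUE)
print(round(partners, 3))
```

The predictors at the top of this list are the candidates to drop instead of, or together with, the high-VIF variable.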

  • @modelsscale3211
    @modelsscale3211 7 years ago

    Hello sir, in a real-time project I have categorical variables with multiple levels (50+) in a few of the variables. My data has 50 variables, of which 20 are nominal/categorical. I have created dummy variables for these categorical variables. In this scenario, how do I check for collinearity among the numerical and categorical variables at once? As there are many variables, it is not feasible for me to do a chi-squared test or ANOVA on these variables.

  • @sonalikapanda8398
    @sonalikapanda8398 7 years ago

    Hi Sir, I am unable to install library(faraway) and cannot find the divusa dataset. Kindly suggest a solution.

  • @myknowledgeyourwisdom-bydi5649
    @myknowledgeyourwisdom-bydi5649 4 years ago +1

    Error: unexpected symbol in "fit

    • @bkrai
      @bkrai 4 years ago

      The following line doesn't look complete, that's why there is an error:
      fit

  • @saikiraraju.m.r
    @saikiraraju.m.r 4 years ago +1

    How do I check this for logistic regression, i.e., among categorical and continuous variables?

    • @bkrai
      @bkrai 4 years ago +1

      Check this for when to use logistic regression:
      ua-cam.com/video/EV5N-pIdvJo/v-deo.html

    • @saikiraraju.m.r
      @saikiraraju.m.r 4 years ago +1

      @@bkrai Sir, I was asking about developing a logistic regression model.
      1. How do I perform collinearity diagnostics if the dependent variable is categorical and the independent variables are categorical and continuous?
      2. How do I perform correlation analysis to select variables, relating the dependent variable to the independent variables?
      Kindly help me.
      Could you also show how to do this in SPSS? It would be much more beneficial to me.

    • @bkrai
      @bkrai 4 years ago +1

      You can decide to keep or exclude a variable based on p-values. You don't need correlation analysis for categorical variables. If you have many categorical variables, I would suggest using random forest.

    • @saikiraraju.m.r
      @saikiraraju.m.r 4 years ago +1

      @@bkrai Is a check for multicollinearity not required, sir?
      I have one categorical dependent variable, 7 categorical independent variables, and 2 continuous variables, and I would prefer to carry out logistic regression. What must be the first step, sir?

    • @saikiraraju.m.r
      @saikiraraju.m.r 4 years ago +1

      @@bkrai For logistic regression, which test should be used to check for
      association between the categorical independent variables?

  • @boopeshjayabalaji9787
    @boopeshjayabalaji9787 8 years ago

    Which package does the vif function come from?

    • @bkrai
      @bkrai 8 years ago

      +boopesh jaya balaji The package name is faraway.

  • @akashwaghmare3957
    @akashwaghmare3957 6 years ago

    The vif function is not running in my RStudio.

    • @bkrai
      @bkrai 6 years ago

      I just now ran these lines and it runs fine:
      library(faraway)
      data("divusa")
      data

  • @pavanchunduri4222
    @pavanchunduri4222 5 years ago

    Hi, can I get the dataset to practice with?

    • @bkrai
      @bkrai 5 years ago

      The data used is built into R. When you run the code, it will become available.

  • @georgegl3192
    @georgegl3192 7 years ago +1

    don't give examples with non-examples. thx

  • @chathurangaprabhath1543
    @chathurangaprabhath1543 5 years ago +2

    worst lecture ever

    • @bkrai
      @bkrai 5 years ago

      Sorry to hear that you didn’t find it useful, but thanks for feedback!