Multicollinearity with R
- Published 6 Oct 2024
- Includes:
what is multicollinearity?
what problems does it create?
how to assess its presence or absence?
what is the solution?
use of the variance inflation factor (VIF)
example with R
R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
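As a rough illustration of the variance inflation factor mentioned in the description: the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. The sketch below computes it in base R. The helper name `manual_vif` and the choice of the built-in `longley` data are assumptions made for illustration; the video itself relies on a packaged `vif` function and a different data set.

```r
# Hand-rolled VIF: regress each predictor on the others, then take 1 / (1 - R^2).
# `manual_vif` is a hypothetical helper name, not a function from any package.
manual_vif <- function(predictors) {
  predictors <- as.data.frame(predictors)
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

# longley ships with base R and is a well-known collinear data set
# (chosen here only for illustration).
x <- longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")]
vifs <- manual_vif(x)
print(round(vifs, 2))
```

A VIF close to 1 suggests the predictor is nearly unrelated to the others; the thresholds of 5 and 10 that come up in the comments are rules of thumb rather than hard limits.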
Thank you very much, this is the only video on this topic that actually makes sense to me!
You're very welcome!
Thank you so much Dr Rai. Your Explanation is clear and simple
You are welcome!
Thank you so much Sir, very rarely is there such a crisp explanation of a topic on YouTube
You are most welcome!
Thanks for your nice video. With my data set the function "vif" did not work at first; however, when I switched to the "vif" function from the "car" package, it ran well! Cheers!
Thanks for the update!
Very nice explanation. Thank you Professor.
You are welcome!
Thank you so much for the video Mr. Rai !
Thanks for comments!
Excellent explanation Bharatendra Rai Ji. I would love to watch more of your R tutorial videos with such easy and simple explanations.
Thanks for your comments!
Thank you so much for this information
You are welcome!
Thank you Sir, for your very nice explanation. F.
Thanks for comments!
Thank you very much for the video. I just wish you could have shown how we can go about it if multicollinearity is identified in our data. Thank you very much
You can use PCA for that. Here is the link:
ua-cam.com/video/OowGKNgdowA/v-deo.html
Please make one video, sir, on the use of adjusted R-squared, with an example of a model that has heteroscedasticity and also multicollinearity, and how we overcome them. Your videos are very helpful for students.
If there is a multicollinearity problem, use this link:
ua-cam.com/video/_3xMSbIde2I/v-deo.html
Thank you sir
You are welcome!
Thanks doc!
Welcome!
Thank you very much, sir, You really saved my day!!!! I have one query...
Do we have to develop the desired model before estimating VIF, or is a linear regression model enough even if one is not going to use that model further?
Refer to this playlist for detailed coverage of regression:
ua-cam.com/play/PL34t5iLfZddsiQ9PK2s3cd7LVd2FjOmIp.html
If the p-values of unemployed and military are not significant, do you keep them in the model even if the overall F-statistic is significant? Would love to hear more about this. Great video!
If the p-values are not significant, those variables can be dropped.
Super helpful!
Thanks for comments!
Really helpful, thanks a lot.
Welcome!
Hi Dr. Rai, I really enjoy your videos. Thank you. I have two continuous variables, rcs(Age, 5) and rcs(GRE_score, 6), that I modeled with restricted cubic splines, and now I am getting huge VIF values for each of those variables. Does VIF work with variables modeled with restricted cubic splines, please? Thank you for your important work.
Great video Sir. When can we expect your next R video? Sir, please cover the Naive Bayes classifier this time.
Ok, sure.
Thank you so much Sir.
Here is the one you were looking for:
ua-cam.com/video/RLjSQdcg8AM/v-deo.html
Thank you so much Sir. Thank you very much. Thank thank you :-)
Hi doctor, thank you for this wonderful explanation. I would like to know if it's possible to get the Excel sheet used in this presentation?
It was available within R.
Good one.
Thanks for comments!
I have a dataset that has a lot of variables. When I use vif() command I get around 4 variables with a vif > 5. Do I remove all 4 variables with a high vif before simplifying my model (by removing insignificant variables)?... Or do I remove the variable with the highest vif > 5, refit the model, test vif again, remove the variable with the highest vif > 5, refit the model, repeat until vif < 5?
I've tried both methods and get very different results once I simplify the models.
You can refer to this more detailed coverage:
ua-cam.com/video/ICi8MqvE_40/v-deo.html
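The second approach in the question above (drop the variable with the highest VIF, refit, and repeat) can be sketched in base R as follows. This is only one common way to do it, not necessarily what the linked video recommends; the helper name `drop_high_vif`, the threshold of 5, and the `longley` data are assumptions for illustration.

```r
# Sketch of "drop the worst, recompute, repeat".
# `drop_high_vif` is a hypothetical helper, not part of any package.
drop_high_vif <- function(predictors, threshold = 5) {
  predictors <- as.data.frame(predictors)
  repeat {
    if (ncol(predictors) < 2) break   # VIF needs at least two predictors
    # VIFs equal the diagonal of the inverse of the predictor correlation matrix
    vifs <- setNames(diag(solve(cor(predictors))), names(predictors))
    if (max(vifs) <= threshold) break
    worst <- names(which.max(vifs))
    message("Dropping ", worst, " (VIF = ", round(max(vifs), 1), ")")
    predictors[[worst]] <- NULL
  }
  names(predictors)
}

kept <- drop_high_vif(longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")])
print(kept)
```

Removing one variable at a time matters because VIFs change after every removal: dropping all high-VIF variables at once can discard predictors that would have been fine once their collinear partner was gone, which may explain why the two approaches give different models.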
this was very helpful!!
Thanks!
👌👌👌👌👌👌
Thanks!
best video.
Many many thanks
Sir, can you make a video on the Durbin-Watson test for autocorrelation?
Thanks for the suggestion!
Very helpful
Thanks!
Sir, suppose we get VIF > 10 here. Then we have to use only one variable out of these two, right? And then we write the model?
Yes, you can use one of them and re-run the model. For more details refer to:
ua-cam.com/video/ICi8MqvE_40/v-deo.html
@bkrai thank you so much sir
You are welcome!
Great
Thanks!
Sir, can you please do a video on panel data or longitudinal data analysis?
do you know why vif() doesn't work? I downloaded the car package but it's still not working
After downloading make sure to run the library line.
@bkrai thank you!
Why is year removed?
You can try running with year too.
Hi. Your video has helped me a lot, but I would like to ask a question. What if all independent variables in the model are highly significant, with 3 asterisks each, but the multiple R-squared is very low, like 0.03813? Is the model still acceptable? Thanks.
It can happen. For detailed coverage, see this playlist:
ua-cam.com/video/s23CMIjfwHk/v-deo.html
Please clear my doubt.
There is a 0.67 correlation between birth and marriage, which is significantly high. Why is the VIF not coming out high for them?
Usually, correlation coefficients of 0.95 or more are the ones high enough to cause multicollinearity issues.
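A quick calculation helps show why a correlation of 0.67 does not by itself produce a large VIF. With exactly two predictors, the R^2 from regressing one on the other is the squared correlation, so VIF = 1 / (1 - r^2). The sketch below evaluates this for a few values of r; the function name `pairwise_vif` is made up for illustration.

```r
# Two-predictor case only: VIF = 1 / (1 - r^2), where r is the pairwise correlation.
# `pairwise_vif` is a hypothetical helper name.
pairwise_vif <- function(r) 1 / (1 - r^2)

r <- c(0.67, 0.90, 0.95)
print(round(pairwise_vif(r), 2))
# → 1.81 5.26 10.26
```

In a model with more than two predictors the actual VIF depends on all the other predictors together, so these values are lower bounds rather than the numbers shown in the video. Even so, the calculation shows that r = 0.95 lines up with the common VIF > 10 rule of thumb, while r = 0.67 stays well below it.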
Thank you Sir, very nice explanation.
I have a question, Sir. For example, suppose my data has 5 independent variables, x1 to x5, and I got VIFs of x1 (1.9), x2 (34.25), x3 (12.75), x4 (7.6), and x5 (10.85).
So I have to choose x1, x2 and x4, is that correct? Can you please guide me, Sir?
You can check which variable has high correlation with x2. You may decide to drop x2 or the variable that's highly correlated with x2.
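The check suggested in the reply above (finding which variable is most correlated with the high-VIF one) could look roughly like this in base R. The commenter's x1 to x5 data is not available, so the built-in `longley` data stands in for it, with `GNP` playing the role of x2; both choices are assumptions for illustration.

```r
# Find the predictor most strongly correlated with a chosen high-VIF predictor.
x <- longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")]
cors <- cor(x)

target <- "GNP"   # stand-in for the commenter's x2
others <- cors[target, setdiff(colnames(cors), target)]
partner <- names(which.max(abs(others)))

print(round(others, 3))
cat("Most correlated with", target, ":", partner, "\n")
```

Whether to drop the target or its partner is a judgment call; subject-matter relevance usually decides it more than the numbers do.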
Thanks Sir. I will get back to you if I have any queries. Thanks once again
Hello sir, in a real-time project I have categorical variables with many levels (50+) in a few of the variables. My data has 50 variables, of which 20 are nominal/categorical. I have dummy-coded these categorical variables. In this scenario, how do I check collinearity among numerical and categorical variables at once? As there are many variables, it is not feasible for me to do a chi-squared test or ANOVA on each of them.
Hi Sir, I am unable to install library(faraway) and not able to find the divusa dataset. Kindly suggest.
Error: unexpected symbol in "fit
Following line doesn't look complete, that's why there is an error:
fit
How do I check this for logistic regression, i.e., among categorical and continuous variables?
Check this for when to use logistic regression:
ua-cam.com/video/EV5N-pIdvJo/v-deo.html
@bkrai Sir, I was asking about developing a logistic regression.
1. How to perform collinearity diagnostics if the dependent variable is categorical and the independent variables are categorical and continuous?
2. How to perform correlation analysis when selecting independent variables with respect to the dependent variable?
Kindly help me.
Can you help with how to do this in SPSS? It would be much more beneficial to me.
You can decide to keep or exclude a variable based on p-values. You don’t need correlation analysis for categorical variables. If you have many categorical variables, I would suggest using random forest.
@bkrai Is a check for multicollinearity not required, sir?
I have one categorical dependent variable, 7 categorical independent variables, and 2 continuous variables, and I prefer to carry out logistic regression. What must be the first step, sir?
@bkrai For logistic regression, to check the association between independent categorical variables,
what is the test that is to be implemented?
Which package does the vif function come from?
+boopesh jaya balaji Package name is faraway.
The vif function is not running in my RStudio
I just now ran these lines and it runs fine:
library(faraway)
data("divusa")
data
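Since several comments report trouble getting `vif` to run, here is a hedged, fuller sketch of how the lines above might be used end to end. The model formula is an assumption pieced together from the variables mentioned in the comments (with `year` left out); it is not copied from the video. The `requireNamespace` guard simply avoids an error when the package has not been installed yet.

```r
# Install once if needed: install.packages("faraway")
if (requireNamespace("faraway", quietly = TRUE)) {
  library(faraway)   # attaching the package makes vif() and divusa available
  data("divusa")

  # Assumed model for illustration: divorce rate on the other variables, without year
  fit <- lm(divorce ~ unemployed + femlab + marriage + birth + military,
            data = divusa)
  print(vif(fit))
} else {
  message("Package 'faraway' is not installed; run install.packages(\"faraway\") first.")
}
```

If both faraway and car are attached, whichever was loaded last masks the other's `vif`; writing `faraway::vif(fit)` or `car::vif(fit)` makes the choice explicit.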
Hi can i get the dataset to practice
The data used is included with the faraway package in R. When you run the code, it will become available.
don't give examples with non-examples. thx
worst lecture ever
Sorry to hear that you didn’t find it useful, but thanks for feedback!