Thanks for the video, it's so helpful. One question: why can we let multicollinearity exist in a prediction analysis? Won't it add redundant dimensions and thus lead to poor performance on the test set?
Can you please explain in detail what you mean by 'restrict the scope of the model to coincide with the range of predictor variables that exhibit the same pattern of multicollinearity'?
Great talk. Quick question: when performing ridge regression, are we changing one x value to sqrt(k), or lots of them? You mentioned changing the x values diagonally...
If we have p regressor variables in the model, then we add p new "data points" to our data set, each with a response value of 0. The first one will have a value of sqrt(k) for the first regressor, then zero for all other regressors. The second one will have a value of sqrt(k) for the second regressor, then zero for all other regressors. Etc. This looks like we are adding a diagonal matrix to our data set.
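A minimal NumPy sketch of this augmentation trick (variable names and the simulated data are my own, for illustration): running ordinary least squares on the data set augmented with sqrt(k) times the identity matrix, plus p zero responses, reproduces the ridge solution exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 3, 2.0

# Simulated data (hypothetical example)
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

# Ridge solution computed directly: (X'X + kI)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Same solution via augmentation: append p fake "data points",
# sqrt(k) on the diagonal and 0 elsewhere, with response 0,
# then run ordinary least squares on the stacked data.
X_aug = np.vstack([X, np.sqrt(k) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_ols_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_ols_aug))  # True
```

The identity works because the squared residuals on the p fake points sum to k times the squared coefficients, i.e. exactly the ridge penalty.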
I have a question I can't find the answer to anywhere: if you have 2 predictor variables with perfect collinearity (one is a linear combination of the other), how can ridge regression decide which one to use and which one to get rid of, since both are in essence the same variable?
With PERFECT collinearity, there is nothing you can do. But that doesn't happen in real life (except possibly from a coding error, which is fixed by correcting that error).
Ridge regression does not perform subset selection, so it will not tell you which predictor to use. Lasso regression, on the other hand, can shrink coefficients exactly to 0 and can therefore perform subset selection. If you want to mix the benefits of the lasso and ridge, you could look at the elastic net: it performs subset selection but also has a grouping effect, whereas the lasso will often pick just one variable from a group of highly correlated covariates.
Thank you so much, it's quite helpful!
That makes sense. Cheers, Chris!