Thanks for watching! 🙌 If you're new to OneHotEncoder, you may want to watch this video as well: ua-cam.com/video/0w78CHM_ubM/v-deo.html
Does that mean I should also not use drop='if_binary' or pass a drop array? Thanks very much!!!
Very useful tip! Thank you! 👍
You're very welcome!
Why would I use a StandardScaler on a categorical column? Also, if I use a StandardScaler on the numerical columns but not on the columns to which I applied OneHotEncoder, can I then drop a column?
This is mostly an issue with Logistic Regression, Linear Regression, and Linear SVM. So using drop must be important in those cases? How does scikit-learn prevent an error there?
Also, does not dropping columns affect the interpretability of the model? I don't know whether it does, I'm just asking what it would mean.
Thanks a lot for this and your other videos! But what's the right way to deal with this issue when using unregularized regression? I need to drop one category because of multicollinearity, but I don't want my unknown category to be encoded the same way as my base category. Please help me out. Thank you
If you set handle_unknown to 'error', then this won't be a problem. Hope that helps!
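Here's a minimal sketch of that idea with made-up toy data (the colors are just for illustration, not from the video): with handle_unknown='error', an unseen category raises an error at transform time instead of silently being encoded as all zeros like the dropped base category.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X_train = np.array([['red'], ['green'], ['blue']])
X_new = np.array([['purple']])  # category not seen during fit

# drop='first' removes the base category; handle_unknown='error' (the default)
# makes transform raise an error on unseen categories rather than encoding
# them as all zeros, which would look identical to the base category
ohe = OneHotEncoder(drop='first', handle_unknown='error')
ohe.fit(X_train)

print(ohe.transform(X_train).toarray())
# ohe.transform(X_new)  # raises ValueError: Found unknown categories
```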
I think after watching your video on effective machine learning methods, I now know why not. As you discussed there, I usually let my grid search decide it.....
Glad to hear, Ganesh!
Which video is it?
Does this also apply to a regular logistic regression (not regularized)? I don't think the model would converge with perfectly correlated dummy variables. How does sklearn handle this?
In scikit-learn, logistic regression is regularized by default.
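A quick sketch with toy data (not from the video) shows the default settings: LogisticRegression applies an L2 penalty out of the box, which is what keeps the fit stable even with redundant dummy columns.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# L2 regularization is on by default (penalty='l2', C=1.0), which keeps the
# coefficients identifiable even when the dummy variables are collinear
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
print(clf.penalty, clf.C)  # 'l2' 1.0

# an unregularized fit would be LogisticRegression(penalty=None) in recent
# scikit-learn versions (or penalty='none' in older ones)
```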
Thank you!
You're welcome!