Three reasons not to use drop='first' with OneHotEncoder

  • Published 11 Jan 2025

COMMENTS • 17

  • @dataschool  3 years ago +5

    Thanks for watching! 🙌 If you're new to OneHotEncoder, you may want to watch this video as well: ua-cam.com/video/0w78CHM_ubM/v-deo.html

    • @eatbreathedatascience9593 3 years ago

      Does this mean I should also avoid drop='if_binary' and passing an array to drop? Thanks very much!

  • @puzobaklan 3 years ago +2

    Very useful tip! Thank you! 👍

  • @rishidixit7939 1 month ago

    Why would I use StandardScaler on a categorical column? Also, if I use StandardScaler on the numerical columns but not on the columns I applied OneHotEncoder to, can I then drop the column?

  • @rishidixit7939 1 month ago

    This is mostly an issue with logistic regression, linear regression, and linear SVMs. So is using drop important in those cases? How can scikit-learn prevent an error there?

  • @rishidixit7939 1 month ago

    Also, does not dropping columns affect the interpretability of the model? I don't know what that means, so I'm just asking what it means.

  • @sebastianweiler3997 2 years ago

    Thanks a lot for this and your other videos! But what's the right way to deal with this issue when using unregularized regression? I need to drop one category because of multicollinearity, but I don't want my unknown category to be encoded the same way as my base category. Please help me out. Thank you!

    • @dataschool  2 years ago

      If you set handle_unknown to 'error', then this won't be a problem. Hope that helps!
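A minimal sketch of the behavior described in this reply, assuming a hypothetical one-column toy dataset (the color values and variable names are made up for illustration). With drop='first', the baseline category encodes to all zeros, and with handle_unknown='error' a category not seen during fit raises an error rather than being encoded like the baseline:

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column with three known categories.
train = [["red"], ["green"], ["blue"], ["green"]]
unseen = [["purple"]]  # category that never appears during fit

# Categories are sorted, so drop='first' removes 'blue' as the baseline.
# handle_unknown='error' is the default; it is spelled out here for clarity.
encoder = OneHotEncoder(drop="first", handle_unknown="error")
encoder.fit(train)

# The baseline category is encoded as a row of zeros.
baseline_row = encoder.transform([["blue"]]).toarray()
print(baseline_row)

# An unknown category is rejected instead of being encoded as all zeros.
try:
    encoder.transform(unseen)
    unknown_raised = False
except ValueError as exc:
    unknown_raised = True
    print(f"Unknown category rejected: {exc}")
```

The trade-off is that prediction fails outright on unseen categories, so this only suits cases where an error is preferable to a silent baseline encoding.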

  • @21Gannu 3 years ago

    I think after watching your video on effective machine learning method, I now know why not. As you discussed there, I usually let my grid search decide it.
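One way the "let the grid search decide" idea might look in code. This is only a sketch under stated assumptions: the toy data, the pipeline, and the choice of LogisticRegression are hypothetical and are not taken from the video or the comment. The drop setting of OneHotEncoder is treated as a hyperparameter and tuned alongside the model:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: 30 rows, one categorical feature, binary target.
colors = ["red", "green", "blue"] * 10
X = [[c] for c in colors]
y = [1 if c == "red" else 0 for c in colors]

# make_pipeline names the steps after the lowercased class names,
# so the encoder step is called 'onehotencoder'.
pipe = make_pipeline(OneHotEncoder(), LogisticRegression())

# Treat the encoder's drop setting as a hyperparameter to search over.
param_grid = {"onehotencoder__drop": [None, "first"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

On data this small both settings are likely to score the same, so the selected value carries no meaning here; the point is only the mechanics of exposing drop to the search.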

  • @MrTulufan 3 years ago

    Does this also apply to a regular logistic regression (not regularized)? I don't think the model would converge with perfectly correlated dummy variables. How does sklearn handle this?

    • @dataschool  3 years ago

      In scikit-learn, logistic regression is regularized by default.
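A small sketch illustrating this reply, using hypothetical toy data. LogisticRegression defaults to C=1.0, where C is the inverse regularization strength, so a default model is penalized and can still be fit on a full one-hot encoding whose columns always sum to 1 (and are therefore collinear with the intercept):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature, binary target.
X_raw = [["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]]
y = np.array([1, 0, 0, 0, 1, 1])

# Full one-hot encoding with no dropped column.
X = OneHotEncoder().fit_transform(X_raw).toarray()

# Every row sums to 1, so the columns are collinear with the intercept.
row_sums = X.sum(axis=1)

# Default settings: C=1.0, i.e. regularization is switched on.
model = LogisticRegression()
model.fit(X, y)

default_C = model.C
coef_finite = bool(np.all(np.isfinite(model.coef_)))
print(default_C, coef_finite)
```

How regularization is switched off depends on the scikit-learn version, so that part is deliberately left out of the sketch.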

  • @aytekin8669 3 years ago

    Thank you!