Use OrdinalEncoder instead of OneHotEncoder with tree-based models

  • Published 23 Dec 2024

COMMENTS • 14

  • @dataschool 3 years ago +2

    Have you tried OrdinalEncoder with your tree-based model? Let me know how it compares to OneHotEncoder!

    • @sophiazhou9119 2 years ago

      I tried it with a random forest and a decision tree classifier, but the problem with OrdinalEncoder is that the tree might treat the encoded values as real numbers and split on a decimal threshold. How do you deal with that?
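
On the decimal-threshold question above, a plain-Python sketch may help. It is not scikit-learn's code, and the `codes` mapping and `partition` helper are hypothetical names made up for illustration. It assumes the usual convention that a node sends rows with value <= threshold to the left child, and shows that a fractional threshold between two integer codes only decides which categories end up on each side:

```python
# Hypothetical integer codes, like the ones an ordinal encoder might assign.
codes = {"blue": 0, "green": 1, "red": 2}


def partition(threshold):
    """Group categories the way a tree node would: code <= threshold goes left."""
    left = sorted(name for name, code in codes.items() if code <= threshold)
    right = sorted(name for name, code in codes.items() if code > threshold)
    return left, right


# Any threshold strictly between 1 and 2 produces the same grouping,
# so the decimal value itself carries no extra meaning.
for t in (1.2, 1.5, 1.9):
    print(partition(t))  # each prints (['blue', 'green'], ['red'])
```

A single split can only separate categories that are adjacent in code order, so isolating a middle category such as "green" would take two splits in this sketch.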

  • @dhirajkumarsahu999 3 years ago +2

    Yes, this makes sense to me. Models like linear regression assign importance to features through their weights, so one-hot encoding matters for unordered categories when using linear regression. Please correct me if I am wrong.
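
The point about weights can be illustrated with a small stdlib-only sketch. The toy data and all variable names are hypothetical, and it relies on the fact that least squares with one indicator column per category and no intercept gives each category its own weight, equal to that category's mean target. A single weight on an ordinal code, by contrast, forces the categories onto a straight line:

```python
from statistics import mean

# Hypothetical toy data: three unordered categories with target means 10, 30, 20.
categories = ["a", "a", "b", "b", "c", "c"]
y = [10, 10, 30, 30, 20, 20]

# Ordinal codes impose the order a < b < c on a single weight.
code = {"a": 0, "b": 1, "c": 2}
x = [code[c] for c in categories]

# Closed-form simple linear regression of y on the ordinal code.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
ordinal_pred = {c: intercept + slope * k for c, k in code.items()}

# One-hot equivalent: one weight per category, i.e. the per-category mean.
one_hot_pred = {
    c: mean(v for cat, v in zip(categories, y) if cat == c) for c in code
}

print(ordinal_pred)  # a straight line through the codes; misses every category mean
print(one_hot_pred)  # recovers each category mean exactly
```

In this sketch the ordinal fit predicts 15, 20 and 25 for categories whose true means are 10, 30 and 20, while the per-category weights match the means exactly.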

  • @elmoreglidingclub3030 2 years ago

    Very interesting. I’d like to work with this a bit; what is the data set you used?
    I have an interesting data set (~2,300 rows, 13 features) that can give some bizarre accuracy results using a single classification tree but performs much, much better with a random forest. I’ll try ordinal encoding on it and let you know how it performs. Good stuff! Again, please, what is this data set?

    • @dataschool 2 years ago +1

      See here: nbviewer.org/github/justmarkham/scikit-learn-tips/blob/master/notebooks/43_ordinal_encoding_for_trees.ipynb

  • @grzegorzzawadzki8718 3 years ago

    Thanks! That was very helpful.

  • @Dara-lj8rk 3 years ago +1

    Good one, thanks.

  • @alfathterry7215 3 years ago

    interesting...

  • @anandvyavahare2031 3 years ago

    Who on earth even tried it to find it? 😂😂