Feature Hashing for Scalable Machine Learning - Nick Pentreath

  • Published 22 Aug 2024

COMMENTS • 4

  • @siddhawan5190 4 years ago +1

    Can I use it for the city names of a country?

    • @utkarshprakash2723 4 years ago

      I came here searching for a similar type of problem: I have a categorical feature with thousands of categories.
      Did you find something that worked for your purpose?

    • @siddhawan5190 4 years ago

      @utkarshprakash2723 No, I haven't found anything. If you find anything, please tell me.

    • @utkarshprakash2723 4 years ago +1

      @siddhawan5190 So I landed on three options:
      1. Use the top n most frequent categories for one-hot encoding.
      2. Use feature hashing or binary encoding.
      3. Use mean encoding for nominal features (this has a higher chance of overfitting).
      People tend to use these techniques when there are >100 or maybe >1000 categories, but I didn't find any precise rule for choosing among them, and I'm still confused. The only way I know is to use cross-validation to check how each encoding technique affects the end result.
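
For anyone who wants to try the three options on something like city names, here is a rough sketch in plain Python. It is not taken from the talk. The function names, the `<other>` bucket, the bucket count, the SHA-256 hash, and the smoothing constant are all assumptions made for illustration, and libraries such as scikit-learn ship their own implementations of these ideas.

```python
import hashlib
from collections import Counter, defaultdict


def top_n_one_hot(values, n):
    """Option 1: one-hot encode the n most frequent categories.

    Assumption: every other category shares a single "<other>" column.
    """
    top = [cat for cat, _ in Counter(values).most_common(n)]
    columns = top + ["<other>"]
    position = {cat: i for i, cat in enumerate(top)}
    rows = []
    for value in values:
        row = [0] * len(columns)
        row[position.get(value, len(top))] = 1
        rows.append(row)
    return columns, rows


def hash_encode(value, n_buckets=8):
    """Option 2: hashing trick for one categorical value.

    The category is hashed to a bucket index, so the output width is fixed
    no matter how many distinct categories exist. Different categories can
    collide in the same bucket; the +/-1 sign is a common way to let
    collisions partly cancel out. n_buckets=8 is only for readability.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % n_buckets
    sign = 1.0 if digest[4] % 2 == 0 else -1.0
    vector = [0.0] * n_buckets
    vector[index] = sign
    return vector


def mean_encode(values, targets, smoothing=10.0):
    """Option 3: replace each category with a smoothed mean of the target.

    Rare categories are pulled toward the global mean. To limit the
    overfitting mentioned above, fit this on training folds only.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = Counter()
    for value, target in zip(values, targets):
        sums[value] += target
        counts[value] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


# Hypothetical toy data: city names and a made-up binary target.
cities = ["Kyiv", "Lviv", "Kyiv", "Odesa", "Kyiv", "Lviv", "Kharkiv"]
clicked = [1, 0, 1, 0, 1, 1, 0]

columns, one_hot_rows = top_n_one_hot(cities, n=2)
hashed_rows = [hash_encode(city) for city in cities]
city_means = mean_encode(cities, clicked)
```

With real data you would use far more buckets than 8 (thousands or more) so that collisions stay rare. As the comment above says, there is no fixed rule for picking one option, so the practical route is to plug each encoder into the same cross-validation loop and compare scores.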