One Hot Encoding with Python | Handling Categorical Data

  • Published Jul 8, 2024
  • In this tutorial you can see, step by step, how one-hot encoding is applied to handle categorical data in a real-world data problem.
    You can check out the whole project from A to Z on my GitHub page:
    github.com/danbochman/FARS_LE...
    If you have any questions, feel free to ask in the comments!
    Please let me know if there are any specific machine learning tutorials you'd like to see, and I'll be happy to make them.
    Music: www.bensound.com
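For a quick reference, here is a minimal sketch of one-hot encoding with pandas' `get_dummies`. The DataFrame, its `weather` and `speed` columns, and the values are hypothetical toy data, not the video's dataset, and this is not necessarily the exact code used in the project.

```python
import pandas as pd

# Hypothetical toy data; the project's real dataset and columns differ.
df = pd.DataFrame({
    "weather": ["clear", "rain", "snow", "rain"],
    "speed": [55, 40, 30, 45],
})

# Replace the categorical column with one 0/1 indicator column per category.
# Columns that are not listed (here `speed`) are kept unchanged.
encoded = pd.get_dummies(df, columns=["weather"], dtype=int)
print(encoded.columns.tolist())
# ['speed', 'weather_clear', 'weather_rain', 'weather_snow']
```

Each row has exactly one 1 among the `weather_*` columns. Passing `drop_first=True` drops one of them, since its value can be inferred from the others.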

COMMENTS • 17

  • @interiorarttv8449 · 3 years ago · +2

    I can't thank you enough! You saved my life. I spent 12 hours trying to solve an error, and what you explained helped me in 15 mins. THANK YOU SO MUCH

    • @danbochman · 3 years ago

      Very happy to hear this video helped you!
      Good luck!

  • @pressiyamu2187 · 4 years ago

    You deserve a trophy

  • @swatithapa3311 · 5 years ago · +1

    Thank you so much Dan

  • @piyushsharma417 · 5 years ago

    Nice explanation Dan

  • @penguin3196 · 4 years ago

    Explained it very well. Thanks a ton...SUBSCRIBED!!!

  • @charliesaber7901 · 5 years ago

    Thank you, very helpful

  • @SH-fe6fu · 4 years ago

    Great video. Keep it up!

  • @marcelokautzman1838 · 6 years ago

    Amazing!

  • @ankitshah7288 · 4 years ago

    Hey Dan, very well explained, and thank you for uploading and helping others. I had a question: how do you extend the threshold to the test set or production input, since for high-cardinality categories the columns could differ between the training set and the data we get in production?
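The thread leaves this question unanswered. One common approach, sketched here under stated assumptions: treat the "threshold" as a minimum category count, learn the kept categories on the training data only, bucket rare and unseen values into an `Other` category, and pin the column set with a fixed categorical dtype so training and production yield identical columns. The helper names `fit_categories` and `encode`, the `Other` label, and the sample data are hypothetical; this is not necessarily how the video's project handles it.

```python
import pandas as pd

OTHER = "Other"  # hypothetical bucket label for rare or unseen categories


def fit_categories(train_col: pd.Series, min_count: int) -> list:
    """Return the categories that appear at least `min_count` times in training."""
    counts = train_col.value_counts()
    return sorted(counts[counts >= min_count].index)


def encode(col: pd.Series, kept: list) -> pd.DataFrame:
    """One-hot encode `col` using only the categories learned on the training set."""
    # Values that were rare or never seen in training collapse into OTHER.
    bucketed = col.where(col.isin(kept), OTHER)
    # A fixed categorical dtype makes get_dummies emit one column per declared
    # category, even if some categories are missing from this batch.
    dtype = pd.CategoricalDtype(categories=kept + [OTHER])
    return pd.get_dummies(bucketed.astype(dtype), prefix=col.name, dtype=int)


# Fit on training data only.
train = pd.Series(["Ford", "Ford", "Toyota", "Toyota", "Kia"], name="make")
kept = fit_categories(train, min_count=2)  # ['Ford', 'Toyota']; 'Kia' is too rare

# Production data contains 'Tesla', which training never saw.
prod = pd.Series(["Tesla", "Ford"], name="make")
print(encode(train, kept).columns.tolist())  # ['make_Ford', 'make_Toyota', 'make_Other']
print(encode(prod, kept).columns.tolist())   # same columns as training
```

Saving `kept` together with the model is what lets production reuse the training-time columns. As a library alternative, scikit-learn's `OneHotEncoder` is fit once on the training data and reapplied the same way; with `handle_unknown="ignore"` it encodes unseen categories as all zeros instead of an explicit `Other` column.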

  • @brianmuriuki6106 · 4 years ago

    Good work, man. Keep it up

  • @2false637 · 1 year ago

    Saved my ass, man. Thanks!

  • @samable9585 · 2 years ago

    I wonder what the difference is between encoding and mapping. For example, say STATE_CD goes from 1 to 50, so now it's numeric. Can it be used in AI learning without resorting to one-hot encoding?

    • @danbochman · 2 years ago

      If you map states to the numbers 1 to 50, the column can be used in ML, but you introduce a relationship that doesn't exist (state 1 becomes more similar to state 2 and very far from state 50).

    • @samable9585 · 2 years ago

      @danbochman Thanks for the reply, understood.
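The reply's point can be checked numerically. This sketch, assuming Euclidean distance as the similarity measure and using three hypothetical state codes, compares an integer mapping with one-hot vectors.

```python
import numpy as np

states = ["AL", "AK", "AZ"]  # hypothetical sample of state codes

# Integer mapping: each state gets its position as a code (1, 2, 3, ...).
codes = {s: i + 1 for i, s in enumerate(states)}

# One-hot: each state gets its own unit vector (a row of the identity matrix).
eye = np.eye(len(states))
one_hot = {s: eye[i] for i, s in enumerate(states)}


def dist(a, b) -> float:
    """Euclidean distance between two scalars or two vectors."""
    return float(np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b)))


# Integer codes: AL looks twice as far from AZ as from AK.
print(dist(codes["AL"], codes["AK"]), dist(codes["AL"], codes["AZ"]))  # 1.0 2.0
# One-hot: every pair of distinct states is equally far apart (sqrt(2)).
print(dist(one_hot["AL"], one_hot["AK"]), dist(one_hot["AL"], one_hot["AZ"]))
```

The integer codes impose an ordering that the states do not have, while one-hot vectors keep all categories equidistant.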

  • @gouravchoubey860 · 4 years ago

    Can you make the same video using PySpark libraries instead of pandas?

    • @danbochman · 4 years ago

      Hey Gourav,
      Have you tried Dask? It has many of the same capabilities as PySpark, but it's native to Python and uses pandas syntax.
      I made a tutorial video about it:
      ua-cam.com/video/Alwgx_1qsj4/v-deo.html
      One Hot Encoding with Dask:
      dask-ml.readthedocs.io/en/latest/modules/generated/dask_ml.preprocessing.DummyEncoder.html