Target Encoding for Categorical Values in Data Science

  • Published 21 Jul 2024
  • Target encoding is a technique that can be used to encode categorical values. Unlike dummy variables, target encoding only requires a single element in the feature vector.
    Source code for this video: github.com/jeffheaton/present...
  • Science & Technology
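
To make the single-column claim in the description concrete, here is a minimal sketch in pandas. The toy DataFrame, its column names (`city`, `bought`), and its values are invented for illustration and are not taken from the video's source code.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "bought": [1, 0, 1, 1, 0, 0],
})

# Dummy (one-hot) encoding: one column per distinct category.
dummies = pd.get_dummies(df["city"], prefix="city")

# Plain target encoding: replace each category with the mean of the
# target over the rows that share that category. One column in total.
category_means = df.groupby("city")["bought"].mean()
df["city_encoded"] = df["city"].map(category_means)

print(dummies.shape[1])             # number of dummy columns
print(df["city_encoded"].tolist())  # the single target-encoded column
```

Note that category C is encoded from a single row here, so an unsmoothed mean like this can overfit on rare categories.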

COMMENTS • 21

  • @lawrenceguo91
    @lawrenceguo91 4 years ago

    Thank you for the clear explanation and code examples.

  • @DanielWeikert
    @DanielWeikert 5 years ago +2

    Thanks for sharing, Jeff. I'd love to see more of your Kaggle tips or even a whole project. It would be really interesting to see how you approach a competition.
    Thanks again and best regards

  • @somevideos2102
    @somevideos2102 4 years ago

    Thank you. Your video helped me with one of my projects.

  • @gauravmalik3911
    @gauravmalik3911 1 year ago

    Great video, thank you for the explanation

  • @nanditaprasad5467
    @nanditaprasad5467 4 years ago

    Thanks, it was really helpful.
    But how do I fit that value into the independent feature columns?
    I could not find a way to do that. Please help.

  • @GeraldTalton
    @GeraldTalton 1 year ago

    Hi Jeff, in your video you state you'd have the link to the Max Halford article, but I don't see it here.

  • @daniloamorim3012
    @daniloamorim3012 4 years ago +2

    Would it be better if you build a Transformer class (with fit and transform methods) instead of a function? This transformation is not stateless, that is, the transformer needs to "learn" some parameters to apply this transformation to new observations for which you don't know the target.
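
The fit/transform idea in this comment could look roughly like the sketch below. The class name `SmoothedTargetEncoder`, the `weight` parameter, and the toy data are hypothetical and chosen for illustration; this is not the function from the video, and the smoothing formula is one common choice among several.

```python
import pandas as pd


class SmoothedTargetEncoder:
    """Hypothetical fit/transform target encoder (illustrative only)."""

    def __init__(self, weight=10.0):
        # Assumed smoothing strength: larger values pull every category
        # closer to the global mean.
        self.weight = weight

    def fit(self, categories: pd.Series, target: pd.Series):
        # State learned from the training data only.
        self.global_mean_ = target.mean()
        stats = target.groupby(categories).agg(["count", "mean"])
        # Smoothed mean: blend each category mean with the global mean.
        self.mapping_ = (
            stats["count"] * stats["mean"] + self.weight * self.global_mean_
        ) / (stats["count"] + self.weight)
        return self

    def transform(self, categories: pd.Series) -> pd.Series:
        # No target is needed here; categories never seen during fit
        # fall back to the global mean.
        return categories.map(self.mapping_).fillna(self.global_mean_)


# Usage on made-up data: fit on rows with a known target, then encode
# new observations whose target is unknown.
train = pd.DataFrame({"city": ["A", "A", "B", "B"], "y": [1.0, 0.0, 1.0, 1.0]})
new_rows = pd.Series(["A", "B", "Z"])

encoder = SmoothedTargetEncoder(weight=2.0).fit(train["city"], train["y"])
encoded = encoder.transform(new_rows)
print(encoded.tolist())
```

Keeping the learned mapping on the object is what makes the transformation reusable on new data, which a stateless function cannot do without being handed the training set again.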

  • @sunnyarora4916
    @sunnyarora4916 3 years ago

    In what cases does it give better results than one-hot encoding?

  • @khalidadouhan2001
    @khalidadouhan2001 3 years ago

    Thanks for this useful video, but what about target encoding for multi-class classification?

  • @atulbunkar2650
    @atulbunkar2650 4 years ago +1

    Thanks man !!

  • @zahrazanjani4136
    @zahrazanjani4136 4 years ago

    thanks, it was helpful

  • @j220493
    @j220493 1 year ago

    Great video, I have just one question: how do you use this encoding with longitudinal data? My guess is that you have to apply this method separately to the data for each year. However, I'm confused because that approach could give you different encoding values for each year.

  • @vasukhajuria6121
    @vasukhajuria6121 3 years ago

    In the function declaration, you have used df1, df2 as parameters, but in the definition you are using df instead of df1! Can you explain that?

  • @raheemnasirudeen6394
    @raheemnasirudeen6394 5 years ago +3

    Nice video.
    I still don't understand something about using target encoding in a competition.
    The training set comes with the target variable and the test set has none.
    How can we apply target encoding when we have a training set and a test set and need to convert both together?

    • @balajirajaram9512
      @balajirajaram9512 3 years ago +1

      I had the same issue but found a solution, so I'm sharing it with you. There are many ways to do this, but one way is to train the encoder on the training data and then use the category values learned from the training data to map the test data. For example, if category A in the training set is mean-encoded to 0.60, then you use the same value, 0.60, for category A in the test data. Hope this helps.
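
The approach described in this reply can be sketched as follows. The data and the column names (`cat`, `target`, `cat_enc`) are made up so that category A encodes to 0.60 as in the example above. The fallback for a category that never appears in training is an added assumption, since the comment does not address that case.

```python
import pandas as pd

# Hypothetical data: the training set has the target, the test set does not.
train = pd.DataFrame({
    "cat": ["A", "A", "A", "A", "A", "B", "B"],
    "target": [1, 1, 1, 0, 0, 0, 1],
})
test = pd.DataFrame({"cat": ["A", "B", "C"]})

# Learn the encoding from the training rows only.
means = train.groupby("cat")["target"].mean()  # A -> 0.6, B -> 0.5
global_mean = train["target"].mean()

# Apply the same learned values to both sets.
train["cat_enc"] = train["cat"].map(means)

# Category "C" never appears in training. Falling back to the global
# training mean is one common choice, assumed here.
test["cat_enc"] = test["cat"].map(means).fillna(global_mean)

print(test)
```

The test set is never used to compute the means, so no target information is required for it.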

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 5 years ago +1

    A suggestion: how about a video on the practical workflow of using Docker for data scientists?

    • @HeatonResearch
      @HeatonResearch 5 years ago +1

      Thanks... I will get into that somewhat in the deployment model module of my course. But I may do other Docker content in the future; it's a very useful tool.

  • @looploop6612
    @looploop6612 4 years ago

    very messy.
    A ppt or drawing is much better than showing code

    • @HeatonResearch
      @HeatonResearch 4 years ago +4

      Sorry if it was not useful to you. I do tend to focus much more on the coding than the theoretical. There are so many YouTube ML videos these days, I am sure you could find something similar with less code.

    • @markgiroux3442
      @markgiroux3442 4 years ago +1

      Disagree. If you couldn't follow the code, you need to get more comfortable with Python. This is a feature encoding lesson, not a Python coding tutorial.