The Simplest Encoding You’ve Never Heard Of

  • Published Sep 5, 2024

COMMENTS • 31

  • @aiwithaz
    @aiwithaz 1 year ago +12

    next video: boosting model accuracy by enigma encoding categorical features

  • @jespermikkelsen7553
    @jespermikkelsen7553 1 year ago +1

    This channel is so underrated.

  • @rajgurubhosale8680
    @rajgurubhosale8680 6 months ago

    Glad I found these videos when I needed them the most! Thank you.

  • @edmundfreeman7203
    @edmundfreeman7203 1 year ago +4

    Target encoding has a lot of problems. For instance: 1) Build a data set of character strings AAA, AAB, ... ZZZ. 2) Randomly generate 0s and 1s, 25 per string. 3) Build a model with this. Unless you are very lucky, you'll get a strong model with little overfitting. Any technique that lets you build a strong model from random data is a bad idea.
    Really, what you are doing is a very fancy way of putting the target variable into the model, which is a big no-no.
    What you could conceivably do is build the target encoding on the test data only.
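
The experiment described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the "model" is a bare threshold on the encoded value (the comment does not say which model to use), the held-out set is freshly generated noise, and the function name `run_experiment` is hypothetical.

```python
import itertools
import random
import string


def run_experiment(rows_per_category=25, seed=0):
    """Target-encode random labels and compare train vs. held-out accuracy."""
    rng = random.Random(seed)
    # Categories AAA, AAB, ..., ZZZ (26**3 of them).
    categories = ["".join(p) for p in itertools.product(string.ascii_uppercase, repeat=3)]

    # Labels are pure noise: there is nothing real to learn.
    train = {c: [rng.randint(0, 1) for _ in range(rows_per_category)] for c in categories}
    held_out = {c: [rng.randint(0, 1) for _ in range(rows_per_category)] for c in categories}

    # Naive target encoding: mean of the training target per category,
    # computed on the same rows the model is later scored on.
    encoding = {c: sum(ys) / len(ys) for c, ys in train.items()}

    def accuracy(data):
        correct = total = 0
        for c, ys in data.items():
            # Stand-in "model" (an assumption): threshold the encoded feature.
            prediction = 1 if encoding[c] > 0.5 else 0
            correct += sum(1 for y in ys if y == prediction)
            total += len(ys)
        return correct / total

    return accuracy(train), accuracy(held_out)


train_acc, held_out_acc = run_experiment()
print(f"train accuracy:    {train_acc:.3f}")
print(f"held-out accuracy: {held_out_acc:.3f}")
```

With these settings the training accuracy should land clearly above chance while the held-out accuracy stays close to 0.5, which is the leak the comment is pointing at: the encoded feature carries the training target.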

    • @underfitted
      @underfitted 1 year ago +2

      It does have a lot of problems, but that doesn't make it useless. Target encoding works very well in many different situations where One-Hot Encoding becomes problematic.
      If you aren't careful, you can overfit with Target Encoding. That's where the smoothing part comes in.
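
As a rough sketch of what smoothing can look like, the snippet below blends each category's mean with the global mean, weighted by how many rows the category has. The additive (m-estimate) formula, the parameter name `m`, and the toy weather data are assumptions for illustration; the exact formula used in the video is not stated in this thread.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, m=2.0):
    """Return {category: smoothed mean}; rare categories shrink toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # (sum of targets + m * global mean) / (row count + m)
    return {c: (sums[c] + m * global_mean) / (counts[c] + m) for c in counts}


# Hypothetical toy data, not taken from the video.
weather = ["SUNNY", "SUNNY", "SUNNY", "RAINY", "RAINY", "CLOUDY"]
happy = [1, 1, 1, 0, 1, 0]

encoding = smoothed_target_encoding(weather, happy, m=2.0)
for category, value in encoding.items():
    print(category, round(value, 4))
```

A category seen only once, such as CLOUDY here, is pulled strongly toward the global mean instead of being encoded as exactly 0, so a single row cannot dictate its value. Larger `m` means stronger shrinkage.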

    • @edmundfreeman7203
      @edmundfreeman7203 1 year ago +1

      @underfitted Give me a little bit. I owe you a concrete demonstration of what I am talking about, and I'll see if smoothing fixes the problem.

    • @edmundfreeman7203
      @edmundfreeman7203 1 year ago

      @underfitted I put together a video ua-cam.com/video/4Zl-juDI2YM/v-deo.html on why I think target encoding is an antipattern.

    • @underfitted
      @underfitted 1 year ago +4

      Thanks for the video response. You make great points, and I generally agree with most of what you said. There are still claims that, in practice, hold less value. I've seen Target Encoding used in real-life situations with excellent results and no signs of overfitting. Of course, that doesn't make Target Encoding a silver bullet and you can easily leak the target values if you aren't careful.

    • @edmundfreeman7203
      @edmundfreeman7203 1 year ago +1

      @underfitted What could work very well is using a past average of the target, instead of the modeling target.
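
One possible reading of "a past average of the target" is sketched below: with rows in time order, each row is encoded using only the targets of earlier rows in the same category, so a row never sees its own label. The function name `past_mean_encoding` and the `prior` fallback for categories with no history are assumptions, not something specified in the comment.

```python
def past_mean_encoding(categories, targets, prior=0.5):
    """Encode each row with the mean target of EARLIER rows in the same category."""
    sums = {}
    counts = {}
    encoded = []
    for category, target in zip(categories, targets):
        seen = counts.get(category, 0)
        # No history yet for this category: fall back to the prior.
        encoded.append(sums.get(category, 0.0) / seen if seen else prior)
        # Only now add the current row, so it affects later rows only.
        sums[category] = sums.get(category, 0.0) + target
        counts[category] = seen + 1
    return encoded


# Hypothetical rows, assumed to be sorted by time.
cats = ["A", "A", "B", "A", "B"]
ys = [1, 0, 1, 1, 0]
past_encoded = past_mean_encoding(cats, ys)
print(past_encoded)
```

Because the current row's target is added to the running totals only after the row has been encoded, the feature cannot leak that row's own label.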

  • @tshock22
    @tshock22 1 year ago

    Your production quality is next level. Much appreciated!

  • @nedafiroz514
    @nedafiroz514 1 year ago

    Wonderful illustration, thank you so much

  • @Cosimao564
    @Cosimao564 1 year ago

    Every video I have seen from this channel is applicable to my chemometrics work, thank you.

  • @user-vb9jo8xg4s
    @user-vb9jo8xg4s 2 months ago

    Fantastic!!!!

  • @openroomxyz
    @openroomxyz 1 year ago

    Thanks for creating these videos in any case.

  • @fikriansyahadzaka6647
    @fikriansyahadzaka6647 1 year ago +1

    This might not be related to your video, but could you also cover ChatGPT? The internet went crazy in the past 2 weeks because of it. It will be interesting to understand the history of ChatGPT and how it works.

    • @underfitted
      @underfitted 1 year ago +1

      I did a video last week that talks about ChatGPT.
      ua-cam.com/video/l_oHZT6yTEs/v-deo.html

    • @fikriansyahadzaka6647
      @fikriansyahadzaka6647 1 year ago

      Ah I see, I missed that video. You are so fast at keeping up with the current trends. Keep up the good work!

  • @openroomxyz
    @openroomxyz 1 year ago

    Maybe you could create a video about how to self-learn AI without a degree: the order of things to learn, the sources to learn from, roughly how long it would take, and how you could go about monetizing the knowledge and skills.

  • @prajwalsyallur712
    @prajwalsyallur712 1 year ago

    Thanks for this useful video!🙂

  • @viswarupmisra
    @viswarupmisra 1 year ago

    Can you tell me about the camera you are using, the software you use to edit videos, and your setup in general? And how do you create the effects in your videos?

    • @underfitted
      @underfitted 1 year ago +1

      I'm using a Sony FX3. Final Cut Pro. The effects are from LenoFX.

  • @philtoa334
    @philtoa334 1 year ago

    Nice video.

  • @afterwork260
    @afterwork260 1 year ago

    How about we just delete the outliers first and then continue with target encoding?

  • @bobdowling6932
    @bobdowling6932 1 year ago +1

    I don’t understand this at all. What’s to stop two different weathers getting the same score because you were happy on the same number of days with each of those weathers?

    • @underfitted
      @underfitted 1 year ago

      A couple of things:
      1. Keep in mind that this technique is effective with enough data. The example in the video is using 7 rows.
      2. Every row with the same value "SUNNY" will get the same encoding. That's precisely the goal. They already have the same value ("SUNNY"), the difference is that they will be getting a numerical value.
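
To make both points concrete, here is a small sketch of plain (unsmoothed) target encoding on seven made-up rows. The data is hypothetical and was chosen so that two different weathers end up with the same score, which is the situation asked about; it is not the table from the video.

```python
from collections import defaultdict

# Hypothetical rows: weather category and whether the day was happy (1) or not (0).
weather = ["SUNNY", "SUNNY", "RAINY", "RAINY", "CLOUDY", "CLOUDY", "SUNNY"]
happy = [1, 0, 1, 0, 1, 0, 1]

# Plain target encoding: the mean of the target for each category.
sums = defaultdict(float)
counts = defaultdict(int)
for category, target in zip(weather, happy):
    sums[category] += target
    counts[category] += 1
encoding = {c: sums[c] / counts[c] for c in counts}

# Replace each category with its score; every "SUNNY" row gets the same number.
encoded_column = [encoding[c] for c in weather]

print(encoding)
print(encoded_column)
```

Here RAINY and CLOUDY collide on the same value because their happy rates are identical in this tiny sample. The encoding only records how each category relates to the target, so categories with the same rate become indistinguishable to the model through this feature alone.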

  • @curtisnewton895
    @curtisnewton895 1 year ago +1

    But what if the same text gets the same amount of associated values in another column? They will get the same numeric label.
    Why not just divide 1 by the number of text labels and multiply that ratio by the line index?
    So simple.

    • @underfitted
      @underfitted 1 year ago

      Hey Curtis, I'm having problems understanding the situation you mention. "Same text gets the same amount of associated values in another column." Happy to discuss more. Feel free to hit me up @svpino on Twitter.

    • @diegofabianledesmamotta5139
      @diegofabianledesmamotta5139 1 year ago

      If I understand, you mean that two categories could end up with the same associated value, right?
      That doesn't seem like a big problem to me. It sounds like you're losing information, but the goal is to make a predictive model, so if it doesn't prevent the model from making good predictions, I think it's OK. @Underfitted do you think that's an issue?

  • @jackcat3745
    @jackcat3745 1 year ago +1

    He is not smart man.