How to do Deep Learning with Categorical Data

  • Published 8 Jul 2024
  • If you’re like me, you don’t really need to train self-driving-car algorithms or build cat-image detectors. Instead, you're likely dealing with practical problems and normal-looking data.
    The focus of this series is to help the practitioner develop intuition about when and how to use Deep Learning (DL) models in normal situations with normal data, e.g. structured data (i.e. something you can read into pandas). I will teach you the fundamentals: the building blocks of DL.
    There are many courses that teach DL for computer vision, NLP, etc. This is not that. This series is about teaching the practitioner how to transform normal machine learning (ML) models into DL models, and I have a lot of experience doing just that.
    Some existing DL courses are overly theoretical (not useful to practitioners), overly simplistic (understating the sophistication involved), or even overly practical (giving the practitioner a false sense of security). DL is hard. Real data science is hard. We want to steer you away from the most common mistakes.
    By starting with tabular data, we can introduce you to the DL toolbox in a more intuitive way. Note, this series is not about the underlying math for neural networks or the like.
    This series is aimed most directly at intermediate-level users.
    Helpful links:
    Link to Deep Learning Building Blocks Series:
    • Python Keras - Deep Le...
    Link to GitHub repo including categorical data lesson:
    github.com/knathanieltucker/d...

COMMENTS • 29

  • @DataTalks
    @DataTalks  4 years ago +5

    Quick note on the embedding layer! The input length being set to 5 in the embedding layer means that you have the same base categories (like words or tags) for each of the inputs. If you have 5 different types of categories you'll need to use 5 different embedding layers!

    • @jasonclement6305
      @jasonclement6305 4 years ago

      So if I had a categorical for say... zipcode and one for race... I'd separate them into multiple embedding layers, correct?

    • @DataTalks
      @DataTalks  4 years ago +4

      @@jasonclement6305 exactly!

    • @sifar1857
      @sifar1857 2 years ago

      Do you have a sample of how this is done?

    • @herrylau7381
      @herrylau7381 2 years ago

      If I have two different categorical inputs, do I have to embed them separately and then concatenate the three together?

    • @DataTalks
      @DataTalks  2 years ago +1

      @@herrylau7381 If those inputs are from different categories (e.g. color and size) then yes!
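The advice in this thread (one embedding layer per distinct categorical column, then concatenate with the numeric features) can be sketched in plain numpy. The column names, vocabulary sizes, and values below are invented for illustration; in Keras, each lookup table would live inside its own Embedding layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: two unrelated categorical columns plus one numeric column.
color = np.array([0, 2, 1, 0])                      # 3 color categories, integer-encoded
size = np.array([1, 0, 1, 1])                       # 2 size categories, integer-encoded
price = np.array([[9.99], [4.50], [7.25], [3.10]])  # numeric feature, shape (4, 1)

# Each categorical column gets its OWN lookup table (in Keras, the
# weight matrix inside a separate Embedding layer per column).
color_table = rng.normal(size=(3, 4))  # 3 categories -> 4-dim vectors
size_table = rng.normal(size=(2, 4))   # 2 categories -> 4-dim vectors

# An embedding lookup is just row indexing into the table.
color_vecs = color_table[color]  # shape (4, 4)
size_vecs = size_table[size]     # shape (4, 4)

# Concatenate the embedded categoricals with the numeric feature,
# as a Concatenate layer would before the Dense layers.
features = np.concatenate([color_vecs, size_vecs, price], axis=1)
print(features.shape)  # (4, 9)
```

Reusing one table for both columns would wrongly force, say, color code 1 and size code 1 to share a vector; separate tables keep the categories independent.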

  • @soupizcool
    @soupizcool 3 years ago +4

    You do not have enough views. This is fantastic. I have been working on a project and have been absolutely baffled about how to get the embedding layers to work properly. The Keras API docs and many other sources/videos do not clearly address that you must first separate the categorical variables from the numerical ones. In most examples I have seen, people work with datasets that are entirely categorical, not a mixture. I was so confused about why the embedding layers would not know which features to embed, but your video made it so clear. Thanks again, keep making videos.

  • @stackexchange7353
    @stackexchange7353 3 years ago

    Your videos are amazing. Thanks for making these concepts so easy to understand.

  • @MasterBen007
    @MasterBen007 1 year ago

    Tysm bro, this is my first time doing ML and I've been pulling my hair out trying to figure out how to use zip codes in my data, and somehow found this perfect video

  • @ph0b056
    @ph0b056 2 years ago

    Great tutorial! One small ask, though: how do I get the classification report for this? Thanks in advance

  • @semidevilz
    @semidevilz 3 years ago +1

    Thank you! Wanted to clarify: does the “embedding” happen at “embedding_layer = ...”, or does it happen during model training?
    Also, how do I go about extracting the embedded vectors? I.e., I want to use these embeddings to train another ML model.

    • @DataTalks
      @DataTalks  3 years ago +1

      The embedding always happens during training! You can use the below to get the weights:
      layer.get_weights(): returns the weights of the layer as a list of NumPy arrays.

    • @semidevilz
      @semidevilz 3 years ago

      @@DataTalks thank you!
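Following up on get_weights(): a minimal numpy sketch of what “extracting the embedded vectors” amounts to. The matrix below is a stand-in for the (vocab_size × embedding_dim) array a trained Keras Embedding layer would return as layer.get_weights()[0]; its values and the integer codes are made up for illustration.

```python
import numpy as np

# Stand-in for the learned weight matrix of an Embedding layer,
# i.e. what layer.get_weights()[0] would return after training:
# one row per category, one column per embedding dimension.
learned = np.arange(12, dtype=float).reshape(4, 3)  # 4 categories, 3-dim vectors

# "Extracting the embedded vectors" for a column is just indexing that
# matrix with the column's integer codes; the result is an ordinary
# numeric feature matrix you can feed to any other ML model.
codes = np.array([2, 0, 3])
vectors = learned[codes]
print(vectors.shape)  # (3, 3)
```

In other words, the embedding vectors only become meaningful once training has updated the weight matrix; pulling them out afterward is a plain row lookup.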

  • @feifei989
    @feifei989 2 years ago

    Never appreciated any video like this before.

  • @SayanRay0rayzallnight
    @SayanRay0rayzallnight 2 years ago +1

    Thanks a lot, great explanation. A question: what if you have multiple categorical variables in your dataset? Do you have to use an embedding layer for each variable? Also, if your target variable is also categorical, do you need to embed it too?

    • @DataTalks
      @DataTalks  2 years ago +2

      If your target is categorical you'll most likely need to change the loss function to categorical cross-entropy! If you've got multiple categorical variables you'll need to embed them separately :)

    • @SayanRay0rayzallnight
      @SayanRay0rayzallnight 2 years ago

      @@DataTalks thanks!

    • @anticopss
      @anticopss 2 years ago

      @@DataTalks Hi, and thank you for your great video :) If I understood your comment correctly, you mean that we would have to have multiple cat_inputs, one for each category? So basically, supposing that our 5 categorical variables are all different, we would have to perform the embedding 5 times, each with input_length = 1?
      Thanks in advance for your answer!

    • @DataTalks
      @DataTalks  2 years ago +1

      @@anticopss that is exactly right!

    • @anticopss
      @anticopss 2 years ago

      @@DataTalks thank you very much!
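The scheme agreed on in this thread, five different categorical columns, each integer-encoded and fed through its own embedding (the input_length = 1 case), can be sketched in plain numpy. The vocabulary sizes, embedding dimension, and data below are all invented for illustration; in Keras this would be five Input/Embedding pairs feeding a Concatenate layer.

```python
import numpy as np

rng = np.random.default_rng(42)

# Five DIFFERENT categorical columns, one integer code per sample each
# (the input_length = 1 case). Vocabulary sizes are invented.
vocab_sizes = [3, 5, 2, 4, 6]
n_samples = 8
X = np.column_stack([rng.integers(0, v, size=n_samples) for v in vocab_sizes])

# One lookup table per column: five separate "embedding layers".
dim = 4
tables = [rng.normal(size=(v, dim)) for v in vocab_sizes]

# Embed each column with its own table, then concatenate the results
# into one (n_samples, 5 * dim) feature matrix for the Dense layers.
embedded = np.concatenate([tables[j][X[:, j]] for j in range(5)], axis=1)
print(embedded.shape)  # (8, 20)
```

Each table is sized to its own column's vocabulary, which is exactly why a single shared embedding layer cannot serve five unrelated variables.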