How do I encode categorical features using scikit-learn?

  • Published on 20 Jul 2024

COMMENTS • 452

  • @dataschool
    @dataschool  4 years ago +21

    *Are you new to Machine Learning?* Watch my video series, "Introduction to Machine Learning in Python with scikit-learn": ua-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html

    • @arunjohn492
      @arunjohn492 4 years ago

      Sir, what about the dummy variable trap when we use ColumnTransformer?

    • @dataschool
      @dataschool  3 years ago

      Great question! See this video: ua-cam.com/video/NYtwyvyvDEk/v-deo.html

  • @terryhenyo9216
    @terryhenyo9216 4 years ago +65

    The Legendary Data Science guy is back!

    • @dataschool
      @dataschool  4 years ago +5

      Thank you for the warm welcome! 😄

  • @GoredGored
    @GoredGored 2 years ago +21

    For beginners:
    When I tried to complete an ML project, say a simple model based on logistic or linear regression, it used to take me about a month. As I was a beginner in Python, pandas, SQL and the rest of it, I thought this would take me a long time to master and that maybe I was a latecomer to this.
    But a year on, thanks to Data School, Sentdex, Krish Naik, StatQuest, Thinkful Webinar and more, I am surprised that all I need is a day or less to complete these projects.
    Because of the meticulous analysis on Data School, that's where my GPS leads me when I need a deeper understanding.
    Thank you Data School.

  • @altunbikubra
    @altunbikubra 3 years ago +5

    Your guide doesn't just cover basic code; it also covers very practical and useful functions. I want to sincerely thank you for your effort!

    • @dataschool
      @dataschool  3 years ago

      Thanks very much for your kind words!

  • @hieungotrung5411
    @hieungotrung5411 4 years ago +13

    OMG!!! I've just started ML on Kaggle in the past few weeks. There's a lot of information to absorb, but you teach us in the most understandable way and also answer the up-to-date question of why we should use scikit-learn instead of dummies. This video is extremely helpful and informative. Thank you a lot!!! Guess I'm gonna spend the rest of the day watching all of your videos.

    • @dataschool
      @dataschool  4 years ago +1

      Awesome! Glad to hear this was helpful to you 👍

  • @sandeep1026
    @sandeep1026 4 years ago

    I feel fortunate that I stumbled across this video. Very well articulated. The pace is slow enough that folks can hear, understand and digest. Most videos I come across seem to rush through the content before one can digest it. Thanks for taking the time and sharing your knowledge.

    • @dataschool
      @dataschool  3 years ago

      Thanks very much for your kind words! 🙏

  • @amitsharma8337
    @amitsharma8337 4 years ago

    THANK YOU for this tutorial! I was wandering around the web trying to solve unexpected errors that came from following apparently outdated tutorials. If I had landed on this tutorial first, it would have saved me around 4 hours of useless surfing. Thanks again.

    • @dataschool
      @dataschool  4 years ago

      That's awesome to hear... glad I could be of help! By the way, I'll be launching a full course covering these topics (and more)... sign up here to get notified when it launches: scikit-learn.tips

  • @Freethinker33
    @Freethinker33 1 year ago

    I was looking for a clear explanation of Pipeline for a long time. You nailed it. Crystal-clear explanation, understood after watching once. Thank you.

  • @tald747
    @tald747 3 years ago +1

    This is an excellent and simple explanation of this topic. I must say that you are very talented in the way you teach! You choose your words in a way that emphasizes only the important and relevant stuff. Thanks!!!

  • @420nyk
    @420nyk 2 years ago +1

    Thanks, this helps a lot. I was scratching my head over Pipeline and ColumnTransformer before this video.
    Also, you've got a very soothing voice, and it helps me relax and really enjoy the learning.

  • @Anarchy977
    @Anarchy977 4 years ago +1

    Fantastic tutorial! Great teacher, best Machine Learning teacher on youtube! Thank you!

  • @liquid_absabs1334
    @liquid_absabs1334 3 years ago +6

    There is something about your explanations that I just get instantly. You deserve an award.

    • @dataschool
      @dataschool  3 years ago +1

      You are too kind, thank you!

    • @dataschool
      @dataschool  3 years ago

      Yes, that is the role of the OneHotEncoder.

  • @artyb3115
    @artyb3115 4 years ago

    Absolutely perfect and useful lessons! Thinking of becoming a patron member as I get a little more confident with ML

    • @dataschool
      @dataschool  4 years ago +1

      That would be awesome, thank you so much! You can join here: www.patreon.com/dataschool

  • @jkore2554
    @jkore2554 3 years ago +4

    Thank you for this tutorial. I was working with logistic regression this week and was trying to figure out how to one hot encode for a categorical variable with hundreds of categories. I was getting 100% accuracy and precision so something wasn’t right. I’m going to try the steps that you outlined in this tutorial. Thanks.

  • @fahadkhankhattak8339
    @fahadkhankhattak8339 2 years ago

    Thank you so much!!!!! It was very helpful. Yours is the only channel I come running to for help whenever I'm stuck somewhere. Rich content!! Keep sharing these wonderful things.

  • @dhananjaykansal8097
    @dhananjaykansal8097 4 years ago +2

    Nice to have you back, sir. This session was so fruitful. Thanks a ton. Keep it up!

  • @sanaullahkhanhassanzai8432
    @sanaullahkhanhassanzai8432 4 years ago +1

    Thank you very much, and welcome back after a long time. You are as good as it gets when it comes to Machine Learning. You have helped me learn a lot. I can't wait for videos on deep learning. I hope you'll come up with deep learning soon. Thanks again.

    • @dataschool
      @dataschool  4 years ago

      Thanks very much for your kind words, and for your suggestion as well!

  • @sophiar5280
    @sophiar5280 4 years ago

    Always love your step by step, clear lessons. Keep it coming.

  • @horoshuhin
    @horoshuhin 2 years ago +1

    thank you Kevin, very thorough explanation. I'm glad I found your channel. I like the way you teach.

    • @dataschool
      @dataschool  2 years ago +1

      Thank you so much! 🙏 That's great to hear!

  • @NoWhiteGullibility
    @NoWhiteGullibility 4 years ago +3

    Perfect timing, was just searching on pipelines the other day.
    Would be great to follow-up by tacking on Gridsearch in this context.

    • @dataschool
      @dataschool  4 years ago

      That's awesome to hear! I will definitely cover grid search of a pipeline at some point - thanks for the suggestion!

  • @nishantchaudhary7528
    @nishantchaudhary7528 2 years ago +1

    That was really amazingly explained. I was looking to understand all these topics, and I got it in one go.
    Thanks a ton.

  • @Mehrdadgpt
    @Mehrdadgpt 2 years ago +1

    Thx Kevin, one of the best and simplest explanations of Pipeline.

  • @jatinshetty
    @jatinshetty 4 years ago +2

    Yo! Mind blown by the amount of things I learnt from this. Please keep at it!

    • @dataschool
      @dataschool  4 years ago

      Thank you! You might like my scikit-learn tips: github.com/justmarkham/scikit-learn-tips

  • @christianiheanacho4976
    @christianiheanacho4976 4 years ago +3

    You are a high quality TEACHER , thank you very much.

    • @dataschool
      @dataschool  4 years ago

      You are very welcome! 😄

  • @chr1112
    @chr1112 3 years ago +1

    You are the best tutor I have ever met; keep up the good work. Thank you.

  • @georgeognyanov
    @georgeognyanov 3 years ago +2

    God damn this video is good. I was struggling with column_transformer and pipelines till late last night. The options you suggest here are so much better and easier to understand for me. I am totally going through your "Introduction to Machine Learning in Python with scikit-learn" playlist soon. Thanks for putting this out!

    • @dataschool
      @dataschool  3 years ago

      You're very welcome! If you want to go deeper into this topic, you may want to check out my course: courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn

  • @ayyappahemanth7134
    @ayyappahemanth7134 4 years ago

    Oh my god! After so much waiting, another video has come, and it's far more useful to me than the others! I just love your videos. The content has been really useful in my real life; most YouTube channels just take ideal cases that I might never encounter in my whole life! Please make these videos regularly!

    • @dataschool
      @dataschool  4 years ago

      That is awesome to hear, thanks so much for your kind words! 🙏
      Actually, I publish a new Q&A video every month for Data School Insiders at the $5 level: www.patreon.com/dataschool

  • @rommeltito123
    @rommeltito123 4 years ago +4

    Dayyyyuuummmm.......why did I not stumble upon ur videos earlier ????!!!!!!

  • @harshitarawat8941
    @harshitarawat8941 3 years ago +1

    Man I love you. I just love you. I love your videos. I love the way you explain things. I love the pace of your videos. I love everything. Thank you.

    • @dataschool
      @dataschool  3 years ago +1

      Thank you so much, Harshita! 🙏

  • @fet1612
    @fet1612 3 years ago +6

    00:58
    1) It allows you to properly cross-validate a process rather than just a model. In other words, when you are doing cross-validation like cross_val_score, normally you just pass a model to it. Well, there are cases when that is not going to give you accurate results because you're doing the preprocessing outside of the cross-validation.
    So a pipeline, generally speaking, is useful because you can cross-validate a process that includes
    (a) *preprocessing* as well as
    (b) *model building*.
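
The point above (cross-validating a process that includes both preprocessing and model building) can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the toy DataFrame and its column names are made up for illustration, and LogisticRegression is just one possible model.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"] * 5,
    "Embarked": ["S", "C", "Q", "S", "C", "S"] * 5,
    "Pclass": [3, 1, 2, 3, 1, 2] * 5,
    "Survived": [0, 1, 1, 0, 1, 0] * 5,
})
X = df[["Sex", "Embarked", "Pclass"]]
y = df["Survived"]

# (a) Preprocessing: one-hot encode the categorical columns and
# pass the remaining column through unchanged.
preprocess = make_column_transformer(
    (OneHotEncoder(), ["Sex", "Embarked"]),
    remainder="passthrough",
)

# (b) Model building, chained after the preprocessing step.
pipe = make_pipeline(preprocess, LogisticRegression())

# Passing the pipeline (not just the model) to cross_val_score means the
# encoder is refit on the training portion of every fold.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```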

  • @salonisamant5410
    @salonisamant5410 3 years ago +1

    Thank you for explaining the pipeline approach so well!

  • @harshalkulkarni511
    @harshalkulkarni511 4 years ago +1

    Preprocessing with a pipeline was a complex topic for me to understand before watching this video. Thanks a lot for the video.

    • @dataschool
      @dataschool  4 years ago

      You're very welcome! Glad it helped 👍

  • @aaqibsoomro5776
    @aaqibsoomro5776 4 years ago +3

    You are a great teacher. Please make tutorials or series on data visualization, in-depth data analysis and cleaning, project deployment, etc., since after learning Python, its libraries and ML, these are the next steps.

    • @dataschool
      @dataschool  4 years ago

      I have many more tutorials! Many of them are listed here: www.dataschool.io/launch-your-data-science-career-with-python/

  • @aimenbaig6201
    @aimenbaig6201 3 years ago

    I just discovered your channel and I gotta tell you, you've got a permanent subscriber here!!! LOVE YOUR TEACHING STYLE!!!!!!!!!!!!!!!

  • @frankgiardina205
    @frankgiardina205 3 years ago

    Excellent! I was using pandas dummies, and your explanation of why Pipeline and OHE are a better solution solves all the problems. Thanks again.

  • @lovejazzbass
    @lovejazzbass 3 years ago

    Kevin, it's 5:20am Winston-Salem time and I am digging this. I was very confused. Thank you so much.

  • @asimssheikh
    @asimssheikh 3 years ago +1

    Impressive explanation, and logical approach to material presentation. You just got a new sub.

  • @quocanhhbui8271
    @quocanhhbui8271 2 years ago +1

    My god I love your detailed solution. Even my 5yo sibling can understand it. Wonderful. Definitely worth a subscribe.

  • @adarshr30
    @adarshr30 4 years ago +1

    After searching a lot, I found this channel and I feel it's the best for me :)

  • @salakkal
    @salakkal 4 years ago +1

    Really great that you did a video like this.
    It just helped me a lot and I am really thankful for it, brother. Keep going.

  • @amitblizer4567
    @amitblizer4567 1 year ago +1

    Very clearly explained and helpful video - Thank you!

  • @PaulBillingtonFW
    @PaulBillingtonFW 11 months ago +1

    Thanks for this clear and well-paced tutorial.

    • @dataschool
      @dataschool  9 months ago

      Glad it was helpful!

  • @Steven-se5jd
    @Steven-se5jd 4 years ago

    just want to say thank you. I am a beginner and you teach much better than my professor.

    • @dataschool
      @dataschool  4 years ago

      Glad to hear I have been helpful! 🙏

  • @brandonbermudez9047
    @brandonbermudez9047 1 year ago +1

    Absolute goat bruh, really thankful for your content

  • @JainmiahSk
    @JainmiahSk 4 years ago +2

    Sir, just 5 minutes ago I visited your channel to ask you this same question, since it was difficult for me to encode multiple variables in Kaggle's house price prediction (advanced regression) dataset. Fortunately and surprisingly, you posted about exactly that. Thank you so much.

    • @dataschool
      @dataschool  4 years ago +1

      That's amazing! 🙌 I hope this video is helpful to you, and let me know if you have any questions!

    • @JainmiahSk
      @JainmiahSk 4 years ago

      @@dataschool I have a problem with functions: I can't write custom functions in Python, which is very important. What should I do, sir?

    • @dataschool
      @dataschool  4 years ago

      @@JainmiahSk You can definitely write custom functions in Python!

  • @jobihara
    @jobihara 2 years ago

    Thank you, Data School. It was not only helpful, it was great, enlightening and awesome.

    • @dataschool
      @dataschool  2 years ago

      What a nice thing to say, thank you so much! 🙏

  • @Takk6
    @Takk6 4 years ago

    You are by far the best data science teacher on youtube.
    Can you make a video on creating your own custom transformers using it to modify your data, then using that custom transformer in a ColumnTransformer and a Pipeline?

    • @dataschool
      @dataschool  4 years ago +1

      Thanks for your suggestion! I'm working on a course that will likely cover that topic. Sign up here to get notified when it launches: scikit-learn.tips
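
Until a dedicated video covers it, one lightweight way to get a custom transformer is scikit-learn's FunctionTransformer, which wraps an ordinary function. The sketch below is only an illustration under assumed data: the columns and the choice of a log transform are hypothetical, not taken from the video or the course.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Hypothetical toy data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Survived": [0, 1, 1, 0],
})
X = df[["Sex", "Fare"]]
y = df["Survived"]

# Wrap an ordinary function so it behaves like a stateless transformer.
log_fare = FunctionTransformer(np.log1p)

# Use the custom transformer alongside OneHotEncoder in a ColumnTransformer.
ct = make_column_transformer(
    (OneHotEncoder(), ["Sex"]),
    (log_fare, ["Fare"]),
)

# Chain the ColumnTransformer and a model in a Pipeline.
pipe = make_pipeline(ct, LogisticRegression())
pipe.fit(X, y)

# Two one-hot columns for Sex plus one transformed Fare column.
X_t = ct.fit_transform(X)
print(X_t.shape)
```

For a transformer that has to learn something during fit, a class deriving from BaseEstimator and TransformerMixin would be the usual route instead.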

  • @David-fr7ee
    @David-fr7ee 4 years ago +2

    Great content. I am learning this in my college data science class. You did better than my professor!

    • @CE-vd2px
      @CE-vd2px 3 years ago

      Are you undergrad or grad?

    • @dataschool
      @dataschool  3 years ago

      Thank you! 🙏

  • @TheAdrianPardo
    @TheAdrianPardo 4 years ago +8

    Thank you so much! You're the best! Please go over scaling when you have a chance :)
    Question: Is it OK to leave in all of the one-hot encoded columns with this pipe approach? I believe you previously mentioned how it's best to drop one of the columns to prevent multicollinearity. Any way to do this within the pipe?

    • @dataschool
      @dataschool  4 years ago +11

      You are so kind, thank you! 😊
      Yes, I plan to cover StandardScaler at some point.
      Yes, it is okay to leave in all of the one-hot encoded columns. However, the "drop" parameter for OneHotEncoder (new in scikit-learn 0.21) does allow you to drop one feature per category. Hope that helps!

    • @ramleo1461
      @ramleo1461 4 years ago +1

      Even I had the same doubt... Thank you for clarifying 😊
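
As a rough illustration of the "drop" parameter mentioned in the reply above, the sketch below sets it on the OneHotEncoder inside a pipeline. The toy data and column names are assumptions made for this example only.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Embarked": ["S", "C", "Q", "S", "C", "Q"],
    "Survived": [0, 1, 1, 0, 1, 0],
})
X = df[["Sex", "Embarked"]]
y = df["Survived"]

# drop="first" removes the first category of each encoded feature.
ct = make_column_transformer(
    (OneHotEncoder(drop="first"), ["Sex", "Embarked"]),
)
pipe = make_pipeline(ct, LogisticRegression())
pipe.fit(X, y)

# Sex contributes 2 - 1 columns and Embarked 3 - 1 columns.
n_encoded = pipe.named_steps["columntransformer"].transform(X).shape[1]
print(n_encoded)
```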

  • @sowash2020
    @sowash2020 1 year ago +1

    You just gained another subscriber...this was super useful

  • @TheAstralftw
    @TheAstralftw 3 years ago

    Finally someone has properly explained to me what ColumnTransformer is and why we use Pipeline. I would like you to put your course on Udemy; then I'll buy it 100%. Maybe on average you will sell each course for a lower price, but trust me, you explain this so well that you could sell tens of thousands of courses in a few months... or, in case you already have this on Udemy, please provide me with the link!

    • @dataschool
      @dataschool  3 years ago +1

      Thanks for your kind words and your suggestion! I know that many students like Udemy courses, but my values as a course creator don't align with their business model, and so I'm not currently interested in publishing a course there. I prefer to offer courses directly to interested students. Thanks for understanding!

  • @sandeeppreetam
    @sandeeppreetam 4 years ago +1

    Thank you good sir, this tutorial was better than many paid tutorials on Udemy. Blessed!

  • @trentjones6468
    @trentjones6468 4 years ago +1

    Amazing video. You are an excellent instructor. Got yourself a new subscriber :)

  • @abdelkaderkaouane1944
    @abdelkaderkaouane1944 1 year ago

    Your explanation is very clear, thank you very much

  • @krishkonnect814
    @krishkonnect814 4 years ago +1

    I just found the solution to my problem after watching your video. Thanks a lot.

  • @victor-os9wq
    @victor-os9wq 2 years ago +2

    Thanks for such a detailed tutorial. I am working on a similar problem where I have multiple categorical features. In my dataset, the categorical variables have more than 90 possible values; as a result, I get an additional 121 columns when I use get_dummies, but I actually want just four levels.
    Please kindly advise me.

  • @honprarules
    @honprarules 4 years ago +1

    Amazing explanation, as always!

  • @gyanendergandhar
    @gyanendergandhar 2 years ago +1

    Thanks a lot for this tutorial, Kevin. It really saved me😅

  • @davidfullstone
    @davidfullstone 1 year ago

    Your videos are amazing and are really helping with the last module of my MSc. I know there is no need to encode Pclass, as it is an ordinal variable that is already ordered, and you explained that really clearly. I also notice that you explain the use cases for OneHotEncoder vs. OrdinalEncoder well in other videos. For best academic practice in my module, would you recommend creating a OneHotEncoder for my nominal data and an OrdinalEncoder for my ordered data, then piping both through make_pipeline? Thank you in advance :-)
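
The setup described in this question (a OneHotEncoder for nominal data and an OrdinalEncoder for ordered data, both feeding make_pipeline) could look roughly like the sketch below. It is not an answer from the channel: the "Size" column, its category order and the toy data are invented for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: "Embarked" is nominal, "Size" is ordered.
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S", "C", "Q"],
    "Size": ["small", "large", "medium", "small", "large", "medium"],
    "Survived": [0, 1, 1, 0, 1, 0],
})
X = df[["Embarked", "Size"]]
y = df["Survived"]

ct = make_column_transformer(
    (OneHotEncoder(), ["Embarked"]),
    # Spell out the category order explicitly; otherwise it is alphabetical.
    (OrdinalEncoder(categories=[["small", "medium", "large"]]), ["Size"]),
    sparse_threshold=0,  # always return a dense array, for easy inspection
)
pipe = make_pipeline(ct, LogisticRegression())
pipe.fit(X, y)

# Three one-hot columns for Embarked, then one ordinal column for Size.
X_t = ct.fit_transform(X)
print(X_t)
```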

  • @kishanlal676
    @kishanlal676 4 years ago +4

    Thank you for this amazing video. Please do some videos on feature selection and scaling techniques in python!

    • @dataschool
      @dataschool  4 years ago +2

      I'm hoping to cover feature scaling in a future video, but I do have a video about feature selection: ua-cam.com/video/YaKMeAlHgqQ/v-deo.html
      Hope that helps!

  • @xinchenzou4558
    @xinchenzou4558 2 years ago +1

    Thank you sir! You've really saved my life...

  • @prithasinha5378
    @prithasinha5378 4 years ago

    This is a great video! Thank you. Will you be showing how to do parameter tuning with pipeline?

    • @dataschool
      @dataschool  3 years ago

      Yes, I actually cover that in one of my courses: courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn

  • @SaunakDey
    @SaunakDey 3 years ago +1

    awesome explanation!! Thanks a lot

  • @vincecarter7500
    @vincecarter7500 4 years ago

    thanks a lot for helping everyone out,
    was just wondering if you will be uploading more videos in the future

    • @dataschool
      @dataschool  3 years ago

      Yes! I just started posting again last week. Thanks for watching!

  • @ramleo1461
    @ramleo1461 4 years ago +1

    Hi, this will be very helpful.. Thank you for making this video!!

    • @dataschool
      @dataschool  4 years ago

      You are very welcome! 🙌

  • @barulli87
    @barulli87 4 years ago +1

    MIND BLOWN!!!! CV FOR A PROCESS!!! NOICE ONE!!

  • @12345shipreck
    @12345shipreck 3 years ago

    You are 100x better than my ML course teacher at uni. GG bro.

  • @lourdesmartinez1486
    @lourdesmartinez1486 3 years ago +1

    Hi! First I would like to thank you for this awesome video! Super well explained, super clear and veeeery useful. Thanks a lot!
    I have a question: does it make sense to encode the dataset n times (cv=n in your cross-validation)? I mean, using a pipeline is great for testing purposes, as you explained, but I am not sure it is necessary to use the entire pipeline (including encoding) for the cross-validation... but maybe I am missing something.
    Could you clarify this point? Thanks in advance for your comments!

    • @dataschool
      @dataschool  3 years ago

      Great question! Yes, it is critical that you cross-validate the entire pipeline (rather than just the model) so that the data preprocessing occurs within each fold of cross-validation. Doing the preprocessing prior to cross-validation can lead to data leakage, which means that your evaluation scores will be less reliable. This is a complex topic, but I hope that helps a bit!
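
One way to see that preprocessing really does happen within each fold is to keep the fitted pipelines that cross-validation produces and inspect their encoders. The sketch below does this with cross_validate and return_estimator=True; the toy data is hypothetical and the example is only meant to show the mechanism, not to measure leakage.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data.
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q"] * 6,
    "Survived": [0, 1] * 9,
})
X = df[["Embarked"]]
y = df["Survived"]

pipe = make_pipeline(
    make_column_transformer((OneHotEncoder(), ["Embarked"])),
    LogisticRegression(),
)

# return_estimator=True keeps the pipeline that was fitted for each fold.
results = cross_validate(pipe, X, y, cv=3, return_estimator=True)

# Each fold has its own encoder, fitted only on that fold's training rows.
fold_categories = []
for fold_pipe in results["estimator"]:
    ct = fold_pipe.named_steps["columntransformer"]
    encoder = ct.named_transformers_["onehotencoder"]
    fold_categories.append(list(encoder.categories_[0]))
print(fold_categories)
```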

  • @surfzion
    @surfzion 3 years ago

    Extremely helpful, thank you so much !!!

  • @1stophchr
    @1stophchr 4 years ago +1

    thank you very much, very clear video

  • @KVishya
    @KVishya 4 years ago +1

    Hi Kevin, thank you so much for the wonderful explanation, could you also explain how to use GridSearch or RandomizedSearch along with Pipelines?

    • @dataschool
      @dataschool  4 years ago

      Great suggestion! I'm working on a tutorial that will be published on YouTube in late April. It will include that topic. Stay tuned!
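
Pending that tutorial, here is a minimal sketch of a grid search over a pipeline. It assumes the step names that make_pipeline and make_column_transformer generate automatically (the lowercased class names); the toy data and the parameter values being searched are made up for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"] * 3,
    "Embarked": ["S", "C", "Q"] * 6,
    "Survived": [0, 1] * 9,
})
X = df[["Sex", "Embarked"]]
y = df["Survived"]

pipe = make_pipeline(
    make_column_transformer((OneHotEncoder(), ["Sex", "Embarked"])),
    LogisticRegression(),
)

# Parameters are addressed as <step name>__<parameter>, nested with
# double underscores for transformers inside the ColumnTransformer.
params = {
    "columntransformer__onehotencoder__drop": [None, "first"],
    "logisticregression__C": [0.1, 1, 10],
}

grid = GridSearchCV(pipe, params, cv=3, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)
```

RandomizedSearchCV accepts the same parameter naming, with distributions or lists as values and an n_iter budget.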

  • @yeahzisue
    @yeahzisue 3 years ago +1

    this is so helpful that I have to comment. great job. thanks a lot

  • @brendensong8000
    @brendensong8000 3 years ago

    I love it! Amazing tips!

  • @hichamamchtkou7343
    @hichamamchtkou7343 4 years ago +1

    Thank you very much, it's very interesting and, by the way, it is exactly what I need in my current ML project.

    • @dataschool
      @dataschool  4 years ago +1

      That's great to hear! Good luck with your project 🙌

    • @hichamamchtkou7343
      @hichamamchtkou7343 4 years ago +1

      @@dataschool thanks 👍

  • @Putinka1000
    @Putinka1000 4 years ago +3

    Thank you for speaking slowly. It’s nice to listen to a non-English speaking person

  • @absar66
    @absar66 4 years ago +1

    Great ! Great ! Great! tutorial..many thanks Kevin

  • @MohammadrezaMokhtari-qh2yg
    @MohammadrezaMokhtari-qh2yg 2 months ago

    amazing information. wow! thank you so much man.

  • @gisleberge4363
    @gisleberge4363 1 year ago

    Great example, educational.

  • @christianiheanacho4976
    @christianiheanacho4976 4 years ago +1

    I am enriched by this teaching.

  • @oeb5542
    @oeb5542 4 years ago

    Just another amazing video. 😄

    • @dataschool
      @dataschool  4 years ago

      Thank you so much for your kind words! 😊

  • @eatbreathedatascience9593
    @eatbreathedatascience9593 2 years ago +1

    This video is excellent.

  • @simonpruthi
    @simonpruthi 4 years ago +1

    Thank you for your videos, I simply love them!!! :) I have one question: don't we need to drop one column after one-hot encoding, since otherwise there would be a dummy variable trap (e.g. if there are only two categories, both columns provide the same information)? How can we drop that?

    • @dataschool
      @dataschool  4 years ago +1

      Thanks for your kind words, and great question! You don't have to drop the first level of each categorical feature (since it's unlikely to impact model performance), but if you'd like to do it, you can set drop='first' for OneHotEncoder to accomplish this. Hope that helps!
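
For the two-category case raised in the question, a small side-by-side comparison shows what drop='first' changes. The single "Sex" column below is a made-up example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature with only two categories.
X = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Default behaviour: one column per category ("female", "male").
full = OneHotEncoder().fit_transform(X).toarray()

# drop="first": the first category ("female") is dropped, leaving a single
# column that is 1 for "male" and 0 for "female".
reduced = OneHotEncoder(drop="first").fit_transform(X).toarray()

print(full.shape, reduced.shape)
```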

  • @abdoulayebalde2139
    @abdoulayebalde2139 4 years ago +1

    A very nice video that saved my life. I can see it is well explained. Keep uploading!

  • @pivotai525
    @pivotai525 2 years ago

    Simply the best!!

  • @krishnaprasadbhat851
    @krishnaprasadbhat851 3 years ago

    mkayyyyy, awesome tutorial!!!

  • @adityakharwade9501
    @adityakharwade9501 3 years ago

    Awesome video, and thank you for this explanation!!! I have one request: could you please make a video on PCA?

    • @dataschool
      @dataschool  3 years ago

      Thanks for your suggestion!

  • @cogcog312
    @cogcog312 4 years ago +1

    Just excellent. Thanks! I am very new to data science, so please bear with me. Question: for a dataset that has several categorical features, each with a lot of different values (say each categorical column has 100 different values, as opposed to just 2 for Gender, male or female), the number of columns increases astronomically after using OneHotEncoder to convert them to unordered numerical values. If you then run the model and one or more of the categorical features are among the most useful, how do you reverse or convert back these encoded features to know which categorical feature each represents?

    • @dataschool
      @dataschool  4 years ago

      I'm not sure off-hand, sorry!

  • @Pqj613
    @Pqj613 1 year ago

    It's a good tutorial for some reasons that you will explain later.:D

  • @salseid1033
    @salseid1033 4 years ago

    Your tutorial is informative as always. Could you prepare a tutorial on how to interpret a model, like 'black box' interpretation in RF? Thank you.

    • @dataschool
      @dataschool  4 years ago +1

      Thanks for your suggestion! I'll consider it for the future!

  • @jamalhasanzakarneh9837
    @jamalhasanzakarneh9837 3 years ago

    Thank you for your very helpful videos. I have a question on this video: why is the way you assigned data to X different from the way used for y?

    • @dataschool
      @dataschool  3 years ago

      X needs to be a 2-dimensional object, and y needs to be a 1-dimensional object. Does that help?
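
In pandas terms, the difference comes down to double versus single brackets. The sketch below uses a hypothetical two-column DataFrame to show the resulting shapes.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

# Double brackets select a list of columns and return a 2-dimensional DataFrame.
X = df[["Sex"]]

# Single brackets select one column and return a 1-dimensional Series.
y = df["Survived"]

print(X.shape, y.shape)
```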

  • @zohrehvahdati787
    @zohrehvahdati787 4 years ago

    Thank you so much.😍😍🙏🙏👍👍 It helped me a lot.

  • @anthonyhan6825
    @anthonyhan6825 3 years ago

    Awesome job!

  • @modhua4497
    @modhua4497 9 months ago

    Thanks Kevin, do you have any video example that shows how to incorporate a self-defined function in a pandas pipeline?

  • @nguyenminhoan7882
    @nguyenminhoan7882 4 years ago

    Thank you, waiting for more tutorials :3

    • @dataschool
      @dataschool  4 years ago

      You're very welcome! I will do my best to publish more!

  • @jorgecruz4839
    @jorgecruz4839 3 years ago

    Thank you so much for this video. It really helped me a lot. I do have a question about this process. In my case one of the columns of my out of sample data has more categories than my in sample data (basically, I have the opposite scenario as the one you mentioned at 26:19). Would this process work in my case?

    • @dataschool
      @dataschool  3 years ago +1

      Yes, you just have to modify the default parameters of OneHotEncoder to handle unknown categories. See this video for details: ua-cam.com/video/bA6mYC1a_Eg/v-deo.html
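
The linked video is not reproduced here, so the following is only a guess at the kind of change meant: OneHotEncoder has a handle_unknown parameter, and setting it to "ignore" encodes categories unseen during fitting as all zeros rather than raising an error. The training and new data below are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical in-sample data with two categories...
X_train = pd.DataFrame({"Embarked": ["S", "C", "S", "C"]})
# ...and out-of-sample data containing a category ("Q") not seen in training.
X_new = pd.DataFrame({"Embarked": ["S", "Q"]})

# With the default handle_unknown="error", transform would raise on "Q".
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(X_train)

# Learned columns are ordered ["C", "S"]; the unseen "Q" row is all zeros.
encoded = ohe.transform(X_new).toarray()
print(encoded)
```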

  • @sihlengena5022
    @sihlengena5022 3 years ago

    Simply the best.

  • @IgnitedMountain
    @IgnitedMountain 2 years ago

    Hello, in the last example, how are the NaN values handled? Are they removed by one of the methods, or do you have to remove them yourself?

  • @jyotis2903
    @jyotis2903 2 years ago +1

    Thanks for such a detailed tutorial. I am working on a similar problem where I have multiple categorical features. In my dataset, some of the categorical variables have more than 10 possible values; as a result, my 13 features are getting converted into 74. I am fairly new to this, and it is a bit confusing for me because 74 features seems like too many. Could you please share your expertise? Is what I am doing right, or do I need to look for another way to encode the features?

    • @dataschool
      @dataschool  2 years ago +1

      74 features is not necessarily too much! You can have thousands of features (or more) and still have an effective model!

  • @nowhere5111
    @nowhere5111 3 years ago

    This video helps a lot👍👍👍

  • @AjayVerma-xi2us
    @AjayVerma-xi2us 4 years ago +1

    Very good, it cleared up many of my doubts.