Encoding Categorical Data | Ordinal Encoding | Label Encoding

  • Published 5 Sep 2024

COMMENTS • 98

  • @dishitvasoliya9033
    @dishitvasoliya9033 1 year ago +51

    I purchased a data science course with fees of around 50k, but even they don't teach at this level. You are such a fabulous person. 👍

    • @indra-zd9zu
      @indra-zd9zu 1 month ago +1

      50K gone down the drain, splash!

    • @nrted3877
      @nrted3877 1 month ago +1

      Bro, does anyone actually buy a 50k course?

  • @katadermaro
    @katadermaro 3 years ago +27

    Wow, I was so confused about ColumnTransformer and why everyone uses it. People usually include it in encoding videos without any explanation.
    You are the first person to explain it separately in your series. I am amazed. Thank you, Nitish; I will remember you throughout my journey.

  • @KashifAli-ye1zh
    @KashifAli-ye1zh 14 days ago +1

    Best teacher on YouTube for data science.

  • @mridang2064
    @mridang2064 2 years ago +8

    I never knew about LabelEncoder vs. OrdinalEncoder; I used to apply LabelEncoder on input features. Thanks for this hidden insight, Nitish Sir.

  • @paragvachhani4643
    @paragvachhani4643 1 year ago +2

    Sir, what can I say... just this:
    You are doing a great job, with quality and conceptual clarity.

  • @ajaykushwaha4233
    @ajaykushwaha4233 3 years ago +3

    Best explanation ever 🙏🏻

  • @devilsworld7299
    @devilsworld7299 3 months ago +2

    One quick question, sir: can't we do this instead of using these sklearn functions? This way we can arrange and order the categories ourselves, and it's fast, easy to understand, and gives instant output:
    # .loc assigns into df itself; chained indexing like df.education[mask] = 0 may only modify a copy
    df.loc[df['education'] == 'School', 'education'] = 0
    df.loc[df['education'] == 'UG', 'education'] = 1
    df.loc[df['education'] == 'PG', 'education'] = 2
    df.loc[df['review'] == 'Poor', 'review'] = 0
    df.loc[df['review'] == 'Average', 'review'] = 1
    df.loc[df['review'] == 'Good', 'review'] = 2
    df.loc[df['purchased'] == 'Yes', 'purchased'] = 1
    df.loc[df['purchased'] == 'No', 'purchased'] = 0

    • @positivevibes2714
      @positivevibes2714 1 month ago +1

      Instead of doing this, you can use the pandas map function; it will do the same thing.

    • @numberandfacts6174
      @numberandfacts6174 6 days ago +1

      But when there are many categories, sklearn does it faster and more easily than this.
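The pandas `map` approach suggested in this thread can be sketched as follows. The toy DataFrame is hypothetical; only the column names and category orders come from the comment above. Because the mapping is written by hand, nothing is learned from the data, so fitting on training rows is not a concern here; sklearn's `OrdinalEncoder` mainly pays off when the encoding has to live inside a `Pipeline` or `ColumnTransformer`.

```python
import pandas as pd

# Hypothetical toy data with the same column names used in the comment above
df = pd.DataFrame({
    "review": ["Average", "Poor", "Good", "Good"],
    "education": ["School", "UG", "PG", "UG"],
    "purchased": ["No", "Yes", "Yes", "No"],
})

# Explicit order per column: a category's position in the list becomes its code
orders = {
    "education": ["School", "UG", "PG"],
    "review": ["Poor", "Average", "Good"],
    "purchased": ["No", "Yes"],
}

for col, order in orders.items():
    mapping = {cat: code for code, cat in enumerate(order)}
    # map() returns NaN for any value missing from the mapping
    df[col] = df[col].map(mapping)

print(df)
```

One design point: `map` turns any value missing from the dictionary into `NaN`, so a typo such as `'Ug'` shows up as a missing value instead of slipping through silently.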

  • @hamzayaseen9963
    @hamzayaseen9963 1 month ago

    This is a great channel. I'm glad I found it. Thank you so much, Sir, for making this so simple.

  • @arhaanahmad3953
    @arhaanahmad3953 2 months ago

    Well explained. This really helped me to improve my understanding of ML. Thank you sir.

  • @sneharj2036
    @sneharj2036 2 years ago +1

    Thank you so much for clearing up the concepts of encoding techniques with examples. Very helpful and informative video.

  • @fit_tubes_365
    @fit_tubes_365 13 days ago

    Course started: ML
    Lecture-01: 14/08/2024
    Lecture-02: 14/08/2024
    Lecture-03: 14/08/2024
    Lecture-04: 14/08/2024
    Lecture-05: 14/08/2024
    Lecture-06: 15/08/2024
    Lecture-07: 15/08/2024
    Lecture-08: 15/08/2024
    Lecture-09: 15/08/2024
    Lecture-10: 15/08/2024
    Lecture-11: 16/08/2024
    Lecture-12: 16/08/2024
    Lecture-13: 17/08/2024
    Lecture-14: 17/08/2024
    Lecture-15: 18/08/2024
    Lecture-16: 19/08/2024
    Lecture-17: 20/08/2024
    Lecture-18: 20/08/2024
    Lecture-19: 21/08/2024
    Lecture-20: 21/08/2024
    Lecture-21: 22/08/2024
    Lecture-22: 22/08/2024
    Lecture-23: 23/08/2024
    Lecture-24: 23/08/2024
    Lecture-25: 24/08/2024
    Lecture-26: 24/08/2024

  • @a_wise_person
    @a_wise_person 1 year ago +2

    The way you teach is amazing, sir. I was trying to learn ML for months; finally, I am glad that I found you.

  • @alimuiz5328
    @alimuiz5328 1 month ago

    Thank you for the great video, sir.
    I wanted to ask wouldn't it be better to encode the data before splitting it? This way we don't have to transform the train and test sets individually.
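A minimal sketch of the usual answer, assuming a small hypothetical dataset with the `review`, `education` and `purchased` columns from the video: split first, fit the encoder on the training rows only, then reuse the fitted encoder on the test rows. With a fixed `categories` list the encoder learns almost nothing from the data, so encoding before the split would happen to give the same codes here; the habit matters for transformers that do learn from the data (for example `OneHotEncoder` with automatic categories, or `StandardScaler`), where fitting on the full dataset leaks test information.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data in the shape used in the video
df = pd.DataFrame({
    "review": ["Poor", "Good", "Average", "Good", "Poor", "Average", "Good", "Poor"],
    "education": ["School", "PG", "UG", "UG", "School", "PG", "PG", "UG"],
    "purchased": ["No", "Yes", "No", "Yes", "No", "Yes", "Yes", "No"],
})
X = df[["review", "education"]]
y = df["purchased"]

# 1. Split first, so the test rows play the role of unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Fit the encoder on the training rows only
oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"], ["School", "UG", "PG"]])
oe.fit(X_train)

# 3. Apply the same fitted encoder to both splits
X_train_enc = oe.transform(X_train)
X_test_enc = oe.transform(X_test)
print(X_train_enc.shape, X_test_enc.shape)
```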

  • @muhammadtayyabtahirqureshi7186

    explicit and to-the-point 👍

  • @santanubag358
    @santanubag358 1 year ago

    You and Krish Naik Sir are the Brahma and Bishnu of data science.

  • @osho_magic
    @osho_magic 1 year ago

    This is the first time I'm commenting on any channel, because the info is really precious... when it comes to quality, it's Nitish sir.

  • @yogeshsapkal2593
    @yogeshsapkal2593 2 years ago +1

    Sir, even after taking classes, we were never taught this concept... thank you, sir.

  • @lol-ki5pd
    @lol-ki5pd 2 months ago

    oe = OrdinalEncoder(categories=[['Poor','Average','Good'],['School','UG','PG']])
    When we have already defined this, why do we need to call oe.fit(X_train)? I mean, how does it actually help when everything was already specified for oe in the first line?

  • @Dsutradhar
    @Dsutradhar 1 year ago

    I don't know why this channel is not famous.

  • @marikhalid6474
    @marikhalid6474 1 month ago +1

    You are great, bro.
    The best video content.

  • @siyays1868
    @siyays1868 2 years ago

    Thank you so much for clearing up encoding concepts. Very good explanation with examples.

  • @saumyashah6622
    @saumyashah6622 3 years ago +4

    "Whenever we are doing a project, instead of train_test_split, we should always do k-fold cross-validation." Sir, is my thinking correct? If it's wrong, please correct me.
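Whether to *always* use k-fold cross-validation is a judgment call: it usually gives a more stable estimate than a single split on small data, and it is common to combine it with a final held-out test set rather than replace that set. The sketch below assumes a hypothetical 12-row dataset and wraps the encoder in a `Pipeline`, so `cross_val_score` refits the encoder on the training part of every fold.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical 12-row dataset with the video's column names
df = pd.DataFrame({
    "review": ["Poor", "Good", "Average", "Good", "Poor", "Average",
               "Good", "Poor", "Average", "Good", "Poor", "Average"],
    "education": ["School", "PG", "UG", "UG", "School", "PG",
                  "PG", "UG", "School", "PG", "UG", "School"],
    "purchased": ["No", "Yes", "No", "Yes", "No", "Yes",
                  "Yes", "No", "No", "Yes", "No", "Yes"],
})
X = df[["review", "education"]]
y = df["purchased"]

pipe = Pipeline([
    ("encode", OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                          ["School", "UG", "PG"]])),
    ("model", LogisticRegression()),
])

# 3 folds: each fold refits the whole pipeline (encoder included) on its training part
scores = cross_val_score(pipe, X, y, cv=3)
print(scores, scores.mean())
```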

  • @arpittrivedi6636
    @arpittrivedi6636 1 year ago

    Sometimes I wonder what would have become of us if you weren't here. Great explanation.

  • @zkhan2023
    @zkhan2023 3 years ago +2

    Sir, you are doing a great job

  • @_Mahesh-nh7xv
    @_Mahesh-nh7xv 3 months ago

    Best explanation ever

  • @geethanshr
    @geethanshr 3 months ago

    At 16:29, why didn't we convert our transformed NumPy array to a DataFrame?
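A minimal sketch of turning the encoder's NumPy output back into a DataFrame, assuming a hypothetical `X_train` with one numeric column and the two ordinal columns from the video. Reusing the original index matters after `train_test_split`, because the shuffled index is no longer 0..n-1 and `pd.concat` aligns rows by index. In scikit-learn 1.2 and later, `oe.set_output(transform="pandas")` is an alternative that returns a DataFrame directly.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical X_train; after train_test_split the index is usually not 0..n-1
X_train = pd.DataFrame(
    {"age": [25, 40, 33],
     "review": ["Good", "Poor", "Average"],
     "education": ["PG", "School", "UG"]},
    index=[10, 11, 12],
)
cat_cols = ["review", "education"]
oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"], ["School", "UG", "PG"]])

# Plain NumPy array: no column names, no index
encoded = oe.fit_transform(X_train[cat_cols])

# Wrap the array, reusing the original column names and index so rows stay aligned
encoded_df = pd.DataFrame(encoded, columns=cat_cols, index=X_train.index)

# Put the untouched columns and the encoded columns back together
X_train_final = pd.concat([X_train.drop(columns=cat_cols), encoded_df], axis=1)
print(X_train_final)
```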

  • @debasissahoo7559
    @debasissahoo7559 10 months ago

    Great efforts 👌 I appreciate you. God bless ❤

  • @sumitb2015
    @sumitb2015 2 years ago +1

    Excellent explanation 👍

  • @mohitkushwaha8974
    @mohitkushwaha8974 1 year ago

    Doubt
    1. Can't we apply ordinal encoding and label encoding before the X_train/X_test split?
    It would have been easier to do the encoding before the split.
    2. Can't we use the pandas replace function, e.g. replace Yes and No with 1 and 0, and replace Poor, Average and Good with values like 0, 1, 2?
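On question 2: yes, pandas `replace` with a nested dictionary (`{column: {old: new}}`) works for a fixed, hand-written order. On question 1, the usual advice is to fit any encoder that learns from the data on the training split only. A sketch of the `replace` route on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy data with the columns from the question
df = pd.DataFrame({
    "review": ["Poor", "Good", "Average"],
    "purchased": ["Yes", "No", "Yes"],
})

# Nested dict: {column: {old_value: new_value}}
encoded = df.replace({
    "purchased": {"No": 0, "Yes": 1},
    "review": {"Poor": 0, "Average": 1, "Good": 2},
}).astype("int64")  # make the integer dtype explicit across pandas versions
print(encoded)
```

Unlike `map`, `replace` leaves unlisted values untouched, so a typo would survive the replacement; the final `astype` then fails loudly instead of producing a half-encoded column.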

  • @user-px7de6up2m
    @user-px7de6up2m 7 months ago

    Sir, please make a video on high-cardinality categorical values.

  • @saakshidikshit
    @saakshidikshit 6 months ago

    Can somebody explain what order should be followed when doing an ML project? For example, should feature scaling be applied first, or should categorical data be encoded first? I would be extremely grateful if someone could clarify. Thanks.
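There is no single mandatory order, but a common one is: split the data, then fit imputation, encoding and scaling on the training rows only, then train the model. Encoding and scaling usually act on different columns, so `ColumnTransformer` can apply them side by side, and a `Pipeline` keeps the order fixed. A sketch on a hypothetical 12-row dataset; whether the ordinal codes should also be scaled depends on the model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical toy data: one numeric input, two ordinal inputs, a Yes/No target
df = pd.DataFrame({
    "age": [25, 40, 33, 51, 19, 62, 45, 28, 37, 55, 22, 48],
    "review": ["Poor", "Good", "Average", "Good", "Poor", "Average",
               "Good", "Poor", "Average", "Good", "Poor", "Average"],
    "education": ["School", "PG", "UG", "UG", "School", "PG",
                  "PG", "UG", "School", "PG", "UG", "School"],
    "purchased": ["No", "Yes", "No", "Yes", "No", "Yes",
                  "Yes", "No", "No", "Yes", "No", "Yes"],
})
X = df.drop(columns="purchased")
y = df["purchased"]

# Step 1: split before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Step 2: column-wise preprocessing, fitted on the training rows only
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age"]),
    ("ordinal", OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                           ["School", "UG", "PG"]]),
     ["review", "education"]),
])

# Step 3: the model comes last
pipe = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```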

  • @Sumitrawat112
    @Sumitrawat112 1 year ago +1

    Can we perform label encoding and ordinal encoding before the train-test split?

  • @heetbhatt4511
    @heetbhatt4511 1 year ago

    Thank you sir

  • @user-wk8fh2ub8b
    @user-wk8fh2ub8b 11 months ago

    You are really great, sir.

  • @sid_x_18
    @sid_x_18 9 months ago

    Why do we even do label encoding on the target column? I mean, that is essentially just 0s and 1s, right? So why can't we just create dummies? What's the logic behind using label encoding here?
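A short comparison on a hypothetical Yes/No target: `LabelEncoder` keeps the target as a single 1-D column, while `pd.get_dummies` produces two columns that carry the same information twice. Many scikit-learn classifiers also accept string labels directly, so label-encoding the target is mostly a convenience (or a requirement of other libraries) rather than a necessity.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

y = pd.Series(["Yes", "No", "Yes", "No"], name="purchased")

# LabelEncoder: one 1-D target with codes 0/1
y_codes = LabelEncoder().fit_transform(y)

# get_dummies: two columns (No, Yes) that duplicate the same information
y_dummies = pd.get_dummies(y)

# Many sklearn classifiers accept the string labels as they are
X = [[1.0], [0.0], [1.0], [0.0]]  # hypothetical single feature
clf = LogisticRegression().fit(X, y)
print(y_codes, y_dummies.shape, clf.classes_)
```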

  • @sandipansarkar9211
    @sandipansarkar9211 1 year ago

    Finished watching and coding.

  • @talkswithRishabh
    @talkswithRishabh 2 years ago

    Really good content, sir; it is helping me a lot.

  • @HimanshuSharma-we5li
    @HimanshuSharma-we5li 2 years ago +1

    It would be great if there were a dataset link in all the videos.

  • @aditirawat9841
    @aditirawat9841 2 years ago

    I recommend these tutorials to aspiring data scientists.

  • @SACHINKUMAR-px8kq
    @SACHINKUMAR-px8kq 1 year ago

    Thanks Sir for this Amazing Session

  • @evergreenonce5456
    @evergreenonce5456 2 years ago

    11:18 *Encoding to Categorical Features*

  • @kushagalashravanthi-go3sg
    @kushagalashravanthi-go3sg 1 year ago

    Super explanation sir❤

  • @chetanchavan647
    @chetanchavan647 1 year ago

    Best

  • @promitdutta3029
    @promitdutta3029 11 months ago

    Why can't label encoding be used to transform input columns?
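`LabelEncoder` is meant for target values: its `fit`/`transform` take a single 1-D `y`, so it does not plug into `ColumnTransformer` (which calls `fit_transform(X, y)`), and it orders classes alphabetically. On an ordinal feature that can contradict the real order, as this sketch on hypothetical `education` values shows:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

education = ["School", "UG", "PG", "UG"]

# LabelEncoder: 1-D input, classes sorted alphabetically -> PG=0, School=1, UG=2
label_codes = LabelEncoder().fit_transform(education)

# OrdinalEncoder: 2-D input, order given explicitly -> School=0, UG=1, PG=2
ordinal_codes = OrdinalEncoder(
    categories=[["School", "UG", "PG"]]
).fit_transform(pd.DataFrame({"education": education}))

print(label_codes)            # the order School < UG < PG is lost
print(ordinal_codes.ravel())  # the order is preserved
```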

  • @user-qp9fj3vv8n
    @user-qp9fj3vv8n 7 months ago

    Hello sir, which lecture has the introduction to the scikit-learn library?

  • @ParallelUniverse550
    @ParallelUniverse550 8 months ago

    In label encoding, how would the encoder know whether to map 0 to 'No' and 1 to 'Yes', since we didn't specify it?
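`LabelEncoder` does not need to be told: it sorts the unique labels and stores them in `classes_`, and each label's position becomes its code. For strings the sort is alphabetical, so `'No'` becomes 0 and `'Yes'` becomes 1 simply because `'N'` sorts before `'Y'`. A quick check on hypothetical labels:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Yes", "No", "No", "Yes"])

# classes_ holds the sorted unique labels; a label's position is its code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'No': 0, 'Yes': 1}
print(codes)
```

If a specific mapping is needed, an explicit dictionary with `map`, or `OrdinalEncoder` with `categories`, makes it deliberate.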

  • @user-wj8my7hw9x
    @user-wj8my7hw9x 8 months ago

    Does it matter whether the output column is ordinal or nominal before applying label encoding? How do we encode a categorical feature column with high cardinality? Please help me.
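For the first question: label-encoding the target does not depend on whether it is ordinal or nominal, because classifiers treat the codes as unordered class labels. For high cardinality, two common options are grouping rare categories into `'Other'` and frequency encoding; the sketch below shows both on a hypothetical `city` column, with the counts learned from training data only and an arbitrary threshold of 2. Recent scikit-learn versions also offer `OneHotEncoder(min_frequency=...)` for the grouping step.

```python
import pandas as pd

# Hypothetical training column with a long tail of rare cities
city_train = pd.Series(["Delhi", "Delhi", "Mumbai", "Mumbai", "Mumbai", "Pune", "Agra"])

# Learn which categories are frequent from the training data only
counts = city_train.value_counts()
frequent = counts[counts >= 2].index  # threshold of 2 is an arbitrary choice


def group_rare(col: pd.Series) -> pd.Series:
    """Replace categories that were rare in training with 'Other'."""
    return col.where(col.isin(frequent), "Other")


city_grouped = group_rare(city_train)

# Frequency encoding: each category becomes its share of the training rows
freq = city_train.value_counts(normalize=True)
city_freq = city_train.map(freq)
print(city_grouped.tolist())
print(city_freq.round(3).tolist())
```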

  • @manikantareddy298
    @manikantareddy298 1 year ago

    What if there are null values in education column and then how should we start the process?
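One option, sketched under the assumption that filling missing `education` values with the most frequent category is acceptable: put a `SimpleImputer` before the `OrdinalEncoder` in a `Pipeline`, so both are fitted on training data. The data is hypothetical; `SimpleImputer(strategy="constant", fill_value="Missing")` is an alternative when missingness itself may carry information (the extra category then has to be added to the `categories` list).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training column with one missing value
X_train = pd.DataFrame({"education": ["UG", np.nan, "PG", "UG", "School"]})

education_pipe = Pipeline([
    # Fill NaN with the most frequent training value ("UG" here)
    ("impute", SimpleImputer(strategy="most_frequent")),
    # Then encode with the explicit order School < UG < PG
    ("encode", OrdinalEncoder(categories=[["School", "UG", "PG"]])),
])
encoded = education_pipe.fit_transform(X_train)
print(encoded.ravel())
```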

  • @kingR-p6n
    @kingR-p6n 19 days ago

    ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,). I'm getting this error after I run oe.fit(X_train). Can anyone help me solve this problem?
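This error usually means the number of lists in `categories` differs from the number of columns passed to `fit`: `categories` needs exactly one list per column, in the same order. The sketch reproduces it with a hypothetical four-column `X_train` and then fixes it by selecting only the two ordinal columns (routing those columns through a `ColumnTransformer` is another way to do the same).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical X_train with four columns, only two of which are ordinal
X_train = pd.DataFrame({
    "age": [30, 45],
    "gender": ["Male", "Female"],
    "review": ["Poor", "Good"],
    "education": ["UG", "PG"],
})
categories = [["Poor", "Average", "Good"], ["School", "UG", "PG"]]  # two lists

# Reproduces the error: 2 category lists but 4 input columns
try:
    OrdinalEncoder(categories=categories).fit(X_train)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Fix: pass exactly the columns the lists describe, in the same order
oe = OrdinalEncoder(categories=categories)
encoded = oe.fit_transform(X_train[["review", "education"]])
print(encoded)
```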

  • @narendraparmar1631
    @narendraparmar1631 8 months ago

    Great Content
    Thank You😀

  • @arman_shekh97
    @arman_shekh97 3 years ago

    I thought no video would come today, but thank you.

  • @arshad1781
    @arshad1781 3 years ago

    I understood all of this, but we need a video on how to do analysis after encoding, and how to convert the final results back into male/female or yes/no. After 2 or 3 more videos, please also make a practical project video based on them. The problem is that the data has been transformed, so how do we analyse it? How do we tell from the final output that this one is male?
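Decoding is handled by `inverse_transform`, as long as the fitted encoder object is kept. A sketch with hypothetical labels and hypothetical model predictions:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Target column: keep the fitted LabelEncoder so predictions can be decoded later
y_train = ["Male", "Female", "Female", "Male"]
le = LabelEncoder()
y_encoded = le.fit_transform(y_train)   # alphabetical: Female -> 0, Male -> 1
predictions = [1, 0, 1]                 # hypothetical model output
predicted_labels = le.inverse_transform(predictions)

# Input features: OrdinalEncoder can decode its codes the same way
oe = OrdinalEncoder(categories=[["No", "Yes"]])
codes = oe.fit_transform(pd.DataFrame({"purchased": ["Yes", "No"]}))
decoded = oe.inverse_transform(codes)
print(predicted_labels, decoded.ravel())
```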

  • @taruchitgoyal3735
    @taruchitgoyal3735 1 year ago

    Hello Sir,
    Thank you for the session. Can we extend the concept of ordinal encoding to a numeric column such as Age?
    For example, in your dataset at 11:45, the values of the Age column are 98, 16, 53, 69, 77.
    With more records we will have more distinct values in the column, and at most we can have 100 values.
    So, if we group the numeric values into categories, will that not help make our data analysis and ML model better?
    For example, we can have the categories Teenager for all Age values from 13 to 19, College students: 20 to 23, Young professionals: 24 to 30, Mid age: 31 to 65 and Senior citizens: 66 to 99. Then we can finally apply ordinal encoding to these categories, since there is now an order among the grouped values.
    It would be very helpful, sir, to hear your views on the above.
    Thank you
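Binning a numeric column into ordered groups and then encoding the groups is a known technique; whether it helps depends on the model, since the raw age keeps more detail and many models can use it directly. A sketch with pandas `pd.cut`, using the ages and age ranges from the comment above (ages outside 13 to 99 would become `NaN`, code `-1`); scikit-learn's `KBinsDiscretizer` is a related option when the bin edges should be learned from the data.

```python
import pandas as pd

# Ages quoted in the comment above
ages = pd.Series([98, 16, 53, 69, 77], name="age")

# Right-closed bins: (12, 19] covers 13..19, (19, 23] covers 20..23, and so on
bins = [12, 19, 23, 30, 65, 99]
labels = ["Teenager", "College student", "Young professional", "Mid age", "Senior citizen"]

# pd.cut with labels returns an ordered categorical
age_group = pd.cut(ages, bins=bins, labels=labels)

# The category codes follow the label order, i.e. an ordinal encoding
age_code = age_group.cat.codes
print(pd.DataFrame({"age": ages, "group": age_group, "code": age_code}))
```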

  • @subhajitdey4483
    @subhajitdey4483 1 year ago

    Sir, what happens if the output is categorical but nominal; should I apply label encoding there too? What I mean is: if the output data is categorical, whether nominal or ordinal, should I apply label encoding in both cases?
    Thank you for this video🙂

  • @piyushnirwan6298
    @piyushnirwan6298 3 years ago +1

    Don't we have to convert the array output into a DataFrame after the transformation is done?

  • @meenalpande
    @meenalpande 1 year ago

    Nice explanation

  • @tusharkhatri5795
    @tusharkhatri5795 1 year ago

    I have one doubt. After the train-test split, we fit on the training data but transform both the training and the test data. Suppose this were standardization: if we fit on the training data, we get its mean and variance, so how can we transform the test data using the training mean and variance? I mean, the test data should be independent of the training data; there shouldn't be any kind of relationship between them, to prevent data leakage. So must we calculate a separate mean and variance for train and test and fit_transform each one individually? Please clarify.
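The test set stands in for future, unseen data, and in production only the training statistics would be available; the model was also trained on features scaled with those statistics, so the test rows must go through the same mapping. Using training statistics on the test set does not leak test information into training (leakage is the other direction). A small numeric sketch with hypothetical values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10.0], [20.0], [30.0]])  # hypothetical feature values
X_test = np.array([[40.0]])

# Fit on train: the scaler stores the training mean and standard deviation
scaler = StandardScaler().fit(X_train)
X_test_scaled = scaler.transform(X_test)  # (40 - 20) / std_train

# A separate scaler fitted on the test rows puts them on a different scale:
# a single test row is always mapped to 0, however large it is
X_test_separate = StandardScaler().fit_transform(X_test)
print(scaler.mean_, X_test_scaled, X_test_separate)
```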

  • @akshatbhoir1072
    @akshatbhoir1072 1 year ago

    Sir, if there is Yes/No data in the dataset, which encoding should be used?
    Please clear up this doubt.
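For a column with only two values, ordinal versus nominal makes little difference, since any 0/1 coding carries the same information. Two options, sketched on a hypothetical `has_loan` column (for a Yes/No *target*, `LabelEncoder` as in the video also works):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

X = pd.DataFrame({"has_loan": ["Yes", "No", "Yes"]})  # hypothetical binary column

# Option 1: explicit order, No -> 0, Yes -> 1
ordinal = OrdinalEncoder(categories=[["No", "Yes"]]).fit_transform(X)

# Option 2: one-hot encoding that keeps a single column for two-valued features
onehot = OneHotEncoder(drop="if_binary").fit_transform(X).toarray()
print(ordinal.ravel(), onehot.ravel())
```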

  • @ajitchaturvedi4052
    @ajitchaturvedi4052 1 year ago

    Please make a video on neural architecture search.

  • @MuhammadJunaid-yr8jd
    @MuhammadJunaid-yr8jd 1 year ago

    thank you so much

  • @maramreddysrikanth5464
    @maramreddysrikanth5464 10 months ago

    When ordinal encoding or one-hot encoding is done using ColumnTransformer, the column order of the output array changes. I mean, the encoding was done on the 5th column, but after the transformation it appears as the 1st column in the array. Is there any solution?

  • @kamilshaikh1602
    @kamilshaikh1602 2 years ago

    What should we do if the number of (ordinal) features is high? I have 40 such features.
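With many ordinal columns, the `categories` argument can be built from a dictionary instead of being typed by hand. The column names and orders below are hypothetical; the key point is that the list of category lists and the list of columns are built in the same order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical: one entry per ordinal column, categories from lowest to highest
ordinal_orders = {
    "review": ["Poor", "Average", "Good"],
    "education": ["School", "UG", "PG"],
    "size": ["S", "M", "L", "XL"],
}
# Survey-style columns that share one scale can reuse the same list
satisfaction = ["Low", "Medium", "High"]
for question in ["q1", "q2"]:
    ordinal_orders[question] = satisfaction

X = pd.DataFrame({
    "review": ["Good", "Poor"], "education": ["UG", "PG"], "size": ["XL", "S"],
    "q1": ["High", "Low"], "q2": ["Medium", "Medium"],
})

cols = list(ordinal_orders)  # column order and categories order must match
oe = OrdinalEncoder(categories=[ordinal_orders[c] for c in cols])
encoded = oe.fit_transform(X[cols])
print(encoded)
```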

  • @yashjain6372
    @yashjain6372 1 year ago

    loved it

  • @satyampandey8650
    @satyampandey8650 3 years ago

    Sir, then which encoder should we apply to features that are not ordinal?
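For nominal (unordered) features the usual choice is one-hot encoding, e.g. scikit-learn's `OneHotEncoder` or `pd.get_dummies`. A sketch on a hypothetical `city` column:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical nominal column: cities have no natural order
X_train = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Pune"]})

ohe = OneHotEncoder()  # categories are learned from the training data and sorted
encoded = ohe.fit_transform(X_train).toarray()  # the default output is a sparse matrix
print(ohe.categories_)
print(encoded)
```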

  • @tejaskamble8731
    @tejaskamble8731 7 months ago

    ❤🔥🔥

  • @Star-xk5jp
    @Star-xk5jp 8 months ago

    day2-date:10/1/24

  • @tarunchauhan2339
    @tarunchauhan2339 2 years ago

    In ordinal encoding, an error is raised: Shape mismatch: if categories is an array, it has to be of shape (n_features,)
    Can anyone resolve this, please?

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago

    I got the concept, but all the information is in arrays. Do we need to convert them into DataFrames and merge them to proceed further?

  • @darshedits1732
    @darshedits1732 10 months ago

    Sir, the CSV file is not downloading.
    Please help me, it's urgent.

  • @annyd3406
    @annyd3406 2 years ago

    11:20 to 12:10 - why ColumnTransformer

  • @user-vh2pd7us9z
    @user-vh2pd7us9z 1 year ago

    How do I download the dataset from your GitHub? It shows "raw file download" but doesn't download.
    Please help, anyone.

  • @MRAgundli
    @MRAgundli 4 months ago

    done

  • @harshkondkar3193
    @harshkondkar3193 2 years ago

    How to deal with the situation where there are unseen categories in the test data?

    • @rachitsingh4913
      @rachitsingh4913 1 year ago

      It's always good to apply encoding without a train-test split.

    • @anjushac9307
      @anjushac9307 10 months ago

      The encoders have additional parameters that you can set to decide what to do in case unseen categories are encountered in the test data. You can check the documentation for more details.

    • @harshkondkar3193
      @harshkondkar3193 10 months ago

      @@anjushac9307 will check the doc. Thanks!!
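A sketch of the parameters the second reply refers to, with hypothetical data: `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)` maps unseen values to -1, and `OneHotEncoder(handle_unknown="ignore")` encodes them as an all-zero row. Encoding before the split, as the first reply suggests, also avoids the error but lets the encoder see test data.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Ordinal feature: unseen values get a reserved code outside the valid range
oe = OrdinalEncoder(
    categories=[["Poor", "Average", "Good"]],
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)
oe.fit(pd.DataFrame({"review": ["Poor", "Good"]}))
test_codes = oe.transform(pd.DataFrame({"review": ["Average", "Excellent"]}))

# Nominal feature: unseen values become a row of zeros
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(pd.DataFrame({"city": ["Delhi", "Mumbai"]}))
test_onehot = ohe.transform(pd.DataFrame({"city": ["Pune"]})).toarray()
print(test_codes.ravel(), test_onehot)
```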

  • @aj_ai
    @aj_ai 1 year ago +1

    👾👾👾

  • @monikrayu2546
    @monikrayu2546 1 month ago

    We can say that, sir 3:02

  • @tradingbrothers1126
    @tradingbrothers1126 1 year ago

    I can't find it on Kaggle.

  • @user-vh2pd7us9z
    @user-vh2pd7us9z 1 year ago

    Please help, anyone.

  • @osho_magic
    @osho_magic 1 year ago

    No amount of praise is enough...

  • @tradingbrothers1126
    @tradingbrothers1126 1 year ago

    Sir, please upload the dataset.

  • @harshmishra7774
    @harshmishra7774 2 years ago

    Engg branch should be the example of ordinal data 🤣

  • @1981Praveer
    @1981Praveer 2 years ago

    Q. If we have a big dataset, let's say Housing_price.csv (from Kaggle), how would I know which columns have ordinal data? Is there any API to check? @CampusX #CampusX
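Neither pandas nor scikit-learn has a function that detects ordinality: whether categories have a meaningful order is domain knowledge. A practical first step is to list each non-numeric column's distinct values and then decide by reading them. A sketch on a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for a larger dataset
df = pd.DataFrame({
    "quality": ["Good", "Average", "Excellent", "Good", "Poor"],
    "neighborhood": ["North", "East", "North", "West", "South"],
    "price": [200, 150, 320, 210, 90],
})

# Every non-numeric column, with its distinct values, for a manual review
cat_cols = df.select_dtypes(exclude="number").columns
summary = {col: sorted(df[col].dropna().unique()) for col in cat_cols}
for col, values in summary.items():
    print(col, len(values), values)
```

Here `quality` (Poor < Average < Good < Excellent) reads as ordinal, while `neighborhood` does not; that judgment is made by the person reading the output, not by the code.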

  • @Ganeshjadhav2808
    @Ganeshjadhav2808 2 роки тому

    thank you sir