One Hot Encoding | Handling Categorical Data | Day 27 | 100 Days of Machine Learning

  • Published 6 Sep 2024

COMMENTS • 115

  • @pankajbeldar9799
    @pankajbeldar9799 1 year ago +60

    You deserve billions of subscribers... you are the best teacher in the entire world for me.

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago +40

    I don't have words to express my appreciation: your teaching is awesome, the content is awesome, the explanation is awesome. Thank you so much for such an informative video.

  • @RandomAIDude
    @RandomAIDude 25 days ago +1

    OneHotEncoder(min_frequency=100)
    It will automatically detect infrequent categories and combine them into one.
    Thanks for your effort ♥

  • @fit_tubes_365
    @fit_tubes_365 13 days ago +1

    Course Started : ML
    Lecture-01: 14/08/2024
    Lecture-02: 14/08/2024
    Lecture-03: 14/08/2024
    Lecture-04: 14/08/2024
    Lecture-05: 14/08/2024
    Lecture-06: 15/08/2024
    Lecture-07: 15/08/2024
    Lecture-08: 15/08/2024
    Lecture-09: 15/08/2024
    Lecture-10: 15/08/2024
    Lecture-11: 16/08/2024
    Lecture-12: 16/08/2024
    Lecture-13: 17/08/2024
    Lecture-14: 17/08/2024
    Lecture-15: 18/08/2024
    Lecture-16: 19/08/2024
    Lecture-17: 20/08/2024
    Lecture-18: 20/08/2024
    Lecture-19: 21/08/2024
    Lecture-20: 21/08/2024
    Lecture-21: 22/08/2024
    Lecture-22: 22/08/2024
    Lecture-23: 23/08/2024
    Lecture-24: 23/08/2024
    Lecture-25: 24/08/2024
    Lecture-26: 24/08/2024
    Lecture-27: 25/08/2024

  • @Garrick645
    @Garrick645 4 months ago +3

    Absolutely spoon-fed content, loved it. This is the first time since school that I have found this kind of pedagogy.

  • @Alive-Ness
    @Alive-Ness 1 year ago +13

    No doubt he is one of the best teachers if you want to learn ML 🙌

    • @sarmadali5110
      @sarmadali5110 1 year ago

      Can someone explain why we don't use fit on the test set and only use transform?

    • @Alive-Ness
      @Alive-Ness 1 year ago +2

      @sarmadali5110 If we use fit on the test data, the model will also learn from the test data, so it will overfit to the test data and we will not be able to tell whether our model is good enough on unseen data.

    • @hritikroshanmishra3630
      @hritikroshanmishra3630 1 year ago

      @Alive-Ness thanks
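
The point discussed in this thread can be sketched as follows: the encoder learns its categories from the training rows only and is then reused, unchanged, on the test rows, so no information from the test set leaks into preprocessing. The data and the manual split are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up); the first six rows act as train, the last two as test.
cars = pd.DataFrame(
    {"fuel": ["Petrol", "Diesel", "CNG", "Petrol", "Diesel", "Petrol", "CNG", "Diesel"]}
)
X_train, X_test = cars.iloc[:6], cars.iloc[6:]

ohe = OneHotEncoder(handle_unknown="ignore")
X_train_enc = ohe.fit_transform(X_train).toarray()  # learn categories from train only
X_test_enc = ohe.transform(X_test).toarray()        # reuse the same categories on test

print(ohe.categories_[0])
print(X_test_enc)
```

Calling `fit_transform` on the test set instead would rebuild the category list from the test rows, which can change the number and order of columns and no longer reflects how the model will meet unseen data.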

  • @kumarmanishpradhan
    @kumarmanishpradhan 3 months ago +3

    Someone who went deep into the subject and took his students into that depth too. Love you @CampusX

  • @hasanrants
    @hasanrants 1 month ago +1

    Thank you, sir, for the valuable content.
    Completed on 21st July 2024, 4:30 PM.

  • @11aniketkumar
    @11aniketkumar 1 year ago +2

    Gender is nominal data, but if I treat it like ordinal data, I get a single column of 0s and 1s. Now suppose I split the 'gender' column into two columns using OHE; since the two columns depend on each other, I drop one. So for 'gender' I again have a single column with 0 and 1 as the two possible entries. Indirectly I have treated nominal data like ordinal data, and the end result is the same in both cases.
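
The observation above can be checked with a small sketch, using made-up data. For a column with exactly two categories, one-hot encoding with the first column dropped gives the same 0/1 column as a manual two-value mapping; with three or more categories the two approaches no longer coincide.

```python
import pandas as pd

# Toy column (made up) with exactly two categories.
gender = pd.Series(["Male", "Female", "Female", "Male"], name="gender")

# One-hot encode, then drop the first (alphabetically smallest) category.
ohe_column = pd.get_dummies(gender, drop_first=True, dtype=int)["Male"]

# Manual 0/1 mapping, as an ordinal-style encoding would produce.
manual_column = gender.map({"Female": 0, "Male": 1})

print(ohe_column.tolist())
print(manual_column.tolist())
```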

  • @prakharagarwal9448
    @prakharagarwal9448 3 years ago +7

    Great series, learning so much, and in Hindi too.
    After machine learning, please bring series on deep learning, NLP, and OpenCV as well.

    • @zkhan2023
      @zkhan2023 3 years ago +1

      Yes sir, please make videos on deep learning too.

  • @sneharj2036
    @sneharj2036 2 years ago +2

    Wow. Amazing video. Wonderful explanation. Thank you so much. CampusX is a really good channel.

  • @kindaeasy9797
    @kindaeasy9797 8 months ago +2

    In this car selling price dataset, I think the brand, fuel type, and owner columns consist of ordinal categorical data.
    One might not consider fuel type ordinal, but thinking practically, fuel type also affects the selling price of a car: for example, electric cars are comparatively expensive, and there is a similar trend with other fuel types as well, depending on the company.

  • @viral_fight0
    @viral_fight0 27 days ago

    Fuel should be ordinal because p >> d >> ... in terms of price.

  • @tathagatasharma
    @tathagatasharma 1 year ago +1

    This channel is a gold mine.

  • @saumyashah6622
    @saumyashah6622 3 years ago +5

    Sir, one suggestion for day 28: please include OHE on the most frequent variables (using scikit-learn). Here you have done it using pandas.

    • @campusx-official
      @campusx-official 3 years ago +3

      Can't be done using sklearn

    • @saumyashah6622
      @saumyashah6622 3 years ago +1

      @campusx-official OK. Got it 👍

    • @arun5351
      @arun5351 3 years ago +1

      @campusx-official What if we first replace the categories with low frequency (below some threshold) with an 'others' category, commit this change to our dataframe, and then use OHE from scikit-learn?
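
The idea proposed in the last reply can be sketched like this. The toy data, the threshold value, and the label `"others"` are assumptions for illustration, not the video's code.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): two common brands and two rare ones.
df = pd.DataFrame({"brand": ["Maruti"] * 4 + ["Hyundai"] * 3 + ["Jeep", "Volvo"]})

threshold = 2  # hypothetical cutoff: brands seen fewer than 2 times count as rare
counts = df["brand"].value_counts()
rare = counts[counts < threshold].index

# Replace rare brands with a single 'others' label, then encode as usual.
df["brand"] = df["brand"].where(~df["brand"].isin(rare), "others")

ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["brand"]]).toarray()
categories = list(ohe.categories_[0])
print(categories)
```

When this is used with a train/test split, the list of rare brands would need to be computed on the training rows only and then applied to the test rows.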

  • @rivupangas2735
    @rivupangas2735 1 year ago +4

    Sir, I have a doubt: why are we using OHE for the owner column instead of an ordinal encoder?

  • @subhashdixit5167
    @subhashdixit5167 1 year ago +1

    Fantastic, really enjoyed it... Awesome content, sir. I wish I had known about this channel earlier; I could have done wonders. Thanks. This content is much, much better than paid courses.

  • @user-sg8ld4lq3k
    @user-sg8ld4lq3k 2 months ago

    Your teaching style gives me a very important understanding.

  • @MuhammadAsif-hu3id
    @MuhammadAsif-hu3id 8 months ago

    World's best teacher and channel for ML

  • @mandarmore.9635
    @mandarmore.9635 1 year ago +1

    You are an amazing teacher. Thank you for making this video.

  • @mohankumar-cw5lw
    @mohankumar-cw5lw 2 years ago +2

    Very simple. Very informative. Very clear.

  • @musiclover-xy8ii
    @musiclover-xy8ii 1 year ago +2

    It's so easy to understand, thank you brother 🥰🥰🥰

  • @BlackRock_07
    @BlackRock_07 1 year ago

    Sir, I have seen a lot of data science videos, but I only understand well from your channel... thank you very much, sir.

  • @karankantyadav4400
    @karankantyadav4400 1 year ago +1

    Sir, can you please explain how to do the process you used for the most frequent categories in a pipeline?

  • @purushottammitra1258
    @purushottammitra1258 3 years ago +2

    At 22:03 only transform is used with the ohe object. Why isn't xtest_new = ohe.fit_transform used?

  • @neerajtadhiyal3152
    @neerajtadhiyal3152 1 year ago +1

    Not gonna lie, this is the first video I am watching on your channel, and at 4:06 I subscribed.

  • @HarshKumar-qy4im
    @HarshKumar-qy4im 1 month ago

    You add brand and km_driven to X_train, while you do not add brand and km_driven in testing.

  • @noone0978
    @noone0978 3 months ago

    Sir, can't we treat owner as ordinal categorical data and use ordinal encoding? We can rank it by priority: first owner as first priority, second owner as second priority, and third owner as last priority.

  • @debasissahoo7559
    @debasissahoo7559 10 months ago

    I accept that you are the best teacher in the world. If I ever get the chance to meet you, I will be really grateful.

  • @yesminani848
    @yesminani848 1 month ago

    Should I use one-hot encoding for the prediction dataset?

  • @alimuiz5328
    @alimuiz5328 4 months ago

    Amazing video, sir.
    Just wanted to ask: why did you not do one-hot encoding before splitting the data?

  • @basavarajangadi2043
    @basavarajangadi2043 2 years ago

    Your explanation is very nice and easy to understand... please keep posting more videos related to data.

  • @math_section
    @math_section 3 months ago

    The best, if I have to say it in one word. From Bangladesh.

  • @hanishche
    @hanishche 1 year ago +2

    Hi sir, quick doubt. Let's say I did OHE on the training data (Jan-April data), did the train/test/validation split, and all worked well. Now I took another dataset (June-Aug) for prediction using the algorithm trained above, but the problem is that I don't have the same number of categories in a column (OHE), and it threw an error saying it expected 30 columns but got only 25. How should I approach these scenarios? One more question, on missing data: let's say in the same unseen data (June-Aug) one column has all NaN values, but in the training data we had values and did label encoding. What should I impute the missing values of the unseen data with? @CampusX
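
A common way to avoid the column-count mismatch described above is to fit the encoder once on the training data and reuse that same fitted object on later data, rather than re-encoding the new data from scratch. A minimal sketch with made-up data follows; `handle_unknown="ignore"` makes categories that were never seen in training encode as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): the new batch lacks some training categories
# and contains one that was never seen during training.
train = pd.DataFrame({"city": ["Delhi", "Mumbai", "Pune", "Delhi"]})
new = pd.DataFrame({"city": ["Mumbai", "Chennai"]})

ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)  # categories are fixed here: Delhi, Mumbai, Pune

new_enc = ohe.transform(new).toarray()
print(new_enc)  # always 3 columns, whatever categories the new batch contains
```

The missing-data question in the comment is a separate issue that this sketch does not cover.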

  • @shaktikantpatra
    @shaktikantpatra 9 months ago

    Great way of teaching. Keep it up

  • @jroamindia1754
    @jroamindia1754 8 months ago

    What do you mean by "pandas doesn't remember"? After converting to dummies, we can store the result in a variable and use it whenever required. Can't we do that?

  • @ng2530
    @ng2530 1 year ago

    Best instructor !!

  • @barshabanik7212
    @barshabanik7212 2 years ago +2

    Sir, what about dropping the first column in the case of one-hot encoding with top categories? There can also be a problem of multicollinearity, right?

  • @AnkurSingh-kj9wu
    @AnkurSingh-kj9wu 1 year ago

    What an explanation!!! Superb..

  • @amarAK47khan
    @amarAK47khan 1 year ago

    thanks so much for this bro. love from across the border :)

  • @Manishkumar-iw1cy
    @Manishkumar-iw1cy 1 year ago

    Thank you for the easy explanation 😃

  • @rishabhkapoor5105
    @rishabhkapoor5105 1 year ago

    Brother, truly awesome explanation style, great videos!

  • @sameer9045
    @sameer9045 1 year ago

    Thanks for saying that this workaround won't be needed, because I got confused there.
    BTW, I'm on your 100 Days of ML series.

  • @renurenu7629
    @renurenu7629 15 days ago

    After using hstack, how can we see our whole dataset?

  • @PriyankaSingh-gm6we
    @PriyankaSingh-gm6we 2 years ago +1

    Thanks for the great content...

  • @TusharMishra-bt1fn
    @TusharMishra-bt1fn 5 months ago

    Why did he do the train-test split at 18:30, before applying one-hot encoding,
    and after that apply one-hot encoding only to X_train, not X_test?
    This will be a problem.

  • @MohitSingh-jb9tb
    @MohitSingh-jb9tb 9 months ago

    Amazing explanation..

  • @bikashthapa8622
    @bikashthapa8622 8 months ago

    Thank you so much

  • @pavangoyal6840
    @pavangoyal6840 1 year ago

    Excellent !!!

  • @SACHINKUMAR-px8kq
    @SACHINKUMAR-px8kq 1 year ago

    Thank you so much, sir

  • @talkswithRishabh
    @talkswithRishabh 2 years ago

    Thanks sir 🙏🙏

  • @Michael-yd9zc
    @Michael-yd9zc 11 months ago

    Subtitles would be grand!

  • @littlemeow1562
    @littlemeow1562 1 year ago

    thanks sir , this video really helped me 💙💙

  • @11aniketkumar
    @11aniketkumar 1 year ago

    After performing one-hot encoding for the car brands, we should remove the first column, right?

  • @sandipansarkar9211
    @sandipansarkar9211 1 year ago

    finished watching and coding

  • @maddybuddy7013
    @maddybuddy7013 1 year ago

    For the 32 different brand categories, why are we not using ordinal encoding, with the most frequent brand as 0 and the least frequent as 31?

  • @yashjain6372
    @yashjain6372 1 year ago

    awesome as always

  • @acharjyaarijit
    @acharjyaarijit 1 year ago +1

    Sir, is it a good idea to use 'owner' as ordinal data?

  • @AryanGuleria-kj1gt
    @AryanGuleria-kj1gt 7 months ago

    What is .values used for?

  • @vinitpatidar5617
    @vinitpatidar5617 1 year ago

    How do we get column names for the encoded FUEL and OWNER columns when using OneHotEncoder(drop='first')? The encoded columns come back as NumPy arrays.

  • @sarmadali5110
    @sarmadali5110 1 year ago

    Can someone explain why we don't use fit on the test set and only use transform?

  • @prashantmarathe6515
    @prashantmarathe6515 3 years ago

    Great explanation!!!!

  • @awsstudentacademy8210
    @awsstudentacademy8210 1 year ago

    If I have a dataset for predicting the energy consumption of bikes and its output is not categorical, how can I convert that to numerical data?

  • @EmohGame
    @EmohGame 2 years ago +1

    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    in
    ----> 1 counts[counts

    • @beyondanalysis8915
      @beyondanalysis8915 2 years ago +1

      I am getting the same error; please let me know if you have corrected this code.

  • @sonal008
    @sonal008 1 year ago +1

    Only one thing to say: 'Where do you get so much knowledge from?' 😂 I only knew one-hot encoding, but here I learnt so many other ways apart from it.

  • @zkhan2023
    @zkhan2023 3 years ago

    Thanks sir

  • @kanzafatima173
    @kanzafatima173 11 months ago

    Sir, in previous videos you said that we fit and transform on the training data. So my question is: is the pd.get_dummies method, which is used to apply one-hot encoding, also applied on the training data, or just on the whole dataframe?

  • @jorgesisco981
    @jorgesisco981 2 years ago +1

    Fun fact: I noticed you were speaking Hindi at minute 14 😂, but just watching what you do was still helpful!! Thanks! EDIT: you are switching languages LOL 😂 When I noticed the Hindi, in my mind I was like: mmmm, weird, I did not notice. Then at minute 18 you switched back to English; it drove me insane for no reason. Good video anyway! Keep it up.

    • @campusx-official
      @campusx-official 2 years ago +3

      Sorry 😂 We Indians do this quite a bit. We also have a term for this... we call it Hinglish. Hope you don't mind.

    • @jorgesisco981
      @jorgesisco981 2 years ago

      @campusx-official No problem at all. Of course I am missing some explanations, but my priority is to see the code. If you don't mind me asking: I have categorical features where some have around 50 unique values and others have around 7000 unique values. I can't afford to consider only the 10 or 30 most frequent values, because I need to be able to predict any target based on those unique values. Do you think it's OK to use one-hot encoding for this?

  • @saurabhbarasiya4721
    @saurabhbarasiya4721 3 years ago

    TypeError Traceback (most recent call last)
    in
    ----> 1 ohe = OneHotEncoder(drop="first",sparse=False)
    2 ohe.fit_transform(X_train[["fuel","owner"]])
    TypeError: __init__() got an unexpected keyword argument 'drop'
    please help me to solve this issue.

  • @krithwal1997
    @krithwal1997 2 years ago

    Awesome explanation bro ❤

  • @ashoksuthar
    @ashoksuthar 2 years ago

    Why are we not using this for Output?

  • @abhisheksharda459
    @abhisheksharda459 1 year ago

    Hello sir, what if my target variable is also a categorical (nominal) feature? Do I need to encode that as well before giving it to the ML model?

  • @mohmmedshahrukh8450
    @mohmmedshahrukh8450 1 year ago

    Bro, why did you not remove the first column in the car name encoding at the end? This will increase the collinearity, right?

  • @deveshtyagi2996
    @deveshtyagi2996 1 year ago

    Sir, do we remove one column in one-hot encoding only for linear algorithms, or for all ML algorithms?

  • @jiteshsingh6030
    @jiteshsingh6030 2 years ago

    Superb, superb 👌🔥

  • @tejaskamble8731
    @tejaskamble8731 7 months ago +1

    ❤🔥🔥

  • @highflyer30
    @highflyer30 1 year ago

    Sir, I have not been able to download the file from GitHub for the last two videos, please help.

  • @MRAgundli
    @MRAgundli 4 months ago

    done

  • @waqarjoiya2540
    @waqarjoiya2540 5 months ago

    ❤❤❤

  • @saumyamishra1148
    @saumyamishra1148 2 years ago

    How do we bring the output that has been converted to numeric into an Excel sheet?

  • @user-xr9wm8ft5z
    @user-xr9wm8ft5z 11 months ago

    I am getting the output as True/False. Why, sir? How do I get the output as 1 and 0?

  • @adarsh_kumar_sharma_8638
    @adarsh_kumar_sharma_8638 2 months ago

    Sir, can you please provide me with these notes or the file?

  • @Rider-jn6zh
    @Rider-jn6zh 2 years ago

    Please share the dataset link. I am not able to download it from GitHub.

  • @Star-xk5jp
    @Star-xk5jp 8 months ago

    day2-date:10/1/24

  • @tanmayshinde7853
    @tanmayshinde7853 2 years ago

    In the train-test split, why is 'X' capital and 'y' small?

    • @pritamdas4441
      @pritamdas4441 2 years ago +1

      It's a convention; you can use any.

    • @siddhartharaja9413
      @siddhartharaja9413 2 years ago +3

      Because y is a column vector (one column only) and X has multiple columns; X is capitalized and y is lowercase just to signify this.

  • @prata7143
    @prata7143 11 months ago

    1:42 according to feminists, male-female encoding shall be done with ordinal encoding with females being 1.

  • @chessfreak8813
    @chessfreak8813 2 years ago

    Thanks, brother. Please do one on ROC and AUC.

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 2 years ago

    xyz = np.hstack((car[['brand','km_driven']].values,car_train_new))
    a = pd.DataFrame(xyz)
    Sir, I have a doubt: if the dataset is small, then this approach is good. How can we get the column names and add them to the data frame a, so that we can see the transformed data with column names?

  • @MuhammadYahyaKhan-r2m
    @MuhammadYahyaKhan-r2m 1 month ago

    I am getting True/False instead of 0 and 1.

  • @rahulpathak8415
    @rahulpathak8415 6 months ago

    Sir, my columns are still showing True/False.

    • @aditya_yadav_01
      @aditya_yadav_01 3 months ago

      Pass the dtype=int parameter when calling the function.
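
The suggestion above can be sketched as follows, assuming the function in question is `pd.get_dummies`. Recent pandas versions (2.0 and later) return boolean columns by default, which is why True/False appears; the toy data is made up.

```python
import pandas as pd

# Toy data (made up).
df = pd.DataFrame({"fuel": ["Petrol", "Diesel", "CNG"]})

# dtype=int makes the dummy columns 0/1 instead of True/False.
dummies = pd.get_dummies(df, columns=["fuel"], dtype=int)
print(dummies)
```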

  • @sisami2109
    @sisami2109 1 year ago

    Man, I have no idea what he's saying I'm just stealing the code

  • @JustPython
    @JustPython 11 months ago

  • @as8401
    @as8401 1 year ago

    You deserve billions of subscribers... you are the best teacher in the entire world for me.