You deserve billions of subscribers... you are the best teacher in the entire world for me.
True ❤❤ he deserves it.
yes exactly
I don't have words to express my appreciation: your teaching, content, and explanations are all awesome. Thank you so much for such an informative video.
OneHotEncoder(min_frequency = 100)
It will automatically detect infrequent categories and combine them into one column.
thanks for your effort ♥
Course Started : ML
Lecture-01: 14/08/2024
Lecture-02: 14/08/2024
Lecture-03: 14/08/2024
Lecture-04: 14/08/2024
Lecture-05: 14/08/2024
Lecture-06: 15/08/2024
Lecture-07: 15/08/2024
Lecture-08: 15/08/2024
Lecture-09: 15/08/2024
Lecture-10: 15/08/2024
Lecture-11: 16/08/2024
Lecture-12: 16/08/2024
Lecture-13: 17/08/2024
Lecture-14: 17/08/2024
Lecture-15: 18/08/2024
Lecture-16: 19/08/2024
Lecture-17: 20/08/2024
Lecture-18: 20/08/2024
Lecture-19: 21/08/2024
Lecture-20: 21/08/2024
Lecture-21: 22/08/2024
Lecture-22: 22/08/2024
Lecture-23: 23/08/2024
Lecture-24: 23/08/2024
Lecture-25: 24/08/2024
Lecture-26: 24/08/2024
Lecture-27: 25/08/2024
This content is complete spoon-feeding, loved it. This is the first time since school that I've come across teaching of this pedigree.
No doubt he is one of the best teachers if you want to learn ML🙌
Can someone explain why we don't use fit on the test data and only use transform?
@@sarmadali5110 If we use fit on the test data, the model will also learn from the test data, so it will overfit the test data and we won't be able to tell whether our model is good enough on unseen data.
@@Alive-Ness thanks
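A small sketch of the practical side of this, with made-up fuel values. The encoder learns its category list during fit, so fitting on the training split and only transforming the test split keeps the column layout identical. Refitting on the test data would leak test information into preprocessing and, as the last lines show, can even change the number of columns:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({"fuel": ["Petrol", "Diesel", "CNG", "Petrol"]})
X_test = pd.DataFrame({"fuel": ["Diesel", "Diesel"]})

ohe = OneHotEncoder()
ohe.fit(X_train)  # category list is learned from the training data only

train_enc = ohe.transform(X_train).toarray()
test_enc = ohe.transform(X_test).toarray()  # reuses the training categories

print(ohe.categories_)
print(train_enc.shape, test_enc.shape)  # (4, 3) (2, 3)

# Refitting on X_test learns only "Diesel": a 1-column matrix the model was never trained on
refit_cols = OneHotEncoder().fit_transform(X_test).shape[1]
print(refit_cols)  # 1
```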
Someone who went deep into the subject and took his students deep with him. Love you @CampusX
thank you Sir for the valuable content.
completed on 21st July 2024, 4:30PM.
Gender is nominal data, but if I treat it like ordinal data, I get a single column of 0s and 1s. Suppose instead I split the 'gender' column into two columns using OHE; since the two columns are dependent on each other, I drop one of them. Now 'gender' is again a single column with 0 and 1 as its only entries. Indirectly, I have treated nominal data like ordinal data, and the end result in both cases is the same.
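A quick pandas check of this reasoning, on toy data: for a two-valued column, one-hot encoding with the first dummy dropped gives exactly the same 0/1 column as a direct mapping. The equivalence holds only for two values; with three or more nominal values, a single integer column would impose an order that one-hot encoding avoids.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male"]})

# One-hot encode and drop the first dummy (alphabetically "Female")
ohe_col = pd.get_dummies(df["gender"], drop_first=True, dtype=int)

# Map the two labels directly to 0/1 ("ordinal-style" encoding)
ordinal_col = df["gender"].map({"Female": 0, "Male": 1})

print(ohe_col.columns.tolist())                 # ['Male']
print((ohe_col["Male"] == ordinal_col).all())   # True
```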
Great series; I'm learning so much, and in Hindi too.
Please, after machine learning, also bring out series on deep learning, NLP, and OpenCV.
Yes sir, please make videos on deep learning too.
Wow. Amazing video. Wonderful explanation. Thank you so much. CampusX is a really good channel.
In this car selling price dataset, I think the brand, fuel type, and owner columns all consist of ordinal categorical data. One might not consider fuel type ordinal, but practically speaking, fuel type also affects the selling price of a car. For example, electric cars are comparatively expensive, and there are similar trends for other fuel types as well, depending on the company.
Brother, what a smart point you made.
You should be made Prime Minister.
@@Hammadisteachingchemistry Hmm, since when do chemistry people have this much brains?
Fuel should be ordinal because p >> d >> ... in terms of price.
This channel is a gold mine.
Sir, one suggestion for Day 28: please include OHE on the most frequent categories using scikit-learn. Here you have done it using pandas.
Can't be done using sklearn
@@campusx-official ok. Got it 👍
@@campusx-official What if we first replace the categories below some frequency threshold with an 'others' category and commit this change to our DataFrame, and then use OHE from scikit-learn?
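A sketch of the approach proposed here: pandas collapses the rare categories, and then a standard scikit-learn OneHotEncoder encodes the result. The toy data, the threshold of 2, and the label 'uncommon' are illustrative assumptions. On real data, the counts should be computed on the training split only.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data; the threshold of 2 is an arbitrary choice for illustration
car = pd.DataFrame({"brand": ["Maruti", "Maruti", "Maruti", "Hyundai", "Hyundai", "Volvo", "Jaguar"]})

threshold = 2
counts = car["brand"].value_counts()
rare = counts[counts < threshold].index  # brands seen fewer than `threshold` times

# Keep frequent brands as they are and replace the rare ones with a single placeholder label
car["brand"] = car["brand"].where(~car["brand"].isin(rare), "uncommon")

ohe = OneHotEncoder()
encoded = ohe.fit_transform(car[["brand"]]).toarray()
print(ohe.categories_[0].tolist())  # ['Hyundai', 'Maruti', 'uncommon']
```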
Sir, I have a doubt: why are we using OHE for the owner column instead of an ordinal encoder?
Same question
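If owner is treated as ordinal, scikit-learn's OrdinalEncoder accepts an explicit category order. The owner labels below are assumed for illustration, so check the actual values in the dataset before reusing them. This encoding also assumes equal spacing between consecutive owners. That is a modelling choice, not something the data guarantees.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed owner labels; inspect the real column with df["owner"].unique()
owner_order = ["First Owner", "Second Owner", "Third Owner"]
df = pd.DataFrame({"owner": ["Second Owner", "First Owner", "Third Owner", "First Owner"]})

# Pass the order explicitly; otherwise categories are sorted alphabetically
oe = OrdinalEncoder(categories=[owner_order])
encoded = oe.fit_transform(df[["owner"]])
print(encoded.ravel().tolist())  # [1.0, 0.0, 2.0, 0.0]
```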
Amazing, I really enjoyed it... Awesome content, sir. I wish I had found this channel earlier; I could have done wonders. Thanks! This content is much, much better than a paid course.
Your teaching style gives me a really solid understanding.
World's best teacher and channel for ML
You are an amazing teacher; thank you for making this video.
Very simple. Very informative. Very clear.
Its so easy to understand , thank you brother 🥰🥰🥰
Sir, I have seen a lot of data-science videos, but I only understand things well from your channel... thank you very much, sir.
Sir, can you please tell us how to do the same most-frequent-categories process inside a pipeline?
At 22:03, only transform is used with the ohe object. Why isn't fit_transform used for X_test??
Not gonna lie, this is the first video I've watched on your channel, and at 4:06 I subscribed.
You added brand and km_driven to X_train, but you did not add brand and km_driven to the test set.
Sir, can't we treat owner as ordinal categorical data and use ordinal encoding? We can rank it by priority: first owner first, second owner second, and third owner last.
I accept that you are the best teacher in the world; if I ever get the chance to meet you, I will be truly grateful.
Should I use one-hot encoding for the prediction dataset???
Amazing video, sir.
Just wanted to ask: why did you not do one-hot encoding before splitting the data?
Your explanation is very nice and easy to understand... please keep posting more videos related to data.
In a word: the best. From Bangladesh.
Hi sir, quick doubt. Let's say I did OHE on training data (Jan-April data), did the train/test/validation split, and everything worked well. Now I took another dataset (June-Aug) to predict with the trained algorithm above. The problem is that I don't have the same number of categories for a column (OHE), and it threw an error saying it expected 30 columns but got only 25. How should I approach these scenarios? I also have one more question on handling missing data. Let's say that in the same unseen data (June-Aug), one column has all NaN values, but in the training data that column had values and we label-encoded it. What should I impute the missing values of the unseen data with? @CampusX
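This sketch covers both parts of the question, with hypothetical city data standing in for the Jan-Apr and Jun-Aug sets. A common fix for the column mismatch is to keep the encoder fitted on the training data and call only transform on new data. handle_unknown='ignore' then encodes unseen categories as all zeros instead of raising an error. For the all-NaN column, one common choice is to fill it with a statistic learned from the training data, here the most frequent value via SimpleImputer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for the Jan-Apr training data and the Jun-Aug data
train = pd.DataFrame({"city": ["Delhi", "Mumbai", "Pune", "Delhi"]})
new = pd.DataFrame({"city": ["Delhi", "Chennai"]})  # a subset of cities plus an unseen one

# Fit once on training data; unseen values become an all-zero row
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)

new_enc = ohe.transform(new).toarray()  # same number of columns as at training time
print(new_enc.shape)  # (2, 3)
print(new_enc[1])     # [0. 0. 0.] for the unseen "Chennai"

# A column that is entirely NaN in new data is filled with a value learned from training
imputer = SimpleImputer(strategy="most_frequent")
imputer.fit(train)
filled = imputer.transform(pd.DataFrame({"city": [np.nan, np.nan]}, dtype=object))
print(filled.ravel().tolist())  # ['Delhi', 'Delhi']
```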
Great way of teaching. Keep it up
What do you mean by "pandas doesn't remember"?? After converting to dummies, can't we store the result in a variable and use it whenever required?
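Here is a small illustration of what "pandas doesn't remember" means, using toy fuel data. pd.get_dummies builds columns only from the values present in the frame it receives, so the test frame can end up with fewer columns. Storing the training columns, as suggested, does work as a workaround via reindex.

```python
import pandas as pd

train = pd.DataFrame({"fuel": ["Petrol", "Diesel", "CNG"]})
test = pd.DataFrame({"fuel": ["Diesel", "Diesel"]})

train_dummies = pd.get_dummies(train, dtype=int)
test_dummies = pd.get_dummies(test, dtype=int)  # built from the test values alone
print(train_dummies.columns.tolist())  # ['fuel_CNG', 'fuel_Diesel', 'fuel_Petrol']
print(test_dummies.columns.tolist())   # ['fuel_Diesel']

# Workaround: reuse the stored training columns and fill the missing ones with 0
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_aligned.columns.tolist())   # same three columns as training
```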
Best instructor !!
Sir, what about dropping the first column in the case of one-hot encoding with top categories? There could also be a multicollinearity problem there, right?
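A short numeric check of the multicollinearity point, using toy fuel values. In a full one-hot encoding every row sums to 1. Together with an intercept column, the design matrix therefore loses one rank, and drop='first' removes that exact dependency. The same holds for the top-categories approach whenever the 'others' bucket is kept as its own column. This matters mainly for linear models without regularization; tree-based models are generally not affected.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

fuel = np.array([["Petrol"], ["Diesel"], ["CNG"], ["Petrol"]])

full = OneHotEncoder().fit_transform(fuel).toarray()
# Every row sums to 1, so any one column equals 1 minus the others
print(np.allclose(full.sum(axis=1), 1))  # True

# With an intercept column added, 4 columns have rank 3: they are linearly dependent
with_intercept = np.column_stack([np.ones(len(full)), full])
print(np.linalg.matrix_rank(with_intercept))  # 3

dropped = OneHotEncoder(drop="first").fit_transform(fuel).toarray()
print(full.shape, dropped.shape)  # (4, 3) (4, 2)
```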
What an explanation!!! Superb..
thanks so much for this bro. love from across the border :)
Thank you for the easy explanation😃
Bhai truly awesome explanation style, great videos!
Thanks for saying that this jugaad (workaround) won't be needed, because I got confused there.
BTW, I'm on your 100-day ML series.
After using hstack, how can we see our whole dataset?
Thanks for Great content...
Why did he do the train-test split at 18:30, before applying one-hot encoding, and then apply one-hot encoding only to X_train, not X_test? This will be a problem.
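Splitting before encoding is the usual order, because it keeps the test data out of the fitting step. What matters is that X_test is then passed through the same fitted encoder, and that brand and km_driven are re-attached to both splits, not only to X_train. The sketch below uses a made-up six-row frame that only borrows the column names from the video. The helper encode is hypothetical. handle_unknown='ignore' covers categories that, in such a tiny split, appear only in X_test.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Made-up rows; only the column names follow the video's car dataset
car = pd.DataFrame({
    "brand": ["Maruti", "Hyundai", "Maruti", "Tata", "Hyundai", "Maruti"],
    "km_driven": [50000, 30000, 40000, 20000, 10000, 15000],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel", "CNG", "Petrol"],
    "owner": ["First Owner", "Second Owner", "First Owner",
              "First Owner", "Third Owner", "Second Owner"],
    "selling_price": [3.0, 4.0, 3.5, 5.0, 4.5, 3.2],
})

X_train, X_test, y_train, y_test = train_test_split(
    car.drop(columns="selling_price"), car["selling_price"],
    test_size=0.33, random_state=0,
)

# Fit on the training split only
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(X_train[["fuel", "owner"]])

def encode(X):
    """Hypothetical helper: re-attach brand and km_driven to the encoded fuel/owner columns."""
    encoded = ohe.transform(X[["fuel", "owner"]]).toarray()
    return np.hstack((X[["brand", "km_driven"]].values, encoded))

X_train_new = encode(X_train)
X_test_new = encode(X_test)  # transform only, never fit, on the test split

print(X_train_new.shape, X_test_new.shape)  # same number of columns in both
```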
Amazing explanation..
Thank you so much
Excellent !!!
Thank you so much, Sir
Thanks sir 🙏🙏
Subtitles would be grand!
thanks sir , this video really helped me 💙💙
After performing one hot encoding for brands of car, we should remove the first column right?
finished watching and coding
For the 32 different brand categories, why are we not using ordinal encoding, with the most frequent brand as 0 and the least frequent as 31?
awesome as always
Sir, is it a good idea to use 'owner' as ordinal data?
What is .values used for???
How do we get column names for the encoded FUEL and OWNER columns when using OneHotEncoder(drop='first')? The encoded columns come out as NumPy arrays.
Great explanation!!!!
If I have a dataset for predicting the energy consumption of bikes and its output is not categorical, how can I convert it to numerical data?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 counts[counts
I am getting the same error; please let me know if you have fixed this code.
Only one thing to say: "where do you get so much knowledge from?" 😂 I only knew one-hot encoding, but here I learned so many other approaches besides it.
Thanks sir
Sir, in previous videos you said that we fit and transform on the training data. So my question is: should the pd.get_dummies method, which is used to apply one-hot encoding, also be applied only to the training data, or to the whole DataFrame?
Fun fact: I noticed you were speaking Hindi at minute 14 😂, but just watching what you do was still helpful!! Thanks! EDIT: you are switching languages LOL 😂 When I noticed the Hindi, I thought: mmmm, weird, I didn't notice. Then at minute 18 you switched back to English, and it drove me insane for no reason. Good video anyway! Keep it up.
Sorry😂 We Indians do this quite a bit. We even have a term for it... we call it Hinglish. Hope you don't mind.
@@campusx-official No problem at all. Of course I am missing some explanations, but my priority is to see the code. If you don't mind me asking: I have categorical features where some have about 50 unique values and others have about 7000 unique values. I can't afford to consider only the 10 or 30 most frequent values, because I need to be able to predict any target based on those unique values. Do you think it's OK to use one-hot encoding for this?
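Here is a rough sketch of why plain one-hot encoding can still be workable at this scale. OneHotEncoder returns a SciPy sparse matrix by default, storing only one non-zero entry per row for each encoded feature. A dense 20,000 x 7,000 float64 array, by contrast, would take about 1.1 GB. The synthetic IDs below are an assumption standing in for the real feature. Whether OHE is the best choice still depends on the model, and other encodings exist for very high cardinality.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in: 20,000 rows drawn from up to 7,000 distinct string IDs
rng = np.random.default_rng(0)
ids = rng.integers(0, 7000, size=(20000, 1)).astype(str)

# Default output is a SciPy sparse matrix; only the non-zero entries are stored
ohe = OneHotEncoder(handle_unknown="ignore")
encoded = ohe.fit_transform(ids)

print(encoded.shape)  # (20000, number of distinct IDs seen in fit)
print(encoded.nnz)    # 20000: one non-zero per row
print(encoded.data.nbytes + encoded.indices.nbytes + encoded.indptr.nbytes, "bytes")
```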
TypeError Traceback (most recent call last)
in
----> 1 ohe = OneHotEncoder(drop="first",sparse=False)
2 ohe.fit_transform(X_train[["fuel","owner"]])
TypeError: __init__() got an unexpected keyword argument 'drop'
Please help me solve this issue.
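This error most likely means the installed scikit-learn predates version 0.21, which is when OneHotEncoder gained the drop parameter. Upgrading (for example with pip install -U scikit-learn) should fix it. Newer releases have the opposite pitfall: sparse was renamed to sparse_output in 1.2, and the old name was removed in 1.4. The hedged sketch below checks which keywords the installed version accepts.

```python
import inspect

import sklearn
from sklearn.preprocessing import OneHotEncoder

print("scikit-learn version:", sklearn.__version__)
params = inspect.signature(OneHotEncoder.__init__).parameters

if "drop" not in params:
    # drop was added in scikit-learn 0.21: upgrade, e.g. pip install -U scikit-learn
    raise RuntimeError("This scikit-learn is too old for OneHotEncoder(drop=...)")

# Request dense output under whichever keyword this version understands
dense_kw = {"sparse_output": False} if "sparse_output" in params else {"sparse": False}

ohe = OneHotEncoder(drop="first", **dense_kw)
result = ohe.fit_transform([["Petrol"], ["Diesel"], ["CNG"]])
print(result)  # columns: Diesel, Petrol ("CNG" is dropped as the first category)
```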
Awesome explanation bro ❤
Why are we not using this for Output?
Hello sir, what if my target variable is also a categorical (nominal) feature? Do I need to encode that as well before giving it to the ML model?
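For a categorical target, scikit-learn provides LabelEncoder, which is meant for labels rather than input features. Most scikit-learn classifiers also accept string labels directly and encode them internally. Here is a minimal sketch with made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

y_train = ["cat", "dog", "cat", "bird"]

le = LabelEncoder()
y_encoded = le.fit_transform(y_train)  # classes are sorted: bird=0, cat=1, dog=2
print(y_encoded.tolist())                   # [1, 2, 1, 0]
print(le.inverse_transform([2]).tolist())   # ['dog']
```

Mapping classes to integers is fine for a target, because a classifier treats them as labels rather than as ordered quantities.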
Bro, why didn't you remove the first column in the car name encoding at the end? That will increase collinearity, right?
Sir, do we remove one column in one-hot encoding only for linear algorithms, or for all ML algorithms?
Superb, superb 👌🔥
❤🔥🔥
Sir, I have not been able to download the file from GitHub for the last two videos; please help.
done
❤❤❤
How do we get the output that has been converted to numeric into an Excel sheet?
I am getting the output as True/False. Why, sir? How do I get the output as 1 and 0?
Please sir help me 🙏
Sir, can you please provide me with these notes or this file?
Please share the dataset link; I'm not able to download it from GitHub.
day2-date:10/1/24
In the train-test split, why is 'X' capital and 'y' small?
It's a convention; you can use either.
Because y is a column vector (only one column) while X has multiple columns; X is capitalized and y is lowercase to signify this.
1:42 according to feminists, male-female encoding shall be done with ordinal encoding with females being 1.
Thanks, bro. Please make one on ROC and AUC too.
xyz = np.hstack((car[['brand','km_driven']].values,car_train_new))
a = pd.DataFrame(xyz)
Sir, I have a doubt: if the dataset is small, this approach is fine. How can we get the column names and add them to the DataFrame a, so that we can see the transformed data with column names?
Mine is coming out as True/False instead of 0 and 1.
Sir, my columns are still coming out as True/False.
Pass the dtype=int parameter when calling pd.get_dummies.
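Here is a quick illustration of that fix with toy data. Since pandas 2.0, dummy columns default to bool (older versions used uint8), and passing dtype=int gives 0/1 integers instead. On a frame you have already created, .astype(int) has the same effect.

```python
import pandas as pd

df = pd.DataFrame({"fuel": ["Petrol", "Diesel", "CNG"]})

default = pd.get_dummies(df)            # bool columns in pandas >= 2.0
as_int = pd.get_dummies(df, dtype=int)  # 0/1 integer columns

print(default.dtypes.unique())
print(as_int.columns.tolist())  # ['fuel_CNG', 'fuel_Diesel', 'fuel_Petrol']
print(as_int.values.tolist())   # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```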
Man, I have no idea what he's saying I'm just stealing the code