Thank you Rachit! Your videos have helped me clear up the confusion between the different encoders in scikit-learn and when to use each one.
You're welcome, Pruthvi! I'm glad I could help! :)
Thank you! This video really helped. It's the first time I've used OrdinalEncoder.
Regards from Brazil.
Thanks Aramis! I'm glad it helped!
Thank you for this video, I've been trying to figure out how to set the category order in the ordinal encoder!
Thanks Rachit for creating the series. Most institutes don't really explain this stuff properly, and they don't even care about the problem of data leakage.
They always begin by filling the missing values on the entire dataset, doing outlier detection, etc., and only then jump to train_test_split().
It's really good that you are emphasizing the data leakage problem.
Please also create videos on machine learning algorithms with hyperparameter tuning.
Thanks! I do have one on hyperparameter tuning : ua-cam.com/video/KzIQ3G_TEFg/v-deo.html
@@rachittoshniwal Thanks for the video on hyperparameter tuning.
Nicely explained.. Thanks!!
Thanks ! Exactly what I was looking for.
I'm glad it helped, Hugo!
Really useful compared to others, keep it up!
Thanks, I'm glad you liked it! :D
Nicely explained how to handle ordinal variables... thanks
You're welcome, Pankaj!
Thank you sir for such amazing content. First of all, I understand when to use the ordinal and label encoders, but I didn't understand how you implemented the ordinal encoder. Can you please enlighten me more on it, sir?
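For anyone with the same question, here is a minimal sketch of one common way to use scikit-learn's OrdinalEncoder with an explicit category order. It is not necessarily the exact code from the video; the column name "size" and its levels are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; the column name and its levels are assumptions.
X_train = pd.DataFrame({"size": ["small", "large", "medium", "small"]})
X_test = pd.DataFrame({"size": ["medium", "large"]})

# One list per encoded column, ordered from lowest to highest level.
size_order = ["small", "medium", "large"]
encoder = OrdinalEncoder(categories=[size_order])

# Fit on the training data only, then reuse the fitted encoder on the test data.
train_encoded = encoder.fit_transform(X_train[["size"]])
test_encoded = encoder.transform(X_test[["size"]])
```

Each level is mapped to its position in the list, so "small" becomes 0, "medium" 1 and "large" 2. Without the categories argument, the encoder sorts the levels itself, which for strings means alphabetical order rather than the intended ranking.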
Hi Rachit, thank you for the explanation. Is it possible to apply fit_transform for outlier removal as well? How can we remove outliers from X_train and X_test?
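One possible approach, sketched below under stated assumptions: scikit-learn transformers return only X, so dropping rows inside fit_transform would leave y out of sync. A simple alternative is to learn the outlier bounds from the training set alone, drop the offending training rows together with their labels, and clip (rather than drop) the test rows. The 1.5 x IQR rule, the synthetic data, and the choice to clip the test set are all assumptions for illustration.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=5, size=(100, 1))
X_train[0, 0] = 500.0  # planted outlier
y_train = rng.integers(0, 2, size=100)
X_test = np.array([[48.0], [400.0]])

# Learn the bounds from the training data only (1.5 * IQR rule, an assumption).
q1, q3 = np.percentile(X_train, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Drop outlying training rows, keeping X and y aligned.
keep = ((X_train >= lower) & (X_train <= upper)).all(axis=1)
X_train_clean, y_train_clean = X_train[keep], y_train[keep]

# Keep every test row, but clip values to the bounds learned on train.
X_test_clipped = np.clip(X_test, lower, upper)
```

Because the bounds come from X_train only, nothing about the test set influences the preprocessing.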
Hey Rachit, thanks for this tutorial. Suppose I had a whole dataset of ordinal Likert scale data and I tried to predict an ordinal response variable, what type of model would I use?
Hi Anthony,
If the data is on a Likert scale, I assume there is some natural order to it, so you could try to enumerate the levels instead of creating dummy variables. As for the ordinal target, see this paper: www.scinapse.io/papers/2103459159
Here they are predicting wine quality from 1-10. They've treated it as a regression problem initially and then rounded off the float to an integer based on some criteria. The paper mentions these criteria.
Hope it helps!
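A rough sketch of the "regress, then round" idea described above. The paper's own rounding criteria are not reproduced here; plain rounding to the nearest integer and clipping to the 1-10 range are stand-in assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features and an ordinal target on a 1-10 scale (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
latent = 5.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
y = np.clip(np.rint(latent), 1, 10).astype(int)

# Treat the ordinal target as a regression problem.
model = LinearRegression().fit(X[:150], y[:150])
raw = model.predict(X[150:])

# Map the float predictions back onto the ordinal scale
# (nearest integer, clipped to the valid range; an assumed rule).
pred = np.clip(np.rint(raw), 1, 10).astype(int)
```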
Amazing video Rachit. Is it possible to apply label encoder and ordinal in the same dataset for different columns? If yes could you please suggest how?
Thanks in advance
Thanks Swati! You can use a column transformer to apply different transformations to different columns
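A minimal sketch of the column transformer idea, with made-up column names. Note one swap: LabelEncoder is meant for the target y rather than for feature columns, so this example pairs OrdinalEncoder with OneHotEncoder instead.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; column names and values are assumptions.
X_train = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],  # ordered categories
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],    # unordered categories
})

ct = ColumnTransformer(
    transformers=[
        # Ordered column: encode with an explicit low-to-high order.
        ("ordinal", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
        # Unordered column: one-hot encode, ignoring unseen levels at transform time.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

encoded = ct.fit_transform(X_train)
```

The output has one column for "size" followed by one column per city seen during fit. The same fitted ct can then be applied to the test set with ct.transform(X_test).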
Great explanation, thanks
I'm glad it helped, thanks!
How to reflect the new encoded values inside the dataframe?
Reflect as in?
Thank you bro!!!
Thank you for the amazing content! Can you also please show how to rename the resulting encoded column?
Thank you Riya! Well, if you're doing the encoding on a standalone basis (i.e. without a pipeline), then you'd have to make a note of the columns beforehand in a variable (say, cols) and then do pd.DataFrame(output, columns=cols)
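A small sketch of that suggestion. The data, the column names, and the "_encoded" suffix are made up for illustration; cols and output follow the naming in the reply above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data.
X_train = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "grade": ["B", "A", "C", "A"],
})

# Note the columns beforehand.
cols = ["size", "grade"]

encoder = OrdinalEncoder(categories=[["small", "medium", "large"], ["C", "B", "A"]])
output = encoder.fit_transform(X_train[cols])  # plain NumPy array, no column names

# Wrap the array back into a DataFrame, choosing whatever names you like.
new_names = [f"{c}_encoded" for c in cols]
encoded_df = pd.DataFrame(output, columns=new_names, index=X_train.index)
```

Passing index=X_train.index keeps the rows aligned with the original DataFrame, which matters if the encoded columns are joined back later.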
Why should we not do data preprocessing on the entire dataset at once?
It would lead to "data leakage" where you already have a glimpse of the test data before checking your model on it. We need to keep the test data unseen until we do the predictions. (Self promotion incoming - I explain it here : ua-cam.com/video/Tui5ajW3JF8/v-deo.html ) :p
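A minimal sketch of the leakage-free order of operations, using mean imputation as the example preprocessing step. The toy data and the choice of SimpleImputer are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Toy data with missing values (illustration only).
X = np.array([[1.0], [2.0], [np.nan], [4.0], [100.0], [np.nan], [3.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# 1. Split first, before any statistic is computed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Learn the fill value from the training rows only.
imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)

# 3. Reuse that training mean on the test rows; never call fit on X_test.
X_test_filled = imputer.transform(X_test)
```

If the imputer were fitted on all of X before splitting, the fill value would be computed partly from test rows, which is exactly the glimpse of the test data described above.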
Thanks mate!
nice topics
I'm glad you found it useful! :)
Thank you :) !!
I'm so confused, damn