Thank you Rachit! Your videos have helped me clear up the confusion between the different encoders in scikit-learn and when to use each one.
You're welcome, Pruthvi! I'm glad I could help! :)
Thank you for this video, I've been trying to figure out how to set the category order in the ordinal encoder!
Thank you! This video really helped. It's my first time using OrdinalEncoder.
Regards from Brazil.
Thanks Aramis! I'm glad it helped!
Nicely explained.. Thanks!!
Thanks Rachit for creating the series. Most institutes don't really explain this stuff properly, and they don't even care about the problem of data leakage.
They always begin by filling the missing values on the entire dataset, doing outlier detection, etc., and only then jump to train_test_split().
It's really good that you are emphasizing the data leakage problem.
Please also create videos on machine learning algorithms with hyperparameter tuning.
Thanks! I do have one on hyperparameter tuning : ua-cam.com/video/KzIQ3G_TEFg/v-deo.html
@rachittoshniwal Thanks for the video on hyperparameter tuning.
Thanks ! Exactly what I was looking for.
I'm glad it helped, Hugo!
Nicely explained handling Ordinal Variable...thanks
You're welcome, Pankaj!
Really useful compared to others, keep it up!
Thanks, I'm glad you liked it! :D
Amazing video Rachit. Is it possible to apply a label encoder and an ordinal encoder to different columns of the same dataset? If yes, could you please suggest how?
Thanks in advance
Thanks Swati! You can use a column transformer to apply different transformations to different columns
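A minimal sketch of the column transformer idea, on hypothetical toy data (the column names and categories below are made up for illustration). One assumption worth flagging: scikit-learn's LabelEncoder is meant for the target y rather than feature columns, so this sketch uses OneHotEncoder for the unordered column and OrdinalEncoder for the ordered one.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: "size" has a natural order, "city" does not.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

ct = ColumnTransformer(
    transformers=[
        # The categories list fixes the order: small -> 0, medium -> 1, large -> 2
        ("ord", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
        # Unordered column: one indicator column per distinct city
        ("ohe", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Column 0 is the ordinal-encoded "size"; the remaining columns are the city indicators.
encoded = ct.fit_transform(df)
print(encoded)
```

On a real project you would call fit_transform on the training split only and transform on the test split.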
Thank you sir for such amazing content. I understand when to use the ordinal and label encoders, but I didn't understand how you implemented the ordinal encoder. Can you please explain it in more detail, sir?
Hi Rachit, thank you for the explanation. Is it possible to apply fit_transform for outlier removal as well? How can we remove outliers from X_train and X_test?
Hey Rachit, thanks for this tutorial. Suppose I had a whole dataset of ordinal Likert scale data and I tried to predict an ordinal response variable, what type of model would I use?
Hi Anthony,
So if the data is on a Likert scale, I assume there would be some natural order to it, so you could try enumerating the levels instead of creating dummy variables. As for the ordinal target, see this paper: www.scinapse.io/papers/2103459159
There they are predicting wine quality from 1-10. They've treated it as a regression problem initially and then rounded the float to an integer based on some criteria, which the paper describes.
Hope it helps!
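A rough sketch of the regress-then-round idea, on synthetic data. The rounding rule here (nearest integer, clipped to 1-10) is an assumption for illustration and is not the criteria from the paper; LinearRegression is likewise just a placeholder model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: four Likert items scored 1-5, already enumerated as numbers.
X = rng.integers(1, 6, size=(200, 4)).astype(float)
# Synthetic ordinal target on a 1-10 scale, built from the item mean.
y = np.clip(np.rint(X.mean(axis=1) * 2), 1, 10).astype(int)

# Step 1: treat the ordinal target as a plain regression problem.
model = LinearRegression().fit(X, y)
raw = model.predict(X)

# Step 2 (assumed rule): round to the nearest integer and clip to the valid range.
pred = np.clip(np.rint(raw), 1, 10).astype(int)
print(pred[:10])
```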
Thank you bro!!!
Thanks mate!
Thank you :) !!
Why shouldn't we do data preprocessing on the entire dataset at once?
It would lead to "data leakage" where you already have a glimpse of the test data before checking your model on it. We need to keep the test data unseen until we do the predictions. (Self promotion incoming - I explain it here : ua-cam.com/video/Tui5ajW3JF8/v-deo.html ) :p
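To make that concrete, here is a small sketch with made-up numbers, using mean imputation as the preprocessing step: split first, learn the fill value from the training rows only, then reuse it on the test rows.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical single-feature dataset with two missing values.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0], [np.nan], [7.0], [100.0]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

# Split BEFORE any preprocessing so the test rows stay unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)  # mean is learned from train rows only
X_test_filled = imputer.transform(X_test)        # test rows reuse the train mean

print(imputer.statistics_)
```

If the imputer were fitted on all of X instead, the fill value would include information from the test rows, which is exactly the leak described above.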
nice topics
I'm glad you found it useful! :)
Thank you for the amazing content! Can you also please show how to rename the resulting encoded column?
Thank you Riya! Well, if you're doing the encoding on a standalone basis (i.e. without a pipeline), then you'd have to make a note of the columns beforehand in a variable (say, cols) and then do a pd.DataFrame(output, columns=cols)
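A short sketch of that pattern, using a hypothetical two-column frame. The names cols and output follow the reply above; the data, the category orders, and encoded_df are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical frame with two ordered categorical columns.
df = pd.DataFrame({"size": ["small", "large", "medium"], "grade": ["B", "A", "C"]})

# Note the column names beforehand, since the encoder returns a bare NumPy array.
cols = ["size", "grade"]

encoder = OrdinalEncoder(
    categories=[["small", "medium", "large"], ["C", "B", "A"]]  # lowest to highest
)
output = encoder.fit_transform(df[cols])

# Wrap the array back into a DataFrame with the saved names and the original index.
encoded_df = pd.DataFrame(output, columns=cols, index=df.index)
print(encoded_df)
```

To use different names for the encoded columns, pass any list of the same length as columns instead of cols.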
How to reflect the new encoded values inside the dataframe?
Reflect as in?
Great explanation, thanks!
I'm glad it helped, thanks!
I'm so confused, damn