(Part 1) Using ColumnTransformer to Make Your Machine Learning Workflow Easy | Machine Learning
- Published 10 Feb 2025
- In this tutorial, we'll look at ColumnTransformer, a powerful data preprocessing tool that makes the machine learning workflow super simple.
ColumnTransformer can be used in conjunction with Pipelines and GridSearchCV to let the search itself pick the best parameters for the best-performing model.
In this tutorial, we'll go through all the nitty-gritty of ColumnTransformer and discuss when, how, and where to use it.
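The combination described above can be sketched like this. This is a minimal illustration, not the video's actual code: the toy dataset and most column names are made up (hours_per_week is borrowed from the comments below).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Made-up toy data for illustration only
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "hours_per_week": [40, 50, 38, 45, 60, 35],
    "workclass": ["private", "gov", "private", "self", "gov", "private"],
    "income": [0, 1, 0, 1, 1, 0],
})
X, y = df.drop(columns="income"), df["income"]

# Different preprocessing per column group
ct = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "hours_per_week"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
    ],
    remainder="drop",
)

# Preprocessing and model chained into one estimator
pipe = Pipeline([("prep", ct), ("clf", LogisticRegression())])

# GridSearchCV picks the best hyperparameters over the whole pipeline
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(X, y)
print(grid.best_params_)
```

Because the ColumnTransformer sits inside the Pipeline, the grid search refits the preprocessing on each fold, avoiding leakage between train and validation splits.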
I've uploaded all the relevant code and datasets used here (and in all other tutorials, for that matter) on my GitHub page, which is accessible here:
Link:
github.com/rac...
If you like my content, please don't forget to like this video and subscribe to my channel!
If you have any questions about any of the content here, please feel free to comment below and I'll be happy to assist in whatever capacity possible.
Thank you!
Just what I needed!
This tutorial is awesome!!
well explained!!!
Please keep this work up,
I hope your channel grows rapidly
This is amazing..please keep making videos..don't stop !
Haha! Thanks!
I have seen a lot of YouTube channels which are very good and have a lot of content, but bro, your channel conquers them all. Please do more videos on other fields of Machine Learning and Deep Learning. Thanks and my respect to you, bro.
Thanks man! Appreciate it :)
This video helped me so much. Keep up the awesome work!
Thanks! I'm glad it helped!
how lovely :-) thank you so very much
very nicely explained in a very smooth manner.......thank you so much sir..
Thanks Amol! Sir mat bolo 😅
Great video bro...
Great content!
Thanks Nikita, I'm glad it helped! :)
I got an immense and deep understanding of how I can make life easier with sklearn's ColumnTransformer. Thank you so much for the video.
Could you kindly comment on how to get back the column names of the original dataframe once encoding is done?
Hi, Dipanwita, I'm glad it helped!
Getting back the column names is a little tricky, but possible nonetheless. Every column's info can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those names, we'd do something like:
a = ct.transformers_[0][2]
Next in 'ct' is OneHotEncoder, hence to get its generated column names, we'd do:
b = ct.transformers_[1][1].get_feature_names_out() (get_feature_names() in older sklearn versions)
We're dropping the remaining columns, hence "a + list(b)" gives us the full list of column names in the correct output order.
In our case, remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output dataframe.
To get them, we'd do:
c = ct.transformers_[2][2], and this 'c' now contains the index positions of the passed-through columns in the original dataframe. In our case, that's index 3, so df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (if and only if remainder was "passthrough").
hope it helps!
Nice course man well done. Well explained everything thanks for such good content.
very good video
This is such a great video. I'm just sad you didn't end it with fitting and training a model after transforming, as that's where I have problems. Is there another video of yours where you do that? I'd really appreciate it. Thank you
Thanks! I do have a couple of end-to-end project videos where I've fitted models after transforming. Hope they help!
great tutorial!
Thanks Martin! Appreciate that!
@@rachittoshniwal You are welcome. Do you have something on "feature importance"? If not a tutorial maybe some web page that you could recommend? I'd appreciate that very much.
@@martinbielke8301 I'll definitely make one on feature importances. But for now, you can have a look at these excellent links:
mljar.com/blog/feature-importance-in-random-forest/
machinelearningmastery.com/calculate-feature-importance-with-python/
Thank you Rachit for sharing such great content. I am new to machine learning; could you do a video going from applying ColumnTransformer to categorical values all the way to using them with linear regression and other algorithms/models?
Hi, I've done a similar video here: ua-cam.com/video/wXQRLpDF-ms/v-deo.html
hope it helps!
@@rachittoshniwal Will check out, thank you very much!
Can we perform this feature engineering before the train-test split, or is it mandatory to do it after the split?
Sir, with ColumnTransformer can we use ordinal encoding, label encoding, and one-hot encoding? Could you please explain? Thank you
Can't thank you enough for the knowledge imparted. Kudos!!! A suggestion: I'm looking at a variable which needs imputation before one-hot encoding. Can I perform both steps in a single ColumnTransformer, or should there be multiple column transformers, later combined using the Pipeline functionality? Please help
Thanks man! Appreciate that!
I cover exactly this in this video:
ua-cam.com/video/a6o9ies85eM/v-deo.html
Have a look , and hope it helps!
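As a quick sketch of the idea (my own made-up example, not the linked video's code): imputation and one-hot encoding can be chained in a Pipeline, and that Pipeline used as a single transformer inside one ColumnTransformer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data: one categorical column with a missing value
df = pd.DataFrame({"city": ["NY", np.nan, "LA", "NY"]})

# Impute first, then encode, as one chained transformer
cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("ohe", OneHotEncoder()),
])

# The whole Pipeline acts as a single transformer in the ColumnTransformer
ct = ColumnTransformer([("cat", cat_pipe, ["city"])])
out = ct.fit_transform(df)
print(out.shape)  # (4, 2): the NaN is imputed as 'NY', then encoded
```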
A question: why are we not using the CT for hours_per_week?
I just wanted to demonstrate how we can exclude some columns from the transformations and pass them unfiltered. No other reason really.
How can I get the names of the columns back? :(
Please help!
Getting back the column names is a little tricky, but possible nonetheless. Every column's info can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those names, we'd do something like:
a = ct.transformers_[0][2]
Next in 'ct' is OneHotEncoder, hence to get its generated column names, we'd do:
b = ct.transformers_[1][1].get_feature_names_out() (get_feature_names() in older sklearn versions)
We're dropping the remaining columns, hence "a + list(b)" gives us the full list of column names in the correct output order.
In our case, remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output dataframe.
To get them, we'd do:
c = ct.transformers_[2][2], and this 'c' now contains the index positions of the passed-through columns in the original dataframe. In our case, that's index 3, so df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (if and only if remainder was "passthrough").
hope it helps!
Any way to reach you?
Sure, here! www.linkedin.com/in/rachit-toshniwal
o = OneHotEncoder(drop='first')  # this will drop one category from each feature
Yes, it will.
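A quick illustration of the drop='first' behavior, with toy data made up for this example: the first category (alphabetically) of each feature loses its dummy column, which avoids perfectly collinear columns.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up single categorical feature with three categories
X = np.array([["red"], ["green"], ["blue"], ["green"]])

full = OneHotEncoder().fit_transform(X).toarray()
dropped = OneHotEncoder(drop="first").fit_transform(X).toarray()

print(full.shape)     # (4, 3): one column per category
print(dropped.shape)  # (4, 2): first category ('blue') dropped
```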