(Part 1) Using Column Transformer for making Machine Learning workflow easy | Machine Learning

  • Published Feb 10, 2025
  • In this tutorial, we'll look at the ColumnTransformer, a powerful data pre-processing tool for making machine learning workflows much simpler.
    ColumnTransformers can be used in conjunction with Pipelines and GridSearchCV to let the model itself pick the best parameters for the best-performing model.
    In the tutorial, we'll go through all the nitty-gritty of the ColumnTransformer and discuss when, where, and how to use it.
    I've uploaded all the relevant code and datasets used here (and in all other tutorials, for that matter) on my GitHub page, which is accessible here:
    Link:
    github.com/rac...
    If you like my content, please do not forget to like this video and subscribe to my channel!
    If you have any questions regarding any of the content here, please feel free to comment below and I'll be happy to assist you in whatever capacity possible.
    Thank you!
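The workflow described above can be sketched in a few lines; note that the toy dataframe and the column names (age, workclass, hours_per_week) are invented for illustration and may differ from the tutorial's own dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Toy dataframe; the column names are invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "hours_per_week": [40, 35, 60, 20],
})

# Scale the numeric column, one-hot encode the categorical one,
# and pass the remaining column through untouched.
ct = ColumnTransformer(
    transformers=[
        ("scale", RobustScaler(), ["age"]),
        ("ohe", OneHotEncoder(), ["workclass"]),
    ],
    remainder="passthrough",
)

X = ct.fit_transform(df)
print(X.shape)  # 1 scaled + 3 one-hot + 1 passthrough column -> (4, 5)
```

Such a ColumnTransformer can then be dropped into a Pipeline as the pre-processing step, which is what makes it combine naturally with GridSearchCV.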

COMMENTS • 40

  • @imdadood5705
    @imdadood5705 3 years ago

    Just what I needed!

  • @skyrayzor3693
    @skyrayzor3693 1 year ago

    This tutorial is awesome!!

  • @deepanshumahour3318
    @deepanshumahour3318 3 years ago +1

    Well explained!
    Please keep this work up;
    I hope your channel grows rapidly

  • @KumarHemjeet
    @KumarHemjeet 3 years ago

    This is amazing... please keep making videos... don't stop!

  • @owusubright1046
    @owusubright1046 3 years ago

    I have seen a lot of YouTube channels that are very good and have a lot of content, but bro, your channel conquers them all. Please do more videos on other fields of machine learning and deep learning. Thanks, and my respect to you, bro.

  • @slimmoses3376
    @slimmoses3376 3 years ago

    This video helped me so much. Keep up the awesome work!

  • @DeenQuery
    @DeenQuery 3 years ago

    how lovely :-) thank you so very much

  • @amolkabugade3728
    @amolkabugade3728 3 years ago

    Very nicely explained in a very smooth manner... thank you so much, sir.

  • @ShubhamKumar-xy6kj
    @ShubhamKumar-xy6kj 1 year ago

    Great video bro...

  • @nikitanaidu1651
    @nikitanaidu1651 3 years ago

    Great content!

  • @dipanwitasarkar5185
    @dipanwitasarkar5185 4 years ago +1

    I got an immense and deep understanding of how I can make life easier with sklearn's ColumnTransformer. Thank you so much for the video.
    Could you kindly comment on how to get back the column names in the original dataframe once encoding is done?

    • @rachittoshniwal
      @rachittoshniwal  4 years ago +1

      Hi, Dipanwita, I'm glad it helped!
      Getting back the column names is a little tricky, but possible nonetheless. Each transformer's column data can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
      In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those, we'd do something like:
      a = ct.transformers_[0][2]
      Next in 'ct' is the OneHotEncoder, hence to get those columns, we'd do:
      b = ct.transformers_[1][1].get_feature_names()
      We're dropping the remaining columns, hence "a + list(b)" should give us the full list of columns in the correct output order.
      In our case the remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output dataframe. To get them, we'd do:
      c = ct.transformers_[2][2]
      This 'c' now contains the index positions, in the original dataframe, of the columns that were passed through. In our case it is index 3, hence df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (IFF the remainder was "passthrough").
      Hope it helps!

  • @owusubright1046
    @owusubright1046 4 years ago +1

    Nice course, man, well done. Everything was well explained; thanks for such good content.

  • @roshantonge1952
    @roshantonge1952 1 year ago

    very good video

  • @olatheog
    @olatheog 1 year ago

    This is such a great video. I'm just sad you did not end it by fitting and training a model after transforming, as that is where I have problems. Is there another video of yours where you did that? I would really appreciate it. Thank you

    • @rachittoshniwal
      @rachittoshniwal  1 year ago +1

      Thanks! I do have a couple of end-to-end project videos where I've fitted models after transforming. Hope they help!

  • @martinbielke8301
    @martinbielke8301 4 years ago

    great tutorial!

    • @rachittoshniwal
      @rachittoshniwal  4 years ago

      Thanks Martin! Appreciate that!

    • @martinbielke8301
      @martinbielke8301 4 years ago

      @@rachittoshniwal You are welcome. Do you have something on "feature importance"? If not a tutorial, maybe some web page you could recommend? I'd appreciate that very much.

    • @rachittoshniwal
      @rachittoshniwal  4 years ago

      @@martinbielke8301 I'll definitely make one on feature importances. But for now, you can have a look at these excellent links:
      mljar.com/blog/feature-importance-in-random-forest/
      machinelearningmastery.com/calculate-feature-importance-with-python/

  • @ashishsikarwar7578
    @ashishsikarwar7578 3 years ago

    Thank you, Rachit, for sharing such great content. I am new to machine learning; could you do a video going from applying the ColumnTransformer on categorical values all the way to using them with linear regression and other algorithms/models?

    • @rachittoshniwal
      @rachittoshniwal  3 years ago

      Hi, I have done a similar video here: ua-cam.com/video/wXQRLpDF-ms/v-deo.html
      Hope it helps!

    • @ashishsikarwar7578
      @ashishsikarwar7578 3 years ago +1

      @@rachittoshniwal Will check out, thank you very much!

  • @ankitlakshya450
    @ankitlakshya450 2 years ago

    Can we perform this feature engineering before the train-test split, or is it mandatory to do it after the split?

  • @JavedKhan-nr2oo
    @JavedKhan-nr2oo 3 years ago

    Sir, with the ColumnTransformer can we use ordinal encoding, label encoding, and one-hot encoding? Can you please explain? Thank you

  • @vish183
    @vish183 4 years ago

    Can't thank you enough for the knowledge imparted. Kudos!!! A suggestion: I am looking at a variable which needs imputation before one-hot encoding. Can I perform both steps in a single ColumnTransformer, or should there be multiple column transformers, which would later be combined using the Pipeline functionality? Please help

    • @rachittoshniwal
      @rachittoshniwal  4 years ago

      Thanks, man! Appreciate that!
      I cover exactly this in this video:
      ua-cam.com/video/a6o9ies85eM/v-deo.html
      Have a look, and I hope it helps!

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago

    A question: why are we not using the ColumnTransformer for hours_per_week?

    • @rachittoshniwal
      @rachittoshniwal  3 years ago

      I just wanted to demonstrate how we can exclude some columns from the transformations and pass them unfiltered. No other reason really.

  • @hudaali5708
    @hudaali5708 3 years ago

    How can I get the names of the columns back? :(
    Please help!

    • @rachittoshniwal
      @rachittoshniwal  3 years ago +1

      Getting back the column names is a little tricky, but possible nonetheless. Each transformer's column data can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
      In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those, we'd do something like:
      a = ct.transformers_[0][2]
      Next in 'ct' is the OneHotEncoder, hence to get those columns, we'd do:
      b = ct.transformers_[1][1].get_feature_names()
      We're dropping the remaining columns, hence "a + list(b)" should give us the full list of columns in the correct output order.
      In our case the remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output dataframe. To get them, we'd do:
      c = ct.transformers_[2][2]
      This 'c' now contains the index positions, in the original dataframe, of the columns that were passed through. In our case it is index 3, hence df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (IFF the remainder was "passthrough").
      Hope it helps!

  • @amolkabugade3728
    @amolkabugade3728 3 years ago

    Any way to reach you?

    • @rachittoshniwal
      @rachittoshniwal  3 years ago

      Sure, here! www.linkedin.com/in/rachit-toshniwal

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago

    o = OneHotEncoder(drop='first')  # this will drop one category from each feature.
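Assuming the intended call is OneHotEncoder(drop='first') (the string 'first', not a bare name), its effect can be checked on a toy column invented here for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy single-column dataframe; the data is invented for illustration.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

o = OneHotEncoder(drop="first")  # drop one category per feature
encoded = o.fit_transform(df).toarray()

print(encoded.shape)        # (4, 2): three categories minus the dropped one
print(o.categories_[0][0])  # 'blue' -- the (alphabetically) first category is dropped
print(encoded[2])           # the 'blue' row encodes as all zeros: [0. 0.]
```

Dropping one category per feature removes the redundant baseline column, which is useful for models sensitive to collinearity, such as unregularized linear regression.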