Tutorial 1- Feature Selection-How To Drop Constant Features Using Variance Threshold

  • Published 3 Oct 2024
  • In this video I am going to start a new playlist on Feature Selection, and in this video we will discuss how we can drop constant features using a variance threshold
    github: github.com/kri...
    Feature Selection playlist: • Feature Selection
    All Playlist In My channel
    Complete ML Playlist : • Complete Machine Learn...
    Complete DL Playlist : • Complete Deep Learning
    Complete NLP Playlist: • Natural Language Proce...
    Docker End To End Implementation: • Docker End to End Impl...
    Live stream Playlist: • Pytorch
    Machine Learning Pipelines: • Docker End to End Impl...
    Pytorch Playlist: • Pytorch
    Feature Engineering : • Feature Engineering
    Live Projects : • Live Projects
    Kaggle competition : • Kaggle Competitions
    Mongodb with Python : • MongoDb with Python
    MySQL With Python : • MYSQL Database With Py...
    Deployment Architectures: • Deployment Architectur...
    Amazon sagemaker : • Amazon SageMaker
    Please donate if you want to support the channel through GPay UPI ID,
    Gpay: krishnaik06@okicici
    Discord Server Link: / discord
    Telegram link: t.me/joinchat/...
    Please join as a member of my channel to get additional benefits like materials in Data Science, live streaming for members, and many more
    / @krishnaik06
    Please do subscribe to my other channel too
    / @krishnaikhindi
    Connect with me here:
    Twitter: / krishnaik06
    Facebook: / krishnaik06
    instagram: / krishnaik06
    #featureselection
    #dropconstantfeatures

COMMENTS • 85

  • @raneshmitra8156
    @raneshmitra8156 4 years ago +8

    Krish, please upload forthcoming videos on interview preparation in a one-algorithm-per-video format... eagerly waiting for that...

  • @ehtishamkhan421
    @ehtishamkhan421 4 years ago +4

    Keep it coming ✌ great work

  • @m0tivati0n71
    @m0tivati0n71 1 year ago

    Thank you, this was clearly explained and I like that it has practical examples (simple and complex)

  • @dinushachathuranga7657
    @dinushachathuranga7657 7 months ago

    Bunch of thanks for the clear explanation❤

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 2 years ago

    Hi friends, what Krish has taught is very nice; there is one more way you can do it:
    !pip install klib
    import klib
    df2 = klib.data_cleaning(df1)

  • @ronylpatil
    @ronylpatil 3 years ago

    Nice explanation, sir.
    Please upload more videos as soon as possible; this is a most-awaited playlist.

  • @srinagabtechabs
    @srinagabtechabs 3 years ago

    Neatly arranged playlists for all topics... superb. Thank you for your effort.

  • @naveenvinayak1088
    @naveenvinayak1088 4 years ago +1

    Krish, really nice information... waiting for upcoming videos.

  • @harshithbangera7905
    @harshithbangera7905 3 years ago

    Thanks, brother... I was searching for this topic like anything.

  • @rohanshetty6347
    @rohanshetty6347 2 years ago

    You're doing a great job, sir. Thank you.

  • @surbhiagrawal3951
    @surbhiagrawal3951 3 years ago +2

    X_train = X_train.iloc[:, var_thres.get_support(indices=True)] is another way to drop all zero-variance columns; this way the list comprehension, loop, and drop function are not required.
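A minimal sketch of what this one-liner does (the toy DataFrame and column names below are hypothetical, not from the video):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: 'b' is constant and should be dropped.
X_train = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [7, 7, 7, 7],
    "c": [0, 1, 0, 1],
})

var_thres = VarianceThreshold(threshold=0)
var_thres.fit(X_train)

# get_support(indices=True) returns the integer positions of the
# kept (non-constant) columns, so iloc keeps exactly those.
X_train = X_train.iloc[:, var_thres.get_support(indices=True)]
print(list(X_train.columns))  # → ['a', 'c']
```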

  • @biswajitsahoo1542
    @biswajitsahoo1542 4 years ago +2

    Hi Krish, thanks a lot for such a nice tutorial on feature selection. One request from my side: could you please create a tutorial on GitHub? It would be highly helpful for us. Though we know a few things, there is much more to learn that would be highly impactful.

  • @salihsartepe2614
    @salihsartepe2614 10 months ago

    Thanks Krish 😊

  • @soumodeepsen3448
    @soumodeepsen3448 2 years ago

    I appreciate your tutorial. It is a very well-structured and information-rich tutorial, no doubt. I really liked all the explanations. I would like to add that data[data.columns[~var_thres.get_support()]] gives the same result as the constant_columns :)
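A quick sketch of that inverse-mask trick (toy data assumed, not from the video):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: 'b' is the only constant column.
data = pd.DataFrame({"a": [1, 2, 3], "b": [9, 9, 9], "c": [4, 5, 6]})

var_thres = VarianceThreshold(threshold=0)
var_thres.fit(data)

# get_support() is a boolean keep-mask over the columns;
# ~ inverts it, so this selects exactly the constant columns.
constant_view = data[data.columns[~var_thres.get_support()]]
print(list(constant_view.columns))  # → ['b']
```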

  • @anjuappachan1067
    @anjuappachan1067 4 years ago +2

    Please make a video about model monitoring after deployment, concept drift, and data drift.

  • @manishpandor2041
    @manishpandor2041 1 year ago

    2:56 small correction: it will remove any feature whose variance is less than (or equal to) the threshold x, not only features whose variance is exactly x.

  • @yogendrakishorverma2750
    @yogendrakishorverma2750 4 years ago +1

    Sir, please make and upload videos on Transformers and BERT with practical implementation; waiting for it...

  • @sudharsanb9391
    @sudharsanb9391 4 years ago +2

    Super, sir. Please upload more videos like this.

  • @iEntertainmentFunShorts
    @iEntertainmentFunShorts 4 years ago +2

    First of all, great initiative again, and thank you so much 🙌
    🙋 My query is: do we always need to drop the constant (zero-variance) column, or is it domain- or problem-specific? It may be a silly query; if you have time, sir, please tell me. I know you also have a heavy workload 😊😊

    • @manasacharya8944
      @manasacharya8944 4 years ago

      A column with no variance essentially won't be giving any real insight to the model, as it is the same irrespective of the ground truth. So using it for training would be a waste of computation.

    • @rambaldotra2221
      @rambaldotra2221 3 years ago

      I think, based on the domain of the problem we are working on, we can take other threshold values and then remove all the features whose variance is less than or equal to that threshold.

  • @akanshabhandari1062
    @akanshabhandari1062 3 years ago +1

    Upload more videos on these feature selection topics.

  • @sreyasg1565
    @sreyasg1565 4 years ago +1

    Hi Krish, request you to make videos on already-designed hypothesis tests like the KS test.

  • @ThisWeekinAnalytics
    @ThisWeekinAnalytics 1 year ago

    This is a wonderful video; curious whether there is one on overfitting?

  • @Mathmagician73
    @Mathmagician73 4 years ago +9

    Well explained, sir, but I have one doubt. When we work with a recommendation system like Amazon's, we have features like user rating, user ID, and product ID. In the real world, records in the rating table are inserted dynamically, so how do we train the model dynamically so that it gives the best output? You taught us recommendation systems on static data, but how do we deal with a dynamic dataset?

    • @shansingh9858
      @shansingh9858 3 years ago

      You have to build an entire pipeline for this...

  • @skvali3810
    @skvali3810 4 years ago +2

    After you left your job, you have just been rising like the gold price in India, and at quantum internet speed, sir.

  • @joveriarubaab4805
    @joveriarubaab4805 2 years ago

    Very nice

  • @ishitachakraborty1362
    @ishitachakraborty1362 4 years ago +2

    Awesome video

  • @rambaldotra2221
    @rambaldotra2221 3 years ago

    Awesome, thanks sir.

  • @dhristovaddx
    @dhristovaddx 4 years ago +5

    Hi, Krish! Do you need to normalize the data before doing Variance Threshold, so that the features would be on the same scale? Also, how do you pick the most appropriate threshold?

    • @krishnaik06
      @krishnaik06  4 years ago +3

      The threshold is for a specific variance... normalisation will not be impacted by zero variance, which is for constant features... but for the other scenarios you have to perform normalisation to find quasi-constant features... e.g. you can set the threshold to 0.01.

    • @dhristovaddx
      @dhristovaddx 4 years ago

      @@krishnaik06 Thank you so much! I understand. :)

    • @Mathmagician73
      @Mathmagician73 4 years ago

      @@krishnaik06 Sir, please answer my question: how do we deal with a dynamic dataset in a recommendation system? Do I have to train the model again and again, or is there another approach? Please answer, sir 🙏

    • @saidurgakameshkota1246
      @saidurgakameshkota1246 4 years ago

      @@krishnaik06 Sir, how do we select the correct threshold? Could you provide some resources?
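A sketch of the scale-then-threshold idea discussed above for quasi-constant features (the data and the 0.01 cutoff here are illustrative assumptions, not from the video):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "quasi": [1] * 199 + [2],                     # almost constant
    "scaleheavy": rng.integers(1000, 2000, 200),  # large raw variance
    "normal": rng.random(200),
})

# Scale first so every feature lives on [0, 1]; otherwise the raw
# variance of 'scaleheavy' would dwarf any shared threshold.
scaled = MinMaxScaler().fit_transform(df)

selector = VarianceThreshold(threshold=0.01)  # quasi-constant cutoff
selector.fit(scaled)
print(list(df.columns[selector.get_support()]))  # → ['scaleheavy', 'normal']
```

After scaling, 'quasi' has a variance of about 0.005, which falls below the 0.01 cutoff, while the genuinely informative columns stay well above it.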

  • @sachinborgave8094
    @sachinborgave8094 4 years ago +1

    Krish, please upload a video on advanced CNNs.

  • @hibaabdalghafgar
    @hibaabdalghafgar 1 year ago

    Very well explained, I am grateful. But how do we apply the transform to the test dataset?

  • @nayansarma840
    @nayansarma840 3 years ago

    Nice video, Krish. One query: if we choose a threshold bigger than 0, say 0.10, should we not normalize the data first? Asking because variance depends on the range of values in a column, and sometimes a very useful feature with a small range of values could have low variance.

  • @abiralaashish8798
    @abiralaashish8798 3 years ago +2

    Should we drop the features having low variance from test dataset as well?

  • @shivarajnavalba5042
    @shivarajnavalba5042 3 years ago

    Thanks for the video.

  • @singhmanavop
    @singhmanavop 11 months ago +1

    🎯 Key Takeaways for quick navigation:
    00:00 📚 Introduction to Feature Selection
    - Feature selection is essential in data science projects to handle high-dimensional data.
    - Curse of dimensionality can affect model performance, making feature selection crucial.
    - This tutorial focuses on dropping constant features as a feature selection technique.
    02:20 📊 Identifying Constant Features
    - Demonstrates how to identify constant features in a dataset.
    - Uses the variance threshold technique from scikit-learn to find features with zero variance.
    - Explains that constant features are not valuable for machine learning models.
    04:12 🚮 Removing Constant Features
    - Shows how to remove constant features using the variance threshold class.
    - Discusses the threshold parameter and its significance in dropping constant features.
    - Highlights that constant features are not important for model training.
    09:11 🧪 Applying Variance Threshold on a Real Dataset
    - Applies the variance threshold technique to a real-world dataset.
    - Demonstrates the importance of dividing data into independent and dependent features.
    - Shows how to use variance threshold to identify and remove constant features in a larger dataset.
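The workflow in these takeaways can be sketched as follows (the DataFrame and column names are made up for illustration, not the video's dataset):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical dataset with two constant columns mixed in.
df = pd.DataFrame({
    "age":       [25, 32, 47, 51],
    "salary":    [40, 60, 80, 90],
    "city_code": [1, 1, 1, 1],    # constant
    "flag":      [0, 0, 0, 0],    # constant
})

# threshold=0 keeps only columns whose variance is strictly above 0,
# i.e. it flags exactly the constant features.
var_thres = VarianceThreshold(threshold=0)
var_thres.fit(df)

constant_columns = [c for c in df.columns
                    if c not in df.columns[var_thres.get_support()]]
print(constant_columns)           # → ['city_code', 'flag']
df = df.drop(columns=constant_columns)
```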

  • @erhemantgangwar
    @erhemantgangwar 4 years ago +2

    Great 🙏

  • @shaminyeasarapon7177
    @shaminyeasarapon7177 4 years ago +2

    Upload more soon, please!

  • @abhaysota867
    @abhaysota867 1 year ago +2

    Just a doubt: if I set the variance threshold to 0.1, will it remove only the columns whose variance is exactly 0.1, or also the columns whose variance is less than 0.1?

  • @PhamDuc8504
    @PhamDuc8504 1 year ago

    Thank You !!!

  • @omarsalam7586
    @omarsalam7586 1 year ago

    thank you

  • @natsudragoneel8752
    @natsudragoneel8752 3 years ago +1

    But the variance threshold only works for numerical values, right? What if we also have object-type columns? Do we write our own function for the threshold?

  • @ridoychandraray2413
    @ridoychandraray2413 1 year ago

    A small variance indicates that the data points tend to be very close to the mean, and a high variance indicates that the data points are very spread out from the mean. So, bro, why are we removing the small-variance variables in this technique?

  • @ridoychandraray2413
    @ridoychandraray2413 1 year ago

    Please make videos about feature extraction.

  • @debatradas1597
    @debatradas1597 3 years ago

    Thanks

  • @shivangiexclusive123
    @shivangiexclusive123 3 years ago +1

    Hi Krish, why are you doing a train-test split when we have a test set available in the dataset?

  • @reachDeepNeuron
    @reachDeepNeuron 4 years ago +1

    Would it handle multicollinearity?

  • @omarfruman3685
    @omarfruman3685 3 years ago

    Sir, please upload further videos... thanks.

  • @mdmynuddin1888
    @mdmynuddin1888 3 years ago

    Well explained.
    12 videos????

  • @drumilbhalani7614
    @drumilbhalani7614 2 years ago +2

    Can anyone tell me why we apply VarianceThreshold to X_train only, and not to X_test as well?

  • @pratikbhansali4086
    @pratikbhansali4086 3 years ago

    Please upload the remaining videos on feature selection techniques.

  • @saurabhbarasiya4721
    @saurabhbarasiya4721 4 years ago +1

    Can you complete this series?

  • @nikunjgarg4791
    @nikunjgarg4791 4 years ago +1

    What should be done with the test dataset?

  • @AIResearchTechniques
    @AIResearchTechniques 2 years ago

    Hello sir, can you please upload a video on "Feature Selection using the BAT Algorithm"?

  • @suvedchougule5847
    @suvedchougule5847 4 years ago +1

    Are the MySQL DB and MongoDB playlists enough for ML? 🤔🤔

  • @dhanushsuryavamshi4279
    @dhanushsuryavamshi4279 3 years ago +2

    Thanks for this tutorial. Just one concern: by doing the train-test split before checking for variance, won't we risk getting false results? I mean, the field we're dropping in X_train might have a few varying values that are now in X_test, but while building the model we dropped that field. How can we take care of this?

    • @RohanB-xg6vg
      @RohanB-xg6vg 3 years ago +1

      Krish has said that you do fit() on the training data and only transform() on the testing data, so every zero-variance feature is removed in that manner.
      Happy learning :)
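That fit-on-train / transform-on-test pattern, sketched on hypothetical data (column names are made up for illustration):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split

# Hypothetical frame: 'const' never varies anywhere in the data.
df = pd.DataFrame({
    "f1": range(10),
    "const": [5] * 10,
    "target": [0, 1] * 5,
})
X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

vt = VarianceThreshold(threshold=0)
vt.fit(X_train)                      # learn the keep-mask from TRAIN only
X_train_sel = vt.transform(X_train)  # ...then apply the SAME mask
X_test_sel = vt.transform(X_test)    # to both splits consistently
print(X_train_sel.shape[1], X_test_sel.shape[1])  # → 1 1
```

Because the same fitted selector transforms both splits, train and test always end up with identical columns.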

  • @killerdrama5521
    @killerdrama5521 2 years ago

    What if some features are numerical and some are categorical, against a categorical output? Which feature selection method would be helpful?

  • @harshdwivedi7102
    @harshdwivedi7102 3 years ago

    @Krish Shouldn't we scale the data before applying VarianceThreshold?

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago

    You may also try the code below to drop constant features from X_test and X_train:
    X_train = obj.transform(X_train)
    X_test = obj.transform(X_test)

    • @mohlagare3417
      @mohlagare3417 2 years ago

      Thanks, but what is obj in your code (X_test = obj.transform(X_test))?

    • @ajaykushwaha-je6mw
      @ajaykushwaha-je6mw 2 years ago +1

      @@mohlagare3417 Best way: !pip install klib
      In just one line we can remove both things:
      df_new = klib.data_cleaning(df)

  • @pythonenthusiast9292
    @pythonenthusiast9292 4 years ago

    How is this different from the 4-5 live videos (1 hr+ each) you did about 1-2 months ago?
    Or are they almost the same?

  • @yoyoyoyo1727
    @yoyoyoyo1727 2 years ago

    Hey Krish, can we remove quasi-constant features before splitting?

  • @rtr.shubhamgrover1280
    @rtr.shubhamgrover1280 3 years ago

    Is it the same as checking the 4 assumptions before applying any ML algorithm? Are homoscedasticity and all that taken care of?

  • @poojakerkar6129
    @poojakerkar6129 4 years ago +1

    The Discord link given in the description is not working. Did anyone join through that link?

  • @tusharikajoshi8410
    @tusharikajoshi8410 1 year ago

    We cannot use this method on a dataset containing categorical data, can we?

  • @saravanajogan1221
    @saravanajogan1221 2 years ago

    Can someone please tell: what is the best value for the threshold?

  • @chinmaybhat9636
    @chinmaybhat9636 3 years ago

    @Krish Naik Sir, I have one doubt: as per the sklearn documentation this technique is used for unsupervised learning, but how come you have used it on a supervised-learning dataset in this video?
    Thanks & Regards,
    CHINMAY N BHAT

    • @tusharikajoshi8410
      @tusharikajoshi8410 1 year ago

      I think that sentence means that, because the technique has nothing to do with the target variable, it can ALSO be used on unsupervised datasets.

  • @sagarkhande4412
    @sagarkhande4412 3 years ago

    Sir, can we use this method after doing one-hot encoding on the features?

  • @Ankitkumar-qh6tx
    @Ankitkumar-qh6tx 3 years ago

    Great videos. We can simplify the code for getting the filtered columns after thresholding. Here is my code: data = data[data.columns[var_thres.get_support()]]. Is this correct?

  • @trashantrathore4995
    @trashantrathore4995 2 years ago

    Is it fine if, instead of fitting only X_train, I do fit_transform(X) on the whole of X? Then I get the transformed data directly, with no need for get_support() or list comprehensions. Is this approach correct, since I do not require values from fit()?

  • @pratikbhansali4086
    @pratikbhansali4086 3 years ago

    Please go through every playlist, check which topics are missing in each, and first complete all the remaining videos in every playlist before coming up with anything new.

  • @deepanshukumar7290
    @deepanshukumar7290 3 years ago

    What is "target"?

  • @dityatheprincess
    @dityatheprincess 3 years ago

    From where do we download that housing_data file?

  • @bijayalaxmisahoo4195
    @bijayalaxmisahoo4195 4 years ago

    Can anybody help me with how to download the dataset santander.csv? I am trying to download it but unable to do so.

  • @susovandey1875
    @susovandey1875 4 years ago

    Hi... your Discord server link says invalid... can you please send the link one more time? Thanks.