Tutorial 1- Feature Selection-How To Drop Constant Features Using Variance Threshold
- Published 3 Oct 2024
- In this video I am starting a new playlist on Feature Selection, and in this video we will be discussing how we can drop constant features using Variance Threshold
github: github.com/kri...
Feature Selection playlist: • Feature Selection
All Playlist In My channel
Complete ML Playlist : • Complete Machine Learn...
Complete DL Playlist : • Complete Deep Learning
Complete NLP Playlist: • Natural Language Proce...
Docker End To End Implementation: • Docker End to End Impl...
Live stream Playlist: • Pytorch
Machine Learning Pipelines: • Docker End to End Impl...
Pytorch Playlist: • Pytorch
Feature Engineering : • Feature Engineering
Live Projects : • Live Projects
Kaggle competition : • Kaggle Competitions
Mongodb with Python : • MongoDb with Python
MySQL With Python : • MYSQL Database With Py...
Deployment Architectures: • Deployment Architectur...
Amazon sagemaker : • Amazon SageMaker
Please donate if you want to support the channel through the GPay UPI ID below,
Gpay: krishnaik06@okicici
Discord Server Link: / discord
Telegram link: t.me/joinchat/...
Please join as a member of my channel to get additional benefits like Data Science materials, live streaming for members, and many more
/ @krishnaik06
Please do subscribe to my other channel too
/ @krishnaikhindi
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
#featureselection
#dropconstantfeatures
Krish, please upload the forthcoming videos on interview preparation in a one-algorithm-per-video format.... eagerly waiting for that....
Keep it coming ✌ great work
Thank you, this was clearly explained and I like that it has practical examples (simple and complex)
Bunch of thanks for the clear explanation❤
Hi friend, what Krish has taught is very nice; there is one more way you can do it:
!pip install klib
import klib
# klib's data_cleaning drops single-valued (constant) columns and duplicates, among other cleanup
df2 = klib.data_cleaning(df1)
Nice explanation sir.
Please upload more videos as soon as possible; this is the most awaited playlist.
Neatly arranged playlists for all topics.. superb.. thank you for your effort
Krish, really nice information.. waiting for the upcoming videos
Thanks brother... I was searching everywhere for this topic
You're doing a great job sir, thank you
X_train = X_train.iloc[:, var_thres.get_support(indices=True)] is another way to drop all zero-variance columns; this way the list comprehension, loop, and drop function are not required.
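The one-liner above can be sketched as a runnable example; the toy DataFrame and column names below are made up for illustration:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: 'const' is a zero-variance (constant) column.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "const": [7, 7, 7, 7],
    "b": [0, 1, 0, 1],
})

var_thres = VarianceThreshold(threshold=0)  # drop zero-variance features
var_thres.fit(df)

# get_support(indices=True) returns the positions of the retained columns,
# so iloc can select them directly -- no loop or drop() needed.
df_reduced = df.iloc[:, var_thres.get_support(indices=True)]
print(list(df_reduced.columns))  # the constant column is gone
```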
Hi Krish, thanks a lot for such a nice tutorial on feature selection. One request from my side: could you please create a tutorial on GitHub? It would be highly helpful for us. Though we know a few things, there is much more to learn that would be highly impactful.
Thanks Krish 😊
I appreciate your tutorial. It is a very well structured and information rich tutorial, no doubt. I really liked all the explanations. I would like to add that 'data[data.columns[~var_thres.get_support()]]' gives the same result as the constant_columns :)
Please make a video about model monitoring after deployment , concept drift and data drift.
2:56 Little correction: it will remove any feature whose variance is less than or equal to x, not only features whose variance is exactly x
Sir, please make and upload videos on Transformers and BERT with practical implementation; waiting for it ............
Super sir pls upload more videos like this
First of all Great initiative again and thank you so much,🙌
🙋My query is: do we always need to drop the constant (zero-variance) column, or is it domain/problem specific? Maybe it is a silly query, but if you have time, Sir, please tell me. I know you already have a heavy workload 😊😊
A column with no variance essentially won't give any real insight to the model, as it is the same irrespective of the ground truth. So using it for training would be a waste of computation.
I think based on the domain of the problem we are working on, we can pick some other threshold value and then remove all the features whose variance is less than or equal to that threshold.
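To make the threshold choice discussed above concrete, you can inspect the per-feature variances before picking one. A minimal sketch with made-up data and an arbitrary example threshold:

```python
import pandas as pd

# Hypothetical data: one constant column, one quasi-constant, one varying.
df = pd.DataFrame({
    "constant": [1, 1, 1, 1, 1],
    "quasi": [0, 0, 0, 0, 1],
    "varying": [3, 8, 1, 9, 4],
})

# Look at the variances (pandas uses ddof=1) before picking a threshold.
variances = df.var()
print(variances)

# Keep only columns whose variance exceeds a domain-chosen threshold.
threshold = 0.25
kept = df.columns[df.var() > threshold]
print(list(kept))
```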
Upload more videos on these feature selection topics
Hi Krish, request you to make videos on standard hypothesis tests like the KS test
This is a wonderful video, curious if there is one based on overfitting?
Well explained sir, but I have one doubt: whenever we work with a recommendation system like Amazon's, we have features like user rating, user id, product id... In the real world, records in the rating table are inserted dynamically, so how do we train the model dynamically so it gives the best output? You taught us recommendation systems where the data is static, but how do we deal with a dynamic dataset?
You have to build an entire pipeline for this...
After you left your job, you have been rising like the gold price in India, and uploading at quantum internet speed, sir
Very nice
Awesome video
Awesome Thanks Sir.
Hi, Krish! Do you need to normalize the data before doing Variance Threshold, so that the features would be on the same scale? Also, how do you pick the most appropriate threshold?
The threshold is for a specific variance... normalisation is not needed for the zero-variance case, which covers constant features... but for the other scenarios you have to perform normalisation to find quasi-constant features, e.g. you can set the threshold to 0.01
@@krishnaik06Thank you so much! I understand. :)
@@krishnaik06 Sir, please answer my question: how to deal with a dynamic dataset in a recommendation system... do I have to train the model again and again, or is there another approach? Please answer, sir 🙏
@@krishnaik06 Sir, how to select the correct threshold? Could you provide some resources?
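A hedged sketch of the quasi-constant case mentioned in the replies above, with min-max scaling applied before a 0.01 threshold so that variances are comparable across features (the data and threshold are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold

n = 200
df = pd.DataFrame({
    "quasi": [100] * (n - 1) + [101],  # almost constant: one differing value
    "useful": list(range(n)),          # clearly varying
})

# Scale everything to [0, 1] so the variances are on the same footing.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# Quasi-constant features now have a tiny variance and fall below 0.01.
selector = VarianceThreshold(threshold=0.01)
selector.fit(scaled)
kept = list(scaled.columns[selector.get_support()])
print(kept)
```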
Krish, please upload a video on Advanced cnn
Very well explained, I am grateful. But how do we apply the transform to the test dataset?
Nice video, Krish. One query: if we choose a threshold bigger than 0, say 0.10, should we not normalize the data first? Asking because variance depends on the range of values in the columns, and sometimes there could be a very useful feature with a small range of values and thus low variance.
Should we drop the features having low variance from test dataset as well?
Thanks for the video,
🎯 Key Takeaways for quick navigation:
00:00 📚 Introduction to Feature Selection
- Feature selection is essential in data science projects to handle high-dimensional data.
- Curse of dimensionality can affect model performance, making feature selection crucial.
- This tutorial focuses on dropping constant features as a feature selection technique.
02:20 📊 Identifying Constant Features
- Demonstrates how to identify constant features in a dataset.
- Uses the variance threshold technique from scikit-learn to find features with zero variance.
- Explains that constant features are not valuable for machine learning models.
04:12 🚮 Removing Constant Features
- Shows how to remove constant features using the variance threshold class.
- Discusses the threshold parameter and its significance in dropping constant features.
- Highlights that constant features are not important for model training.
09:11 🧪 Applying Variance Threshold on a Real Dataset
- Applies the variance threshold technique to a real-world dataset.
- Demonstrates the importance of dividing data into independent and dependent features.
- Shows how to use variance threshold to identify and remove constant features in a larger dataset.
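The workflow in the takeaways above can be sketched roughly as follows. Synthetic data stands in for the real dataset used in the video, and the column names are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)

# Synthetic stand-in: two informative columns plus two constant ones.
X = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "const1": np.zeros(100),
    "const2": np.ones(100),
})
y = (X["f1"] + X["f2"] > 0).astype(int)  # toy dependent feature

# Split first, then fit the selector on the training data only,
# so the test set does not leak into the selection step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

var_thres = VarianceThreshold(threshold=0)
var_thres.fit(X_train)

# Columns NOT supported by the selector are the constant ones.
constant_columns = [
    c for c in X_train.columns
    if c not in X_train.columns[var_thres.get_support()]
]
print("Dropping:", constant_columns)

X_train = X_train.drop(columns=constant_columns)
X_test = X_test.drop(columns=constant_columns)
```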
Great 🙏
Upload more soon, please!
Just a doubt: if I set the threshold to 0.1, will it remove only the columns whose variance is exactly 0.1, or also the columns whose variance is less than 0.1?
Thank You !!!
thank you
But the variance threshold only works for numerical values, right? Then what if we also have object types? Do we write our own function for the threshold?
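One common workaround for the question above (not from the video): `VarianceThreshold` needs numeric input, but a constant categorical column can be found with a simple `nunique()` check. The DataFrame below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],
    "country": ["US", "US", "US", "US"],  # constant object column
    "price": [10, 20, 30, 40],
})

# nunique() <= 1 flags columns with a single value, regardless of dtype,
# so it catches constant categorical columns that variance cannot handle.
constant_cols = [c for c in df.columns if df[c].nunique() <= 1]
print(constant_cols)
```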
A small variance indicates that the data points tend to be very close to the mean, and a high variance indicates that the data points are spread out from the mean. So bro, why are we removing the small-variance variables in this technique?
Plz make videos about feature extraction
Thanks
Hi Krish, why are you doing a train-test split when we have a test set available in the dataset?
Would it handle multicollinearity?
Sir, please upload further videos ......... thanks
Well explained.
12 videos????
Can anyone tell me why we should apply VarianceThreshold to X_train only, and not to X_test also?
Pls upload remaining videos on feature selection techniques
Can you complete this series
What should be done with the test dataset?
Hello sir, Can you please upload a video for " Feature Selection using BAT Algorithm".
Are the MySQL and MongoDB playlists enough for ML?🤔🤔
Thanks for this tutorial. Just one concern: by doing the train-test split before checking for variance, won't we risk getting false results? I mean, the field we're dropping in X_train might have a few varying values in X_test, but while building the model we dropped that field. How can we take care of this?
Krish has said that you do fit() on the training data and only transform() on the testing data, so every zero-variance feature will be removed in that manner.
Happy learning:)
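That fit-on-train, transform-on-test pattern from the reply above can be sketched as follows (the tiny frames and column names are illustrative):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Illustrative train/test frames sharing the same constant column.
X_train = pd.DataFrame({"a": [1, 2, 3], "const": [5, 5, 5]})
X_test = pd.DataFrame({"a": [4, 5], "const": [5, 9]})  # varies in test!

selector = VarianceThreshold(threshold=0)
selector.fit(X_train)  # learn which columns to keep from train only

X_train_t = selector.transform(X_train)  # constant column removed
X_test_t = selector.transform(X_test)    # same column removed, even though
                                         # it is not constant in the test set
print(X_train_t.shape, X_test_t.shape)
```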
What if we have some features numerical and some features are categorical against categorical output .. which feature section method will be helpful
@Krish Shouldn't we scale the data before applying Variance Threshold ?
You may also try the code below to drop constant features from X_train and X_test:
X_train = obj.transform(X_train)
X_test = obj.transform(X_test)
Thanks, but what is obj in your code (X_test = obj.transform(X_test))?
@@mohlagare3417 Best way: in just one line we can remove both things.
!pip install klib
import klib
df_new = klib.data_cleaning(df)
How is this different from the 4-5 live videos (1hr+ length each) you did about 1-2 months ago?
Or are they almost the same?
Hey Krish, can we remove quasi-constant features before splitting?
Is it the same as checking the four assumptions before applying any ML algorithm, i.e. making sure homoscedasticity and the rest are taken care of?
The Discord link given in the description is not working. Did anyone join through that link?
We cannot use this method on a dataset having categorical data, can we?
Please, someone tell: what is the best value for the threshold?
@Krish Naik Sir, I have one doubt: as per the sklearn documentation this technique is for unsupervised learning, so how come you have used it on a supervised learning dataset in this video?
Thanks & Regards,
CHINMAY N BHAT
I think that sentence means that, because the technique has nothing to do with the target variable, it can ALSO be used on an unsupervised dataset.
Sir, can we use this method after doing one-hot encoding on the features?
Great videos. We can simplify the code for getting the filtered columns after thresholding. Here is my code: data = data[data.columns[var_thres.get_support()]]. Is this correct?
Is it fine if, instead of fitting only X_train, I did fit_transform(X) on the whole of X? That way I get the transformed data directly, so there is no need for get_support() and the list comprehension. Is this approach correct, given I do not need anything else from fit()?
Please go through every playlist, check which topics are missing under each one, and first complete all the remaining videos in every playlist before coming up with anything else.
What is the target, exactly?
From where do we download that file housing_data?
Can anybody help me with how to download the dataset santander.csv? Trying to download it but unable to do so.
Hi... your Discord server link says invalid... can you please send the link one more time? Thanks