Really enjoy the way you teach. First comes the theory, then the practice.
I have one question though. I saw you pass RandomForestClassifier into ExhaustiveFeatureSelector or SequentialFeatureSelector in the previous tutorial. But what if my actual training model is a different algorithm, say an SVM or a neural network? Can I still pass RandomForestClassifier to do the feature selection first, and then use the reduced features to train my actual model?
Instead of RandomForestClassifier you can use any algorithm, such as SVM, logistic regression, or a decision tree. You can also use the selected features with any other model.
Thanks a lot brooo 👍🏼👍🏼👍🏼👍🏼
Happy to help
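A minimal sketch of the workflow discussed in this thread: select features with a random forest, then train a different model (here an SVM) on the reduced set. The assumptions are loud ones: the data is synthetic (`make_classification`), and scikit-learn's own `SequentialFeatureSelector` stands in for the mlxtend selectors named above, so the parameter names differ from the tutorial's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 1: forward selection scored with a random forest, on training data only.
selector = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=3,
    direction="forward",
    cv=3,
)
selector.fit(X_train, y_train)
selected = selector.get_support(indices=True)  # column indices that were kept

# Step 2: train the "actual" model (an SVM) on the reduced feature set.
svm = make_pipeline(StandardScaler(), SVC())
svm.fit(selector.transform(X_train), y_train)
test_accuracy = svm.score(selector.transform(X_test), y_test)

print("Selected feature indices:", selected)
print("SVM test accuracy on selected features:", round(test_accuracy, 3))
```

Wrapper methods score subsets with the estimator they are given, so a subset chosen with a random forest is not guaranteed to be the best one for an SVM. Passing the SVM itself to the selector is the alternative when the extra runtime is acceptable.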
Great video. Could you share the demo dataset used in this video? I am doing a small presentation and I think your example is perfect for it. Thank you in advance.
Please find the dataset here: github.com/siddiquiamir/Data
@@StatsWire thank you so much
You're welcome @@cyf1434568100
Great video. Sir, I am not sure how dummy variables for categorical data fit into this. You did not show that step because your data is already coded as 0 and 1. Many thanks!
When you pass your categorical variables it tells you their importance. If they are significant then you can make dummy variables of those categorical variables.
@@StatsWire many thanks. The issue is that I am not great at coding, so I literally follow your steps and run the analysis on my end. Could you add to the current video a short piece of code showing how to include the dummy variables in the selection? I am probably asking too much. Much appreciated.
@@aslivinschi I understand, it happens. I will make a video or share some materials with you on Tuesday. I am a little busy this weekend.
@@StatsWire thanks a lot! I usually import the Excel file and then convert it to Pyt format. From there I used to run stepwise regression (I have about 8 variables that require dummy variables), but I get bad results every time. I wanted to try what you do and see if there is any difference.
@@StatsWire hello sir, I kindly wanted to check whether you had a chance to review the comment above. Your support is much appreciated.
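For the dummy-variable question in this thread, one possible sketch is to encode the categorical columns with `pandas.get_dummies` before running the selector, so every column it sees is numeric. Everything specific here is hypothetical: the column names (`age`, `income`, `city`, `plan`), the random data, and the target. scikit-learn's `SequentialFeatureSelector` is used in place of the mlxtend selector from the video, and the train/test split is left out only to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Hypothetical data: two numeric and two categorical columns.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(50_000, 12_000, size=n),
    "city": rng.choice(["north", "south", "east"], size=n),
    "plan": rng.choice(["basic", "premium"], size=n),
})
# Hypothetical target that depends on one categorical and one numeric column.
y = ((df["plan"] == "premium") & (df["age"] > 40)).astype(int)

# Turn each categorical column into 0/1 dummy columns.
# drop_first=True drops one level per variable to avoid redundant columns.
X = pd.get_dummies(df, columns=["city", "plan"], drop_first=True, dtype=int)

# Run the selection on the fully numeric table.
selector = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=2,
    direction="forward",
    cv=3,
)
selector.fit(X, y)
chosen = list(X.columns[selector.get_support()])

print("Columns after encoding:", list(X.columns))
print("Selected columns:", chosen)
```

Note that the selector treats each dummy column as a separate feature, so it may keep some levels of a categorical variable and drop others.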
I applied this code, but I did not split the data into training and test sets before calling fit. I passed all the data as X and y. Is this considered correct?
No, this is not correct, because we need to hold out some data for testing. If we don't test the model on unseen data, we can't know whether it is working well.
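A small sketch of why the held-out split matters, assuming synthetic data with some label noise (`flip_y`) and scikit-learn's `SequentialFeatureSelector` in place of the mlxtend one. Both the selector and the model are fit on the training portion only, and accuracy is then reported on both portions so the gap is visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split

# Synthetic, noisy data (assumption): 10% of labels are flipped at random.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    flip_y=0.1, random_state=1,
)

# Hold out 30% of the rows; they are never shown to the selector or the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Feature selection is fit on the training rows only.
selector = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=25, random_state=1),
    n_features_to_select=3,
    cv=3,
)
selector.fit(X_train, y_train)

# Final model, trained on the selected training columns.
model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(selector.transform(X_train), y_train)

train_accuracy = model.score(selector.transform(X_train), y_train)
test_accuracy = model.score(selector.transform(X_test), y_test)

# The training score is measured on rows the model has already seen,
# so only the test score estimates performance on new data.
print("Accuracy on training rows:", round(train_accuracy, 3))
print("Accuracy on held-out rows:", round(test_accuracy, 3))
```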
Sir, in exhaustive feature selection I see that most people use random forest, so my questions are:
1. Are the features selected by random forest applicable to all other algorithms, like SVM, Naïve Bayes, KNN, logistic regression and decision tree? Or are they suitable for random forest only?
2. In the exhaustive search method, instead of using random forest to choose features, can I use SVM, Naïve Bayes or a decision tree?
Hi, both 1 and 2 are correct. First, you can use the features selected by random forest in any algorithm. Second, instead of random forest you can use any other algorithm of your choice: SVM, naive Bayes, decision tree, etc.
@@StatsWire Thank you sir. What confused me is that it is often said that features selected this way are not model-agnostic, so I was thinking of running each algorithm to see whether they select the same or different optimal subsets.
@@StatsWire I applied five algorithms (SVM, RF, DT, LGR and KNN) in exhaustive search individually, but the optimal feature subset obtained from each model is different. So does it mean different algorithms give different results?
Yes, because different algorithms work differently and have different advantages and disadvantages, but the selected subsets usually overlap to a large extent. You don't have to worry about that. You can choose whichever algorithm gives you the best result.
@@StatsWire very satisfying answer
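The comparison described in this thread can be sketched as follows: run the same selection with several estimators, look at which features each one keeps, and compare them on held-out data. This is only an illustration under assumptions: the data is synthetic, three estimators are used instead of the five mentioned, and scikit-learn's forward `SequentialFeatureSelector` replaces the exhaustive search to keep the runtime low.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

estimators = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

results = {}
for name, estimator in estimators.items():
    # Each estimator scores candidate subsets in its own way,
    # so the chosen subsets are allowed to differ.
    selector = SequentialFeatureSelector(estimator, n_features_to_select=3, cv=3)
    selector.fit(X_train, y_train)
    subset = tuple(int(i) for i in selector.get_support(indices=True))

    # Refit the same estimator on its own subset and score it on held-out rows.
    estimator.fit(selector.transform(X_train), y_train)
    accuracy = estimator.score(selector.transform(X_test), y_test)
    results[name] = (subset, accuracy)

for name, (subset, accuracy) in results.items():
    print(f"{name}: features {subset}, test accuracy {accuracy:.3f}")

# Features that every estimator agreed on (may be empty).
common = set.intersection(*(set(subset) for subset, _ in results.values()))
print("Selected by all estimators:", sorted(common))
```

Picking the estimator and subset with the best held-out score matches the advice above, though repeatedly choosing on the same test set makes that score optimistic; a separate validation split is the safer choice when many candidates are compared.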