Feature selection in machine learning | Full course

  • Published Feb 5, 2025

COMMENTS • 57

  • @ildestino6200
    @ildestino6200 1 day ago

    You're brilliant! That's the first time I've heard such a clear and comprehensive explanation of feature selection. Thank you very, very much, and keep up the amazing work ❤

  • @ax5344
    @ax5344 7 months ago +5

    I like the logic of this video. You showed the baseline, then three additional methods, then compared them at the end. Thanks a lot for sharing the technique. The feature/target matrix is also very helpful.
    My question is about the principle or concept behind the filter method, RFE, and Boruta. Is it possible to do a video on them?

  • @tanyaalexander1460
    @tanyaalexander1460 9 months ago

    I am a noob to data science and feature selection. Yours is the most succinct and clear lesson I have found... Thank you!

  • @mauroSfigueira
    @mauroSfigueira 8 months ago +1

    Hugely informative and educational content. Many feature engineering videos are not that instructive.

  • @ednaldogomes6124
    @ednaldogomes6124 3 months ago

    Congratulations and thanks for the excellent tutorial. You've gained another subscriber 👏👏👏

  • @abhinavreddy6451
    @abhinavreddy6451 9 months ago

    Please do more data science-related content; it was very helpful. I searched everywhere for feature selection videos and finally landed on this one, and it was all I needed. The content is awesome, and so is the explanation!

  • @pedro_tonom
    @pedro_tonom 6 months ago

    Amazing video and excellent teaching. Congrats on the great quality; it helped me a lot!

  • @samuelliaw951
    @samuelliaw951 1 year ago +2

    Really great content! Learnt a lot. Thanks for your hard work!

  • @mandefrolegesse5748
    @mandefrolegesse5748 7 months ago

    Very interesting explanation and easy to understand. I was looking for this kind of tutorial. Subscribed 👍

  • @oluwasegunodunlami7360
    @oluwasegunodunlami7360 1 year ago +1

    Wow, this video is really helpful; a lot of interesting methods were shown. Thanks a lot.
    I'd like to ask you to make a future video covering how you perform feature engineering and model fine-tuning 1:49

  • @NulliusInVerba8
    @NulliusInVerba8 6 months ago

    This is extremely helpful and informative. Thanks a LOT!

  • @babakheydari9689
    @babakheydari9689 9 months ago

    It was great! Thanks for sharing your knowledge. Hope to see more of you.

  •  1 year ago +1

    I am currently reading your book and it's amazing

  • @claumynbega1670
    @claumynbega1670 1 year ago

    Thanks for this valuable work. Helps me learning the subject.

  • @shwetabhat9981
    @shwetabhat9981 1 year ago

    Woah, much awaited 🎉. Thanks for all the effort you've put in, sir. Looking forward to more such amazing content 🙂

  • @marco_6145
    @marco_6145 11 months ago

    Sensational video, thank you so much!

  • @paramvirsaini2806
    @paramvirsaini2806 1 year ago

    Great explanation. Easy hands-on as well!!

  • @lecturesfromleeds614
    @lecturesfromleeds614 7 months ago

    Marco's the man!

  • @maythamsaeed533
    @maythamsaeed533 1 year ago

    Very helpful video and an easy way to explain the content. Thanks a lot.

  • @chiragsaraogi363
    @chiragsaraogi363 1 year ago +1

    This is an incredibly helpful video. One thing I noticed is that all the features are numerical. How do we approach feature selection with a mix of numerical and categorical features? Also, when we have categorical features, do we convert them to numerical first, or do feature selection first? A video on this would be really helpful. Thank you!

    • @haleematajoke4794
      @haleematajoke4794 1 year ago

      You will need to convert the categorical features into numerical format, either with label encoding, which automatically converts them to numerical values, or with a custom mapping, where you can manually assign your preferred values to the features. I hope it helps.

    • @haleematajoke4794
      @haleematajoke4794 1 year ago +1

      You will have to do the conversion before feature selection because machine learning models only learn from numerical data
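
      A minimal sketch of that conversion, assuming pandas and scikit-learn (the column names and values here are hypothetical):

          import pandas as pd
          from sklearn.preprocessing import LabelEncoder

          # Hypothetical data with one categorical and one numerical feature
          df = pd.DataFrame({
              "color": ["red", "white", "red", "rose"],
              "acidity": [3.1, 3.3, 3.0, 3.2],
          })

          # Option 1: label encoding (categories become integers automatically)
          df["color_label"] = LabelEncoder().fit_transform(df["color"])

          # Option 2: custom mapping (manually assign your preferred values)
          df["color_mapped"] = df["color"].map({"red": 0, "white": 1, "rose": 2})

          print(df)  # feature selection can now run on the numerical columns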

    • @darrentan271
      @darrentan271 11 days ago +1

      @@haleematajoke4794 My dataset has a mix of numerical and categorical features for a classification task. If I encode these categorical features, which SelectKBest method should I then use (chi-square or the t-test in the table at 20:13)?

    • @haleematajoke4794
      @haleematajoke4794 11 days ago

      @@darrentan271 The chi-squared test is better.
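
      A sketch of what that looks like with SelectKBest, assuming the encoded features are non-negative (scikit-learn's chi2 requires that); the data below is a stand-in, not the video's dataset:

          from sklearn.datasets import load_iris
          from sklearn.feature_selection import SelectKBest, chi2

          X, y = load_iris(return_X_y=True)  # all features non-negative, as chi2 requires

          selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 best-scoring features
          X_new = selector.fit_transform(X, y)

          print(selector.scores_)        # chi-squared score per feature
          print(selector.get_support())  # boolean mask of the kept features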

  • @tongji1984
    @tongji1984 11 months ago

    Dear Marco, thank you. 😀

  • @scott6571
    @scott6571 1 year ago

    Thank you! It's helpful!

  • @mohammadhegazy1285
    @mohammadhegazy1285 4 months ago

    Thank you very much for the video.

  • @berlinbrown03
    @berlinbrown03 5 months ago

    Thanks, good review

  • @DharmapuriSangeetha
    @DharmapuriSangeetha 1 month ago

    Thank you for the clear insights. Can you list all your books?

    • @datasciencewithmarco
      @datasciencewithmarco  1 month ago

      I only have one book for now: Time Series Forecasting in Python. I'm writing another one right now: Time Series Forecasting Using Foundation Models (cool stuff!!)

  • @edzme
    @edzme 5 months ago

    This is great! Roughly how long did Boruta take on your dataset? If I had 400 features and 1 million rows, would that be impossible?

  • @alfathterry7215
    @alfathterry7215 1 year ago

    In the variance threshold technique, if we use StandardScaler instead of MinMaxScaler, the variance would be the same for all variables... does that mean we can eliminate this step and just use StandardScaler?
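
    The commenter's observation can be checked directly; a small sketch with synthetic data showing why the variance threshold is only informative after min-max scaling (standardization forces every variance to 1, so the threshold can no longer discriminate):

        import numpy as np
        from sklearn.preprocessing import MinMaxScaler, StandardScaler

        rng = np.random.default_rng(42)
        X = rng.normal(0.0, [0.1, 1.0, 10.0], size=(1000, 3))  # very different spreads

        print(MinMaxScaler().fit_transform(X).var(axis=0))    # variances still differ
        print(StandardScaler().fit_transform(X).var(axis=0))  # all ~1.0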

  • @imadsaddik
    @imadsaddik 11 months ago

    Thank you for sharing

  • @eladiomendez8226
    @eladiomendez8226 1 year ago

    Awesome video

  • @deniz-gunay
    @deniz-gunay 2 months ago

    Hi, thanks for the content 🎉 I have a dataset of 62 features and 40k rows and I'm using RFE, but it is taking so much time. 90 minutes have passed and it is still running. Is there a problem? Is this normal? What do you think?

    • @datasciencewithmarco
      @datasciencewithmarco  2 months ago +1

      It's normal: by default, RFE removes features one at a time. You can increase the value of "step" or set it to a float between 0 and 1 to express a percentage. That controls how many features are removed at each iteration and should speed things up. So, if you set "step" to 5, it removes 5 features at every iteration; if you set it to 0.05, it removes 5% of all features at each iteration.
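
      A sketch of that "step" setting (the estimator and data here are placeholders):

          from sklearn.datasets import make_classification
          from sklearn.feature_selection import RFE
          from sklearn.linear_model import LogisticRegression

          X, y = make_classification(n_samples=1000, n_features=62, random_state=0)

          # step=5 removes 5 features per iteration instead of the default 1;
          # step=0.05 would instead remove 5% of the features per iteration
          rfe = RFE(estimator=LogisticRegression(max_iter=1000),
                    n_features_to_select=20, step=5)
          rfe.fit(X, y)

          print(rfe.support_)  # boolean mask of the 20 selected features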

    • @deniz-gunay
      @deniz-gunay 2 months ago +1

      @@datasciencewithmarco Thanks, man! I fixed it. I used mutual information to decide the number of features and RFE to select the best ones.

  • @TheSerbes
    @TheSerbes 6 months ago

    I want to build an LSTM for time series; what should I do? I think the situation is different for time series. Would I be wrong to use what you did? There are both trend and seasonality in the series.

  • @mrthwibble
    @mrthwibble 1 year ago

    Excellent video; however, I'm preoccupied trying to figure out whether having wine as a gas would make dinner parties better or worse. 🤔

  • @cagataydemirbas7259
    @cagataydemirbas7259 1 year ago +1

    Hi, when I use RandomForest, DecisionTree, and XGBoost with RFE, even though all of them are tree-based models, they return completely different orders. My dataset has 13 columns; with XGBoost one feature's importance rank is 1, the same feature's rank with DecisionTree is 10, and with RandomForest it is 7. How can I trust which feature is better than the others in general? If a feature is more predictive than the others, shouldn't it have the same rank across all tree-based models? I am so confused about this. It's also the same with SequentialFeatureSelector.

    • @datasciencewithmarco
      @datasciencewithmarco  1 year ago

      That's normal! Even though they are all tree-based, they are not the same algorithm, so the ranking will change. To decide on the best feature set, you simply have to predict on a test set and measure the performance.
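
      A sketch of that comparison; the feature subsets below are hypothetical stand-ins for the outputs of the different RFE runs:

          from sklearn.datasets import make_classification
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.metrics import accuracy_score
          from sklearn.model_selection import train_test_split

          X, y = make_classification(n_samples=500, n_features=13, random_state=0)
          X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

          # Hypothetical subsets kept by RFE with different base estimators
          candidates = {"rfe_xgboost": [0, 2, 5, 7], "rfe_tree": [1, 2, 3, 8]}

          for name, cols in candidates.items():
              model = RandomForestClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
              acc = accuracy_score(y_te, model.predict(X_te[:, cols]))
              print(f"{name}: test accuracy = {acc:.3f}")  # keep the best-performing set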

  • @roba_trend
    @roba_trend 1 year ago

    Interesting content, much love 🥰

  • @dorukucar7105
    @dorukucar7105 1 year ago

    pretty helpful!

  • @therevolution8611
    @therevolution8611 1 year ago

    Can you explain how to perform feature selection for a multilabel problem?

    • @BehindClosedDoorsBCD
      @BehindClosedDoorsBCD 1 year ago

      You can convert the labels to numerical features by replacing them with numbers. If you have 3 labels in a feature, you could represent them with 0, 1, 2. There are different methods to use; a simple one is .replace({}).
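
      A minimal sketch of that .replace({}) approach with pandas (the labels and mapping are hypothetical):

          import pandas as pd

          df = pd.DataFrame({"label": ["cat", "dog", "bird", "dog"]})
          df["label"] = df["label"].replace({"cat": 0, "dog": 1, "bird": 2})
          print(df)  # labels are now 0, 1, 2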

  • @pooraniayswariya997
    @pooraniayswariya997 1 year ago

    Can you teach us how to do MRMR feature selection in ML?

  • @nikhildoye9671
    @nikhildoye9671 10 months ago

    I thought feature selection was done before model training. Am I wrong?

  • @roba_trend
    @roba_trend 1 year ago +1

    I tried searching your GitHub but couldn't find the data. Where is the data you worked on?

    • @datasciencewithmarco
      @datasciencewithmarco  1 year ago

      The dataset comes from the scikit-learn library! We are not reading a CSV file. As long as you have scikit-learn installed, you can get the same dataset! That's what we do in cell 3 of the notebook, and it's also on GitHub!
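
      For illustration, loading a dataset directly from scikit-learn looks like this (load_wine is just an example loader; check cell 3 of the notebook for the exact one used):

          from sklearn.datasets import load_wine

          data = load_wine(as_frame=True)  # no CSV needed; ships with scikit-learn
          X, y = data.data, data.target
          print(X.shape, y.shape)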

  • @nabeel_kaleel
    @nabeel_kaleel 8 months ago

    Subscribed!