Live-Features Selection-Various Techniques To Select Features Day 7

  • Published 22 Dec 2024

COMMENTS •

  • @rambaldotra2221
    @rambaldotra2221 3 years ago +22

    You are definitely going to make the lives of thousands of students who want to become Data Scientists easier. Thanks from the heart, Sir. Sir, you inspired me a lot.

  • @adeshinajohn3988
    @adeshinajohn3988 2 years ago

    Thanks!

  • @adamansbah7017
    @adamansbah7017 3 years ago +6

    Man, you are too good. After spending my entire day looking for a solution on univariate analysis, I was about to go to bed when I came across your video, and guess what? I did it, man, thanks to you. Kudos bro, you are a star.

  • @pankajkumarbarman765
    @pankajkumarbarman765 4 years ago +3

    Thank you sir, this session is very beneficial for us.

  • @cliffordtarimo1511
    @cliffordtarimo1511 3 years ago +2

    Many thanks. I learned a lot today!

  • @sahilgaikwad98
    @sahilgaikwad98 4 years ago +3

    You are great, sir. A very informative session was conducted today 😊

  • @kalyankrishna5902
    @kalyankrishna5902 4 years ago +1

    Super, sir. I gained so much knowledge from your videos...

  • @PratapO7O1
    @PratapO7O1 3 years ago +2

    Starts at 8:00

  • @ushirranjan6713
    @ushirranjan6713 4 years ago +3

    Great explanation, sir!! Appreciate your effort!!

  • @PoojaYadav-tq3yp
    @PoojaYadav-tq3yp 2 years ago

    Very informative and interesting

  • @rohitbharti2882
    @rohitbharti2882 2 years ago

    Thanks for clearing up everything, sir 😊

  • @t.prathiba3836
    @t.prathiba3836 4 years ago +1

    Excellent session

  • @equbalmustafa
    @equbalmustafa 3 years ago +1

    It would be better to add this video to the feature selection playlist.

  • @skyrayzor3693
    @skyrayzor3693 1 year ago

    Thank you sir

  • @dorgeswati
    @dorgeswati 4 years ago

    Very helpful information, thanks for creating this video.

  • @joguns8257
    @joguns8257 18 days ago

    Hello. For extremely large datasets like the CERT r4.2 dataset, which technique would you recommend for filling NaN/missing values? For example, the 'activity' column?

  • @ridoychandraray2413
    @ridoychandraray2413 2 years ago

    Very helpful

  • @naveenrajan3765
    @naveenrajan3765 3 years ago +1

    I am not able to find the continuation video of this session. Can anyone please help me?

  • @matthewavaylon196
    @matthewavaylon196 3 years ago +3

    Can you explain how you were able to use chi2 when it's not categorical? (See the sketch after this thread.)

    • @akshaymathpal9111
      @akshaymathpal9111 3 years ago +3

      I had exactly the same question. chi2 should only be used between categorical or discrete variables, not with a continuous variable, but he used it with all of them.

    • @tijothomas4563
      @tijothomas4563 1 year ago

      Even I had the same question
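
      A minimal sketch of the distinction raised in this thread, assuming scikit-learn and a toy dataset: sklearn's chi2 runs on any non-negative values, treating them as frequencies (which is why the code in the video does not error), while mutual_info_classif makes no such assumption and is the safer choice for continuous features.

      from sklearn.datasets import load_iris
      from sklearn.feature_selection import chi2, mutual_info_classif

      X, y = load_iris(return_X_y=True)  # continuous, non-negative features

      # chi2 accepts this only because the values are non-negative;
      # it is designed for counts/frequencies of categorical data
      chi2_scores, p_values = chi2(X, y)

      # mutual information handles continuous features directly
      mi_scores = mutual_info_classif(X, y, random_state=0)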

  • @sumandhakal828
    @sumandhakal828 3 years ago +1

    Here is a simple modification of the correlation function; it returns the pairs of features that are highly correlated with each other.

    def correlation(df, threshold):
        corr_features = set()  # a set stores only unique items
        corr_matrix = df.corr()
        # these two loops scan only the lower triangle of the matrix,
        # since the upper half is just its mirror image
        for i in range(len(corr_matrix.columns)):
            for j in range(i):
                # keep pairs whose absolute correlation exceeds the threshold
                if abs(corr_matrix.iloc[i, j]) > threshold:
                    # store the names of the two columns
                    corr_features.add((corr_matrix.columns[i], corr_matrix.columns[j]))
        return corr_features
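
    A quick usage sketch for the function above, on a small hypothetical DataFrame:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3, 4],
                       "b": [2, 4, 6, 8],   # perfectly correlated with "a"
                       "c": [4, 1, 3, 2]})
    print(correlation(df, 0.85))  # {('b', 'a')}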

  • @ufukeskici
    @ufukeskici 3 years ago +1

    Where is the second part of this video???

  • @atakanbilgili4373
    @atakanbilgili4373 2 years ago

    Is the second part of the video available anywhere?

  • @attaullah4998
    @attaullah4998 1 year ago

    Hello. In your previous videos you split the data into train and test sets and then applied feature selection on X_train to avoid overfitting, but here, at 22 minutes, you use SelectKBest for feature selection on the whole X. Why is that, sir? Anyone?
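
    A minimal sketch of the split-first pattern discussed here, assuming scikit-learn (the dataset is synthetic):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # fit the selector on the training data only, so no information
    # from the test set leaks into the feature scores
    selector = SelectKBest(f_classif, k=10).fit(X_train, y_train)
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)  # same columns, chosen from train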

  • @vishalaaa1
    @vishalaaa1 1 year ago

    Is there any feature selection technique that can select significant input variables without converting non-numeric input columns to numeric?

  • @shaktisharma5529
    @shaktisharma5529 4 years ago +1

    Do we need to consider the threshold on the negative side as well, e.g. less than -0.50?

  • @manojrangera
    @manojrangera 3 years ago

    Chi2 is used for hypothesis testing and also for feature selection.
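
    A brief sketch of both uses, assuming scipy and scikit-learn (the data is hypothetical):

    import numpy as np
    from scipy.stats import chi2_contingency
    from sklearn.feature_selection import chi2

    # 1) hypothesis testing: independence of two categorical variables
    observed = np.array([[10, 20], [20, 10]])  # contingency table
    stat, p, dof, expected = chi2_contingency(observed)

    # 2) feature selection: score non-negative features against a target
    X = np.array([[1, 9], [2, 8], [8, 2], [9, 1]])
    y = np.array([0, 0, 1, 1])
    scores, p_values = chi2(X, y)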

  • @shreyasb.s3819
    @shreyasb.s3819 4 years ago +1

    Nice... Where is the 2nd part on feature selection?

  • @rajeshwarig213
    @rajeshwarig213 2 years ago

    Hi Krish, in mutual information feature selection, can we pass a feature with null values?

  • @mrigankshekhar384
    @mrigankshekhar384 4 years ago

    Sir, are these scores based on p-values, which actually try to estimate the relationship between an individual feature and the target variable (where the null hypothesis is "no relationship" and the alternative hypothesis is "there is a relationship")?

  • @pilafvibes9170
    @pilafvibes9170 3 years ago

    Hello sir,
    Which feature selection technique should be used for a high-dimensional dataset, e.g. one with 800 columns?
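
    One common approach is sketched below under the assumption that scikit-learn is available: first drop constant columns cheaply, then rank the rest by mutual information (the dataset here is synthetic).

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif

    X, y = make_classification(n_samples=500, n_features=800, n_informative=20, random_state=0)

    # step 1: remove constant columns cheaply
    X_reduced = VarianceThreshold(threshold=0.0).fit_transform(X)

    # step 2: keep the k features with the highest mutual information
    X_selected = SelectKBest(mutual_info_classif, k=50).fit_transform(X_reduced, y)
    print(X_selected.shape)  # (500, 50)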

  • @dasgupts10
    @dasgupts10 2 years ago

    Hi Krish, we use the chi2 test on categorical variables, but here we are also using it on numerical variables. Can you please explain this?

  • @akashprabhakar6353
    @akashprabhakar6353 4 years ago

    I tried SelectKBest for feature selection on the House Pricing dataset problem. As it had many categorical features converted into dummy variables, many columns were just dummy variables with no meaning on their own. Now, on applying SelectKBest, I am getting many dummy-variable columns. How can I proceed further, since including them as separate features makes no sense, yet SelectKBest is indicating high importance?

  • @priyayadav3990
    @priyayadav3990 3 years ago

    In the feature selection playlist, tutorial 6 is missing. Please share it.

  • @0001abcdf
    @0001abcdf 4 years ago +4

    Sir, amazing session, but I didn't get a notification.

  • @pooranikhari5080
    @pooranikhari5080 2 years ago

    Continue, sir.

  • @swatijha7390
    @swatijha7390 4 years ago +1

    What should be done first, feature selection or feature engineering?

    • @ashwinikatariya9008
      @ashwinikatariya9008 4 years ago +1

      Feature engineering and then feature selection

    • @randomdude79404
      @randomdude79404 3 years ago

      Feature engineering and then feature selection. Check the lifecycle of a data science project if you are not sure.

  • @amanjyotiparida5818
    @amanjyotiparida5818 3 years ago

    Sir, SelectKBest does not work for negative values.
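
    That applies to the chi2 scorer, which raises a ValueError on negative input. A minimal sketch of two workarounds, assuming scikit-learn:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2, f_classif
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[-1.0, 3.0], [2.0, -5.0], [0.5, 1.0], [-2.0, 4.0]])
    y = np.array([0, 1, 0, 1])

    # SelectKBest(chi2, k=1).fit(X, y) would raise:
    # ValueError: Input X must be non-negative.

    # option 1: rescale the features to [0, 1] first
    X_scaled = MinMaxScaler().fit_transform(X)
    SelectKBest(chi2, k=1).fit(X_scaled, y)

    # option 2: use a scorer that accepts negative values
    SelectKBest(f_classif, k=1).fit(X, y)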

  • @akshaygane159
    @akshaygane159 3 years ago

    Is feature importance similar to mutual information?

  • @BhaveshSharma1691991
    @BhaveshSharma1691991 3 years ago +1

    The chi2 value ranges between 0 and 1; however, the score we get looks like, for example, 1477655.677. Can you please tell what this score is?
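
    The premise mixes two numbers: the chi2 statistic itself is unbounded (hence scores in the millions), while it is the p-value that lies in [0, 1]. A sketch showing both, assuming scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    X, y = load_iris(return_X_y=True)
    selector = SelectKBest(chi2, k=2).fit(X, y)

    print(selector.scores_)   # chi2 statistics: unbounded, can be huge
    print(selector.pvalues_)  # p-values: these lie in [0, 1]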

  • @azmath4710
    @azmath4710 4 years ago

    Hi Krish,
    I am unable to find Live Feature Engineering Day 5 in your playlist... Would you mind sharing it?

  • @faheemkhan9786
    @faheemkhan9786 1 year ago

    What if we have categorical features?

  • @eyobsolomon4663
    @eyobsolomon4663 3 years ago

    I'm stuck on this error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). Please help me. (See the sketch after this thread.)

    • @koleshjr
      @koleshjr 2 years ago

      Maybe you have missing values in your dataset.
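
      A minimal sketch of the usual fix, assuming scikit-learn: impute the missing values before running the selector, since selectors reject NaN/inf (the data here is hypothetical).

      import numpy as np
      from sklearn.impute import SimpleImputer
      from sklearn.feature_selection import SelectKBest, f_classif

      X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [5.0, 6.0]])
      y = np.array([0, 1, 0, 1])

      # fill the missing values first, then select
      X_filled = SimpleImputer(strategy="median").fit_transform(X)
      X_sel = SelectKBest(f_classif, k=1).fit_transform(X_filled, y)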

  • @utkarshsharma1867
    @utkarshsharma1867 2 years ago

    Please make a video on one-way and two-way ANOVA and their implementation in Python.

  • @raviyadav2552
    @raviyadav2552 3 years ago

    Is it always necessary to remove collinearity / correlated features, or only in certain cases?

    • @akshaymathpal9111
      @akshaymathpal9111 3 years ago

      Only if independent features are highly correlated with each other. That means they are almost the same, so adding nearly identical data will not help the model or make it more accurate; it's like telling the model the same thing again and again, which it already learnt from the previous data.

  • @sauravshukla6659
    @sauravshukla6659 2 years ago

    Krish, please make a video on deep learning classifiers.

  • @mandarchincholkar5955
    @mandarchincholkar5955 3 years ago +1

    Please provide timestamps... 🙏🏻

  • @rad.strums
    @rad.strums 4 years ago

    Should I ever log-transform the target variable? (See the sketch after this thread.)

    • @rad.strums
      @rad.strums 4 years ago

      In a regression problem, should I transform the target variable?

    • @ok_7566
      @ok_7566 4 years ago +1

      It is generally not done; the constants or weights usually scale well enough to reach the target, so the benefit is not very evident. Though nothing is a strict no in ML, so...

    • @maradashyamsundar3076
      @maradashyamsundar3076 3 years ago

      No, you can't.
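
      For what it's worth, log-transforming a right-skewed regression target is a common pattern; a minimal sketch using scikit-learn's TransformedTargetRegressor on synthetic data:

      import numpy as np
      from sklearn.compose import TransformedTargetRegressor
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      X = rng.uniform(0, 10, size=(100, 3))
      y = np.exp(X @ np.array([0.3, 0.2, 0.1]) + rng.normal(0, 0.1, 100))  # skewed target

      # fits on log1p(y) and automatically inverts predictions with expm1
      model = TransformedTargetRegressor(regressor=LinearRegression(),
                                         func=np.log1p, inverse_func=np.expm1)
      model.fit(X, y)
      preds = model.predict(X)  # predictions back on the original scale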

  • @Mahanchotu
    @Mahanchotu 3 years ago

    Sir, please also make some content on multivariate analysis.

  • @harshmalviya7
    @harshmalviya7 4 years ago +1

    Sir, I can't join the Telegram channel, please help.

    • @advocatesanthoshreddy9524
      @advocatesanthoshreddy9524 4 years ago

      Install the Telegram app and search for "Discussion on ML and DL by Krish Naik".

    • @harshmalviya7
      @harshmalviya7 4 years ago

      @@advocatesanthoshreddy9524 Thank you 😊 I have joined.

  • @manojrangera
    @manojrangera 3 years ago

    Sir, I have a question. In the individual videos you do a train-test split, compute the feature selection technique on X_train, and then remove the correlated features from X_train and X_test. But sometimes you don't do a train-test split and use the whole df directly for feature selection, so I am confused: should I do the train-test split first and then calculate the feature selection, or just use the df for it? Please reply, sir; I am very confused about this. If anyone knows the answer, please reply as well. (A sketch follows after this thread.) Thanks in advance 🙏🙏🙏

    • @nbbhatt1695
      @nbbhatt1695 3 years ago

      We can do feature selection after the split or before the split, but it can cause an overfitting problem when we select on the whole df. To prevent that, we first split, apply feature selection on X_train, and once we have the features, keep the same features for X_test as well.
      If I am doing it wrong, please guide me.

    • @yashasvibhatt1951
      @yashasvibhatt1951 3 years ago

      Just a friendly reminder, brother: do not learn things by rote. First analyse why you need to apply a certain algorithm to a dataset, then check whether that algorithm really needs to be applied to it.
      Ask yourself some questions:
      Can't we achieve the results without applying it?
      Is there another, more optimised algorithm for the same task?
      What would be the merits and demerits of applying it to our dataset?
      Only after answering all of those questions, make your decision. The question you are asking rests entirely on this principle: it is not necessarily true that you must apply feature selection or feature engineering before or after the split. You should know that the size of the data will vary your correlation scores; if you have significantly less data, you should not even think of splitting, even at the time of modelling, because every Machine Learning algorithm depends on the size of the data, and if you split data that is already small, you only increase the chances of overfitting.
      Understand your question like this. You are in a hypothetical situation where you have a bag containing 50000 balls, your task is to paint all of them blue, and you have a machine that can do that for you. Now, as I said above, before doing anything, analyse the problem and ask some questions:
      What kind of balls are they, and does the kind of ball really matter?
      Is there a burst ball present? If so, painting it has no value and will consume time and resources.
      Is any ball already painted in that colour?
      What if the paint you are using is not good quality-wise, say it weakens after the ball comes into contact with water?
      You have all the answers and now you proceed to the next step.
      (The next part should answer your question.)
      You split the balls into groups of 40000 and 10000, but the task is still there: you still have to paint all 50000 balls and, at the same time, check for all the anomalies present. It is now up to you whether you run the procedure first on the 40000 balls and then on the 10000, or apply the same procedure in one go.
      TL;DR
      Your task is to do the feature selection; it doesn't matter whether you do it after splitting or before splitting. The point is to apply it consistently across the whole dataset. Say a feature has very low correlation with the target variable in the training set after splitting; then you definitely have to remove it from the testing set too, even if it shows good correlation there, otherwise the two parts will look different. Or you can first remove the unwanted features and then do the split; in both cases you are doing the same thing.
      Hope that helps. If there is anything in my answer you weren't able to understand, do ask me. 🙂🙂
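
      A minimal sketch of the split-first workflow described above, on hypothetical pandas data: decide which correlated columns to drop using the training split only, then drop the same columns from both splits.

      import pandas as pd
      from sklearn.model_selection import train_test_split

      # "b" duplicates "a" up to scale, so it should be dropped
      df = pd.DataFrame({"a": range(100),
                         "b": [2 * v for v in range(100)],
                         "c": [v % 7 for v in range(100)]})
      X_train, X_test = train_test_split(df, test_size=0.3, random_state=0)

      # choose the columns to drop using the training split only...
      corr = X_train.corr().abs()
      to_drop = [col for i, col in enumerate(corr.columns)
                 if (corr.iloc[i, :i] > 0.9).any()]

      # ...then drop the same columns from both splits
      X_train = X_train.drop(columns=to_drop)
      X_test = X_test.drop(columns=to_drop)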

  • @ganeshjagtap2326
    @ganeshjagtap2326 4 years ago

    Hi sir

  • @rupampatil6425
    @rupampatil6425 4 years ago

    Hello sir... Your videos are great, and I need a small favour. I am a BE IT student; can you suggest a project idea for my final year? My domain is machine learning and deep learning, and I want to use some computer vision. It would be a great help if you suggested a topic. Thanks in advance.