Chi Square (Category) | Feature Selection | Python

  • Published Dec 22, 2024

COMMENTS • 35

  • @HackersRealm
    @HackersRealm  1 year ago +7

    Hi everyone, I mistakenly mentioned that the p-value should be greater than 0.5. It should be 0.05.

  • @DasSquadBureauhalt
    @DasSquadBureauhalt 2 years ago +1

    Literally popped in my recommended 10 minutes ago. This is great, thank you!

  • @isaackodera9441
    @isaackodera9441 1 year ago +2

    Just what I was looking for.

  • @javidhesenov7611
    @javidhesenov7611 1 year ago +1

    Nice explanation, thanks. Before this I was looking at scipy chi2, and it was a little bit difficult. But it turns out sklearn chi2 is pretty straightforward and well explained on the website. Thanks for introducing it.
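
A minimal sketch of that sklearn call, assuming a couple of hypothetical, already-encoded categorical columns (sklearn's chi2 expects non-negative numbers):

```python
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical encoded categorical features (non-negative) and a binary target
X = pd.DataFrame({
    "Gender":  [0, 1, 0, 1, 1, 0],
    "Married": [1, 1, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 1]

# chi2 returns one chi-square statistic and one p-value per feature column
chi_scores, p_values = chi2(X, y)
print(pd.Series(chi_scores, index=X.columns))
print(pd.Series(p_values, index=X.columns))
```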

  • @pradeeppaladi8513
    @pradeeppaladi8513 1 year ago +1

    Hi Ashwin, your explanation is very good. I liked it, and in fact I have subscribed to your channel as well.

    • @HackersRealm
      @HackersRealm  1 year ago

      Glad you liked the video!!! I will try my best to share more videos like this!!!

  • @kennylouries410
    @kennylouries410 27 days ago +1

    Hello sir, excellent work... kindly share the playlist link for the previous video.

    • @HackersRealm
      @HackersRealm  27 days ago

      Thanks. Which video are you referring to?

  • @owurakuagyekum3871
    @owurakuagyekum3871 11 months ago

    Please, what will you do next after finding the chi-square values and p-values and plotting the graph? How will you use this to analyse the data and come to a conclusion?

    • @HackersRealm
      @HackersRealm  11 months ago

      You can find the importance of the features and try to eliminate the rest if you have many features, e.g. 1000 features.
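
A sketch of that elimination step with SelectKBest, using random toy data as a stand-in for a wide dataset (the value of k is an arbitrary placeholder):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2

# Toy stand-in for a wide dataset (e.g. 1000 features); chi2 needs non-negative input
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X = np.abs(X)

# Keep only the 10 features with the highest chi-square scores
selector = SelectKBest(score_func=chi2, k=10)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                      # (200, 10)
print(selector.get_support(indices=True))   # column indices of the kept features
```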

  • @sravanirekha
    @sravanirekha 10 months ago +1

    Can we do label encoding if one of the features has more than 10 categories?

  • @joseluisbeltramone599
    @joseluisbeltramone599 1 year ago

    Tremendous explanation! Thank you very much.

  • @kartikjha5704
    @kartikjha5704 1 year ago

    Do we need to label encode the variables before applying this, or will it work as is?
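
sklearn's chi2 expects a non-negative numeric matrix, so string categories do need to be encoded first; the same applies to the earlier question about a feature with more than 10 categories. A small sketch with made-up column names:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder
from sklearn.feature_selection import chi2

# Hypothetical raw data with string categories
df = pd.DataFrame({
    "Gender":      ["Male", "Female", "Female", "Male", "Female"],
    "Property":    ["Urban", "Rural", "Semiurban", "Urban", "Rural"],
    "Loan_Status": ["Y", "N", "Y", "N", "Y"],
})

# Encode the categories as non-negative integers before calling chi2
X = OrdinalEncoder().fit_transform(df[["Gender", "Property"]])
y = LabelEncoder().fit_transform(df["Loan_Status"])

chi_scores, p_values = chi2(X, y)
print(chi_scores, p_values)
```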

  • @pradeeppaladi8513
    @pradeeppaladi8513 1 year ago

    Hi Ashwin,
    I have a question. In the list of categorical variables that you have extracted, why have you added "Dependents" and "Credit_History"? Are they not numerical variables? I just want to understand the basis behind adding them to the categorical variables list! An early response would be highly appreciated.

    • @HackersRealm
      @HackersRealm  1 year ago +2

      If you check the data, Dependents is a category since it has the value "4+", which is a string, and Credit_History is also a category, similar to Gender... only continuous values are considered numerical (see the quick check sketched after this thread).

    • @pradeeppaladi8513
      @pradeeppaladi8513 1 year ago

      @@HackersRealm Where can we find this dataset? Could you please share the link here?

    • @HackersRealm
      @HackersRealm  1 year ago +1

      @@pradeeppaladi8513 It's in the GitHub repo and the link is in the description!!!
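
A quick way to check that on your own copy of the data (the file name here is a placeholder for the CSV in the repo):

```python
import pandas as pd

# Placeholder path: use the loan prediction CSV from the linked GitHub repo
df = pd.read_csv("loan_prediction.csv")

# Dependents holds a string value like "4+", so pandas keeps the column as object (i.e. a category)
print(df["Dependents"].unique())

# Credit_History takes only a couple of distinct values, so it behaves like a category too
print(df["Credit_History"].unique())

print(df.dtypes)
```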

  • @DharmendraKumar-DS
    @DharmendraKumar-DS 1 year ago

    Great explanation....can I use this technique with any dataset for regression?

  • @SWJ-MKhyathi
    @SWJ-MKhyathi 5 months ago

    Hi, it's a very helpful video. But how can we use this chi-square for malware detection in an Android application? Could you please reply to me?

    • @HackersRealm
      @HackersRealm  5 months ago

      Could you please explain this in more detail, like what attributes are you considering?

  • @shuvamsingh4014
    @shuvamsingh4014 10 months ago

    My chi scores are giving NaN values in the array, and the pandas Series attribute is also not working.
    Could you please help me with my problem?

    • @HackersRealm
      @HackersRealm  10 months ago

      Are you using a different dataset or the same one?

    • @shuvamsingh4014
      @shuvamsingh4014 10 months ago

      different dataset @@HackersRealm
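
For anyone hitting the same issue: NaN chi-square scores usually trace back to missing values or all-zero/constant columns in the encoded feature matrix. A defensive sketch, with placeholder file, column, and target names:

```python
import pandas as pd
from sklearn.feature_selection import chi2

df = pd.read_csv("your_dataset.csv")      # placeholder path
cat_cols = ["col_a", "col_b", "col_c"]    # placeholder categorical columns

# 1. Fill missing categories first; NaN input leads to errors or NaN scores
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# 2. Encode strings as non-negative integer codes
X = df[cat_cols].astype("category").apply(lambda col: col.cat.codes)
y = df["target"].astype("category").cat.codes   # placeholder target column

# 3. Drop constant columns: they carry no information, and an all-zero column
#    makes chi2 divide by zero, producing NaN
X = X.loc[:, X.nunique() > 1]

chi_scores, p_values = chi2(X, y)
print(pd.Series(chi_scores, index=X.columns))
print(pd.Series(p_values, index=X.columns))
```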

  • @69nukeee
    @69nukeee 1 year ago

    Thank you! This video was very clear and very insightful.
    I only have a quick question which still isn't clear to me: what is the null hypothesis H0? Is it maybe the hypothesis of some correlation between the categorical variables and the y target variable? If that is the case, then only the variables Credit_History and Education end up having a p-value lower than 0.05, and hence they mean something (H0 valid), while the other categorical variables are to be dropped (as their p-values are higher than 0.05, hence rejecting H0). Did I get it correctly?
    Anyway, really nice job, keep it up ;)

    • @AnasAbid-zm1lk
      @AnasAbid-zm1lk 1 year ago +1

      The end result is correct, but the reasoning isn't; I think you have misunderstood the chi-square independence test, so let me clarify it for you:
      - H0: the target and the feature are independent
      - H1: the target and the feature are dependent
      The p-value is linked to the chi-square test statistic (a measure of the distance between observed and expected counts): the greater the chi-square, the greater the distance and therefore the less likely it is that the variables are independent (if they were independent, observed and expected counts would be close and the chi-square small). Also, the greater the chi-square, the smaller the p-value.
      Therefore, to sum it up, if the p-value is small (0.05 is a common threshold), independence is unlikely and we reject H0, hence we keep only the variables whose p-values are lower than 0.05, since they are dependent on the target (and therefore useful), as sketched in the code after this thread.

    • @69nukeee
      @69nukeee 1 year ago

      @@AnasAbid-zm1lk Thank you for getting back to me!
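
Putting that rule into code, a small sketch of splitting features on the 0.05 threshold (the iris data is just a non-negative stand-in for the encoded loan features):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

# Toy stand-in: any non-negative feature matrix with a target works the same way
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

chi_scores, p_values = chi2(X, y)
p_series = pd.Series(p_values, index=X.columns).sort_values()

# Small p-value => reject H0 (independence) => the feature is related to the target, so keep it
keep = p_series[p_series < 0.05].index.tolist()
drop = p_series[p_series >= 0.05].index.tolist()
print("keep:", keep)
print("drop:", drop)
```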

  • @Leanmonkeyvr38217
    @Leanmonkeyvr38217 1 year ago

    The p-value should be > 0.05 (not 0.5) to fail to reject H0.

    • @HackersRealm
      @HackersRealm  1 year ago

      thanks for finding the mistake, I will update it!!!