Statistical Thinking - Chi Square Test - Feature Selection

  • Published 24 Oct 2024

COMMENTS • 61

  • @chiragbilimoria3275
    @chiragbilimoria3275 4 years ago +3

    This is what I was looking for, sir. Thanks a lot for this video.

  • @TheSocialDrone
    @TheSocialDrone 3 years ago +1

    You explained it very well! Thanks for producing and sharing this tutorial.

  • @DoYouHaveAName1
    @DoYouHaveAName1 1 year ago

    Thank you very much, you are a great teacher

  • @raviirla459
    @raviirla459 4 years ago +2

    Awesome videos with great content. Loved it :) Waiting for more videos on feature engineering.

  • @lavendermlay5731
    @lavendermlay5731 6 months ago

    Hello, this was a very helpful video. If you have done a Bayesian analysis, please provide the video link.

  • @mukeshkund4465
    @mukeshkund4465 4 years ago +2

    Best way to start your morning !!:)

  • @tryingtolearn3299
    @tryingtolearn3299 3 years ago +1

    Thank you very much for the videos. I have two questions.
    When we have categorical variables, can we use Pearson correlation to rank their significance, for example to say that PaperlessBilling is more significant than SeniorCitizen? Or do we need to use only the chi-squared test?
    Another question: if I have a few categorical variables with multiple categories, should we first create dummy variables and then run the chi-squared test on each dummy variable against the target variable?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      The strength of the relationship between two categorical variables can be measured with Cramér's V. You can check my Cramér's V video if you have not already.
      One-hot encoding might not be required. You just create a contingency table based on the number of categories.
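
The reply above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the toy data, the column names `Contract` and `Churn`, and the helper `cramers_v` are assumptions made up for the example. It builds the contingency table directly with `pd.crosstab` (no dummy variables) and derives the uncorrected Cramér's V from the chi-square statistic.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "Contract": ["Monthly", "Monthly", "Yearly", "Yearly",
                 "Monthly", "Two-year", "Two-year", "Yearly"] * 25,
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"] * 25,
})

def cramers_v(x, y):
    """Return (Cramér's V, p-value) for two categorical series (no bias correction)."""
    table = pd.crosstab(x, y)  # contingency table; no one-hot encoding needed
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()          # total number of observations
    k = min(table.shape) - 1            # min(rows, cols) - 1
    return np.sqrt(chi2 / (n * k)), p

v, p_value = cramers_v(df["Contract"], df["Churn"])
print(f"Cramér's V = {v:.3f}, p-value = {p_value:.3g}")
```

The p-value says whether an association is detectable; V gives a 0 to 1 measure of its strength, which is what allows features to be ranked against each other.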

    • @tryingtolearn3299
      @tryingtolearn3299 3 years ago

      @@AIEngineeringLife Thank you very much for the quick reply.

  • @hssp1534
    @hssp1534 2 years ago

    What would our feature selection approach be if we are using a mixture of continuous and categorical variables to predict a categorical variable?

  • @kodjigarpp
    @kodjigarpp 4 years ago +1

    Best way to have a productive lunch, thank you! I have a question: did you choose chi-square because the degrees of freedom is 1 (for churn x gender, for example)? If it had been DOF > 30, what would you have chosen?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Chi-square can be used with higher-cardinality categories as well. But if there are a lot of low-tail categories, it is better to group them before feeding them in; otherwise the low tails can distort the output stats.

    • @kodjigarpp
      @kodjigarpp 4 years ago

      @@AIEngineeringLife Thank you for your answer. What do you call a low tail?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago

      Say you have a feature with 30 categories. The last few might have too few observations to support a strong conclusion. These are the low-tail ones.
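
Grouping the low-tail categories before the test, as suggested in this thread, could look something like the sketch below. The helper `group_rare`, the threshold `min_count=5` and the label `"Other"` are assumptions chosen for illustration; the right cutoff depends on the data.

```python
import pandas as pd

def group_rare(series, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with a single label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index      # the low-tail categories
    return series.where(~series.isin(rare), other_label)

# Hypothetical feature with a long tail: C, D and E are rarely observed.
s = pd.Series(["A"] * 40 + ["B"] * 30 + ["C"] * 3 + ["D"] * 2 + ["E"] * 1)
grouped = group_rare(s, min_count=5)
print(grouped.value_counts())
```

The grouped series can then go into `pd.crosstab` against the target in place of the original column.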

  • @akashsinha9938
    @akashsinha9938 4 years ago +1

    Hi sir, thanks for posting the video. I have a question: to check significance between two categorical variables we use the chi-square test, and between two continuous variables we use a t-test. How can we check significance between an independent categorical variable and a dependent continuous variable, or vice versa?

    • @devpratap
      @devpratap 4 years ago +1

      You can use regression after converting your categorical variable to numeric values. If you're looking for a statistical test, then ANOVA would suffice.
      This will help: www.researchgate.net/post/What_if_an_independentvariable_is_categorical_and_dependent_variables_iscontinuous_variable_can_anyone_suggest_a_suitable_test

    • @akashsinha9938
      @akashsinha9938 4 years ago

      @@devpratap Thanks for your answer. But ANOVA works for a categorical independent variable and a continuous dependent variable. What test applies for a continuous independent variable and a categorical dependent variable? Is there a test for that case, or do we need to convert the categorical dependent variable to numerical?
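
For the categorical-versus-continuous case raised in this thread, a one-way ANOVA can be sketched with `scipy.stats.f_oneway`. The three groups and their simulated "monthly charges" are made-up data for illustration only. The test asks whether the mean of the continuous variable differs across the categories, so it does not matter which of the two variables is treated as the dependent one; scikit-learn's `f_classif` applies the same F-test to continuous features with a categorical target.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical continuous variable (monthly charges) split by a 3-level category.
monthly = rng.normal(loc=70, scale=10, size=100)
yearly = rng.normal(loc=60, scale=10, size=100)
two_year = rng.normal(loc=50, scale=10, size=100)

# H0: all group means are equal.
f_stat, p_value = f_oneway(monthly, yearly, two_year)
print(f"F = {f_stat:.2f}, p-value = {p_value:.3g}")
```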

  • @hardikraja
    @hardikraja 4 years ago +1

    Awesome...

  • @rsinh3792
    @rsinh3792 3 years ago

    Sir, a reviewer has asked me this question and I don't know how to address it. Can you please guide me? "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of your method". I have two datasets. Dataset 1 was used to train the model and dataset 2 was used to validate the trained model. I have trained the ML model, deployed it, validated it on new data and presented the results. Actually, I have understood the question. Shall I apply the statistical test between the performance metrics of the training results and the validation results? Please help me, sir.

  • @akhileshlekurwale364
    @akhileshlekurwale364 4 years ago +1

    Is it good practice to perform a statistical test on every column available for modelling? Is there any trigger point to consider for this?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Akhilesh, it is very subjective. I would say it is good to investigate each variable to see how it impacts the model. How exhaustive to be depends on the case. Most of these tests can be automated.

  • @rameshthamizhselvan2458
    @rameshthamizhselvan2458 4 years ago +1

    I have one doubt: instead of using the stats package, we can use chi-square directly from the sklearn library, right?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Yes, you can. Since I did not use an sklearn pipeline, I used the stats one.
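
A rough sketch of the sklearn route mentioned here, using `sklearn.feature_selection.chi2`. The toy data frame is an assumption for illustration. Note one difference from the contingency-table test in `scipy.stats`: sklearn's `chi2` expects non-negative numeric features and scores each column separately, so categorical columns are one-hot encoded first and the numbers will not match `chi2_contingency` exactly.

```python
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical toy data: gender is unrelated to Churn, PaperlessBilling is related.
df = pd.DataFrame({
    "gender": ["F", "M"] * 100,
    "PaperlessBilling": ["Yes"] * 120 + ["No"] * 80,
    "Churn": ["Yes"] * 90 + ["No"] * 110,
})

# One-hot encode so every feature is a non-negative 0/1 column.
X = pd.get_dummies(df[["gender", "PaperlessBilling"]]).astype(int)
y = df["Churn"]

scores, p_values = chi2(X, y)
result = pd.DataFrame({"chi2": scores, "p_value": p_values}, index=X.columns)
print(result.sort_values("p_value"))
```

The same function can be passed as `score_func` to `SelectKBest`, which is where it fits naturally into an sklearn pipeline.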

  • @nickw22689
    @nickw22689 3 years ago

    Fantastic video, you just helped me with a major assignment and saved me a lot of stress. I'd buy you a beer if I could!
    How can I access your GitHub repo?

  • @uttamagrahari
    @uttamagrahari 2 years ago

    Thank you sir

  • @junaidasghar8462
    @junaidasghar8462 4 years ago +1

    Can we do chi-square between two categorical variables when there is no target variable (gender and PaperlessBilling), i.e. unsupervised data?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Yes Junaid, you can. It can be any two categorical variables.

    • @shubhamchoudhary5461
      @shubhamchoudhary5461 3 years ago

      @@AIEngineeringLife In that way, can we detect multicollinearity between two categorical features?

  • @madhukerbillapati3944
    @madhukerbillapati3944 4 years ago +1

    Good one!!

  • @SahilSingh-cu7rh
    @SahilSingh-cu7rh 4 years ago +1

    Hello sir,
    What types of projects should we do as a fresher to get a job?
    And to what extent should one know Python?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      You can check my videos on projects. These are sample approaches, and you can try out something similar.
      ua-cam.com/play/PL3N9eeOlCrP7RBbok898Yk0SsUw1O9urP.html
      Learn Python to the extent that you can do data science work. You need a good understanding of the pandas, NumPy, scikit-learn and Matplotlib packages.

    • @SahilSingh-cu7rh
      @SahilSingh-cu7rh 4 years ago

      Will surely watch and work on your recommended approach.
      Thank You

    • @manishsharma2211
      @manishsharma2211 3 years ago

      Epic tut

  • @Vk-gv3sc
    @Vk-gv3sc 4 years ago +1

    What if I have a dataset with multiple data? Should I change it to 1NF? How can I do it in Python? Any resources, please?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Vijay, can you elaborate? The dataset I have shown in the video has multiple data. Typically, during the data analysis phase, we first test each individual column against the target.
      Instead of doing this column by column manually, we can create functions and iterate through multiple columns.
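
Iterating over columns with a function, as described in this reply, might look like the sketch below. The helper name `chi_square_by_column`, the `alpha=0.05` threshold and the toy data are assumptions for illustration, not the code from the video.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi_square_by_column(df, target, columns=None, alpha=0.05):
    """Run a chi-square test of independence for each column against the target."""
    columns = columns or [c for c in df.columns if c != target]
    rows = []
    for col in columns:
        table = pd.crosstab(df[col], df[target])   # contingency table
        chi2, p, dof, _ = chi2_contingency(table)
        rows.append({"feature": col, "chi2": chi2, "p_value": p,
                     "dof": dof, "significant": p < alpha})
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)

# Hypothetical toy data: gender is unrelated to Churn, PaperlessBilling is related.
df = pd.DataFrame({
    "gender": ["F", "M"] * 100,
    "PaperlessBilling": ["Yes"] * 120 + ["No"] * 80,
    "Churn": ["Yes"] * 90 + ["No"] * 110,
})

summary = chi_square_by_column(df, target="Churn")
print(summary)
```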

  • @rushikeshbulbule8120
    @rushikeshbulbule8120 4 years ago +1

    Nice ✌

  • @poojashah5095
    @poojashah5095 3 years ago +1

    Hello sir,
    Thank you for posting this video.
    But I have some doubts regarding the chi-square test.
    Is it possible to use it on a numerical dataset? My dataset is numerical, not categorical.
    I'm working on a lung cancer dataset in which all the data is numerical.
    Can you please post a video on selecting the best features using the chi-square test for numerical data?
    It would be a great help if you could do that and explain.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago +1

      The chi-square test is for categorical data, but if the data is numeric, won't Pearson or Spearman correlation work? Or you can use another feature elimination method, such as forward selection.

    • @poojashah5095
      @poojashah5095 3 years ago

      @@AIEngineeringLife So the chi-square test is not possible for numerical data? But at the beginning of your video you said that the next video would show how to use the chi-square test on a numerical dataset...

    • @poojashah5095
      @poojashah5095 3 years ago

      @@AIEngineeringLife Is it not even possible to use it for continuous data?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      @@poojashah5095 What you can do is bucket the numerical data and run chi-square. This is for continuous data where bucketing makes sense, like age or salary buckets. If I said chi-square for purely continuous data then I made a mistake, but I do have a video for continuous data using regular correlation.
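
Bucketing a numeric column and then running chi-square, as suggested above, can be sketched as follows. The simulated `age` and `Churn` data and the bucket edges are assumptions picked for illustration; in practice the edges should come from what makes sense for the variable.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)

# Hypothetical data: ages 18..79, with churn made more likely for older customers.
age = rng.integers(18, 80, size=500)
churn = np.where(rng.random(500) < (age - 18) / 80, "Yes", "No")
df = pd.DataFrame({"age": age, "Churn": churn})

# Turn the continuous column into ordered buckets (right edge inclusive).
df["age_bucket"] = pd.cut(df["age"], bins=[17, 30, 45, 60, 80],
                          labels=["18-30", "31-45", "46-60", "61-79"])

table = pd.crosstab(df["age_bucket"], df["Churn"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3g}")
```

The result depends on where the bucket edges are placed, which is one reason correlation-based methods are usually preferred for purely continuous data.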

    • @poojashah5095
      @poojashah5095 3 years ago

      @@AIEngineeringLife Can you please provide that video link?

  • @erinwolf1563
    @erinwolf1563 4 years ago +1

    Thank you😊😊

  • @simransharma1070
    @simransharma1070 3 years ago

    How do we know which variables in the given data should be compared using the chi-square test?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      You can compare all categorical variables if you do not have much business background, or compare every independent variable with the dependent variable.

  • @subhadipghosh8194
    @subhadipghosh8194 3 years ago

    What if a feature has a larger number of categories, say 15-20? What should we use in such cases?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      You can still use it. But if some categories have a very low occurrence, it might not give a correct outcome.
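
One way to spot the low-occurrence problem mentioned here is to inspect the expected frequencies returned by `chi2_contingency`. A common rule of thumb is that expected counts below 5 make the chi-square approximation unreliable. The observed table below is made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are categories of a feature, columns are Churn = No / Yes.
observed = np.array([
    [120, 80],
    [90, 60],
    [3, 1],    # rare category
    [2, 0],    # rare category
])

chi2, p, dof, expected = chi2_contingency(observed)

low_cells = int((expected < 5).sum())     # cells with expected count under 5
share_low = low_cells / expected.size
print(f"{low_cells} of {expected.size} cells have an expected count below 5")
```

When many cells fall below the threshold, merging the rare categories before testing is one option.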

    • @subhadipghosh8194
      @subhadipghosh8194 3 years ago

      @@AIEngineeringLife Thanks for your reply

  • @salvadorrojas7969
    @salvadorrojas7969 3 years ago

    Great explanation

  • @venkateshkatepally6110
    @venkateshkatepally6110 3 years ago

    It would be helpful if the Colab link were shared for all the videos. Thanks.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      You can find repo details of my courses here - github.com/srivatsan88/
      The one you are seeing is part of my applied stats course

  • @GladstonLeon
    @GladstonLeon 3 years ago

    Null hypothesis: there is no relation between the variables.
    At 13:30 we fail to reject the null hypothesis, so the gender column is not significant for the Churn column!
    How is that possible?

  • @swapnanilsharma
    @swapnanilsharma 4 years ago +1

    Why did you choose "no relation" as the null hypothesis? Why isn't the null hypothesis something like: there is some relationship between the two categorical variables?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      That is the way the chi-square test is defined. Each test was framed around some hypothesis when it was devised. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population, but other tests might have a different null hypothesis.
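
For reference, the standard textbook form of the test described in this reply can be written out as below. The symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$ and $n$ are notation introduced here, not taken from the video. "No relationship" is the null hypothesis because it is the only assumption that pins down exact expected counts to compare the data against.

```latex
% H_0: the two categorical variables are independent
% H_1: the two categorical variables are associated
%
% O_{ij}: observed count in row i, column j of the contingency table
% R_i, C_j: row and column totals; n: total number of observations
\[
  E_{ij} = \frac{R_i \, C_j}{n},
  \qquad
  \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
  \qquad
  \text{dof} = (r - 1)(c - 1)
\]
```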

    • @swapnanilsharma
      @swapnanilsharma 4 years ago

      @@AIEngineeringLife Thanks for your quick reply. Suppose that for an observation the p-value is very small, less than the significance level, and the Cramér's V score is also very low (due to the large sample size). What can we conclude from this?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago

      @@swapnanilsharma This can be an input to the feature selection process of your ML model, to see whether this variable is important in modelling the target variable. One thing to keep in mind: this is a statistical test that gives you probabilistic evidence of an association, but you can always override it if you feel the variable is important based on domain understanding.
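
The situation asked about in this thread (tiny p-value, low Cramér's V) can be reproduced with a small experiment. The two tables below are made up: the second is the first scaled by 1000, so the proportions, and therefore the strength of association, are identical. The helper `chi2_and_v` is a name invented for this sketch.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_and_v(table):
    """Return (p-value, uncorrected Cramér's V) for a contingency table."""
    table = np.asarray(table)
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    return p, v

small = np.array([[52, 48],
                  [48, 52]])     # 200 observations, weak association
large = small * 1000             # same proportions, 200,000 observations

p_small, v_small = chi2_and_v(small)
p_large, v_large = chi2_and_v(large)
print(f"small sample: p = {p_small:.3f}, V = {v_small:.3f}")
print(f"large sample: p = {p_large:.3g}, V = {v_large:.3f}")
```

With enough data even a very weak association becomes statistically significant, so a small p-value with a low V points to a real but weak relationship, and the variable may add little predictive value.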

  • @farahalaa2362
    @farahalaa2362 4 years ago +1

    Can you give me the code, please?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago

      It is in my Git repo here - github.com/srivatsan88/UA-camLI/blob/master/statistics/Statistical_Thinking_Feature_Selection_Categorical_Variables.ipynb