Explaining the Chi-squared test

  • Published Feb 5, 2025
  • A brief explainer on the chi-squared test of independence, a classic hypothesis test for binary data.
    OTHER CHANNEL LINKS
    🗞️ Substack: verynormal.sub...
    🏪 My digital products: very-normal.se...
    ☕ Buy me a Ko-fi: ko-fi.com/very...

COMMENTS • 38

  • @pfizerpflanze 5 months ago +30

    For those wondering: for bigger contingency tables with I rows and J columns, the degrees of freedom for the same test statistic should be (I-1)×(J-1).
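
For reference, the statistic and its degrees of freedom for a general I×J table can be written as below. The symbols O_ij and E_ij (observed and expected counts in row i, column j) are notation assumed here for illustration, not taken from the video.

```latex
\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\mathrm{df} = (I - 1)(J - 1)
```

For a 2×2 table this gives df = (2-1)×(2-1) = 1: once the row and column totals are fixed, choosing one cell determines the other three.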

  • @komethtauch5151 5 months ago +5

    you have no idea how long I've waited for this

  • @psl_schaefer 3 months ago +1

    This was such a clear and nice video! Keep it going!

  • @bhomiktakhar8226 1 month ago

    That's explained in a much better way than most colleges in my country.

  • @IUT-e8x 5 months ago

    One of my favorite channels thanks a lot.

  • @HesderOleh 5 months ago +6

    I liked this video. I just would have felt very confused by what "expected" means in this context had I not already known it. I think a good place to quickly explain it would have been when you were explaining why a 2x2 table has one degree of freedom. On the other hand, it might push someone who is still confused to research it more themselves. On the other other hand, I am not sure where they should go to find something like that out, as most math resources are not easy for a beginner to approach.

  • @fg786 5 months ago +18

    Wake up babe. Very Normal uploaded a new video!

  • @sotirisbekiaris3580 5 months ago +1

    Awesome content! You should definitely do a video about survival analysis

    • @very-normal 5 months ago +1

      Thanks! I do have a small bit of survival in another video about the “biggest award in statistics” but it’s definitely worth its own video

  • @mark110292 5 months ago +1

    Lost from 8:28 onward; until then, fantastic, especially with Claude for remediation.

  • @spaceotter6218 4 months ago

    Good video. I was just confused by the expected table numbers; I thought that to calculate this by hand, any of the tables you displayed would do. I ended up learning to multiply the margins and apply Yates's correction, and that was enough to replicate the result you got from R.

  • @tr0wb3d3r5 5 months ago

    Great vid, thanks a ton!🏆

  • @bilal_ali 5 months ago +5

    Just one question
    At the end when the p value is less than 5% we fail to reject the null hypothesis.
    Means our drug is not effective.
    Right?

    • @very-normal 5 months ago +8

      😔 yeah you’re right, my company is going to need to fictionally downsize

    • @pfizerpflanze 5 months ago +4

      The p-value is actually 15%, namely greater than 5%.
      And yes, we don't reject the null.

  • @axscs1178 5 months ago

    It would've been great if you had shown how the expected frequencies under the independence assumption are calculated.
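
A minimal sketch of that calculation, assuming the usual rule that each expected count is (row total × column total) / grand total. The counts below are made up for illustration and are not the video's data; the function name `expected_counts` is hypothetical.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table (made-up counts):
# rows = treatment / placebo, columns = responded / did not respond
observed = [[30, 20],
            [20, 30]]

expected = expected_counts(observed)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

The same function works for any I×J table, since it only relies on the row and column margins.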

  • @psl_schaefer 3 months ago

    But how does that actually compare to using logistic regression to estimate the log odds of success given the group label (which we can estimate by MLE, MAP, or even in a fully Bayesian way)?

    • @very-normal 3 months ago

      The chi-squared test/proportion test looks directly at the response rate, which you can in theory transform into odds or log-odds. The reverse can be said of logistic regression, but there we also get the added benefit of adjusting for additional variables beyond a particular covariate of interest.
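
To make the rate-to-odds transformation in this reply concrete, here is a small sketch. The response rates 0.6 and 0.4 are hypothetical values chosen for illustration, and the helper names are not from the video.

```python
import math


def odds(p):
    """Convert a response rate (a probability) to odds."""
    return p / (1 - p)


def log_odds(p):
    """Convert a response rate to log-odds (the logit)."""
    return math.log(odds(p))


# Hypothetical response rates, chosen only for illustration
rate_treatment = 0.6
rate_control = 0.4

# The difference in log-odds is the log odds ratio between the two groups
log_odds_ratio = log_odds(rate_treatment) - log_odds(rate_control)
print(round(log_odds_ratio, 4))
```

When a logistic regression has a single binary group indicator as its only predictor, the maximum likelihood estimate of that coefficient equals this sample log odds ratio, so both approaches describe the same contrast on different scales.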

  • @AM-kp3mv 2 months ago

    I was confused because calculating the statistic gave me a different value than in the video. Reading about this, I think it's because of the Yates Correction. Is it necessary in this case as the frequencies are >5? The result changes by quite a bit. Thanks for your videos.

    • @very-normal 2 months ago

      Yes, the different value is because of the correction. The correction helps when the counts are small, but doesn’t change much if the size is large. When you say the result changes, which result are you referring to?

    • @AM-kp3mv 2 months ago

      @very-normal I was referring to the p-value (0.153 vs 0.116). I was thinking of a scenario where using the correction changes the conclusion of the analysis for a given alpha, and what to do then.

    • @very-normal 2 months ago +1

      Ah okay I get what you mean
      Let’s say that the significance level is 0.05. And hypothetically, doing it without correction produces a p-value of 0.04, and with it produces a p-value of 0.06.
      In an ideal world, an author with this result will have declared everything they plan to do before the analysis is done, including whether or not they apply the correction. If you have stated in the protocol or paper that you will use the correction, you should report the p-value that came with the correction. Choosing the more favorable p-value while ignoring this plan constitutes p-hacking and will get a paper retracted if it is found out.
      It’s important to note that p-value thresholds are arbitrary; we just choose them to be low according to our needs. If a p-value is on the borderline, then a good reviewing statistician will note that it almost didn’t fall below the threshold and might not allow the paper to be published, or might force the author to note the borderline non-significance of the result. The 5% threshold is often viewed as a “magic” number to get below, but to a statistician a value just above or just below it describes pretty much the same situation: the probability that the data (via the test statistic) would look like that, assuming the null hypothesis is true, is low.
      Of course, that’s in an ideal world. Unfortunately, other interests can get in the way and allow falsely optimistic results to be published. Statistics is a nuanced field in a world that wants black and white results
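
The effect of the correction discussed in this thread can be sketched in a few lines. This is a minimal illustration for a 2×2 table, assuming the common form of Yates's correction (shrink each |observed - expected| by 0.5, but not below zero) and using the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal. The counts are made up and are not the video's data; `chi2_2x2` is a hypothetical helper name.

```python
import math


def chi2_2x2(observed, yates=True):
    """Chi-squared statistic and p-value (df = 1) for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink the gap by 0.5, never below 0
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    # For df = 1, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, chosen so the two p-values straddle 0.05
observed = [[30, 20],
            [20, 30]]

stat_raw, p_raw = chi2_2x2(observed, yates=False)
stat_corr, p_corr = chi2_2x2(observed, yates=True)
print(f"uncorrected: stat={stat_raw:.2f}, p={p_raw:.4f}")
print(f"corrected:   stat={stat_corr:.2f}, p={p_corr:.4f}")
```

With these made-up counts the uncorrected p-value falls just under 0.05 and the corrected one just over it, which is exactly the borderline situation described above and why the choice should be fixed before looking at the data.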

  • @Matthew-eb3di 5 months ago +1

    😩😩😩 10/10 training without even having to apply to the job

  • @braineaterzombie3981 5 months ago

    Thanks!

  • @yahlimelnik4483 4 months ago

    Damn dude, what is the frequency of you hitting the gym? Your arms are BIG

  • @fibonacci112358steve 5 months ago +3

    This is a great video, but I'd like to make two comments for everybody:
    - "Chi Squared test" is an awful name, because there are many, many different statistical tests that have Chi-squared(n) as their null distribution. As a group, let's all try to phase out the use of this terminology.
    - The test presented in this video is increasingly replaced by the G-test. The test statistic in this video is an asymptotic approximation of the G-test statistic. The asymptotic distribution of the G-test is Chi-squared (which comes back to the first point).
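
As a rough illustration of how close the two statistics mentioned in this comment can be, the sketch below computes both on the same table, assuming the usual definitions (Pearson: sum of (O - E)² / E; G: 2 × sum of O × ln(O / E)). The counts are made up and the function name is hypothetical.

```python
import math


def pearson_and_g(observed):
    """Return (Pearson chi-squared, G statistic) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    pearson = 0.0
    g = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            pearson += (o - e) ** 2 / e
            if o > 0:  # a zero cell contributes nothing to G
                g += 2 * o * math.log(o / e)
    return pearson, g


# Hypothetical counts, for illustration only
observed = [[30, 20],
            [20, 30]]

pearson, g = pearson_and_g(observed)
print(f"Pearson = {pearson:.3f}, G = {g:.3f}")
```

Both statistics are compared against the same chi-squared reference distribution, and on a table like this one they differ only slightly.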

  • @sajanator3 5 months ago

    How do you choose between using 2 sampled t-test and chi squared test?
    Are there any examples where one would be suitable and one wouldn't?

    • @very-normal 5 months ago +1

      I think you mean the two-sample proportion test, the t-test is technically for continuous outcomes.
      The chi-squared test (in this video) is actually equivalent to the two sample proportion test, assuming everything I did in the video. The conclusions would be the same, no matter which you use.
      If you run the proportion test in R, you’ll actually see it uses the chi-squared test to calculate a p-value.
      You would want to use something else if your sample size is small or your observations aren’t independent. A usual substitute is Fisher’s exact test for small sample sizes. For paired data, there’s also McNemar’s test.

    • @sajanator3 5 months ago

      @very-normal Sorry, yes, I did mean the 2 sample proportion test. Thank you for the reply.
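
The equivalence described in this thread can be checked numerically. The sketch below assumes the pooled two-sample z-test for proportions and the chi-squared statistic without continuity correction; under those assumptions the squared z-statistic matches the chi-squared statistic. The counts are made up and the function names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_chi2(observed):
    """Pearson chi-squared statistic, no continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )


# Hypothetical data: 30 of 50 responders in one group, 20 of 50 in the other
z = two_proportion_z(30, 50, 20, 50)
chi2 = pearson_chi2([[30, 20], [20, 30]])
print(f"z^2 = {z ** 2:.4f}, chi-squared = {chi2:.4f}")
```

Because the two statistics agree, the two-sided p-values agree as well, as long as neither test applies a continuity correction or both do.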

  • @Abhishek-bz5is 5 months ago

    best youtuber

  • @duckymomo7935 5 months ago

    What about chi Square goodness of fit

    • @very-normal 5 months ago

      it kinda follows the same logic. The null hypothesis is that your data comes from some specific distribution. Your data would actually be a contingency table with one row because a goodness of fit test looks at whether or not your data conceivably comes from a given distribution. Based on this specific distribution, you can calculate expected counts. From there you calculate the statistic in the same way.
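
A minimal sketch of the goodness-of-fit calculation described in this reply, assuming a fully specified null distribution so that the degrees of freedom are (number of categories - 1). The die-roll counts are made up and the function name is hypothetical.

```python
def gof_statistic(observed, null_probs):
    """Chi-squared goodness-of-fit statistic and degrees of freedom."""
    n = sum(observed)
    # Expected counts come straight from the hypothesized distribution
    expected = [n * p for p in null_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


# Hypothetical data: 60 rolls of a die, tested against a fair die
observed = [8, 12, 10, 10, 9, 11]
null_probs = [1 / 6] * 6

stat, df = gof_statistic(observed, null_probs)
print(f"statistic = {stat:.3f}, df = {df}")
```

The statistic is then compared against a chi-squared distribution with `df` degrees of freedom, just as in the test of independence.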

  • @pipertripp 5 months ago

    Shit just got real.

  • @AER9095 5 months ago +2

    I came here for the math. Disappoint.