Explaining the ANOVA and F-test

  • Published 4 Jan 2025

COMMENTS • 65

  • @johns.7752
    @johns.7752 6 months ago +36

    The law of total variance is what made it make sense for me! None of my classes covered why something called "analysis of variance" would be a hypothesis test for significantly different means.

  • @princeofrain1428
    @princeofrain1428 6 months ago +26

    I wish my statistics classes had gone this deep into ANOVA. Unfortunately, we were limited by time constraints and sort of took for granted why they work. Thank you for providing more background context in a fun and engaging way!

    • @Apuryo
      @Apuryo 6 months ago +1

      At my school, linear models is a two-year course: regression and ANOVA each get their own semester, then we do generalized models and other things.

  • @berjonah110
    @berjonah110 6 months ago +9

    An additional point on using ANOVA in practice: the F-test can only tell you that a difference between the means is present, not which specific groups differ. You have to follow up with a post-hoc test (e.g. Tukey's HSD) to compare specific groups against each other; see the sketch below.
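    A minimal sketch of that workflow in Python (made-up data; assuming scipy and statsmodels are available): run the omnibus F-test first, then follow up with Tukey's HSD to see which pairs of groups actually differ.

      import numpy as np
      from scipy.stats import f_oneway
      from statsmodels.stats.multicomp import pairwise_tukeyhsd

      rng = np.random.default_rng(0)
      # Hypothetical outcomes for three treatment groups (made-up numbers)
      a = rng.normal(10.0, 2.0, 30)
      b = rng.normal(10.5, 2.0, 30)
      c = rng.normal(13.0, 2.0, 30)

      # Omnibus one-way ANOVA: only says "at least one mean differs"
      F, p = f_oneway(a, b, c)
      print(f"F = {F:.2f}, p = {p:.4f}")

      # Tukey's HSD: pairwise comparisons with family-wise error control
      values = np.concatenate([a, b, c])
      groups = np.repeat(["A", "B", "C"], 30)
      print(pairwise_tukeyhsd(values, groups, alpha=0.05))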

  • @lucasortengren3844
    @lucasortengren3844 6 months ago +2

    Immensely underrated channel, 46k subscribers is criminal

  • @maniksethi551
    @maniksethi551 1 month ago

    Almost done with my ANOVA class, and this built my intuition more than the course itself! thanks :)

  • @R.H111
    @R.H111 6 months ago +3

    Hey dude. I'm in high school and I got back my (self-studied) AP Statistics score earlier today. Scored a 5/5. I don't think I could've done it without you lol. tysm.

    • @very-normal
      @very-normal 6 months ago +2

      Great job! I’m sure I only played a small role in that, you’re the one who hustled to learn the material, congratulations!

  • @smoother4740
    @smoother4740 6 months ago +4

    This is the best explanation of the ANOVA I've seen so far. It directly answers why a test of the "equality" of different means is called "ANOVA" (Analysis of Variance). I also liked how you showed its direct connection with the F-statistic using the actual equations. Keep up the good work!

  • @OmneyaZ
    @OmneyaZ 1 month ago

    this is amazing and totally underrated. keep going

  • @perseusgeorgiadis7821
    @perseusgeorgiadis7821 3 days ago

    That was fucking amazing. Why does nobody explain it using the law of total variance? It all clicked now. Thank you!

  • @doentexd4770
    @doentexd4770 6 months ago +2

    Christian, would you consider making a video specifically about multiple regression? I still don't have an intuitive understanding of why the Gauss-Markov assumptions need to hold in order to make inferences, and I think your videos would be a great help, since you're an incredible teacher. Thank you for your work! Keep it up!

    • @samlevey3263
      @samlevey3263 6 months ago

      It's because the assumptions of the Gauss-Markov theorem are used to determine what the standard errors of the coefficient estimators are. So, if those assumptions aren't met, but you still calculate the standard errors in the same way as you would if they were met, then you're going to get incorrect values for the standard errors. Then you use those standard errors to calculate t-statistics and such, so you'll get incorrect values for the t-statistics, and hence incorrect confidence intervals and potentially incorrect results for hypothesis tests.
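      To make this concrete, here is a small sketch (assuming numpy and statsmodels; the data are made up) that fits the same OLS model with the classical standard errors, which rely on homoskedastic errors, and with heteroskedasticity-robust (HC1) standard errors. When the error variance grows with x, the two sets of standard errors diverge, and so would the resulting t-statistics and confidence intervals.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        n = 500
        x = rng.uniform(0, 10, n)
        # Heteroskedastic errors: noise grows with x, violating the Gauss-Markov assumptions
        y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)

        X = sm.add_constant(x)
        classical = sm.OLS(y, X).fit()             # classical SEs assume constant error variance
        robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs

        print("classical SEs:", classical.bse)
        print("robust SEs:   ", robust.bse)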

  • @walterreuther1779
    @walterreuther1779 6 months ago

    Oh, I love that you not only know the term homoskedasticity but also mention it as an assumption we are making!
    Sometimes I ask psychologists what they think of Nassim Taleb's criticism of IQ - that it is too heteroskedastic - and usually their looks give away that they have never learned about heteroskedasticity in their psychometrics lessons... I think this is sad, so all the better that you mention it ;-)

  • @mclovin312
    @mclovin312 5 months ago

    Thanks for continuously producing these videos! Your channel is by far the best explainer on statistics compared to other YouTube channels IMO. I’m curious: what software do you use to create the videos? PowerPoint?

    • @very-normal
      @very-normal 5 months ago +1

      Thanks! I use Final Cut Pro for editing, Figma and Midjourney for graphics and the manim python library for animations

  • @yazer9821
    @yazer9821 6 months ago +3

    can you do a video on GLMs please!! Your videos are great

  • @yorailevi6747
    @yorailevi6747 5 months ago

    I want to mention that I am currently taking a parametric stats course, so I understand the vids about it better!

  • @Apuryo
    @Apuryo 6 months ago +5

    what's crazy is that my stat inference midterm is literally tomorrow, it's about one way anova 🤣

  • @RomanNumural9
    @RomanNumural9 6 months ago

    I think an important note on this is that the more populations you check, the higher the likelihood that one differs significantly by sheer luck. If instead of 5 cancers you're checking 100, the odds that a statistical fluke will make one mean look further away from the others are fairly high.

    • @very-normal
      @very-normal 6 months ago +1

      Yeah I thought about covering multiplicity here, but it deserves its own video

    • @statswithbrian
      @statswithbrian 6 months ago +3

      This is not true with ANOVA. It has a type I error rate of 5% for finding *any* difference, not for each particular difference. If you had 1 million populations that were all the same, you would still only have an alpha% chance of finding a fluke. This is the advantage of running an ANOVA and not just running a bunch of two-sample pairwise tests.
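      A quick simulation sketch of this point (numpy + scipy, made-up settings): under a global null with many identical groups, the one-way ANOVA rejects in roughly alpha = 5% of simulated datasets, whereas declaring a finding whenever any pairwise t-test is significant rejects far more often.

        import numpy as np
        from itertools import combinations
        from scipy.stats import f_oneway, ttest_ind

        rng = np.random.default_rng(2)
        k, n, n_sims, alpha = 10, 20, 2000, 0.05

        anova_rejects = 0
        pairwise_rejects = 0
        for _ in range(n_sims):
            # All k groups drawn from the same distribution (the global null is true)
            groups = [rng.normal(0, 1, n) for _ in range(k)]
            if f_oneway(*groups).pvalue < alpha:
                anova_rejects += 1
            # "Any pairwise t-test significant" counts as a (false) finding
            if any(ttest_ind(g1, g2).pvalue < alpha for g1, g2 in combinations(groups, 2)):
                pairwise_rejects += 1

        print("ANOVA false-positive rate:       ", anova_rejects / n_sims)     # close to 0.05
        print("Any-pairwise false-positive rate:", pairwise_rejects / n_sims)  # much higher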

  • @GeoffryGifari
    @GeoffryGifari 6 months ago +1

    Hmmm what if 5 out of 6 drug-organ pairs see success in cancer treatment? (1 mean singled out from the group, but not what we expect)
    Or if the group means are clustered, split in half (pairs 1,2,3 have the same mean, so do pairs 4,5,6)?

    • @very-normal
      @very-normal 6 months ago +2

      You’d have a similar conclusion. The ANOVA is only detecting that at least one of them is different, so if that’s the case, there should be some compelling evidence to reject the null hypothesis. But to actually figure out *which* one is different, you’d need to follow up with secondary testing for each of the means

  • @abcpsc
    @abcpsc 5 months ago

    At 9:22, why are they Chi square distributed?

    • @very-normal
      @very-normal 5 months ago

      It comes from the distribution assumption on the residuals.
      The residuals were assumed to be normally distributed with some common variance, sigma^2. If you divide the sum of squares by sigma^2, you get a sum of squared standard normal variables, which follows a chi-squared distribution. This applies to both the numerator and denominator of the F-statistic (spelled out below).
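      Spelling that out in the usual notation (assuming the standard one-way ANOVA setup with k groups, N observations in total, and normal residuals with common variance sigma^2), under the null hypothesis:

        \frac{SS_{\text{between}}}{\sigma^2} \sim \chi^2_{k-1},
        \qquad
        \frac{SS_{\text{within}}}{\sigma^2} \sim \chi^2_{N-k},
        \qquad \text{independently, so} \qquad
        F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)} \sim F_{k-1,\,N-k}.

      The unknown sigma^2 cancels in the ratio, which is why the F-statistic can be computed without knowing it.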

  • @AnkhArcRod
    @AnkhArcRod 5 months ago +1

    @Very Normal What textbook would you suggest for this content?

    • @very-normal
      @very-normal 5 months ago +1

      Rosner’s Fundamentals of Biostatistics (7th ed) is a good source with a solutions manual that can also easily be found online

    • @AnkhArcRod
      @AnkhArcRod 5 months ago

      @@very-normal Thanks! And I must say that you are an excellent teacher.

  • @1.4142
    @1.4142 6 months ago

    Wow I was just working on this exact scenario

  • @Iachlan
    @Iachlan 5 months ago

    In the one-sample t-test, we take the alpha error to be constant and play around with the beta error. Could we do it the other way around? What would the implications be?

    • @very-normal
      @very-normal 5 months ago

      you could, but most of the time we’re interested in detecting a significant effect, so power is the thing we want to maximize. There’s a trade off between reducing type-I error and power, so we choose to keep alpha constant to signify we tolerate a defined probability of making a wrong decision about rejecting the null
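      One way to see the trade-off is to compute one-way ANOVA power directly from the noncentral F distribution; here is a minimal sketch with scipy, using a made-up effect size and sample size:

        from scipy.stats import f, ncf

        k, n_per_group = 3, 30              # hypothetical design
        effect_size = 0.25                  # Cohen's f, assumed for illustration
        N = k * n_per_group
        df1, df2 = k - 1, N - k
        nc = effect_size**2 * N             # noncentrality parameter under the alternative

        for alpha in (0.10, 0.05, 0.01):
            f_crit = f.ppf(1 - alpha, df1, df2)        # rejection threshold
            power = 1 - ncf.cdf(f_crit, df1, df2, nc)  # P(reject | effect present)
            print(f"alpha = {alpha:.2f} -> power = {power:.3f}")

      Shrinking alpha pushes the rejection threshold up, so power drops; that is the trade-off described above.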

  • @AJ-tr4jx
    @AJ-tr4jx 5 months ago

    what if the drug has an effect on all the test groups and the means for all the groups are shifted by the same amount?

    • @very-normal
      @very-normal 5 months ago +1

      You’d prolly get a null result. If you shift all the distributions by the same amount, there wouldn’t be a change in the variance in group means

  • @walterreuther1779
    @walterreuther1779 6 months ago +1

    Question: What to do when the assumption of homogeneity of variance is not met, i.e. there are different variances in the different populations?
    I would think this is a rather major assumption, especially if the sample size is small, as that would make heterogeneity of variance harder to test...
    Shouldn't one always, in some form, test for heterogeneity of variance? Is this done in practice?
    Edit: Sorry, I originally wrote homoskedasticity and heteroskedasticity, but I meant homogeneity and heterogeneity of variance. (The former assumes constant variance across the regressor variables, while the latter refers to the same variance across different sub-populations.)

    • @zaydmohammed6805
      @zaydmohammed6805 6 months ago

      Same question here. In regression, I remember them teaching us that you can rescale the data by the different variances in the presence of heteroscedasticity. I wonder if that would work here, or whether we have to do some sort of non-parametric test.

    • @very-normal
      @very-normal 6 months ago +1

      Yeah, common variance is a pretty strong assumption to make. One solution I know of is a variant of the ANOVA called Welch’s ANOVA that can be used when you don’t want to make this assumption.
      It’s from the same guy behind Welch’s t-test, the version that students learn for two-sample problems when they also can’t assume common variance.

    • @walterreuther1779
      @walterreuther1779 6 months ago

      @@very-normal Thank you, that's great to know. It seems like Welch's ANOVA is really the way to go, both for small sample sizes and when you know nothing about the data. (Apparently, it is almost as powerful as the standard ANOVA even when homogeneity of variance holds, so...)
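      For anyone who wants to try it, here is a minimal hand-rolled sketch of Welch's F statistic as it is usually presented (numpy + scipy, made-up unequal-variance groups); packages such as statsmodels and pingouin also ship implementations.

        import numpy as np
        from scipy.stats import f

        def welch_anova(*samples):
            """Welch's one-way ANOVA for groups with unequal variances."""
            k = len(samples)
            n = np.array([len(s) for s in samples])
            means = np.array([np.mean(s) for s in samples])
            variances = np.array([np.var(s, ddof=1) for s in samples])

            w = n / variances                               # precision weights
            grand_mean = np.sum(w * means) / np.sum(w)
            num = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
            lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
            den = 1 + 2 * (k - 2) * lam / (k**2 - 1)

            F_stat = num / den
            df1, df2 = k - 1, (k**2 - 1) / (3 * lam)
            return F_stat, f.sf(F_stat, df1, df2)           # statistic and p-value

        rng = np.random.default_rng(3)
        groups = [rng.normal(0, 1, 25), rng.normal(0, 3, 25), rng.normal(1, 5, 25)]
        print(welch_anova(*groups))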

  • @Iachlan
    @Iachlan 5 months ago

    Can you explain the statistics behind weather prediction

    • @very-normal
      @very-normal 5 months ago

      I’m not very well versed in it, but it sounds like it’d be a fancy, high-dimensional regression model.

  • @stanleystanleystanley7246
    @stanleystanleystanley7246 2 months ago

    If the goal is to find out whether the drug is potentially useful, whether all the mu's are the same or not doesn't really tell you anything. The drug could be equally useful for all 5 illnesses or unequally useful.

    • @very-normal
      @very-normal 2 months ago

      It depends on what mu represents. If you define mu to be some baseline value or rate, then if something is statistically different from this baseline, then it could merit further investigation in a larger experiment.

  • @dullyvampir83
    @dullyvampir83 6 months ago

    If the residuals are normally distributed, isn't the original data normally distributed as well? Isn't it just shifted by the mean?

    • @very-normal
      @very-normal 6 months ago +2

      You’re right, I just wanted to emphasize that the main assumption is on the residuals. It implies that the outcome is normally distributed, but it’s more of a consequence of the fact that the residuals are normally distributed, rather than an assumption of the model

  • @Imperial_Squid
    @Imperial_Squid 6 months ago +1

    Could you explain a bit further about the "residuals are normally distributed not that the variable is normally distributed itself" thing? This is one of the things that trips me up most often..

    • @very-normal
      @very-normal 6 months ago +1

      Yeah for sure, I’ll try my best. This is partially my opinion, so just a heads up.
      My feeling is that assuming something about the data itself is much stronger than assuming something about the residuals. Very rarely will real-world data follow nice distributions like the Normal, so it’s harder to convince people (read: the statistical referee) that this will hold up.
      On the other hand, assuming something about the residuals is not so bad. It’s like saying: we know there’s an average outcome and people will differ from this average, but they won’t differ too badly from it. In other words, outlier residuals are very rare. It’s confusing because this residual assumption implies that the outcome is also normally distributed in this model, but it’s important to note that it’s the residual assumption we make.
      It’s also important because with stuff like linear regression, we’re looking at how different values of the predictor (i.e. cancer group) shift the distribution of the outcome. If you put the distributional assumption on the outcome itself, it gets more complicated to work in how other variables influence it. Putting the distribution on the residuals doesn’t come with this baggage.
      Some people are taught that they should try to transform the outcome so that it “works better” with linear regression or ANOVA. Even though you’re manipulating the outcome, the hope is that this transformation makes the residuals look more normal.
      I hope this helps clarify somewhat. If anyone else sees this and thinks I left something out, please chip in. This is a common question, but even I don’t feel like I get all the nuances.

    • @Imperial_Squid
      @Imperial_Squid 6 months ago +2

      @@very-normal "it's confusing because this residual assumption implies the outcome is also normally distributed in this" yeah that's the bit that always tripped me up, like I get that you can make one or other the core assumption and build it up from there (it's like picking your axioms in pure maths or something), but in my head the fact that the kinda nebulous residuals assumption implies the much more intuitive distribution assumption meant that I was often fighting between intuition and logic in terms of thinking it through. It also doesn't help that thinking of an example where the residuals are Normal but the distribution _isn't_ is much harder...
      So it's more about being an assumption of convenience in that it makes the maths much nicer to deal with and is also a weaker and more generalisable assumption, rather than it being anything else like purity or tradition or something.
      Thanks, I think I get it now! Though no doubt this will be one of those weird bits that'll always feel a little bit of, I feel like I have a much better grasp of the rationale! Much appreciated!
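      To make the "residuals Normal but the outcome's marginal distribution isn't" case concrete, here is a small sketch (numpy + scipy, made-up groups): each observation's residual around its group mean is exactly Normal, yet the outcome pooled across groups is bimodal and fails a normality test.

        import numpy as np
        from scipy.stats import shapiro

        rng = np.random.default_rng(4)
        # Two groups with very different means but identical Normal residual noise
        group = np.repeat([0, 1], 200)
        means = np.where(group == 0, 0.0, 8.0)
        y = means + rng.normal(0, 1, group.size)   # observed outcome
        residuals = y - means                      # residuals around each group mean

        # Residuals look Normal; the pooled outcome is bimodal and does not
        print("Shapiro p-value, residuals:", shapiro(residuals).pvalue)  # large
        print("Shapiro p-value, outcome:  ", shapiro(y).pvalue)          # essentially zero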

  • @bcs1793
    @bcs1793 4 months ago

    At 9:17, shouldn't Y_i be \mu_j? Or \mu_i, depending on what you are summing over

    • @very-normal
      @very-normal 4 months ago

      My notation was a little sloppy here... I think you are right. The denominator is supposed to be the variance of the residuals, but my sum doesn't look like it there. Thanks for catching that

  • @chillphil967
    @chillphil967 6 months ago +2

    1:19 is there heart cancer? i thought no, since the cells are from birth. cool video either way, thx!

    • @very-normal
      @very-normal 6 months ago +4

      I saw it was really rare, but deep down, I was just looking for an emoji to represent the group lol 😅

  • @jasondads9509
    @jasondads9509 6 months ago

    ANOVA did my head in during stats.

  • @glebpl7068
    @glebpl7068 3 months ago

    I will be counting the days until you do videos about 2-factor ANOVA and then ANCOVA, and then :-) a special video explaining what the difference actually is,
    because I'm a dense m****f**** and I don't get it... thank you)))

    • @very-normal
      @very-normal 3 months ago

      not gonna lie, you’ll be counting a lot of days friend. But I can explain a bit
      The rationale behind two way anova is almost exactly like the one-way anova. As its name suggests, one-way anova looks at a single categorical variable, two-way looks at two. The “groups” in two-way anova include not just the main categories (i.e being in treatment A vs not, or being in treatment B vs not), but also considers interactions as their own groups as well (i.e. someone being on both treatment A and B).
      As for ANCOVA, I’ve never dealt with it before myself lol, so I can’t comment on it here.
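      A minimal sketch of a two-way ANOVA with an interaction in Python (statsmodels formula API; the column names and effect sizes are made up for illustration):

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(5)
        n = 200
        df = pd.DataFrame({
            "treatment_a": rng.choice(["yes", "no"], n),
            "treatment_b": rng.choice(["yes", "no"], n),
        })
        # Made-up outcome with two main effects and an interaction
        df["y"] = (
            (df["treatment_a"] == "yes") * 1.0
            + (df["treatment_b"] == "yes") * 0.5
            + ((df["treatment_a"] == "yes") & (df["treatment_b"] == "yes")) * 0.8
            + rng.normal(0, 1, n)
        )

        # C(a) * C(b) expands to both main effects plus their interaction
        model = smf.ols("y ~ C(treatment_a) * C(treatment_b)", data=df).fit()
        print(sm.stats.anova_lm(model, typ=2))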

  • @chillphil967
    @chillphil967 6 months ago

    🎉

  • @dibyajyotisaikia11
    @dibyajyotisaikia11 5 months ago

    I think the example is incorrect: if the new drug is effective on the different types of cancer, the ANOVA may still come out statistically non-significant in spite of the drug being effective, leading to a wrong conclusion and a loss for the company 😂

    • @very-normal
      @very-normal 5 months ago +2

      that’s all hypothesis tests tho lol

    • @dibyajyotisaikia11
      @dibyajyotisaikia11 5 months ago

      @@very-normal I meant you need at least one more group, a standard or a control, to come to any conclusion regarding efficacy.

  • @synchro-dentally1965
    @synchro-dentally1965 6 months ago +1

    I heard recently that Fisher was great at stats but not the best in moral and ethical character.

    • @very-normal
      @very-normal 6 months ago +1

      yeahhh he had some L opinions with smoking and eugenics

  • @vegetableball
    @vegetableball 5 months ago

    Wait... You spend most of the time on the ANOVA test and then make an unrelated simulation. Could you make a simulation that looks more like the cancers-and-drugs problem we were looking at?

    • @very-normal
      @very-normal 5 months ago

      i don’t have access to data like that, so a simulation from a particular situation was the next best thing lol

  • @femboymadara
    @femboymadara 5 months ago

    ur the goat