Using Linear Models for t tests and ANOVA, Clearly Explained!!!

  • Published 22 Jul 2024
  • This StatQuest shows how the methods used to determine if a linear regression is statistically significant (covered in part 1) can be applied to t-tests and ANOVA. It also introduces the concept of a "design matrix".
    If you'd like to support StatQuest, please consider...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...buy my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
    statquest.org/statquest-store/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    Correction:
    7:40 There should be parentheses around the SS differences in the F-statistics to have correct equations; (SS(mean)-SS(fit))/(p_fit-p_mean)
    #StatQuest
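
The corrected F-statistic from the 7:40 note can be sketched in a few lines of Python. This is only an illustration: the numerator follows the correction above, the denominator SS(fit)/(n - p_fit) is the usual form of the F-statistic rather than something stated in the correction, and the function name and all numbers are hypothetical.

```python
def f_statistic(ss_mean, ss_fit, p_mean, p_fit, n):
    """F = ((SS(mean) - SS(fit)) / (p_fit - p_mean)) / (SS(fit) / (n - p_fit))."""
    # Parentheses around the SS difference, as in the correction.
    numerator = (ss_mean - ss_fit) / (p_fit - p_mean)
    # Variation left over after the fit, per remaining degree of freedom.
    denominator = ss_fit / (n - p_fit)
    return numerator / denominator


# Hypothetical values, chosen only to make the arithmetic easy to follow.
example_f = f_statistic(ss_mean=20.0, ss_fit=8.0, p_mean=1, p_fit=2, n=10)
print(example_f)
```

Without the parentheses, the division would bind only to SS(fit), which gives a different (and wrong) number.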

COMMENTS • 47

  • @statquest
    @statquest  1 year ago +4

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @newtonfamily2274
    @newtonfamily2274 1 year ago +17

    I have to say that I really appreciate the fact that this video is bringing me back to 2009 YouTube while teaching me stats. Thanks!

  • @flaetus217
    @flaetus217 1 year ago +4

    I cannot express how grateful I am for such wonderful videos!

  • @user-ob3gy3zo6y
    @user-ob3gy3zo6y 1 year ago +1

    Thanks so much for posting this! We are going into two way ANOVA, I hope this helps

  • @natiajavakhishvili8517
    @natiajavakhishvili8517 1 year ago +1

    wow! this is really good! we need more for means and effects parametrization!

  • @za7607
    @za7607 1 year ago +1

    Thanks for understanding non native speakers and your clear explanation

  • @hameddadgour
    @hameddadgour 3 months ago +1

    Great video. Thank you for sharing!

  • @2460z_htdja
    @2460z_htdja 2 months ago

    I am really grateful for having found this site. I just want a simple suggestion as to what type of statistics (Anova, chi-square, etc.) I should use to determine my goal below. I am truly confused yet optimistic for someone generous out there, and I would greatly appreciate any additional comments or suggestions to clarify or simplify my statement or claim here. Thank you in advance.
    "Among the randomly selected senior students in the eight (4 public and 4 private) US south-west states (California, Arizona, New Mexico, and Texas), their responses are unanimous (strongly agree, agree, neutral, disagree, strongly disagree) as regards participating in home gardening rather than school gardening."

  • @camillesylvain804
    @camillesylvain804 1 year ago +4

    Hii, could you do videos on the different GLM (Poisson, binomial,log, Tweedie) and also the link function and some R examples plsss? I loveee your video🙌🏼

    • @statquest
      @statquest  1 year ago +2

      I'll keep those topics in mind.

  • @saude-5online422
    @saude-5online422 1 year ago +1

    First, thank you for your awesome videos! Is there any advantage of General Linear Model over a simple ANOVA, for example?

    • @statquest
      @statquest  1 year ago +3

      ANOVA is just a specific type of General Linear Model.

  • @ryanmckenna2047
    @ryanmckenna2047 7 months ago +1

    Here the p-value for the mean of the data is equal to the number of parameters for the equation of the mean and the p-value for the fit of the data is equal to the number of parameters for the fit of the data.
    How does this relate to the typical notion of p-values (probability we should not reject null hypothesis)?
    Also, once F has been computed, is F itself a p-value or a value that we need to compute a p-value?

    • @statquest
      @statquest  7 months ago

      You might want to start with the first video in this series (this is "part 2") because that will answer your questions about how the p-values are computed and what they mean: ua-cam.com/video/nk2CQITm_eo/v-deo.html

  • @rahulbahadur1
    @rahulbahadur1 10 months ago +1

    Thanks

    • @statquest
      @statquest  10 months ago

      Hooray! Thank you so much for supporting StatQuest! TRIPLE BAM! :)

  • @SNAKE1375
    @SNAKE1375 1 year ago

    Hello Josh, it's me again....I had a question concerning the Anova. I still don't understand why we have to use an Anova when we have to compare a reference condition to several drug treatments A-B-C done in the same experiment. For instance, I am only interested in the effect of drug A or drug B compared to a control condition. So I only need a T-test. Why should I have to do an Anova?

    • @statquest
      @statquest  1 year ago +3

      An ANOVA is just a generalized t-test. Technically, you can do an ANOVA with just 2 conditions, Drug A vs a control, and you'll get the exact same results as a t-test (which is why people call it a t-test).
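
The equivalence described in this reply can be checked numerically. The sketch below assumes the equal-variance (pooled) form of the t-test and uses made-up measurements for a control group and a hypothetical "Drug A" group; with two conditions, the ANOVA F-statistic comes out as the square of the t-statistic.

```python
from statistics import mean

# Hypothetical measurements, invented for illustration only.
control = [2.0, 2.5, 1.8, 2.4, 2.3]
drug_a = [3.1, 3.9, 3.4, 3.8, 3.6]


def sum_sq(values, center):
    """Sum of squared differences between each value and a center."""
    return sum((v - center) ** 2 for v in values)


n1, n2 = len(control), len(drug_a)
m1, m2 = mean(control), mean(drug_a)

# Pooled-variance (Student's) t-statistic.
ss_within = sum_sq(control, m1) + sum_sq(drug_a, m2)
pooled_var = ss_within / (n1 + n2 - 2)
t_stat = (m2 - m1) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

# One-way ANOVA F-statistic for the same two groups.
everything = control + drug_a
ss_mean = sum_sq(everything, mean(everything))  # around the overall mean
ss_fit = ss_within                              # around each group's own mean
p_mean, p_fit = 1, 2
f_stat = ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n1 + n2 - p_fit))

print(t_stat ** 2, f_stat)  # the two numbers agree up to rounding
```

Because F with (1, n - 2) degrees of freedom is the distribution of a squared t with n - 2 degrees of freedom, the two tests also give the same p-value.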

  • @pavoldzama4641
    @pavoldzama4641 4 months ago

    Hi, thanks for the awesome video ... I have a question, I still cannot get my head around the concept of this matrix... you have the formula y = 1*2.2 + 0*3.6 + residual of the value ...
    I thought that only residuals are going into the calculation ... how do we get from this bunch of y= equations one below the other to the final calculation, how does it fit in there? I am missing a bridge there. Also, sometimes there is a residual at the end of the y equation and other times there is not; is this notation used later when calculating with these residuals?
    If I understand correctly SS(fit) = sum of all these y equations together .... and I don't get why we need this matrix there if we only do the sum of residuals anyway ... or if I follow the equation literally I have the sum of residuals squared (nobody wrote that it is squared, but I suppose so) plus the mean (which depends on whether it is on or off according to the matrix)??
    Thanks a lot for your response

    • @statquest
      @statquest  4 months ago

      What time point, minutes and seconds, are you asking about?

    • @pavoldzama4641
      @pavoldzama4641 4 months ago

      It is spread between 5:00-7:00 ... where you start to show the y= equations and you end up with a simple equation at 7 and then you move on. I am confused about how this fits together

    • @statquest
      @statquest  4 months ago +1

      @pavoldzama4641 First, when we use the equations to calculate the exact y-axis value for each of the known data points, we include the residuals because the mean value + the residual = the exact y-axis coordinate for the original data value. However, when we use the equations to make predictions - say like someone asks us to predict gene expression for a new control mouse - and we don't actually know what that value is - then we leave the residual off the equation, since we don't know what the difference between the mean and the "true" value is. Thus, when the equation is used with known values for the y-axis, we include the residual. When the equation is used to make predictions (and we don't know the y-axis value) we leave off the residual.
      Now, the reason we keep track of the equations in a matrix is that we can change the coefficients in one place and see how they affect the residuals and thus, how they minimize SS(fit).
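
The reply above can be made concrete with a small sketch. The data values are hypothetical (chosen so the group means come out near the 2.2 and 3.6 mentioned in the question), and the helper names `predict` and `ss_residuals` are illustrative only. Each row of the design matrix switches one group mean on (1) or off (0); the residual is whatever is left between the observed value and that prediction.

```python
# Hypothetical gene-expression values for two groups of mice.
control = [1.9, 2.2, 2.5]
mutant = [3.3, 3.6, 3.9]
y = control + mutant

# Design matrix: column 0 turns the control mean on, column 1 the mutant mean.
design = [[1, 0] for _ in control] + [[0, 1] for _ in mutant]


def predict(row, coefs):
    """Prediction for one row: no residual, because none is known yet."""
    return sum(x * b for x, b in zip(row, coefs))


def ss_residuals(coefs):
    """Sum of squared residuals for a given pair of coefficients."""
    return sum((obs - predict(row, coefs)) ** 2 for row, obs in zip(design, y))


# Least-squares coefficients for this design are simply the two group means.
coefs = [sum(control) / len(control), sum(mutant) / len(mutant)]

fitted = [predict(row, coefs) for row in design]
residuals = [obs - fit for obs, fit in zip(y, fitted)]

# Known data: prediction + residual reproduces the observed value exactly.
reconstructed = [fit + res for fit, res in zip(fitted, residuals)]

ss_fit = ss_residuals(coefs)
# Changing a coefficient in one place changes every affected residual.
ss_worse = ss_residuals([2.0, 3.6])
print(ss_fit, ss_worse)
```

Only the residuals enter SS(fit); the design matrix is what determines which mean each observation is compared against when those residuals are computed.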

    • @pavoldzama4641
      @pavoldzama4641 4 months ago +1

      @statquest thanks for explaining, I think maybe we misunderstood each other, I just could not connect the dots .... I think I get it now after watching again, you use this formula for the fit and it should not include the residual as we calculate the residuals based on it... then I got pretty confused by those matrices but your follow-up video saved the day! Great work, keep it up

  • @unlearningcommunism4742
    @unlearningcommunism4742 1 year ago

    When I hear your voice, I want to make chemometric papers with you

  • @ToniSkit
    @ToniSkit 1 year ago +3

    Woah I was like that’s weird a video … and then it was like - that’s a lot of videos BAMMMMM

    • @computerconcepts3352
      @computerconcepts3352 1 year ago +1

      yeah lol, idk if it's a re-upload though

    • @statquest
      @statquest  1 year ago +2

      It's a re-upload. For some reason YouTube put the originals behind a paywall...so I re-uploaded them so that they would still be free.

    • @computerconcepts3352
      @computerconcepts3352 1 year ago

      @statquest oh ok, interesting 🤔, I thought you manually did that, lol. Thanks 👍 for the information and clarification. I'm curious whether you can access your own videos, or do you have to pay YouTube to watch them as well?

    • @statquest
      @statquest  1 year ago +2

      @computerconcepts3352 I can watch them, so at first I had no idea what was going on. The content worked for me, but no one else. It was very confusing and stressful because I got a lot of negative comments. Ugh. A bad day in StatLand. :(

    • @computerconcepts3352
      @computerconcepts3352 1 year ago +1

      @statquest oofty doof oof oof, I guess uploading videos to other platforms could help reduce the damage from something like this, but I really hate how YouTube does random things like this. I remember one of my videos got deleted for the wrong reason and I lost a bunch of views. Fixing YouTube is actually one of my eventual goals and me watching your YouTube videos is my first step towards that goal 👍

  • @aiswaryanair9033
    @aiswaryanair9033 1 year ago

    Pardon me if this is a dumb question , but I am having a hard time wrapping my head around the idea! Why would one prefer to do a linear regression for T test? How does it make this better?

    • @statquest
      @statquest  1 year ago +1

      It's actually the exact same test - there are just two ways to do it. The advantage of doing it this way (using regression) is that we have more flexibility - we can easily generalize it into an ANOVA test or even something more fancy.
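
One way to see the flexibility mentioned here: written in terms of SS(mean) and SS(fit), the same calculation handles two groups (a t-test) or three or more groups (an ANOVA) without any change. The function name `f_from_groups` and the data below are hypothetical, a sketch rather than the method used in the video.

```python
def f_from_groups(groups):
    """F-statistic comparing 'one overall mean' against 'one mean per group'.

    groups: a list of lists, one list of observations per condition.
    """
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    overall_mean = sum(all_values) / n

    # Residuals around the single overall mean (1 parameter).
    ss_mean = sum((v - overall_mean) ** 2 for v in all_values)
    # Residuals around each group's own mean (one parameter per group).
    ss_fit = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    p_mean, p_fit = 1, len(groups)
    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))


# Hypothetical measurements, invented for illustration only.
control = [2.0, 2.2, 2.4]
drug_a = [3.0, 3.2, 3.4]
drug_b = [4.0, 4.2, 4.4]

f_two = f_from_groups([control, drug_a])            # equivalent to a t-test
f_three = f_from_groups([control, drug_a, drug_b])  # one-way ANOVA
print(f_two, f_three)
```

Adding a condition only adds a column to the design matrix and one more parameter to p_fit; the rest of the calculation stays the same.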