There is an error in this video - we apologize for any confusion this may have caused. At 4:40, we say that in order to calculate the SSM, you "sum up the squared distances between each point and its group mean." We should have said "sum up the squared distances between each point and the grand mean, or overall mean." Thanks to our audience for pointing this out.
I'm here in the desperate hopes that this will help me understand stats after a full semester of classes.
same lol oops
Same... Paid school fees to have lectures but ended up depending on YouTube to learn because the lectures are bad :(
im with you!
Same
Same but she is talking way too fast and now I'm more lost
5:00 The distance between each point and its group mean is the residual error (SSE). The SSM would be the difference between each group mean and grand mean.
10:36 SSE (Residual) should be (Xi - Xbar_group) squared?
You are totally right. I have taken a look at my statistics textbook and it says that SST (total sum of squares) = SSM + SSE, and SSM is calculated as the sum over groups of (difference between each group mean and the grand mean)^2 * (number of observations in that group). I am surprised to notice two points. The first is that Crash Course made this kind of huge mistake when explaining ANOVA, and the second is that no one actually noticed it (based only on this comment section) except you, Ronan.
The explanations for SSM and SSE are exactly the same in this video, which means that one of them is wrong.
Had the same doubt, checked the comments for confirmation, and found your reply. Thank you!
I guess we have to contact the Crash Course team somehow to tell them about that, in case other people rely on these videos.
Today I learned that the word ANOVA exists, and that I shouldn't jump halfway into a course.
ANOVA - Analysis of Variance
This + 9000!!!
This is a 2nd year statistics topic (at least at my uni), so it's not easy!
Lol. I'm halfway through the video, and this is what I learnt so far xD *goes back to earlier videos*
I got the same lesson 😂😂
I appreciate the effort into making the video, but I was a bit overwhelmed by all the graphics and the speed of the explanations.
I watched at 0.75 speed XD
@@philocac5424 I watched at 0.5 speed holding down space
beat that 🤣
Same here
I mean its crashcourse lol
I'm a grad student studying psychology. In a couple weeks, I have to take a class on ANOVA. This really helps.
You make statistics so understandable and not as abstract! I am not so scared of t-tests, z-tests, F-tests and ANOVA anymore! Why do most statistics teachers make it seem so scary? Statistics is great! (esp for curious minds like myself ^_^)
Great video, although I wouldn’t recommend running three t-tests after the ANOVA without first applying the Bonferroni correction! This means using an alpha level of 0.05 divided by the number of comparisons you’re making (in this case 3). Use this corrected alpha level to determine significance, otherwise you may run into family-wise error problems and make a Type I error.
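A minimal sketch of the Bonferroni correction described above, in Python. The function names and the p-values are made up for illustration; they are not from the video or its dataset.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level after the Bonferroni correction."""
    return alpha / n_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant against the corrected threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from three pairwise t-tests (illustrative only).
p_values = [0.030, 0.010, 0.200]
flags = significant_after_bonferroni(p_values)
print(bonferroni_alpha(0.05, 3))  # corrected threshold, roughly 0.0167
print(flags)
```

Note that the first made-up p-value (0.030) would pass an uncorrected 0.05 threshold but not the corrected one, which is the point of the correction.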
This presentation helped me gain further insight to ANOVA. Wish I had you (this goes to the whole cast) as a stats teacher! Big THANKS!!!
ANOVA, I learned it, being tested on it, aced it, but really didn't understand what it is.
Yah me too ... The software is too complicated...
I took two classes using ANOVA b4 learning (on my own) the connection to GLMs lol
The intuition I have about ANOVA is that it tests whether the variance between groups exceeds the variance within groups. Maybe that could help
basically my college life
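The between-versus-within intuition above can be written as the usual one-way ANOVA F ratio. The symbols k (number of groups) and N (total number of observations) are notation assumed here, not taken from the video:

```latex
F = \frac{\mathrm{SSM} / (k - 1)}{\mathrm{SSE} / (N - k)}
```

The numerator is the variation between group means per degree of freedom and the denominator is the variation within groups per degree of freedom, so a large F suggests the groups differ by more than their internal noise would explain.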
This is incredible. Taking Stats for Psyc right now, getting a lot harder as it goes on. Thank you!
same. it's killing me.
I am so glad SPSS does most of the work and I'm just learning to interpret the results. Interesting that you moved onto ANOVA at the same week my quantitative and qualitative research methods module at uni did :)
NOVA has been one of my favorite PBS programs for 3 decades now.
This information is spit out insanely fast.
I love crash course videos for their simplicity but couldn't make out much from this one
Wow. These are coming out right when I need them lol. Taking a hard statistics course.
I really love all of Crash Course's content, but I've been having a tough time following this series. After rewatching this episode and the previous one multiple times, I'm still confused. In this episode, both SSM and SSE are described as the sum of squares between each point and its group mean, but SSM and SSE are different! If someone could explain this to me I'd really appreciate it.
This is the only CC series to date that seems to be geared very specifically at being a class supplemental and not necessarily accessible to someone only watching the videos (although Engineering is also skirting the line a little bit). It's been fairly disappointing.
I'm sorry if there was an error.
SSE is the sum of the squared distance between each point and its group mean (more generally it's the distance between the data point and the predicted value).
SSModel is the sum of squared distance between the model prediction and the grand (overall) mean.
@@chelseaparlett8069 Thanks for the clarification, I think I understand now :)
@@chelseaparlett8069 thank you!
@@chelseaparlett8069 Is the predicted value in SSE basically the predicted mean?
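A small Python sketch of the SSE / SSModel definitions given in the reply above, assuming a one-way design where the model's prediction for each point is simply its group mean. The group names and ratings are made up for illustration.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (SST, SSM, SSE) for a dict mapping group name -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # SSM: each point's model prediction (its group mean) vs. the grand mean,
    # i.e. n_j * (group mean - grand mean)^2 summed over the groups.
    ssm = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

    # SSE: each point vs. its own group mean (the residual).
    sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

    # SST: each point vs. the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    return sst, ssm, sse


# Made-up ratings for three hypothetical groups.
ratings = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}
sst, ssm, sse = sums_of_squares(ratings)
print(sst, ssm, sse)  # SST should equal SSM + SSE
```

With these numbers the grand mean is 4 and the group means are 2, 3 and 7, so SSM = 42, SSE = 6 and SST = 48, matching SST = SSM + SSE.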
I love crash course! But this is not an introductory level video. There are others that explain anova more simply
You can't run multiple T-tests... this inflates the rate of Type I error!
They said they'll address that in a future episode
That's definitely good because as soon as they said "just run 3 t-tests" I almost fell out of my chair. Doesn't help those students watching this who are now like "oh just run t tests!"
Ya, you have to do a post-hoc test like Tukey. Otherwise, there's no point in doing the ANOVA in the first place, just do t-tests.
If you were doing this study would it be better to do all 3 t-tests with the Bonferroni correction and present all 3 in a paper, or find the t-test that shows the strongest result and only present that one?
@@lakudzala195 I don't remember the Bonferroni (spelling? Lol) correction but just wanted to say, there's a push for presenting confidence intervals in papers. So, regardless of what you end up doing, I suggest using confidence intervals. Also, maybe look for it on Google Scholar (also related topic: replication in science). (I'm assuming this is a scientific study of some kind.)
Hi graphics team, ß ≠ β 😉
Thank you Adriene for helping me pass this class 🙏
You are an incredibly eloquent speaker. Thanks for this explanation!
For the first example and the slope calculation, it should be the opposite (μ1-μ0) in the numerator. Just for avoiding any confusion with regard to the code names for rainy and non-rainy days. Thanks Crash Course team for all your efforts and teaching, and thanks Adriene for this particular course which, for me as an engineer, was a tough lesson for all those years that I was avoiding Statistics courses 😅
Hey, I love this series, it's helping me through a semester of Corona-Statistics.
I just think I might have found a mistake at 10:57 for the model sum of squares, because the sum should go from i=1 to k, instead of to n as the figure says.
I love the examples but this goes way too fast, I had a hard time following the explanations 😢
Just watch it on x0.75 then, or pause it and go back.
I have my intro to bio stats final tomorrow and these videos are my 3 am Hail Mary half court shot.
A bunny preserve how cool. Also I really have to think hard to understand your videos, but it is always worth it. I never gamble, but life is a gamble and statistics are one of the best ways to make a decision. Not always the right decision, but random chance rules the world.
Thank you! I've needed this episode for years.
Thank you so much for this information!! I was wondering if you guys might be able to add the APA citation for these sources in the description! That would be really helpful!
There was one big mistake....never talk about chocolate in maths 😄
I was not able to think about calculations but chocolate.
Overall was pretty clear:)
YES. There is largely unpalatable chocolate. I've eaten some.
I love Indiana, my family is from there. I’m hype. Love u guys. Thanks for your help. You guys literally help me in every class I have. I go to college online. CTU online. Thanks guys for real.
Hey! Shut your mouth, you pig! You're always talking. Learn how to live a little.
My favorite statistics documentary series is PBS ANOVA
ANOVA beat me up in stats class...
Thank you, apparently I am understanding this 1 year after campus, going into data science
Brilliant video, thanks for sharing.
I think your explanation for the model sum of squares is incorrect; it should be the sum of squared differences between group means and the overall mean.
You're gonna save my master's degree *.*
Is there any situation where an F test would say not statistically significant but a T test would? The fact that you said a failed F test means a relationship "probably" doesn't exist seems to imply that it can. What would you do in that case?
This video was super helpful!! Thank you!!! ❤
Is it a bumpy or slippery slope? Si there's a variable difference.
Cool, Hill is back. I liked her in econ
Is it right to say that ANOVA is the same as a t-test but the former is for when you have more than 2 groups?
It's right that the ANOVA is used for cases where a t-test is inappropriate/inadequate because there are more than two groups to compare.
Would there be a point in doing an ANOVA for two groups, or would it be easier to just do a T-test?
omg love this video
Is there a variation of GLM that relies on median rather than mean? I kind of doubt it because it doesn't work mathematically ... but I have read that medians are a "more accurate" measure of "typical" than means. For instance, in the bunnies example, if one day the sanctuary sent all the bunnies outside on a sunny day and you saw 30 bunnies, it would skew your 1-or-5 general model strongly.
Hey, the general linear model and the generalized linear model (GLM) are two different things.
Most of the time you could take a non-parametric test over a parametric test if you're concerned that your data doesn't follow a theoretical distribution. Non-parametric tests are often based on rank. The good thing is that they are robust, the downside is that you lose some statistical power.
Very confusing.
When you are calculating, SSM is the difference between the overall mean and the mean of each group. SSE is the difference between the observed data and the group means. The SSM is explained incorrectly in the video. But otherwise, great content. Thank you.
So if the mean of one or more groups is skewed by outliers or missing values, is the ANOVA's result between the groups still valid? Since the parameters for ANOVA are the variances.
Cacao bean difference here is an example of the danger of significance testing. I would argue that a mean difference of .17, on the scale being used, is not meaningful.
I think most people would agree, which is why you should always present your effect size along with your p-value :)
Also, presenting confidence intervals is a good idea!
Also the ratings are on an ordinal scale, not a continuous one (as seen by the discrete values the ratings can take). So applying a non-parametric test might be more useful.
@@teunvandenbrand1324 well if we care about that, an ordered logit / probit would work. C:
K. I have a question... why exactly do you hate the ever so extraordinary SPONGE???
Or don’t answer...
Statistics is way better than geometry.
So instead of using ANOVA, why not just use multiple T-test?
Yeah, but which potato varieties did best in Martian soil, supplemented with human manure and bacteria cultures?
It sounds like SSR and SSE are the same thing
You are a life saver!
Is it a complete course on statistics... I mean... does it include most of what we need to know about statistics?
Since it's a Kaggle dataset, did you use R or python to analyze the data? Or did you download it and use Excel, SPSS, Stata, SAS, etc to analyze the data?
Voltaire's Army R was used
Thank you!!!
ANOVA is a great tasting chocolate bean.
You're amazing
Can you suggest a good book to follow along with the Crash Course series & for further practice?
Open Intro Statistics is decent. It's free to get an ebook and has decent resources too. I used it in a class lol
@@voltairesarmy6702 thanks, I will start to download it from the web
Can you give your e-book suggestion?
Even though I've taken an entire semester worth of classes on statistics, these videos are actually even more confusing. You guys focus too much on keeping the videos short and end up explaining nothing at all. There is information and you make a few good points but it's nothing one can't get from a regular math website. The visuals are a waste and it all seems pretty forced and like you're just reading off a screen.
Do a crash course history on the Bangladeshi war of independence in 1971. I have a project and would love it if you did a video on it.
I thought this was going to be about my sous vide circulator lol
Using ordinal data is a bad example with the cocoa bean type. You can't use the mean as a measure of central tendency when it has no meaning, i.e. what is the average of strongly agree and disagree? Also really bad idea to teach doing multiple t-tests as it increases the Type I error rate, and defeats the whole point of ANOVA. Would have been better to show Tukey's HSD to determine which means are different.
*NOTIFICATION SQUAD WHERE YOU AT? 🔥💯💪*
Nice explanation.
Adriene Hill is the best, no one compares
this is literally better than sharing the bed with my gf, thank you so much for the work, absolutely mind-blowing
wait was this a 2 way or 1 way ANOVA? Lol what is the difference?
I wanna walk through a bunny preserve to work 🥺
Unpalatable chocolate?
Yeah,...it's called carob
Bunny count would not be Gaussian. Probably Poisson or negative binomial.
Anyone here from ANU?
this doesn't make ANOVA easy to understand at all. it doesn't take into consideration that people are only now learning this whole concept...
I like cookies
Please feature bunnies more often.
9 grand a year to learn more off of a 5-year-old YouTube playlist than in my Stats lectures... (I have an exam on this and I am so screwed)
DFTBAQ (hey, did you know you can type DFTBAQ with your left hand only?)
This isn’t John Green.
Hey! Explain the story of Scheherazade. Pwease.
Very difficult to follow with this speed of the explanation.
This series has the lowest viewership compared to all other series in cc
If we can know exactly the statistical significance between different groups by using t tests for every 2 groups, why even bother with the f-test in the first place lol ?
Honestly, I have no idea what she's saying
yup nope still confused I miss the dude D: TAKE ME BACK TO SCIENCE
still can't get it. I'm an idiot. sorry.
bro!
UMMMMMMM AM I AN IDIOT OR HOW DO U CALCULATE THE P-VALUE??????
bro I feel u I can't find any substantial explanation either :/
Lol what people will do for the first comment...
Omg I think I'm worse off..
1st
I think I might understand this a little better if she used a video game example
this didn't make sense at all fam
This is way too fast
.
First
Yes. Of course there is unpalatable chocolate out there. It's called chocolate.
Just use SPSS
The 4 letters of thesis nightmares.
No, use R lol
SPSS is waaay too expensive lol