Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
👍
Hi, Josh! I just wanted to say thank you for these videos! The way you explain concepts has been honestly life changing for me (in terms of my academic career). Concepts that I've struggled with for years are finally becoming clear. I just wanted to take a moment to express my appreciation, and let you know how impactful these videos are!
@@DrOats22 Thank you very much! :)
This is such a breath of fresh air compared to the unnecessarily difficult 'explanations' we have to work with in statistical analysis courses. Your videos are awesome.
Wow, thank you!
Your videos are the single greatest resource for my education on machine learning and AI. If I lost access to your videos, I would be devastated.
Glad you like them!
One of the best explanations of R-squared, thanks for sharing! No time wasted in this video!
Thank you!
Excellent vid & totally helped me again with my regression homework! One of the toughest challenges I have is writing and speaking Regression! One of your last slides around 10:29 helped me learn how to connect a positive / negative variable relationship with R2...love you guys, seriously!
Glad it was helpful!
Yes!! Thanks for this. You are saving grad students around the world!
Happy to help!
And former grad students who haven't touched linear regression in 25 years! :) What a great concise refresher. BAM!
This is just what I was expecting from an explanation of what R-squared is. Thank you very much for making it clear and simple
Glad it was helpful!
It's INSANE how clear this is, thank you!
Thank you! :)
clicked for the title, stayed for the content. thanks for this
bam!
Your videos are the most helpful and easiest to follow!
Glad you like them!
Beautifully explained! Loved the “Correlations close to 0 are lame “😂
:)
When I saw "is the mean weight the best way to predict mouse weight", I thought, "that is stupid". And then when I saw the formula for R-squared, I realized "I was stupid". Awesome videos, and they really help.
bam!
All stats courses, at any level of education, should be taught like this. Otherwise, for the majority of people, stats stays ambiguous and difficult to understand. But lecturers seem to feel this approach is too time consuming, that they have a lot of topics to cover, and so on. Luckily we have this nice UA-cam channel and online documents to supplement the courses. Thanks for the great video!
Thank you very much! I appreciate it.
Josh, I'm literally teaching my students this today! Going to refer them to this video.
BAM! Avery, I'm glad this is helpful. This is actually the first StatQuest I ever made, back in the day. I had to re-upload it yesterday due to some oddness on UA-cam's end, but it's still a classic and the video that got the whole thing started.
This is excellent. Why can't professors explain as well and as clearly as you? I had a linear regression class yesterday and I had never even heard of variation before, only standard deviation. I didn't know why it was squared either. Thanks a lot!
Thanks!
Thank you so much!!! You explain these concepts so easily!! Saving lives one video at a time 😁💕
Thank you!!! :)
People have no idea what a gold mine this video is.
Thank you! :)
You keep this up and I’ll have to forward my tuition to your address.
BAM! :)
Holy mother of god, THANK YOU for this video. I was looking at a bunch of websites online (some paywalled) and none of them explained it as well as this video. Thank you for providing examples and explaining the how rather than the what.
😁😁
Glad I could help!
mind blown. amazingly well explained thank you!
Thank you!
Incredible explainations. I'm so glad I found this chanel/book!
Thank you!
That's so intuitive! You really saved my midterm.
Thanks!
Excellent explanation. Consider this comment as one million likes. ❤❤
Thank you very much! :)
Thank you UNC-Chapel Hill for saving my life on my AP Stats test. I hope my EA is accepted.
BAM! Congratulations and good luck!
Just awesome plain explanation 🎉
Thank you!
Very clearly explained. Thank you
Thank you!
StatQuest is the best thing to come out of UNC since MJ
TRIPLE BAM! :)
@ 03:30 How did you choose which line (which angle, starting point) to fit to the data?
Shouldn't there be a method to find a line so that the line's R-squared equals plain old r, squared?
There is an analytical method, meaning an equation we can plug our data into to get a result, that will give us the line that minimizes the sum of the squared residuals. The line that minimizes the sum of the squared residuals is defined as the best fitting line. Alternatively, we can use an iterative method like Gradient Descent to find the best fitting line. For details on Gradient Descent, see: ua-cam.com/video/sDv4f4s2SB8/v-deo.html
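To make that concrete, here is a minimal NumPy sketch (with made-up mouse data, not the values from the video) that computes the closed-form least-squares line and then R^2 exactly as in the video, comparing the variation around the mean to the variation around the fitted line:

```python
import numpy as np

# Hypothetical data: mouse size (x) and mouse weight (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 2.0, 3.5, 4.0, 5.5, 6.0])

# Closed-form least-squares solution: the slope and intercept
# that minimize the sum of the squared residuals
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
y_hat = intercept + slope * x

# R^2 as in the video: variation around the mean vs. around the line
ss_mean = np.sum((y - y.mean()) ** 2)  # SS(mean)
ss_fit = np.sum((y - y_hat) ** 2)      # SS(fit)
r_squared = (ss_mean - ss_fit) / ss_mean

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```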
This is a good video. Funny, yet informative.
Glad you enjoyed it!
Thanks for reposting this precious R-squared explanation. Yesterday I couldn't play this module because of the payment stuff. Super thanks!
Sorry you had trouble, and I hope it never, ever happens again. It was very, very frustrating from my end, since I've tried so hard to make my videos free for the world.
Great clear explanation! Thanks!
Glad it was helpful!
very clear and concise
Thanks!
Very clear and helpful, thank you
Thanks!
Thank you for this video! I have a much better understanding now
Glad it was helpful!
Thank you so much for explaining everything in an easier way!
Thanks!
Thank you. Very useful.
Glad it was helpful!
thank you so much, subscribing right now!
Thank you!
Thank you so much and thank you UNC Chapel Hill for enabling you to make these
bam! :)
This was wonderful. Thank you so much!
Glad you enjoyed it!
Amazing Explanation.
Great video. Thank you
Thanks!
Time spent sniffing a rock 🤣🤣🤣
:)
Banger intro, man
Thanks!
Such a beautiful explanation. Thank You! :-)
You're very welcome!
Thanks for the nice explanation. I wonder what the difference is between the R^2 formulation you explained and this one: R^2 = 1 - SSE/SST, where SSE is the sum of squared errors and SST is the total sum of squares (the sum of squares around the mean).
There is no difference. One formula can be derived directly from the other.
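A quick numeric check of that with made-up numbers: since SSE is the sum of squared residuals around the fit and SST is the sum of squares around the mean, 1 - SSE/SST is just another way of writing (SS(mean) - SS(fit)) / SS(mean):

```python
import numpy as np

# Hypothetical observations and the model's predictions for them
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
y_hat = np.array([2.2, 3.1, 4.6, 4.3, 5.8])

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares, SS(mean)
sse = np.sum((y - y_hat) ** 2)     # sum of squared errors, SS(fit)

r2_video = (sst - sse) / sst       # formulation from the video
r2_alt = 1 - sse / sst             # R^2 = 1 - SSE/SST

print(r2_video, r2_alt)            # the two values are identical
```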
Hi, I see a lot of your Analytics videos are repeated. Are these refreshed with new info or simply repeated?
Do I need to watch both or just the newest one?
They are the same. For some strange reason, about a year ago some of my videos got stuck behind a paywall. So I re-uploaded all of the videos behind the paywall so that they would, once again, be available to everyone for free. It now seems that whatever freak event happened back then has become undone, so now I have 2 copies of a handful of videos.
🤣🤣 The intro! I'm enjoying stats thanks to you.
:)
Starmer = Hero
Thank you! :)
just beautiful!!
Thank you!
Hi, thanks for your videos! Any chance there's a StatQuest on adjusted R-squared?
I mention it in my video on linear regression: ua-cam.com/video/nk2CQITm_eo/v-deo.html
Hi Sir,
I am madly addicted to your WAY OF EXPLAINING, and I personally owe you a lot. I love math and the way you quest it. Recently I was researching DEA (data envelopment analysis, as you surely know). I now know what it means and how to calculate it; I can even code it in Pyomo and use it blindly... but WHAT IS THE MAIN IDEA BEHIND DEA? Clearly Explained... I searched the web and there is no remarkable article or video on it. I was hoping you could make such a genius video.
I'm glad you like my videos and I'll keep that topic in mind.
As far as I know, r^2 = R^2 holds only for simple linear regression; please correct me if I am wrong.
Yep. That's what this video was originally intended to explain - how R^2 relates to linear regression. That's why we compare the fitted straight line to a horizontal line at the mean.
@@statquest Thanks
this makes sm sense tysm
bam! :)
I can't believe these videos are brand new. I feel sorry for everyone who had to take statistics without watching these first.
BAM! :)
How do you apply it to multivariable linear regression? Calculate R^2 for each feature vs. the dependent variable? Could it then be used as a feature selection method? Is that what is called Pearson correlation?
For multivariable linear regression, you are still comparing the model (the fitted line) to the mean of the values on the y-axis. For more details, see: ua-cam.com/video/nk2CQITm_eo/v-deo.html
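Here is a minimal sketch (made-up data) of what that looks like with two features: fit the multivariable model with least squares, then compute R^2 the same way, comparing the model's predictions to the mean of y:

```python
import numpy as np

# Hypothetical data: two features (columns of X) and a response y
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = np.array([3.0, 4.0, 8.0, 9.0, 13.0])

# Least-squares fit with an intercept column added
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ coef

# Same R^2 as the single-variable case: the model vs. the mean of y
ss_mean = np.sum((y - y.mean()) ** 2)
ss_fit = np.sum((y - y_hat) ** 2)
print("R^2 =", (ss_mean - ss_fit) / ss_mean)
```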
I have a question: in some cases I get an out-of-sample R-squared that is negative, for example with multiple linear regression or even simple one-variable linear regression. Does that tell me the model is less capable of predicting the response than a simple mean? And in sample, is there no difference between the R-squared of a simple linear regression and the square of Pearson's correlation between the two variables?
I'm not sure I understand what you mean by "out of sample" and "in sample", but if you are calculating R^2 using data the model was not originally fit to, then it is possible to get negative values.
@@statquest ah I see!
I meant that sometimes I would fit a model on a training set, and among the metrics to evaluate its performance on a dev/test set I would use the R squared, occasionally obtaining negative values. But I see now that it's a pretty different scope compared to the one proposed in your video, since I'm not trying to measure how related two variables are, but rather trying to evaluate a model! Thank you for your reply btw!!
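For anyone curious, here is a tiny sketch (made-up held-out data) of how that negative value shows up: if the model's predictions on the test set are worse than just predicting the test set's mean, 1 - SSE/SST drops below zero:

```python
import numpy as np

# Hypothetical held-out (test) responses and the predictions from a
# model that generalizes poorly
y_test = np.array([2.0, 3.0, 4.0, 5.0])
y_pred = np.array([6.0, 1.0, 8.0, 0.0])  # worse than just predicting the mean

sst = np.sum((y_test - y_test.mean()) ** 2)
sse = np.sum((y_test - y_pred) ** 2)

print(1 - sse / sst)  # negative: the model does worse than the test-set mean
```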
Bring back stat quest
I hope to have some new stuff out soon.
Is variance different from variation? At 2:15 we find the sum of the squared differences but we don't divide it by the number of observations - 1. Is there a reason for this?
In this case we don't need to divide by n-1 because the denominators cancel out, leaving us with just the numerators. So we save ourselves a step and omit it.
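Spelling that out (assuming n observations), the n - 1 terms cancel:

\[
R^2
= \frac{\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{line})}{\mathrm{Var}(\text{mean})}
= \frac{\frac{SS(\text{mean})}{n-1} - \frac{SS(\text{line})}{n-1}}{\frac{SS(\text{mean})}{n-1}}
= \frac{SS(\text{mean}) - SS(\text{line})}{SS(\text{mean})}
\]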
@@statquest Thank you! It's so obvious now that you pointed it out lol
Thank you so much.
Thanks!
This variation around the mean/regression line that you speak of, is that the mean squared error?
It's related: stats.stackexchange.com/questions/140536/whats-the-difference-between-the-variance-and-the-mean-squared-error
you are very good
Thanks! 😃
yay more new videos ☺️
:)
Nice video, but Is var(x) supposed to be the variation or the variance?
Variation and variance are often used interchangeably and, in this case, it's OK.
Thanks! Question: is R-squared the % of y variance explained by X, or explained by the model (the regression equation)?
It depends on the model. If the model only contains a single variable, X, then R-squared tells us the % of variance explained by the model, or X. Both are true. However, we can also calculate R-squared for models with many variables. For details, see: ua-cam.com/video/nk2CQITm_eo/v-deo.html and ua-cam.com/video/zITIFTsivN8/v-deo.html
thanks bro
Any time!
Sometimes a single video is better than a whole pdf
:)
I love StatQuest videos; however, this video had me confused. I tried to study R-squared from other sources and they gave me a different formula: R-squared = 1 - (SSR/SST). Are there different kinds of R-squared used in different situations?
It's the same formula, just written differently; you can do the algebra and show that they are equal to each other. See: en.wikipedia.org/wiki/Coefficient_of_determination
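For the record, that algebra is short (here SSR is the residual sum of squares, i.e. SS(fit), and SST is the total sum of squares around the mean, i.e. SS(mean)):

\[
R^2
= \frac{SS(\text{mean}) - SS(\text{fit})}{SS(\text{mean})}
= \frac{SST - SSR}{SST}
= 1 - \frac{SSR}{SST}
\]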
@@statquest Thanks. That's helpful. I will try that.
I have a doubt: this R-squared is used to test the accuracy of our model, and it is also used to select the parameters for our model. It would be very helpful if you could come up with a video explaining how to create a full-fledged model with proper steps.
See: ua-cam.com/video/u1cc1r_Y7M0/v-deo.html and ua-cam.com/video/hokALdIst8k/v-deo.html and ua-cam.com/video/Hrr2anyK_5s/v-deo.html
@@statquest Wow, thanks! I hadn't seen the old videos. Great! ❤❤❤❤
Ty
:)
If I only know the angle between the two lines, will I be able to find the R2 value? (Like tan theta?)
No.
10:00 "explains 25% of the original variation" means 25% less variation compared to that of the mean line, right?
Is the coefficient of correlation the square root of the coefficient of determination? 🙂
Yep, 25% less variation around the regression line than around the mean.
I loved the video! I would like to give this video ten likes!
Thank you!
Stat Quest ✊
bam! :)
You are the boss
Thanks!
Hi Josh, can you also explain the F test?
Sure, see: ua-cam.com/video/nk2CQITm_eo/v-deo.html and ua-cam.com/video/NF5_btOaCig/v-deo.html
The square of the correlation coefficient (between the predicted and true values) is equal to "R squared" only in linear regression, and not in other regressors like a decision tree regressor or a support vector regressor. Why is this not mentioned in the video?
That is correct. When I made this video, way back in early 2015, I only had linear regression in mind.
not all heroes wear capes
:)
I'm repeating my question from the original video here:
4:21 I do not understand how this - var(blue line) - is calculated manually.
Thank you.
You may actually want to watch the whole linear regression playlist: ua-cam.com/play/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU.html
@@statquest You replied so quickly. I will look at this, thank you!
Awesome!!!
Thanks!!
How did he get the var(mean) of 32 and the var(line) 32? are they just points?
Var(mean) and var(line) are numbers calculated as sums of squared residuals. For example, for var(mean), you find the difference between the mean and every point, square those differences, and then sum them up. In the video, this comes out to 32. Similarly, for var(line), you find the differences between the points and the line, square them, and sum them up.
You can also see: ua-cam.com/video/SzZ6GpcfoQY/v-deo.html
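A tiny sketch of those two sums of squares with made-up points and a made-up line (the 32 in the video comes from its own data, so these numbers are just illustrative):

```python
import numpy as np

# Hypothetical data points and a hypothetical fitted line y = 0.8*x + 0.5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.3, 2.9, 3.9])
y_line = 0.8 * x + 0.5

ss_mean = np.sum((y - y.mean()) ** 2)  # squared differences from the mean, summed
ss_line = np.sum((y - y_line) ** 2)    # squared differences from the line, summed

print(ss_mean, ss_line)
```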
DOUBLE BAM!!!
YES!
Can you make a video explaining ETA squared?
I'll keep that in mind.
Nice
Thanks!
Please explain adjusted r square also
I describe adjusted R-squared in my video on linear regression, here: ua-cam.com/video/nk2CQITm_eo/v-deo.html
Noice 👍 Doice 👍 Ice 👍, ....wait, is this a re-upload?
Yes. Without telling me, UA-cam put the original behind a paywall, so I re-uploaded it so it would still be free.
@@statquest oofty doof oof oof, Noice 👍 Thanks 👍
Cool !!
Thanks!
Mate, can you update the resolution please?
Unfortunately updating old videos is a lot harder than you would expect. :(
Why is 4 months ago potato quality? Thank you so much for this.
What time point in the video, minutes and seconds, are you asking about?
@@statquest apologies, it was my attempt at humour. I'm sure it's part of your earlier series that you've re-uploaded recently. The video is fantastic in content.
Why does this video only have a resolution of 360p?
It's super old, but people still watch it a lot.
is this a repost Josh?
Yes. Something weird happened to the original and now it is behind a paywall. I contacted UA-cam and they said there was nothing I could do about it, so I had to re-upload. Sorry for the trouble.
@@statquest One other thing... what would you think of StatQuest en Español? (Pum!, the most Spanish onomatopoeia for bam!) I could help with the translation.
@@rubenestebangarciagomez7040 I think it would be great, and it's a dream of mine that I want to come true. I've even been trying to learn Spanish on my own (but I'm a slow learner). For StatQuest, I've been using AI to create overdubs for my new videos, and I think it is OK. If it's good enough, the cool thing is that it can be used for a ton of different languages.
@@statquest I'll try to contact you later. I'll even try to sing and play the ukulele intros...
I don't know how to say thank you enough.
:)
BAM!
:)
This is great. Can I get a BAM!!! ??
bam! :)
How do I get access to watch some of the videos labeled "Pay to watch", such as ua-cam.com/video/nk2CQITm_eo/v-deo.html? Do I have to become a certain level of member or just pay for the video itself?
I've contacted UA-cam and am trying to do everything I can to fix this problem. In the meantime, I've re-uploaded that video so that you can still watch it for free: statquest.org/video-index/ NOTE: Whenever you see a note saying you have to pay to watch a video, just scroll down to the first pinned comment and you will see a link to a free version.
So there's a 6% correlation between sniffing rocks and a mouse's weight? Lol
:)
💚
:)
Time spent sniffing a rock 😂😂😂
bam! :)
First! Bam
:)
This is a re-upload from 8 years ago.
Yep. For some reason the original ended up behind a paywall, so I had to re-upload it.
I hate to be a smart ass, but I think you are wrong: R^2 COULD BE NEGATIVE. A simple example: if you have a very bad regressor that is way too far away from all the training points, then its variance could be very, very large, so the variance of the mean minus the variance of the model could be negative. The video here is very misleading.
You are correct. However, when I made this video I was thinking of R-squared only in the context of linear regression, and in that context, R^2 can't be negative. In that context, the worst your model can do is the mean of the y-axis variable.
He might mean the correlation coefficient, r.
@@user-ff5sx6pg3d Slightly so, but it's insignificant in practice. It's not as misleading as you make it out to be.