The sum of squared residuals, also called the sum of squared errors, is SSE, and the sum of squares due to regression is SSR. Just make sure about this, since new students can get confused.
Y = individual data points, Yreg = predicted regression points, Ymean = average of the individual data points
SSE = Σ (Y - Yreg)^2
SSR = Σ (Yreg - Ymean)^2
so, for a least-squares fit with an intercept,
SST = SSE + SSR
= Σ (Y - Ymean)^2
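If it helps, here is a quick sketch in Python that checks the decomposition on a least-squares line. NumPy is assumed, and the toy data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: one feature x and a noisy linear target y.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Ordinary least squares with an intercept (np.polyfit returns slope first).
slope, intercept = np.polyfit(x, y, deg=1)
y_reg = intercept + slope * x
y_mean = y.mean()

sse = np.sum((y - y_reg) ** 2)       # squared residuals (unexplained part)
ssr = np.sum((y_reg - y_mean) ** 2)  # squared regression deviations (explained part)
sst = np.sum((y - y_mean) ** 2)      # total squared deviations from the mean

r2 = 1.0 - sse / sst
print(f"SSE={sse:.3f}  SSR={ssr:.3f}  SST={sst:.3f}  R^2={r2:.4f}")
```

For this kind of fit, 1 - SSE/SST and SSR/SST give the same R^2, which is why both forms show up in tutorials.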
Thank you, now understood well
@Ahmed Kellen didn't they ask money
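Hey, can you help me with the 'N' here? Is it the total number of features or the total number of data points?
@@ShashwatAgarwal007 In general statistics notation, big N is the population size and small n is the size of the sample we take from the population. In the adjusted R^2 formula here, N is the number of data points (rows), not the number of features.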
Initially
N=1000
R^2=0.85
p=5 (initially)
adjusted R_Squared = 1 - ((1-0.85)(1000-1)/(1000-5-1)) = 0.8492
1. suppose a new non-correlated variable is added:
N=1000
R^2=0.86 (suppose new R^2)
p=6 (new)
adjusted R_Squared = 1 - ((1-0.86)(1000-1)/(1000-6-1)) = 0.8592
2. suppose a new correlated variable is added:
N=1000
R^2=0.92 (suppose new R^2)
p=6 (new)
adjusted R_Squared = 1 - ((1-0.92)(1000-1)/(1000-6-1)) = 0.9195
As we can notice, with these assumed numbers adjusted R_squared rises in both cases, but much less for the non-correlated predictor (0.8492 to 0.8592) than for the correlated one (0.8492 to 0.9195). It actually decreases only when the gain in R^2 is very small: for example, if R^2 had only moved to 0.8501, adjusted R_Squared = 1 - ((1-0.8501)(1000-1)/(1000-6-1)) = 0.84919, just below the initial 0.84925. Hope it helps!
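For anyone who wants to redo the arithmetic, a small helper along these lines should do it. This is only a sketch: the function name is made up, and the inputs are the assumed numbers from the example (N=1000, p=5 or 6).

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n samples (rows) and p predictors (columns)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


base = adjusted_r2(0.85, 1000, 5)    # initial model
weak = adjusted_r2(0.86, 1000, 6)    # one extra predictor, small R^2 gain
strong = adjusted_r2(0.92, 1000, 6)  # one extra predictor, large R^2 gain

print(round(base, 4), round(weak, 4), round(strong, 4))
```

Running it makes it easy to try other assumed R^2 values and see when the adjusted figure moves up or down.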
But it decreased from the initial adj R^2, so how do we find out that the new feature is correlated?
SSR means Sum of the Squares of the Residuals
SST - Sum of the Squares of the Total....
best teacher of ML on the youtube
I am glad I came across this tutorial. Very well explained !
Very informative and useful content, lucid explanation
It's very excellent and detailed explanation for a beginner!!!
Explained in detailed manner keep doing
Hi Krish, nicely explained, but I have a query. R-square will always increase, whether the added feature is significant or insignificant. So it is not the case that R-sq will be lower for non-correlated features and higher for correlated ones; it increases blindly. So how can you say that R-adj will decrease when the added attributes are non-correlated, as R-sq will still increase, making R-adj = 1 - smaller_number? I hope my question is a bit clear. Thanks and respect, sir!! (v).
I too have this doubt
Nayek sir
Is p the total number of independent features, or only those independent features which we have added later?
Also, can we say that N is the total number of columns in the data set?
If so, then should we also count those columns which have irrelevant data, like ticket serial number or passenger name in the Titanic dataset?
Can you please explain how the SSres will decrease as we try to add a new independent variable?
Wow.. thanks so much Krish. This was the best explanation i found
I didn't get one thing: even in Adjusted R2, whether there is correlation or not is not taken into consideration. So, by just considering the number of variables, how does the correlation issue get addressed?
Very well explained, sir. Thank you, sir.
N - total sample size, indicates no of rows in the model?
please tell why SS res decrease as we increase the feature
please explain ?
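One way to see why SSres cannot go up is to try it: with least squares, the bigger model can always set the new coefficient to 0 and reproduce the old fit, so it can only do as well or better on the training data. A minimal sketch with NumPy, where the data and the `ss_res` helper are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # target depends on x1 only
noise = rng.normal(size=n)               # extra feature unrelated to y


def ss_res(features: np.ndarray, target: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))


ss_small = ss_res(x1.reshape(-1, 1), y)           # model with x1 only
ss_big = ss_res(np.column_stack([x1, noise]), y)  # model with x1 + noise feature

print(f"SSres with 1 feature:  {ss_small:.4f}")
print(f"SSres with 2 features: {ss_big:.4f}")
```

The second number should never be larger than the first, even though the added column is pure noise. That is the behaviour adjusted R^2 is meant to compensate for.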
Very intuitive explanation..!!! You have been such an inspirational instructor ..!!!!
Thanks a lot Krish 🙂its really helpful
very helpful video, thank you sir
Thank you so much sir for your great support by making such videos.
Very interesting Krish. As always you stimulate us to think and learn.
R-square means SSR/SST only, right? Why is there a "1 -" before that? Just asking because some Excel videos show only SSR/SST.
Can you suggest good book for Machine Learning ?
Krish, R-square will increase in both cases, whether the variable is correlated with the dependent variable or not. Hence it results in a decrease in Adj R-Square in both cases. However, the magnitude will be different.
Hello sir, I am making a project on income and health expenses, and my R-squared value comes out less than 1%. What should I interpret from this? Should I change my linear model or try another? What should I do?
you should add another feature which is correlated to the target variable. Low R-squared means that your independent feature and target variable are not correlated. You can confirm this by computing the correlation between them
Which variable in the adjusted R^2 equation is related to correlation? It is not R^2, and all the other variables have nothing to do with correlation. Is it the ratio (n-1)/(n-p-1)?
Even I have same question. There should be something more in the formula of R2 adjusted which will take correlation into account.
Sir, but if p will increase the N will also increase because they both have independent variables. So the denominator will always be zero.
N is the number of samples, not the number of predictors. For a dataframe of shape (m, n), the number of samples is m and the number of predictors is n.
Brother, what more will you do? How can anyone teach this simply? 👍
Well Explained
Good morning sir. Please do upload a video explaining what exactly a p-value is. I am getting confused with it. I hope at least your explanation would give more clarity.
www.wikihow.com/Calculate-P-Value
In adjusted R2, there is R2.
But whether the feature is correlated or not, the R2 value will increase, so how are we able to say something about adjusted R2?
Good day sir, I just wanted to ask: if an independent variable is not significant, or does not have explanatory power in the model, but removing it lowers the adjusted R-square, what does this imply? So far, the reason I know of is that its t-statistic is greater than one in absolute value. With this information, what can we infer?
Thank you sir, you made things very easy
Very interesting and excellent but requested to give examples to evaluate situations
beautiful explanation sirji
Excellent explanation.. thank u very much
Let's say I have 10 features and some R-square value is calculated. Later it is found that 4 of the features are uncorrelated with the target. Now the 1-R2 value is not going to change, and neither is the adjusted R2 value. Can you correct me if I'm analyzing it wrong? I'm assuming it follows the simple linear regression model, not the lasso.
All time never ever found these kind explanation.
I will not follow any howle heros except Sadhguru and You.
Sir, as you said, in order to avoid negative values in the residuals we square the terms in SSres and SStot. But sir, if we apply the modulus (absolute value) to both instead of squaring them, what will be the change in the R values? On squaring, the R value gets larger and reaches towards 1 more easily, which suggests our model has fitted well. Please answer, sir.
hi krish,
if we add features with high error then the SSres increases , but if we add features with low error then SSres decreases
You said, using the 1st formula, that even if an independent feature is not related, the r^2 value increases. That was the drawback. But at 14:18 in the video you are saying that if the feature is not related then we would get a smaller r^2 value from the 1st formula. I got confused here. Please solve my confusion. I will be glad. Please🙌🙌🙏
No. Even if the feature is not correlated with the output variable, the value of R-square will increase; that's why we use the adjusted R-square. If the feature is not correlated, its value will decrease.
Maybe he said that by mistake
he meant that for the same features, if they are correlated with the target variable, you will get a higher R2 value and a smaller value if they are uncorrelated.
Well done
Awesome video and explanation
what are possible interpretations and justifications for low r square values in management science?
Sir, it would be great if you could complement this with an example
Hi Krish, please keep the same rhythm of speech through to the end of each sentence while explaining. What happens here is that at the end of a sentence your voice gets very low, which creates confusion while listening.
Could you please explain with any example from scratch with multi output in regression?? I want to predict 2 output (distance travelled and velocity) from the dataset.
If I have 10 features and if I need to know which feature is affecting output y and which is not affecting y. Do I need to find correlation between y and each feature separately. If yes , then how? If not , then what to do? Krish please reply. Thanks
You can do EDA: make a pairplot, check the correlations and put them on a heatmap, and later you can apply a machine learning algorithm
@@deepakgehani thanks a lot. I will apply this and revert back to you in case I face any other issue. Thanks again
You need to perform a chi-square test if both input and output variables are categorical, ANOVA for one categorical and one continuous variable, and finally Pearson correlation when both are continuous!
You can loop over all the variables and check each one's correlation.
There are many ways to find out. First, you can find the correlation between them using a heatmap or the corr method. Second, you can find the VIF value of the features. Lastly, you can check the standard errors by fitting with the OLS method.
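The "loop and check correlation" idea from the replies above could look roughly like this. It is only a sketch with NumPy on made-up data; with a real dataset you would loop over your own continuous columns instead.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# Hypothetical data: 3 continuous features, only the first two drive y.
X = rng.normal(size=(n, 3))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

correlations = []
for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], y)[0, 1]  # Pearson correlation of feature j with y
    correlations.append(r)
    print(f"feature {j}: r = {r:+.3f}")
```

Pearson correlation only captures linear, one-feature-at-a-time relationships, so a low value here is a hint to investigate rather than proof that the feature is useless.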
Great explanation Thank you
What does this mean that R square will always increase when feature is added. This means when features are increased predictions are better. Is it so?
No bro, That will depend whether the features getting added are correlated or not. If the features getting added are not correlated with the target variable then the adjusted R square will decrease, however if they are correlated then naturally adjusted R square will also increase.
Adding more features will automatically increase the R-square, since adding features decreases the value of SSres, even if the feature is not related to the output variable. A model with many added features can perform better in sample than when tested out of sample. So in such cases adjusted R-square helps.
very well explained
i am not 100% sure if this is correct when you say it needs to be squared (Actual - Predicted) because of negative value but i suspect its for the outliers
Good explanation, but it would be better to add an example. That way it will become more clear :)
Please see if this could help you
ua-cam.com/video/3SoK930HWL0/v-deo.html
Sir, at the end of the video you said that r^2 will never decrease as independent features are added, even if a feature is not correlated. Then how can you say that adjusted R^2 will decrease when R^2 is less (at 14:16)? That can never be true if R^2 is always increasing, so how can it be less? This has actually confused me. Please help if anyone knows.
Yup I also have the same problem
1) If the added features are correlated with the target, R2 grows much faster than the penalty from the denominator term containing the number of features (p). Hence Adj. R2 also increases.
2) If the added features are not correlated or only weakly correlated with the target, R2 grows slowly compared to the penalty from the denominator term containing the number of features (p). Hence Adj. R2 will increase only a little, and it will decrease if the gain in R2 from one added feature is smaller than (1 - R2)/(N - p - 1), where R2 and p are the values before adding it. That is what is called being penalized: it is not allowed to grow at the same rate as in the correlated-features case.
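To see where the tipping point lies, here is a rough sketch. The break-even gain (1 - R2)/(N - p - 1) follows from setting the new adjusted R2 equal to the old one. The function names and the numbers (N=1000, p=5, R2=0.85) are assumptions for illustration.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def break_even_gain(r2_old: float, n: int, p_old: int) -> float:
    """R^2 gain at which adding one feature leaves adjusted R^2 unchanged."""
    return (1.0 - r2_old) / (n - p_old - 1)


n, p_old, r2_old = 1000, 5, 0.85  # hypothetical starting model
old_adj = adjusted_r2(r2_old, n, p_old)
threshold = break_even_gain(r2_old, n, p_old)

for gain in (0.0001, 0.001, 0.01):
    new_adj = adjusted_r2(r2_old + gain, n, p_old + 1)
    verdict = "falls" if new_adj < old_adj else "rises"
    print(f"R^2 gain {gain}: adjusted R^2 {verdict} "
          f"(break-even gain is about {threshold:.6f})")
```

With a large N the break-even gain is tiny, so the penalty mostly bites on small datasets or when many weak features are added at once.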
Nicely explained... Can you help me with difference between Sum of Residual and Cost function? Looks like both have same formula.
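Actually they are closely related. The sum of squared residuals is the sum of the squared differences between the predicted and actual data points, and the usual cost function (mean squared error) is that same sum divided by the number of points, so minimizing one minimizes the other.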
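A tiny example of the relationship, with made-up numbers. The cost function is taken to be the mean squared error, which is the common choice for linear regression, though some courses also add a factor of 1/2.

```python
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.0, 9.5]

# Sum of squared residuals (SSE / SSres).
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Mean squared error: the same sum, averaged over the number of points.
mse = sse / len(actual)

print(sse, mse)  # 1.75 0.4375
```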
@@ayushmishra-sw4po Thanks Ayush!!!
Great explanation Sir!
Sir SSR means sum of squares of residuals.
Wonderful Explanation !!
i just wanna know this total sample size is total number of columns or total number of rows
sample size is total number of rows. predictors are total number of columns
I didn't understand anything
HOW U TOOK AVERAGE LINE IN GRAPH (ON WHAT BASIS?)
It's simply the arithmetic mean of target variable's "actual" values.
What do we do next if we get to know that r-square is small ? Yeah it says the model isn't a good fit but is there any way we can improve the model after getting to know the r squared is less or we use some other method to solve this model
Hyperparameter tuning
How we can say adj r square is significant or not
what is the meaning of penalize
Hey, I didn't get the term Penalizing. In the video just before explaining Adjusted R square, it was said that "it is not Penalizing the new added features". Can someone please elaborate.
What are these 33 dislikes for ? Is your language different :-D, Awesome explanation Krish, hats off
maybe in search of hindi content
Thank you sir🙏
very useful video
Thanks .. Explained beautifully
Fantastic course!. I hope you doing well sir .
Can R square be considered as training accuracy?
Yes, it is a performance metric. In practice, adjusted R-squared is used more often
This is the problem with our education system...everything is just formula based...you started off with the formula without even giving any intuition about what actually R2 and adjusted R2 mean...what does a 50% R2 tell you...formula and maths always come last...you should first make your students visualize what these terms mean without using any maths at all...once they are good with it...then you bring the formula
Thank you Krish that's the good explanation.
Sir, what is the meaning of penalize in terms of machine learning?
Here penalize means that when we add an extra predictor which is of no use, it will decrease the value of Adjusted R sq
@@ayushmishra-sw4po thank you so much
Awesome
Why does the R2 value not decrease when features are added? Is there any theory behind it?
Yes. With least squares, the model can always give the new feature a coefficient of 0 and reproduce the old fit, so SSres can only stay the same or go down, which means R2 will either remain the same or increase.
Amazing!!!!!
Hi Krish, can you please suggest how to explain the algorithm in an interview?
Do they ask about algorithms in interviews?
@@bhavyaparikh6933 yes
Still not clear for me, can anyone help me out.
In case of un-correlated or correlated variable, If p increases then N will also increase, R2 obviously increase, then how its penalizing?
N is constant here because it is the number of samples, whereas p is the number of predictors.
In which condition will SSR be greater than SST?
As we increase the number of independent features, the value of SSR will increase
If the model's prediction is worse than the average prediction we assumed in SST
superb
thank you so much...It helped
If these two are different, then why does everyone say that R-square and adjusted R-square are the same, and why do we always look at the adjusted R-square when reading the output?
R-Squared and Adj R-Squared are NOT the same.
For Simple Linear Regression, the R-Squared and Adj. R-Squared values will almost be similar. You can just check the R-Squared value to evaluate your model's goodness of fit.
For multiple Linear Regression, you will find that no matter what, the R-Squared value will keep increasing as you add new features (even if the new feature is not correlated to the dependent variable). This leads you to believe that the new feature (independent variable) you've added is contributing to building a better model, which is not the case. The adjusted R-Squared function provides a penalty mechanism that reduces the overall value if the new feature is not contributing to the model. This metric is usually considered to evaluate the goodness of fit (in the case of Multiple Linear Regression), especially when you're using a Feature Selection method like Step-Wise Regression.
Since R Square is the squared value of r, then how it will get a negative value.
R square always 0 to 1. It will never ever be a negative number
There is no such value of R, only R Square is the terminology used for this formula. Check out the formula for R Square.
R is the Correlation Coefficient
R squared can be a negative value if the model is worse than average best fit line.
very well explained, thank you sir.
not a satisfactory explanation as to how R adjusted takes care of non correlated value, just hacking the formula doesnt make it very clear. The intuition and the reason for adding sample size is not explained properly.
Overall not a good explanation
Little Confusing for the use of Adjusted Rsquare !.. So when we add more independent variables to model, the Rsquare will always make sure to increase, then Adjusted Rsquare checks if independent variables is not correlated to the target variable and minimize Rsquare value.
Does that mean while feature selection, we should take those independent features that are correlated to target/output variable and drop other..?
Aren't we supposed to take those independent variables in model that are not correlated with each other and they are independent, so why penalizing them which are not correlated !! For independent variables that are correlated, we could drop them !
When will you stop saying "particular"?
Correct yourself: R-squared = SumSquareRegression/SumSquareTotal, and this quantity cannot be negative.
SST = SSR + SSE.
So SST > SSE, and there is no chance of R-squared being negative. This is what happens when you teach without having a good understanding of the concepts behind them. You have more than 150K subscribers; do not mislead them.
From a mathematical standpoint, R-square is the ratio of the variation explained by the model to the variation in the data
R^2 compares the fit of the chosen model with that of a horizontal straight line (the null hypothesis). If the chosen model fits worse than a horizontal line, then R^2 is negative. Note that R^2 is not always the square of anything, so it can have a negative value without violating any rules of math. R^2 is negative only when the chosen model does not follow the trend of the data, so fits worse than a horizontal line.
Example: fit data to a linear regression model constrained so that the Y intercept must equal 1500.
i.stack.imgur.com/CHpzE.png
The model makes no sense at all given these data. It is clearly the wrong model, perhaps chosen by accident.
The fit of the model (a straight line constrained to go through the point (0,1500)) is worse than the fit of a horizontal line. Thus the sum-of-squares from the model (SSreg) is larger than the sum-of-squares from the horizontal line (SStot). R^2 is computed as 1 - SSreg/SStot. When SSreg is greater than SStot, that equation computes a negative value for R^2.
With linear regression with no constraints, R^2 must be positive (or zero) and equals the square of the correlation coefficient, r. A negative R^2 is only possible with linear regression when either the intercept or the slope are constrained so that the "best-fit" line (given the constraint) fits worse than a horizontal line. With nonlinear regression, the R^2 can be negative whenever the best-fit model (given the chosen equation, and its constraints, if any) fits the data worse than a horizontal line.
Bottom line: a negative R^2 is not a mathematical impossibility or the sign of a computer bug. It simply means that the chosen model (with its constraints) fits the data really poorly.
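A small numeric illustration of the same point, using the 1 - SSres/SStot form of R^2 and a deliberately bad "model" that ignores the data. The numbers are invented for the example.

```python
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
bad_predictions = [10.0] * len(actual)  # hypothetical model: always predicts 10

mean_y = sum(actual) / len(actual)

# Squared error of the bad model versus squared error of the horizontal mean line.
ss_res = sum((a - p) ** 2 for a, p in zip(actual, bad_predictions))
ss_tot = sum((a - mean_y) ** 2 for a in actual)

r2 = 1.0 - ss_res / ss_tot
print(ss_res, ss_tot, r2)  # 255.0 10.0 -24.5
```

The model's squared error (255) is far larger than that of simply predicting the mean (10), so R^2 comes out negative. An unconstrained least-squares fit with an intercept cannot do this on its own training data.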
This person has put in a great degree of time and effort, which is an indication of his passion. The reason he has 150K subscribers is that the followers are able to make sense of what he is saying. And dude, logically, what will he gain by misleading them? Is he preaching some religion???? I checked your YouTube channel...surprised that you are commenting without having uploaded a single video?? I recommend that first of all we learn to appreciate the person, and even if there is a mistake in something he is saying (to err is human!), let's show some humility in pointing it out.
@@jagannathgirisaballa Hi, I understand that you have no idea about ML or stats. I don't need to upload videos to comment on others' videos. Anyway, I have a PhD in ML/Computer Vision. I don't want to get into a fight with you. Chill and follow his videos.
Buddy chill...whatever I explain is based on the practical experience...so that means I have proof of everything I do. Any how u r highly qualified, I think u should share your knowledge with everyone...I would also love to see some implementations from your end..and Yes I do not mislead anyone..You can check my linkedin profile, and these videos have helped people to clear interviews. Anyhow it has not helped you, I am sorry about it. So in conclusion misleading is a very wrong term to use over here. Being a highly qualified guy like you, it doesn't suit you at all. Cheer stay safe and healthy. I would also suggest u to go through this link
stats.stackexchange.com/questions/12900/when-is-r-squared-negative
@@machinelearningchefs3525 bro, I will be the first person to accept that I have no idea of ML or stats. And that's my excuse of being here and watching the video. So, bro with a PhD, whats your excuse of being here and watching the video? Checking out the opposition? :-) anyways, peace brother. I am here for learning and would love to learn from anyone..apologies if my comment hurt your feelings. not intentional.
Thank you Krish. Nice explanation.
Very well explained
Thankyou so much sir
Thanks...very well explained.