Hi, once you have the coefficients for PCR, the equation will be something like y = b0 + b1*PC1 + b2*PC2 ... Now, when you deploy this model in production, how do you convert this equation into an equation in the original variables? Do we calculate the PC1 value from the original data-set values and then feed it into the above equation?
thank you. The best I have seen on PCA.
Is it possible to test the significance of the coefficients, instead of just getting the betas, when you multiply the rotation matrix by the coefficient matrix?
Thank you so much for your video Chris! That was truly one lifesaver!
I want to run a regular regression and compare that model with a PCA regression that uses more components. How can I do it?
Do you not need to rescale the values from production after you get back to the original coefficients?
How to make a PCA with three components?
This is great. at 13:53 you find the coefficients of the original variables.
At that point, are you saying that this creates a function essentially:
bodyfat = -4.49(Weight) -0.36(Chest) +10.41(Abdomen) - 0.46(Hip) + 1.06(Thigh) +0.5(Biceps)
Is there a way to get the coefficients for the real (not standardized) data?
Why is the R-squared the same for both the PCA regression and the linear model?
The PCA model is the same as the linear model, but uses "rotated" variables, that is, new variables that are just linear combinations of the original variables. Thus, as long as all of the components are kept, the PCA model will always give the exact same fit as the original model. The goal is to then pick some of the new variables to eliminate because they have little impact on the fit, leaving only the "principal" components.
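To make the point about identical fits concrete, here is a minimal sketch in Python with NumPy (the video itself is not reproduced here). The data are synthetic stand-ins, not the body fat data, and the helper name r_squared is made up for illustration. It fits a regression on the standardized predictors, on all of the component scores, and on only the first three scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6

# Synthetic, correlated predictors (made-up data, not the body fat data set).
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Standardize each predictor (center, then divide by the sample sd).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The rows of Vt are the principal directions; V is the rotation matrix.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V = Vt.T
scores = Z @ V  # component scores: the "rotated" variables


def r_squared(design, response):
    """R^2 of a least-squares fit with an intercept (hypothetical helper)."""
    A = np.column_stack([np.ones(len(response)), design])
    coef, *_ = np.linalg.lstsq(A, response, rcond=None)
    resid = response - A @ coef
    total = np.sum((response - response.mean()) ** 2)
    return 1.0 - (resid @ resid) / total


r2_original = r_squared(Z, y)                # regular regression
r2_all_pcs = r_squared(scores, y)            # regression on every component
r2_first_three = r_squared(scores[:, :3], y) # regression on a subset

print(r2_original, r2_all_pcs, r2_first_three)
```

With every component kept, the two R-squared values agree up to rounding, because the scores span the same column space as the standardized predictors. The fit can only differ, and only get worse or stay equal, once some components are dropped.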
Hey sir, I have a question: why don't you split your data into test and train sets? I'm also wondering, if I work on the same data with a variety of machine learning models, should I use the PCA data set for all of them, or is it just for one model like linear regression?
For a discussion of data splitting, see Lecture 62: Building Models. ua-cam.com/video/9mRCOnRbGTw/v-deo.html
Thanks Professor for the informative video. I still wonder how to interpret the results!!!
This was great, beautifully explained.
Hi Chris, excellent video! Just a quick question, now that you have your PCs, how do you relate them back to the original variables? How do I know which variables impact the response? Thanks
You can go back and forth between the original variables and the principal components, since the PCs are linear combinations of the original variables. But PCA doesn't tell you which of the original variables are significant - you get that information from a regular regression. Instead, you want to find out which principal components are significant (that is, which linear combinations of the original variables have the greatest impact on the output).
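Going back and forth can be sketched as follows, which also touches the questions above about deployment and unscaling. This is only an illustration under stated assumptions: the data are synthetic, the first k components are kept for simplicity (the video kept PCs 1, 2, 4 and 5; any subset of columns works the same way), and every name here (V_k, beta_raw, predict_raw, predict_via_scores) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 150, 6, 4

# Synthetic raw predictors with a non-zero mean (made-up data).
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p)) + 50.0
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Scaling constants learned from the training data; reuse them in production.
mu = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)
Z = (X - mu) / sd

# Loadings (rotation matrix) for the k components that are kept.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V_k = Vt.T[:, :k]
scores = Z @ V_k

# Regression of y on the kept component scores: y = b0 + gamma . PC
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, gamma = coef[0], coef[1:]

# Back to the standardized variables: rotation matrix times PC coefficients.
beta_scaled = V_k @ gamma
# Back to the raw units: undo the division by sd, then fix the intercept.
beta_raw = beta_scaled / sd
intercept_raw = b0 - mu @ beta_raw


def predict_raw(X_new):
    """Predict directly from raw, unscaled predictor values."""
    return intercept_raw + X_new @ beta_raw


def predict_via_scores(X_new):
    """Predict by scaling, rotating to PC scores, then applying gamma."""
    return b0 + ((X_new - mu) / sd) @ V_k @ gamma


print(np.allclose(predict_raw(X), predict_via_scores(X)))
```

Both routes give the same predictions, so in production you can either scale new data with the stored training means and standard deviations and compute the scores, or fold everything into one equation in the raw variables. What must not change is the set of scaling constants and loadings: they come from the training data, not from the new data.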
Hi, very informative video, and helpful for me. I just have one question: how do you know which variables make up PC1, which make up PC2, etc.?
Thank you for video, really helpful. especially for last night study
Dear Professor, many thanks for this tutorial, it is very helpful. However, I still find it hard to understand what we can say about the effect of our predictors on the dependent variable. What is the next step after this PCA procedure? Since each PC is composed of all of the predictors, how can we know which of them are relevant for, in this case, body fat (chest, abdomen, hip, etc.)? And which PC would we rely on (I assume PC1, as it is the most explanatory, but the others are also significant)?
PCA will not tell you which individual variable (chest, hip, etc.) is or is not significant. It will only tell you which principal components are significant (using the standard tests for significance).
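As a rough sketch of what the standard tests look like for the components, the t statistic for each PC coefficient can be computed with NumPy alone. The data and the response below are made up, and turning the statistics into p-values would need a t distribution from a statistics library, which is left out here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 120, 4

# Synthetic correlated predictors, standardized (made-up data).
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T

# Made-up response that depends on PC1 and PC2 only, plus noise.
y = 2.0 * scores[:, 0] - 1.0 * scores[:, 1] + rng.normal(size=n)

# Ordinary least squares on the component scores, with an intercept.
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Usual OLS standard errors: sigma^2 * (A'A)^-1 on the diagonal.
resid = y - A @ coef
dof = n - A.shape[1]
sigma2 = (resid @ resid) / dof
cov = sigma2 * np.linalg.inv(A.T @ A)
t_stats = coef / np.sqrt(np.diag(cov))

for name, t in zip(["intercept", "PC1", "PC2", "PC3", "PC4"], t_stats):
    print(f"{name}: t = {t:.2f}")
```

Because the scores are centered and mutually uncorrelated, the covariance matrix of the estimates is diagonal, so dropping a non-significant component leaves the coefficients of the remaining components unchanged. That is not true of the original, correlated variables.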
Thanks so much! Very easy to follow and understand.
Hi Chris, this was very informative and helpful. I just have one question, as we are using scaled data and when specifying the model coefficients in terms of original variables, how can we unscale the coefficients of the model?
Ridge Regression in R, please (the effect of ridge regression on multicollinearity).
First of all, thanks Chris for the time you spend teaching statistics... Students need more people like you.
I wanted to ask you something. In this video you explain how to make a regression on Principal Components, don't you? My doubt is how to know if some of the explanatory variables are meaningless with this procedure. That is, in your example, you finally find the coefficients for the 6 explanatory variables you used (weight, chest, abdomen, hip, thigh and biceps) after making a linear model among bodyfat (Y) and the Principal Components 1, 2, 4 and 5 (they were significant). You find the coefficients -4.48, -0.35, 10.40, -0.45, 1.05 and 0.50 for weight, chest, abdomen, hip, thigh and biceps respectively. Can we know if some of those coefficients are non-significant, and thus, they are not related to bodyfat? Because in the video you test the significance of the coefficients for the model between Y (bodyfat) and the Principal Components, but not for the coefficients that you get later for the explanatory variables. Or due to the method (Regression on Principal Components), we can't exclude any of the explanatory variables? I'm just curious about how to interpret this.
Thanks in advance!!!! :)
If you want to remove a non-significant predictor variable, principal component analysis will not be of much help. See lecture 53 for more details.
AWESOME!
Thank you very much for the explanation!!!!!
thanks for the video
Hey Sir, I urgently need help from you. Can you help me, please? I will remember you in my prayers ...
thank you
ok but pca is useful for linear problems ... most problems in machine learning or real problems ....
Of course, not every problem involves linear modeling (linear in the coefficients of the model), but many do. And these are real problems, important problems. For nonlinear modeling, different approaches are required.
super sir