Super clear and very helpful. I am so glad to find this video. Thank you!
Thanks for the insight... This was amazing...
Hey Spencer, I love your videos! Your channel is among the most insightful in all of data science YouTube. My favorite video of yours is the pairs trading one from a few months back. You mentioned future videos on seasonality and other finance-related topics at the end of that video. Do you still have plans to cover more financial topics?
Thanks for watching! :)
Oh for sure! If there is a demand for it, then by all means, I can make future videos surrounding that material. I can note down some additional financial applications around the idea of financial trading. I'll make a note for future content.
@@SpencerPaoHere looking forward to it!
Hi Spencer, your video is super helpful! Could you perhaps explain more about what PC1 and PC2 capture in the final bit of the video?
Glad you like it!
The components you are referring to represent the newly transformed features in a different feature space. Those two components explain some percentage of the variance of the original data and can be used in place of the original features for classification or regression type problems.
I hope that helps!
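To make "some percentage of the variance" concrete, here is a minimal sketch in Python with NumPy (not the R code from the video). The toy data and all variable names are made up for illustration; it only shows how the share of variance per component and the transformed features (scores) can be computed.

```python
import numpy as np

# Toy data (hypothetical): 200 rows, 3 features, the first two strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Standardize each column, then take the SVD of the standardized data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Squared singular values are proportional to the variance along each component,
# so their normalized values give the share of variance explained by PC1, PC2, ...
explained_ratio = s**2 / np.sum(s**2)

# Scores: the same rows expressed in the new feature space (columns are PC1, PC2, ...).
scores = Z @ Vt.T

print("share of variance per component:", np.round(explained_ratio, 3))
```

The first two columns of `scores` would be the "PC1" and "PC2" that stand in for the original features in a later regression or classification step.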
Very good. I am from Brazil.
Hi. Thank you for your video. It is very informative.
I do have one question, though. When performing principal component regression, how can we calculate the regression coefficients for the original variables? If possible, could you send me the script file or the do-file for implementing these estimates?
I have a GitHub repo that hosts the code:
github.com/SpencerPao/Data_Science/tree/main/Principal%20Components/PCR
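For the coefficient question, a rough sketch of the usual back-transformation, written in Python with NumPy rather than taken from the repo above: regress y on the first k component scores, then map those coefficients back through the loadings and undo the standardization. The data, `k`, and all names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: 4 predictors on different scales and a noisy linear response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * [1.0, 5.0, 0.5, 2.0] + [10.0, 0.0, -3.0, 7.0]
y = 2.0 + X @ np.array([1.0, -0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=100)

k = 2  # number of components kept (an arbitrary choice here)

# Standardize, then get loadings and scores from the SVD.
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sd
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V_k = Vt[:k].T          # loadings, shape (4, k)
T = Z @ V_k             # component scores, shape (100, k)

# Ordinary least squares of y on the scores, with an intercept.
A = np.column_stack([np.ones(len(y)), T])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
gamma0, gamma = coef[0], coef[1:]

# Map back: coefficients on standardized variables, then on the original scale.
beta_std = V_k @ gamma
beta = beta_std / sd
intercept = gamma0 - mu @ beta

# Both forms of the model give the same fitted values.
pred_from_scores = A @ coef
pred_from_original = intercept + X @ beta
```

Because fewer than all components are kept, `beta` will generally differ from the plain least-squares coefficients; that shrinkage is the point of PCR.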
What would you suggest if you have categorical variables in your dataset? I mean, how does PCA deal with them?
You'd want to one-hot encode your categorical variables! Then you can run PCA on the dataset.
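As a small illustration of that suggestion, here is a sketch in Python with NumPy. The columns `income` and `region` are hypothetical, and the encoding is done by hand so the example depends on nothing beyond NumPy.

```python
import numpy as np

# Hypothetical toy data: one numeric column and one categorical column.
income = np.array([30.0, 45.0, 52.0, 38.0, 61.0, 47.0])
region = np.array(["north", "south", "east", "south", "north", "east"])

# One-hot encode: one 0/1 indicator column per category level.
levels = np.unique(region)  # sorted unique levels
one_hot = (region[:, None] == levels[None, :]).astype(float)

# Combine numeric and encoded columns, standardize, and run PCA via the SVD.
X = np.column_stack([income, one_hot])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T
```

The indicator columns always sum to one, so one of them is redundant and the last singular value comes out at essentially zero. Dropping one level per categorical variable before PCA is a common way to avoid that.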
Hey Spencer, are you familiar with something called the synthetic control method? It's a technique from econometrics that's become pretty popular over the years for causal inference.
I am just reading about it, and this is a fascinating subject!
@@SpencerPaoHere reason I asked, is cuz apparently.... you can use PCR to de-noise an outcome matrix, and then impute counterfactuals from it using SCM. I don't know if you'd have access to it, but you should look up a paper called "Using Synthetic Controls" by Alberto Abadie, published in the Journal of Economic Literature.
@@jaredgreathouse3672
I believe the paper is linked here: economics.mit.edu/files/17847
I'll dig a little deeper on this subject, but yes! This is an intriguing topic. I didn't realize that this method was commonly used in many areas. Might be an interesting video topic!
Hi Spencer,
I wonder if using this approach, as opposed to the princomp() function and package, would be sufficient to find the number of principal components in PCA?
Thank you!
Yep! Both are fine methods. You can use one or the other.
Hi Spencer!
Great video. I have a question:
Can I fit the results of a FAMD from the package FatMineR into this model?
If it is possible, do you know of any example of how to do it?
I am not familiar with that particular package. However, I can imagine that you can utilize the predictions of PCR with any other package. You can save the predictions as a data frame (for example) and use it as an input for another function.
FactoMineR, you mean? Yeah, it should work.
Hi Spencer, great video thanks! I just had a question. I know that with PCA you can also visualize the correlation of supplementary variables (not used in building the dimensions) with the dimensions. So if you find that your dependent variable (i.e. Life expectancy) is highly and significantly correlated with a subset of the PCA dimensions...why would you need to do a regression with the principal components in addition?
I would really appreciate this clarification, thanks a lot!
If you have an independent variable that is essentially a cofactor to your dependent variable, I'd say that is highly suspicious.
But the overall idea is that you'd want to have predictive capabilities using PCA. So, if you ever want to place this model in production, you will have to follow a succinct pattern:
1. Transform features using the PCA model.
2. Plug the PCA output into the regression model (assuming that model has already been trained).
3. Get predictions for whatever you are trying to do.
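The three steps above can be sketched as follows, in Python with NumPy. The helper names `fit_pcr` and `predict_pcr` are hypothetical and the data is simulated; the key point is that new data is transformed with the means, scales, and loadings learned at training time, never refitted.

```python
import numpy as np

def fit_pcr(X, y, k):
    """Fit PCA on X, then regress y on the first k component scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V_k = Vt[:k].T
    A = np.column_stack([np.ones(len(y)), Z @ V_k])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mu": mu, "sd": sd, "V_k": V_k, "coef": coef}

def predict_pcr(model, X_new):
    """Apply the stored PCA transform, then the stored regression."""
    # Step 1: transform features with the PCA fitted on the training data.
    scores = ((X_new - model["mu"]) / model["sd"]) @ model["V_k"]
    # Steps 2 and 3: plug the scores into the trained regression and predict.
    return model["coef"][0] + scores @ model["coef"][1:]

# Simulated training data and a batch of new rows to score.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(150, 5))
y_train = 1.0 + X_train @ np.array([2.0, -1.0, 0.5, 0.0, 3.0])
X_new = rng.normal(size=(10, 5))

model = fit_pcr(X_train, y_train, k=3)
preds = predict_pcr(model, X_new)
```

In a real deployment the `model` dictionary is what would be saved and loaded, so that scoring new rows needs only `predict_pcr`.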
@@SpencerPaoHere Thank you !
Hi Spencer! I am a beginner at R and have to use a PCA for my school project, hoping you can help :)
I have parental language proficiency scores in 4 domains (understanding, speaking, reading, writing) --> I have done a PCA on these and it resulted in 2 factors. PC 1 --> reading, writing. PC2 --> understanding, speaking.
Now I would like to check if PC1 and PC2 are correlated with another variable, language use.
How should I proceed?
It seems that your variables are categorical? If so, try to run the chi-square test to see correlation between variables.
If it's categorical vs. numerical, try running the one-way ANOVA test and analyze from there.
amazing voice
Hey Spencer, Thanks for making and sharing this, it is much appreciated!
I have a question:
I have 180 variables on human body movement. I want to reduce the size of the dataset while keeping as much variability as possible, hence me using PCA. However, I have no dependent variable! What does this mean? As far as I know I can't use the same methodology you used in this video, since you used life expectancy as your dependent.
Is PCA still applicable here?
For Principal Component Regression, you will need a dependent variable since it is a regression.
For Principal Component Analysis, nope! You don't need a dependent variable. You can check out the PCA video here:
ua-cam.com/video/uNJBBpyss50/v-deo.html
@@SpencerPaoHere thank you!
@@SpencerPaoHere Therefore, can I not test the validity of my PCA transformation compared to the original dataset?
@@kakabudi You most definitely can. But you'd just need to follow the same data transformation process and compare with the original dataset.
@@SpencerPaoHere Sorry, but what method could I use to compare the transformed data to the original data? I am mostly only familiar with making comparisons using linear regression methods, and without that I am admittedly lost as to how to compare them.
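One common way to make that comparison without a dependent variable is reconstruction error: project the data onto the first k components, map it back to the original space, and measure how far the result is from the original. The sketch below is in Python with NumPy on simulated data; the choice of `k` and all names are assumptions for illustration, not something from the video.

```python
import numpy as np

# Simulated data: 6 observed variables driven by 2 underlying factors plus noise.
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 6))

# Center the data and compute the principal directions via the SVD.
mu = X.mean(axis=0)
Xc = X - mu
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2  # number of components kept (hypothetical choice)

# Project onto the first k components, then map back to the original variables.
scores = Xc @ Vt[:k].T
X_hat = scores @ Vt[:k] + mu

# Two summaries of how much was lost by keeping only k components.
mse = np.mean((X - X_hat) ** 2)
variance_kept = np.sum(s[:k] ** 2) / np.sum(s ** 2)

print("mean squared reconstruction error:", mse)
print("share of variance kept:", variance_kept)
```

Repeating this for several values of `k` shows how quickly the error drops, which is the same information a scree plot conveys.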
Hi Spencer, I tried to look for your video on the topic of scree plots in R, with no luck. Would you be happy to send me a copy, please? Thank you muchly.
Hi! Check out my PCA video here: ua-cam.com/video/uNJBBpyss50/v-deo.html
I go over the scree plot topic more in depth there.