Principal Component Regression in R

  • Published 26 Jul 2024
  • ===== Likes: 152 👍: Dislikes: 0 👎: 100.0% : Updated on 01-21-2023 11:57:17 EST =====
    Understand Principal Component Analysis? Cool! So, how do I use PCA for Machine Learning purposes? Well, look no further! I go into depth on how to utilize the linear transformations of PCA for any machine learning model!
    Github:
    github.com/SpencerPao/Data_Sc...
    PCA:
    • Applied Principal Comp...
    XGBoost Regression: Theory and Application!
    • Understanding and Appl...
    XGBoost Classification: Theory and Application!
    • Understanding and Appl...
    Linear Regression: Understanding Linear Regression!
    • HOW TO: Linear Regress...
    Neural Networks: Understanding and Applying Neural Networks!
    • Understanding and Appl...
    Data Imputation: Wondering what to do with NA observations?
    • Dealing with MISSING D...
    0:00 - Principal Component Regression Summary
    1:14 - Cleaning Data for Machine Learning Models
    3:40 - Linear Regression Model (Base)
    5:35 - Principal Component Regression
    9:40 - Using Results of PCA for other Machine Learning Models (train/test)
  • Science & Technology

COMMENTS • 39

  • @fawn0213
    @fawn0213 2 years ago +2

    Super clear and very helpful. I am so glad to find this video. Thank you!

  • @shahrizalmuhammadabdillah3127
    @shahrizalmuhammadabdillah3127 9 months ago

    Thanks for the insight... This was amazing...

  • @lilmikeytheskater
    @lilmikeytheskater 2 years ago +2

    Hey Spencer, I love your videos! Your channel is among the most insightful in all of data science YouTube. My favorite video of yours is the pairs trading one from a few months back. You mentioned future videos on seasonality and other finance-related topics at the end of that video. Do you still have plans to cover more financial topics?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago +1

      Thanks for watching! :)
      Oh for sure! If there is a demand for it, then by all means, I can make future videos surrounding that material. I can note down some additional financial applications around the idea of financial trading. I'll make a note for future content.

    • @lilmikeytheskater
      @lilmikeytheskater 2 years ago

      @SpencerPaoHere looking forward to it!

  • @kexinni6864
    @kexinni6864 2 years ago +1

    Hi Spencer, your video is super helpful! Could you perhaps explain more about what PC1 and PC2 capture in the final bit of the video?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      Glad you like it!
      The components you are referring to are the newly transformed features in a different feature space. Together, those two components explain some percentage of the variance of the original data and can be used in place of the original features for classification or regression problems.
      I hope that helps!
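
To make the reply above concrete, here is a minimal sketch in base R. It assumes the built-in `mtcars` data and an arbitrary choice of four columns as stand-ins, since the video's dataset is not reproduced here. It shows where to read off what PC1 and PC2 are built from (the loadings) and how much variance each one explains.

```r
# Illustrative data only: mtcars stands in for the video's dataset.
X <- mtcars[, c("disp", "hp", "wt", "qsec")]

# Centre and scale so that no single variable dominates the variance.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: the weight of each original variable in PC1 and PC2.
loadings <- pca$rotation[, 1:2]
print(round(loadings, 2))

# Proportion of the total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores: the observations expressed in the new feature space.
scores <- pca$x[, 1:2]
print(head(scores))
```

A large absolute loading means the original variable contributes strongly to that component, which is usually the starting point for interpreting it.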

  • @rafaelguimaraes1424
    @rafaelguimaraes1424 1 year ago

    Very good. I am from Brazil.

  • @baeksudream7964
    @baeksudream7964 1 year ago

    amazing voice

  • @ishtardory
    @ishtardory 2 years ago

    Hi Spencer, great video thanks! I just had a question. I know that with PCA you can also visualize the correlation of supplementary variables (not used in building the dimensions) with the dimensions. So if you find that your dependent variable (i.e. Life expectancy) is highly and significantly correlated with a subset of the PCA dimensions...why would you need to do a regression with the principal components in addition?
    I would really appreciate this clarification, thanks a lot!

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago +1

      If you have an independent variable that is essentially a cofactor to your dependent variable, I'd say that is highly suspicious.
      But the overall idea is that you'd want to have predictive capabilities using PCA. So, if you ever want to place this model in production, you will have to follow a consistent pattern:
      1. Transform features using the PCA model.
      2. Plug the PCA output into the regression model (assuming that model has already been trained).
      3. Get predictions for whatever you are trying to do.
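
The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code from the video: `mtcars`, the chosen predictors, the 24/8 split, and `k = 2` components are all hypothetical stand-ins.

```r
set.seed(1)

# Hypothetical setup: predict mpg from five numeric columns of mtcars.
predictors <- c("disp", "hp", "wt", "qsec", "drat")
train_idx  <- sample(nrow(mtcars), 24)
train      <- mtcars[train_idx, ]
test       <- mtcars[-train_idx, ]

# Training phase: fit PCA on the training predictors only, then
# regress the response on the first k component scores.
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)
k   <- 2
train_pcs <- data.frame(mpg = train$mpg, pca$x[, 1:k])
fit <- lm(mpg ~ ., data = train_pcs)

# Step 1: transform new features with the already-fitted PCA model.
# predict() reuses the training centre, scale, and rotation.
test_pcs <- data.frame(predict(pca, newdata = test[, predictors])[, 1:k])

# Step 2: plug the PCA output into the trained regression model.
# Step 3: get predictions.
preds <- predict(fit, newdata = test_pcs)

rmse <- sqrt(mean((test$mpg - preds)^2))
print(rmse)
```

The point of fitting `prcomp()` on the training rows alone is that new data must pass through exactly the same centring, scaling, and rotation that the regression was trained on.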

    • @ishtardory
      @ishtardory 2 years ago

      @SpencerPaoHere Thank you!

  • @jaredgreathouse3672
    @jaredgreathouse3672 2 years ago

    Hey Spencer, are you familiar with something called the synthetic control method? It's a technique from econometrics that's become pretty popular over the years for causal inference.

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      I am just reading about it, and this is a fascinating subject!

    • @jaredgreathouse3672
      @jaredgreathouse3672 2 years ago

      @SpencerPaoHere The reason I asked is cuz apparently you can use PCR to de-noise an outcome matrix, and then impute counterfactuals from it using SCM. I don't know if you'd have access to it, but you should look up a paper called "Using Synthetic Controls" by Alberto Abadie, published in the Journal of Economic Literature.

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      @jaredgreathouse3672
      I believe the paper is linked here: economics.mit.edu/files/17847
      I'll dig a little deeper on this subject, but yes! This is an intriguing topic. I didn't realize that this method was commonly used in many areas. Might be an interesting video topic!

  • @fabios5524
    @fabios5524 2 years ago

    Hi Spencer!
    Great video. I have a question:
    Can I fit the results of a FAMD from the package FatMineR into this model?
    If it is possible, do you know of any example of how to do it?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      I am not familiar with that particular package. However, I can imagine that you can utilize the predictions of PCR with any other package. You can save the predictions as a data frame (for example) and use it as an input for another function.

    • @nosaosawe3158
      @nosaosawe3158 9 months ago

      FactoMineR, you mean? Yeah, it should work.

  • @cooookieraider
    @cooookieraider 2 years ago

    Hi Spencer! I am a beginner at R and have to use a PCA for my school project, hoping you can help :)
    I have parental language proficiency scores in 4 domains (understanding, speaking, reading, writing) --> I have done a PCA on these and it resulted in 2 factors. PC 1 --> reading, writing. PC2 --> understanding, speaking.
    Now I would like to check if PC1 and PC2 are correlated with another variable, language use.
    How should I proceed?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago +1

      It seems that your variables are categorical? If so, try running a chi-square test to check for an association between the variables.
      If one variable is categorical and the other is numerical, try running a one-way ANOVA and analyze from there.
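
As a rough sketch of how such a check might look in base R: principal component scores are numeric, so if the outside variable is numeric too, a correlation test is one option, and if it is categorical, the one-way ANOVA mentioned above applies. The data below are stand-ins (`mtcars`, with `mpg` as a numeric outside variable and `cyl` as a categorical one), not the language-proficiency data from the question.

```r
# Stand-in data: four numeric columns play the role of the four
# proficiency domains from the question.
X   <- mtcars[, c("disp", "hp", "wt", "qsec")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)
pc1 <- pca$x[, "PC1"]

# Case 1: the outside variable is numeric (here mpg).
# Pearson correlation between the PC1 scores and that variable.
ct <- cor.test(pc1, mtcars$mpg)
print(ct$estimate)
print(ct$p.value)

# Case 2: the outside variable is categorical (here cyl as a factor).
# One-way ANOVA: do mean PC1 scores differ between the groups?
group     <- factor(mtcars$cyl)
anova_fit <- aov(pc1 ~ group)
print(summary(anova_fit))
```

The same two calls can be repeated with `pca$x[, "PC2"]` to check the second component.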

  • @amanrastogi5184
    @amanrastogi5184 2 years ago

    What would you suggest if you have categorical variables in your dataset? I mean, how does PCA deal with them?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      You'd want to one-hot encode your categorical variables! Then you can run PCA on the dataset.
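
A minimal sketch of that suggestion in base R, assuming the built-in `iris` data as a stand-in (its `Species` column is a factor). `model.matrix()` is one way to do the encoding; other packages offer alternatives.

```r
# Stand-in data: iris has four numeric columns and one factor (Species).
df <- iris

# model.matrix() expands each factor into 0/1 indicator columns.
# Dropping the intercept ("- 1") gives the factor one column per level.
X <- model.matrix(~ . - 1, data = df)
print(colnames(X))

# Run PCA on the encoded matrix. Scaling puts the indicator columns
# and the numeric columns on a comparable footing.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(pca))
```

Note that PCA treats the indicator columns as ordinary numeric variables. Methods built for mixed data, such as the FAMD mentioned elsewhere in these comments, handle categorical variables differently.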

  • @MELVINBRO100
    @MELVINBRO100 9 months ago

    Hi Spencer,
    I wonder if using this approach, as opposed to the princomp() function and package, would be sufficient to find the number of principal components in PCA?
    Thank you!

    • @SpencerPaoHere
      @SpencerPaoHere  5 months ago

      Yep! Both are fine methods. You can use one or the other.

  • @FrancisNgoma-vr7nj
    @FrancisNgoma-vr7nj 8 months ago

    Hi. Thank you for your video. It is very informative.
    I do have one concern, though. When performing principal component regression, how can we calculate the regression coefficients from the starting values? If possible, could you send me the script file or the do file for implementing these estimates?

    • @SpencerPaoHere
      @SpencerPaoHere  5 months ago

      I have a github that hosts the code:
      github.com/SpencerPao/Data_Science/tree/main/Principal%20Components/PCR
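
Independently of the linked repository, one standard way to express principal component regression coefficients in terms of the original variables can be sketched in base R. If the predictors are standardized as Z = (X - center) / scale and the scores are Z times the first k loading vectors, then the coefficients on Z are the loadings times the component coefficients, and dividing by the scale returns them to the original units. The data, predictors, and `k = 2` below are stand-ins, not the author's script.

```r
# Stand-in data: predict mpg from four numeric columns of mtcars.
predictors <- c("disp", "hp", "wt", "qsec")
X <- as.matrix(mtcars[, predictors])
y <- mtcars$mpg

pca    <- prcomp(X, center = TRUE, scale. = TRUE)
k      <- 2
scores <- pca$x[, 1:k]

# Regression on the first k component scores.
fit   <- lm(y ~ scores)
gamma <- coef(fit)[-1]  # coefficients on PC1..PCk

# Coefficients on the standardized predictors: loadings %*% gamma.
beta_std <- drop(pca$rotation[, 1:k] %*% gamma)

# Undo the scaling to get coefficients in the original units, then
# adjust the intercept for the centring.
beta      <- beta_std / pca$scale
intercept <- unname(coef(fit)[1]) - sum(beta * pca$center)

# Check: predictions from the original-scale coefficients match the
# fitted values of the component regression.
pred_orig <- drop(intercept + X %*% beta)
print(all.equal(unname(pred_orig), unname(fitted(fit))))
print(round(beta, 4))
```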

  • @kakabudi
    @kakabudi 2 years ago

    Hey Spencer, Thanks for making and sharing this, it is much appreciated!
    I have a question:
    I have 180 variables on human body movement. I want to reduce the size of the dataset while keeping as much variability as possible, hence me using PCA. However, I have no dependent variable! What does this mean? As far as I know I can't use the same methodology you used in this video, since you used life expectancy as your dependent.
    Is PCA still applicable here?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago +1

      For Principal Component Regression, you will need a dependent variable since it is a regression.
      For Principal Component Analysis, nope! You don't need a dependent variable. You can check out the PCA video here:
      ua-cam.com/video/uNJBBpyss50/v-deo.html

    • @kakabudi
      @kakabudi 2 years ago

      @SpencerPaoHere thank you!

    • @kakabudi
      @kakabudi 2 years ago

      @SpencerPaoHere So can I not test the validity of my PCA transformation compared to the original dataset?

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      @kakabudi You most definitely can. But you'd just need to follow the same data transformation process and compare with the original dataset.

    • @kakabudi
      @kakabudi 2 years ago

      @SpencerPaoHere Sorry, but what method could I use to compare the transformed data to the original data? I am mostly only familiar with making comparisons in the context of linear regression, and without that I am admittedly lost as to how to compare them.
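
One common way to make this kind of comparison without a dependent variable is reconstruction error: project the data onto the first k components, map it back to the original space, and measure how much was lost. The sketch below is an illustration in base R on stand-in data (`mtcars`, five columns, `k = 2`), not something shown in the video.

```r
# Stand-in data: five numeric columns of mtcars.
X   <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec", "drat")])
pca <- prcomp(X, center = TRUE, scale. = TRUE)
k   <- 2

# Standardized data, using the same centre and scale as the PCA.
Z <- scale(X, center = pca$center, scale = pca$scale)

# Reconstruct the standardized data from the first k components only.
Z_hat <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])

# Map the reconstruction back to the original units.
X_hat <- sweep(sweep(Z_hat, 2, pca$scale, "*"), 2, pca$center, "+")

# Relative reconstruction error on the standardized scale.
rel_error <- sum((Z - Z_hat)^2) / sum(Z^2)

# Share of variance kept by the first k components; the relative
# error equals one minus this share.
var_kept <- sum(pca$sdev[1:k]^2) / sum(pca$sdev^2)

print(rel_error)
print(var_kept)
```

Repeating this for several values of `k` shows how quickly the error falls as components are added, which is the same information a scree plot displays.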

  • @surpriseworld1662
    @surpriseworld1662 2 years ago

    Hi Spencer, I tried to look for your video on the topic of scree plots in R, with no luck. Would you be happy to send me a copy, please? Thank you muchly.

    • @SpencerPaoHere
      @SpencerPaoHere  2 years ago

      Hi! Check out my PCA video here: ua-cam.com/video/uNJBBpyss50/v-deo.html
      I go over the scree plot topic in more depth there.