Using prcomp and varimax for PCA in R

  • Published Sep 6, 2024
  • See my new blog for R programming at rollingyours.wo...
    Best Viewed in Large or Full Screen Mode
    This video shows how to use the prcomp and varimax functions in R to accomplish a Principal Components Analysis. We cover the following steps: 1) Read in the Data, 2) Plot a Correlation Matrix, 3) Call prcomp, 4) DotPlot the PCA loadings, 5) Apply the Kaiser Criterion, 6) Make a screeplot, 7) Plot the Biplot, and 8) Apply the varimax rotation.
    Download Code from raw.githubuser...
    The example data comes from: Abdi, H., & Williams, L.J. (2010). Principal Component Analysis, Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459
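
    The eight steps listed above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes the built-in USArrests data as a stand-in for the wine data (which is not reproduced here), uses base R's dotchart in place of whatever dot plot the video uses, and the object names are illustrative.

```r
# 1) Stand-in data: built-in USArrests (50 states, 4 numeric variables),
#    NOT the wine data used in the video.
my.data <- USArrests

# 2) Correlation matrix of the variables
my.cor <- cor(my.data)
print(round(my.cor, 2))

# 3) PCA on centred, unit-variance variables
my.prc <- prcomp(my.data, center = TRUE, scale. = TRUE)

# 4) Dot plot of the loadings on the first component
dotchart(sort(my.prc$rotation[, 1]), main = "Loadings on PC1")

# 5) Kaiser criterion: count components whose variance (eigenvalue) exceeds 1
eigenvalues <- my.prc$sdev^2
n.keep <- sum(eigenvalues > 1)

# 6) Scree plot, with the Kaiser cut-off drawn as a dashed line
screeplot(my.prc, type = "lines", main = "Scree plot")
abline(h = 1, lty = 2)

# 7) Biplot of the first two components
biplot(my.prc, cex = c(0.6, 0.8))

# 8) Varimax rotation of the first two columns (two is an illustrative choice)
my.var <- varimax(my.prc$rotation[, 1:2])
print(my.var$loadings)
```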

COMMENTS • 53

  • @StevePittard
    @StevePittard  12 years ago +1

    In answer to some emails I've received about prcomp vs princomp: I recommend using prcomp instead of princomp. If you look in the help pages for princomp you will see the following: "The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp." Hence my decision to focus on prcomp.
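
    To see how close the two functions are in practice, here is a small comparison. It is only a sketch on the built-in USArrests data (an assumption, not the video's data). On the correlation matrix the two agree up to the sign of each column; on the covariance matrix the standard deviations differ by a constant factor because princomp divides by n and prcomp by n - 1.

```r
X <- USArrests   # stand-in data, 50 rows

# Correlation-based PCA
p1 <- prcomp(X, scale. = TRUE)   # SVD of the centred, scaled data
p2 <- princomp(X, cor = TRUE)    # eigen decomposition of the correlation matrix

sd.diff   <- max(abs(p1$sdev - unname(p2$sdev)))
load.diff <- max(abs(abs(p1$rotation) - abs(unclass(p2$loadings))))

# Covariance-based PCA: the ratio of standard deviations is sqrt(n / (n - 1))
p1c <- prcomp(X)
p2c <- princomp(X)
ratio <- p1c$sdev / unname(p2c$sdev)
```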

  • @jenniferf6265
    @jenniferf6265 4 years ago +1

    Thank you so much for taking the time to create this video. It was very informative, helpful and it improved my understanding of PCA.

  • @StephanMahler
    @StephanMahler 11 years ago

    Well done. I've been reading PDF docs for a while now and you cleared a bunch of that up. Thanks

  • @StevePittard
    @StevePittard  11 years ago

    Sorry for the delayed response. You can do this two ways, with the first being a "hack". The second is to write your own biplot function, which isn't that hard, but here is the first way:
    > biplot(my.prc,col=c("white","red"),cex=c(1,0.7))
    > points(my.prc$x[,1],my.prc$x[,2],pch=as.character(1:5),col=c("green","green","black","blue","blue"))

  • @noneofyoureffingbizness5806
    @noneofyoureffingbizness5806 8 years ago

    After a week of searching and twisting my brain I finally did it, thanks to your help, kind man. God bless you! Excellent video!!!

  • @rahulkala763
    @rahulkala763 6 years ago +1

    Amazing videos, all of them are super informative.
    I struggled with PCA a lot, and every other video or blog I tried was a waste of time.
    Finally I found some videos that answered all my questions and doubts.
    Thanks for these videos!

  • @andrewhoward7760
    @andrewhoward7760 6 years ago

    Great video, I've been struggling with understanding PCA for some time but have now cracked it!

  • @kareemjeiroudi1964
    @kareemjeiroudi1964 6 years ago +3

    Why didn't you leave a link to the previous videos?

  • @hannesmehrer9655
    @hannesmehrer9655 12 years ago

    Thanks for the great video. Very helpful, indeed!
    Just a small hint: at the beginning of the script the letters "li" are missing in front of "brary"

  • @marijnc.h.peters7581
    @marijnc.h.peters7581 2 years ago

    Thank you so much, this is just what I needed! :D

  • @bri1009r
    @bri1009r 7 years ago +1

    Thank you! This helped me so much with my Master's project!

  • @guvenim
    @guvenim 7 years ago

    Hi Steve, great videos, very informative. This is the first time I've come across PCA and your videos help me to understand it.

  • @StevePittard
    @StevePittard  11 years ago

    Sorry, it's been a while since I looked at this example. We have 5 observations in this data set, hence we have 5 components, each of which recombines the 7 variables. If the PCA is successful then we can capture most of the variation with maybe the first two components. In "real life", however, this is rarely the case, although highly correlated data can usually be expressed in fewer dimensions.
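
    The point about 5 observations giving 5 components can be checked on synthetic data. The sketch below assumes a random 5 x 7 matrix as a stand-in (random numbers, not the wine data). Note that after centring, the fifth component carries essentially no variance.

```r
set.seed(1)
# Synthetic stand-in: 5 observations of 7 variables
X <- matrix(rnorm(5 * 7), nrow = 5, ncol = 7)

prc <- prcomp(X, scale. = TRUE)

n.comp  <- ncol(prc$x)                            # min(5, 7) components are returned
cum.var <- cumsum(prc$sdev^2) / sum(prc$sdev^2)   # cumulative share of variance
last.sd <- prc$sdev[n.comp]                       # numerically zero: centring removes one dimension

print(round(cum.var, 3))
```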

  • @rkarunia
    @rkarunia 12 years ago

    Excellent video on PCA using R. I have a question, after the Varimax rotation, how can we redraw the biplot? I would appreciate it if you can do a similar video on Singular Value Decomposition. Thanks!

  • @vocabularybytesbypriyankgo1558
    @vocabularybytesbypriyankgo1558 4 years ago +1

    Thanks a lot, explained beautifully !!!

  • @tarsociolete2940
    @tarsociolete2940 4 years ago

    thank YOU man
    You saved me

  • @renanlolop
    @renanlolop 5 years ago

    Amazing video, Mr. Pittard. Thanks!

  • @scottrobinsonmusic
    @scottrobinsonmusic 11 years ago

    Really useful video. When you make the correlation graph at the start why do you use 'abs()' on the correlation matrix? Wouldn't this make -1 and 1 the same colour? Isn't that misleading?

  • @PriyeshPrateek
    @PriyeshPrateek 10 years ago +2

    Thank you for the tutorial, Sir.
    Please tell me one more thing: after generating the principal components, suppose I selected PC1, PC2 and PC3. Using these PCs, how can I generate the new dataset of reduced dimensionality? Please reply.
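
    One common way to get the reduced data set is to keep the first few columns of the scores that prcomp returns in `$x`. A minimal sketch, assuming the built-in mtcars data as a stand-in and three components as in the question:

```r
prc <- prcomp(mtcars, scale. = TRUE)   # stand-in data: mtcars (32 rows, 11 variables)

# prc$x holds the observations in component coordinates;
# the first three columns form the reduced data set.
reduced <- as.data.frame(prc$x[, 1:3])
head(reduced)

# New observations can be projected with the same centring and scaling
new.scores <- predict(prc, newdata = mtcars[1:2, ])[, 1:3]
```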

  • @jaanpehechaanho
    @jaanpehechaanho 9 years ago

    Thanks for the great tutorials, helping me a lot!
    I just figured out there is a little error in your code files:
    Original:
    # Look at the correlations
    library(gclus) my.abs

  • @md.jahirulislam4879
    @md.jahirulislam4879 4 years ago

    Thanks a lot. This video is very informative and clearly explained.

  • @StevePittard
    @StevePittard  12 years ago

    gclus should work with older or newer versions. I'm currently using 2.15.1 on OSX and Linux. To install gclus you can do "install.packages("gclus",dependencies=TRUE)" from the command prompt.

  • @batlin
    @batlin 12 years ago

    It's a pity this one was uploaded in low resolution - it was much easier to see what was going on in the first two videos.

  • @qualitytoolbox4872
    @qualitytoolbox4872 5 years ago

    Hi, simple and great explanation. For the variable Price, Obs 4 and 5 are contributing more. If I wanted to understand the contribution % between Obs 4 and Obs 5 for the variable Price, how do I find the distance, given that two different scales are used to plot the scores and the variables?

  • @SHEKINAHVOICING
    @SHEKINAHVOICING 8 years ago

    Hello,
    That is a good job you did right there. But I think this is more useful when the number of variables is very small, as in your example. When I tried this on data with 335 variables, the output of the "varimax" function was so long that I could not see the rotation table that shows the variables.
    My question is, how do I select the right variables in this case?
    Hope to hear from you soon. Thanks
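
    With many variables, the printed loadings can be trimmed rather than read in full. The sketch below is one possible approach, assuming the built-in mtcars data as a stand-in, three retained components, and a cutoff of 0.4 (all illustrative choices). It uses the `cutoff` and `sort` arguments of the print method for loadings.

```r
prc <- prcomp(mtcars, scale. = TRUE)   # stand-in data, 11 variables
k <- 3                                  # number of components kept (illustrative)

# Scale the eigenvectors by the component standard deviations before rotating
raw.load <- prc$rotation[, 1:k] %*% diag(prc$sdev[1:k])
colnames(raw.load) <- paste0("PC", 1:k)
vm <- varimax(raw.load)

# Blank out small loadings and group variables by the component they load on most
print(vm$loadings, cutoff = 0.4, sort = TRUE)

# For each variable, the index of the component with the largest absolute loading
L <- unclass(vm$loadings)
dominant <- apply(abs(L), 1, which.max)
table(dominant)
```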

  • @erethizon67
    @erethizon67 9 years ago

    Thanks for a great series of PCA videos. If you are using PCA to reduce a data set into a couple of variables you may then use in e.g. a multiple regression (for instance, reducing a psychometric scale to 1 - 2 vars down from 15 - 20) do you use the PCA scores as the new variables?

  • @MarinaUganda
    @MarinaUganda 3 years ago

    How do you do the biplot of the varimax rotated components?

  • @IsabelMarin1986
    @IsabelMarin1986 8 years ago +1

    Thank you very much for this very informative video, but I would like to know if there is a way to plot the PCA after the varimax rotation with RStudio. I have tried to do it the same way I do for the PCA (with either biplot or ggbiplot) but I can't, as it is not a "prcomp" object anymore (but a list). Could you help me? Thank you very much in advance.
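
    One way around the "not a prcomp object" problem is to rotate both the loadings and the standardised scores with the matrix that varimax returns, then pass the two matrices to biplot directly. This is a sketch under assumptions: USArrests stands in for the real data, two components are kept, and the "RC" column names are made up for illustration.

```r
prc <- prcomp(USArrests, scale. = TRUE)   # stand-in data
k <- 2

# Loadings in the factor-analysis sense: eigenvectors scaled by the sdevs
raw.load <- prc$rotation[, 1:k] %*% diag(prc$sdev[1:k])
vm <- varimax(raw.load)
rot.load <- unclass(vm$loadings)          # same as raw.load %*% vm$rotmat

# Standardised scores, rotated with the same orthogonal matrix
std.scores <- prc$x[, 1:k] %*% diag(1 / prc$sdev[1:k])
rot.scores <- std.scores %*% vm$rotmat

colnames(rot.load) <- colnames(rot.scores) <- paste0("RC", 1:k)

# biplot() has a default method that accepts two plain matrices
biplot(rot.scores, rot.load, cex = c(0.6, 0.8))
```

    The rotated scores in `rot.scores` are also what you would use if you only need scores after the rotation, without the plot.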

  • @estefaniagarcia2534
    @estefaniagarcia2534 1 year ago

    For this data set, how would you graph the score plot, say PC1 and PC2?
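
    A score plot can be drawn from the first two columns of `$x`. A minimal sketch, assuming USArrests as a stand-in for the wine data:

```r
prc <- prcomp(USArrests, scale. = TRUE)   # stand-in data

scores <- prc$x[, 1:2]

# Empty frame first, then row names as point labels
plot(scores, type = "n", xlab = "PC1", ylab = "PC2", main = "Score plot")
text(scores, labels = rownames(scores), cex = 0.7)
abline(h = 0, v = 0, lty = 3)
```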

  • @lizitro
    @lizitro 4 years ago

    How do you do the PCA with three components?

  • @SandeepKumar-uv6mq
    @SandeepKumar-uv6mq 9 years ago +1

    Hi Steve,
    I am new to predictive modelling and am working on a project which has 118 variables. I know that PCA is used to reduce the dimensions of the dataset to find the significant variables that can be used for modelling.
    I understood up to the comp1, comp2, comp3... table that we get after prcomp, but how should I find the exact variables to use in modelling? It seems that in the Rotation part each component has some value for every variable.
    I want to know what variables I should use for modelling in the example given in your video.
    And one more thing: the biplot was between 2 components. What if I find 3 components which are very important in terms of variance?
    Please help!

  • @TrestanBird
    @TrestanBird 12 years ago

    Thanks for the video.
    I wonder how we can reconstruct the original data after choosing the first and second principal components?
    Thanks
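
    An approximate reconstruction multiplies the retained scores by the transpose of the matching eigenvectors, then undoes the scaling and centring. The sketch below assumes USArrests as stand-in data and a scaled PCA. With only two components the result is an approximation; with all components the data come back exactly.

```r
X <- as.matrix(USArrests)                 # stand-in data
prc <- prcomp(X, center = TRUE, scale. = TRUE)
k <- 2

# Back-project the first k scores, then undo the scaling and centring
approx.scaled <- prc$x[, 1:k] %*% t(prc$rotation[, 1:k])
X.hat <- sweep(approx.scaled, 2, prc$scale, "*")
X.hat <- sweep(X.hat, 2, prc$center, "+")

# Using all components recovers the data (up to rounding)
full <- prc$x %*% t(prc$rotation)
X.full <- sweep(sweep(full, 2, prc$scale, "*"), 2, prc$center, "+")

rmse <- sqrt(mean((X - X.hat)^2))         # error from dropping components 3 and 4
```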

  • @peterg7643
    @peterg7643 8 years ago

    You're saying that the "$rotation" figures are the loadings. That's not true afaik? These are the eigenvectors. Aren't the loadings the correlations between variables and components? At least in factor analysis that is the case. So imo to get the loadings you'd have to compute either:
    cor(my.wines[,-1], my.prc$x)
    or
    my.prc$rotation %*% diag(my.prc$sdev)
    Yes, the eigenvectors give hints about the correlations of variables with components, but for interpreting the components I think it would be more elegant to use the actual correlations.
    Please correct me if I'm wrong. =)
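
    The two expressions in this comment can be compared numerically. In the sketch below (USArrests as stand-in data, PCA run with `scale. = TRUE`), the eigenvectors scaled by the standard deviations match the variable-component correlations. That equality relies on the variables being standardised; whether the video's PCA was scaled is not stated here.

```r
X <- USArrests                                    # stand-in data
prc <- prcomp(X, scale. = TRUE)

eigvec      <- prc$rotation                       # unit-length eigenvectors
load.scaled <- prc$rotation %*% diag(prc$sdev)    # eigenvectors times sdev
load.cor    <- cor(X, prc$x)                      # variable-component correlations

max.diff <- max(abs(load.scaled - load.cor))
print(round(load.cor, 3))
```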

  • @MrPhilautus
    @MrPhilautus 12 years ago

    Thanks for the post. Could you also explain how to obtain scores after Varimax rotation. Thank you.

  • @camilojarac
    @camilojarac 4 years ago

    Great video! Thank you so much!

  • @Actanonverba01
    @Actanonverba01 4 years ago

    Once again, Great job!

  • @williamqlim3296
    @williamqlim3296 8 years ago

    Thank you for putting this up!

  • @mofolotopo
    @mofolotopo 12 years ago

    Very helpful! Thank you for making this video.

  • @Rongchunhan
    @Rongchunhan 10 years ago

    Very informative! Thank you very much!!

  • @bahia112011
    @bahia112011 12 years ago

    Hi, thanks for the video, very useful. I have a question: how can we add a legend explaining the numbers 1-5 as wine1, wine2, and so on?

  • @wmccall001
    @wmccall001 11 years ago

    At 10:00, Is there a way to make the numbers 1 & 2 green, keep 3 black and 4 & 5 blue to represent groups?

  • @ronakpol1580
    @ronakpol1580 6 years ago

    Can you replace the 1, 2, 3, 4, 5 in my biplot with the names of the wines?
    If so, can you please share how?

  • @anushasabhahit
    @anushasabhahit 1 year ago

    PC1 PC2 PC3 PC4 PC5
    FA -1
    AL -1
    ER -1
    InLo -1
    MNC -1
    why am I getting the factor loadings this way? what should I do now??
    Somebody please help me...

  • @silomix
    @silomix 10 years ago +1

    Thank you very much :)

  • @felipetomko
    @felipetomko 7 years ago

    Hi Steve, Great video!
    What is the most practical way of selecting the components into a data.frame? I have to perform a pca regression while assessing mixed effect between components and other variables (categorical, etc).
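
    One practical pattern is to bind the chosen score columns to the response in a data frame and fit the model on that. The sketch below assumes mtcars as stand-in data, mpg as a made-up response, and two components. It fits a plain linear model only; a mixed-effects fit is not shown, although the same data frame could carry categorical columns for one.

```r
# Stand-in example: predict mpg from components of the other mtcars variables
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
prc <- prcomp(predictors, scale. = TRUE)

k <- 2                                            # illustrative choice
pc.df <- data.frame(mpg = mtcars$mpg, prc$x[, 1:k])

fit <- lm(mpg ~ PC1 + PC2, data = pc.df)
print(summary(fit)$r.squared)
```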

  • @ditke71
    @ditke71 10 years ago

    What command did you use for obtaining the principal components?
    'princomp' can only be used with more units than variables, which is not the case here.

  • @leonardoluizborges5139
    @leonardoluizborges5139 9 years ago

    How can I replace the numbers 1, 2, 3 and 4 in the biplot with the row names of my data?
    What's the command?
    Thanks!
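
    The biplot labels observations with the row names of the data passed to prcomp, so setting meaningful row names before the PCA is usually enough. Labels can also be supplied through the `xlabs` argument. A minimal sketch, assuming USArrests as stand-in data (its rows are already named) and abbreviated names as the custom labels:

```r
prc <- prcomp(USArrests, scale. = TRUE)   # stand-in data with named rows

# Default: observations are labelled with the row names of the data
biplot(prc, cex = c(0.6, 0.8))

# Explicit labels, here four-character abbreviations of the row names
my.labels <- abbreviate(rownames(USArrests), 4)
biplot(prc, xlabs = my.labels, cex = c(0.6, 0.8))
```

    If the names sit in a column of the data frame instead (an assumption about your data), copy that column into `rownames()` before calling prcomp.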

  • @mdddd0731
    @mdddd0731 8 years ago

    excellent!

  • @emilybian1254
    @emilybian1254 10 years ago

    helpful~

  • @Chale1288
    @Chale1288 1 year ago

    You don't show how to plot the varimax results.