Principal components analysis in R

  • Published 19 Jan 2025

COMMENTS • 209

  • @sadian3392
    @sadian3392 6 years ago +11

    I had listened to several other lectures on this topic, but the pace and the detail covered in this video are simply the best.
    Please keep up the good work!

  • @rebecai.m.6670
    @rebecai.m.6670 6 years ago +44

    OMG, this tutorial is perfection, I'm serious. You make it sound so easy and you explain every single step. Also, that is the prettiest plot I've seen. Thank you so much for this.

    • @hefinrhys8572
      @hefinrhys8572  6 years ago +1

      You're very welcome! If you like pretty plots, check out my video on using ggplot2 ;) ua-cam.com/video/1GmQ5BdAhG4/v-deo.html

  • @vplougoboy
    @vplougoboy 4 years ago +1

    No one explains R better than Hefin. Give this man a medal already!!

  • @maitivandenbosch1541
    @maitivandenbosch1541 5 years ago +9

    Never has a tutorial about PCA been so clear and simple. Thanks

  • @PhinaLovesMusic
    @PhinaLovesMusic 5 years ago +3

    I'm in graduate school and you just explained PCA better than my professor. GOD BLESS YOU!!!!

  • @HarmonicaTool
    @HarmonicaTool 2 years ago

    5-year-old video, still one of the best I've found on the topic on YT. Thumbs up

  • @Rudblattner
    @Rudblattner 3 years ago

    I never comment on videos, but you really saved me here. Nothing was working on my dataset and this went smoothly. Well done on the explanations too, everything was crystal clear.

  • @chinmoysarangi9399
    @chinmoysarangi9399 4 years ago +1

    I have my exam in 2 days and your video saved me tons of effort in combing through so many other articles and videos explaining PCA. A BIG Thank You! Hope you do many more videos and impart your knowledge to newbies like me. :)

  • @WatchMacro16
    @WatchMacro16 5 years ago +11

    Finally a perfect tutorial for PCA in RStudio. Thanks mate!

  • @jackiemwaniki1266
    @jackiemwaniki1266 5 years ago

    How I came across this video a week before my final-year project due date is a miracle. Thank you so much Hefin Rhys.

    • @mohamedadow8153
      @mohamedadow8153 5 years ago

      Jackie Mwaniki doing?

    • @jackiemwaniki1266
      @jackiemwaniki1266 5 years ago

      @@mohamedadow8153 my topic is on Macroeconomic factors and the stock prices using the APT framework.

  • @Axle_Tavish
    @Axle_Tavish 2 years ago

    Explained everything one might need. If only every tutorial on YouTube were like this one!

  • @user-kb6ui2sh5v
    @user-kb6ui2sh5v 1 year ago

    Really useful video, thank you. I've just started my MSc project using PCA, so thank you for this. I will be following subsequent videos.

  • @timisoutdoors
    @timisoutdoors 4 years ago +2

    Quite literally, the best tutorial I've ever seen on an advanced multivariate topic. Job well done, sir!

  • @tylerripku8222
    @tylerripku8222 4 years ago

    The best run-through I've seen for using and understanding PCA.

  • @shantanutamuly6932
    @shantanutamuly6932 4 years ago +1

    Excellent tutorial. I have used this for analysis of my research. Thanks a lot for sharing your valuable knowledge.

  • @johnkaruitha2527
    @johnkaruitha2527 4 years ago

    Great help, been doing my own work following this tutorial step by step... the whole night

  • @lilmune
    @lilmune 4 years ago +1

    In all honesty this is the best tutorial I've seen in months. Nice job!

  • @fabriziomauri9109
    @fabriziomauri9109 4 years ago +5

    Damn, your accent is hypnotic! The explanation is good too!

  • @jackpumpunifrimpong-manso6523
    @jackpumpunifrimpong-manso6523 4 years ago

    Excellent! Words cannot show how grateful I am!

  • @ditshegoralefeta1315
    @ditshegoralefeta1315 4 years ago +1

    I've been going through your tutorials and I'm so impressed. Legend!!!

  • @glenndejucos3891
    @glenndejucos3891 4 years ago

    This video gave my study a major leap forward. Thanks.

  • @HDgamesFTW
    @HDgamesFTW 4 years ago +1

    Best explanation I’ve found so far! Thanks mate, legend!

    • @HDgamesFTW
      @HDgamesFTW 4 years ago

      Uploaded the script as well, what a guy

  • @nrlzt9443
    @nrlzt9443 1 year ago

    Really love your explanation! Thank you so much for your video, really helpful and I can understand it! Keep it up! Looking forward to your many more upcoming videos

  • @johnmandrake8829
    @johnmandrake8829 3 years ago

    It's so funny, I don't think you realize, but myPr ("my pyaar") in Urdu/Hindi means "my love". Thank you for an amazing and extremely helpful video

  • @siktrading3117
    @siktrading3117 3 years ago

    This tutorial is outstanding. Excellent explanation! Thank you very much!!!

  • @brunocamargodossantos5049
    @brunocamargodossantos5049 2 years ago

    Thanks for the video, it helped me a lot!! Your explanation is very didactic!

  • @elenavlasenko5452
    @elenavlasenko5452 6 years ago

    I can say for sure that it's the best explanation I've ever seen!! Go on, and I would be really grateful if you made one on Time Series and Forecasting :)

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      Thanks Elena! Thank you also for the feedback; I may make a video on time series in the future.

  • @0xea31c0
    @0xea31c0 3 years ago

    The explanation is just perfect. Thank you.

  • @brunopiato
    @brunopiato 7 years ago +1

    Great video. Very instructive. Please keep making them

  • @tankstube09
    @tankstube09 6 years ago

    Very nice tutorial, nicely explained and really complete; looking forward to learning more in R with more of your vids. Thank you for the tremendous help!

  • @chris-qm2tq
    @chris-qm2tq 2 years ago

    Excellent walkthrough. Thank you!

  • @vagabond197979
    @vagabond197979 2 years ago

    Added to my stats/math playlist! Very useful.

  • @andreamonge5025
    @andreamonge5025 3 years ago

    Thank you so much for the very clear and concise explanation!

  • @lisakaly6371
    @lisakaly6371 2 years ago

    In fact I found out how to overcome the multicollinearity, by using the eigenvalues of PC1 and PC2! I love PCA!

  • @em70171
    @em70171 3 years ago

    This is gold. I absolutely love you for this

  • @arunkumarmallik9091
    @arunkumarmallik9091 5 years ago

    Thanks for the nice and easy explanation. It really helps me a lot.

  • @blackpearlstay
    @blackpearlstay 4 years ago

    Thank you so much for this SUPER helpful video. (P.S. The explanation with the iris dataset was especially convenient for me as I'm working on a dataset with dozens of recorded plant traits:D)

  • @stephravelo
    @stephravelo 3 years ago +1

    Hi, I wonder if it's possible to put a label on each point? I tried geom_text but I get an error

    • @hefinrhys8572
      @hefinrhys8572  3 years ago

      Yes, you should be able to. What have you tried? If you have a column called names with the label for each point, something like this should work:
      ggplot(df, aes(PC1, PC2, label = names)) +
        geom_text()
      Or use geom_label() if you prefer.
      You can also check out the ggrepel package if you have many overlapping points.

    • @stephravelo
      @stephravelo 3 years ago

      @@hefinrhys8572 I have 18 observations and 9 variables representing my environmental parameters. I successfully produced the ggplot figure. But I wanted to put a label on all the points in the figure to know which variables cluster together. I tried your suggestion but it gives me the numerical value, not the environmental variables. Any other suggestion?
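
      [A sketch, not from the thread: geom_text labels whatever column is mapped to label, so mapping a numeric column prints numbers. To see which variables cluster together, one option is to plot the loadings stored in myPr$rotation instead, with myPr as in the video:]

          library(ggplot2)
          myPr <- prcomp(iris[, -5], scale. = TRUE)
          loadings <- as.data.frame(myPr$rotation)   # rows are variables, columns are PCs
          loadings$variable <- rownames(loadings)
          ggplot(loadings, aes(PC1, PC2, label = variable)) +
            geom_point() +
            geom_text(vjust = -0.5)                  # variable names above the points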

  • @himand11
    @himand11 2 years ago +1

    Thank you so so much!! You just saved the day and helped me really understand my homework for predictive analysis.

  • @florama5210
    @florama5210 6 years ago +1

    It is a really nice and clear tutorial! Thanks a lot, Hefin~

  • @rVnikov
    @rVnikov 7 years ago +5

    Excellent tutorial Hefin. Hooked and subscribed...

    • @hefinrhys9234
      @hefinrhys9234 7 years ago +1

      Vesselin Nikov thank you! Feel free to let me know if there are other topics you'd like to see covered.

  • @OZ88
    @OZ88 4 years ago +1

    OK, so Sepal.Width contributes mostly (over 80%) to PC2, and the other three contribute more to PC1 (14:32), so Sepal.Width is fair enough as info to separate setosa in the next plot. Isn't it also advisable to apply PCA only to linear problems?

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      You're correct about the relative contributions of the variables to each principal component. The setosa species is discriminated from the other two species mainly by PC1, to which Sepal.Width contributes less than the other variables. As PCA is a linear dimension reduction technique, it will best reveal clusters of cases that are linearly separable, but PCA is still a valid and useful approach to compress information, even in situations where this isn't true, or when we don't know about the structures in the data. Non-linear techniques such as t-SNE and UMAP are excellent at revealing non-linearly-separable clusters of cases in data, but interpreting their axes is very difficult/impossible.
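
      [A minimal sketch of how to inspect those contributions yourself, using the video's iris example:]

          myPr <- prcomp(iris[, -5], scale. = TRUE)
          myPr$rotation            # loadings: each column is a PC, each row a variable
          cor(iris[, -5], myPr$x)  # correlation of each variable with each PC's scores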

  • @biochemistry9729
    @biochemistry9729 4 years ago

    Thank you so much! This is GREAT! You explained very clearly and smoothly.

  • @kevinroberts5703
    @kevinroberts5703 1 year ago

    thank you so much for this video. incredibly helpful.

  • @mustafa_sakalli
    @mustafa_sakalli 4 years ago

    Finally understood this goddamn topic! Thank you dude

  • @testchannel5805
    @testchannel5805 4 years ago

    Very nice, guys hit the subscribe button, the best explanation so far.

  • @metadelabegaz6279
    @metadelabegaz6279 6 years ago +2

    Sweet baby Jesus. Thank you for making this video!

  • @shafiqullaharyan261
    @shafiqullaharyan261 4 years ago

    Perfect! Never seen such an explanation

  • @harryainsworth6923
    @harryainsworth6923 4 years ago +1

    this tutorial is slap bang fuckin perfect, god bless you, you magnificent bastard

  • @mativillagran1684
    @mativillagran1684 4 years ago

    thank you so much! you are the best, very clear explanation.

  • @Fan-vk9gx
    @Fan-vk9gx 4 years ago +1

    You are really a life saver! Thank you!

  • @rockcandy28
    @rockcandy28 6 years ago +2

    Hello! Thanks for the video, just a question: how would you modify the code if you have NA values? In advance, thank you!
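
    [A common workaround, as a hedged sketch with a hypothetical data frame mydata — prcomp itself cannot handle NAs:]

        mydata_complete <- na.omit(mydata)   # drop rows containing any NA
        myPr <- prcomp(mydata_complete, scale. = TRUE)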

  • @kasia9904
    @kasia9904 1 year ago

    When I generate the PCA with the code explained at 20:46, my legend appears as a gradient rather than the separate values (as in your three different species appearing in red, blue, green). How can I change this?
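
    [Likely cause, as an assumption about this poster's data: the column mapped to colour is numeric, so ggplot2 draws a continuous gradient. Wrapping it in factor() gives discrete colours; df and Species here stand in for the video's objects:]

        ggplot(df, aes(PC1, PC2, col = factor(Species))) +
          geom_point()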

  • @EV4UTube
    @EV4UTube 3 years ago

    Can I confess something that baffles me? Because I see this all the time. OK, so you, personally, are motivated to share your knowledge with the world, right? I mean, you took time, effort, energy, focus, planning, equipment, software, etc. to prepare this explanation and exercises. You screen-captured it, you set up your microphone, you edited the video, you did all this enormous amount of work. You're clearly motivated. Yet, when it actually comes time to deliver that instruction, you think it is 100% acceptable to place all your code into an absolutely minuscule fraction of the entire screen. Like, pretty close to 96% of the screen is 'dead-space' from the perspective of the learner. The size of the typeface is minuscule (depending on your viewing system). It would be like producing a major blockbuster film, but then publishing it at the size of a postage stamp. Surely, it would be possible for you to 'zoom into' that section of the IDE to show people what it was you were typing - the operators, the functions, the arguments, etc.

    I'm not really picking on you, individually, per se. I see this happen all the time with instructors of every stripe. I have this insane idea that instruction has much, much less to do with the instructor's ability to demonstrate their knowledge to an uninformed person and much, much more to do with the instructor's ability to 'meet' the student 'where' they are and to carry the student from a place of relative ignorance (about a specific topic) to a place of relative competence. One of the best tools for assessing whether you're meeting that criterion is to PRETEND that you know nothing about the topic - then watch your own video (stripping out all the assumptions you would automatically make about what is going on based on your existing knowledge). If you didn't have a 48" monitor and excellent eyesight, would you be able to see what was being written? Like... why would you do that? If writing the code IS NOT important - don't bother showing it. If writing the code IS important, then make it (freaking) visible and legible. This really baffles me. I guess instructors are so "in-their-own-head" when they're delivering content, they don't take time to realize that no one can see what is happening. It just baffles me how often I see this.

    • @EV4UTube
      @EV4UTube 3 years ago

      If 'zooming in' is not easily achieved, the least instructors could do is go into the preferences of the IDE and jack up the size of the text so that it would be reasonably legible on a typical screen of, say, a laptop or tablet. It just seems like such low-hanging fruit, an easy fix to facilitate learning and ensure legibility.

    • @Pancho96albo
      @Pancho96albo 2 years ago +1

      @@EV4UTube chill out dude

  • @DesertHash
    @DesertHash 4 years ago +1

    At 5:50, don't you mean that if we measured sepal width in kilometers then it would appear LESS important? Because if we measured it in kilometers instead of millimeters, our numerical values will be smaller and vary far less, making it less important in the context of PCA.
    Thank you for this video.

    • @hefinrhys8572
      @hefinrhys8572  4 years ago +1

      Yes, you're absolutely correct! What I meant to say was that if that length was kilometers, but we measured it in millimeters, then it would be given greater importance. But yes, larger values are given greater importance.

    • @DesertHash
      @DesertHash 4 years ago

      @@hefinrhys8572 Alright, thanks for the reply and for the video!
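
      [A sketch illustrating the units point above: inflate one variable's scale and compare the loadings with and without scaling.]

          d <- iris[, -5]
          d$Sepal.Width <- d$Sepal.Width * 1000     # pretend a much larger unit
          prcomp(d, scale. = FALSE)$rotation[, 1]   # Sepal.Width dominates PC1
          prcomp(d, scale. = TRUE)$rotation[, 1]    # scaling restores balance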

  • @alessandrorosati969
    @alessandrorosati969 2 years ago

    How is it possible to generate outliers uniformly in the p-parallelotope defined by the
    coordinate-wise maxima and minima of the ‘regular’ observations in R?
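
    [One way, as a hedged sketch: sample each coordinate uniformly between the column-wise min and max of the regular observations; iris stands in for real data.]

        X <- as.matrix(iris[, -5])                 # the 'regular' observations
        lo <- apply(X, 2, min)
        hi <- apply(X, 2, max)
        outliers <- mapply(function(a, b) runif(10, a, b), lo, hi)  # 10 x p matrix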

  • @murambiwanyati3607
    @murambiwanyati3607 2 years ago

    Great teacher you are, thanks

  • @fatimaelmansouri9338
    @fatimaelmansouri9338 4 years ago

    Super well-explained, thank you!

  • @SUMITKUMAR-hj8im
    @SUMITKUMAR-hj8im 4 years ago

    a perfect tutorial for PCA... Thank you

  • @Ifrjicne
    @Ifrjicne 6 years ago +2

    Amazing video Hefin, there are a lot of details covered in this 27-min video; we just have to be careful not to miss any second of it. I have a question: how are the scores calculated for each PC? Why do we have to check the correlation between the variables and PC1 & PC2? What value does it add practically?

  • @salvatoregiordano2511
    @salvatoregiordano2511 4 years ago +1

    Hi Hefin,
    Thanks for this tutorial. What do we do if PC1 and PC2 can only explain around 50% of the variation? Do we also include PC3 and PC4? If so, how?
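
    [A sketch of how one might check this: look at the cumulative proportion of variance, and plot later components the same way as PC1/PC2.]

        myPr <- prcomp(iris[, -5], scale. = TRUE)
        summary(myPr)                 # see the 'Cumulative Proportion' row
        df <- cbind(iris, myPr$x)
        library(ggplot2)
        ggplot(df, aes(PC3, PC4, col = Species)) + geom_point()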

  • @sandal-city-pet-clinic-1
    @sandal-city-pet-clinic-1 5 years ago +1

    simple and clear. very good

  • @Badwolf_82
    @Badwolf_82 4 years ago

    Thank you so much for this tutorial, it really helped me!

  • @aliosmanturgut102
    @aliosmanturgut102 4 years ago

    Very informative and clear. Thanks.

  • @stephaniefaithravelo3510
    @stephaniefaithravelo3510 3 years ago

    Hi Hefin, can I put a percentage for PC1 and PC2 on the x- and y-axes? How do I do that?
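
    [A sketch, assuming the video's objects: compute each PC's share of variance from myPr$sdev and paste it into the axis labels.]

        myPr <- prcomp(iris[, -5], scale. = TRUE)
        pct <- round(100 * myPr$sdev^2 / sum(myPr$sdev^2), 1)
        df <- cbind(iris, myPr$x)
        library(ggplot2)
        ggplot(df, aes(PC1, PC2, col = Species)) +
          geom_point() +
          labs(x = paste0("PC1 (", pct[1], "%)"),
               y = paste0("PC2 (", pct[2], "%)"))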

  • @timothystewart7300
    @timothystewart7300 3 years ago

    Fantastic video Hefin! thanks

  • @mohammadtuhinali1430
    @mohammadtuhinali1430 2 years ago

    Many thanks for your efforts to make this complex issue much easier for us. Could you enlighten me on understanding group similarity and dissimilarity using PCA?

  • @Actanonverba01
    @Actanonverba01 5 years ago +1

    Clear and straightforward, good work!
    Bully for you! Lol

  • @fsxaviator
    @fsxaviator 2 years ago

    Where did you define PC1 and PC2 (where you use them in the ggplot)? I'm getting "Error: object 'PC1' not found"
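
    [Likely fix, as an assumption about this error: the scores live in myPr$x and must be bound onto the data frame passed to ggplot before PC1/PC2 exist as columns.]

        myPr <- prcomp(iris[, -5], scale. = TRUE)
        df <- cbind(iris, myPr$x)        # df now has PC1...PC4 columns
        library(ggplot2)
        ggplot(df, aes(PC1, PC2)) + geom_point()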

  • @esterteran2872
    @esterteran2872 4 years ago

    Good tutorial! I have learnt a lot. Thanks!

  • @christianberntsen3856
    @christianberntsen3856 2 years ago

    10:21 - When using "prcomp", the calculation is done by a singular value decomposition. So, these are not actually eigenvectors, right?

    • @hefinrhys8572
      @hefinrhys8572  2 years ago +1

      SVD still finds eigenvectors as it's a generalization of eigen-decomposition. This might be useful: web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

    • @christianberntsen3856
      @christianberntsen3856 2 years ago

      @@hefinrhys8572 Thank you for answering! I will look into it.
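
      [A quick numerical check of the point above: the rotation prcomp finds via SVD matches the eigenvectors of the covariance matrix, up to sign.]

          X <- scale(iris[, -5], center = TRUE, scale = FALSE)
          prcomp(X)$rotation[, 1]
          eigen(cov(X))$vectors[, 1]   # same direction, possibly sign-flipped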

  • @aminsajid123
    @aminsajid123 2 years ago

    Amazing video! Thanks for explaining everything very simply. Could you please do a video on PLS-DA?

  • @JibHyourinmaru
    @JibHyourinmaru 3 years ago

    If my biological data only has numbers (1, 2 & 3 digits) and a lot of zeros, do I need to scale it also?

  • @Sunny-China3
    @Sunny-China3 4 years ago

    Very informative video. Can you tell me: when I'm plotting the last ggplot, it showed an error. R said there is no package called digest. How do I deal with it? Kindly advise.
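
    ["there is no package called 'digest'" means a package ggplot2 depends on is missing; the usual fix is simply to install it:]

        install.packages("digest")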

  • @sandracuadros3787
    @sandracuadros3787 5 years ago +1

    Hi! I have a question: does it make sense to run a PCA on discrete data? I am trying something using your tutorial as a guide but I get a weird result in the plot, and I am wondering if it is because of the nature of my data. Thanks

    • @hefinrhys8572
      @hefinrhys8572  5 years ago

      Great question! If your data are not ordinal, you may get some use out of PCA if you numerically encode your discrete variables, but you may get more out of Multiple Correspondence Analysis (MCA) than PCA. Have a look here: www.rpubs.com/piterii/dimension_reduction
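
      [A minimal MCA sketch with the FactoMineR package — the discretised iris data here is just a stand-in for real categorical data:]

          library(FactoMineR)
          d <- as.data.frame(lapply(iris[, -5], function(x) cut(x, 3)))  # all factors
          res <- MCA(d, graph = FALSE)
          head(res$ind$coord)   # case coordinates, analogous to PC scores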

  • @AcademicActuary
    @AcademicActuary 4 years ago

    Great presentation! However, why did you not binarize the categorical variable first, and then do the subsequent analysis?
    Thanks!

  • @maf4421
    @maf4421 3 years ago +1

    Thank you Hefin Rhys for explaining PCA in detail. Can you please explain how to find the weights of a variable by PCA for making a composite index? Is it the rotation values for PC1, PC2, etc.? For example, if I have (I = w1*X + w2*Y + w3*Z), then how do I find w1, w2, w3 by PCA?
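
    [A hedged sketch of one common reading: the PC1 loadings act as the weights w1, w2, ..., and the composite index is the PC1 score.]

        myPr <- prcomp(iris[, -5], scale. = TRUE)
        w <- myPr$rotation[, 1]                 # one weight per variable
        index <- scale(iris[, -5]) %*% w        # identical to myPr$x[, 1]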

  • @jackiemwaniki1266
    @jackiemwaniki1266 5 years ago

    Thanks again. Quick one... Would you mind also doing the Fama-MacBeth analysis without using the Ken French data frame?

  • @stinkbomb13
    @stinkbomb13 3 years ago

    Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
    ???
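
    [That error usually means NA or Inf values reached prcomp. A check and a common fix, with a hypothetical data frame mydata:]

        sum(!is.finite(as.matrix(mydata)))            # how many bad cells?
        myPr <- prcomp(na.omit(mydata), scale. = TRUE)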

  • @izzuddinabdullah880
    @izzuddinabdullah880 8 months ago

    I have a question: what if I want to perform PCA on data that have not just different scales but also different units, such as data involving environmental parameters like temperature, humidity, light intensity, etc.? Will scaling the data solve this? Thank you

    • @hefinrhys8572
      @hefinrhys8572  8 months ago

      Hi, yes this is a common situation. Scaling our variables means we can use them to find meaningful principal components, irrespective of their different measurement scales. Try running PCA on your data set with and without scaling the variables; you'll likely see a big difference. Scaling is valid (and important) for variables with different measurement scales.

  • @Yosef_Guevara
    @Yosef_Guevara 4 years ago

    I have a question: to use the prcomp command, is it not necessary to transpose the matrix to do the analysis on individuals and not on variables?

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      The prcomp function assumes the columns are variables, and each row is a case. In this way, the resulting components maximise the explained variance of the original variables. I'm not sure how you would interpret the principal components if you first transposed the matrix. Try it, and see what you get.

    • @Yosef_Guevara
      @Yosef_Guevara 4 years ago

      @@hefinrhys8572 I'm really not sure, I'm confused; I'm basing this on this video:
      ua-cam.com/video/0Jp4gsfOLMs/v-deo.html
      Thanks for your answer

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      Yes so in the video you link to, the matrix they create has the cases as the columns, and the variables as the rows. This is why they use the t() function to transpose the matrix so that the columns are variables, and the rows are cases, which the prcomp function expects. Does that make sense?

    • @Yosef_Guevara
      @Yosef_Guevara 4 years ago

      @@hefinrhys8572 Well, I'm not sure if it's a problem or a confusion between the names the columns and rows are given in Spanish and the English translation. We usually put the individuals in the rows and the characteristics in the columns, but from what I understand you call the individuals variables and the cases the characteristics, am I right?
      Can you look at the following table and confirm which ones are the variables for you?
      drive.google.com/file/d/11QipxFBhlL6hoJ45_1SIU0VrKNiANndc/view
      I would really appreciate your help, I'm really confused

    • @hefinrhys8572
      @hefinrhys8572  4 years ago +1

      Ok so the language of columns and rows can be confusing as there are many different words that mean the same thing. So your interpretation is the wrong way round. Features == variables == characteristics, individuals == cases == subjects. So in the table you link to, the columns are variables/features/characteristics, and the rows are individuals/cases/subjects. So in that example, you would NOT transpose, as it is in the format prcomp expects.
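
      [The convention in the reply above, as a tiny sketch:]

          m <- matrix(rnorm(40), nrow = 10, ncol = 4)  # 10 cases (rows), 4 variables (columns)
          prcomp(m)      # correct as-is
          prcomp(t(m))   # only if your matrix stored variables in rows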

  • @djangoworldwide7925
    @djangoworldwide7925 1 year ago

    Great tutorial, but it leaves me with the question: what do I do with it? Is this just the beginning of a k-means clustering that gives me an idea of the proper k?

  • @simonjds4960
    @simonjds4960 3 years ago

    Very cool Hefin. I'm trying to run a data reduction for panel data (220 countries, about 25 years of data, and about 100 different variables). Could PCA be used for this?

    • @hefinrhys8572
      @hefinrhys8572  3 years ago

      Hi Simon, it will depend on what kind of data you have and what your goal is. All the variables will need to be numeric as PCA can't handle categorical variables (check out multiple correspondence analysis for this). If you want to find linear combinations of variables to explain most of the variation in the data, then PCA is a good choice. If you're just interested in seeing whether there are subgroups of subjects in your dataset, you might want to try a non-linear dimension reduction algorithm like t-SNE or UMAP :)
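
      [A t-SNE sketch with the Rtsne package, one of several implementations — iris again as a stand-in for real data:]

          library(Rtsne)
          set.seed(1)
          tsne <- Rtsne(as.matrix(iris[, -5]), check_duplicates = FALSE)  # iris has duplicate rows
          plot(tsne$Y, col = iris$Species, pch = 19)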

  • @hellthraser550
    @hellthraser550 4 years ago

    How can I set desired fonts and font sizes in that graph?
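
    [A sketch, assuming the video's ggplot objects: fonts are controlled through theme() and element_text().]

        library(ggplot2)
        myPr <- prcomp(iris[, -5], scale. = TRUE)
        df <- cbind(iris, myPr$x)
        ggplot(df, aes(PC1, PC2, col = Species)) +
          geom_point() +
          theme(text = element_text(family = "serif", size = 14))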

  • @lindseykoper761
    @lindseykoper761 2 years ago +1

    Thank you so much for your videos!! Your videos are the best I have seen hands down :) All of your explanations and step by step through R are what I needed to work on my research.
    One area I am having trouble with (since I am not a statistician) is making sure I run my data through all the necessary statistical tests before running the PCA. My data is similar to the iris dataset (skull measurements categorized by family and subfamily levels) but I am seeing different sources run different tests before the PCA (ANOVA vs non-parametric tests). If anything, would you be able to recommend some good sources for me to refer to? Thank you! I really appreciate it!

  • @yayciencia
    @yayciencia 4 years ago

    Thank you! This was very helpful to me

  • @tonyrobinson9046
    @tonyrobinson9046 1 year ago

    Outstanding. Thank you.

  • @Orange-xw4lt
    @Orange-xw4lt 4 years ago

    Hi, good job, but if I have input data as a wave, how can I take and separate the values of the crests starting from a certain threshold?
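
    [Off the PCA topic, but one hedged base-R sketch: find local maxima, then keep those above a threshold; x is a made-up signal.]

        x <- sin(seq(0, 20, by = 0.1))
        peaks <- which(diff(sign(diff(x))) == -2) + 1   # indices of local maxima
        crests <- peaks[x[peaks] > 0.8]                 # keep crests above 0.8
        x[crests]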

  • @shapsgh
    @shapsgh 5 years ago

    I have a question. Why is "iris[,-5]*myPr$rotation" not equal to "myPr$x"? Isn't the "myPr$rotation" matrix the factor loadings? Thanks in advance...
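
    [Why those differ, as a sketch: * is element-wise, whereas the scores are the centred (and scaled, with scale. = TRUE) data matrix-multiplied by the rotation.]

        myPr <- prcomp(iris[, -5], scale. = TRUE)
        scores <- scale(iris[, -5]) %*% myPr$rotation
        max(abs(scores - myPr$x))   # effectively zero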

  • @blessingtate9387
    @blessingtate9387 4 years ago +3

    You "R" AWESOME!!!

  • @galk32
    @galk32 5 years ago +2

    amazing video, thank you

  • @Marinkaasje
    @Marinkaasje 3 years ago

    I run into an error when running line 17 (in the download file): Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 510, 382. What is going wrong?
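
    [A guess at the cause: rows were dropped (e.g. by na.omit) before PCA, so the scores no longer match the original data's row count. Binding to the same rows PCA saw avoids the mismatch; mydata is hypothetical.]

        complete <- na.omit(mydata)
        myPr <- prcomp(complete, scale. = TRUE)
        df <- cbind(complete, myPr$x)   # row counts now agree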

  • @rafaborkowski580
    @rafaborkowski580 2 years ago

    How can I load my data into RStudio to work with?
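
    [A common route, not from the video: read a CSV into a data frame, then proceed as in the tutorial. The path is a placeholder.]

        mydata <- read.csv("path/to/your_file.csv")
        str(mydata)   # check variable types before running prcomp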

  • @avnistar2703
    @avnistar2703 2 years ago

    Can you run PCA on factor variables coded as 0 vs 1, with 1 meaning the presence of something?

    • @hefinrhys8572
      @hefinrhys8572  2 years ago

      There are some answers here that might help: stats.stackexchange.com/questions/5774/can-principal-component-analysis-be-applied-to-datasets-containing-a-mix-of-cont
      But I would ask what your goal is with this. Are you looking to uncover some underlying latent variables in your data? In which case factor analysis may be the way to go. If it's just to reduce dimensionality to uncover clusters/patterns in the data, then PCA might work, but it will treat those 0/1 variables as continuous, which might not yield the results you're hoping for.

  • @amggwarrior
    @amggwarrior 4 years ago

    Thank you for this very clear video. Question about interpretation: I get just the 1 cluster in my ggplot, what does this mean? That all my variables relate to the same construct (component) and that they can't really be differentiated?

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      So when you apply PCA to your own data and plot the first two components, you see just a single cloud of data? This would indicate that you don't have distinct, linearly-separable sub-classes of cases in your dataset. PCA will still compress the majority of the information of your many variables into a smaller number of variables, so even if it doesn't reveal a class structure in your data, it can still be beneficial for dimension reduction.

    • @amggwarrior
      @amggwarrior 4 years ago

      @@hefinrhys8572 thanks for the quick reply. Yes, I only see a single cloud. I am not using PCA for dimension reduction - just using it to explore my data before including these variables in a SEM. In particular, I wanted to see if it makes sense to relate these 5 variables to a single latent variable in my SEM. All the loadings for PC1 are 0.7 or 0.8, or more, and PC1 captures 0.7 of the variation. Can I take this result as support for considering these 5 variables as part of the same measurement model (linked to the same latent variable) in my SEM? Theoretically it makes sense to, but I wanted to see if the data supported this. I have never done PCA or SEM so no idea if I am doing this right.

  • @hannahredders4442
    @hannahredders4442 4 years ago +1

    Why can I not use categorical data?

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      So you could actually include categorical variables by numerically encoding them, or dummy coding them. The issue is that PCA finds new axes that maximise the variance of the data along them, and calculating variance for a categorical variable doesn't really make sense. If you have categorical variables, you could look at Multiple Correspondence Analysis (MCA), or you could apply PCA to your continuous variables, select the components that capture most variance, and combine these with your categorical variables for your downstream analysis. This may or may not yield satisfactory results.
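
      [A sketch of the dummy-coding option mentioned above, with a hypothetical data frame some_df containing factor columns:]

          X <- model.matrix(~ . - 1, data = some_df)   # one-hot encode the factors
          myPr <- prcomp(X, scale. = TRUE)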

  • @lisakaly6371
    @lisakaly6371 2 years ago

    Thank you for this great video. Can you show how to detect or treat multicollinearity with PCA? I have a data set with 40 variables with high intercorrelation because of cross-reactivity. VIF and the correlation matrix don't work, probably because of multiple comparisons... :(((

  • @hoseinmousavi4890
    @hoseinmousavi4890 4 years ago

    Thanks for your nice job! I have a question.
    I have biostat data. As you told us in this video, we do not need to know what our variable for colour grouping is!
    Actually, I have a problem, and it does not work for me! aes(x = PC1, y = PC2, col = ???)
    I'd really appreciate it if you replied to me!

  • @YummiestOrphan
    @YummiestOrphan 4 years ago

    Why am I getting the error "Too few points to calculate an ellipse"? Can someone please explain in dummy terms. I am using my own data btw and following along with this tutorial.

  • @yuvenmuniandy8202
    @yuvenmuniandy8202 6 years ago

    Amazing tutorial. Very simple and straight to the point. Already subscribed. I have some questions. PCA is an unsupervised method, isn't it? Is it possible to further decompose the data for Versicolor and Virginica to find further grouping? I have read before there are supervised methods. Do you have some tutorial for those?

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      Thanks enthiran! Yes, PCA is unsupervised because we don't give it any information about group membership; we give it unlabelled data and let it find the optimal projection of the data into a lower dimensional space that maximises the explained variance. If you wanted to build a model to predict group membership, then you would need to use a supervised classification algorithm, where you supply a training dataset with grouping labels (this is what makes it supervised). The algorithm will then learn which features in the data associate with each group, such that when you give the model unlabelled data, it will predict group membership. I have a video on various clustering algorithms here: ua-cam.com/video/PX5nSBGB5Tw/v-deo.html