How to Apply PCA before K-means Clustering in R Programming (Example) | Principal Component Analysis

  • Published Mar 25, 2024
  • How to apply a K-means clustering algorithm after applying a PCA in the R programming language. The video also offers a preview of the upcoming Statistics Globe online course on "Principal Component Analysis (PCA): From Theory to Application in R". More details: statisticsglobe.com/online-co...
    R code of this video:
    install.packages("factoextra") # Install & load factoextra
    library("factoextra")
    data(iris) # Load data
    head(iris) # Print first rows of data
    iris_num <- iris[ , 1:4] # Remove categorical variable
    head(iris_num) # Print first rows of final data
    my_pca <- prcomp(iris_num, # Perform PCA
                     scale = TRUE)
    summary(my_pca) # Summary of explained variance
    my_pca_data <- data.frame(my_pca$x[ , 1:2]) # Extract PC1 and PC2
    head(my_pca_data) # Print first rows of PCA data
    fviz_nbclust(my_pca_data, # Determine number of clusters
                 FUNcluster = kmeans,
                 method = "wss")
    set.seed(123) # Set seed for reproducibility
    my_kmeans <- kmeans(my_pca_data, # Perform k-means clustering
                        centers = 3)
    fviz_pca_ind(my_pca, # Visualize clusters
                 habillage = my_kmeans$cluster,
                 label = "none",
                 addEllipses = TRUE)

COMMENTS • 20

  • @rodrigopalmacl 3 months ago +1

    Very interesting, sir. I will practice with your exercise, and I appreciate your video.

    • @StatisticsGlobe 3 months ago

      That's great to hear, Rodrigo! Glad the videos are helpful!

  • @smartinssmart 3 months ago +1

    nicely done! 👌

  • @gt6139a 3 months ago +1

    Great video. Thank you for making it :) It would've been interesting to plot the same data but color the dots by the original labels as well. Then we could see how well the groupings from unsupervised learning compare to the original labels!

    • @StatisticsGlobe 3 months ago

      Thanks for the kind words and the nice idea! It would definitely be nice to visualize this comparison. Next time! :)
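The comparison suggested in this thread can be sketched roughly as follows. This is a minimal base R illustration, not code from the video: it repeats the PCA and k-means steps from the description without factoextra, and the object name `comparison` and the side-by-side plot layout are assumptions made here for illustration.

```r
# Repeat the PCA and k-means steps from the video description (base R only)
data(iris)
iris_num <- iris[ , 1:4]                      # Numeric columns only
my_pca <- prcomp(iris_num, scale. = TRUE)     # PCA on standardized variables
my_pca_data <- data.frame(my_pca$x[ , 1:2])   # Keep PC1 and PC2

set.seed(123)                                 # Same seed as in the description
my_kmeans <- kmeans(my_pca_data, centers = 3)

# Cross-tabulate the known species against the k-means cluster assignments
comparison <- table(Species = iris$Species, Cluster = my_kmeans$cluster)
print(comparison)

# Plot PC1 vs. PC2 twice: colored by original label and by cluster
op <- par(mfrow = c(1, 2))
plot(my_pca_data$PC1, my_pca_data$PC2, col = as.integer(iris$Species),
     pch = 19, xlab = "PC1", ylab = "PC2", main = "Original species")
plot(my_pca_data$PC1, my_pca_data$PC2, col = my_kmeans$cluster,
     pch = 19, xlab = "PC1", ylab = "PC2", main = "k-means clusters")
par(op)
```

Cluster numbers are arbitrary, so the colors in the two panels need not match; the table is the easier way to see which cluster corresponds to which species.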

  • @CryptoStop360 1 month ago +1

    Hello, can you make a video on how to apply multiple conditions to all items of a data frame, combined with AND and OR?
    I couldn't find it online.
    Thanks.

    • @CryptoStop360 1 month ago +1

      I mean applying conditions with AND and OR to all items inside a data frame.

    • @StatisticsGlobe 1 month ago

      Thanks for the topic suggestion, I'll keep it in mind.

  • @ibrahimlawan9663 3 months ago +1

    Great video. Thank you.
    Is there any assumption before deciding to use PCA or PCoA?

    • @StatisticsGlobe 3 months ago

      Thanks for the kind comment, Ibrahim! Glad you liked the video. Before using PCA (Principal Component Analysis), it's assumed that linear relationships exist in the data and that the most important variance directions are the ones to focus on. For PCoA (Principal Coordinates Analysis), the assumption is that distances or dissimilarities between data points can meaningfully reflect their relationships. So it depends on your specific data whether to use PCA or PCoA. I hope this helps!
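To make the PCA vs. PCoA distinction concrete, here is a small hedged sketch in base R. It assumes classical PCoA as implemented by `cmdscale()`; with Euclidean distances on the standardized data, its axes should agree with the PCA scores up to sign, and the two methods only diverge once a different dissimilarity is chosen. The object names and the Manhattan distance are illustrative choices, not part of the video.

```r
data(iris)
iris_scaled <- scale(iris[ , 1:4])            # Standardize the numeric columns

# PCA scores on the first two components
pca_scores <- prcomp(iris_scaled)$x[ , 1:2]

# Classical PCoA (metric MDS) on Euclidean distances
pcoa_scores <- cmdscale(dist(iris_scaled), k = 2)

# Absolute correlation between matching axes (sign of an axis is arbitrary)
axis_cor <- abs(diag(cor(pca_scores, pcoa_scores)))
print(round(axis_cor, 6))

# PCoA on a non-Euclidean dissimilarity (Manhattan here, chosen only as an
# example); this ordination is no longer equivalent to PCA
pcoa_manhattan <- cmdscale(dist(iris_scaled, method = "manhattan"), k = 2)
head(pcoa_manhattan)
```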

  • @uselessminority6071 3 months ago +1

    What if PC1 and PC2 only explain, let's say, 75% of the variance? How would you proceed? Is that enough, or is it possible to somehow add PC3 and PC4 to the analysis?
    Great video btw 👍👍

    • @jeanpascalkoh4123 3 months ago +1

      I think it's still OK. However, with more PCs it becomes difficult for humans to perceive three or more dimensions.
      Cheers!

    • @jeanpascalkoh4123 3 months ago +1

      Nice presentation

    • @StatisticsGlobe 3 months ago +1

      Hey, thanks for the great feedback, glad you like the video! Regarding your question: Yes, you can definitely add more components (and usually this is what you would do with a realistic data set). You would just have to change the number in this line of code from 2 to whatever number of components you would like to keep: my_pca_data <- data.frame(my_pca$x[ , 1:2]) Please note that it might become more difficult to visualize your data when using more components. I hope that clarifies your question! Regards, Joachim
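One way to pick that number is from the cumulative explained variance instead of hard-coding it. The following is a minimal sketch under stated assumptions: the names `var_threshold`, `cum_var`, and `n_comp` and the 99% cutoff are hypothetical choices for illustration, not a rule from the video.

```r
data(iris)
iris_num <- iris[ , 1:4]
my_pca <- prcomp(iris_num, scale. = TRUE)     # PCA on standardized variables

# Cumulative proportion of variance explained by the components
cum_var <- cumsum(my_pca$sdev^2) / sum(my_pca$sdev^2)
print(round(cum_var, 4))

# Hypothetical cutoff: keep the fewest components reaching this share
var_threshold <- 0.99
n_comp <- which(cum_var >= var_threshold)[1]

# drop = FALSE keeps a matrix even if only one component is selected
my_pca_data <- data.frame(my_pca$x[ , 1:n_comp, drop = FALSE])

set.seed(123)                                 # Set seed for reproducibility
my_kmeans <- kmeans(my_pca_data, centers = 3) # k-means on the kept components
table(my_kmeans$cluster)                      # Cluster sizes
```

The clustering itself works with any number of components; only two-dimensional plots stay limited to a pair of components at a time.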

  • @uma9183 2 months ago +1

    Thank you, sir. Please also provide your code script in Notepad (plain-text) format; just a suggestion.

    • @StatisticsGlobe 2 months ago

      Hey, thanks for your kind comment. I assume you could simply copy and paste the code from the description into notepad, couldn't you?

    • @uma9183 2 months ago +1

      @StatisticsGlobe I am suggesting it from the point of view of your channel space, and it would also be convenient for others. Thank you for your response.

    • @uma9183 2 months ago +1

      Please make a video on handling satellite data in R.

    • @StatisticsGlobe 2 months ago

      Thanks for the topic suggestion! I'm not an expert on this, but it might be a nice topic for the future.