How to Apply PCA before K-means Clustering in R Programming (Example) | Principal Component Analysis
- Published on Mar 25, 2024
- How to apply a K-means clustering algorithm after applying a PCA in the R programming language. The video also offers a preview of the upcoming Statistics Globe online course on "Principal Component Analysis (PCA): From Theory to Application in R". More details: statisticsglobe.com/online-co...
R code of this video:
install.packages("factoextra")                 # Install & load factoextra
library("factoextra")

data(iris)                                     # Load data
head(iris)                                     # Print first rows of data

iris_num <- iris[ , 1:4]                       # Remove categorical variable
head(iris_num)                                 # Print first rows of final data

my_pca <- prcomp(iris_num,                     # Perform PCA
                 scale = TRUE)
summary(my_pca)                                # Summary of explained variance

my_pca_data <- data.frame(my_pca$x[ , 1:2])    # Extract PC1 and PC2
head(my_pca_data)                              # Print first rows of PCA data

fviz_nbclust(my_pca_data,                      # Determine number of clusters
             FUNcluster = kmeans,
             method = "wss")

set.seed(123)                                  # Set seed for reproducibility
my_kmeans <- kmeans(my_pca_data,               # Perform k-means clustering
                    centers = 3)

fviz_pca_ind(my_pca,                           # Visualize clusters
             habillage = my_kmeans$cluster,
             label = "none",
             addEllipses = TRUE)
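
The elbow plot drawn by fviz_nbclust() with method = "wss" can also be sketched in base R, which makes it easier to see what is being plotted: the total within-cluster sum of squares for each candidate number of clusters. This is a minimal sketch, not the code from the video. The range 1:10 and nstart = 25 are assumptions chosen for illustration, and scale. is the full name of the prcomp() argument written as scale above.

```r
# Rebuild the two-component PCA data from the script above
data(iris)
my_pca <- prcomp(iris[ , 1:4], scale. = TRUE)
my_pca_data <- data.frame(my_pca$x[ , 1:2])

# Total within-cluster sum of squares for k = 1, ..., 10
# (nstart = 25 is an assumption: several random starts per k for stability)
set.seed(123)
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(my_pca_data, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The "elbow" is read off by eye, so it is a heuristic rather than a formal test.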
Very interesting, sir. I will practice with your exercise, and I appreciate your video.
That's great to hear, Rodrigo! Glad the videos are helpful!
nicely done! 👌
Thank you so much, glad you like it! :)
Great video. Thank you for making it :) It would've been interesting to plot the same, but coloring the dots using the original labels as well. Then we could see how well the groupings found by unsupervised learning compare to the original labels!
Thanks for the kind words and the nice idea! It would definitely be nice to visualize this comparison. Next time! :)
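
The comparison suggested in this exchange could be sketched as follows. This is not code from the video: it is a minimal illustration that cross-tabulates the k-means clusters against iris$Species and draws a base-R scatterplot (instead of fviz_pca_ind) in which color shows the original species and point shape shows the cluster. The object name comparison is an assumption introduced here.

```r
# Rebuild PCA data and clusters as in the video description
data(iris)
my_pca <- prcomp(iris[ , 1:4], scale. = TRUE)
my_pca_data <- data.frame(my_pca$x[ , 1:2])

set.seed(123)
my_kmeans <- kmeans(my_pca_data, centers = 3)

# Cross-tabulate original labels against cluster assignments
comparison <- table(Species = iris$Species,
                    Cluster = my_kmeans$cluster)
print(comparison)

# Color = original species, point shape = k-means cluster
plot(my_pca_data$PC1, my_pca_data$PC2,
     col = as.integer(iris$Species),
     pch = my_kmeans$cluster,
     xlab = "PC1", ylab = "PC2")
legend("topright",
       legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)),
       pch = 16)
```

Cluster numbers are arbitrary, so a good match shows up as one dominant cell per row of the table, not necessarily on the diagonal.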
Hello, can you make a video on how to apply multiple conditions to all items of a data frame and combine them with AND + OR?
I can't find it online.
Thanks.
I mean applying conditions with AND + OR to all items inside a data frame.
Thanks for the topic suggestion, I'll keep it in mind.
Great video. Thank you.
Is there any assumption before deciding to use PCA or PCoA?
Thanks for the kind comment, Ibrahim! Glad you liked the video. Before using PCA (Principal Component Analysis), it's assumed that linear relationships exist in the data and that the most important variance directions are the ones to focus on. For PCoA (Principal Coordinates Analysis), the assumption is that distances or dissimilarities between data points can meaningfully reflect their relationships. So it depends on your specific data whether to use PCA or PCoA. I hope this helps!
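
To make the relationship between the two methods concrete, here is a minimal sketch (not from the video) using cmdscale() from base R, which performs classical multidimensional scaling, commonly called PCoA. When the dissimilarities are plain Euclidean distances, the PCoA coordinates coincide with the PCA scores up to the sign of each axis. The variable names are assumptions introduced for this illustration.

```r
# Standardize the numeric iris columns, as prcomp(..., scale = TRUE) does
data(iris)
iris_scaled <- scale(iris[ , 1:4])

# PCA scores on the first two components
pca_scores <- prcomp(iris_scaled)$x[ , 1:2]

# PCoA (classical MDS) on Euclidean distances, two dimensions
pcoa_scores <- cmdscale(dist(iris_scaled), k = 2)

# Axes may be flipped, so compare absolute values
max_diff <- max(abs(abs(unname(pca_scores)) - abs(unname(pcoa_scores))))
print(max_diff)   # numerically negligible difference
```

PCoA becomes a genuinely different analysis once another dissimilarity is used, for example dist(iris_scaled, method = "manhattan"), which is where the choice between the two methods starts to matter.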
What if PC1 and PC2 only explain, let's say, 75% of the variance? How would you proceed? Is that enough, or is it possible to somehow add PC3 and PC4 to the analysis?
Great video btw 👍👍
I think it is still OK. However, with more PCs the result becomes difficult to visualize, since humans struggle to perceive more than three dimensions.
Cheers!
Nice presentation
Hey, thanks for the great feedback, glad you like the video! Regarding your question: Yes, you can definitely add more components (and usually this is what you would do with a realistic data set). You would just have to change the number in this line of code from 2 to whatever number of components you would like to keep: my_pca_data <- data.frame(my_pca$x[ , 1:2]) Please note that it might become more difficult to visualize your data when using more components. I hope that clarifies your question! Regards, Joachim
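
The adjustment described in this reply could be sketched as below. Instead of hard-coding the number of components, the sketch picks the smallest number whose cumulative explained variance reaches a threshold. The 99% threshold and the names explained, cum_explained and n_comp are assumptions for illustration, not part of the video's code.

```r
# PCA on the numeric iris columns, as in the video description
data(iris)
my_pca <- prcomp(iris[ , 1:4], scale. = TRUE)

# Proportion of variance explained by each component, and its running total
explained <- my_pca$sdev^2 / sum(my_pca$sdev^2)
cum_explained <- cumsum(explained)
print(round(cum_explained, 4))

# Smallest number of components reaching the (assumed) 99% threshold
n_comp <- which(cum_explained >= 0.99)[1]

# Keep that many components; drop = FALSE guards the one-component case
my_pca_data <- data.frame(my_pca$x[ , 1:n_comp, drop = FALSE])

# k-means works on any number of columns; only plotting gets harder
set.seed(123)
my_kmeans <- kmeans(my_pca_data, centers = 3)
table(my_kmeans$cluster)
```

The clusters can still be displayed on PC1 and PC2 afterwards, but the plot then shows only a projection of the space in which the clustering was done.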
Thank you, sir, but please provide your code script in Notepad format. Just a suggestion.
Hey, thanks for your kind comment. I assume you could simply copy and paste the code from the description into notepad, couldn't you?
@StatisticsGlobe I am speaking from the point of view of space on your channel, and it would also be convenient for others. Thank you for your response.
Please make a video on handling satellite data in R.
Thanks for the topic suggestion! I'm not an expert on this, but it might be a nice topic for the future.