great explanation, thank you
Best video I've come across for pca and clustering with R, thank you.
This is great! Very helpful and thorough analysis that goes beyond the basics. Would love to see your approach to dealing with mixed-type variables (categorical and numerical) and feature selection.
Awesome work, helped me a lot. Thank you!
Many thanks for sharing this
Hi, I've seen several guides for data clustering, but what I haven't found is how to retrieve the IDs of the members of each individual cluster. For example, if I analyze a customer base of 1000 users and divide it into 3 clusters, is it possible to get a list of the members of each cluster in table form? Regards, Daniele
Hi Daniele, after clustering you get a new variable, cluster, which identifies the cluster number each record belongs to. If you want to separate the members of a given cluster into their own data frame, use subset/select to extract them.
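As a rough illustration of that reply, here is a minimal sketch in base R. The `customers` data frame, its column names, and the choice of `kmeans()` are assumptions made up for this example; they are not taken from the video's notebook.

```r
# Toy customer data; the column names are invented for illustration.
set.seed(42)
customers <- data.frame(
  id     = 1:1000,
  spend  = c(rnorm(400, 20, 5), rnorm(300, 60, 5), rnorm(300, 100, 5)),
  visits = c(rnorm(400, 2, 1),  rnorm(300, 10, 1), rnorm(300, 20, 1))
)

# Cluster on the scaled numeric columns only (the id is not a feature).
km <- kmeans(scale(customers[, c("spend", "visits")]), centers = 3, nstart = 10)

# Attach the cluster number to each record.
customers$cluster <- km$cluster

# Members of one cluster as their own data frame.
cluster1 <- subset(customers, cluster == 1)

# Or all clusters at once, as a list of data frames keyed by cluster number.
by_cluster <- split(customers, customers$cluster)

head(cluster1)            # first few members of cluster 1, with their ids
table(customers$cluster)  # how many records fall in each cluster
```

Each element of `by_cluster` is an ordinary data frame, so it can be written out with `write.csv()` if a table per cluster is needed.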
It is a very helpful video, thank you very much. What I want to ask is how we can solve the problem with set_plot_dimensions in RStudio.
If you go to the GitHub repository, you will find the "clustering_101_util.r" file, where I've put some of the extra functions I used in this video. Among them is "set_plot_dimensions" (relies on ggplot2).
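If the utility file is not at hand, a stand-in can be defined locally. The sketch below is a guess at what such a helper typically does in a Jupyter R notebook (setting the `repr.plot.width` and `repr.plot.height` options); it is not the author's actual definition, which lives in "clustering_101_util.r".

```r
# Hypothetical stand-in for set_plot_dimensions; the real helper is defined in
# the author's utility file and may differ.
set_plot_dimensions <- function(width, height) {
  # These options are read by the Jupyter R kernel when rendering plots.
  options(repr.plot.width = width, repr.plot.height = height)
}

# Example call: request 8 x 5 inch plots for the cells that follow.
set_plot_dimensions(8, 5)
```

These options are aimed at the Jupyter kernel, so in RStudio they will most likely have no visible effect; the plot pane size is what matters there. Defining the function (or sourcing the utility file) should at least stop the "could not find function" error.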
Thanks for sharing! Would you be willing to share the complete R code?
If you look in the video description, you will find links to the GitHub repository with the code.
@@ironfrown Thank you for your reply! I only saw code for the utilities, not the code that runs through this example. Do you have the latter? Thanks again!
@@qiongyang5470 the GitHub repository includes all the code: utilities, model training and its application. What may confuse you is its form. It is not a single plain R file but rather a Jupyter notebook, which nowadays is my preferred way of developing R (after years of using RStudio). The notebook files look like documentation but include the executable R code that was presented in the video.
@@ironfrown Thank you! Got it now. I have another question: I noticed that you got rid of all the categorical vars in the demo. What do you think will happen when we have a bunch of categorical vars in an exercise?
@@qiongyang5470 if you include any categorical variables, you will need to convert them to either ordinal or dummy variables first. However, you will also need to adjust some of the steps, which unfortunately may fail when non-continuous vars are present.
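The conversion mentioned in that last reply could look roughly like the sketch below, using base R only. The data frame and its columns (`segment`, `tier`) are invented for illustration and do not come from the video's dataset.

```r
# Toy data with one numeric and two categorical columns (names are invented).
df <- data.frame(
  age     = c(23, 45, 31, 52),
  segment = factor(c("retail", "corporate", "retail", "online")),
  tier    = c("low", "high", "medium", "low")
)

# Ordinal encoding: the levels have a natural order, so map them to 1, 2, 3.
df$tier_ord <- as.integer(factor(df$tier, levels = c("low", "medium", "high")))

# Dummy encoding: one 0/1 column per level; "- 1" drops the intercept so that
# every level of the factor gets its own column.
dummies <- model.matrix(~ segment - 1, data = df)

# Numeric-only table that can then be scaled and clustered.
numeric_df <- cbind(df[, c("age", "tier_ord")], dummies)
str(numeric_df)
```

Ordinal encoding only makes sense when the categories really are ordered; for unordered categories the dummy columns are the safer choice, at the cost of adding one column per level.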