Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the UA-cam channel where we post a new video almost three times per week: ua-cam.com/channels/iujxblFduQz8V4xHjMzyzQ.html
Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074
And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en
Check it out! Thanks!
Your voice in this video is so good for teaching cause it doesn’t make me sleep. Thanks for good information
Hi James! I really don't know how to thank you. I've been watching your videos for over a week and I find them really useful. Thank you again.
+Harinder Singh I'm glad you have found these helpful.
You explained k-means best. Congratulations.
You just need to avoid confusing mistakes during the explanation, because students, I mean beginners, don't like that, and if they don't like it they won't give you a like, because they have already swallowed it along with the confusion.
Million of thanks for your video james 🙏
Good video, James! Excellent way of talking us through the subject.
It's really great James for sharing some finer points.
One of the best explanations I have seen, sir! You are needed in good MBA schools! I am studying in one of the best MBA schools and they don't teach as well as you do!
Hello Mr. James... I find your videos really helpful... I would like to thank you a lot...
You are literally saving my life right now, thank you so much for this video!
Thanks so much for this video! Saved a lot of my time on preparing the final paper.
same here! helped me prepare for my mba exam!
Hi James, I am researching three different educational levels (low, mid, high) with the Life Value Inventory from Duanne Brown (1996). The respondents filled in a 5 item Likert scale, from which I have the results positively tested with Cronbach's Alpha. Now I want to compare the results from these three focus groups, possibly by K-means cluster analysis. How do I set my three levels of education as you set your restaurants?
In my case:
- Burgers = students
- Nutrient levels = Likert scale results
- Restaurant = educational levels
Sounds like ANOVA would be more appropriate. Use the group variable as the factoring variable and use the LVI as the dependent variable.
Recently I was doing cluster analysis and your video helped a lot. I would like to ask you: if I have two clusters, how do I find the centre of each cluster, i.e. the mean of the low and high categories?
sometimes there is an option to save cluster scores or distances from the centroid. see if that is an option.
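If the cluster membership has been saved as a variable, the center of each cluster is simply the mean of every clustering variable within that cluster. Here is a minimal Python sketch of that idea; the data and the names `rows`, `labels`, and `cluster_centers` are made up for illustration and are not SPSS output.

```python
from collections import defaultdict

def cluster_centers(rows, labels):
    """Return {label: [mean of each variable]} for the cases in each cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

# Hypothetical data: two variables per case plus a saved cluster membership.
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [14.0, 24.0]]
labels = ["low", "low", "high", "high"]

centers = cluster_centers(rows, labels)
print(centers)
```

The same result can usually be had in a statistics package by comparing means of the clustering variables across the saved membership variable.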
Very helpful but I'm confused concerning my variables. I'm looking at three psychological variables (independent) in relation to test performance (dependent). Do I have to throw all variables into K-means and compare how clusters are formed, or do I only have to throw in the independent variables, then profile participants and run an ANOVA with the dependent test scores to see how test scores relate to participants' profiles?
You can keep the DV out of it until the ANOVA.
Hello James, Thank you very much for your videos, they helped me a lot. I am wondering how to apply the elbow method with K-means clustering to identify the number of clusters. Do you have any idea please? Thank you
I've never used the elbow approach, although it sounds like just looking at a scree plot and finding the elbow. Here is a video on validating cluster solutions in k-means analysis: ua-cam.com/video/yWwHi8RTYnQ/v-deo.html
@@Gaskination thank you very much, yes I will use this method! Have a Nice day
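For readers curious about the elbow idea raised in this thread, one common reading is: run k-means for several values of k, record the total within-cluster sum of squares, and look for the k after which the drop flattens. The sketch below is a plain-Python illustration under stated assumptions: the data are invented, the starting centers are simply evenly spaced cases (not how SPSS picks them), and `kmeans` and `within_ss` are hypothetical helper names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, centers, max_iter=20):
    """Plain k-means from the given starting centers; returns final centers."""
    for _ in range(max_iter):
        # Assign each case to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(r, centers[j]))
                  for r in rows]
        updated = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else old)
        if updated == centers:
            break
        centers = updated
    return centers

def within_ss(rows, k):
    """Total within-cluster sum of squares for a k-cluster solution."""
    step = len(rows) // k
    # Assumption: evenly spaced cases serve as starting centers.
    start = [list(rows[i * step]) for i in range(k)]
    centers = kmeans(rows, start)
    return sum(min(sq_dist(r, c) for c in centers) for r in rows)

# Invented data with three visible groups.
rows = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [15, 1], [15, 2], [16, 1]]
wss = {k: within_ss(rows, k) for k in range(1, 5)}
for k, value in wss.items():
    print(k, round(value, 2))
```

With this toy data the within-cluster sum of squares falls sharply up to three clusters and barely changes at four, which is the "elbow" one would look for on a plot.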
Hi James, Thanks for the video! please, how do I combine k-means with mlp in an ensemble form using ibm spss?
I'm not sure what MLP is. Is it multilayer perception? I'm not sure how to integrate that with k-means... MLP can also stand for My Little Pony :)
James Gaskin yes multilayer perceptron
James, great videos. Why do you say k-means is your least favorite? How is it different from 2-step? Aside from the 2-step in spss being so much more visually appealing.
+John Taveras K-means only lets you check one solution at a time. It is also only non-hierarchical (which is the same issue really). I like two-step because it is both hierarchical and non-hierarchical. The visuals are a plus as well.
The impression I got of 2-step (from your videos) was that it's basically kmeans with the option to set a max number of clusters ... Which I assumed was just running multiple kmeans and then picking the best k. Am I far off there? Or is there a big difference in the actual cluster algorithm.
Great videos again!
+John Taveras With two step, it does one step as hierarchical, and the other step as non-hierarchical. In this way, it avoids limitations and exploits advantages of both approaches. Two step also allows us to specify continuous vs categorical variables and allows for evaluation fields. The algorithm is different than just multiple k-means.
Great Video, thanks!
Hi, I was wondering how to read in cluster centers from an external file (after having done the hierarchical clustering) as SPSS always shows error messages (not correct format or one variable name is incorrect). Do you have a video for that? or any solution to my problem?
hmmm... I don't have a video for that. I also have never done it, so I'm not sure... sorry about that. Best of luck to you!
Hi James! very helpful video. Thanks.
Could you help me test a hypothesis? The hypothesis is: "There is no impact of promotional strategies on consumers' buying decisions in online shopping."
I have framed the question as follows:
Level of influence on buying decisions (Likert scale): Extremely influenced / Very much influenced / Moderately influenced / Slightly influenced / Not at all influenced
Strategies:
1. Deals & discounts
2. Coupon codes
3. Loyalty program
4. Fast delivery option
5. Try & buy option
Which statistics should I use to test the hypothesis?
Usually we do not hypothesize no effect, as this is the default "null" hypothesis. If you were to test this, then you could just look at the average response. If your hypothesis is to be supported, you would need to observe that most people responded with "slightly" or "not at all".
Hi James,
Thank you so much for your video! May I ask you a question? Are there any criteria for how many clusters I should choose? Is there a statistical way to compare the fit of 3 clusters vs. 4 clusters? Thank you in advance!
The minimization of AIC is one good criterion.
Hi James, thanks for the video. But I am wondering why anova table is concerned here, as clustering is descriptive, so it's not making any inference to a population, in this case why would I care if it's statistically significant?
+Peng Xu The ANOVA table just shows us which variables are providing meaningful contributions toward clustering the cases. If the ANOVA shows no significance for a variable, then the clusters are not very different with regards to that variable.
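To make the reply above concrete: for each clustering variable, a one-way ANOVA F compares the spread between cluster means to the spread within clusters, so a large F means the clusters differ a lot on that variable. The following is a minimal sketch with invented numbers; `f_statistic`, `calories`, and `sodium` are hypothetical names, and since the clusters were formed to be different, such F values are best read descriptively rather than as formal tests.

```python
def f_statistic(values, labels):
    """One-way ANOVA F for a single variable across cluster labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the cases around their own cluster mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented example: cluster membership for six cases and two variables.
labels = [1, 1, 1, 2, 2, 2]
calories = [200, 210, 220, 500, 510, 520]  # clusters differ strongly
sodium = [5, 9, 7, 6, 8, 7]                # same mean in both clusters

f_calories = f_statistic(calories, labels)
f_sodium = f_statistic(sodium, labels)
print(f_calories, f_sodium)
```

Here `calories` separates the two clusters and gets a very large F, while `sodium` has identical cluster means and an F of zero, so it contributes nothing to telling the clusters apart.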
Thank you. What was important to me: the type of variables and whether we standardize them :-) Thanks
Bravo!!
Thank you very much! But there is still one thing I don't understand: if I have questionnaires and I want the results of all subjects to be divided into 2 groups using this method, but I don't want them to be divided randomly, I want to decide by another variable, for example gender, and see the results. Where do I choose gender? Thanks again
Hi James, Thank you so much for making these perfect videos :). I have a question; at the end of the video when you want to check if the clusters are different based on their membership, shall we use Zvalues or usual values?
I would use the original values (because they are easier to interpret), although it really should make no difference.
@@Gaskination Thank you :) and one more question: if we have more than 10 variables for clustering, shall we change the Maximum Iterations or not? is there any specific rule about that?
@@boshrahejraty1708 I don't know of any published rules on iteration limits. However, I would not recommend iterating more than 3x the number of variables included.
Hi James, thank you for the video! Really clear instructions and explanations. However, I got a little bit different data - behavioral binary data - yes/no to 18 questions. I followed your instructions, but the problem is that every time I repeat the actions, the results are different. Is k-means a good method for this type of analysis?
+Gabriele Vasiliauskaite Binary data is rough for cluster analysis because there is not much variance to work with. In such cases, it might be better to determine if there are underlying dimensions to your items (do some of them conceptually group together), and then create scores (sums) of them. Then you can do a cluster analysis on these few dimensions with greater variance, than on the 18 binary variables with no variance.
+James Gaskin that's a very good idea! Thanks!
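The suggestion in this thread (group the yes/no items into a few conceptual dimensions and sum them) can be sketched as follows. This is only an illustration: the item-to-dimension mapping, the dimension names, and the function `dimension_scores` are hypothetical, and which items belong together is a judgment the researcher has to make.

```python
def dimension_scores(answers, dimensions):
    """Sum the 0/1 answers that belong to each conceptual dimension."""
    return {name: sum(answers[i] for i in items)
            for name, items in dimensions.items()}

# Hypothetical respondent: six yes/no items coded 1/0.
answers = [1, 0, 1, 1, 0, 0]
# Hypothetical grouping of item positions into two dimensions.
dimensions = {"social": [0, 1, 2], "planning": [3, 4, 5]}

scores = dimension_scores(answers, dimensions)
print(scores)
```

Each respondent then has a few count scores with more possible values than a single binary item, and those scores can be used as the clustering variables.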
Can I use k-means analysis to identify which attribute (namely: economic sector, age of company, number of employees, address, income of company, etc.) best explains another attribute of a company, namely: the company belongs to an "x" syndicate?
Not sure I understand. Usually when you want an attribute to explain another attribute you use regression or correlation.
Hi James, thanks for the video. Have a query though - Lets say for 60% of the rows (burgers and sandwiches in this case), the number of cluster solutions (1,2,3) are explaining the data quite well, however for remaining 40% of the rows, the cluster solution doesn't make lot of sense, so can we run trial 2 by considering only 40% of the rows?
suresh patel that’s perfectly fine as long as there is a good theoretical or practical reason for the difference in your sample.
Thanks so much for your guidance.
Hi James, I come back to you with another query and would appreciate if you could address this one too Genius - I had a set of 60 attitudinal/behaviour statements (asked on 1-5 scale), which I reduced to 20 explainable Factors......Now I wish to run K-means cluster analysis on these 20 Factors. Please let me know if I need to Standardize the data of these 20 Factors before taking these as an input variables for K-Mean, OR I can directly take these 20 Factors as an input variables for K-Mean.
If you used factor analysis, then you should extract/save factor scores for the 20 factors. In SPSS, these will automatically be standardized. Then use these 20 new (standardized) factor scores for the cluster analysis, instead of using the original 60 items.
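Saved factor scores come out standardized, as noted above. For raw variables measured on different scales, the equivalent step is to convert each variable to z-scores before clustering so that no variable dominates the distance calculation. A minimal sketch, assuming the sample standard deviation (n - 1 in the denominator) and using a made-up variable:

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one variable to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Made-up variable on an arbitrary scale.
raw = [2.0, 4.0, 6.0]
standardized = z_scores(raw)
print(standardized)
```

After this step every variable is expressed in standard-deviation units, which is what makes distances between cases comparable across variables.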
Hi James, thanks for mentoring me... just one last query (and apologies for bothering you): in most segmentation exercises, I have to deal with binary (dichotomous) data coded as 0, 1. I understand K-means doesn't work well with binary data, so I tried converting the data to scale by running factor analysis. However, running K-means on the factors didn't give reasonable segments, whereas K-means on the binary data gave logical and actionable segments.
Would you recommend running K-means on binary data, or is there another way to do segmentation with this kind of data?
Hi James,
Can you indicate a good reference for Cubic Criterion Clustering?? I know what it is because I searched on internet, but for sure that there is some book/author that I can cite about this issue.
I'm not sure. I've never heard of the cubic criterion. Best of luck to you.
i have a question, what is the number of recommended variables to use for this type of analysis?
+talia romag I'm not sure there is a recommended number of variables. It can handle as many as you want. Just realize that the more variables you put in there, the more difficult it may become to interpret the findings.
+James Gaskin thank you for your time!
Dear Researcher, kindly guide me: how can I cluster the questionnaire line items of a large data set, with more than 1000 observations?
I have 79 final questionnaire line items and now I want to cluster them into distinct latent variables. Kindly guide me on how I can cluster the line items. Thanks in anticipation.
If you mean you want to cluster the variables, you should use factor analysis instead. The most common approach is principal components analysis. Here is a video: ua-cam.com/video/VBsuEBsO3U8/v-deo.html
@@Gaskination Sir, I want to cluster the line items. I have already done principal component analysis, but my supervisor wants to see some sort of cluster analysis as a new contribution.
Respected Sir, kindly guide me: I have 79 questions (line items) in the questionnaire and now I want to cluster these questions into distinct variables. Kindly guide me on how I can cluster these questions (line items) in SPSS. Thanks in anticipation.
@@muhammadqasim-bm7oj The way to statistically cluster line items (variables/questions/measures) is to perform an EFA such as PCA. That is the clustering method for columns in a dataset.
Thank you so much !!!!
Why does everyone skip the initial cluster centers?
Is it because no one knows how it is calculated?
My guess is because those centers move after iterating. So, the initial cluster centers are not very informative for the final solution. (I think...)
@@Gaskination Thanks for responding. Yeah, they definitely move for sure. The problem is that to get the EXACT same results in R that you do in SPSS K-Means (the Quick Cluster function in particular), you need those initial cluster centers to be the same. Once you have those, the iterative process follows the same path.
FYI I love your videos; it's almost like we are all in a study group learning together.
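The point about initial centers can be shown with a toy example: from identical starting centers the iterations are deterministic and end in the same place, while different starting centers can settle on a different final solution. This is a plain-Python sketch on one made-up variable, not SPSS or R code; `kmeans_1d` is a hypothetical helper, and the way SPSS selects its initial centers is not reproduced here.

```python
def kmeans_1d(values, centers, max_iter=20):
    """k-means on a single variable, starting from the given centers."""
    for _ in range(max_iter):
        # Assign each value to the nearest current center.
        labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                  for v in values]
        updated = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return centers

# Made-up data with three natural groups, clustered into two.
values = [1, 2, 3, 11, 12, 13, 21, 22, 23]

same_a = kmeans_1d(values, [1, 22])
same_b = kmeans_1d(values, [1, 22])   # identical start, identical result
other = kmeans_1d(values, [1, 11])    # different start, different result
print(same_a, same_b, other)
```

When matching results across packages, it therefore helps to copy the initial centers reported by one program into the other; in R, `kmeans` accepts a matrix of starting centers through its `centers` argument. Differences in the update algorithm between programs may still matter, so treat this as a first thing to check rather than a guarantee.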
thank s jamesssss
terima kasih thanks
Could you share the file burgers.sav?
Caleb Terrel Orellana Sure. Here is a link. I hope it works properly.
www.dropbox.com/s/58771yb3cgk5mu6/BurgersOriginal.sav?dl=0
James Gaskin Thanks a lot!
It is goin down...please!!!