K-Means Cluster Analysis in SPSS (SPSS Tutorial Video #30)

  • Published 31 Jul 2024
  • In this video I describe how to conduct and interpret the results of K-Means Cluster Analysis in SPSS. I especially emphasize using Hierarchical cluster analysis to first determine the number of clusters in your data and then using that result as an input to the K-Means algorithm.
    This SPSS tutorial series is designed to teach you the basics of how to analyze data and interpret the results using SPSS. I will cover everything from the very basics of the main windows within SPSS, to manipulating data, to running and interpreting meaningful analyses like t-tests, ANOVA, regression, and many more, and to visualizing results.
    Link to Hierarchical Cluster Analysis Video: • Hierarchical Cluster A...
    Link to K-Means Cluster Analysis Video: • K-Means Cluster Analys...
    Link to Two Step Cluster Analysis Video: • Two Step Cluster Analy...
    The data file used in this video can be found here: drive.google.com/file/d/1-Bbn...
    Video tutorial and walkthrough of the data file used in this video: • Introduction to Data F...
    Playlist of videos covering INTUITION for statistics and data science: • Data Intuition
    All the SPSS tutorial videos are in this playlist: • SPSS Tutorials
    Learn more about who I am and why I'm doing this here: • Data Demystified - Who...
    Follow me at:
    LinkedIn: / jeff-galak-768a193a
    Patreon: / datademystified
    Website: www.jeffgalak.com/datademystified
    Equipment Used for Filming:
    Nikon D7100: amzn.to/320N1FZ
    Softlight: amzn.to/2ZaXz3o
    Yeti Microphone: amzn.to/2ZTXznB
    iPad for Teleprompter: amzn.to/2ZSUkNh
    Camtasia for Video Editing: amzn.to/2ZRPeAV
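The workflow in the description (choose the number of clusters with hierarchical clustering, then pass that result to k-means) can be illustrated with a minimal Python sketch of Lloyd's k-means algorithm that starts from caller-supplied initial centers. This is not SPSS's implementation: the function name `kmeans`, the toy data, and the stopping rule (stop when memberships no longer change) are assumptions made for illustration, and SPSS's own choice of initial centers is not reproduced here.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest center, move each
    center to the mean of its members, repeat until memberships stop changing."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy data: two obvious groups, k = 2 chosen beforehand
# (e.g. from a hierarchical analysis), starting centers supplied by hand.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The starting centers matter: k-means only refines them, so feeding in sensible centers (such as the means of the hierarchical clusters) tends to give a stable solution.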

COMMENTS • 68

  • @nawilliam2754 · 2 years ago

    After a long search, finally something easy to understand.

  • @jessicamartin1446 · 2 years ago

    Great! I was able to complete my entire assignment using only this video.

  • @anass2243 · a year ago

    Thank you so much for this great series of videos; they have been very useful in my research.

  • @bernardoluca6613 · 3 years ago +3

    Fantastic explanation! Nothing like all the other videos out there! Keep going like this!

  • @xeniavlasenko9830 · 3 years ago +2

    This is the 5th video I've watched on K-Means and it FINALLY made sense. Thank you so much!

    • @DataDemystified · 3 years ago

      I'm so glad to hear that! Is there something in particular that made the content here more understandable? I ask so that I can make sure to incorporate that type of teaching in my other videos. Thanks!

    • @xeniavlasenko9830 · 3 years ago +1

      @@DataDemystified I guess commenting along the way on how to interpret the results/ how all these program steps and numbers in the tables are part of the "story" was particularly helpful :)

    • @DataDemystified · 3 years ago +1

      @@xeniavlasenko9830 Thank you for the feedback! I will make sure to incorporate it into new tutorial videos!

  • @martinpeikert6746 · a year ago

    So clear, thank you so much!

  • @dsavkay · 4 months ago

    Great advanced info, subscribed!

  • @tacs3 · 8 months ago

    Thank you so much for this one and the hierarchical one!

  • @StevenWang82 · a year ago

    Thank you very much, this video is very easy to understand!!

  • @erikailles9598 · 2 years ago +1

    You are a hero!

  • @miakirk7010 · 2 years ago

    Very clear explanations. Thank you.

  • @abdullahisani9746 · a year ago

    Thanks for the demonstration

  • @LXiao33 · 2 years ago +1

    Brilliant! Thank you for uploading this video!

    • @DataDemystified · 2 years ago

      My pleasure!

    • @LXiao33 · 2 years ago

      @@DataDemystified I wonder whether I should choose cluster analysis in SPSS or perform latent class analysis using Mplus to identify the underlying groups in my data. I am still a bit confused. Could you kindly provide some advice? Thank you.

    • @DataDemystified · 2 years ago

      @@LXiao33 That entirely depends on your research question. Without knowing that, I really can't answer your question. Sorry!

  • @transitionperf_MPO · 2 years ago +1

    Thank you for a great explanation! I was wondering how to view demographic characteristics between each established cluster. For example, viewing percentage breakdowns of age, gender, etc. in each cluster. Thanks!

  • @GenuineReciprocity · 2 years ago +4

    Your videos are so easy to understand, and it's amazing how many people your kindness has been helping! I have a small question and was wondering if you could share your insight if you have time. A study that I am trying to replicate categorized individuals based on whether they score above or below the mean on two variables (i.e., high-high, high-low, low-high, low-low: 4 categories). I was advised that this technique was crude and that I should instead use a cluster analysis to categorize the groups. Why would cluster analysis be a better statistical analysis than what the original authors did in categorizing the variables? Sorry to trouble you! I look forward to more of your incredibly helpful videos!

  • @lydialim1993 · 3 years ago +1

    Wonderful series! Keep it up!

    • @DataDemystified · 3 years ago

      Thank you! Any topics you'd specifically like to see covered?

    • @lydialim1993 · 3 years ago

      @@DataDemystified Any chance you'll do one on Structural Equation Modelling? Like I know it's a bunch of regressions under the hood, but it would be nice to see a proper demo of how to use one in real life.

    • @DataDemystified · 3 years ago +1

      @@lydialim1993 Great idea, but I don't know if that'll happen any time soon. The challenge is that you need the AMOS package for SPSS, which most people don't have (including me, at the moment). That said, I'll look into how much demand there is for something like this! Thanks for the suggestion!

  • @tracyquetzal9477 · 2 years ago

    Hi Professor, very good presentation. I would like to know how you interpret your clusters in order to label them. What patterns do you look for to classify your clusters?

  • @aarinwood4522 · 2 years ago

    Great series of videos -- thank you! I do have one follow up question: What are the sample size requirements for Cluster Analysis? Thank you!

  • @ezeugochukukere1538 · 2 years ago

    This is very helpful. Oddly enough, the reason I came across this video was that I was searching for how to calculate the initial cluster centers in SPSS.
    I need them for my R script to exactly replicate the k-means cluster analysis I ran in SPSS: inputting the initial cluster centers calculated in SPSS gives exactly the same final cluster solution in R.
    It was the first thing you said we don't need, but I am pretty desperate to find out how those initial cluster centers are calculated. Any help you could provide would be huge.

  • @user-ry2pb8zg7w · a year ago

    Thank you for the great video. Could you please explain how to apply the elbow method to find the number of clusters?
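One common way to apply the elbow method is to compute the within-cluster sum of squares (WCSS) for k = 1, 2, 3, ... and look for the point where adding another cluster stops reducing it much. The Python sketch below assumes you already have one WCSS value per k (for example from repeated k-means runs); the helper names and the "largest second difference" rule for locating the bend are illustrative assumptions, one heuristic among several, not an SPSS feature.

```python
from math import dist


def wcss(points, labels):
    """Within-cluster sum of squares: squared distance from each point to
    its own cluster mean, summed over all points."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(p, center) ** 2 for p in members)
    return total


def elbow_k(wcss_by_k):
    """Return the k at the sharpest bend of the WCSS curve, measured as the
    largest second difference. wcss_by_k[i] holds the WCSS for k = i + 1.
    This is a heuristic; look at the plotted curve as well."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss_by_k) - 1):
        drop_before = wcss_by_k[i - 1] - wcss_by_k[i]
        drop_after = wcss_by_k[i] - wcss_by_k[i + 1]
        if drop_before - drop_after > best_bend:
            best_k, best_bend = i + 1, drop_before - drop_after
    return best_k


# Hypothetical WCSS values for k = 1..6: large drops up to k = 3, then flat.
curve = [1000.0, 700.0, 200.0, 170.0, 150.0, 135.0]
print(elbow_k(curve))  # 3
```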

  • @zahraalinam62 · 3 months ago

    Which method, hierarchical or K-means, is most appropriate for dichotomous variables with binary coding (0, 1) indicating the presence or absence of a variable?

  • @zahraalinam62 · 3 months ago

    If the Sig. value for some variables is bigger than .001, what should we do? Should we screen them out, remove them, and run the cluster analysis again?
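Assuming the Sig. values in question come from the ANOVA table in the K-Means output, each row of that table is a one-way ANOVA of one variable across the clusters. A minimal Python sketch of the F statistic follows; the function name and data are hypothetical. Because k-means builds the clusters to maximize differences between them, these F and Sig. values are best read descriptively, as an indication of which variables separate the clusters most, rather than as formal hypothesis tests.

```python
def anova_f(values, labels):
    """One-way ANOVA F for one variable across clusters:
    F = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    # Between-cluster spread of the cluster means around the grand mean.
    ss_between = sum(len(g) * (means[lab] - grand_mean) ** 2 for lab, g in groups.items())
    # Within-cluster spread of cases around their own cluster mean.
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical variable that separates two clusters very clearly.
income = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
cluster = [1, 1, 1, 2, 2, 2]
print(anova_f(income, cluster))  # 54.0
```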

  • @GhadeerShm · 10 months ago

    Hi, can I get references for the way you selected the variables? Or what is it called?

  • @lingkan1984 · 8 months ago

    For a cluster analysis of multimorbidity, is there any special format for arranging the data?

  • @vindaflyfox · 2 months ago

    Hello, I want to follow this process by doing a hierarchical cluster analysis to determine the k for my k-means analysis. My question is: my variables are not all on the same scale, so in the hierarchical cluster analysis I will need to convert them into z-scores or something similar so they are comparable. How does this impact the k-means cluster analysis? Do I need to do an extra step here, or will my variables already be converted and ready to be used again after the hierarchical analysis?
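One way to handle variables on different scales is to standardize once, save the standardized variables, and feed those same columns to both the hierarchical and the k-means analyses, so both see identical inputs. A minimal Python sketch of column-wise z-scores, assuming the sample standard deviation (n - 1 divisor); the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize every column to mean 0 and sample SD 1 (n - 1 divisor)."""
    columns = list(zip(*rows))
    params = [(mean(col), stdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


# Hypothetical variables on very different scales (e.g. a 1-7 rating and income).
raw = [(1, 10_000), (4, 40_000), (7, 70_000)]
z = zscore_columns(raw)
print(z)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
# Use z (not raw) as the input to BOTH the hierarchical and the k-means run.
```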

  • @deborahhaile4191 · 2 years ago

    How can I run the k-means clustering algorithm on 40 samples with four variables to group the samples into two clusters?

  • @musiknation7218 · 2 years ago

    I need to do an assignment comparing k-means and improved k-means cluster analysis. Can you please tell me how to do that?

  • @katiesharp8080 · 2 years ago +4

    Hi, I love your videos; they are really helping me analyze my dissertation data :) I was wondering if you had any videos that touch on how to identify the characteristics of your clusters, i.e. age, gender, that sort of thing?

    • @DataDemystified · 2 years ago +2

      I don’t, but basically you’re just going to run either t-tests/ANOVA or cross tabs. You’d use the cluster number as the independent variable and your demographic as the dependent variable. I have a bunch of videos on those techniques in the SPSS playlist on this channel. Good luck!
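For a categorical demographic such as gender, the cross tab described in the reply above amounts to counting cases per cluster and category and converting the counts to percentages within each cluster. A minimal Python sketch with hypothetical data and a hypothetical helper name; for a continuous variable like age you would instead compare cluster means with a t-test or ANOVA, as the reply suggests.

```python
from collections import Counter


def crosstab_percent(clusters, category):
    """Within each cluster, the percentage of cases in each category."""
    counts = Counter(zip(clusters, category))  # (cluster, category) -> count
    totals = Counter(clusters)                 # cluster -> number of cases
    levels = sorted(set(category))
    return {
        c: {lvl: 100.0 * counts[(c, lvl)] / totals[c] for lvl in levels}
        for c in sorted(totals)
    }


# Hypothetical saved cluster memberships and a gender variable.
cluster = [1, 1, 1, 2, 2]
gender = ["F", "M", "F", "M", "M"]
table = crosstab_percent(cluster, gender)
print(table[2])  # {'F': 0.0, 'M': 100.0}
```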

  • @divyajaiswal4330 · 11 months ago +1

    Can k means clustering data be represented graphically? If yes, how?

  • @tacs3 · 8 months ago

    How can we plot this data in SPSS the way R does? Is there a way?

  • @mariabecker1803 · 3 years ago +1

    Dear Jeff, I was wondering if I could ask you one more question. As I am working with z-scores and trying to compare the means (of z-scores) at the end of the cluster analysis in order to show the difference of variables within and between the clusters, I encountered very high means of z-scores ranging up to 4 or 5. Could this be an indication of outliers? Would you suggest me to remove all the outliers before the analysis or would this change the dataset too much and you would just report it as it is? Thank you!!

    • @DataDemystified · 3 years ago

      4-5 on a z-score is pretty high. We typically consider statistical outliers as being more than 3 standard deviations from the mean (which translates to a z-score of 3 or more). The choice to remove data based on outliers, however, is a lot more complex. Did you pre-specify that you would do so? Are you doing it because your results, including the outliers, don't "look good"? The point is to make sure that your exclusion isn't going to artificially inflate Type 1 error (p-hacking). Good luck!

    • @mariabecker1803 · 3 years ago +1

      @@DataDemystified I did not pre-specify that I would do that. I just compared my results (mean z-scores) with other cluster analyses on other data, and mine are very high, so I thought that I might have made a mistake and that it would be best to remove the outliers. However, I do not want to manipulate my data. Maybe it is enough to just mention the high z-scores but leave them in the data? Thank you!

    • @DataDemystified · 3 years ago

      @@mariabecker1803 I don't know what context you're reporting in (academic paper, school assignment, etc.) but transparency is always a good thing. At minimum, add a footnote with the explanation. Better yet is a robustness check that is explicitly exploratory: see what happens when you drop those outliers. Do the results meaningfully change? If they do, report that and speculate as to why. If they don't, report that as well with a note about how your results are robust to their removal.

    • @mariabecker1803 · 3 years ago

      @@DataDemystified Dear Jeff, it's part of my dissertation, so I really want to do a thorough job. I will definitely do a robustness check and am curious to see what will change. So thank you for your advice!
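The 3-standard-deviation rule of thumb from this thread, combined with the suggested exploratory robustness check, can be sketched in Python as follows. The cutoff, the "any variable at or beyond the cutoff" rule (following the "z-score of 3 or more" wording above), and the names are assumptions for illustration; the idea is to keep both versions of the data and run the clustering on each, rather than silently dropping cases.

```python
def split_outliers(z_rows, cutoff=3.0):
    """Return (kept_rows, flagged_indices). A case is flagged when any of its
    z-scores is at least `cutoff` in absolute value."""
    flagged = [i for i, row in enumerate(z_rows) if any(abs(z) >= cutoff for z in row)]
    flagged_set = set(flagged)
    kept = [row for i, row in enumerate(z_rows) if i not in flagged_set]
    return kept, flagged


# Hypothetical standardized data (two variables per case).
z_rows = [[0.1, -0.5], [4.2, 0.3], [-1.0, -3.5], [2.9, 2.9]]
kept, flagged = split_outliers(z_rows)
print(flagged)  # [1, 2]
# Robustness check: cluster `z_rows` and `kept` separately and report both.
```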

  • @rabeeyafarooq2788 · 4 months ago

    How do we decide on names for the clusters based on what is increasing and what is not?

  • @mahdifareghi3916 · 9 months ago

    Hello, is there any video that analyzes k-means results in more depth?

  • @joycethegreat9259 · 3 months ago

    During my conjoint analysis, no importance values or utilities are shown because SPSS states "no analysis is performed because there are no valid cases". How can I solve this? I did a cluster analysis to get the utilities and std. error of each cluster, but after running conjoint on one of my clusters, conjoint won't show results. Please help. I have no missing values, no duplicates, or anything of the sort.

  • @aviralbhatt1664 · 2 years ago

    Hello, I have a question and I would really appreciate it if you could clarify it. So do we use Hierarchical Cluster Analysis to identify the potential clusters and then K-Means to understand how those clusters are different from each other?

    • @DataDemystified · 2 years ago +1

      We use Hierarchical Cluster analysis to identify the most likely # of clusters. We then use k-means to actually create those clusters and explore them. Hope that helps!

    • @aviralbhatt1664 · 2 years ago

      @@DataDemystified Yes, it does. Thanks a lot 🙌
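To illustrate how hierarchical clustering can suggest a number of clusters, here is a naive Python sketch of Ward's method that records the cost of each merge, plus a rule of thumb that stops just before the largest jump in cost. It runs in O(n^3) time and is meant only for small examples; the helper names are hypothetical, and the coefficients SPSS prints in its agglomeration schedule may be on a different scale, so compare the pattern of jumps rather than the raw numbers.

```python
from math import dist


def ward_merge_costs(points):
    """Naive Ward agglomeration. Each step merges the two clusters whose
    union raises the total within-cluster sum of squares least; for clusters
    A and B that increase is nA*nB/(nA+nB) * ||mean_A - mean_B||^2.
    Returns the n - 1 merge costs in the order the merges happen."""
    clusters = [(1, list(p)) for p in points]  # (size, centroid) pairs
    costs = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                cost = na * nb / (na + nb) * dist(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = (na + nb, [(na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        costs.append(cost)
    return costs


def suggest_k(costs):
    """Rule of thumb: stop merging just before the largest jump in cost."""
    n = len(costs) + 1
    jumps = [costs[m + 1] - costs[m] for m in range(len(costs) - 1)]
    m = max(range(len(jumps)), key=jumps.__getitem__)
    return n - (m + 1)  # clusters left after the first m + 1 merges


# Hypothetical data: two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
costs = ward_merge_costs(points)
print(suggest_k(costs))  # 2
```

The suggested k would then be passed to k-means, as described in the reply above.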

  • @mehmettolgataner8878 · 3 months ago

    Is it the same in SPSS 29?

  • @musiknation7218 · 2 years ago

    How do I do an improved k-means cluster analysis?

  • @sachikogaming1137 · 2 years ago

    Is it necessary to check the correlations between the variables before proceeding to clustering? Is it important to select only variables that are correlated for the analysis?

    • @DataDemystified · 2 years ago

      Nope. Clustering does not require variables to be correlated.

  • @mariabecker1803 · 3 years ago +1

    Hi, I was wondering how to read in cluster centers from an external file (after having done the hierarchical clustering) as SPSS always shows error messages (not correct format or one variable name is incorrect). Do you have a video for that? or any solution to my problem?

    • @DataDemystified · 3 years ago

      Sorry you're having trouble with that. I don't have a video on the topic and don't often import cluster centers from an external file. Is there a reason you are doing it that way rather than natively running the analysis on the data?

    • @mariabecker1803 · 3 years ago

      @@DataDemystified Yes, I am using k-means clustering in order to validate the cluster centers/number of clusters that I calculated with hierarchical clustering. Therefore, I want to use the cluster centers that I have (from the hierarchical clustering) as a starting point and see what changes when I do the k-means clustering. However, no matter what I do (even when I do everything according to the literature) I get error messages and SPSS has trouble reading in the cluster centres from an external file. Would you know what I could do to avoid the error messages and get my results?

    • @DataDemystified · 3 years ago

      @@mariabecker1803 Got it. One option is to just re-run your hierarchical clustering with the original data and then, in the same data file, run the k-means clustering. Save the cluster membership for both analyses, and then do your comparison. If that's not possible and the import isn't working, you can always do it manually. As in, sort the data by some identifier and copy and paste the column of data from your original data (where the hierarchical analysis is) into the new data file (where you plan to run k-means). I hope that helps!

    • @mariabecker1803 · 3 years ago

      @@DataDemystified Thank you! I have tried that already, and comparing the two in the same data file works. That is not the problem. However, I saw that the cluster memberships are completely different (hierarchical vs. k-means), so I wanted to run the k-means clustering with the same cluster centers I found in the hierarchical analysis, to see where the difference comes from when both have the same starting point, if that makes sense? It's just that there seems to be no way to enter the starting points (cluster centers) manually other than reading them in from a file, I guess? In my case that is not working, so I do not know how to proceed.

    • @DataDemystified · 3 years ago

      @@mariabecker1803 My only suggestion at this point is to make sure you are using Ward's Method in your hierarchical clustering. That tends to give results closest to k-means. Good luck!
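Two small helpers relate to this thread, sketched in Python with hypothetical names. `centroids_from_labels` computes each hierarchical cluster's mean, which is what you would supply as k-means starting centers. `rand_index` measures how often two partitions agree about whether each pair of cases belongs together; since it ignores the cluster numbers themselves, it can show whether "completely different" memberships are really different or only numbered differently.

```python
from itertools import combinations


def centroids_from_labels(points, labels):
    """Mean of each cluster, keyed by cluster label (e.g. saved hierarchical
    memberships); these are candidate k-means starting centers."""
    out = {}
    for cluster in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        out[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return out


def rand_index(labels_a, labels_b):
    """Share of case pairs on which two partitions agree: both put the pair
    together, or both keep it apart. Cluster numbers are ignored."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical memberships: same grouping, but the cluster numbers are swapped.
hierarchical = [1, 1, 2, 2]
kmeans_run = [2, 2, 1, 1]
print(rand_index(hierarchical, kmeans_run))  # 1.0
print(centroids_from_labels([(0, 0), (2, 2), (10, 10), (12, 12)], hierarchical))
# {1: [1.0, 1.0], 2: [11.0, 11.0]}
```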

  • @Netsi-ed6ee · 3 months ago

    Professor, I need your support. Can you help me?