Validating K-means cluster analysis in SPSS

  • Published 3 Oct 2024
  • In this video I show and explain how to determine the appropriate and valid number of clusters to extract in a k-means cluster analysis.

COMMENTS • 30

  • @zhalehmohammadalipour3542
    @zhalehmohammadalipour3542 2 years ago

    Great tutorial! It helped a lot. Thanks.

  • @nataliegillepiegaskins
    @nataliegillepiegaskins 2 years ago

    Thank you for this! Nice last name!

  • @kanika8123
    @kanika8123 4 years ago

    Thanks a lot. Very helpful video.

  • @Ana-zi4mk
    @Ana-zi4mk 8 years ago

    Hi, James. Thank you for this video. I also watched your other video on K-means cluster analysis in SPSS, where you mentioned: "If we can't converge in 10 iterations then we probably don't have good data for clustering". I am trying to learn how to do cluster analysis and I am using some of my own data. I followed your suggestions on how to determine the number of clusters and how to validate them. In my case, I ran k-means cluster analyses specifying 2, 3, 4 and 5 clusters. For the 3-cluster solution, the post hoc tests in the Multiple Comparisons table were all significant, but it took 14 iterations for the change in all three cluster centers to reach 0.000. For the 4-cluster solution, on the other hand, it took 10 iterations for the change in cluster centers to reach 0.000, but in the Multiple Comparisons table two clusters were not significantly different on a few variables. What is your opinion: is my data not suitable for cluster analysis?

    • @Gaskination
      @Gaskination 8 years ago +1

      +Ana It might be suitable. The more variables you include, the harder it is to converge. So, if there are lots of variables, then more than 10 iterations is fine. I don't know if there is a published threshold or guideline.

    • @eboamuah6811
      @eboamuah6811 3 years ago

      @@Gaskination Hi James. Your work has been very helpful. I have read about the silhouette as a validation method in K-means cluster analysis. However, I don't know how to obtain it in SPSS. Is there any index in SPSS that can be used to validate the number of clusters chosen in a K-means cluster analysis? Thank you

    • @Gaskination
      @Gaskination 3 years ago

      @@eboamuah6811 silhouette is used in two-step cluster analysis in SPSS, but I don't know of a way to produce it for K-means.
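
A minimal sketch of how a silhouette score could be computed outside SPSS for a K-means solution, assuming the clustering variables and the saved cluster membership have been exported. The function name, the toy points, and the labels below are illustrative assumptions, not data from the video. It uses plain Euclidean distance and only the Python standard library; scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same kind of index on real data.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all cases (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the case's own cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (made up): two well-separated groups of three cases each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_score = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # sensible memberships
bad_score = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # scrambled memberships
print(f"sensible: {good_score:.2f}, scrambled: {bad_score:.2f}")
```

Values close to 1 suggest compact, well-separated clusters, and values near 0 or below suggest overlap, so running this for each candidate number of clusters gives one more criterion to compare solutions.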

  • @najeebullahahmadzai5160
    @najeebullahahmadzai5160 3 months ago

    Thank you sir!

  • @009kishor
    @009kishor 6 years ago

    Very helpful video 👍🏻

  • @thanghoang1944
    @thanghoang1944 4 years ago

    THANK YOU!

  • @jdemontre
    @jdemontre 4 years ago +1

    Hey James, I enjoy your videos, especially those about SEM and now cluster analysis. Thank you! I ran my data and everything went well (10 variables and ca. 100 observations). The 3-cluster solution was the best on all criteria. But the Bonferroni test was not significant in 2 (out of 60) comparisons (p-value slightly higher than 0.1). Does that mean the solution was not validated?

    • @Gaskination
      @Gaskination 4 years ago

      If it is just 2 out of 60 comparisons, then this is strong evidence that it is a good clustering solution. Nice!
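
The count of 60 in this thread can be sketched as arithmetic, assuming the post hoc table lists every cluster pair in both directions (1 vs 2 and 2 vs 1) for each of the 10 variables. The snippet also shows the standard Bonferroni adjustment (raw p-value times the number of comparisons, capped at 1). The raw p-values are made up for illustration and are not the commenter's results.

```python
from itertools import combinations

def bonferroni_adjust(p_values):
    """Standard Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

n_clusters = 3
n_variables = 10

# Unique cluster pairs: (1, 2), (1, 3), (2, 3)
unique_pairs = list(combinations(range(1, n_clusters + 1), 2))

# Assumption: the table shows each pair twice (once per direction).
rows_per_variable = 2 * len(unique_pairs)
total_rows = rows_per_variable * n_variables
print(f"{len(unique_pairs)} unique pairs, {total_rows} table rows")

# Hypothetical raw p-values for the three pairwise tests on ONE variable.
raw = [0.001, 0.012, 0.04]
adjusted = bonferroni_adjust(raw)
for pair, p_raw, p_adj in zip(unique_pairs, raw, adjusted):
    print(f"clusters {pair}: raw p = {p_raw:.3f}, Bonferroni p = {p_adj:.3f}")
```

Under that assumption, 2 non-significant rows out of 60 would correspond to a single pair of clusters that does not differ on a single variable.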

  • @marcelbeermann1036
    @marcelbeermann1036 4 years ago

    Thanks for the video.
    How can I see if a cluster actually is underrepresented?

    • @Gaskination
      @Gaskination 4 years ago

      It's just a subjective judgment. If the sample size of the cluster is small, then perhaps it is under-represented. You can see what the profile of members of that cluster looks like to determine if it is a legitimate cluster, or just an odd outlier.
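
A small sketch of the size check described in this reply, assuming the cluster memberships have been exported as a list of labels. The memberships and the 10% cutoff are made-up illustrations; as the reply says, the judgment is subjective and no specific threshold is implied by the video.

```python
from collections import Counter

def cluster_shares(labels):
    """Return {cluster: (count, share of sample)} for saved cluster labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (n, n / total) for c, n in sorted(counts.items())}

# Made-up memberships for 100 cases, for illustration only.
labels = [1] * 46 + [2] * 50 + [3] * 4

# MIN_SHARE is an arbitrary illustrative cutoff, not a published guideline.
MIN_SHARE = 0.10
shares = cluster_shares(labels)
small_clusters = [c for c, (n, share) in shares.items() if share < MIN_SHARE]

for c, (n, share) in shares.items():
    flag = "  <- inspect this cluster's profile" if c in small_clusters else ""
    print(f"cluster {c}: n={n} ({share:.0%}){flag}")
```

A flagged cluster is only a prompt to look at the profile of its members, to judge whether it is a legitimate small segment or a handful of outliers.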

  • @kieramillar-brandt2854
    @kieramillar-brandt2854 3 years ago

    Hi James, thanks for this video. Is there a paper that can be referenced to support that a lower number of iterations is better? Or maybe a paper that indicates best practice in general for reporting the results of k-means clustering? Many thanks. Kiera

    • @Gaskination
      @Gaskination 3 years ago

      Chapter nine of Hair et al. (2010), "Multivariate Data Analysis", is all about clustering methods.

    • @kieramillar-brandt2854
      @kieramillar-brandt2854 3 years ago

      @@Gaskination thanks very much. That's really appreciated. Your videos are great!

  • @henrypritchard4911
    @henrypritchard4911 4 years ago

    Hi James,
    This has been very helpful, so firstly thank you!
    I was wondering if there is a way to validate/find a statistical difference between two clusters, since post hoc tests for a one-way ANOVA cannot be performed on fewer than 3 groups/clusters of data?
    Kind Regards,
    Henry

    • @Gaskination
      @Gaskination 4 years ago +1

      You can just use a t-test instead.

    • @henrypritchard4911
      @henrypritchard4911 4 years ago

      @@Gaskination Thank you!

    • @henrypritchard4911
      @henrypritchard4911 4 years ago

      @@Gaskination Hi James,
      I am sorry to be a pain with another question. I was also wondering why in these instances there is no need to test for normality of distribution before performing the ANOVA with post hoc tests?
      Thank you in advance and Kind regards, Henry

    • @Gaskination
      @Gaskination 4 years ago

      @@henrypritchard4911 Normality of distribution is not required for cluster membership. We really just need sufficient sample size in each group.
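
For the two-cluster case raised earlier in this thread, the t-test suggestion can be sketched as follows. This computes Welch's t statistic and its degrees of freedom with the standard library only; Welch's version (no equal-variance assumption) is a choice made here, not something the reply specifies. The scores are made-up values for one clustering variable. To get a p-value in practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scores on one variable for the members of each cluster.
cluster_1 = [4.1, 3.8, 4.5, 4.0, 4.3]
cluster_2 = [2.2, 2.9, 2.5, 2.1, 2.7]

t, df = welch_t(cluster_1, cluster_2)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Repeating this for each clustering variable mirrors what the post hoc table does for three or more clusters, so a correction for multiple tests may still be worth applying across variables.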

  • @mayurgo10
    @mayurgo10 7 years ago

    My data contains 900 observations and I tried the k-means method. The data converges at 15 iterations for the 4-cluster solution and 16 iterations for the 10-cluster solution. Can you suggest a good test to check which cluster solution would be better?

    • @Gaskination
      @Gaskination 7 years ago +3

      Check the AIC or BIC if that is an option. You want to minimize these. Also, check to see which solution is more helpful. Usually 3-5 clusters is most useful and anything more than 5 begins to be difficult to interpret or distinguish.
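
Where AIC or BIC is not available for a K-means solution, one different and simpler heuristic is to compare the within-cluster sum of squares (WCSS) across candidate numbers of clusters, alongside the number of iterations needed to converge. The sketch below is a plain Lloyd-style k-means written for illustration; it is not SPSS's algorithm, it is not AIC/BIC, and the data, seed, and function name are assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means. Returns (labels, centers, iterations, wcss)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k randomly chosen cases
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each case goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, iteration, wcss

# Made-up data: three noisy groups of 20 cases on two variables.
data_rng = random.Random(1)
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(20)
]

results = {k: kmeans(points, k) for k in range(1, 6)}
for k, (_, _, iterations, wcss) in results.items():
    print(f"k={k}: iterations={iterations}, WCSS={wcss:.1f}")
```

WCSS tends to shrink as clusters are added, so the point of interest is where the improvement levels off, weighed against how interpretable the resulting clusters are. A single random start can land in a poor local solution, so several seeds are worth trying.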

  • @statsmadeeasy7233
    @statsmadeeasy7233 1 year ago

    Hi James, can we get a copy of the file that you used? I want to practice with it.

    • @Gaskination
      @Gaskination 1 year ago

      It's the burgers dataset available on the homepage of statwiki.gaskination.com/

  • @shantanuchakrabory5527
    @shantanuchakrabory5527 4 years ago

    K-means cluster analysis using SPSS is a really special one

  • @masharifulamin5682
    @masharifulamin5682 4 years ago

    Hello James, I'm new here. Is it possible to get the dataset to practice? Please share it with us.

    • @Gaskination
      @Gaskination 4 years ago

      The dataset is available on the homepage of statwiki: statwiki.kolobkreations.com/

  • @karlafuentes2726
    @karlafuentes2726 3 years ago

    In Spanish, please