How to use Stata for Principal Component Analysis (PCA)

Published 3 Dec 2024

COMMENTS • 16

  • @tarirondoro2886 · a month ago

    Very clear explanation. You've made it all seem easy. Thank you!

    • @financefundamentals · a month ago

      This is exactly what I aim for: helping others avoid the frustrations I ran into when I was learning this stuff myself! I really appreciate you letting me know it helped.

  • @ehiidoko6934 · 6 months ago

    Thanks for this! It was super helpful.

    • @financefundamentals · 6 months ago

      Awesome! Happy I could help! Good luck with your Stata/PCA journey!

  • @khadimhussainmalik3284 · 7 months ago

    Dear Sir, thank you for the insightful lecture. In my research, I have identified two variables with noteworthy cross-loadings. Which of them should I prioritize for removal, given that both load significantly on Factor 1 and Factor 2?

    Variable | Factor 1 | Factor 2
    tour4    | 0.7039   | -0.5249
    ser      | 0.7423   |  0.5641

    • @financefundamentals · 7 months ago

      Thank you for your question! As I mentioned in the video, I'm not a statistics expert, just a generalist interested in sharing knowledge about using Stata for various analyses, so please bear that in mind when reading my response below.
      Regarding your specific question about which variable to remove due to cross-loading, a common approach is to consider both the statistical and the theoretical aspects. From a statistical perspective, you would most likely remove the variable with the lower communality, i.e. the smaller share of its variance explained by the retained factors. (Based on the limited numbers you provided, this might be 'tour4', but you need to check the communalities in your full results.)
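
A variable's communality is the sum of its squared loadings across the retained factors. A minimal Python sketch, assuming Factor 1 and Factor 2 are the only retained factors (if more were retained, their squared loadings would add to each total), applies that definition to the two rows from the question:

```python
# Loadings on Factor 1 and Factor 2, copied from the question above.
loadings = {
    "tour4": [0.7039, -0.5249],
    "ser": [0.7423, 0.5641],
}

def communality(row):
    """Sum of squared loadings across the retained factors."""
    return sum(x ** 2 for x in row)

communalities = {name: communality(row) for name, row in loadings.items()}
for name, h2 in communalities.items():
    print(f"{name}: h2 = {h2:.4f}")
# tour4: h2 = 0.7710
# ser: h2 = 0.8692

# The variable with the lower communality is the statistical candidate for removal.
lowest = min(communalities, key=communalities.get)
print("lowest communality:", lowest)  # tour4
```

With only these two loadings, tour4 comes out lower (about 0.771 versus 0.869), in line with the tentative suggestion in the reply; the communalities in the full output are what actually settle it.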
      However, you should also think about the theoretical relevance of each variable to your research question. Consider which variable is more meaningful to retain, based on your study's objectives and underlying theory. Sometimes a variable with slightly lower communality may be more crucial to keep from a conceptual standpoint.
      Another option to consider is trying the analysis with each variable removed in turn, and comparing the results to see which solution makes more sense and aligns better with your research goals.
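
The "remove each variable in turn" comparison can be sketched in Python. This is a minimal illustration, assuming a PCA on the correlation matrix (which is also what Stata's "pca" uses by default), loadings defined as eigenvectors scaled by the square root of their eigenvalues, and hypothetical random data with placeholder names v1 to v5:

```python
import numpy as np

def pca_loadings(X, k):
    """PCA on the correlation matrix; returns p x k loadings and variance share."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    explained = eigvals[:k].sum() / eigvals.sum()
    return loadings, explained

def drop_each_variable(X, names, k):
    """Re-run the PCA once per variable, each time leaving that variable out."""
    results = {}
    for j, name in enumerate(names):
        keep = [i for i in range(X.shape[1]) if i != j]
        results[name] = pca_loadings(X[:, keep], k)
    return results

# Hypothetical data: 200 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
names = ["v1", "v2", "v3", "v4", "v5"]

results = drop_each_variable(X, names, k=2)
for name, (loadings, explained) in results.items():
    print(f"without {name}: first 2 components explain {explained:.1%}")
```

Each entry holds the reduced loading matrix, so the cross-loading pattern and the variance explained can be compared side by side before deciding which removal makes more sense for the research question.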

  • @NsrenaAly · 2 months ago

    Thanks so much! I'd like to know why you use "rotate, normalize" rather than plain "rotate". And what should I do if I normalized my data at the beginning?

    • @financefundamentals · 2 months ago

      Remember that I am replicating the Hair et al. results, and they did not start their PCA by normalizing their data first. If you are doing your own project and your step 1 was normalizing your data, then I would imagine that adding the "normalize" option will have no effect. Why not try both commands, "rotate" and "rotate, normalize", to see what differences (if any) you get?
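
For reference, Stata's documentation describes the normalize option of rotate as applying Kaiser normalization to the loading matrix: each variable's row of loadings is rescaled to a unit sum of squares before rotation and scaled back afterwards. That operates on the loadings rather than on the raw data, so standardizing the data first does not, in general, make the two commands give the same result. The Python sketch below illustrates the idea with a textbook varimax algorithm (an assumption: Stata's own implementation may differ in convergence details) applied to a hypothetical loading matrix:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k loading matrix L."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = L @ R
        # Gradient of the varimax criterion (gamma = 1)
        G = L.T @ (Lam ** 3 - Lam @ np.diag(np.sum(Lam ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                      # nearest orthogonal matrix
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):
            break
    return L @ R

def varimax_kaiser(L, **kwargs):
    """Kaiser normalization: scale rows to unit length, rotate, scale back."""
    h = np.sqrt(np.sum(L ** 2, axis=1))   # square roots of the communalities
    return varimax(L / h[:, None], **kwargs) * h[:, None]

# Hypothetical loadings: 5 variables on 2 components.
L = np.array([[0.7, 0.4], [0.6, 0.5], [0.3, 0.8], [0.2, 0.7], [0.8, -0.1]])
raw = varimax(L)            # analogous to plain "rotate"
normed = varimax_kaiser(L)  # analogous to "rotate, normalize"
print(np.round(raw, 3))
print(np.round(normed, 3))
```

Both versions leave every variable's communality unchanged, because the rotation is orthogonal; they differ in how much weight variables with low communality carry when the rotation is chosen.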

  • @mohammadtaufan9914 · a year ago

    Hello, may I ask a quick question? Is there a way to create a plot using the factors shown at 9:29? Thanks in advance.

    • @financefundamentals · a year ago

      Remember that you would realistically be limited to a maximum of 3 factors if you wanted to visualise them in a single plot. Here there are 4, which is why the source text used for this video does not try to show such a plot. 4-dimensional plots on a 2-D piece of paper are not, strictly speaking, impossible, but they are unavoidably messy and hard to interpret.
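
A common workaround for 4 factors is a matrix of 2-D plots, one panel per pair of factors, which gives 6 panels. A minimal Python sketch, assuming hypothetical loadings for three placeholder variables, builds the data for each panel; each panel could then be drawn with any scatter-plot tool (Stata itself provides "loadingplot" after "pca"; see "help loadingplot" for its options):

```python
from itertools import combinations

def pairwise_panels(loadings, names):
    """One 2-D panel per pair of factors: (factor_i, factor_j, points)."""
    k = len(loadings[0])
    panels = []
    for i, j in combinations(range(k), 2):
        points = [(name, row[i], row[j]) for name, row in zip(names, loadings)]
        panels.append((i + 1, j + 1, points))  # 1-based factor numbers
    return panels

# Hypothetical loadings: 3 variables on 4 factors.
names = ["a", "b", "c"]
loadings = [
    [0.81, 0.10, 0.05, 0.12],
    [0.15, 0.77, 0.20, 0.08],
    [0.09, 0.11, 0.72, 0.30],
]
panels = pairwise_panels(loadings, names)
print(len(panels))                     # 6
print([(i, j) for i, j, _ in panels])  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```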

    • @mohammadtaufan9914 · a year ago

      First, thank you for replying. Your answer makes sense: a plot of these factors would convey little to no information. What I had in mind was a time-series graph with one line per factor (time on the X axis and the values of the factor loadings on the Y axis). Is there perhaps a tutorial for making such a graph? As always, thanks in advance. @financefundamentals
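
Loadings are fixed per variable and do not change over time; what varies from one period to the next are the component scores, one value per observation and component (in Stata, "predict" after "pca" generates the corresponding score variables). A minimal Python sketch, assuming hypothetical data with one row per time period and a PCA on the correlation matrix, computes the scores that would be plotted as one line per component against time:

```python
import numpy as np

def pca_scores(X, k):
    """Scores on the first k components of a correlation-matrix PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]              # k largest eigenvalues
    return Z @ eigvecs[:, order], eigvals[order]

# Hypothetical data: 60 time periods, 5 variables (random walks).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 5)).cumsum(axis=0)
time = np.arange(60)

scores, eigvals = pca_scores(X, k=2)
# scores[:, 0] against time is the line for component 1, scores[:, 1] for component 2.
for t in time[:3]:
    print(t, np.round(scores[t], 3))
```

Each score series has mean zero and variance equal to its component's eigenvalue, so the lines share a common baseline when drawn on the same axes.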

  • @atharalishah4951 · a year ago

    Hello sir, can you please explain why X11 is eliminated for cross-loading even though its values are not the same in both columns? In fact, they are only close to each other. If that is the criterion, other variables also have loadings close to each other, so why are they not dropped? Thanks.

    • @financefundamentals · a year ago · 1 like

      [Timestamp: the issue starts around 9:55] Take a careful look at all the loadings. Notice that for every variable except X11, there is one (and only one) factor with a high loading. X11 is different. None of its loadings is as high as the others, with a maximum loading of only 0.6420. But that is not the main problem. Even worse, it has TWO loadings of around 0.59 to 0.64. This is called a cross-loading, so X11 is dropped. A cross-loading is NOT defined as two loadings that are exactly the same. Instead, you are looking for two or more high(ish) loadings on a single variable that both exceed your chosen cut-off for a significant loading.
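
The rule described in the reply (flag a variable when two or more of its loadings exceed the cut-off) can be sketched in Python. The cut-off of 0.5 is an assumption for illustration; the tour4 and ser rows come from the earlier question in this thread, and x_clean is a hypothetical variable with a single high loading. Absolute values are used because a large negative loading counts as high too:

```python
def cross_loaders(loadings, cutoff):
    """Names of variables with two or more |loadings| above the cut-off."""
    flagged = []
    for name, row in loadings.items():
        high = [x for x in row if abs(x) > cutoff]
        if len(high) >= 2:
            flagged.append(name)
    return flagged

loadings = {
    "tour4": [0.7039, -0.5249],   # from the earlier question in this thread
    "ser": [0.7423, 0.5641],      # from the earlier question in this thread
    "x_clean": [0.8100, 0.1200],  # hypothetical: one high loading only
}
print(cross_loaders(loadings, cutoff=0.5))  # ['tour4', 'ser']
```

Note that the two flagged loadings need not be equal, only both above the cut-off, which is exactly the point made about X11.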

  • @Mimi-nr6jx · a year ago · 1 like

    How do you use the loadings to create an index, please?

    • @financefundamentals · a year ago

      There are a number of methods. I have personally used the approach in Anderson, T. W., and Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5, 111-150.
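
The Anderson-Rubin approach is commonly presented as a way to compute factor scores that are uncorrelated and have unit variance, which can then serve as index values. A minimal Python sketch of one common formulation, W = Psi^-1 L (L' Psi^-1 R Psi^-1 L)^(-1/2) with scores = Z W, follows. Assumptions: the loadings L come from a principal-component extraction on hypothetical simulated data (the method was formulated for factor analysis), and the uniquenesses Psi are taken as 1 minus each communality:

```python
import numpy as np

def anderson_rubin_scores(X, L):
    """Anderson-Rubin scores: Z @ Psi^-1 L (L' Psi^-1 R Psi^-1 L)^(-1/2)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    psi_inv = np.diag(1.0 / (1.0 - np.sum(L ** 2, axis=1)))  # inverse uniquenesses
    A = L.T @ psi_inv @ R @ psi_inv @ L
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T           # symmetric A^(-1/2)
    return Z @ (psi_inv @ L @ A_inv_sqrt)

# Hypothetical data: 300 observations of 6 variables driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
true_L = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.2],
                   [0.1, 0.8], [0.0, 0.7], [0.2, 0.6]])
X = latent @ true_L.T + 0.5 * rng.normal(size=(300, 6))

# Loadings from the first 2 principal components of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
top = np.argsort(eigvals)[::-1][:2]
L = eigvecs[:, top] * np.sqrt(eigvals[top])

scores = anderson_rubin_scores(X, L)
print(np.round(np.cov(scores, rowvar=False), 6))  # approximately the 2 x 2 identity
```

Each column of scores is one candidate index; how the columns are combined or labeled is a substantive choice for the study, not something the formula decides.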

    • @Mimi-nr6jx · a year ago · 1 like

      Thank you!