Very clear explanation. You've made it all seem easy. Thank you!
This is exactly what I try to aim for - helping others to avoid the frustrations I ran into when I was learning this stuff myself! Really appreciate you letting me know it helped.
Thanks for this! It was super helpful.
Awesome! Happy I could help! Good luck with your Stata/PCA journey!
Dear Sir, I extend my gratitude for the insightful lecture you provided. In my research, I have identified two variables with noteworthy cross-loading factors. The dilemma arises as to which variable should be prioritized for removal, considering their significant cross-loading with Factor 1 and Factor 2.
Variable | Factor 1 | Factor 2
tour4 | 0.7039 | -0.5249
ser | 0.7423 | 0.5641
Thank you for your comment/question! As I mentioned in the video, I'm not a statistics expert. Just a generalist interested in sharing knowledge about using Stata for various analyses. So you need to consider my response below while bearing that in mind.
Regarding your specific question about which variable to remove due to cross-loading, a common approach is to consider both the statistical and theoretical aspects. From a statistical perspective, you would most likely remove the variable with the lower communality. (Based on the limited numbers you provided, this might be 'tour4' - but you need to check that column of your results.)
However, you should also think about the theoretical relevance of each variable to your research question. Consider which variable is more meaningful to retain, based on your study's objectives and underlying theory. Sometimes a variable with slightly lower communality may be more crucial to keep from a conceptual standpoint.
Another option to consider is trying the analysis with each variable removed in turn, and comparing the results to see which solution makes more sense and aligns better with your research goals.
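To make the communality comparison concrete, here is a small sketch using only the two rows of loadings posted above, and assuming a two-factor solution (so communality is the sum of squared loadings across both factors). If your solution retained more factors, you would need to include those loadings too.

```python
# Illustrative only: communalities computed from the two loading rows
# posted in the question, assuming a two-factor solution.
# Communality = sum of squared loadings across the retained factors.
loadings = {
    "tour4": (0.7039, -0.5249),
    "ser":   (0.7423,  0.5641),
}

communalities = {
    var: sum(l ** 2 for l in row) for var, row in loadings.items()
}

for var, h2 in communalities.items():
    print(f"{var}: communality = {h2:.4f}")

# On purely statistical grounds, the removal candidate is the
# variable with the lower communality.
weakest = min(communalities, key=communalities.get)
print("lower communality:", weakest)  # tour4 here
```

With these particular numbers, 'tour4' (about 0.771) comes out below 'ser' (about 0.869), which is why it looked like the likelier statistical candidate. But as noted above, check the communality column in your actual output, and weigh the theory as well.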
Thanks so much! I want to know why you use "rotate, normalize" and not just "rotate". What can I do if I normalized my data at the beginning?
Remember that I am replicating the Hair et al. results, and they did not start their PCA by normalizing their data first. If you are doing your own project and your step 1 was normalizing your data, then I would imagine that adding the "normalize" parameter will have no effect. Why not try both commands, "rotate" and "rotate, normalize", to see what differences (if any) you get?
Hello, can I ask you one little question? Is there a way to create a plot using the factors here (9:29)? Thanks in advance.
Remember that you would realistically be limited to a maximum of 3 factors if you wanted to visualise a plot. Here there are 4, which is why the source text used for this video does not try to show such a plot. 4-dimensional plots on a 2-D piece of paper are not strictly speaking impossible, but are unavoidably messy and hard to interpret.
First, I'd like to express my gratitude for replying. Your answer makes sense, since these factors provide little to no information for making a plot. What I had in mind was making a time-series graph with a plot line for each factor (the X axis is time and the Y axis is the factor loading value). Perhaps there is a tutorial for making such a graph? As always, thank you in advance. @@financefundamentals
Hello sir, can you please explain why X11 is eliminated as a cross-loading even though its value is not the same in both columns? In fact they are only close to the same; if that is the criterion, then other loadings are also close to each other, so why are they not dropped? Thanks.
[Time stamp: issue starts around 9:55] Take a careful look at all the loadings. Notice that for every variable except X11, there is one (and only one) factor with a high loading. X11 is different. It does not have any loading as high as the others, with a maximum loading of only 0.6420. But that is not the main problem. Even worse, it has TWO loadings in the 0.59 to 0.64 range. This is called a cross-loading, so X11 is dropped. A cross-loading is NOT defined as two loadings that are exactly the same. Instead, you are looking for two or more high(ish) loadings on a single variable, each greater than your chosen significance threshold.
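The rule described above can be sketched as a simple check: flag any variable with two or more absolute loadings above the chosen cutoff. The two X11 values (0.59 and 0.6420) come from the video; the other rows and the 0.50 cutoff are hypothetical placeholders for illustration.

```python
# Sketch of the cross-loading check described above: a variable is
# flagged when two or more of its absolute loadings exceed the cutoff.
# X11's values (0.59, 0.6420) are from the video; the other rows
# and the cutoff are hypothetical, for illustration only.

CUTOFF = 0.50  # example threshold; choose yours based on sample size etc.

loadings = {
    "X6":  (0.88, 0.10, 0.05, 0.12),    # hypothetical: loads on factor 1 only
    "X9":  (0.07, 0.86, 0.11, 0.04),    # hypothetical: loads on factor 2 only
    "X11": (0.59, 0.6420, 0.08, 0.10),  # two high loadings -> cross-loading
}

def cross_loaders(loadings, cutoff):
    """Return variables with two or more absolute loadings above the cutoff."""
    return [
        var for var, row in loadings.items()
        if sum(abs(l) > cutoff for l in row) >= 2
    ]

print(cross_loaders(loadings, CUTOFF))  # ['X11']
```

Note that the check counts high loadings per variable; it never compares two loadings for equality, which is why "close to the same" is not the criterion.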
How do you use the loadings to create an index please?
There are a number of methods. I personally have used the approach in Anderson, T. W. and Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5:111-150.
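For intuition only, one of the simplest of those methods is a loading-weighted sum of standardized variables. The sketch below shows that idea; it is NOT the Anderson and Rubin (1956) method cited above (which produces uncorrelated, unit-variance factor scores), and all the data and loadings here are hypothetical.

```python
# A simple loading-weighted index: standardize each variable, then sum
# the z-scores weighted by their factor loadings. This is NOT the
# Anderson-Rubin method cited above; it is a simpler illustration.
# All data and loadings below are hypothetical.
import statistics

def standardize(values):
    """Convert raw values to z-scores (sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical: three variables loading on one factor, with their loadings.
data = {
    "x1": [2.0, 4.0, 6.0, 8.0],
    "x2": [1.0, 3.0, 2.0, 5.0],
    "x3": [10.0, 9.0, 12.0, 14.0],
}
weights = {"x1": 0.81, "x2": 0.74, "x3": 0.68}  # hypothetical loadings

z = {var: standardize(vals) for var, vals in data.items()}
n = len(next(iter(data.values())))

# Index for each observation: loading-weighted sum of its z-scores.
index = [sum(weights[v] * z[v][i] for v in data) for i in range(n)]
print([round(val, 3) for val in index])
```

Because z-scores sum to zero across observations, the resulting index is centered on zero; higher values indicate observations that score high on the variables that load on the factor.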
Thank you!