Almost 8 years later and this video's still helping people like me do their job. Thank you!
Hi Alan, thanks for the insightful video. I have a question about creating an index if my variables are measured differently. For example, if I want to create an index of global cognition and my variables are: orientation_test (score 1 to 4), fluency_test (score 1 to 100) and recall_test (score 1 to 10) how can I create a global index with those? Can you write down an example with Stata commands (if it is possible to create such an index)?
Thank you very much
Chiara
I was familiar with additive indices in SPSS, but your video was very useful for seeing how to do it in Stata. That #delimit ; and #delimit cr seems extremely handy for keeping code clean! Thanks!
I'm glad you found the video useful.
Sincerely,
Alan
Really useful and clear video. Thank you.
Thank you! I'm glad you found this video useful.
Very useful video, thank you!
Hi Alan,
Thanks for your useful video. I have a question: how do I create an additive index if the variables I am using have different categories?
For example:
Index of participation in an activity.
Variables: Years of participation 0 to 45yrs
Frequency of participation: 1 (most often) to 5 (never).
Thank you very much.
Thanks for the video, it's really helpful! I have five yes (1), no (0), and don't know (-1) questions that I'm using to create an index. My question is, what do I do with the "don't know" category coded as -1?
Hi Alan, how do you create an additive index for two responses, "yes" and "no"?
Jose C, there is no reply button on your comment. I would code the -1 responses (the people who answered "don't know") as missing and drop them from the index, since they don't fit anywhere with the "yes" and "no" responses.
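A minimal Stata sketch of the recoding described above. It assumes five hypothetical items named q1 through q5, coded 1 (yes), 0 (no), and -1 (don't know); the three rows of data are made up purely for illustration.

```stata
* Hypothetical data: five yes(1) / no(0) / don't know(-1) items
clear
input q1 q2 q3 q4 q5
1 0 1 1 0
1 1 -1 0 1
0 0 0 0 0
end

* Turn every -1 into Stata's system missing value (.)
mvdecode q1-q5, mv(-1)

* The + operator propagates missing values, so a respondent with any
* "don't know" answer gets a missing index and drops out of analyses
generate index = q1 + q2 + q3 + q4 + q5
list
```

Whether to drop such respondents entirely or to keep those with only a few missing items is a judgment call; this sketch takes the strict route.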
Very interesting! What if items are measured differently? I have 5 items with scores ranging from either 0 to 4 or 0 to 3. Also, I have some missing data. Would it be methodologically acceptable to keep rows with missing data as long as each row has complete data for a certain number of items? Is there a way to selectively keep rows with, for instance, at least 4 complete items out of 5? Thank you, your videos are great!
Hi Andrea, I think that creating an index using variables with a different number of outcome categories is possible if you standardize the measures first by making each part of the index a percentage. For example, if you have nine variables coded 0 and 1, responses of 1 on 8 of the 9 variables produce a score of 89 (8/9, as a percentage). But people who responded to only 8 of the nine variables and had responses of 1 on 7 of those 8 would have a score of 88 (7/8).
Regarding creating an index where you have some missing values on some variables, I use Stata as my preferred programming language and here is one way the problem can be addressed:
egen nomiss=rownonmiss(conbus conclerg coneduc conlabor conpress conmedic contv consci conlegis)
tab nomiss
* rowtotal() treats a missing item as 0; a plain sum with + would be missing whenever any item is missing
egen total=rowtotal(conbus conclerg coneduc conlabor conpress conmedic contv consci conlegis)
gen index=.
replace index=total/8 if nomiss==8
replace index=total/9 if nomiss==9
drop nomiss total
In short, I create a variable that shows the distribution of non-missing responses. In my example, people who responded to all 9 questions have scores of 9, those who answered 8 of the 9 survey questions have scores of 8, and so forth. The above code calculates a percentage type of index only for people who responded to at least 8 (8 or 9) of the 9 questions.
Best,
Alan
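For the specific case asked about (five items scored 0 to 4 or 0 to 3, keeping anyone with at least 4 answered items), the same idea might be sketched as follows. The variable names a1-a3 and b1-b2 and the data are hypothetical, and rescaling to 0-100 is one reasonable choice rather than the only one.

```stata
* Hypothetical data: items a1-a3 scored 0-4, items b1-b2 scored 0-3
clear
input a1 a2 a3 b1 b2
4 4 4 3 3
2 . 2 3 0
. . 1 2 3
end

* Rescale every item to 0-100 so the two scoring ranges are comparable
foreach v of varlist a1 a2 a3 {
    generate p_`v' = 100 * `v' / 4
}
foreach v of varlist b1 b2 {
    generate p_`v' = 100 * `v' / 3
}

* Count the answered items, then average only the answered ones
egen answered = rownonmiss(p_a1 p_a2 p_a3 p_b1 p_b2)
egen index = rowmean(p_a1 p_a2 p_a3 p_b1 p_b2)

* Keep the index only for respondents with at least 4 of 5 items
replace index = . if answered < 4
list a1-b2 answered index
```

Here rowmean() divides by the number of non-missing items automatically, so no separate line is needed for each possible count of answered items.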
@smilex3 Thank you so much Alan! Very helpful
@andreab2114 👍
Please, I need help with index creation for panel data. Please help me.
Hi, I have a question. Is it methodologically acceptable (and has it been done in some cases) to create an index defined by the value of a single variable and to treat other variables as "additional" by adding their pre-defined value to the value of the original index? For example: the original index has a range of 0-1. After analyzing the value of the variable, I would look at the presence of other variables that would each have a pre-defined value of (let's say) 0.05. The reason behind this is that while one of my variables concerns an action that will take place one way or another, the other variables relate to actions that may or may not happen, but their non-occurrence (contrary to their occurrence) does not necessarily affect the phenomenon that the index is measuring. I hope that was understandable, and thank you very much for any advice :)
I must admit, I am uncertain of exactly what you are proposing, but I have two thoughts. First, if the "additional" variables are coded 0 and 1, then simply adding them does pretty much what you want: a value of 1 increments your index by 1, and a value of 0 does nothing.
If your additional variables are discrete but have more than two response categories, you might be able to dichotomize them and create an additive index as suggested above.
My second thought is that making decisions on the fly (this counts, this doesn't count) is different from developing well-thought-out, defensible rules (like the ones I suggested above) that can be applied without case-by-case subjectivity.
Best,
Alan
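A short Stata sketch of the dichotomize-then-add idea from the reply above. The variables core and extra, the data, and the cut-off rule are all hypothetical; the point is only that the rule is written down once and applied to every case.

```stata
* Hypothetical data: one 0/1 item and one item with three categories (1-3)
clear
input core extra
1 3
0 1
1 2
end

* Dichotomize with a rule decided in advance:
* categories 2 and 3 count as present (1), category 1 as absent (0)
generate extra01 = (extra >= 2) if !missing(extra)

* Adding a 0/1 variable raises the index by 1 when present, by 0 otherwise
generate index = core + extra01
list
```

The `if !missing(extra)` qualifier matters because Stata treats missing values as larger than any number, so without it a missing response would be counted as present.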
Alan,
thank you very much for your answer!
Since I'm still not sure how to do it and any advice would be an enormous help, I'll try to describe my case. I've seen several of your videos and since you're working in a related field, I'm sure you'll understand it :):
I am trying to create an index (of the level) of conflict between the mayor and the council. I identified four different forms of conflictual relationships (mayor-entire council, mayor-majority, mayor-minority, mayor-one or a few councillors) and I want to come up with an index for each of them (so, in the end, I will have four indices for every municipality in the sample). Now to my problem: the most important variable in the index is the % of the votes where the particular form of conflict occurred. However, there are three additional variables that help draw the picture of conflict: the use of the mayoral veto, the modification of the mayor's salary by the council, and the modification of the mayor's powers by the council. I understand that the typical procedure would be to weight them into one index of conflict, but since these three additional variables occur only sporadically (though when they do, they can tell us a lot about the level of conflict) and they are not relevant for the latter two indices (forms of conflict), I don't want to include them in the original indices, because they might distort their values that way. I hope you understand my problem better now, and thank you very much for your time!
Martin
Hello, I have a question. I have three different variables that should be put together into an index. However, one of them is the page number/the total number of pages. I am not sure how I can generate an index using your example, as I do not have a fixed range: the page number/number of pages changes and I do not have a maximum value. In the other two variables the maximum value is 2 and the minimum is 0.
Depending on the amount of variance, you could consider converting all scores to z-scores and then adding or averaging them.
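The z-score suggestion above could look something like this in Stata. It is only a sketch: the variable names (pages, item1, item2) and the four rows of data are invented, and averaging rather than summing the z-scores is an arbitrary choice here.

```stata
* Hypothetical data: a page count with no fixed maximum plus two 0-2 items
clear
input pages item1 item2
10 0 1
250 2 2
40 1 0
90 1 2
end

* egen's std() rescales each variable to mean 0 and standard deviation 1
foreach v of varlist pages item1 item2 {
    egen z_`v' = std(`v')
}

* Average the z-scores; rowmean() ignores missing values within a row
egen index = rowmean(z_pages z_item1 z_item2)
summarize z_pages z_item1 z_item2 index
```

The same approach would apply to any set of items measured on different ranges, since each item ends up on a common scale before being combined.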
I'm using Stata. Are there different commands for principal component analysis (PCA) with panel data, or do I simply run PCA after standardizing the variables?
You might want to check to see if the data are stationary. If they are not, you might consider using a dynamic factor model with deJong's diffuse Kalman filter to provide for the transition model in the state space form.
You can also find a discussion of the issue (with some debate) at stats.stackexchange.com/questions/153873/principal-component-analysis-on-time-series-data-and-panel-data
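As a rough illustration of the first two steps mentioned above (checking stationarity, then running PCA), here is a Stata sketch on simulated data. The panel layout, the variable names, and the choice of a Fisher-type test with one lag are assumptions made for the example, not a recommendation for any particular dataset. Note that Stata's pca analyzes the correlation matrix by default, so the variables are effectively standardized already.

```stata
* Simulated panel: 10 hypothetical units observed for 20 periods each
clear
set seed 12345
set obs 200
generate id = ceil(_n/20)
bysort id: generate year = _n
xtset id year

generate x1 = rnormal()
generate x2 = 0.5*x1 + rnormal()
generate x3 = 0.5*x1 + rnormal()

* Fisher-type panel unit-root test (augmented Dickey-Fuller, one lag)
xtunitroot fisher x1, dfuller lags(1)

* PCA on the correlation matrix (the default), then the first component score
pca x1 x2 x3
predict pc1, score
```

This pooled PCA ignores the panel structure entirely; whether that is acceptable depends on the stationarity results and on the debate linked above.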
Can I generate additive indexes with this formula instead of what you tried? For example, gen combine = (var1 + var2 + var3)/3. Would it give the same results as yours?
Iseaul, no, the indices will be different. The index I created is a simple additive index; yours is an average index. Often these two kinds of indices will be highly (even perfectly) correlated. The choice is yours about which one is most appropriate for your purposes. The other possible difference between your index and the one detailed in the video is that I showed how to reverse the order of the outcome values. This is not always necessary.
Best,
Alan
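To make the difference between the two indices concrete, here is a small Stata sketch with invented data. The names var1-var3 follow the question; the assumption that var3 is a 1-3 item needing reversal is hypothetical.

```stata
* Hypothetical data: three items scored 1-3
clear
input var1 var2 var3
1 2 3
3 3 1
2 1 1
3 2 2
end

* Reverse the order of a 1-3 item: 1 becomes 3 and 3 becomes 1
generate var3r = 4 - var3

generate additive = var1 + var2 + var3r
generate average  = (var1 + var2 + var3r) / 3

* With complete data the average is just the sum divided by 3,
* so the two versions correlate perfectly
correlate additive average
```

The two versions stop being perfectly correlated once the average is computed over a varying number of answered items, for example with rowmean() on data that have missing values.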
Hi. The video was pretty helpful but I'm encountering a problem due to the data I have.
Like in your video, I also have three responses: "great deal of confidence", "only some confidence" and "hardly any confidence". But, additionally, there is a "don't know" response, which has a value of -3. How do I correct for that?
Thanks in advance.
Madhura Bhrushundi, usually these observations are coded as missing on that variable and are therefore excluded from any analysis.
For example, given a variable called "confidence" with -3's, in Stata you could:
replace confidence=. if confidence==-3
If you want to retain your original variable you could do this:
generate confidencenew=confidence
replace confidencenew=. if confidencenew==-3
Make sense?
Best,
Alan
Alan Neustadtl, yes, it's sorted. Thank you so much! Your video has been a great help!!
I am glad you found it useful.
Hi, can you create an index where the variables are weighted, say when the variables are not of equal importance? One may be 50%, the next 30% and the last 20%.
+Aud Djurhuus How about some kind of weighted average? Because the weights .5, .3 and .2 sum to 1, no further division is needed. The following may work:
generate scale=(v1*.5)+(v2*.3)+(v3*.2)
thanks I will try it
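A slightly more general Stata sketch of the weighted average suggested above, written so that the weights are divided by their own sum. The data are invented and the local macro names w1-w3 are hypothetical; the lines need to be run together (for example from a do-file) so the locals stay defined.

```stata
* Hypothetical data: three items measured on the same scale
clear
input v1 v2 v3
10 20 30
5 5 5
end

* Hypothetical weights; dividing by their sum keeps the index on the
* items' scale even if the weights do not add up to exactly 1
local w1 = 0.5
local w2 = 0.3
local w3 = 0.2
generate scale = (`w1'*v1 + `w2'*v2 + `w3'*v3) / (`w1' + `w2' + `w3')
list
```

If the items are on different scales, the weights only reflect the intended importance after the items have been put on a common scale first.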