Thank you so much! I am a grad student trying to work with survey data and this helped immensely! Your video saved me a solid three hours of time I would have spent making mistakes in Stata.
Glad to hear you found it useful.
@@sebastianwaiecon can u help me in gaining training stata plz
Thank you so very much. 🙏🏾 Helping a friend with masters and this has really helped me understand and am able to move forward after completing this section to help her to move on. Cheers mucho
Sebastian sir, thank you so much for this! you went straight to the point with no time wasted
This was so helpful! I spent hours looking for this on the internet. Thank you so much!!!
I think you just saved my life😢
Thank you, Sebastian, this is so helpful and you made it easy
Thank you very much. Greetings from Colombia.
Thank you for this very precise and succinct lecture. A quick question: I saw positive coefficients when the base was "No Diploma" and negative coefficients when "Graduate". Kindly interpret one of the coefficients in either regression results.
The reason for this is that "no diploma" is the lowest income group. Any departure from that group would be associated with an increase in income. For example, the estimate for the bachelor's degree tells us how much more income bachelor's degree holders have above those with no diploma.
Graduate, on the other hand, is the highest income group (people with PhDs, MBAs, and so forth). Here, people with (only) bachelor's degrees have less income on average than the base group.
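To make that interpretation concrete, here is a minimal sketch. It assumes Stata's bundled nlsw88 example data rather than the dataset from the video, so `race` (coded 1, 2, 3) stands in for the education variable. With no other regressors, the coefficient on a category should equal that group's mean wage minus the base group's mean wage.

```stata
* Sketch only: nlsw88 ships with Stata; race stands in for the education categories.
sysuse nlsw88, clear

* Category 1 is the base by default, so each coefficient is a gap relative to group 1.
regress wage i.race

* Compare the coefficient on category 2 with the raw difference in group means.
summarize wage if race == 1, meanonly
scalar mean_base = r(mean)
summarize wage if race == 2, meanonly
scalar mean_cat2 = r(mean)
display "coefficient on 2.race: " _b[2.race]
display "difference in means:   " mean_cat2 - mean_base
```

A positive coefficient means the group earns more than the base on average, and a negative one means it earns less, which is why the signs flip when the highest-income group becomes the base.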
Hi. If i have values such as “grade 1, grade 2 grade 3, grade 10, grade 11 etc” under a variable named “Education” but I want all lower grades such as grade 1-grade 7 to be called “Primary School” and higher grades such as grade 8-grade 11 to be called “High School”- how do I code that in stata?
I came on here trying to find answers to the same question. I was able to group my grade levels but not in the order that I wanted. I have grade levels 7-11 and I want to group grades 7-9 as "lower school" and grades 10-11 as "upper school". The command I have was able to group grades 7-8 as "lower school" and grades 9-11 as "upper school". I am trying to figure out how to group grade 9 in the lower school category.
Anyway, try this and see if it works:
gen schoollevel = recode(education, 1, 2)
label define schoollevels 1 "Primary School" 2 "High School"
label values schoollevel schoollevels
As I mentioned this command worked for me but not in the order I wanted. Hope it helps in some small way.
Try this video ua-cam.com/video/XWVaXN2KwmA/v-deo.html
You can do this with logical operators, in this case the "pipe" (|), which means "or." For your example, gen primaryschool = Education == "grade 1" | Education == "grade 2" and so on. Keep adding pipes and statements for each grade.
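If there are many grades, chaining pipes gets long. The sketch below is an alternative to the pipe approach, not the method from the video: it pulls the number out of the text and groups by range. The data are made up, and the names `gradenum`, `schoollevel`, and `schoollevels` are only illustrative.

```stata
* Sketch with made-up data; Education is a string variable as in the question.
clear
input str10 Education
"grade 1"
"grade 3"
"grade 7"
"grade 8"
"grade 10"
"grade 11"
end

* Strip the text "grade " and convert what is left to a number.
gen gradenum = real(subinstr(Education, "grade ", "", 1))

* Group by range instead of listing every grade with | (or).
gen byte schoollevel = 1 if inrange(gradenum, 1, 7)
replace schoollevel = 2 if inrange(gradenum, 8, 11)

* Attach readable names to the two codes.
label define schoollevels 1 "Primary School" 2 "High School"
label values schoollevel schoollevels
tabulate schoollevel
```

Changing the cutoffs in `inrange()` (for example 7 to 9 and 10 to 11) moves a grade from one group to the other.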
THANK YOU🙏🏼 This video has been extremely helpful
How about dummy variables?
I have coded a nominal variable, for example status. How should it be handled?
I am running a regression using Stata with the dependent variable being R.O.A and the independent variable being green-house-gas emissions. I also have 4 control variables. I also want to control for each industry. For example, firms that operate in the industry sector will typically have higher GHG emissions than firms in the health care sector. Would this be the way to control for each industry? If not is there a way to do so? Thanks
Sounds like dummy variables indicating each industry would be appropriate. You can do this using the methods outlined in this video.
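A minimal sketch of industry dummies, assuming the bundled nlsw88 data as a stand-in because the commenter's data are not available. The variable names in the commented line (`roa`, `ghg`, `control1` to `control4`) are hypothetical.

```stata
* Sketch only: nlsw88 has a numeric, labeled industry variable.
sysuse nlsw88, clear

* With the commenter's data the command might look like this (hypothetical names):
*   regress roa ghg control1 control2 control3 control4 i.industry

* i.industry adds one dummy per industry, leaving one out as the base.
regress wage tenure i.industry

* Joint test of whether the industry dummies matter as a group.
testparm i.industry
```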
How could I see if even NODIPLOMA is significant in regression? Is there any command which shows all the variables without any base reference?
You can force Stata to omit the constant using the noconstant option. This will remove the collinearity and allow all categories to be in the regression. However, be careful about the interpretation and what statistical significance would mean in this new context. I personally wouldn't advise doing what I'm suggesting but that is the answer to your question.
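For reference, a sketch of what that looks like, again assuming the bundled nlsw88 data with `race` in place of the education variable. The `ibn.` prefix tells Stata not to pick a base level.

```stata
* Sketch only: race (1, 2, 3) stands in for the education categories.
sysuse nlsw88, clear

* ibn. = no base level; noconstant drops the intercept so every category can be estimated.
regress wage ibn.race, noconstant

* Each coefficient is now that group's mean wage.
summarize wage if race == 1, meanonly
scalar mean_cat1 = r(mean)
display "coefficient on 1.race: " _b[1.race]
display "mean wage in group 1:  " mean_cat1
```

This is why the caution about interpretation matters: each t-test now asks whether a group's mean is zero, not whether the group differs from another group.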
Hi how can i apply this information if my categorical variable is already in numbers (0,1,2) and i need to regress it with a continuous variable? STATA doesn't know what the values correspond to but 0= conservative, 1=labour and 2=other
I only bought stata yesterday and first time using it for my Msc dissertation. This was so helpful. When using the label define command, I assume you have to type in the variable names exactly as was shown when tab. What if the variable name has spaces in between, as in instead of NODIPLOMA, it was NO DIPLOMA? The variable in my data had a space in between and I got a syntax error when I tried to use the command label define on it.
Found the answer to my own question by playing around with stata. When entering the name for instance NO DIPLOMA, make sure you enter it as 1 "NO DIPLOMA"
@@mussahemed1153 Yes, you got it. The reason you had this problem is that spaces are used as delimiters in Stata commands, similar to how commas are generally used in Excel functions. So, any time you have text with a space you need quotation marks.
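A small sketch of the quoting rule, with made-up codes and a made-up label name (`educlbl`):

```stata
* Sketch with made-up data: three numeric education codes.
clear
input byte educcodes
1
2
3
end

* Without quotes, NO and DIPLOMA are read as separate tokens and Stata reports a syntax error:
*   label define educlbl 1 NO DIPLOMA 2 HIGHSCHOOL

* With quotes, the whole phrase is one label.
label define educlbl 1 "NO DIPLOMA" 2 "HIGH SCHOOL" 3 "BACHELORS"
label values educcodes educlbl
tabulate educcodes
```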
Excellent! Thank you for saving me tons of time :)
You are the best man.
hi, is it true that the i. command for categorical variables does not work when using the Oaxaca regression command?
I haven't used that, so I don't know.
whats the value of x if you could calculate estimated value of wage for each category.
Hi I have an issue. I am trying to convert my string variables (in a group called stage) to numeric variables. I used the encode option and created a new variable called stage_cat. But when I tabulate stage_cat I get no observations. The list of stage_cat also seems to be empty. It looks like the encoding option didn't work. How can I fix this?
I'm not sure about recode, but did you try using encode like I showed in the video?
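For comparison, here is a minimal sketch of encode on a string variable, followed by a check that nothing was lost. It assumes the bundled nlsw88 data; `race_str` is a string variable created only for this demo and plays the role of `stage`.

```stata
* Sketch only: build a string variable to encode.
sysuse nlsw88, clear
decode race, generate(race_str)

* encode numbers the distinct strings 1, 2, 3, ... in alphabetical order
* and attaches the original text as value labels.
encode race_str, generate(race_cat)
tabulate race_cat, missing

* Count observations that had text but ended up missing after encoding.
count if missing(race_cat) & race_str != ""
```

If the count is above zero, or the tabulation is empty, it is worth checking whether the data in memory changed between the encode and tabulate steps.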
@SebastianWaiEcon I am a student of MRes course and I need to understand stata and it's working. I feel it is difficult. Can u train
Thanks this is really helpful!
But what should I do if the original variable is numeric??
You can put the numeric variable directly in the regression. If you want to use a numeric variable as a categorical variable, you can still do that. You skip the step of encoding, and just use the "i." structure.
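A sketch of the numeric case, using simulated data and the 0/1/2 party coding from an earlier question in this thread. The variable names and the numbers used to generate `income` are made up.

```stata
* Sketch with simulated data: party is already numeric (0, 1, 2).
clear
set seed 12345
set obs 300
gen byte party = floor(3 * runiform())
gen income = 30 + 5 * (party == 1) + 2 * (party == 2) + rnormal()

* Optional: attach labels so the output shows names instead of numbers.
label define partylbl 0 "Conservative" 1 "Labour" 2 "Other"
label values party partylbl

* i. treats party as categories with 0 as the base.
* Without i., Stata would treat 0-1-2 as a single numeric scale.
regress income i.party
```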
SebastianWaiEcon ok, thank you!
How to check correlation for categorical variables
How would I adjust the education variable to comprise of fewer categories before doing a moderation analysis?
If you want just two categories, then you could just generate a dummy variable indicating the subcategories you wanted.
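A sketch of collapsing categories into a dummy, assuming the bundled nlsw88 data with `race` standing in for education. The name `nonwhite` and the use of `tenure` as the other variable in the interaction are only for illustration.

```stata
* Sketch only: collapse a three-category variable into two.
sysuse nlsw88, clear

* 1 if race is category 2 or 3, 0 if category 1, missing if race is missing.
gen byte nonwhite = inlist(race, 2, 3) if !missing(race)

* Cross-check the new dummy against the original categories.
tabulate race nonwhite

* For a moderation analysis, ## adds both main effects and the interaction.
regress wage i.nonwhite##c.tenure
```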
Hi, I need your help. I have only two educational categories, 1 = PhD and 2 = MPhil. How can I estimate the return to each of these two education levels? I have no third category to use as the base, so how can I find the returns for these two? My problem is a bit different: I want to estimate the private return to education for these two levels, i.e. how much earnings change after a one-year increase in education.
Short answer is that you can't. You cannot estimate a treatment effect if you do not have any observations without the treatment. The best you can do is estimate the difference between the PhD and the MPhil.
@@sebastianwaiecon I have cross sectional data of 800 respondents.
Hi, at the end of the video you say "we have the same exact numbers" when you compare the last two regressions you run. But the coefficients changed. So, not only from positive to negative.
I tried with other data and the number of my coefficients also change when I change the category of reference but most importantly the significance changes too. So, I have a variable with 4 categories, when I chose category 1 as reference I get somme significant (p-value) results. But, when I chose category 3 as reference I have no significant results. Could you help me to understand why? And should we use then, the reference that give us the most interesting results?
The coefficients change because the base group changed. The coefficients always give you the difference from the base group. Any predictions you make will be mathematically identical. The significance would change because you're doing a different test when you change the base group - it's a comparison between different groups now.
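A sketch that shows both points, assuming the bundled nlsw88 data with `race` as the categorical variable. The fitted values are compared after switching the base with `ib3.`, and `lincom` recovers a comparison that is no longer shown directly in the table.

```stata
* Sketch only: same model, two different base categories.
sysuse nlsw88, clear

* Base = category 1 (the default).
regress wage i.race tenure
predict double yhat_base1

* Base = category 3.
regress wage ib3.race tenure
predict double yhat_base3

* Fitted values agree up to rounding even though the coefficients differ.
gen double gap = abs(yhat_base1 - yhat_base3)
summarize gap

* The p-value on 2.race now tests category 2 against 3.
* The 2-versus-1 comparison from the first regression is still available:
lincom 2.race - 1.race
```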
Can I use a dummy variable as my dependent variable..with the i. command.I tried it, but I am getting an error message saying this "depvar may not be a factor variable" so what can I do
You can use a dummy variable (ie. the values must all be zeros or ones) as the dependent variable. See my video on this: ua-cam.com/video/vRKesKWMCsg/v-deo.html
There are special regression tools, such as the multinomial logit, that allow for more complex categoricals as the dependent variable, but I haven't covered those in videos.
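A minimal sketch of the syntax, assuming the bundled nlsw88 data, where `union` is already coded 0/1. Whether a logit or a linear probability model is the better choice is a separate question that this sketch does not settle.

```stata
* Sketch only: union is a 0/1 variable, so it goes on the left with no prefix.
sysuse nlsw88, clear

* Writing "logit i.union ..." is what triggers "depvar may not be a factor variable".
* The i. prefix belongs only on categorical variables on the right-hand side.
logit union i.race tenure

* A linear probability model with regress follows the same rule.
regress union i.race tenure
```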
thanks Sebastian...
What about postestimation? Is postestimation the same for continuous and categorical variables?
Any postestimation commands can still be used, since this is all contained within the regress command.
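Two postestimation commands that are handy with categorical regressors, sketched on the bundled nlsw88 data (an assumption, with `race` as the categorical variable):

```stata
* Sketch only: postestimation after a regression with a factor variable.
sysuse nlsw88, clear
regress wage i.race tenure

* Predicted wage for each category, averaging over the other covariates.
margins race

* Joint test that all the race coefficients are zero.
testparm i.race
```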
Thanks so much for this clear explanation. I am now doing a PhD and your videos help me a lot with my analysis.
However, can I use "i." in multiple regression? Are there any differences between Stata 13 and 14 in how the "i." prefix creates dummy variables for the variables we are going to use? Please help me.
You can use the "i." structure in any regression. It definitely works the same in Stata 14 and 15. I haven't used 13 in a very long time, so I can't remember for sure.
@@sebastianwaiecon Thanks so much for the prompt reply. I tested it in a logistic regression in Stata 14 and it works, but the same command from the do-file (Stata 14) did not work in Stata 13.
I tested it on a friend's computer, as I only have Stata 13. When I ran the do-files from Stata 14 in Stata 13, this "i." structure did not work!
Again, is this "i." structure the same in logistic regression, and is there no difference in do-files between 13, 14, and 15?
Much appreciation for this free online teaching!
It shouldn't make any difference using a logit. Factor-variable notation such as "i." was actually introduced in Stata 11, so it should be available in Stata 13 as well; if a do-file fails there, something else in the file may be the cause. The oldest reference in my own do files I can find to this was from late 2015. From version to version, usually not a whole lot changes, but I'd recommend upgrading to Stata 15 at this point.
@@sebastianwaiecon Thanks so much. I think the point is old version Stata needs to be upgraded.
Thank you so much, sir. If the dependent and independent variables are both categorical, how do I run that in Stata?
See my video on binary choice models for basic ways to handle categorical dependent variables.
Hi, thank you for the video, very helpful! I was wondering if now that you have the "educcodes" variable you can drop "educcategory"? Or should you keep it?
There is no particular reason you would drop it, but it's not needed for the "educcodes" variable to function properly once it's been created. You might want to keep it in case you want to do something else with it later, though.
@@sebastianwaiecon thank you so much!
I entered the command 'label define .......' but the result is invalid syntax. I just replicated your steps. Why does this happen?
You probably have a typo somewhere. Did you forget to put all the numbers in?
@@sebastianwaiecon I'm not sure where I went wrong😭 Could you have a look at it: sm.ms/image/hjFQGbX1kpwaz9N
I'm not sure, but my guess is that it didn't like the slash in Others/Uncertainty.
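Without seeing the screenshot this is only a guess at the fix, but a label containing a slash is accepted when it is wrapped in quotation marks. The data, the label name `reasonlbl`, and the first label below are made up; only "Others/Uncertainty" comes from the comment above.

```stata
* Sketch with made-up data: two numeric codes to label.
clear
input byte reason
1
2
end

* Quoting each label keeps special characters such as / from being parsed as syntax.
label define reasonlbl 1 "Profit" 2 "Others/Uncertainty"
label values reason reasonlbl
tabulate reason
```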
Can you please make video about how we can get the frequency of a category as a new variable.
You can do that with egen:
egen frequency = count(category), by(category)
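A runnable version of the same idea, assuming the bundled nlsw88 data with `race` in place of `category`. The second line is an equivalent way to get the count without egen, and the assert confirms that the two agree.

```stata
* Sketch only: frequency of each category stored as a new variable.
sysuse nlsw88, clear

* Number of nonmissing observations of race within each race group.
egen frequency = count(race), by(race)

* Equivalent without egen: _N is the number of observations in each by-group.
bysort race: gen freq_alt = _N
assert frequency == freq_alt

* The new variable should match the counts shown here.
tabulate race
```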
What if I want the education but only females, or only males?
You can add an "if" statement to most commands in Stata if you want to limit your analysis to a certain group.
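A sketch of the "if" qualifier on the bundled nlsw88 data. That dataset contains only women, so `married` stands in for a male/female indicator here; with a 0/1 variable named `female` (a hypothetical name) the condition would be `if female == 1`.

```stata
* Sketch only: run the same regression on two subgroups.
sysuse nlsw88, clear

* Only observations with married == 1 are used.
regress wage i.race if married == 1

* Only observations with married == 0 are used.
regress wage i.race if married == 0
```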
Pls I need help with my data analysis especially creating a composite variable
what specifically are you looking for?
thank you...this video is quite helpfull
Thank you so much Sir.! Really helped me..!
Great video. Thanks a lot
Hi I have a string of numbers but it's red. So I use this command to encode but all my numbers have been changed to others. I dont know why, pls help me 😭 🙏🙏🙏
The command you probably need is "destring." Most likely what happened is that you have at least one value that is not a number, which is why Stata read it as a string. You first need to make sure all the values are valid numbers, then use destring to generate a numerical version of your variable.
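A sketch of that workflow with made-up data. The variable names are illustrative, and the `force` option is used here only to show what happens to the bad value; cleaning the value by hand first is the safer route. The numbers changed under encode because encode assigns codes 1, 2, 3, ... in alphabetical order rather than reading the digits.

```stata
* Sketch with made-up data: a numeric column read in as text because of one bad entry.
clear
input str8 income_str
"1200"
"1500"
"n/a"
"980"
end

* Find the values that cannot be read as numbers.
list income_str if missing(real(income_str))

* Convert to numeric; force turns the non-numeric entries into missing values.
destring income_str, generate(income) force
list income_str income
```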
@@sebastianwaiecon thank u, I can fix it now
Are the coefficients here your R^2
In the upper-right area of the Stata regression output, you can see where it says "R-squared."
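The two are stored separately after a regression, which this short sketch on the bundled nlsw88 data (an assumption) makes visible:

```stata
* Sketch only: R-squared and coefficients are different stored results.
sysuse nlsw88, clear
regress wage i.race

* R-squared: the share of variation in wage explained by the model.
display "R-squared = " e(r2)

* A coefficient: the estimated gap between category 2 and the base category.
display "coefficient on 2.race = " _b[2.race]
```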
Just wanna turn the var after encoding into real numbers, like 1 for Someone, 2 for Anotherone, ... without claiming them one by one cause there are just too much of them. No one, literally no one can tell me how to do this. I am wondering why we are still using STATA rather than R, which is much more direct
When you encode, the new variable is just numbers (with labels on top). You can see this by clicking on the values in the data browser. Just make sure to set the order to how you want it. From there, just put the encoded variable into the regression without the "i." structure. You can see me do this in the video at 5:40.
@@sebastianwaiecon Thanks mate:) Just found out that egen group() can also do this.
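A sketch comparing the two, with made-up names in a string variable called `person`:

```stata
* Sketch with made-up data: a string variable with repeated names.
clear
input str12 person
"Someone"
"Anotherone"
"Someone"
"Third"
end

* encode: integer codes with the original text attached as value labels.
encode person, generate(person_code)

* egen group(): integer codes 1, 2, 3, ... with no labels attached.
egen person_id = group(person)

* nolabel shows the numbers stored underneath the labels.
list person person_code person_id, nolabel
```

Both commands number the distinct values in sorted order, so no category has to be declared one by one.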
Thanks a lot
Thank you!
It's concise.
What is the interpretation?
You'll have to be more specific.
encode the variable you want to change, gen (new name)
legend!
Is there anyone who help me on Stata exam?
10q very much