Categorical Variables in Stata

SebastianWaiEcon

Published 6 Jan 2025

COMMENTS • 89

@bevbeautifulhealing 1 year ago
Thank you so very much. 🙏🏾 Helping a friend with masters and this has really helped me understand and am able to move forward after completing this section to help her to move on. Cheers mucho
@wkatieivey 5 years ago ⁺⁴
Thank you so much! I am a grad student trying to work with survey data and this helped immensely! Your video saved me a solid three hours of time I would have spent making mistakes in Stata.
@sebastianwaiecon 5 years ago
Glad to hear you found it useful.
@asmasultana2732 4 years ago
@sebastianwaiecon can u help me in gaining training stata plz
@DavidMwangi 3 years ago ⁺¹
Sebastian sir, thank you so much for this! you went straight to the point with no time wasted
@anushkanegi5220 3 years ago
This was so helpful! I spent hours looking for this on the internet. Thank you so much!!!
@LouisaOsei-Bonsu 4 months ago
Thank you, Sebastian, this is so helpful and you made it easy
@GM__user 5 months ago
THANK YOU🙏🏼 This video has been extremely helpful
@anastaciamatlebjane1040 4 years ago ⁺²
I think you just saved my life😢
@anlanhnguyen-ly9vi 8 months ago
Excellent! Thank you for saving me tons of time :)
@danilorodriguez1638 5 years ago ⁺¹
Thank you very much. Greetings from Colombia.
@HamidAbbas-m6d 3 years ago
How about dummy variables?
I have coded a nominal variable, for example status. How does that work?
@shannonbarnes1888 3 years ago
Hi how can i apply this information if my categorical variable is already in numbers (0,1,2) and i need to regress it with a continuous variable? STATA doesn't know what the values correspond to but 0= conservative, 1=labour and 2=other
@ViralVidz21 1 year ago
You are the best man.
@koketsomokoditoa3255 3 years ago ⁺¹
Hi. If i have values such as “grade 1, grade 2 grade 3, grade 10, grade 11 etc” under a variable named “Education” but I want all lower grades such as grade 1-grade 7 to be called “Primary School” and higher grades such as grade 8-grade 11 to be called “High School”- how do I code that in stata?
@keri-annfacey6794 3 years ago
I came on here trying to find answers to the same question. I was able to group my grade levels but not in the order that I wanted. I have grade levels 7-11 and I want to group grades 7-9 as "lower school" and grades 10-11 as "upper school". The command I have was able to group grades 7-8 as "lower school" and grades 9-11 as "upper school". I am trying to figure out how to group grade 9 in the lower school category.
Anyways,
Try this and see if it works.
gen schoollevel = recode(education, 1, 2)
label define schoollevels 1 "Primary School" 2 "High School"
As I mentioned this command worked for me but not in the order I wanted. Hope it helps in some small way.
@keri-annfacey6794 3 years ago
Try this video ua-cam.com/video/XWVaXN2KwmA/v-deo.html
@sebastianwaiecon 3 years ago
You can do this with logical operators, in this case the "pipe" (|), which means "or." For your example, gen primaryschool = Education == "grade 1" | Education == "grade 2" and so on. Keep adding pipes and statements for each grade.
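A minimal sketch of the grouping described above, assuming Education is a string variable holding values such as "grade 1" through "grade 11". The names primaryschool, schoollevel, schoollevels, grade, and gradegroup are illustrative, not from the video. inlist() is a shorter way to write the chain of pipes; with string arguments it accepts at most ten arguments in total, so longer lists need to be split. The last command covers the related question about grades 7-11 and assumes a separate numeric variable called grade.

```stata
* Assumption: Education is a string variable ("grade 1", ..., "grade 11").
* Same result as Education == "grade 1" | Education == "grade 2" | ...
generate primaryschool = inlist(Education, "grade 1", "grade 2", "grade 3", ///
    "grade 4", "grade 5", "grade 6", "grade 7")
replace primaryschool = . if Education == ""

* One labelled variable that holds both groups
generate schoollevel = 1 if primaryschool == 1
replace schoollevel = 2 if inlist(Education, "grade 8", "grade 9", "grade 10", "grade 11")
label define schoollevels 1 "Primary School" 2 "High School"
label values schoollevel schoollevels
tabulate schoollevel

* Assumption: grade is a numeric variable taking the values 7 to 11.
recode grade (7/9 = 1 "lower school") (10/11 = 2 "upper school"), generate(gradegroup)
```

The /// line continuation only works in a do-file; in the Command window the generate statement has to go on one line.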
@theReal_Mimi 4 years ago
What's the value of x if you could calculate the estimated value of wage for each category?
@simonetaddeo1935 3 years ago
How could I see if even NODIPLOMA is significant in the regression? Is there any command which shows all the variables without any base reference?
@sebastianwaiecon 3 years ago
You can force Stata to omit the constant using the noconstant option. This will remove the collinearity and allow all categories to be in the regression. However, be careful about the interpretation and what statistical significance would mean in this new context. I personally wouldn't advise doing what I'm suggesting but that is the answer to your question.
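For illustration only, one way this could look, assuming the outcome is called wage and the encoded variable is educcodes (the outcome name is an assumption). The ibn. prefix tells Stata not to pick a base level, so every category gets its own coefficient once the constant is dropped.

```stata
* Assumption: wage is the outcome, educcodes the encoded education variable.
* ibn. = factor variable with no base level; noconstant drops the intercept.
regress wage ibn.educcodes, noconstant
```

With only this one categorical regressor, each coefficient is simply that category's mean wage, so each t-test asks whether the mean differs from zero rather than from a base group. The reported R-squared is also computed differently without a constant, which is part of why the interpretation needs care.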
@wharrison2010 6 years ago
Thank you for this very precise and succinct lecture. A quick question: I saw positive coefficients when the base was "No Diploma" and negative coefficients when "Graduate". Kindly interpret one of the coefficients in either regression results.
@sebastianwaiecon 6 years ago ⁺²
The reason for this is that "no diploma" is the lowest income group. Any departure from that group would be associated with an increase in income. For example, the estimate for the bachelor's degree tells us how much more income bachelor's degree holders have above those with no diploma.
Graduate, on the other hand, is the highest income group (people with PhDs, MBAs, and so forth). Here, people with (only) bachelor's degrees have less income on average than the base group.
@inestnewdocile1646 7 months ago
How to check correlation for categorical variables
@mussahemed1153 3 years ago
I only bought stata yesterday and first time using it for my Msc dissertation. This was so helpful. When using the label define command, I assume you have to type in the variable names exactly as was shown when tab. What if the variable name has spaces in between, as in instead of NODIPLOMA, it was NO DIPLOMA? The variable in my data had a space in between and I got a syntax error when I tried to use the command label define on it.
@mussahemed1153 3 years ago
Found the answer to my own question by playing around with stata. When entering the name for instance NO DIPLOMA, make sure you enter it as 1 "NO DIPLOMA"
@sebastianwaiecon 3 years ago
@mussahemed1153 Yes, you got it. The reason you had this problem is that spaces are used as delimiters in Stata commands, similar to how commas are generally used in Excel functions. So, any time you have text with a space you need quotation marks.
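A short sketch of the quoting rule, assuming the string variable is educcategory and the new numeric variable is educcodes, as elsewhere in this thread. The label name educlabels and every category name other than "NO DIPLOMA" are made up for the example.

```stata
* Assumption: educcategory holds strings that match the quoted labels exactly.
* Each label is quoted, so a label containing a space is read as one piece of text.
label define educlabels 1 "NO DIPLOMA" 2 "HIGH SCHOOL" 3 "BACHELORS" 4 "GRADUATE"

* encode reuses the existing label, so the codes follow the order defined above
encode educcategory, generate(educcodes) label(educlabels)
label list educlabels
```

Quoting every label, even those without spaces, is a simple habit that avoids this syntax error.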
@redface4444 4 years ago
I am running a regression using Stata with the dependent variable being R.O.A and the independent variable being green-house-gas emissions. I also have 4 control variables. I also want to control for each industry. For example, firms that operate in the industry sector will typically have higher GHG emissions than firms in the health care sector. Would this be the way to control for each industry? If not is there a way to do so? Thanks
@sebastianwaiecon 4 years ago ⁺¹
Sounds like dummy variables indicating each industry would be appropriate. You can do this using the methods outlined in this video.
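As a rough sketch under assumed variable names (roa, ghg, control1 to control4, and a string variable industry are all hypothetical), industry dummies could be added like this:

```stata
* Assumption: industry is a string variable such as "Industrials", "Health Care".
encode industry, generate(industry_id)

* i.industry_id adds one dummy per industry, leaving one out as the base group
regress roa ghg control1 control2 control3 control4 i.industry_id

* Joint test of whether the industry dummies matter as a group
testparm i.industry_id
```

If industry is already numeric, the encode step can be skipped and i.industry used directly.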
@asmasultana2732 4 years ago
@SebastianWaiEcon I am a student on an MRes course and I need to understand stata and its workings. I feel it is difficult. Can u train
@keeks4914 3 years ago
Hi I have an issue. I am trying to convert my string variables (in a group called stage) to numeric variables. I used the encode option and created a new variable called stage_cat. But when I tabulate stage_cat I get no observations. The list of stage_cat also seems to be empty. It looks like the encoding option didn't work. How can I fix this?
@sebastianwaiecon 3 years ago
I'm not sure about recode, but did you try using encode like I showed in the video?
@markvanderlinde30 4 years ago
hi, is it true that the i. command for categorical variables does not work when using the Oaxaca regression command?
@sebastianwaiecon 4 years ago
I haven't used that, so I don't know.
@n10f98 5 years ago
How would I adjust the education variable to comprise of fewer categories before doing a moderation analysis?
@sebastianwaiecon 5 years ago
If you want just two categories, then you could just generate a dummy variable indicating the subcategories you wanted.
@ceciliadelvi2724 2 years ago
Hi, at the end of the video you say "we have the same exact numbers" when you compare the last two regressions you run. But the coefficients changed, and not only from positive to negative.
I tried with other data and my coefficients also change when I change the reference category, but most importantly the significance changes too. So, I have a variable with 4 categories; when I choose category 1 as reference I get some significant (p-value) results. But when I choose category 3 as reference I have no significant results. Could you help me to understand why? And should we then use the reference that gives us the most interesting results?
@sebastianwaiecon 2 years ago
The coefficients change because the base group changed. The coefficients always give you the difference from the base group. Any predictions you make will be mathematically identical. The significance would change because you're doing a different test when you change the base group - it's a comparison between different groups now.
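One way to check this for yourself, sketched under the assumption that the outcome is wage and that educcodes has levels coded 1 and 4 (the outcome name and the codes are assumptions):

```stata
* Same model, two different base groups
regress wage ib1.educcodes
predict double yhat_base1

regress wage ib4.educcodes
predict double yhat_base4

* The fitted values should agree up to rounding error
assert reldif(yhat_base1, yhat_base4) < 1e-8
```

With base 1, the coefficient on level 4 is the level-4 mean minus the level-1 mean; with base 4, the coefficient on level 1 is the same number with the opposite sign. The tests on the other levels compare against a different group in each regression, which is why their p-values can differ.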
@emilbinny 5 years ago
Can I use a dummy variable as my dependent variable with the i. command? I tried it, but I am getting an error message saying "depvar may not be a factor variable". What can I do?
@sebastianwaiecon 5 years ago
You can use a dummy variable (i.e., the values must all be zeros or ones) as the dependent variable. See my video on this: ua-cam.com/video/vRKesKWMCsg/v-deo.html
There are special regression tools, such as the multinomial logit, that allow for more complex categoricals as the dependent variable, but I haven't covered those in videos.
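A hedged illustration of the point about the error message: the i. prefix belongs on the right-hand side only, and a 0/1 dependent variable is entered by its plain name. The variables employed (0/1) and occupation (several unordered categories) are hypothetical.

```stata
* Assumption: employed is coded 0/1; educcodes is the encoded regressor.
* Linear probability model: no i. in front of the dependent variable
regress employed i.educcodes, vce(robust)

* Binary choice alternative
logit employed i.educcodes

* Assumption: occupation has more than two unordered categories
mlogit occupation i.educcodes
```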
@emilbinny 5 years ago
thanks Sebastian...
@201120sebastian 6 years ago
Hi thank you for the video, very helpful! I was wondering if now that you have the "educcodes" variable you can drop "educcategory"? Or should you keep it?
@sebastianwaiecon 6 years ago ⁺¹
There is no particular reason you would drop it, but it's not needed for the "educcodes" variable to function properly once it's been created. You might want to keep it in case you want to do something else with it later, though.
@201120sebastian 6 years ago
@sebastianwaiecon thank you so much!
@emotionalstories8152 3 years ago
Hi, I need your help. I have only two educational categories, 1 = PhD and 2 = MPhil, so how can I estimate the return to the two education levels? I have no third category for the base, so how can I find the returns of these two? My problem is different: I want to estimate the private return to education for these two levels, that is, how much earnings change after a one-year increase in education.
@sebastianwaiecon 3 years ago
Short answer is that you can't. You cannot estimate a treatment effect if you do not have any observations without the treatment. The best you can do is estimate the difference between the PhD and the MPhil.
@emotionalstories8152 3 years ago
@sebastianwaiecon I have cross sectional data of 800 respondents.
@thidachawhlaing3494 5 years ago
Thanks so much for this clear explanation. I am now doing PhD and your videos helps me a lot for my analysis.
However, can I use "i. " in multiple regression? Is there any differences in STATA 14 and 13 for this creating dummy variable command "i. " in front of the variables we are going to use? Please help me.
@sebastianwaiecon 5 years ago ⁺²
You can use the "i." structure in any regression. It definitely works the same in Stata 14 and 15. I haven't used 13 in a very long time, so I can't remember for sure.
@thidachawhlaing3494 5 years ago
@sebastianwaiecon Thanks so much for the prompt reply. I tested it in a logistic regression in Stata 14 and it works, but that command from the DO file (Stata 14) did not work in Stata 13.
I tested it on a friend's computer as I have only Stata 13. When I ran all the DO files from Stata 14 in 13, this "i." structure did not work!!
Again, is this "i." structure the same in logistic regression, and is there no difference in DO files between 13, 14 and 15??
I appreciate this free online teaching!
@sebastianwaiecon 5 years ago ⁺¹
It shouldn't make any difference using a logit. I guess this was a new feature in Stata 14. The oldest reference in my own do files I can find to this was from late 2015, after Stata 14 came out. From version to version, usually not a whole lot changes, but I'd recommend upgrading to Stata 15 at this point.
@thidachawhlaing3494 5 years ago
@sebastianwaiecon Thanks so much. I think the point is that the old version of Stata needs to be upgraded.
@ralphnestorpadero950 4 years ago
What about postestimation? Is postestimation the same for continuous and categorical variables?
@sebastianwaiecon 4 years ago
Any postestimation commands can still be used, since this is all contained within the regress command.
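Two postestimation commands that are often useful with categorical regressors, shown as a sketch with an assumed outcome variable wage:

```stata
regress wage i.educcodes

* Joint test that all education dummies are zero
testparm i.educcodes

* Predicted wage for each education category
margins educcodes
```

The margins line also gives an estimated wage for each category, which is one way to read off category-level predictions without adding coefficients by hand.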
@habtamudoe8868 5 years ago
Thank you so much sir. If the dependent and independent variables are both categorical, how do I run it in Stata?
@sebastianwaiecon 5 years ago
See my video on binary choice models for basic ways to handle categorical dependent variables.
@ericli6027 4 years ago
Thanks this is really helpful!
But what should I do if the original variable is numeric??
@sebastianwaiecon 4 years ago
You can put the numeric variable directly in the regression. If you want to use a numeric variable as a categorical variable, you can still do that. You skip the step of encoding, and just use the "i." structure.
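A small sketch of that shortcut, using the 0/1/2 party coding asked about earlier in the comments. The variable names party and income and the label name partylbl are assumptions.

```stata
* Assumption: party is numeric with 0 = conservative, 1 = labour, 2 = other.
* Optional: attach labels so the output shows names instead of numbers
label define partylbl 0 "conservative" 1 "labour" 2 "other"
label values party partylbl

* No encode needed; i. treats the numeric codes as categories
regress income i.party
```

Without the i. prefix, Stata would treat party as a continuous number, which rarely makes sense for codes like these.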
@ericli6027 4 years ago
SebastianWaiEcon ok, thank you!
@saurabhsahu175 6 years ago
Thank you so much Sir.! Really helped me..!
@sunrose68 4 years ago
I entered the command 'label define .......' but the result is invalid syntax. I just replicated your steps. Why does this happen?
@sebastianwaiecon 4 years ago
You probably have a typo somewhere. Did you forget to put all the numbers in?
@sunrose68 4 years ago
@sebastianwaiecon I'm not sure where I went wrong😭 Could you have a look at it: sm.ms/image/hjFQGbX1kpwaz9N
@sebastianwaiecon 4 years ago ⁺¹
I'm not sure, but my guess is that it didn't like the slash in Others/Uncertainty.
@jayanthsaishiva 3 years ago
Great video. Thanks a lot
@vitriawaode6302 6 years ago
thank you...this video is quite helpful
@saaakill 4 years ago
Can you please make a video about how we can get the frequency of a category as a new variable.
@sebastianwaiecon 4 years ago
You can do that with egen:
egen frequency = count(category), by(category)
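An alternative sketch that gives a similar per-category count using by-groups and the built-in _N, which holds the number of observations in the current group. The name frequency_alt is arbitrary.

```stata
* Number of rows in each category, stored on every row of that category
bysort category: generate frequency_alt = _N

* Quick check against a plain frequency table
tabulate category
```

Note that _N counts every row in the group, including rows where category is missing, whereas count() only counts nonmissing values.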
@godwinnerarwill2885 4 years ago
Pls I need help with my data analysis especially creating a composite variable
@thedatahall 4 years ago
what specifically are you looking for?
@domillima 4 years ago
Are the coefficients here your R^2?
@sebastianwaiecon 4 years ago
In the upper-right area of the Stata regression output, you can see where it says "R-squared."
@256hzart 7 months ago
Hi I have a string of numbers but it's red. So I used this command to encode but all my numbers have been changed to others. I don't know why, pls help me 😭 🙏🙏🙏
@sebastianwaiecon 7 months ago ⁺¹
The command you probably need is "destring." Most likely what happened is that you have at least one value that is not a number, which is why Stata read it as a string. You first need to make sure all the values are valid numbers, then use destring to generate a numerical version of your variable.
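A minimal sketch of those two steps, assuming the red (string) variable is called rawvalue; both variable names here are hypothetical.

```stata
* Step 1: show the entries that cannot be read as numbers
list rawvalue if missing(real(rawvalue)) & rawvalue != ""

* Step 2: after fixing those entries, create a numeric copy
destring rawvalue, generate(value_num)
```

encode is the wrong tool here because it assigns the codes 1, 2, 3, ... to the distinct strings in sorted order, which is why the numbers appeared to change.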
@256hzart 7 months ago
@sebastianwaiecon thank u, I can fix it now
@reyaa8593 5 years ago
What if I want the education but only females, or only males?
@sebastianwaiecon 5 years ago
You can add an "if" statement to most commands in Stata if you want to limit your analysis to a certain group.
@tahmidfaysal8315 2 years ago
Thanks a lot
@TaniaOsadcha 3 years ago
Thank you!
@Alex-sy4gg 1 year ago
legend!
@mahbubhasan8102 4 years ago
It's concise.
@hazelw 1 year ago
Just wanna turn the var after encoding into real numbers, like 1 for Someone, 2 for Anotherone, ... without claiming them one by one cause there are just too much of them. No one, literally no one can tell me how to do this. I am wondering why we are still using STATA rather than R, which is much more direct
@sebastianwaiecon 1 year ago
When you encode, the new variable is just numbers (with labels on top). You can see this by clicking on the values in the data browser. Just make sure to set the order to how you want it. From there, just put the encoded variable into the regression without the "i." structure. You can see me do this in the video at 5:40.
@hazelw 1 year ago
@sebastianwaiecon Thanks mate:) Just found out that egen group() can also do this.
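Both routes side by side, as a sketch with a hypothetical string variable called name:

```stata
* encode: numeric codes 1, 2, 3, ... with the original text attached as value labels
encode name, generate(name_code)
label list name_code

* egen group(): the same kind of 1, 2, 3, ... numbering, without labels by default
egen name_id = group(name)
```

In both cases the numbers follow the sorted order of the strings, so no category has to be listed by hand.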
@zerohero109 6 years ago
What is the interpretation?
@sebastianwaiecon 6 years ago
You'll have to be more specific.
@manpreetuk4277 2 years ago
Is there anyone who can help me with a Stata exam?
@nuranidonesru9092 3 years ago
10q very much
@YahyaMarei 4 years ago
encode the variable you want to change, gen (new name)
