Difference in Differences Estimation in Stata

SebastianWaiEcon

Published 17 Oct 2024
An introduction to implementing difference in differences regressions in Stata.

COMMENTS • 214

@wm6698 3 years ago ⁺³
Thank you so much for this! My concern is why didn't you run a complete regression model for house price? Why only a bivariate regression? (i.e., dependent and dummies).
@sebastianwaiecon 3 years ago ⁺⁸
The purpose of this video is to demonstrate the basic technique of differences in differences estimation. You can certainly add controls to the basic model, but that is outside the scope of this video. You can search my channel for other videos on that.
@tdogz8932 4 years ago ⁺²⁰
I watched the video 2 years ago, it helped me understand the DID model and Stata so that I could finish my graduation dissertation on time. After the graduation, I published another paper using the same model, thank you sooooooooo much!!!!!!!!
@fgghdfg8638 3 years ago
Can you help me do my diff in diff? Otherwise I will miss my year. We can talk about price.
@tdogz8932 3 years ago
@@fgghdfg8638 I'm sorry that I only just saw your message. Hope you are doing fine with your dissertation :)
@amnashaukat7827 3 years ago
Can you help me in this technique?
@tamandanikuchanje260 2 years ago
Hello can you help me?
@dargon1084 2 years ago ⁺¹
I learnt more in this video than six 2-hour videos of my own uni's lectures
@davecullins1606 4 years ago ⁺⁵
You saved my exam in the previous semester, and you're saving me in this semester as well!
@jonaFUN999 3 years ago
I’m from Andover, England and I approve this video 👍
@pneumascope 4 years ago
I note that you have large Standard Errors in your findings. Does this in any way have an impact on the reliability of the findings or the interpretation of the overall impact of the program (or incinerator in this case)?
@sebastianwaiecon 4 years ago ⁺¹
It's all relative when it comes to standard errors. You could say an SE of about 8000, as it is here, is large, but the estimate is -20,000. Standard errors are always going to be big numbers when dealing with things like the prices of homes, which are in the tens of thousands. All other things being equal, larger standard errors mean less precision in the estimates. Here, we can still be quite confident the incinerator did decrease property values.
@Muhammadilyas-ij6jh 3 years ago
Hello sir! I have a question... It looks like you first run a simple OLS regression and then compute the differences using the collapse command. I do not understand whether I should just use the OLS regression and report the differences estimator (-18824) as the DID estimator. Please guide me.
@sebastianwaiecon 3 years ago
The number you gave estimates the difference between the treatment and control group before the treatment. We need to use the coefficient estimate for the interaction term to get the DID estimator.
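A minimal sketch of the point in this reply, assuming the Wooldridge file KIELMC.dta (named later in the thread) sits in the working directory and that the variable names match those quoted in the comments. The coefficient on the interaction term is the DID estimate, and the same number can be rebuilt from the four group means:

```stata
* Assumption: KIELMC.dta is in the current working directory.
use KIELMC, clear

* Basic DID regression; the coefficient on y81nrinc is the DID estimate.
* The coefficient on nearinc alone is only the pre-treatment gap.
regress price y81 nearinc y81nrinc

* Cross-check using the four group means.
preserve
collapse (mean) price, by(y81 nearinc)
* Rows are sorted as (y81, nearinc) = (0,0), (0,1), (1,0), (1,1).
display (price[4] - price[3]) - (price[2] - price[1])
restore
```

With no controls in the model, the displayed difference of differences should equal the y81nrinc coefficient.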
@MrLi1231 4 years ago
Hi Sebastian, thank you so much. Quick question. Is this dataset a panel, or two separate cross section datasets? I am assuming it is two separate cross section, right?
@sebastianwaiecon 4 years ago ⁺¹
You are correct. It's a pooled cross section. It would be very unlikely for the same houses to be on the market in both years.
@MrLi1231 4 years ago
@@sebastianwaiecon Good point and thank you so much for the quick reply! I am working on a thesis and realised that I was supposed to be doing DiD when I had been using a different methodology for the past few weeks. Your video is incredible. Big thanks from Australia!
@sebastianwaiecon 4 years ago
Happy to help!
@dandellionsy6537 3 years ago ⁺²
Thank you so much, I need it. My model might be more complicated but at least I can sense the idea of doing it. Awesome! Keep sharing more
@VINAYKUMAR-kf6kd 1 year ago
Thanks for the detailed info. What if my dependent variable is categorical, like anemia (yes/no)? Should I report the B coefficient or Exp(B)? And how can I cross-check in Excel?
@nazda2007 4 years ago
Dear Sebastian, I am working on my dissertation using DiD, and I included additional control variables in my model. However, the model suffers from heteroskedasticity and autocorrelation. How do I deal with them?
@sebastianwaiecon 4 years ago
You might want to look at my videos on heteroscedasticity.
@sylvieyin5261 3 years ago ⁺¹
Thank you so much. This video makes my HW much easier.
@huangkiana6165 3 years ago ⁺¹
THIS VIDEO SAVED ME FROM MY DEADLINE. THANK YOU SO MUCH *cry
@danielkrupah 2 years ago ⁺¹
Sir, do you provide a paid service for DD? I need a coach.
@trobberkah3425 2 years ago
Hi, I'm doing a DiD for my thesis, but I'm dealing with panel data. Do you know what I should do differently compared to the regression you show in this video? I noticed that there is a Stata command for a fixed effects DiD regression, for example.
@nunosilva1563 1 year ago
I face exactly the same situation, can you please reply to the above question?
@user-vb3do7hh9v 2 years ago
Thanks, Very well explained.
Can I get this dummy data set or can you please guide from where I can get such dummy data set for educational / learning purpose only ?
@vojtechkolar5897 1 year ago
Hey, I kind of understand diff-in-diff, but now I am dealing with a problem: what if the control is at way larger levels than the treatment? Let's say Control before: 100, after: 200 = 100% increase; Treatment before: 5, after: 9. If I calculate the DID effect using the standard table, i.e. the difference between the differences, I get in this case 100-4 = 96! So the counterfactual state of the world would in the case of treatment be 105?! That does not make sense, no? Even R with OLS gives me these results. What am I doing wrong? Thank you
@vojtechkolar5897 1 year ago
I get that I can solve this problem by working with a log-level model. But isn't this always a problem with level-level diff in diff? What am I missing?
@sebastianwaiecon 1 year ago
You can do diff in diff with the dependent variable in logs. That's no problem as long as you are careful with the percentage change interpretation.
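As a worked illustration of the levels-versus-logs point, the numbers from the question above (control 100 to 200, treatment 5 to 9) can be run through Stata's display command. This is only arithmetic on the commenter's hypothetical figures, not output from any dataset:

```stata
* Level DID: (treatment change) minus (control change).
* This evaluates to -96.
display (9 - 5) - (200 - 100)

* Log DID: difference in log growth between the groups (about -0.105).
display ln(9/5) - ln(200/100)

* Exact percentage effect implied by the log DID (about -10 percent):
* the treatment group grew by a factor of 1.8, the control group by 2.
display 100 * (exp(ln(9/5) - ln(200/100)) - 1)
```

In levels the counterfactual adds the control group's absolute change (+100) to the treatment group, while in logs it applies the control group's growth rate, which is why the two specifications can tell such different stories when the groups start at very different levels.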
@indagame9 5 years ago
Have you ever done a coefplot to test the treatment effect? If so, I get a positive but not significant coefficient for my treat dummy variable. This would mean that the treatment group actually saw an increase in the fatalities (my y variable) or does it mean my treatment effect is positive? It is confusing because if I do a lowess plot on just the different states fatalities drops over time. However, in the coefplot the graph is trending upwards.
@sebastianwaiecon 5 years ago
I don't use coefplot, but I don't see why it would show results any different from your regression table.
@YY-ty5fx 4 years ago ⁺³
What a clear explanation! I'm working on my own DD regression, and it really helped. The dependent variable 'price' covers prices both before and after the treatment here, right?
@sebastianwaiecon 4 years ago
At the beginning of the video, I show the data browser and scroll through the data. You can see some observations are before and some are after.
@simonazambelli5320 2 years ago
Thank you very much. You explained everything very clearly! Thanks
@AAH123-v4x 4 years ago
A very useful video. Thank you so much. I have a question. I created 3 columns similar to y81, nearinc and y81nrinc. I am running a two-part logit and GLM model. Since the value of y81 and the other two is either 0 or 1, should we put i.y81 etc.? I mean, aren't we supposed to put i. before a binary variable?
@sebastianwaiecon 4 years ago ⁺¹
For a binary variable, you will get the same result just putting the variable in or using the i. structure. If you have a categorical variable with more than two possible values, then you need to use i.
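A short sketch of this equivalence, again assuming KIELMC.dta is in the working directory. The second command uses Stata's factor-variable notation, where `##` adds both main effects and their interaction, so the hand-built y81nrinc variable is not needed:

```stata
* Assumption: KIELMC.dta is in the current working directory.
use KIELMC, clear

* Dummies entered directly, with the pre-built interaction variable.
regress price y81 nearinc y81nrinc

* Same model in factor-variable notation.
* The DID estimate is reported on the row labelled 1.y81#1.nearinc.
regress price i.y81##i.nearinc
```

Both commands should return identical estimates; only the row labels in the output table differ.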
@AAH123-v4x 4 years ago
@@sebastianwaiecon Thanks a lot!!
@ariagalit1875 5 years ago
Hi. My data ranges from 2009 to 2018, and I have both treatment and comparison groups. I just want to ask whether DID, like what you did in the video, is applicable. I am not very familiar with the method or Stata, actually.
@ariagalit1875 5 years ago
And how come the interaction variable is all zero?
@sebastianwaiecon 5 years ago ⁺¹
You can do DID if you set up a dummy variable to indicate when the treatment went into effect. Once this is in place, you can create the interaction term.
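One way this setup might look for 2009 to 2018 data. Everything below is made up for illustration: the simulated data, the variable names (treated, post, did, y) and the treatment start year of 2014 are all assumptions, not details from the video or the question:

```stata
* Simulated stand-in data: years 2009-2018 for two groups.
clear
set seed 1
set obs 20
gen year    = 2009 + mod(_n - 1, 10)
gen treated = _n > 10

* Hypothetical assumption: the treatment takes effect in 2014.
gen post = year >= 2014
gen did  = treated * post

* If did is zero everywhere, this table will show that no treated
* observation has post equal to 1.
tabulate treated post

* Simulated outcome with a built-in effect of 5, then the DID regression.
gen y = 10 + 2*treated + 3*post + 5*did + rnormal()
regress y treated post did
```

The tabulate step is a quick check for the "interaction is all zero" symptom: an all-zero interaction usually means the post dummy never switches on for the treated group.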
@ariagalit1875 5 years ago
Thanks much for your reply sir
@katieleck9955 3 years ago
Hi, many thanks for the video.
When I try to do DID for my panel data set, stata says that my treatment group dummy and did variable are omitted due to collinearity, do you know why this would be / how i could fix it?
@sebastianwaiecon 3 years ago
Most likely what happened is that you made a mistake creating your dummy variables. Click the magnifying glass button to look at your data to check what went wrong.
@md.arrahman7125 4 years ago
Dr., thanks for your excellent explanation. Is this step the same for panel data, as I am planning to run DID for a panel (2000-2019)? Looking forward to your kind suggestion.
@sebastianwaiecon 4 years ago ⁺¹
I have some other videos on general panel data methods.
@samknight7290 5 years ago ⁺¹
Hi Sebastian, thank you very much for the video. Just wondering why you did not regress the other independent variables?
@sebastianwaiecon 5 years ago
I wanted to keep things simple and focus just on the diff in diff technique. However, you can certainly add more variables to the regression as controls.
@achintyawidhi2299 4 years ago
Sir, what is the difference between xtreg and reg? If I use data from the years 2007 and 2014, should I use reg or xtreg? My dataset doesn't have the same units across 2007 and 2014.
@sebastianwaiecon 4 years ago
Reg is the basic regression command and xtreg is used for panel data methods such as within estimation and random effects. If you don't have the same units across years (pooled cross section), then you probably want to use reg.
@abmakwara8010 4 years ago
Hi Sebastian, thank you for the great content, very informative. However, I have a question. My research looks at the impact of a bank regulation implemented in 2014, and this regulation only affects the bigger banks within my population: banks with population of 25b and over. I have gathered panel data from 2010 - 2019. I intend to use performance ratios as dependent variables and variables that determine profitability as control variables. I am using DID in an FE model in Gretl to run the regression. I have generated some dummy variables: a time dummy variable for before and after, a group dummy variable with those impacted by the regulation as the treatment group and the rest as control, and a regulatory dummy, which I am not sure is necessary.
Three questions:
1. Is this research feasible in terms of parallel trend
2. will i need to interact all other variable in my model with time or the interaction only needs to be between time and group dummy. If yes then do i need to add group dummy on every interaction i do?
3. Is there need to add individual time effect since i am running the regression in FE model
Many thanks in advance
@sebastianwaiecon 4 years ago
1. I have no idea, but it sounds like you have enough data to make that determination yourself.
2. You should think about this on a case-by-case basis. Think about what you're trying to accomplish and whether or not interactions would help with that.
3. Time dummy variables are an important component in FE. I have some videos on FE and panel data on my channel.
@TommasoSchembri 3 years ago
Hi, thanks for the clear explanation. Is it possible to do a DID by percentage level, so that I come up with a % increase/decrease in the treatment group? Thanks!!
@sebastianwaiecon 2 years ago
Yes, you can take the natural log of the dependent variable to get an approximation of a percentage change.
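A hedged sketch of the log specification, assuming KIELMC.dta is in the working directory. The name ln_price is chosen here only to avoid clashing with any log-price variable already in the file:

```stata
* Assumption: KIELMC.dta is in the current working directory.
use KIELMC, clear

* Log of the dependent variable (ln_price is an illustrative name).
gen ln_price = ln(price)

* DID in logs: the y81nrinc coefficient approximates the
* proportional change in price attributable to the treatment.
regress ln_price y81 nearinc y81nrinc

* Exact percentage change implied by the log-point estimate.
display 100 * (exp(_b[y81nrinc]) - 1)
```

The coefficient itself, times 100, is the usual approximation; the last line gives the exact conversion, which matters more as the coefficient moves away from zero.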
@consultingfaqs 5 years ago
Could you please tell if we are using for example DHS data, which has data on demographics and health of a nation; but we want to see the effect of an external policy, like NREGA on labourforce participation of females ( the data for which is available in DHS). Then, should we merge NREGA data with DHS data, and then apply matching techniques to determine treatment and control groups? If not this, then how should we see the impact?
Thanks
@sebastianwaiecon 5 years ago
This question is specific to data I don't have any experience with and is therefore outside the scope of this video.
@aymanissa6722 1 year ago
Thanks for such an informative video.
Could you please explain the DiD method using the diff command?
@nirobkothopokothon 6 years ago
Hi, I would like to know whether difference in differences analysis is suitable for a small dataset that contains only 2 years of data and only 168 samples (84 control and 84 treatment)? Thank you so much.
@sebastianwaiecon 6 years ago ⁺¹
I don't see any reason why not. However, with only 2 years of data, you have no idea of how the outcomes have been trending over time, and you may have a hard time justifying your counterfactual.
@nirobkothopokothon 6 years ago
thank you so much.
@peterdastan1288 4 months ago
Does that mean house prices near garbage incinerator declined by an average of 21.13%?
@nip5554 5 years ago
Hi, what if I want to control for additional variables?
Then the command "collapse (mean) y, by(after treatment)" is not sufficient. Please tell me what to do to control for variables.
@sebastianwaiecon 5 years ago ⁺¹
You can add control variables, but you'll have to run a regression rather than using the collapse method.
@nip5554 5 years ago
@@sebastianwaiecon Thanks :)
@KIMKIM-bt6hr 3 years ago
Good morning. I am a student working with the DID model. Thanks to your DID explanation, I was able to complete my assignment smoothly. But yesterday, the professor asked, 'Why was the control variable excluded?' and I couldn't actually answer. After class, the professor gave me a separate assignment: put the control variable in and analyze it again. I want to use Stata again. But how do I add a control variable to the model in this video? Could you please advise which code to enter?
@sebastianwaiecon 3 years ago
You can simply add control variables to the DID regression, if you want.
@KIMKIM-bt6hr 3 years ago
@@sebastianwaiecon I'm a STATA beginner, so can you explain a little bit more about where to put this part?
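For where the controls go: they are simply listed after the DID terms in the same regress command. A minimal sketch, assuming KIELMC.dta is in the working directory; the particular controls (age, rooms, area, land, baths) are assumed here from the Wooldridge dataset for illustration and were not used in the video:

```stata
* Assumption: KIELMC.dta is in the current working directory.
use KIELMC, clear

* Basic DID regression with no controls.
regress price y81 nearinc y81nrinc

* Same regression with controls appended to the variable list.
* The control names are illustrative choices, not the video's model.
regress price y81 nearinc y81nrinc age rooms area land baths
```

The DID estimate is still the coefficient on y81nrinc; the controls only change what is held fixed when that coefficient is estimated.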
@lVaNeSsA90 2 years ago
What did you use the rprice and lrprice variables for?
@Maria-ny2mj 6 years ago ⁺¹
Hi! Nice video, thank you very much! I have a question: what do you do if there is time-varying treatment? In your example it would be… Imagine the incinerator got built near neighbourhood (1) in 81 but near another neighbourhood (2) in 82, for example… Would it be reg price y81 y82 nearincneighbourdhood1 nearincneighorhood2 y81* nearincneighbourdhood1 y82*nearincneighorhood2? Something like that?
@sebastianwaiecon 6 years ago
You could also consider including interactions between y81 and neighborhood 2 and y82 and neighborhood 1. Once we get into more than 2 periods you should also be thinking of this as a fixed effects model. You may find my video on that helpful.
@Maria-ny2mj 6 years ago
@@sebastianwaiecon thank you very much! I will give a look to the video!
@ssjvegeto4ever 3 years ago
Hi Sebastian, thanks a lot for the clean explanation! Could you tell me why you were including post-treatment levels of your covariates? Aren't they endogenous, and don't they thus result in bias? Thanks in advance!
@jackgandhi 3 years ago
I don't understand the question. What I showed here is the most basic version of diff in diff, with the bare minimum amount of variables needed. Even if I had added more variables, that would not have created any bias -- bias happens because you left variables out.
@ssjvegeto4ever 3 years ago
@@jackgandhi Thank you for the fast reply! Sorry, I meant the covariate data structure. I recently did a DiD setup making use of this video's data structure, and got the criticism that, since I included covariates with a time index for the post-treatment period in the regression, these were endogenous and would thus introduce bias.
@sebastianwaiecon 3 years ago
@@ssjvegeto4ever What you are describing is a common and valid criticism of time series analysis. The purpose of diff in diff is, if the data allows, solving this problem using a control and treatment group. The "post" dummy (y81 in the video) is not enough to establish a causal relationship. This is why we have the interaction term (y81nearinc in the video). In this video, y81 controls for effects over time that are constant across groups while nearinc controls for group effects that are constant over time. The interaction pulls out the estimated effect. This is not to say this method is perfect as there could still be endogeneity due to variables that are constant neither across groups nor across time, so you still may need to think about controls. The diff in diff method is just one tool in the analyst's toolbox.
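To make the roles of the three terms concrete, here is a sketch that rebuilds each of the four group means from the regression coefficients with lincom, assuming KIELMC.dta is in the working directory:

```stata
* Assumption: KIELMC.dta is in the current working directory.
use KIELMC, clear
regress price y81 nearinc y81nrinc

* Far from the incinerator site, 1978 (baseline group).
lincom _cons

* Far from the site, 1981: baseline plus the common time effect.
lincom _cons + y81

* Near the site, 1978: baseline plus the fixed group difference.
lincom _cons + nearinc

* Near the site, 1981: all of the above plus the DID effect.
lincom _cons + y81 + nearinc + y81nrinc
```

Only the last cell involves y81nrinc, which is why that coefficient isolates the change specific to the treated group after treatment.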
@sabrinanasir5844 4 years ago
Thanks for the video! If you don't have an ideal counterfactual control group (i.e. there are some slight differences between the treatment and control groups in the pre-treatment period), can you add other independent variables to the diff n diff when running the regression in Stata?
@sebastianwaiecon 4 years ago ⁺¹
Yes, you can.
@nathanmasak 3 years ago
That's really helpful. Thank you. Did you ever run the "event study" model? I can't find resources on this model? Your input would be appreciated.
@sebastianwaiecon 3 years ago ⁺²
I haven't, but a Google search turned up some resources. Best of luck with it.
@timothyowuor9478 3 years ago
Nice tutorial on DID, thanks for saving me
@amartilianom 6 years ago
Hello, if you want to add control variables or covariates, do you add them normally at the regression? Thanks for the information!
@sebastianwaiecon 6 years ago ⁺¹
Yes, I forgot to mention that in the video. You can add controls to the diff in diff regression as in any other.
@amartilianom 6 years ago
Thanks. Another question would be, it is not necessary to tell Stata we have Panel Data when we have already created the dummy variables that differentiate the control and treatment group, and the pre and post periods? No need to run a fixed effects regression too, I guess. I'm just learning about the subject :)
@sebastianwaiecon 6 years ago
For a simple DD like this, you don't need to use xtset, if that's what you're asking. You can actually think of a DD as a very simple sort of FE model that only has two groups and two periods. If you want to see more about FE, I also have a video on it.
@amartilianom 6 years ago
I really appreciate your responses. Keep helping us!
@myleswhitmore8803 3 years ago
Hi SebastianWaiEcon, I am a student at Morehouse College, and I really enjoyed watching your video. I need help running a Diff in Diff regression for my research paper. For context, I am using Stata to analyze NAFTA's impact on GDP and trade flow for its member nations. To facilitate this process, I will be running an individual diff and diff analysis for each country. My dummy variable will be years before 1994 (when NAFTA was signed) and after 1994. My DV will be GDP growth. And my extra variables will be looking at human capital, agriculture industry growth percentage, manufacturing growth percentage, and other variables. However, I struggle with the Stata platform and would like your advice to ensure this regression runs smoothly.
@sebastianwaiecon 3 years ago
The most important thing for diff in diff is to identify a control and treatment group. In your case, that might be countries that were part of NAFTA and countries that were not.
@amnashaukat7827 3 years ago
@@sebastianwaiecon Enjoying your video, but I need help. I have 25 countries and data from 1960-2020. How can I specify only one time, 2012, while comparing it with 2010-2016? Please help me.
@sebastianwaiecon 3 years ago
@@amnashaukat7827 A fixed effects model may be more appropriate: ua-cam.com/video/H95BHswbT3w/v-deo.html&ab_channel=SebastianWaiEcon
@FannysVista 4 years ago
Hi Sebastian, your video helps me a lot to understand DID estimation. I have a follow-up question. Is it possible to estimate difference in differences for survey data analysis?
I try it on my survey data. However, the DID from regression and the DID from manual collapse calculations show a different result.
@sebastianwaiecon 4 years ago ⁺¹
The actual source of the data shouldn't matter here, whether it's from a survey or not.
@FanettiMazakura 6 years ago
Sebastian, what if I want to include id and time fixed effects in the regression? Do I only keep the interaction variable in the regression?
@sebastianwaiecon 6 years ago
Unlike FE models, diff in diff does not necessarily have the same cross-sectional units across time periods. In my example, it's not the same houses in '78 and '81. As such, ID-based FE won't work. Here, the nearinc variable plays the same role as the FE. Your time dummy is already in there in DD.
@FanettiMazakura 6 years ago
Yes, I get that. I have unbalanced panel data and I want to conduct a Difference-in-Differences with id and time fixed effects. Is // xtreg DepVar i.treated##i.during controls i.month , fe cluster(id) // the correct model to achieve that? Or do you think that it would be better to exclude the fixed effects?
@sebastianwaiecon 6 years ago
If I'm understanding what you're trying to do correctly, I think you can include the fixed effects.
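A runnable sketch of a panel DID with unit and time fixed effects, in the spirit of the xtreg command quoted above. The data are simulated and every variable name (id, t, treated, post, did, y) is hypothetical:

```stata
* Simulated panel: 50 units observed over 6 periods (illustration only).
clear
set seed 12345
set obs 50
gen id      = _n
gen treated = id > 25
expand 6
bysort id: gen t = _n
gen post = t > 3
gen did  = treated * post

* Simulated outcome with a built-in treatment effect of 2.
gen y = id/10 + 0.5*t + 2*did + rnormal()

* Declare the panel, then estimate with unit FE and period dummies.
xtset id t
xtreg y did i.t, fe vce(cluster id)
```

Only the interaction enters explicitly here because the unit fixed effects absorb the time-invariant treated dummy and the period dummies absorb post; writing i.treated##i.post instead gives the same DID coefficient, with Stata noting the absorbed terms as omitted.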
@motnaichuoiktnb 6 years ago
Firstly, thank you for your video, which is very helpful. As you mentioned in your comment, it was not the same houses in '78 and '81. Does that mean your treatment and control groups are not the same pre- and post-treatment?
@sebastianwaiecon 6 years ago
The criterion for being in the control or treatment group is the same in both years, but the specific houses aren't the same.
@sarahfranz5748 3 years ago
Thanks for this video! One question: how would you proceed if you are comparing the difference between control and treated group across a 4 week period, testing whether the difference is bigger in the beginning and decreases?
@sebastianwaiecon 3 years ago
You can interact a time variable (linear trend, or quadratic, etc.) with a treatment dummy variable.
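A minimal sketch of the linear-trend version of this suggestion, using simulated data for a 4-week window. The variable names and the simulated pattern (a gap that starts large and shrinks each week) are assumptions made for the example:

```stata
* Simulated data: 100 units observed in each of 4 weeks.
clear
set seed 42
set obs 100
gen id      = _n
gen treated = id > 50
expand 4
bysort id: gen week = _n

* Built-in pattern: the treated-control gap shrinks by 1 per week.
gen y = 10 + 4*treated - 1*treated*week + rnormal()

* Treatment dummy interacted with a continuous linear week trend.
* The coefficient on 1.treated#c.week is the weekly change in the gap.
regress y i.treated##c.week, vce(cluster id)
```

A negative interaction coefficient would indicate a gap that narrows over the weeks; adding c.week#c.week terms would allow a quadratic pattern instead.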
@cherrykhalil7481 6 years ago
Sebastian, thank you so much for this video. Does the data have to be in long shape? Is there a way to run the diff in diff regression on a wide dataset? Thank you.
@sebastianwaiecon 6 years ago ⁺¹
Yes, you can do it. Generate a new variable for the difference, then regress the difference on a dummy variable for the treatment group.
@cherrykhalil7481 6 years ago
Thank you very much! What about the interaction dummy between year and dummy? Given that my dataset is a balanced panel of 400 firms observed in both 2008 and 2013? Thanks again
@sebastianwaiecon 6 years ago
With the wide dataset, there's no interactions as you've already built it in by taking the difference ahead of time.
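The wide-format approach described in these replies might look like the following. The data are simulated to mimic the 400-firm, two-year case from the question, and the variable names (y2008, y2013, treated, dy) are hypothetical:

```stata
* Simulated wide data: one row per firm, outcome in two columns.
clear
set seed 2024
set obs 400
gen treated = _n > 200
gen y2008 = 10 + 2*treated + rnormal()

* Built-in effect: treated firms gain an extra 3 between the years.
gen y2013 = y2008 + 1 + 3*treated + rnormal()

* First difference for each firm, then regress it on the treatment dummy.
gen dy = y2013 - y2008
regress dy treated
```

The coefficient on treated is the DID estimate, and the constant is the average change in the control group. No separate interaction term appears because differencing has already removed each firm's baseline level.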
@jargodm 5 years ago
@@sebastianwaiecon Just to follow up on this, if you do have the same units before and after, the paired difference test gives a different result than the regression you discuss in the video: Y = b1 + b2*treat + b3*time + b4*treat*time, which assumes independent samples, does it not?
@sebastianwaiecon 5 years ago
I believe the estimate would be the same, but the standard error would be different.
@nazlcaneroglu4427 4 years ago ⁺¹
Thank you for the video! Btw is there any way that we can also see the trends of both groups by drawing a line graph in Stata? If the trends are same before the treatment period, we should be able to see that right?
@sebastianwaiecon 4 years ago
Yes, you can use a twoway graph to do that.
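A sketch of such a trend plot. It uses simulated multi-year data, since the two-year dataset from the video cannot show pre-treatment trends; the variable names and the treatment date are assumptions. The `///` line continuations only work when this is run from a do-file:

```stata
* Simulated data: 200 units, years 2000-2007, treatment from 2005.
clear
set seed 1
set obs 200
gen id      = _n
gen treated = id > 100
expand 8
bysort id: gen year = 1999 + _n
gen y = 5 + 2*treated + 0.5*(year - 2000) + 3*treated*(year >= 2005) + rnormal()

* Group means by year, then one line per group.
collapse (mean) y, by(year treated)
twoway (line y year if treated == 0, sort) ///
       (line y year if treated == 1, sort), ///
       xline(2004.5) legend(order(1 "Control" 2 "Treatment")) ///
       ytitle("Mean outcome") xtitle("Year")
```

Roughly parallel lines to the left of the vertical marker are the informal evidence usually offered for the common trend assumption.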
@antoniomastrandrea967 4 years ago ⁺¹
Hi Sebastian, thank you for your video!
I've two questions:
1) What should I do if the FE variables (time and individual) are not significant? (I mean p-value > 0.1)
2) Do I have to take care of R squared in this case?
Thank you!
@sebastianwaiecon 4 years ago ⁺¹
1) If what you're after is measuring the treatment effect, this doesn't matter.
2) I don't know what you mean by "take care," but R squared is not particularly relevant in DID estimation.
@narlikar78 5 years ago ⁺²
Sir, Another question in this regard and I humbly request your attention at the earliest:
Suppose I have a panel data set of 75 Banks for 5 years (Pre-merger) which have merged to become 30 Banks (also for 5 years Post Merger) and I have been able to establish my model using all the standard Panel Data Test viz. the F-test, BP-LM Test, and Hausman (1978) that it is a Fixed Effects Model.
given that my Dependent Variable is an Index of Inclusion (whose values lie between 0 and 1), while all other Independent variables are metric data from Balance sheets of banks, with a time dummy (0 for pre-and post merger), CAN I run a Panel Tobit model knowing well that it is a fixed effects Model. I use Stata 14 for my econometrical model testing? I have been told that Panel Tobit can be accompanied only for Random Effects Model
My problem is my Dependent variable has a truncated range ? Please guide asap
@sebastianwaiecon 5 years ago
Mechanically, you can do it with dummy variables (see my fixed effects video). While I am not aware of a specific reason you should not do so, I don't know enough to definitively tell you one way or another.
@shamsunnahar2294 4 years ago
clear presentation. Do you have any video on two way cluster regression in stata. If yes, please send me the link here.
@simonazambelli5320 1 year ago
Love it! Thank you Sebastian!!
@pudurvivek 5 years ago
Do we need to check the p values of the variables before understanding the effect of the interaction variable on the dependent variable?
@sebastianwaiecon 5 years ago
If you want to know about p-values, I suggest taking a look at my video on hypothesis testing: ua-cam.com/video/lhoqZjQHHjk/v-deo.html
@yanvianna4737 2 years ago
Could you demonstrate how it would work when more than a year before and after treatment?
@BrickTemplar 4 years ago
Hi Sebastian, I wonder what do we have to do if the effect is spread over the years, say, treatment was implemented in one year for the firms in one industry, next year for another?
Say, over the three decades, the U.S. authorities have gradually cut import tariffs on a large variety of goods and services. CUT=1 if this happened, 0 otherwise.
The equation will have a form of
Investment=b1*tariff CUT + b2*lagged controls + industry FE etc, cluster by industry-year.
I do not understand what do I have to add to a simple regression to make it diff-in-diffs in this case...
Dummy CUT interacted with what?
@BrickTemplar 4 years ago
or, like in your example, incinerator would have been installed for one neighborhood in 1981, for another in 1985 etc, for another in 2005... y81 time dummy won't work anymore, so what do we have to interact?
@sebastianwaiecon 4 years ago
You'll need a dummy variable that "turns on" from a 0 to a 1 once the treatment is active. You won't be able to do this by building an interaction term, as it's more complex than that now. I'm not sure there's a better way than putting in the 1s on a case by case basis.
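If each unit's treatment start year is recorded in a variable, the "turns on" dummy can be generated in one line rather than filled in case by case. The sketch below uses simulated data; adopt_year, treat_on and the adoption dates are all hypothetical:

```stata
* Simulated panel: 30 units, 1978-1989, staggered treatment dates.
clear
set seed 7
set obs 30
gen id = _n

* Hypothetical adoption years; units 21-30 are never treated.
gen adopt_year = .
replace adopt_year = 1981 if id <= 10
replace adopt_year = 1985 if id > 10 & id <= 20

expand 12
bysort id: gen year = 1977 + _n

* Dummy switches from 0 to 1 once a unit's treatment is active.
gen treat_on = !missing(adopt_year) & year >= adopt_year

* Simulated outcome with a built-in effect of -2, then a FE regression.
gen y = id + 0.3*(year - 1978) - 2*treat_on + rnormal()
xtset id year
xtreg y treat_on i.year, fe vce(cluster id)
```

This is only one possible specification. With staggered timing, a single treat_on coefficient averages over units treated at different dates, so it is worth checking whether that average is the quantity of interest.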
@aung9211 5 years ago ⁺²
Could you please provide how to check the Equal Trend (Parallel Trend) assumption.
@sebastianwaiecon 5 years ago
Unfortunately, we can't do it with this dataset, since we don't have extra data on either side of the change.
4 years ago
Hey, thanks. How do you do it with multiple time points?
@sebastianwaiecon 4 years ago
You can still make a variable indicating before and after treatment. You might also want to think about a fixed effects regression.
@sireenkhalili8631 2 years ago
Thank you so much for this video, it was really helpful!
@usmannasim618 4 years ago
Hi Sebastian,
Can you also please describe the coding to be used when we have a dummy variable for 'treatment' and 'control' groups?
Thanks,
@sebastianwaiecon 4 years ago
I did that in the video. The variable nearinc is the dummy variable for the treatment group.
@zdavirandimuhammad1515 3 years ago
hi thank you for the explanation. but can we req the data so we can also practice?
@sebastianwaiecon 3 years ago
This is the dataset KIELMC.dta from the Wooldridge econometrics textbook. It is widely available online.
@zdavirandimuhammad1515 3 years ago
@@sebastianwaiecon thank you. also for kindly reply my message. God bless. stay safe stay healthy
@mertbakirci6030 4 years ago
Hey, thanks for the great content here. QUESTION: How can I test for the "common trend" assumption of the DiD-estimator in Stata or in general? Thanks in advance!
@sebastianwaiecon 4 years ago ⁺²
Usually, this is done informally by comparing the dependent variable movement across groups in an extended period of time before and after the treatment goes into effect. You need a lot more data than I have in this example.
@mertbakirci6030 4 years ago
@@sebastianwaiecon thank you!
@sajidnoor9482 3 years ago
Thank you very much for explaining this very clearly.
@Bibirallie 2 years ago
What if there are multiple before and after variables, but not one conclusive before and after or year variable.
@sebastianwaiecon 2 years ago
You may want to consider a fixed effects model instead.
@GHSHAH 3 months ago
How do I interpret the interaction term, and how do I check whether it is significant or not? Reply fast.
@johnkaimenyi9292 2 years ago
Hello, is DID regression possible in STATA 15.0?
@sebastianwaiecon 2 years ago ⁺¹
I'm not aware of any changes in recent versions of Stata that would change anything in this video.
@keith-ole 4 years ago
Phenomenal explanation, thank you.
If you wanted to include more prior years and a few years after, would you have to make a dummy variable for each year?
@sebastianwaiecon 4 years ago
You don't have to do that, but you might want to look into fixed effects models for that kind of thing.
@harunasanibk2662 5 years ago
Sir, how am I supposed to run the data for both "treatment and control" groups?
Should I run the data separately? Please, what command should I use?
@sebastianwaiecon 5 years ago
I don't know what you mean by "run the data" here.
@monikasrivastava5565 3 years ago
What are the steps to generate the result? Why have you not shown them? Please, I really need to know how to do it.
@Diana-mo6mg 4 years ago
if you used logprice instead of price would the coefficient be different?
@sebastianwaiecon 4 years ago
Yes, it would. See my video on natural logarithms for how that would work.
@zdavirandimuhammad1515 3 years ago
could you explain to us about Propensity Score Matching using STATA?
@jamesleleji9470 3 years ago
How can you do DID using SPSS or R programming? Thanks
@sebastianwaiecon 3 years ago
The idea will be the same -- create dummy variables for treatment and time and an interaction, then put those in a regression.
@pujiannauli 6 years ago
What if the p-value after the regression for the dummy time*dummy group is not significant? How do I fix this? Thank you so much
@sebastianwaiecon 6 years ago
You don't "fix" it, it's just the result you got. It tells you that you can't reject the hypothesis that your treatment had no effect. Now, it could be that you have some endogeneity that you need to control for, but statistical significance, or lack thereof, is not (by itself) a problem to be fixed.
@consultingfaqs 5 years ago
@@sebastianwaiecon Hi, if the interaction term is insignificant, will adding more variables help us get a significant result? The results show that the constant term is highly significant, which means that there is omitted variable bias. I guess adding more controls can help solve the problem of the insignificant interaction term.
@sebastianwaiecon 5 years ago ⁺²
@@consultingfaqs It bears repeating that the treatment not being significant is not a "problem" to be solved unless you think this is because of an omitted variable. Tinkering around with different models with the explicit purpose of finding a significant effect is not an ethical use of data. The constant term being highly significant is also not evidence of omitted variables. I'm not sure where you got that idea. Adding more variables might or might not result in existing terms being more significant. It all depends on the direction of the bias, if there is one.
@thanhtoba1464 5 years ago
Thank you for your helpful sharing. When I run the command "corr(y81 nearinc y81nrinc)" to test the autocorrelation between variables, the result shows there is autocorrelation between the "nearinc" and "y81nrinc" variables. The correlation coefficient is 0.5776. So my question is: what should we do in this situation?
@sebastianwaiecon 5 years ago
First of all, "autocorrelation" is a very specific term, which you are using incorrectly. In time series data, this refers to a variable correlating with itself across time. In any case, you've pointed out that an interaction term is correlated with one of the variables you are interacting. This is true by definition. There isn't anything you do about that -- it would be strange if it were not the case. In a more general sense, there is nothing wrong with two variables in a regression being correlated with each other. That is completely normal and probably the case in most regressions.
@thanhtoba1464 5 years ago
Thank you for pointing out my problem. You are right, it was my mistake to use the term "autocorrelation". What I really meant was "multicollinearity", but I mistyped. Anyway, according to the data in the video, "multicollinearity" really does occur in the regression, because the correlation coefficient between the "nearinc" and "y81nrinc" variables is 0.5776. Usually, when we encounter "multicollinearity", we omit one of the two variables from the model. However, it is impossible to omit either of these two variables here, because the "difference in differences" method requires them to be included together to show the effect of the construction of the incinerator. That is why I asked "what should we do in this situation". And this problem does not only happen in this example; it occurs in every "DID" model, because we usually create a "did" variable by multiplying the "time" and "treated" variables (did = time * treated). The consequence is that there is always "multicollinearity" in a "DID" model. Can you help me solve this issue?
@sebastianwaiecon 5 years ago
Multicollinearity is not a big deal. Getting into the practice of dropping variables because they are correlated with another variable in the model will lead you quickly into omitted variable bias. There is a simple test where you regress the one variable you are concerned about on all the other explanatory variables. If the R-squared is under 0.9, don't worry about it.
As I explained previously, it is mathematically impossible for a variable and an interaction term involving it to be uncorrelated. The interaction term is absolutely key to a diff in diff regression.
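The auxiliary-regression check described above could be sketched as follows, assuming the thread's variable names (price, y81, nearinc, y81nrinc). The `estat vif` step is an addition, not something mentioned in the reply: a variance inflation factor is 1/(1 - R-squared) from exactly this kind of auxiliary regression, so an R-squared of 0.9 corresponds to a VIF of 10.

```stata
* Regress the variable of concern on all other explanatory variables
regress y81nrinc y81 nearinc
display "Auxiliary R-squared: " e(r2)

* Optional alternative: VIFs reported after the main regression
regress price y81 nearinc y81nrinc
estat vif
```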
@thanhtoba1464 5 years ago
@@sebastianwaiecon Thank you very much for the explanation.
@gregorychung9421 4 years ago
@@sebastianwaiecon Hello, I found this video very helpful. However, when running my model, my DID variable keeps getting dropped because of collinearity. Is there a fix to that?
@adriabc7614 4 years ago
Hi Sebastian, very useful video at a great pace ;). In this example you compare the differences in price. How would you interpret the results if the variable is categorical (e.g., completed studies, married, etc.)? Many thanks!
@sebastianwaiecon 4 years ago
You can only do this if the categorical variable is binary (e.g., married and not married). Assign a 1 to married and 0 to unmarried. We now have a linear probability model (see my video on binary choice models). The interpretation of the diff-in-diff is now the difference in probability of being married.
@mathewchandy9588 4 years ago
Is heteroscedasticity ever an issue when you conduct a difference-in-difference analysis?
@sebastianwaiecon 4 years ago
Yes, it is. In this example, you could imagine there might be a difference in the variance of prices with and without the incinerator.
@mathewchandy9588 4 years ago
@@sebastianwaiecon Then, to solve this, would you add the vce(robust) option at the end of your regression?
@sebastianwaiecon 4 years ago
I can't think of a theoretical reason why you couldn't do that. To be honest, I think most people just use robust all the time and don't really think about it.
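As a sketch, heteroskedasticity-robust standard errors are requested with an option after the comma, again assuming the thread's variable names:

```stata
* Same regression with robust (Huber/White) standard errors
regress price y81 nearinc y81nrinc, vce(robust)
```

The coefficient estimates are unchanged; only the standard errors, t-statistics, and confidence intervals differ from the plain `regress` output.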
@dalemantey6028 6 years ago
Can you do a DD with logistic regression? Say I have a dichotomous outcome - for this example, it could be something like house sold (yes/no). Would it be similar Stata code, just changing "regress" to "logistic", or are there considerations within DD that might limit the statistical validity of that sort of analysis?
@sebastianwaiecon 6 years ago
The principles which drive DD -- controlling for time trends and cross sectional trends -- are still useful for logits (and probits also). However, you need to be careful about the coefficient interpretations, as it's not as clean as in the least squares DD. I would suggest looking at my video on binary choice models for details.
@sebastianwaiecon 6 years ago ⁺¹
For the code, yes, you can change "regress" to "logit" and it will run.
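A hedged sketch of what that might look like. The outcome `sold` is a hypothetical 0/1 variable (it is not in the dataset discussed here), and factor-variable notation is used so that `margins` knows the interaction is built from y81 and nearinc.

```stata
* Sketch: sold is a hypothetical binary outcome
logit sold i.y81##i.nearinc

* Predicted probability of sold = 1 in each of the four group/period cells
margins y81#nearinc
```

Note that `logit` reports coefficients in log-odds while `logistic` reports odds ratios. In neither case is the interaction coefficient a difference in probabilities, which is why comparing the four predicted probabilities from `margins` is a more direct way to read the result.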
@dalemantey6028 6 years ago
Thank you!
@DX-nh8qc 3 years ago
May I know how to add control covariates in Stata?
@lateralus5117 6 years ago
Hello, I ran into a problem when running my regression.
My regression looks like this:
regress DepVar post_tr_yr treat_group treat_groupXpost_tr_yr
Where post_tr_yr is a dummy for year>2007
However my interaction term (treat_groupXpost_tr_yr) gets omitted due to collinearity.
Is this a problem?
@sebastianwaiecon 6 years ago
I always recommend you go to the data browser and take a look at the values. Presumably something went wrong in your variable generation.
@Ilaay23 6 years ago
I also have this problem. My interaction term is omitted due to collinearity, does anyone know how you can fix this?
@bencaplan4565 6 years ago
I have the same issue - what sort of issue in the variable generation can result in this?
@xMooshy 5 years ago
@@bencaplan4565 For the time dummy, the control group also gets a 1 in the post-treatment period, even though it is never treated.
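One way to check for this, sketched with the variable names from the question above (treat_group, post_tr_yr, treat_groupXpost_tr_yr): cross-tabulate the two dummies. The interaction can only be estimated when all four group/period cells contain observations.

```stata
* All four cells (control/treated x pre/post) should be non-empty
tabulate treat_group post_tr_yr

* Confirm the interaction really is the product of the two dummies
assert treat_groupXpost_tr_yr == treat_group * post_tr_yr
```

If, for example, the post dummy was set to 1 only for treated units, the control/post cell is empty, the post dummy is identical to the interaction, and Stata drops one of them for collinearity.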
@MrAdhoul 2 years ago
Great video, thank you.
@vaishalisharma6519 5 years ago
Hello sir. How do you create the dummy for nearinc? What is the actual command?
@sebastianwaiecon 5 years ago
nearinc indicates whether the house is within 3 miles of the incinerator. There is a variable called "dist" which is the distance from the incinerator in feet. To create the dummy, we would use the command: gen nearinc = dist
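The command in the reply above appears to be cut off after `dist`. A sketch of one way to build such a dummy follows, under the assumption that the cutoff is exactly 3 miles, i.e. 3 x 5,280 = 15,840 feet. Whether the original comparison was strict or inclusive is not recoverable from the text, and nearinc2 is a made-up name chosen so the existing nearinc is not overwritten.

```stata
* Assumed cutoff: 3 miles * 5,280 feet per mile = 15,840 feet
* nearinc2 is a hypothetical name; the if-condition keeps missing dist missing
gen nearinc2 = dist < 15840 if !missing(dist)

* Compare against the dummy shipped with the dataset
tabulate nearinc nearinc2
```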
@emilieriislarsen5134 6 years ago
Hi, Sebastian, thank you so much for your video. I was wondering if it's possible to do propensity score matching and difference in differences when my dependent variable is dichotomous?
@sebastianwaiecon 6 years ago
I can't comment on specifics as I've never combined all of these myself. However, both diff in diff and propensity score matching can be done with dichotomous dependent variables. You just need to be careful about the issues inherent in linear probability. See my video on binary choice models for details.
@IamPaste 4 years ago
How would you do it for 1978?
@2thedata 3 years ago
Thank you so much! Your video helps me! :D
@subhalakshmipaul4816 6 years ago ⁺¹
Hello sir, please provide a video on reshaping from wide to long, particularly when the dataset is very large, i.e., how to organise the variables before the reshape. Please, sir.
@oluwaseunoginni9828 5 years ago
Please, how did you generate the interaction variable?
@sebastianwaiecon 5 years ago
Create an interaction term by multiplying the two variables you are interacting.
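In Stata that multiplication is a single `gen` command. The sketch below uses the name `did` (the dataset discussed in this thread already has the interaction under the name y81nrinc, so a new name avoids a clash) and also shows factor-variable notation, which builds the interaction without creating a variable.

```stata
* Interaction of the time dummy and the treatment-group dummy
gen did = y81 * nearinc
regress price y81 nearinc did

* Equivalent: let Stata construct the main effects and the interaction
regress price i.y81##i.nearinc
```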
@popi20101 3 years ago
What if we add more than one control variable, not only nearinc?
@sebastianwaiecon 3 years ago ⁺¹
You are always allowed to add controls if you think the DD method did not eliminate endogeneity.
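Controls are added by listing them after the diff-in-diff terms. In this sketch, age, rooms, and area are assumed control names used purely for illustration; substitute whatever covariates your data contain.

```stata
* Sketch: age, rooms, and area are assumed control variables
regress price y81 nearinc y81nrinc age rooms area
```

The coefficient on y81nrinc is still the diff-in-diff estimate, now conditional on the controls.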
@popi20101 2 years ago
And if we have five years of data, 2007 to 2011, and the policy is announced in 2009, how do we set the year variable?
@raulfotso4032 4 years ago
Good morning to all. Please, I want to know how to do a Fairlie decomposition. I am a student lecturer at the University of Douala.
@alexbrunofmn 5 years ago
When was the incinerator built?
@sebastianwaiecon 5 years ago ⁺¹
According to the original paper, construction took place from 1981-1984.
@LaFemmeExec 4 years ago
Hey! How do I generate a variable that separates the years?
@sebastianwaiecon 4 years ago
In this dataset, that is y81 -- a dummy variable with a 1 for 1981 and 0 otherwise. I have another video with some examples of how to create dummy variables: ua-cam.com/video/DuAhUpM-56E/v-deo.html
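If your own data only have a year variable, a dummy like this can be sketched as below. It assumes a numeric variable called `year`; y81b is a made-up name so the existing y81 is left alone.

```stata
* 1 for 1981 observations, 0 for other years, missing if year is missing
gen y81b = (year == 1981) if !missing(year)

* Sanity check: each year should map to a single value of the dummy
tabulate year y81b
```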
@manojsapkota4880 4 years ago
Hello sir, I am interested in DID and want to know the command to run a DID regression in Stata.
@sebastianwaiecon 4 years ago
It's all in the video.
@ashishstat 3 years ago
Can I have the link to the dataset used in this video?
@sebastianwaiecon 3 years ago ⁺¹
It's KIELMC.dta, which comes with the Wooldridge econometrics textbook. You should be able to find it online.
@frankzhao1678 3 years ago
Thank you so much, it is a great video. Could you please show me how to do a DiD with multiple periods?
@sebastianwaiecon 3 years ago
Do you mean you have multiple periods before and after the change? It functions the same as this, but you need to define your "post" variable to include all periods after the change.
@frankzhao1678 3 years ago
@@sebastianwaiecon So if I have 2000-2010 data, and the policy happened in 2005. I need to set 2000-2004 equal to '0', and 2005-2010 equal to '1'?
@sebastianwaiecon 3 years ago
That would be the simplest way to do it. I'm not promising this is the perfect solution as you may need to think about more sophisticated ways to handle your specific data, but it is a good starting point.
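A minimal sketch of that setup for 2000-2010 data with a 2005 policy. The names year, treated, and outcome are assumptions standing in for your own variables.

```stata
* post = 0 for 2000-2004, 1 for 2005-2010
gen post = (year >= 2005) if !missing(year)

* Interaction of the post dummy and the treatment-group dummy
gen did = post * treated

regress outcome post treated did
```

The coefficient on did compares the average post-period change in the treated group with the average post-period change in the untreated group.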
@himaep_agungkrisyana1013 2 years ago
Can I get the Stata do-file for this?
@jaredgreathouse3672 6 years ago
What if your data have multiple units treated and untreated at the same time? There, a clean post period makes no sense. If city 1, for example, is being treated at time t, but cities 2 and 4 aren't, and the next year city 3 is being treated, and so on, wouldn't you just use a treatment##time variable?
@sebastianwaiecon 6 years ago
For that, you might want to look into a full fixed effects model. I have a video on that, as well.
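A rough sketch of what a fixed effects version could look like. All names are hypothetical: city is a numeric unit identifier, year is the period, outcome is the dependent variable, and treated_now equals 1 in the city-years where the treatment is in effect.

```stata
* Declare the panel structure (hypothetical identifiers)
xtset city year

* Unit fixed effects via fe, period fixed effects via i.year,
* standard errors clustered by unit
xtreg outcome treated_now i.year, fe vce(cluster city)
```

Here the unit and period fixed effects take the place of the single group dummy and single post dummy in the two-period regression.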
@jodieteague8254 4 years ago ⁺¹
Could you then graph this in Stata?
@sebastianwaiecon 4 years ago ⁺¹
Yes. You would do this after running the collapse to get all the averages. The "classic" diff in diff graph has the outcome on the vertical axis and time on the horizontal axis. There are three lines: the treated group, the untreated group, and a counterfactual with the same starting point as the treated group but the same slope as the untreated group. See my video on graphing for how to use the twoway command.
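A sketch of the two observed lines, assuming the thread's variable names (price, y81, nearinc) and that the two sample years are 1978 and 1981. The counterfactual line is left out to keep the example short, and the code is meant to be run from a do-file because it uses `///` line continuation.

```stata
* Keep the original data so it can be restored after collapsing
preserve

* One row per group/period cell holding the mean price
collapse (mean) price, by(y81 nearinc)

twoway (connected price y81 if nearinc == 1) ///
       (connected price y81 if nearinc == 0), ///
       xlabel(0 "1978" 1 "1981") ///
       legend(order(1 "Near incinerator" 2 "Farther away")) ///
       ytitle("Mean price") xtitle("Year")

restore
```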
@jodieteague8254 4 years ago
@@sebastianwaiecon Thank you will do!
@narlikar78 5 years ago
Can we have the dataset used in the video to try the results ourselves?
@sebastianwaiecon 5 years ago
The dataset is KIELMC.dta that comes with the Wooldridge econometrics book. You should be able to find it online.
@fgghdfg8638 3 years ago
Hi professor, I hope you are doing well. I'm a follower on UA-cam. Can you help me with an assignment on the difference in differences method? I couldn't find a topic or data to work with. I must complete it, otherwise I will have to repeat the year, and I have slept only 3 hours a night for more than 3 weeks because of this project. Can you help me? If you want, I can pay you for your help.
@sebastianwaiecon 3 years ago
I recommend you ask your professor for help - it's what they're there for!
