Many thanks, Steffen, it is absolutely a clear explanation. 🙂
Happy to help!
Thank you so much, Steffen!
Happy you liked it! Good luck with your Stata journey!
Thank you, I have searched for how to drop under a condition for a whole day!
Glad I could help!
If you are missing anything else, don't hesitate to ask!
@@SteffensClassroom Thank you! I have a further question: if I use "drop if" to drop some observations, but then want to use those observations in a later regression, what should I do?
Thank you for the question!
The way to do this is to use preserve and restore.
Everything you do to the data after you write preserve is undone when you write restore: the data go back to what they were when you wrote preserve. Sounds strange? Let me give an example:
preserve
drop if obs==something
reg y x
restore
reg y x
The regression inside the preserve/restore block runs without the observations you dropped, and the second regression runs on the full sample, i.e. the sample you had before you typed preserve.
Hope this helps!
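For anyone who wants to try this, here is a minimal runnable sketch of the same pattern. It assumes Stata's bundled auto dataset, and price, mpg and foreign are only stand-ins for y, x and the drop condition; they are not from the original question. Run the whole block at once from a do-file.

```stata
* Sketch only: the auto data and these variables are stand-ins for y, x and "obs"
sysuse auto, clear

preserve
    * Temporarily drop the foreign cars
    drop if foreign == 1
    * Regression on the reduced sample
    regress price mpg
restore

* The dropped observations are back, so this regression uses the full sample
regress price mpg
```

If only the regression needs the smaller sample, an if qualifier on the regression itself (regress price mpg if foreign == 0) avoids dropping anything in the first place.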
@@SteffensClassroom Thank you so much Steffen, it works. Really appreciate your help!
Happy to help!
Good luck :)
Please share the videos. I hope they will help as many people as possible!
Excellent videos! Loving them so far! :) I was wondering: if I drop a variable, can I also undo it?
No, not really.
Unless you surrounded that part of the code with preserve/restore (see help preserve).
However, you can just write this up in your do-file. If you figure out that you did not need to drop a certain variable, you can just adapt your do-file, re-run it, and you are back!
@@SteffensClassroom Thank you for the response!
Hi Steffen, thank you so much for your help! Is there a way I could drop all missing values from my dataset?
First off, I am not sure you really want to do that. It is good to know that Stata automatically excludes any observation with a missing value in at least one of the variables included in your estimation. So you don't have to do it for that reason. It is better to present all your data.
However, if you really want to do it and you don't have that many variables, you could do this:
keep if !missing(var1) & !missing(var2) & !missing(var3)
or you can install the dropmiss command and write:
dropmiss, obs any
This is better if you have a larger dataset with many variables.
I hope it helps!
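A small sketch of two built-in ways to do the same thing without installing anything. It assumes the bundled auto dataset, and price, mpg and rep78 stand in for var1 to var3; nmiss is a helper name made up for the example.

```stata
* Sketch only: auto data and variable names are assumptions for illustration
sysuse auto, clear

* missing() accepts several arguments and is true if any of them is missing
keep if !missing(price, mpg, rep78)

* Alternative: count the missing values per observation, then drop
sysuse auto, clear
egen nmiss = rowmiss(price mpg rep78)
drop if nmiss > 0
drop nmiss
```

Both versions keep only the observations that are complete on the listed variables.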
Thank you
Hi Steffen! Thank you so much for your videos, they are extremely helpful. I was wondering if you could answer a quick question. I am using data from the World Bank Development Indicators and have no 'blank' values or '.' values, only zeros, which I believe to be indicative of a missing value. As such, would the command to drop those observations be 'drop if Inflation==0'?
Hi! Thank you for your question.
Indeed, the command you suggest would work if Inflation is not a string variable. If it is a string, you would have to use " " around the 0, such that the command would be: drop if Inflation=="0"
Likewise, if it is blank or ".", then you can use drop if Inflation=="" and drop if Inflation=="." respectively.
Let me know if this helps!
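A hedged sketch of an alternative for the numeric case: recode the zeros to Stata's missing value instead of dropping right away, so the rest of each row stays available. The toy data below are made up, and treating 0 as missing is the assumption from the question; a genuine zero inflation rate would be lost too.

```stata
* Sketch only: made-up toy data, with 0 assumed to mean "missing"
clear
input year Inflation
2001 2.1
2002 0
2003 3.4
2004 0
end

* Recode the zeros to missing; the observations themselves stay in the data
replace Inflation = . if Inflation == 0
* An equivalent built-in command would be: mvdecode Inflation, mv(0)

* Only if the observations really should go:
drop if missing(Inflation)
```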
Hi Steffen, thanks a lot for your video. How do I exclude a particular observation while running a regression in Stata? Say I want to regress wage on age, gender and experience but I want to use only data for those below a certain age, how do I go about it?
Hi!
The easiest way is to add an if statement in your regression call. E.g. reg y x if age
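A runnable sketch of the if qualifier, assuming the bundled nlsw88 dataset. The cutoff of 40 and the regressors are assumptions picked for illustration. This dataset covers women only, so there is no gender variable to include.

```stata
* Sketch only: dataset, regressors and the age cutoff are assumptions
sysuse nlsw88, clear

* Use only respondents younger than 40; nothing is dropped from the data
regress wage age ttl_exp if age < 40

* The full sample is still in memory for the next command
regress wage age ttl_exp
```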
Thanks a million!
Happy to help :)
So, if stata already automatically drops observations with missing values, should you not worry about them?
There could be many reasons, some of which are highlighted here:
shorturl.at/psAM6
It is also important to think about why there are missing values, as there could be many reasons for this. Especially, if you have a panel. I discuss this a bit here (early in the lecture):
I hope this helps!
Can you create a new variable to contain the values dropped and kept?
Hi!
Short answer: yes. You drop via a condition, so you can simply create a variable that equals that condition. See the gen video :)
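A minimal sketch of that idea, assuming the bundled auto dataset and a made-up condition (rep78 is missing). The variable name dropped is hypothetical.

```stata
* Sketch only: the condition and the variable name "dropped" are assumptions
sysuse auto, clear

* 1 for observations that would be dropped, 0 for those that would be kept
gen byte dropped = missing(rep78)
tabulate dropped

* Use the flag instead of actually dropping anything
regress price mpg if !dropped
```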
@@SteffensClassroom thank you so much for the reply! I managed to figure it out! 👍👍👍
Is there a way to conditionally drop variables when missing across a whole data set? Or do I have to do it variable by variable?
Not sure what you mean. Do you want to drop a variable if it is essentially empty? That is, if it does not contain a single non-missing value?
If so, you can drop variables that are completely empty with: missings dropvars, force
You may need to install the missings command first: ssc install missings
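If installing a command is not an option, a loop along these lines should do roughly the same with built-in commands only. This is a sketch, not what missings does internally; the auto data and the variable empty are assumptions for the example.

```stata
* Sketch only: auto data plus a made-up, completely empty variable
sysuse auto, clear
gen empty = .

* Drop every variable that has no non-missing value at all
foreach v of varlist _all {
    quietly count if !missing(`v')
    if r(N) == 0 {
        drop `v'
    }
}
```

missing() works for both numeric and string variables, so the loop can run over _all.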
Hi, thank you for replying! I have coded a bunch of values as missing across the data set (non-answered questions on surveys) and want to find a way to drop these missing points in one go. Currently I am using drop if var==. for each variable, but was wondering if there was a more efficient way to do this? Thank you for your help! @@SteffensClassroom
Not gonna lie. I don't think it is a great idea to drop all your missing observations. If you want to do it, then check here: www.stata.com/statalist/archive/2009-12/msg00524.html
Good luck! :)
But how do I keep more than one observation in the data?
Hello!
You can use & to add conditions on more variables to your keep/drop command, or | if an observation only needs to meet one of the conditions.
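A short sketch of combining conditions, assuming the bundled auto dataset. The variables and cutoffs are made up for illustration.

```stata
* Sketch only: dataset, variables and cutoffs are assumptions
sysuse auto, clear
* Keep observations that meet both conditions
keep if foreign == 1 & mpg > 25

sysuse auto, clear
* Keep observations that meet at least one of the conditions
keep if foreign == 1 | mpg > 25

sysuse auto, clear
* Keep observations whose value is one of several listed values
keep if inlist(rep78, 3, 4, 5)
```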
Hi Prof. My oil price data has missing values and I am trying to test it for structural breaks. But I am getting an error msg 'gaps not allowed' repeatedly. Is it due to missing values?
Hi!
Indeed, when testing for structural breaks, you should have no missing values. Having missing values for oil prices seems strange, so you should be able to fill them in. Otherwise, you would have to change to a different data frequency.
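A hedged sketch of how one might locate and fill such gaps. The toy series is made up, and linear interpolation with ipolate is only one possible way to fill values; whether it is appropriate for your oil price series is an assumption you would have to check.

```stata
* Sketch only: made-up toy series; interpolation is an assumption, not a recommendation
clear
input t price
1 20.5
2 21.0
4 22.4
5 .
6 23.1
end

tsset t
* Report gaps in the time variable (period 3 has no row here)
tsreport

* Add rows for the missing periods; their price is set to missing
tsfill

* Fill missing prices by linear interpolation over time
ipolate price t, generate(price_filled)
list
```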