Thank you for this video. A lot of useful commands and hints. I wish we would have had a class like this at our university!
Wow! This was very helpful because it didn't gloss over all of the important details. Thanks for sharing!!!
The keep if !missing(vars) is what I really need! Thank you!!
There are some other functions that can be useful in situations like this. Look for help inrange() and help inlist().
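A minimal sketch of the functions mentioned in this thread, run on the nlsw88 dataset that ships with Stata. The choice of variables (wage, tenure, age, race) is only an illustrative assumption:

```stata
sysuse nlsw88, clear

* Keep observations where neither wage nor tenure is missing
keep if !missing(wage, tenure)

* inrange(): keep ages 35 through 40, inclusive
keep if inrange(age, 35, 40)

* inlist(): keep observations whose race code is 1 or 2
keep if inlist(race, 1, 2)
```

Note that missing() with several arguments is true if any one of them is missing, so the first command keeps only observations that are complete on both variables.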
You saved my life with the compress function. Thank you
Thanks a lot from Afghanistan
👍
Looks like you have put a lot of hard work into Stata. LOL
Many thanks from New Zealand!!!!
Aimee Ward I'm glad you found this video useful!
13:45 'keep if !missing(varlist)' drops observations that have a missing value in any of the variables listed in the parentheses
Awesome video, thanks for the great content!
+JoeTheShmoe Thank you!
Thank you so much for this video :) very clear and helpful. Regards from Frankfurt
+claes ribot Thank you.
2:35 'compress' command stores data more efficiently where possible
What was that intro tune? A rare crossover between my analyst and swing-dancing personas.
A great Canadian musician named Stan Rogers. The song is called "The White Collar Holler" and is not that typical of his music. One of my favorite songs of his is "Northwest Passage". Pierre Trudeau called it the unofficial national anthem of Canada. I also recommend "Barrett's Privateers" among many others!
Thanks a lot for all the videos; they are really helpful. I have some difficulty using a 'foreach' loop with my survey data. The data is in 'string' format, and when I run the loop, it gives an error. I need your help solving this. Many thanks.
perfect lecture thanks
Thank you so much for the clear explanation, Sir. I have one question: should we eliminate outliers before applying structural break tests? Thanks.
+Guiri Amine Guiri, just because an observation lies apart from the others does not mean it should be called an outlier. An observation can be far from the others yet still very close to the fitted slope in a model, for example, and so produce a smaller residual than other observations.
You are probably introducing serious bias into your data and models by arbitrarily dropping cases you have labeled extreme or outliers. A better approach is to identify the extreme observations and ask of each: why is it an outlier, and how much influence does it have on your model? This calls for careful investigation and sincere judgement rather than just replacing them with an arbitrary value or discarding them from the analysis.
You can find further discussion and material about this at www3.nd.edu/~rwilliam/stats2/l24.pdf and stats.stackexchange.com/questions/78063/replacing-outliers-with-mean
In short, dropping cases without careful consideration is generally a bad idea.
ok thanks a lot Sir
I have a question. I have a dataset with three variables including company names, industry codes, and countries in which the companies exist. What I want to do is to remove the observations for which the total number of companies for a given industry in a given country is less than 5. How can I do this in Stata???
Ali, these kinds of questions are easier to address if you provide some example data, preferably an example dataset shipped with Stata. I think one approach is to use the -bysort- prefix command to create a count of the number of companies within an industry and then to use that measure to keep or drop cases as appropriate. Here is a program that may point you in the right direction for an answer:
/* Load a Stata supplied dataset */
sysuse nlsw88.dta, clear
/*
Look at two variables
Pretend c_city is your country and
industry is your industry
*/
tab c_city
tab industry
/* Get rid of missing industries */
drop if industry>=.
/*
Create new variable indicating # of
companies in industry. This is the
heart of the program.
*/
bysort c_city industry: gen cnt=_N
/*
Keep or drop cases based on the number of
companies in an industry. You would change
this to -drop-
*/
keep if cnt
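For the original question (removing country-industry cells with fewer than 5 companies), the final step might look like the sketch below. It assumes one observation per company and reuses c_city and industry as stand-ins for country and industry; the threshold of 5 comes from the question, not from the program above.

```stata
sysuse nlsw88, clear

* Drop observations with a missing industry code
drop if missing(industry)

* Count observations within each c_city-by-industry cell
bysort c_city industry: generate cnt = _N

* Remove cells with fewer than 5 observations
drop if cnt < 5
```

With real data, replace c_city and industry with the actual country and industry variables.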
I have a dataset constructed from the World Values Survey, and I want to create subsets of that dataset based on developed and developing countries. Can you please guide me?
+Sarah Suleman There are a couple of ways of doing this, and I show one of them below. The method I show here is what I call a "brute force" method. It is not very clever, but it is simple and it works. I suggest making a new variable, which I called devcnt, that is coded 0 for developing countries and 1 for developed countries. Then you can use this variable as a flag in if statements for different analyses. Further, this variable allows for comparing measures across developing and developed countries.
In the code below, simply change the "devcnt=0" or "devcnt=1" on each line to 0 or 1 depending on whether you consider the country developing or developed.
Sincerely,
Alan Neustadtl
desc V2
preserve
gen V2TMP=V2
contract V2 V2TMP
list V2TMP V2, noobs clean
restore
capture drop devcnt
generate byte devcnt=.
replace devcnt=0 if V2== 12 /* Algeria */
replace devcnt=0 if V2== 31 /* Azerbaij */
replace devcnt=1 if V2== 32 /* Argentin */
replace devcnt=1 if V2== 36 /* Australi */
replace devcnt=1 if V2== 48 /* Bahrain */
replace devcnt=1 if V2== 51 /* Armenia */
replace devcnt=1 if V2== 76 /* Brazil */
replace devcnt=1 if V2== 112 /* Belarus */
replace devcnt=1 if V2== 152 /* Chile */
replace devcnt=1 if V2== 156 /* China */
replace devcnt=1 if V2== 158 /* Taiwan */
replace devcnt=1 if V2== 170 /* Colombia */
replace devcnt=1 if V2== 196 /* Cyprus */
replace devcnt=1 if V2== 218 /* Ecuador */
replace devcnt=1 if V2== 233 /* Estonia */
replace devcnt=1 if V2== 268 /* Georgia */
replace devcnt=1 if V2== 275 /* Palestin */
replace devcnt=1 if V2== 276 /* Germany */
replace devcnt=1 if V2== 288 /* Ghana */
replace devcnt=1 if V2== 344 /* Hong Kon */
replace devcnt=1 if V2== 356 /* India */
replace devcnt=1 if V2== 368 /* Iraq */
replace devcnt=1 if V2== 392 /* Japan */
replace devcnt=1 if V2== 398 /* Kazakhst */
replace devcnt=1 if V2== 400 /* Jordan */
replace devcnt=1 if V2== 410 /* South Ko */
replace devcnt=1 if V2== 414 /* Kuwait */
replace devcnt=1 if V2== 417 /* Kyrgyzst */
replace devcnt=1 if V2== 422 /* Lebanon */
replace devcnt=1 if V2== 434 /* Libya */
replace devcnt=1 if V2== 458 /* Malaysia */
replace devcnt=1 if V2== 484 /* Mexico */
replace devcnt=1 if V2== 504 /* Morocco */
replace devcnt=1 if V2== 528 /* Netherla */
replace devcnt=1 if V2== 554 /* New Zeal */
replace devcnt=1 if V2== 566 /* Nigeria */
replace devcnt=1 if V2== 586 /* Pakistan */
replace devcnt=1 if V2== 604 /* Peru */
replace devcnt=1 if V2== 608 /* Philippi */
replace devcnt=1 if V2== 616 /* Poland */
replace devcnt=1 if V2== 634 /* Qatar */
replace devcnt=1 if V2== 642 /* Romania */
replace devcnt=1 if V2== 643 /* Russia */
replace devcnt=1 if V2== 646 /* Rwanda */
replace devcnt=1 if V2== 702 /* Singapor */
replace devcnt=1 if V2== 705 /* Slovenia */
replace devcnt=1 if V2== 710 /* South Af */
replace devcnt=1 if V2== 716 /* Zimbabwe */
replace devcnt=1 if V2== 724 /* Spain */
replace devcnt=1 if V2== 752 /* Sweden */
replace devcnt=1 if V2== 764 /* Thailand */
replace devcnt=1 if V2== 780 /* Trinidad */
replace devcnt=1 if V2== 788 /* Tunisia */
replace devcnt=1 if V2== 792 /* Turkey */
replace devcnt=1 if V2== 804 /* Ukraine */
replace devcnt=1 if V2== 818 /* Egypt */
replace devcnt=1 if V2== 840 /* United S */
replace devcnt=1 if V2== 858 /* Uruguay */
replace devcnt=1 if V2== 860 /* Uzbekist */
replace devcnt=1 if V2== 887 /* Yemen */
label define devcnt 0 "developing" 1 "developed"
label values devcnt devcnt
tab devcnt if devcnt==0
tab devcnt if devcnt==1
Alan Neustadtl It may be easier to create a list of developing-country IDs and use a foreach loop that replaces a variable with 0 if the observation's ID matches one on the list.
Yes, certainly an option. Given the information I had and the editor I use, I chose a brute force method. But sometimes shorter code is more efficient and easier to read and correct if there are errors.
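A rough sketch of the foreach approach suggested above. The local macro name developing is hypothetical, and its contents are only the two codes set to 0 in the brute-force program. Treating every other non-missing country as developed is an assumption made here for illustration.

```stata
* Hypothetical list of developing-country codes; extend as needed.
* 12 and 31 are the two codes set to 0 in the program above.
local developing 12 31

capture drop devcnt
* Assumption: every country not on the list is treated as developed
generate byte devcnt = 1 if !missing(V2)
foreach id of local developing {
    replace devcnt = 0 if V2 == `id'
}
```

The list lives in one place, so correcting a classification means editing a single line rather than hunting through dozens of replace statements.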
How can I get the links to the videos?
Thank you so much for this video. May I have the email address of the tutor in the video, or a link to the source, in case we want to discuss the topics further? Regards
+Hamzah Hasyim Hamzah, my email address is smilex3@umd.edu. I made all of the videos to support a graduate course on statistical programming. I try to answer questions that people have but it depends on 1) if I know of an answer, and 2) if I have time! Best wishes, Alan.
Is there an edit function for notes?
Not that I know of, but you can either delete and recreate notes or replace them. From the command window in Stata enter -help notes- for details.
Alan Neustadtl
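A short sketch of replacing and deleting a note, using the auto dataset that ships with Stata. The note text is made up for illustration, and the example assumes mpg has no notes attached beforehand, so the new note is number 1:

```stata
sysuse auto, clear

* Attach a note to the variable mpg
notes mpg: check units

* Replace the text of note 1 on mpg
notes replace mpg in 1: mileage in miles per gallon

* Show the notes attached to mpg
notes list mpg

* Delete note 1 on mpg
notes drop mpg in 1
```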
Thank you
Very good :-)
Thanks so much for this video. Such a useful video of tips and tricks.
Just a quick question. Can I use the drop or keep if function with country names too or only numbers?
For example, keep if PartnerISO3 = AGO ARE ARG AUS AUT BEL BGD BHR BHS BOL BRA CAN CHE CHL CHN CIV CMR COL CRI CZE DEU DNK DOM DZA ECU EGY ESP EST FIN FRA GBR GHA GRC GTM HKG HND HUN IDN IND IRL IRN ISR ITA JAM JOR JPN KEN KHM KWT LBN LBR LKA LTU LVA MAR MEX MRT MUS MYS NGA NIC NLD NOR NZL OMN PAK PAN PER PHL POL PRT PRY QAT SAU SDN SEN SGP SLV SVK SVN SWE THA TTO TUN TUR TWN TZA UGA URY USA VEN VNM YEM ZMB ZWE
These are all the countries I want to keep and delete the rest. Unfortunately, my STATA keeps saying type mismatch or PArtnerISO3 not found?
Thanks so much.
Hi Nicole, yes, keep and drop work with strings. Stata, however, expects strings to be enclosed by double-quotes. Something like this:
keep if PartnerISO3 == "AGO"
Note the use of quotes but also the use of the double equal sign, "==", which Stata uses to test for equality. Finally, you need to string together your keep/drop conditions with OR operators. Something like this:
keep if PartnerISO3 == "AGO" | PartnerISO3 == "ARE" | etc.
There are probably some shortcuts you could take to make this a bit more efficient, but that should work. Also, I suggest that you look at the help file for the command "encode" which can be used to convert your string variable to a numeric variable but assign the string values to the value labels. It might be easier to process your list of countries if they were converted to numbers first.
Best,
Alan
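One possible shortcut, sketched under the assumption that PartnerISO3 is a string variable: loop over a local macro of codes and flag the matches. The names keepers and keepme are hypothetical, and the list is shortened for illustration.

```stata
* Shortened, illustrative list; put the full set of ISO3 codes here
local keepers AGO ARE ARG AUS AUT

* Flag observations whose code appears in the list
generate byte keepme = 0
foreach c of local keepers {
    replace keepme = 1 if PartnerISO3 == "`c'"
}

* Keep the flagged observations and remove the helper variable
keep if keepme
drop keepme
```

inlist() also works with strings, for example inlist(PartnerISO3, "AGO", "ARE"), but it accepts only a limited number of string arguments, so a long list would need several calls joined with |.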
@smilex3 thank you so much. This is very helpful!
Hello, could you please send your do-file?
The "edit"-command - the dark side of STATA!
And very dangerous! All data changes should be documented in a program do-file in my opinion. That said, the browse command is considerably safer than the edit command.