This was super helpful!!! I first bootstrapped using bsample and then ran a multi-level fixed effect using reghdfe in the loop. It works great! I noticed by default it sets the obs to be equal to the size of the dataset that you are sampling from. It also lets you oversample by setting an obs greater than the size of the dataset. I also tried bootstrapping by using the command bootstrap, reps(#): then reghdfe. This by default lets you specify the obs number to sample as equal to the number of clusters in the dataset. Thank you again for creating content and sharing! Looking forward to reading your book and hope that you'll have workshops tailored to grad students around the world.
Hi, Nick. Thank you so much for the nice video. I am doing panel regression, and wondering whether it is possible to use bootstrap to get the confidence intervals for the panel model using stata (or r).
Hi Nick .Thank you so much for the insightful video.I have a question to ask you .I am running a regression model and have also added weights to it ( eg I used aw= wt) since the survey data comes with a survey weight /multiplier.I need to bootstrap the model and report the standard errors thereafter.However I cannot use the same weights while bootstrapping . Is there a way around this issue? Will the standard errors generated without weights after bootstrapping be significantly different from the standard errors of the regression model with the weight ?
@@NickHuntingtonKleinThank you for the reply .I just want to mention that the survey data that I am using does not have replicate weights. From what I understand, bsweights are helpful when the survey data also includes replicate weights. Can bsweights be used to manually generate these replicate weights for survey data without them ?
@@aibannongspung1765 oh I see. Maybe look at svy bootstrap. The replicate weights refer to the weights you get from bootstrap www.stata.com/manuals/svysvybootstrap.pdf
It sounds like you're looking for bootstrapped standard errors, which is something a bit different than this video is about. But yes you can apply bootstrap SEs to any model in Stata, see www.stata.com/features/overview/bootstrap-sampling-and-estimation/
Thank you very much for this video!! I was wondering if I could use bootstrap with different samples. To construct an index, I constructed an index merging data from a different dataset (I extracted mean values per variable from the latter one since it is way larger than my sample). I would like to check if the final index measurement is influenced by the external sample dimension. So as original dataset I considered my sample and after preserve I inputted the external dataset, whereas in the loop I put the distance index formula. Yet, once I run it, it says already preserved. what am I getting wrong?
There's a bit too much in here for me to follow it, but if you're getting an already-preserved erorr, that means that you tried to preserve twice in a row without a restore in between. So make sure each preserve is matched by a restore, or if you want to clear out your last preserve without restoring, use "restore, not"
If you are running a regression in your bootstrap you can pull a coefficient out and store it in a local (just like in the code in the video). The way to refer to a coefficient after running the regression is with _b[x], where x is the name of the variable you want the coefficient for
Thank you so much Nick. I have a query on whether bootstrapping can be used on a survey weighted data set, which uses a svy command before a regression. If yes, how can the codes be modified?
I am using the stata KCDF function and then the variable generated from this into my regression model. Since my variable is estimated, I have to bootstrap the process. I am able to do the looping and bootstrapping based on your method, But I not able to use the generated bootstrapped variable in the model to get bootstrapped standard errors. any suggestions would be very helpful. Thank yo.
Thank you so much for your wonderful video! I just registered this channel as my favorite. Thanks. I'm wondering if I could use this in the regression command. In each loop, I opened the original dataset, ran the regression command and obtained the coefficient. Then I aggregated the results of each resampling. (I mean I calculated the mean and sd of the coefficient.) Am I right?
@@NickHuntingtonKlein By the way, in Stata software, the bootstrap command can also work but the coefficients do not change and only standard errors change. I could not understand why. sysuse auto, clear regress mpg weight gear foreign regress mpg weight gear foreign, vce(bootstrap, rep(1000)) In the second command, you can get the coefficient and SE. But the coef is actually the same as the original model. What is the difference?
@@貴広今泉-l4e The second command is estimating the coefficient by regular OLS and only the standard errors by bootstrap. This is actually a good idea if you plan to use them for hypothesis tests, as it helps any hypothesis tests done after the fact be sure they're comparing the right things.
@@NickHuntingtonKlein Thank you very much! Got it! Now I understand the mechanism. Much appreciate it. I am working on prediction model development and I wanted to learn how to perform internal validation using the bootstrap resampling method. I guess your program would work to calculate the optimism statistics to evaluate the prediction model based on the regression models. Aren't you going to make some video on this topic??
How does one get a bootstrap 95CI and p-value for the difference in two proportions, particularly in multilevel data? I have dataset where eyes are nested within subjects. I want to show that the proportion of var1 is significantly different from the proportion of var2, and since the data is multilevel I'm assuming bootstrap 95CI and p value would be the way to address this?
For multilevel data you generally want to do bootstrap sampling by cluster. Once you do that, just store all the ratio estimates from all the bootstrap iterations. The 2.5th and 97.5th percentiles of the estimates are your confidence interval.
@@ataliethompson6725 that's the beauty of bootstrap - just calculate whatever it is you want to calculate in each of the bootstrap samples. So calculate the difference in proportions
great video,I have question .I did exactly you show in video,but without g x normal,because I already had data. But error happens every time. ''invalid obs no'' what does it mean?
For bootstrap SEs? I'm not certain that the bootstrap standard error assumptions are justified in the Arellano-Bond case. But in any case you should be able to apply the guide on this page about boostrapping in a panel/ts setting www.stata.com/support/faqs/statistics/bootstrap-with-panel-data/
Missing values you just keep using as normal. For multiple imputation you could bootstrap each imputation separately. There might even be a special MI bootstrap in stata 16, I'm not sure, they added a bunch of MI stufd
That suggests there's an error in the line with floor in it. Remember, floor is a function, not a variable. So floor() is correct, not floor () or floor*()
@@alisadavtyan2133 Everything before the "save originaldata.dta" line is just me creating the fake data, you don't need it. You can just open up your existing data instead.
I am doing a Cost effectiveness analysis for costs and health benefit. from my data I calculated an average cost and an average effect per treatment arm in order to calculate the ICER . Then my Supervisors told me that this is not enough and that I need to do bootstraping...i know how but... I DO NOT HAVE A CLUE WHY do I need to do this though. Does anyone know? THANKS!!
Thank for taking the time to reply. I figured it out. There was a minor issue in the code I entered. The boot code is working. Would you happen to know how this can be done for nested data? I have diving data with parameters of depths and bottom times (how long and how deep). These dives belong to a group of 17 small-scale fishermen divers. Each fishermen conducted a range of 100-400 dives per year. My goal is get a good understand for what their average depth and bottom time. The dives are nested within each fishermen. The average per fishermen have a lot of variance. Anyway any help is greatly appreciated.
Walter Chin There are two ways to go about this depending on what you want to do with it. One uses the "strata" option of bsample, and the other uses the "cluster" option (see help bsample). Strata does a bootstrap such that you are resampling within fishermen (ie fisherman A did ten trips and B did 16, so you resample from A ten times and B 16 times). Cluster resamples at the fisherman level (ie it will resample from fisherman A and fisherman B, picking all the trips that fisherman goes on). If the problem is that there's a lot of noise within fishermen, you probably want the strata option, but I'd recommend looking closer at the help file for more details.
Could you please help me? My code is not working. It's showing following error: . set obs 'boots' ''' invalid It is not an integer or its value is too large.
First 3 minutes were exactly what I wanted to know. Thank you!
This was super helpful!!! I first bootstrapped using bsample and then ran a multi-level fixed effect using reghdfe in the loop. It works great!
I noticed by default it sets the obs to be equal to the size of the dataset that you are sampling from. It also lets you oversample by setting an obs greater than the size of the dataset.
I also tried bootstrapping by using the command bootstrap, reps(#): then reghdfe. This by default lets you specify the obs number to sample as equal to the number of clusters in the dataset.
Thank you again for creating content and sharing! Looking forward to reading your book and hope that you'll have workshops tailored to grad students around the world.
Very helpful video! Thanks a lot!
Omg you literally SAVED MY LIFE!!!!! Thank you Thank you Thank you!!!!!!
Hi, Nick. Thank you so much for the nice video. I am doing panel regression, and wondering whether it is possible to use bootstrap to get the confidence intervals for the panel model using stata (or r).
Yep! That's a different goal than in this video though. See www.stata.com/support/faqs/statistics/bootstrap-with-panel-data/
@@NickHuntingtonKlein Thank you so much. This is what I want to learn. It is helpful.
Thank you for this clear explanation.
Hi Nick .Thank you so much for the insightful video.I have a question to ask you .I am running a regression model and have also added weights to it ( eg I used aw= wt) since the survey data comes with a survey weight /multiplier.I need to bootstrap the model and report the standard errors thereafter.However I cannot use the same weights while bootstrapping . Is there a way around this issue? Will the standard errors generated without weights after bootstrapping be significantly different from the standard errors of the regression model with the weight ?
The downloadable package bsweights will help you do this
@@NickHuntingtonKleinThank you for the reply .I just want to mention that the survey data that I am using does not have replicate weights. From what I understand, bsweights are helpful when the survey data also includes replicate weights. Can bsweights be used to manually generate these replicate weights for survey data without them ?
@@aibannongspung1765 oh I see. Maybe look at svy bootstrap. The replicate weights refer to the weights you get from bootstrap www.stata.com/manuals/svysvybootstrap.pdf
@@NickHuntingtonKlein Thank you Nick .I will give it a try .
Thanks for the video! I'm wondering if bootstrapping can be used to run an MLM model with random effects predictors in Stata?
It sounds like you're looking for bootstrapped standard errors, which is something a bit different than this video is about. But yes you can apply bootstrap SEs to any model in Stata, see www.stata.com/features/overview/bootstrap-sampling-and-estimation/
Thank you very much for this video!! I was wondering if I could use bootstrap with different samples. To construct an index, I constructed an index merging data from a different dataset (I extracted mean values per variable from the latter one since it is way larger than my sample). I would like to check if the final index measurement is influenced by the external sample dimension. So as original dataset I considered my sample and after preserve I inputted the external dataset, whereas in the loop I put the distance index formula. Yet, once I run it, it says already preserved. what am I getting wrong?
There's a bit too much in here for me to follow it, but if you're getting an already-preserved erorr, that means that you tried to preserve twice in a row without a restore in between. So make sure each preserve is matched by a restore, or if you want to clear out your last preserve without restoring, use "restore, not"
Hi thank you so much Nick! If I wanna get the coefficient for each iteration, what should I do?
If you are running a regression in your bootstrap you can pull a coefficient out and store it in a local (just like in the code in the video). The way to refer to a coefficient after running the regression is with _b[x], where x is the name of the variable you want the coefficient for
@@NickHuntingtonKlein Got it thank you so much it works perfectly.
Thank you so much Nick. I have a query on whether bootstrapping can be
used on a survey weighted data set, which uses a svy command before a
regression. If yes, how can the codes be modified?
If you're just trying to get bootstrapped SEs, look at the "svy bootstrap" help file
@@NickHuntingtonKlein Thank you so much. I am going through the file currently to clear the basics.
Nice Job Nick 💓💓💓💓
May u share ur I'd for asking some problem related stata
I am using the stata KCDF function and then the variable generated from this into my regression model. Since my variable is estimated, I have to bootstrap the process. I am able to do the looping and bootstrapping based on your method, But I not able to use the generated bootstrapped variable in the model to get bootstrapped standard errors. any suggestions would be very helpful. Thank yo.
Just take the standard deviation of your bootstrapped coefficient (for example, with the summarize command). That's the bootstrap standard error.
@@NickHuntingtonKlein Thank you Nick!
Thank you so much for your wonderful video! I just registered this channel as my favorite. Thanks. I'm wondering if I could use this in the regression command. In each loop, I opened the original dataset, ran the regression command and obtained the coefficient. Then I aggregated the results of each resampling. (I mean I calculated the mean and sd of the coefficient.) Am I right?
Yep, that works
@@NickHuntingtonKlein Thanks! Your videos went viral in my community!
@@NickHuntingtonKlein
By the way, in Stata software, the bootstrap command can also work but the coefficients do not change and only standard errors change. I could not understand why.
sysuse auto, clear
regress mpg weight gear foreign
regress mpg weight gear foreign, vce(bootstrap, rep(1000))
In the second command, you can get the coefficient and SE. But the coef is actually the same as the original model.
What is the difference?
@@貴広今泉-l4e The second command is estimating the coefficient by regular OLS and only the standard errors by bootstrap. This is actually a good idea if you plan to use them for hypothesis tests, as it helps any hypothesis tests done after the fact be sure they're comparing the right things.
@@NickHuntingtonKlein Thank you very much! Got it! Now I understand the mechanism. Much appreciate it.
I am working on prediction model development and I wanted to learn how to perform internal validation using the bootstrap resampling method. I guess your program would work to calculate the optimism statistics to evaluate the prediction model based on the regression models. Aren't you going to make some video on this topic??
How does one get a bootstrap 95CI and p-value for the difference in two proportions, particularly in multilevel data? I have dataset where eyes are nested within subjects. I want to show that the proportion of var1 is significantly different from the proportion of var2, and since the data is multilevel I'm assuming bootstrap 95CI and p value would be the way to address this?
For multilevel data you generally want to do bootstrap sampling by cluster. Once you do that, just store all the ratio estimates from all the bootstrap iterations. The 2.5th and 97.5th percentiles of the estimates are your confidence interval.
@@NickHuntingtonKlein How does one bootstrap for the difference in two proportions (as opposed to a mean)?
@@ataliethompson6725 that's the beauty of bootstrap - just calculate whatever it is you want to calculate in each of the bootstrap samples. So calculate the difference in proportions
great video,I have question .I did exactly you show in video,but without g x normal,because I already had data. But error happens every time. ''invalid obs no'' what does it mean?
The "set obs" command is for the purpose of creating the fake data, you don't need it if you already have data, and it will produce that error.
@@NickHuntingtonKlein do I need g store_means that you write before the word 'quietly'
@@evahakobjanyan8528 You need some sort of variable to store the results in, yes.
Can you guide using bootstrap with xtabond2? Thanks
For bootstrap SEs? I'm not certain that the bootstrap standard error assumptions are justified in the Arellano-Bond case. But in any case you should be able to apply the guide on this page about boostrapping in a panel/ts setting www.stata.com/support/faqs/statistics/bootstrap-with-panel-data/
What if I have missing values or a multiply imputed dataset ?
Missing values you just keep using as normal. For multiple imputation you could bootstrap each imputation separately. There might even be a special MI bootstrap in stata 16, I'm not sure, they added a bunch of MI stufd
what if it shows ''floor not found''?
That suggests there's an error in the line with floor in it. Remember, floor is a function, not a variable. So floor() is correct, not floor () or floor*()
How to implement Kónya (2006) bootstrap panel granger causality approach in stata?please help me😢
No idea! Never heard of it. If it were me I'd Google for it.
@@NickHuntingtonKlein Thank you.Of course I searched, unfortunately I couldn't find it.
what command should I change if I already have exsiting varaible. thsi part g X=rnormal(4)*2+4
Bootstrapping over an existing variable? It should all work the same, you can just skip generating a new variable and use the old one.
@@NickHuntingtonKlein and what about set obs 10000 ?Should I write my obs number ?
@@alisadavtyan2133 Everything before the "save originaldata.dta" line is just me creating the fake data, you don't need it. You can just open up your existing data instead.
@@NickHuntingtonKlein and local boots are number of my obs ?
@@alisadavtyan2133 That's the number of bootstrap iterations
I am doing a Cost effectiveness analysis for costs and health benefit. from my data I calculated an average cost and an average effect per treatment arm in order to calculate the ICER . Then my Supervisors told me that this is not enough and that I need to do bootstraping...i know how but... I DO NOT HAVE A CLUE WHY do I need to do this though. Does anyone know? THANKS!!
I would recommend posting this question in more detail on StackExchange
@@NickHuntingtonKlein thanks Nick
Could you provide the do file. I keep getting an error
Walter Chin I'm afraid I didn't keep the do file. It's just the same code you can see in the video though.
Thank for taking the time to reply. I figured it out. There was a minor issue in the code I entered.
The boot code is working.
Would you happen to know how this can be done for nested data? I have diving data with parameters of depths and bottom times (how long and how deep). These dives belong to a group of 17 small-scale fishermen divers. Each fishermen conducted a range of 100-400 dives per year.
My goal is get a good understand for what their average depth and bottom time. The dives are nested within each fishermen. The average per fishermen have a lot of variance.
Anyway any help is greatly appreciated.
Walter Chin There are two ways to go about this depending on what you want to do with it. One uses the "strata" option of bsample, and the other uses the "cluster" option (see help bsample). Strata does a bootstrap such that you are resampling within fishermen (ie fisherman A did ten trips and B did 16, so you resample from A ten times and B 16 times). Cluster resamples at the fisherman level (ie it will resample from fisherman A and fisherman B, picking all the trips that fisherman goes on). If the problem is that there's a lot of noise within fishermen, you probably want the strata option, but I'd recommend looking closer at the help file for more details.
Could you please help me? My code is not working. It's showing following error:
. set obs 'boots'
''' invalid
It is not an integer or its value is too large.
@@ASMTowhid the first apostrophe should be a backtick (next to the one) i.e. `boots'; it is an annoying feature of specifying locals
i am still confused....
complicated and confusing... Better to use original data