Using Bootstrapping to Calculate p-values!!!

  • Published Oct 2, 2024

COMMENTS • 244

  • @statquest
    @statquest  2 years ago

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @KoLMiW
    @KoLMiW 3 years ago +29

    I started watching these videos to prepare for my Introduction to Machine Learning exam but now I just watch them because it's fun to learn about it when it is so well explained. Thank you for your effort!

  • @insertacoin738
    @insertacoin738 3 years ago +70

    I have really no words to express how incredibly amazing, clear, and enlightening your videos are. You transform the historically "hard and complex" concepts into kids' games; it is astoundingly magnificent, almost majestic. Thank you, really, from the bottom of my heart. You deserve a whole university named after you.

    • @statquest
      @statquest  3 years ago +2

      Thank you so much 😀

    • @insertacoin738
      @insertacoin738 3 years ago

      @@statquest I just came across this, Josh. Do you have any explanation for why this happens? stats.stackexchange.com/questions/535343/bootstrapped-mean-always-almost-identical-to-sample-mean

    • @statquest
      @statquest  3 years ago +3

      @@insertacoin738 Yes, I do. First, keep in mind that the person who posted that is calculating the mean of the bootstrapped means. And this mean of means is very similar to the original mean. In other words, the mean of the histogram that bootstrapping created is centered on the mean of the original data. That's to be expected. Bootstrapping works because the sample distribution is an estimate (not an exact copy) of the population distribution. This estimate gets better as the sample size increases. (A quick numeric check follows below.)
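
      A minimal Python sketch of that numeric check (the sample values, seed, and NumPy usage here are illustrative assumptions, not from the video):

          import numpy as np

          rng = np.random.default_rng(42)
          sample = rng.normal(loc=3.0, scale=1.0, size=50)  # an arbitrary made-up sample

          # Draw 10,000 bootstrapped datasets (sampling with replacement)
          # and record each one's mean
          boot_means = np.array([
              rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(10_000)
          ])

          # The mean of the bootstrapped means sits essentially on top of
          # the sample mean, just as described in the reply above
          print(sample.mean(), boot_means.mean())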

  • @OwenMcKinley
    @OwenMcKinley 3 years ago +16

    Thank you! I never really realized the power of bootstrapping until watching your 'Quests. Great stuff 👍👍

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @usmanazhar7073
    @usmanazhar7073 3 years ago +1

    Really informative, thank you so much for uploading

  • @rattaponinsawangwong5482
    @rattaponinsawangwong5482 3 years ago +12

    The way you explain the bootstrap is so good. It makes it simpler for everyone.

  • @rayman2704
    @rayman2704 3 years ago +1

    Thank you soooooooooooooo much!

  • @citron2725
    @citron2725 1 month ago +1

    Hey, really nice video! I am wondering why the p-value is calculated by adding up the proportions of values that are farther than the observed value on either side, instead of just one side (in the case of an observed mean of 0.5, just the proportion > 0.5)?

    • @statquest
      @statquest  1 month ago +1

      It's because two-sided p-values are almost always better than one-sided p-values. To understand why, see: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html

  • @PauloBuchsbaum
    @PauloBuchsbaum 1 year ago

    Great video, and I understood your procedure perfectly.
    I just believe that shifting the data 0.5 to the left and redoing the bootstrap so that the mean is 0 is not strictly necessary (except for ease of understanding).
    Instead of redoing the shifted bootstrap, it would be enough, in the original bootstrap, to take the probability above 1.0 plus the probability below 0.0.
    In the original bootstrap this corresponds to the probability above 0.5 plus the probability below -0.5 after shifting 0.5 to the left.
    Am I wrong?
    Another point: at 4:11 the probability above 0.5 was 48%, but at 5:04, to get the p-value, you used 47%.

  • @petercourt
    @petercourt 3 years ago +7

    Awesome video Josh! Really well explained, as usual. I was curious as to how the data is shifted (e.g. what function is applied) so that you can get from your original mean to a mean of zero. Otherwise I think I understood everything!

    • @statquest
      @statquest  3 years ago +6

      BAM! :) We just subtract the original mean value from all of the original values to shift the data (see the sketch at the end of this thread).

    • @petercourt
      @petercourt 3 years ago +1

      @@statquest Haha, I should've thought of that! Thanks Josh!
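
      To illustrate the shifting step described above, here is a minimal Python sketch (the eight values are made up for illustration, not the video's data):

          import numpy as np

          # Hypothetical drug-effect measurements
          data = np.array([0.5, 1.2, -0.3, 0.8, 1.9, -0.4, 0.2, 0.1])

          # Subtract the original mean from every value...
          shifted = data - data.mean()

          # ...and the shifted data are now centered on zero
          print(shifted.mean())  # ~0.0 (up to floating-point error)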

  • @ismailalkhalaf6061
    @ismailalkhalaf6061 3 months ago +1

    Great video!! Thank you so much. 🌻🌻
    Would you please make some other videos about wild bootstrapping?

    • @statquest
      @statquest  3 months ago

      I'll keep that in mind.

  • @rissalhedna5534
    @rissalhedna5534 8 months ago +1

    Amazing video as usual! I was just wondering why the value of 0.05 was used as a threshold for the p-value. Was it arbitrarily set, or did we assume that it was meaningful for our experiment with the drug?

    • @statquest
      @statquest  8 months ago

      I explain p-value thresholds here: ua-cam.com/video/vemZtEM63GY/v-deo.html

  • @cjh4467
    @cjh4467 3 years ago +3

    Why don't people just use bootstrapping for everything instead of worrying about robust standard errors and other types of similar concerns?

    • @statquest
      @statquest  3 years ago +3

      It's a good question. The answer, I believe, is "power". Bootstrapping works in all kinds of situations, but (I believe) it has less power than parametric methods.

    • @cjh4467
      @cjh4467 3 years ago +1

      @@statquest Thank you!

    • @SunSan1989
      @SunSan1989 1 year ago +3

      @@statquest That's a really good question. Dear Josh, can you make a video about the differences in power? Thank you for the tutorial. I appreciate it very much.

  • @jamesstrickland833
    @jamesstrickland833 2 years ago +2

    Must we always consider both tails when calculating a p-value from bootstrapping? Had we looked at the medians and only considered the right tail, that would have been significant (at 0.05) to reject H0. Or did we assume that Ha was not equal to zero, and therefore a two-tailed test?

    • @statquest
      @statquest  2 years ago +3

      You don't always need to use two-tailed p-values. However, I think it is almost always a mistake to not use two-tailed p-values. Not once in my career as a biostatistician did I use a single sided test. If you want to know why, see: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html

  • @PsyK0man
    @PsyK0man 3 years ago +4

    Clarification needed: failing to reject the hypothesis that the drug has 0 effect means that we don't reject the null hypothesis, and that the experiment is not statistically significant? Does this therefore mean that we cannot conclude whether the drug is effective or not? Or that the drug is not effective?

    • @v0ldelord
      @v0ldelord 3 years ago +4

      It means that we do not have enough evidence to exclude that the drug has no effect. Or in other words we can't conclude that the drug is effective.

    • @statquest
      @statquest  3 years ago +6

      @@v0ldelord BAM! :)

    • @statquest
      @statquest  3 years ago +2

      To learn more about hypothesis testing, check out ua-cam.com/video/0oc49DyA3hU/v-deo.html

  • @thegimel
    @thegimel 3 years ago +1

    It sounds like calculating p-values from bootstrapping can lend itself to p-hacking, if you find "the right" statistic that does lead to rejecting the null hypothesis because of some reason (e.g. being more or less sensitive to outliers). What do you think?

    • @statquest
      @statquest  3 years ago +1

      That's why for everything in statistics, you plan what you are going to do (what metric you are going to use etc.) before collecting data.

  • @mattsmith4027
    @mattsmith4027 9 months ago +2

    Literal black magic.
    Cheers so much for making this! I had some data that was a pain in the butt to get, and I'm trying to pull all I can out of it. This really helped!

  • @PunmasterSTP
    @PunmasterSTP 6 months ago +1

    Q: What's the significance of a urine test?
    A: The p-value!

    • @statquest
      @statquest  6 months ago +1

      Ugh! ;)

    • @PunmasterSTP
      @PunmasterSTP 6 months ago

      @@statquest Q: What do claims adjusters use to estimate hail damage?
      A: Confi-dents intervals.

  • @marcingrzebalski103
    @marcingrzebalski103 2 months ago +1

    Thanks for your work again. Another question that comes to mind: 😊 does testing the null hypothesis always involve centering the dataset so that the mean is zero? I mean, there are common real-world cases where the mean (or whatever statistic we use) is not equal to 0 when there is no effect / no difference between means.
    Also, do the t-test functions in R or Python also shift the dataset so that it is centered on the null hypothesis (mean equal to 0) and repeat bootstrap-like simulations many times to calculate the p-value? Or do they use a different method?

    • @statquest
      @statquest  2 months ago +1

      You don't have to center the data to get the p-value - it just makes it easier to visualize and interpret.

    • @marcingrzebalski103
      @marcingrzebalski103 2 months ago

      @@statquest So, technically, in statistics generally as well as in R/Python functions, calculating a p-value (which of course involves null hypothesis testing) does not necessarily use this centering of the data on 0 in its mechanism, and calculating it with the bootstrap is just one method, not one used by the default R/Python functions for t-tests or linear models? Or does the classic p-value calculation also use the bootstrap?

    • @statquest
      @statquest  2 months ago +1

      @@marcingrzebalski103 That's correct, and analytical methods do not use bootstrapping.

  • @zivot6822
    @zivot6822 1 year ago +3

    You just saved my work report, keep it up man.

  • @dbuezas
    @dbuezas 1 year ago +1

    Can you please meet 3blue1brown?
    If you two would do something together it would surely be glorious

    • @statquest
      @statquest  1 year ago

      That would be a dream come true. I wonder what the best way would be to introduce myself.

    • @dbuezas
      @dbuezas 1 year ago +1

      @@statquest Does asking your crowd to spam his comment section go against YouTuber etiquette? 😁

    • @statquest
      @statquest  1 year ago +1

      @@dbuezas I bet. Maybe we can find another way. I'll do what I can.

  • @jiangshaowen1149
    @jiangshaowen1149 3 years ago +1

    Hi Josh, may I know why the p-value is calculated two-sided?

    • @statquest
      @statquest  3 years ago

      Because 99 times out of 100 you want a two-sided p-value. For details, see: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html

  • @EdoardoMarcora
    @EdoardoMarcora 3 years ago +1

    Wouldn't shifting the bootstrap distribution that was obtained from the original sample data be basically equivalent (for the purpose of calculating a p-value) to the bootstrap null distribution?

  • @maxyen9892
    @maxyen9892 3 months ago +1

    I really appreciate how you start off with a simple application example and then you build up from there with explanations and real time drawings. Lots of times when I read about concepts, they start more abstract or from theory, and that makes it less intuitive.

    • @statquest
      @statquest  3 months ago

      Thank you! I'm glad you appreciate my style.

  • @moali001
    @moali001 3 years ago +2

    Damn, that's some good quality here! Hope to see more videos!

  • @PeteHwang
    @PeteHwang 5 months ago

    Hi Josh, thank you for the great video. I had a question at 4:57. Why do you look at the probabilities of observing means ≤ or ≥ ±0.5 in the bootstrap distribution?

    • @statquest
      @statquest  4 months ago

      Are you already familiar with p-values? If not, check out these two videos: ua-cam.com/video/vemZtEM63GY/v-deo.html and ua-cam.com/video/JQc3yx0-Q9E/v-deo.html I believe those will answer your question.

  • @summerai8724
    @summerai8724 1 year ago +1

    Thanks a lot for the explanation. I was confused about how to create a simulated distribution for calculating the p-value, and this video explains it really well. Shifting the data to a mean of zero before resampling is the key!

  • @zhefeng4949
    @zhefeng4949 1 day ago

    Hi Josh, thanks as always! At 5:03, the probability of observing a bootstrapped mean >= 0.5 is 0.48, not 0.47, according to the previous calculation, maybe?

  • @finanzassainz4013
    @finanzassainz4013 1 year ago +1

    OMG, these topics looked too complicated to learn; however, you make it so easy!

  • @juanete69
    @juanete69 2 years ago

    I don't understand why you use the shifted data to perform the bootstrap. What if you don't "know" the null hypothesis but just your sample?

    • @statquest
      @statquest  2 years ago +1

      You don't have to shift the data, it just makes the math easier.

  • @PastryDonut
    @PastryDonut 2 years ago +2

    I'm just following your Fundamentals playlist in order. My first encounter with statistics ever. Thank you so much for putting it together!! Can you recommend any collection of beginner stat problems to practice on? It would help tremendously.

    • @PastryDonut
      @PastryDonut 2 years ago

      Also, thank you for stripping away most of the terminology! Can't imagine learning this from a regular lecture or a textbook, ugh.

    • @statquest
      @statquest  2 years ago +1

      I'm glad you are enjoying the video. I have a few "beginner" stat problems here: statquest.org/video-index/ (just search for "StatTest").

    • @PastryDonut
      @PastryDonut 2 years ago +1

      @@statquest Awesomeness, thank you!

  • @이지연-r1q
    @이지연-r1q 4 months ago +1

    Thank you for sharing your knowledge. This video is helpful for me.

    • @statquest
      @statquest  4 months ago

      Glad it was helpful!

  • @FedericoMerlo-tx2uq
    @FedericoMerlo-tx2uq 3 months ago

    Great video, as always. I admire your work, your knowledge, and your ability to make concepts understandable so much.
    What if our interest is to compare some statistic between two different groups?
    For example, the mean difference between two groups:
    - Calculate the difference of the two group means
    - Bootstrap each group by itself
    - Calculate the bootstrapped mean difference and subtract the observed mean difference
    - Repeat to obtain the bootstrapped mean difference under the null hypothesis of no mean difference?
    Could that make sense?
    Thank you so much

    • @statquest
      @statquest  3 months ago

      Here's a discussion on how to use the bootstrap to compare two means: stats.stackexchange.com/questions/92542/how-to-perform-a-bootstrap-test-to-compare-the-means-of-two-samples

  • @PuneetMehra
    @PuneetMehra 3 months ago

    1:54 - Since the 95% CI includes 0, we can't reject the null hypothesis (drug not working). Why? What does the inclusion of 0 in the CI have to do with rejecting the null hypothesis? I am confused.
    PS: I have studied all the previous videos.

    • @statquest
      @statquest  3 months ago +1

      When the confidence interval contains 0, then we can't be confident that the true value is not 0, even though our estimate is not 0. In other words, there is enough variation in the data that we can't have a lot of confidence in the estimate we made with it.

  • @jasd100
    @jasd100 2 years ago +1

    My brother thought I was watching Blue's Clues, but stats edition

  • @mata21pf75
    @mata21pf75 23 days ago

    Hey Josh, when computing the p-value for medians, are you assuming that they come from a normal distribution (or are at least symmetrical around 0)? If so, why?

    • @statquest
      @statquest  22 days ago

      No, I don't make any assumptions about the bootstrapped distribution.

  • @majidzare9800
    @majidzare9800 8 days ago

    Thanks a lot for the fantastic video. I have one question: if the test statistic is something other than the mean, for example, if we want to see whether the slope from a trend analysis passes 0 or not, how can we scale the dataset so that it represents the null hypothesis? In the video, we simply shift the samples 0.5 units to the left because we are bootstrapping mean values. But if we have samples for which we want to bootstrap the slope of the trend, should we de-trend the data first and then bootstrap? Thanks a lot

    • @statquest
      @statquest  8 days ago +1

      In this video, I shifted the values just because it made the math more obvious. However, you don't need to do that, and you can just calculate the p-value of whatever value you want with the raw (unshifted) histogram. Or you can create a confidence interval.

  • @kanikabagree1084
    @kanikabagree1084 3 years ago +2

    This channel deserves at least a million subscribers!

  • @lbb2rfarangkiinok
    @lbb2rfarangkiinok 3 years ago +1

    the jingles are off the chain

  • @bLuemaNMKO
    @bLuemaNMKO 3 years ago +3

    your work is amazing

  • @yazanal-shoushie9929
    @yazanal-shoushie9929 2 years ago +1

    I love you

  • @juanete69
    @juanete69 7 months ago

    How do you use bootstrapping when you have several variables? For example, for a regression model.
    How would you use it to test the standard deviation?

    • @statquest
      @statquest  7 months ago

      See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450

  • @goonerrn
    @goonerrn 3 years ago +1

    Josh just made part 2 so he could sing "part 2... calculate p-value". This is a gem!

  • @bobiq
    @bobiq 1 year ago

    "We fail to reject the hypothesis that the drug makes no difference" - a triple negation in one sentence is what makes statistics such a mind-bending exercise. Why can't this be expressed more easily?

    • @statquest
      @statquest  1 year ago

      Good point! Yes, classical statistics lends itself to a lot of awkward wording. Bayesian statistics attempts to make the language easier - and one of the ideas in this video, using computers to generate a lot of data, is a big step towards getting there.

  • @l.josephineandresen610
    @l.josephineandresen610 3 years ago +1

    Thanks so much! These videos are really great. I was wondering if you will make one on Mixed ANOVAs? :-) Your explanations really help to understand the concepts quickly.

  • @АртемВасенков-е3ч

    It's very easy to understand! Super explanation.
    Thank you!

  • @DrThalesAlexandre
    @DrThalesAlexandre 5 months ago

    Amazing video!
    Any ideas on how to make bootstrapping run faster in Python? It starts lagging once you are doing > 10^5 trials with large sample sizes.

    • @statquest
      @statquest  5 months ago +1

      Good question...I'm not really sure, but with a large sample size, you might be able to get away with doing less bootstrapping.

    • @DrThalesAlexandre
      @DrThalesAlexandre 3 months ago

      @@statquest Thanks! There is probably some library that does this efficiently. I was just curious about how one could be implemented, but it's something that can be learned at another point in time.

  • @ChanukaFernando12129
    @ChanukaFernando12129 28 days ago +1

    Simply awesome!

  • @HannahMeaney
    @HannahMeaney 1 year ago

    I don't understand how you got the actual p-value number. For example, the p-value of 0.47 - how was that calculated?

    • @statquest
      @statquest  1 year ago

      First off, the p-value is not 0.47, so that might be part of the problem. At 3:29 we have a histogram that tells us what would happen if the null hypothesis was true. Then at 3:36 we can calculate the percentage of means that were between -0.5 and 0.5 (this is just the number of means that we calculated that fell between -0.5 and 0.5 divided by the total number of means). This percentage was 36%, which also tells us that the probability of observing a mean between -0.5 and 0.5 is 0.36. Likewise, we then calculate the probability of observing a mean >= 0.5 plus the probability of observing a mean <= -0.5, and that sum is the two-sided p-value. (A code sketch of this calculation follows below.)
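
      A hedged Python sketch of that calculation (the data values are made up; only the two-sided counting logic follows the reply above):

          import numpy as np

          rng = np.random.default_rng(0)
          # Hypothetical drug-effect measurements, not the video's data
          data = np.array([0.4, 1.1, -0.2, 0.9, 1.7, -0.3, 0.3, 0.1])
          obs_mean = data.mean()

          # Shift the data so it is centered on 0 (the null hypothesis)
          shifted = data - obs_mean

          # Bootstrap: resample with replacement and record each mean
          boot_means = np.array([
              rng.choice(shifted, size=shifted.size, replace=True).mean()
              for _ in range(10_000)
          ])

          # Two-sided p-value: the fraction of bootstrapped means at least
          # as far from 0 as the observed mean, in either direction
          p_value = np.mean(np.abs(boot_means) >= abs(obs_mean))
          print(p_value)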

  • @alinaastakhova8412
    @alinaastakhova8412 4 months ago

    Thank you for the amazing explanation; still, I am a little confused. At 4:10 of the video you have the probability for a mean >= 0.5 as 0.48, and at 5:02 the probability for a mean >= 0.5 becomes 0.47... How is that? And for the median - how do you get the probability for a median >= 1.8 as 0.01? How is that calculated once the bootstrapped distribution of medians does not go beyond ~0.5 units? Isn't the calculated probability simply the portion of the distribution beyond the given value (like 1.8 for the median in our example)? What do I miss?

    • @statquest
      @statquest  4 months ago

      1) That's just a minor typo.
      2) We count the number of bootstrap-generated medians >= 1.8 and divide by the total number of bootstrap-generated medians.

    • @alinaastakhova8412
      @alinaastakhova8412 4 months ago +1

      ​@@statquest Thanks!

  • @tejasbhagwat877
    @tejasbhagwat877 3 years ago +1

    Hi Josh, big fan of your videos (and merchandise)! They are incredibly helpful :)
    Could you please also do a series on running models in a Bayesian framework?

    • @statquest
      @statquest  3 years ago

      Yes, that's a plan.

    • @tejasbhagwat877
      @tejasbhagwat877 3 years ago +1

      @@statquest That would be a TRIPLE BAM! Looking forward :)

  • @AkashSiddabattula
    @AkashSiddabattula 9 months ago

    Please reply!!
    When you were calculating the p-value, I think we were supposed to find the p-value supporting the null hypothesis, and if that value is less than 0.05 we can reject the null hypothesis. But here you were calculating the p-value of observing a mean value of 0.5 or something more extreme, and I think that is not supposed to be the null hypothesis. Then, if we get a p-value greater than 0.05 of observing a mean >= 0.5, that means we will often get a mean >= 0.5, which means the drug is having some effect. This is what I understood; can you explain?

    • @statquest
      @statquest  9 months ago +1

      In this video, the null hypothesis is that, on average, the drug has no effect (average effect = 0). We then use bootstrapping to calculate a p-value for this null hypothesis and we get 0.63, so we fail to reject the null hypothesis that the drug has no effect. In other words, there's a high likelihood that any random set of 8 people that have the disease will have, on average, an effect = 0.5.

    • @AkashSiddabattula
      @AkashSiddabattula 9 months ago +1

      Thank you so much

  • @streetsmart5033
    @streetsmart5033 3 years ago +1

    Sir, please explain convolutional neural networks. I'm eagerly waiting for your way of explaining them.

    • @statquest
      @statquest  3 years ago

      I've already done that, see: ua-cam.com/video/CqOfi41LfDw/v-deo.html For a complete list of all of my videos, see: statquest.org/video-index/

    • @streetsmart5033
      @streetsmart5033 3 years ago

      @@statquest Yes sir, thank you for the reply, but in that playlist there is no CNN or RNN.

  • @XoCortanaXo
    @XoCortanaXo 1 year ago +1

    This is exactly what I was looking for, thank you!

  • @saeidsas2113
    @saeidsas2113 4 months ago +1

    I finally did it for my real problem case.

  • @thbdf3879
    @thbdf3879 3 years ago +1

    I wish I could have seen this video before my exam.

  • @unlearningcommunism4742
    @unlearningcommunism4742 1 year ago

    I gave it a try today. It's still not working / returning what I want it to return.

  • @joeguerriero3841
    @joeguerriero3841 7 months ago

    But how would you do this for a test statistic (like a correlation coefficient), where creating a "null data set" from which to resample is not as straightforward as just mean-centering the data?

    • @statquest
      @statquest  7 months ago

      See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450

    • @joeguerriero3841
      @joeguerriero3841 7 months ago +1

      @@statquest TRIPLE BAM!!

  • @LittleLightCZ
    @LittleLightCZ 2 years ago

    The main question is: how is this different from simply running a t.test to see whether the mean equals 0 or not? Is there anything that bootstrapping adds to it? Originally I thought that bootstrapping might help, for example, to get tighter confidence intervals without the need to take more sample data in the field, but according to the tests I made with the boot library, the confidence intervals from the bootstrapped data are basically the same as the ones computed from the original data. Well, when I call boot.ci() they tend to be a little bit tighter, but I think it's because the t.test computation is probably a little more conservative (I guess).

    • @statquest
      @statquest  2 years ago +1

      The purpose of bootstrapping isn't to replace a t-test, or any other known statistical test. Those known tests will always perform better because they make assumptions about the data that bootstrapping does not, and that results in them having an edge. However, the magic with bootstrapping is that it can be used to calculate p-values or confidence intervals in any situation - including those that are not appropriate for t-tests or any other known test. For example, with bootstrapping we can compare medians or modes instead of means, and you can't do that with a t-test.

  • @redcat7467
    @redcat7467 2 years ago +1

    That was a bam with different statistics.

  • @engr.majidkaleem8810
    @engr.majidkaleem8810 1 year ago

    Could you please upload the 5 unavailable hidden videos?

  • @jeffz7310
    @jeffz7310 2 years ago

    Where did the 0.05 come from at 5:33? Thank you.

    • @statquest
      @statquest  2 years ago

      0.05 is the standard threshold for hypothesis testing. For details, see: ua-cam.com/video/vemZtEM63GY/v-deo.html

  • @caroldanvers6306
    @caroldanvers6306 1 year ago

    Great video and helpful examples! What do you do when you're testing the median (with H0: median = 0; Ha: median not 0), and the observed median is 0? As there is no shift, I'm thinking the p-value is 1.000 (as all of the bootstrapped medians are either >= 0 or <= 0).

  • @Daniel88santos
    @Daniel88santos 1 year ago

    Great video! Is this the working principle of "Particle Filters"/"Sequential Monte Carlo"?

    • @statquest
      @statquest  1 year ago

      I have no idea. I've never heard of those things before. :(

  • @ModernTolkien143
    @ModernTolkien143 2 years ago

    Hey Josh, thanks for this awesome video!!
    Do you know of any reference (paper, handbook chapter etc.) that shows the asymptotic validity of the approach you are using?
    Best, Sebastian

    • @statquest
      @statquest  2 years ago

      Here's a great place to start if you want to learn more details: en.wikipedia.org/wiki/Bootstrapping_(statistics)

  • @yongkailiu1448
    @yongkailiu1448 1 year ago

    Could you make another video talking about one-sided tests?

    • @statquest
      @statquest  1 year ago

      You can just multiply the p-value by 2.

  • @zerocoll20
    @zerocoll20 3 years ago

    Is there any way to know how good this method is? I mean, comparing resampling with actual known statistics?

    • @statquest
      @statquest  3 years ago +1

      Yes, the same theory that we use to trust "normal" statistics (like t-tests and what not) also applies to bootstrapping. In other words, the theory that allows you to put trust into a t-test also suggests we should put trust in bootstrapping.

  • @ilusoeseconomicas2371
    @ilusoeseconomicas2371 1 year ago

    There is no reason to subtract the mean of the distribution before bootstrapping and then add it back later. Just bootstrap the original data and see where the original mean is in the generated histogram.

    • @statquest
      @statquest  1 year ago

      I shifted the data because the null hypothesis is that the "true mean" is 0 and it's helpful to see how the distribution would be distributed around 0 in that case.

  • @СеменЕлисеев-б1с

    Hi Josh!
    How do we calculate the critical value of the statistic in this case?

    • @statquest
      @statquest  1 year ago

      If, for example, alpha = 0.05, then you can incrementally add the tails of the histogram together until you get 0.05. The last parts of the histogram added define the critical values. (See the sketch at the end of this thread.)

    • @СеменЕлисеев-б1с
      @СеменЕлисеев-б1с 1 year ago +1

      ​@@statquest Got it! Thank you!
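
      A minimal Python sketch of those critical values via quantiles (the stand-in statistics and np.quantile usage are illustrative assumptions):

          import numpy as np

          rng = np.random.default_rng(1)
          # Stand-in for your bootstrapped statistics under the null
          boot_stats = rng.normal(size=10_000)

          # For alpha = 0.05 with two tails, the critical values are the
          # 2.5% and 97.5% quantiles of the bootstrapped distribution
          alpha = 0.05
          lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
          print(lower, upper)  # observed statistics beyond these would be rejected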

  • @joxa6119
    @joxa6119 2 years ago

    So what exactly happens when we shift the data (so the mean will be 0)? Is there a formula for the data shift?

  • @gardaramadhito1650
    @gardaramadhito1650 2 years ago

    Isn't this just randomization inference, where you're testing the sharp null hypothesis?

    • @statquest
      @statquest  2 years ago

      I believe they are different: jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/

  • @xinlu82
    @xinlu82 2 years ago

    Thanks a lot. Really nice video. I have a question about the number of replicates when doing the bootstrapping. Is this related to the sample size?

    • @statquest
      @statquest  2 years ago +1

      In a small way it is dependent on the sample size (if the sample size is small, there are only so many different bootstrapped samples you can create).

  • @acc3095
    @acc3095 2 years ago

    Is there a minimum sample size needed for bootstrap to be valid?

    • @statquest
      @statquest  2 years ago

      I think 8 might be a good starting point.

  • @DeepROde
    @DeepROde 2 years ago

    Hey, your videos are a treasure! I had a doubt: at 6:18, the histogram of medians doesn't look bell-shaped. This made me wonder whether the distribution of medians would be normal (like the distribution of means) or not. Could you please let us know?

    • @statquest
      @statquest  2 years ago +1

      The distribution of medians is not normally normal.

  • @SunSan1989
    @SunSan1989 1 year ago

    Perhaps because of the different ways of thinking between East and West, as an Asian I find it easier to understand without switching to a mean of zero, treating "the drug has no effect" as -0.5, but doing so is somewhat inconsistent with the null hypothesis method. Good tutorial.
    There is another problem: in the example with the 0.36 probability, the probability of less than -0.5 is 0.16 and the probability of greater than 0.5 is 0.47, which seems a bit contradictory for bootstrapping under the null hypothesis. If there are enough bootstrapping iterations, shouldn't the probabilities of less than -0.5 and greater than 0.5 be equal?

    • @statquest
      @statquest  1 year ago

      What time point, minutes and seconds, are you asking about?

    • @SunSan1989
      @SunSan1989 1 year ago

      Dear Josh, the time point is 4:07, where the probability of less than or equal to -0.5 is 0.16, and greater than or equal to 0.5 is 0.48 at time point 4:10. Is this probability a reasonable example? If bootstrapping enough times, shouldn't 0.16 be equal to 0.48?
      In addition, why can't the paper version of the book be sent to China? I bought it in Japan and transferred it from Japan to China. @@statquest

    • @statquest
      @statquest  1 year ago

      @@SunSan1989 My guess is that they will probably meet in the middle. As for my book, there should be a Chinese version (and translation) available in the next year. People are working on it.

    • @SunSan1989
      @SunSan1989 1 year ago

      Sorry, since my English is not very good, I want to confirm my understanding: should 0.16 be replaced with the same value as 0.48? Is this understanding correct? @@statquest

    • @statquest
      @statquest  1 year ago

      @@SunSan1989 No, I'm not sure what the value will be, but the sum will probably still add up to something close to 0.63

  • @alputkuiyidilli
    @alputkuiyidilli 3 years ago

    1) Make a bootstrapped Dataset
    2) Calculate a statistic
    3)???
    4) Profit.

  • @גיאחדד-ר1ר
    @גיאחדד-ר1ר 2 years ago

    Thank you!
    Why do you calculate +-0.5 in the histogram and not only 0 to 0.5?

    • @statquest
      @statquest  2 years ago

      What time point in the video, minutes and seconds, are you asking about?

  • @themoan
    @themoan 3 years ago

    Hi Josh, do you have to make assumptions about the normality of the data? Or does bootstrapping work for parametric and non-parametric cases (because of the central limit theorem)? Thank you for another informative video!

    • @statquest
      @statquest  3 years ago

      Bootstrapping makes no assumptions about the data.

  • @yurnero07
    @yurnero07 12 days ago

    I'm sorry, where did the 0.05 come from at 5:29?

    • @statquest
      @statquest  11 days ago +1

      0.05 is the standard threshold that we use when we try to understand if a p-value is significant or not. Values less than 0.05 have "statistical significance" and values larger than 0.05 don't. For more details, see: ua-cam.com/video/vemZtEM63GY/v-deo.html

    • @yurnero07
      @yurnero07 10 days ago +1

      @@statquest Bam :)

  • @rupiyaldekai6136
    @rupiyaldekai6136 3 years ago

    Can you do a PyTorch implementation for ANNs and fuzzy systems, please sir?

    • @statquest
      @statquest  3 years ago

      I'll keep that in mind.

  • @alikoushki6483
    @alikoushki6483 1 year ago +1

    Great video, thanks

  • @shivverma1459
    @shivverma1459 3 years ago

    Let's say we don't look at the p-values and see that the 95% confidence interval crosses 0 at 5:41. Can't we then say that the majority of means cross 0, and therefore that the drug has been helping recovery instead of having no effect? I mean, from the confidence interval point of view.

    • @statquest
      @statquest  3 years ago +1

      This example is not great for discussing CIs because we shifted the data to be centered on 0. If we wanted to calculate a CI, we would do this: ua-cam.com/video/Xz0x-8-cgaQ/v-deo.html

    • @shivverma1459
      @shivverma1459 3 years ago

      @@statquest ohkk thanks bam!

  • @accountname1047
    @accountname1047 3 years ago +1

    ah the elusive triple bam

  • @Julsten3107
    @Julsten3107 3 years ago

    Hey Josh, thanks for this comprehensive explanation!
    I'm a bit confused about why you need to add values greater than or equal to 0.5 but also values less than or equal to -0.5 for the p-value. Why can't I just look at values >= 0.5?

    • @HankGussman
      @HankGussman 3 years ago

      It is 0.05 actually. To reject the null hypothesis, observed results must be rare, such that the probability of observing such results is less than or equal to 0.05.

    • @statquest
      @statquest  3 years ago +1

      In this video we calculate a two-sided p-value and I describe these, and the reasons for them, extensively in this StatQuest on p-values: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html

  • @KirillBezzubkine
    @KirillBezzubkine 3 years ago +1

    God bless you, mister.

  • @mikelmenaba
    @mikelmenaba 1 year ago +1

    Great video mate

  • @frashertseng9426
    @frashertseng9426 3 years ago

    Thank you for the awesome video. 1) How does this apply to comparing means from two different groups (control/test)? 2) What if my measure is a proportion (%)? How can we apply this method?

    • @statquest
      @statquest  3 years ago

      1) see: stats.stackexchange.com/questions/128694/bootstrap-two-sample-t-test
      2) see: online.stat.psu.edu/stat200/lesson/4/4.3/4.3.1

    • @frashertseng9426
      @frashertseng9426 3 years ago +1

      @@statquest Thank you Josh!!

  • @mikhaeldito
    @mikhaeldito 2 years ago

    When should one use permutation over bootstrap (and the other way around) to calculate p-values?

    • @statquest
      @statquest  2 years ago

      If you have a relatively small dataset, you can use permutation. If it's relatively large, then you can use bootstrap. (See the sketch at the end of this thread.)

    • @mikhaeldito
      @mikhaeldito 2 years ago +1

      @@statquest BAM!
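
      For contrast with the bootstrap, here is a sketch of one common permutation-style scheme for the one-sample setting in the video: sign-flipping under the assumption that effects are symmetric around 0. The values and the scheme itself are illustrative assumptions, not the video's method:

          import numpy as np

          rng = np.random.default_rng(0)
          # Hypothetical drug-effect measurements
          data = np.array([0.4, 1.1, -0.2, 0.9, 1.7, -0.3, 0.3, 0.1])
          obs = data.mean()

          # Under the null (effects symmetric around 0), each measurement
          # is as likely to be negative as positive, so flip signs at random
          flips = rng.choice([-1.0, 1.0], size=(10_000, data.size))
          perm_means = (flips * data).mean(axis=1)

          # Two-sided p-value
          p_value = np.mean(np.abs(perm_means) >= abs(obs))
          print(p_value)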

  • @alexvass
    @alexvass 1 year ago +1

    Thanks

    • @statquest
      @statquest  1 year ago

      BAM! Thank you so much for contributing to StatQuest!!!

  • @mohamedsase7250
    @mohamedsase7250 2 years ago

    Can we use the bootstrap to calculate a confidence interval (%) for a conditional event element, like a cross-tab element, and how? Thank you

    • @statquest
      @statquest  2 years ago

      Probably, but I don't know what a cross-tab element is so it would be better to get someone else to answer.

    • @mohamedsase7250
      @mohamedsase7250 2 years ago

      @@statquest Cross-tab: actually, those who use SPSS know it.
      It is a cross table, crossing two variables such as gender and healthy (yes or no); you end up with 4 groups. I want to know if I can consider each group as an independent group and calculate the CI as normal.

    • @mohamedsase7250
      @mohamedsase7250 2 years ago

      Note: I have been searching for the answer for months. Thank you a lot.

  • @shirleygui6533
    @shirleygui6533 2 years ago +1

    awesome!

  • @DrMcZombie
    @DrMcZombie 3 years ago

    Hi Josh, and thanks for the overview. I have been using bootstrapping for quite some time now, but not to look at p-values for just one data set. What you describe is, more or less, a different kind of t-test, right?
    I am using bootstrapping for determining confidence intervals, but also to compare two datasets, e.g., I use two models to predict data and compare the models' performance with bootstrapping.
    For example, is the root-mean-squared prediction error (RMSE) larger in data set A in comparison to data set B?
    When repeating this (e.g.) 1000 times, each time comparing the RMSEs, I get a p-value from these comparisons.
    --> Model A performed better than model B in 990 of 1000 comparisons --> p = 0.99 (or 0.01)
    I hope this was understandable.
    What are your thoughts on this application of bootstrapping?

    • @statquest
      @statquest  3 years ago

      This example is like a one-sample t-test (without having to refer to the t-distribution). Your experiment is a little confusing. You have data sets A and B and also models A and B, so I don't know what you are comparing.

    • @DrMcZombie
      @DrMcZombie 3 years ago

      @@statquest Thanks, and I'll try to explain a bit more: I have data that I measured (in my case those are Speech Recognition Thresholds, i.e., the signal-to-noise ratio at which 50% of spoken words can be understood in a noisy environment; I hope this is not getting too abstract). I want to simulate this data with different models, and I want to determine which model is better (e.g. model A and model B).
      To figure out which model is better, I create a bootstrapped data set of the measured data and calculate the RMSE for both model simulations. Let's say the RMSE for the bootstrapped data set of model A is 1 and of model B it is 2. I compare these values and count how often the RMSE of model A was lower than the RMSE of model B:
      --> For this first comparison, I count 1.
      Second run: RMSE of model A is 1.5, RMSE of model B is 1.4
      --> I do not count this (1 of 2 comparisons indicate that the RMSE of model A is lower than the RMSE of model B)
      When repeating this procedure 1000 times, 990 of the comparisons showed that model A has a lower RMSE, and in 10 comparisons model B had a lower RMSE.
      I consider this to yield a p-value of 0.99 (which is effectively a p-value of 0.01).
      I hope you find this interesting, and I would be happy to get your thoughts on this application of bootstrapping.

    • @statquest
      @statquest  3 years ago +2

      @@DrMcZombie You've calculated a probability, which is part of a p-value, but not a p-value. A p-value is the probability of the observed result or data plus the probabilities of all results that are more extreme. For details, see: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html
      So, here's what you should do (or consider doing):
      0) The null hypothesis is that there is no difference between models A and B. This means that we would expect the difference in RMSE to be 0 between models A and B.
      1) Bootstrap your data, run it through your models and make a histogram of differences in RMSE.
      2) Draw a 95% CI between the 2.5% quantile and the 97.5% quantile of that histogram
      3) Does that CI include 0? If so, fail to reject the hypothesis that models A and B are the same. If not, reject the hypothesis that models A and B are the same. Bam. (A sketch of these steps follows this reply.)
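
      A minimal Python sketch of steps 0-3 (the data, the "model predictions", and the 2,000 resamples are all made-up stand-ins, not anyone's real setup):

          import numpy as np

          rng = np.random.default_rng(0)
          n = 200
          y = rng.normal(size=n)                       # measured data (stand-in)
          pred_a = y + rng.normal(scale=0.5, size=n)   # model A's predictions (stand-in)
          pred_b = y + rng.normal(scale=0.7, size=n)   # model B's predictions (stand-in)

          def rmse(truth, pred):
              return np.sqrt(np.mean((truth - pred) ** 2))

          # Bootstrap the indices so each resample pairs the same observations
          # across the truth and both models, then record the RMSE difference
          diffs = np.array([
              rmse(y[idx], pred_a[idx]) - rmse(y[idx], pred_b[idx])
              for idx in (rng.integers(0, n, size=n) for _ in range(2_000))
          ])

          # 95% CI from the 2.5% and 97.5% quantiles of the differences
          lower, upper = np.quantile(diffs, [0.025, 0.975])
          print(lower, upper)  # if this interval excludes 0, reject "A and B are the same"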

    • @DrMcZombie
      @DrMcZombie 3 years ago

      @StatQuest with Josh Starmer Thank you for your reply, and I also see the point that you make. But just to clarify: wouldn't this boil down to the "counting the comparisons" approach? (not with regard to the p-value, but just for failing to reject the null hypothesis)
      When 10 of 1000 comparisons (1%) showed that model A had a lower RMSE than model B, then the 95%-CI of the histogram of differences between the models would not include 0.
      The CI would include 0 when 25 or more of 1000 comparisons (i.e. more than 2.5% of the comparisons) showed that model A has a lower RMSE than model B.
      Anyway, thanks, and I am looking forward to more of your great videos.
      --> octave code example (e.g. use octave-online.net/):
      % let's assume A and B are the RMSEs of two models.
      % H1: A is significantly different from B (0 not in 95%-CI of the difference histogram)
      % H0: A and B are the same (0 in 95%-CI)
      A = randn(10000,1) + 3; % random numbers, mean = 3; std = 1;
      B = randn(10000,1); % same, but mean = 0;
      hist(A-B); % draw histogram
      comparisons = sum(B > A) / numel(B);
      CI = quantile(A-B,[0.025 0.975]);
      printf('comparisons: %1.3f ; CI: [%1.3f %1.3f]\n', comparisons, CI);
      % when CI does not include 0 --> H0 rejected, H1 true

  • @saeidsas2113
    @saeidsas2113 5 months ago

    Hi Josh, I have a question: how can I contact you to ask it?

    • @statquest
      @statquest  5 months ago

      If you have a question about my videos, the best place to ask it is right here, in the comments.

    • @saeidsas2113
      @saeidsas2113 5 months ago

      @@statquest Yes, but I need to write a bit of narrative to clarify my question, which is related to the bootstrap but not particularly to your nice video. I am a risk analyst working at a company and also doing my PhD in the field of actuarial science. We recently encountered an issue related to a model being used at the company.

    • @statquest
      @statquest  5 months ago

      @@saeidsas2113 Unfortunately I don't have time to do much consulting work. :(

    • @saeidsas2113
      @saeidsas2113 5 months ago

      @@statquest If you don't mind, I'll shoot my question here :) To begin with, I am a model validator, and one of our tasks is to ensure that a model works as expected and is fit for business purposes. To do so, back-testing is typically performed to check the model's performance. In a nutshell and in simple language, we have the following problem:
      A financial model generates thresholds at a confidence level of 90 percent. In order to check the model's performance, it is important to count the number of defects over a given period, which is usually 250 working days (i.e., one year). A defect is defined as below:
      A defect occurs if the relative market movement in 10 days is greater than the threshold, in other words:
      log(P_{t+10} / P_{t}) > v_t, where t = 1, 2, ..., 240, P_{t} is the market price at time t, and v_t stands for the threshold that comes out of the model. Note that the market movements are obtained on a rolling basis, so we have overlapping intervals. If we believe that the model works well, then one can expect that the number of defects observed over the 240 windows should be 2.4 ~ 3 violations, because there is only a 1 percent chance of observing a defect, i.e., 240*0.01 = 2.4.
      Now let's consider the hypothesis test that needs to be performed in order to back-test the model:
      Null hypothesis: p = 0.01
      Alternative hypothesis: p > 0.01
      where p is the probability of a defect. Under the null hypothesis, the model works as expected because the probability of a defect is 1%, which is acceptable at the confidence level of 90 percent. Here are the steps taken to back-test the model:
      1) Compute the spread, which is the difference between the market movement and the threshold, i.e., Spread = log(P_{t+10} / P_{t}) - v_t
      2) Generate 1000 synthetic samples, each with size 240, from the original spreads while preserving the dependency structure; for example, the Maximum Entropy Bootstrap approach is applied at this stage.
      3) Count the number of positive spreads (indicating defects) for each synthetically generated sample.
      4) Obtain the defect ratio for each synthetically generated sample using (#defects)/240.
      5) Use the distribution of the generated defect ratios (i.e., the probability of a defect) to find the p-value corresponding to the above hypothesis test. So, using p*_1, p*_2, ..., p*_1000 we calculate the following probability:
      p-value = P_H0( p > 0.01 ), which is approximated based on the distribution of p*_1, p*_2, ..., p*_1000.
      My question: here the quantity under consideration is the probability of a defect, or we could consider the defect rate. If the observed defect rate in the original data set is greater or less than 0.01, then we need to apply a transformation, like what you did for the mean where you shifted the data to get a mean of zero, so that the ratio equals 0.01, and then generate samples from the spreads for which the defect ratio is 0.01 to compute the probability of it being greater than 0.01 under the alternative hypothesis, right?

    • @saeidsas2113
      @saeidsas2113 5 months ago

      @@statquest That is fine; however, I already asked my question, and I think it is interesting to take into account. Feel free to answer it. Thank you for your time.

  • @chrislam1341
    @chrislam1341 2 years ago

    I cannot understand why we care about the region of -0.5.
    Given data with mean 0.5 and variance v, how likely am I to see this data if the mean is 0? Let's assume the data comes from a normal distribution, N:
    p-value = P(mean >= 0.5 | N(0, v))
    if p-value < 0.05: it is unlikely that H0 is true => reject H0
    if p-value > 0.05: it is likely that H0 is true => cannot reject H0
    Where is the role of -0.5 here?

    • @statquest
      @statquest  2 years ago

      I almost always use two-sided p-values, and I explain the reasons here: ua-cam.com/video/JQc3yx0-Q9E/v-deo.html

  • @willw234
    @willw234 2 years ago

    Thanks for the very clear and informative description of this. I have a question - whenever the absolute value of the mean/median/statistic-of-interest of the original data is greater than the absolute value calculated from the shifted data, the p-value will be zero. I have a large set of tests to run and would like to do an FDR correction on the resultant set of p-values, but a not-insignificant number of them are zero. Is this still a legitimate thing to do?

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your problem, because each time you calculate a p-value you have to calculate the bootstrapped statistic. Are you saying that when the absolute value of every single bootstrapped statistic (and there should be > 10,000 of them) is greater than the original statistic, the p-value is 0? Well... if that is the case, and all 10,000 bootstrapped statistics are far away from 0, then the p-value should be 0.

    • @willw234
      @willw234 2 years ago

      @@statquest Sorry, I probably didn't explain very well. For the shifted data, the largest possible mean of a bootstrap resample is just the largest value in the shifted data (which happens when it is chosen for every element of a resample). When the mean of the original unshifted data is larger than this, the p-value will be zero, regardless of the number of bootstrap resamples carried out. But this does not distinguish between cases when it is just a little bit larger, or very much larger. So if I have a lot of tests on independent data sets, I am concerned that the 'zero p-value' ones will be treated identically by the FDR procedure, when perhaps they shouldn't be??

    • @statquest
      @statquest  2 years ago

      @@willw234 Since you are just testing the mean, you might consider just using a one-sample t-test. Then your p-values will be more spread out.

    • @willw234
      @willw234 2 years ago +1

      @@statquest I will do that. I was just hoping to use the bootstrap so I could use the median instead of the mean. (btw I recently purchased your book on ML - very helpful, thank you!)

    • @statquest
      @statquest  2 years ago

      @@willw234 Awesome! Thank you!

  • @drachenschlachter6946
    @drachenschlachter6946 1 year ago

    How do you shift the data?

    • @statquest
      @statquest  1 year ago

      At 2:29 I say that we shift the data to the left by 0.5 units (where 0.5 is the mean of the data). That means we subtract 0.5 from each value in the dataset.

    • @drachenschlachter6946
      @drachenschlachter6946 1 year ago

      @@statquest But why, Josh? If you have the bootstrap distribution and you calculate the 95% confidence interval, can't you say whether the hypothesis can be rejected or not? If 0 is in it, then it can't be rejected. So why shift the data, if it doesn't matter?

    • @statquest
      @statquest  1 year ago

      @@drachenschlachter6946 Because this video is talking about how to calculate p-values, not confidence intervals. The first bootstrapping video describes confidence intervals (and does not require shifting the data): ua-cam.com/video/Xz0x-8-cgaQ/v-deo.html

  • @marcoventura9451
    @marcoventura9451 3 years ago +1

    I wish I had more time for your videos. Not only are they high-standard pieces of higher education, but also a moment to relax and enjoy the day.