Biostatistics By Design
Deflategate: The Statistical Perspective
Full disclosure: I’m a Patriots fan. I grew up in New England and I started watching NFL football right around the time Tom Brady became the starting quarterback for the Pats. We have been spoiled as fans to see our team go to the Super Bowl over and over again, taking home the trophy six times. But, I have to admit, the number of times the Patriots organization has been accused of cheating in some sort of fashion does taint the joy of winning a little bit (and to be clear, I do not support the Patriots cheating to win games). One of those instances was when the Patriots were accused of purposefully deflating their footballs during the AFC championship game against the Indianapolis Colts in the 2014 - 2015 playoff season, affectionately referred to as the Deflategate scandal (a play on words - remember Watergate?).
Anyhow, as a statistician, I have always wanted to take a look at the data from the allegedly deflated footballs myself. This video takes you step by step through my analysis. It also covers the topic of pseudoreplication, and shows you how to analyze data from a split-plot design in SAS.
Need statistical help? Schedule a consultation with me here: www.biostatisticsbydesign.com/request-a-consultation
Initial consultations are FREE!
Views: 285

Videos

Assumptions for Statistical Analyses: Being NORMAL is Overrated!
1.4K views, 5 years ago
In this lesson I will explain why you don't have to be as concerned about the assumption of normality as you may believe. Short on time? The answer is the Central Limit Theorem. But, I encourage you to watch and let me know if you spot my stats cat. He may just have to star in an upcoming video! If you are interested in some easy SAS codes for tests for normality, check out my blog: www.biostat...
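The Central Limit Theorem claim can be checked with a small simulation. The following is only a sketch under stated assumptions: the population is taken to be exponential (strongly right-skewed), and the function names, replicate count, and seed are illustrative choices, not anything from the video. It estimates how skewed the distribution of sample means is for a few sample sizes.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def sample_mean_skewness(n, replicates=5000, seed=1):
    """Skewness of the means of `replicates` samples of size n, each drawn
    from an exponential population with rate 1 (an assumed, skewed population)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]
    return skewness(means)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(f"n = {n:>2}: skewness of sample means = {sample_mean_skewness(n):.2f}")
```

If the theorem applies, the skewness of the sample means should move toward zero (the value for a normal distribution) as n grows, even though the individual observations stay skewed.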
Statistical POWER and Power Analysis
11K views, 5 years ago
This video covers the types of errors you can commit when making conclusions about populations based on sample data (Type I and Type II errors), p-values, statistical power, and power analysis. To see an example of a power analysis for a study for which the data are analyzed using a two-sample t-test, check out my blog: www.biostatisticsbydesign.com/blog/2019/1/11/power-analysis-an-underutilize...
The Standard Deviation vs. the Standard Error
31K views, 5 years ago
Are you still confused about the difference between the standard deviation and the standard error? Let’s see if I can help clear that up! Wondering when you should report the standard deviation vs. the standard error in a publication? Check out this blog post to hear my answer: www.biostatisticsbydesign.com/blog/2019/1/5/when-to-report-the-standard-deviation-vs-the-standard-error

Comments

  • @kowtharhassan882
    @kowtharhassan882 10 months ago

    So can we say that SE is the SD of the means of samples around the mean of the means?

    • @biostatisticsbydesign
      @biostatisticsbydesign 10 months ago

      Yes!

    • @kowtharhassan882
      @kowtharhassan882 10 months ago

      @biostatisticsbydesign thank u, but if so then why do we need to find the CI from the Z tables? Why not take 2 SE to be the CI, as 2 SE contain 95% of the means, i.e. we are 95% sure that the true mean will lie within 2 SE.

  • @SaiKrishna-fz3fh
    @SaiKrishna-fz3fh 1 year ago

    It's funny and I loved it. Thanks for making it clear for me.

  • @கோபிசுதாகர்

    Old video, but really very interesting and such a good example! Thanks for the informative and fun video!

  • @googlesheetmyway7067
    @googlesheetmyway7067 1 year ago

    I understand the explanation, but what I don't understand is that when I run a simple regression in R, it always comes up with SE. Referring to your explanation that SE is acquired by demeaning all sample sets' standard deviation (in your example there are 4 sample sets, N=5, N=10, etc.), my question is: how does R get different sample sets to calculate SE from varied sample means, when I only have one dataset or one sample set? Does R simulate the repeated sampling and resampling in the background?

  • @nabie981
    @nabie981 1 year ago

    Thank you!

  • @Zakariah1971
    @Zakariah1971 1 year ago

    Excellent

    • @biostatisticsbydesign
      @biostatisticsbydesign 1 year ago

      Thank you so much 😀

    • @Zakariah1971
      @Zakariah1971 1 year ago

      Taking vector calculus now and it actually is making sense. No i am not an engineer. Just for fun...

  • @user-yee1021
    @user-yee1021 1 year ago

    clear and fun, love your videos!

    • @biostatisticsbydesign
      @biostatisticsbydesign 1 year ago

      Thank you! I haven't made a new video in a while, but comments like this one make me realize it is long overdue!

  • @hameddadgour
    @hameddadgour 2 years ago

    Great video! Thank you for sharing.

  • @patrickjohnson9879
    @patrickjohnson9879 2 years ago

    The equation for SE has mu (population mean) but we don’t know mu in most cases. How do we calculate SE from x-bar (sample mean)?

    • @saadiachaudary
      @saadiachaudary 2 years ago

      Wherever we do not know the SD of the population, we can use the sample SD instead.

    • @biostatisticsbydesign
      @biostatisticsbydesign 1 year ago

      Your mean and SD calculated from your sample are your best estimate of the population parameters, therefore you would use those to calculate your SE.
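The calculation described in this reply can be sketched in a few lines. This is a minimal illustration, assuming a simple random sample; the function name `mean_se_ci` and the example numbers are made up for the sketch. The 95% interval uses the normal quantile (about 1.96, which is what "2 SE" approximates); with a small sample, a t quantile would give a wider interval and is usually preferred.

```python
import math
import statistics


def mean_se_ci(sample, confidence=0.95):
    """Return (mean, SD, SE, (lower, upper)) for a sample.

    The sample SD stands in for the unknown population SD, and the
    interval is mean +/- z * SE with z taken from the standard normal.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, sd, se, (mean - z * se, mean + z * se)


if __name__ == "__main__":
    # Hypothetical measurements, used only to show the call.
    data = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.5]
    mean, sd, se, (lower, upper) = mean_se_ci(data)
    print(f"mean={mean:.3f}  SD={sd:.3f}  SE={se:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```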

  • @englishlion947
    @englishlion947 2 years ago

    It is a misfortune that you stopped publishing.

    • @biostatisticsbydesign
      @biostatisticsbydesign 2 years ago

      I know I fell off the wagon! But I have plans to make more videos still! I have so many ideas! Stay tuned. I promise to get a new video out soon!

  • @yasharthtrivedi2971
    @yasharthtrivedi2971 2 years ago

    Ma'am, I am not able to understand why we divide the standard deviation of the sample by √n. I am not able to see the logic behind it. Please explain; it would be a great help.

    • @biostatisticsbydesign
      @biostatisticsbydesign 1 year ago

      The standard error (SE) measures how precisely a sample mean estimates the population mean. It tells you how different the population mean is likely to be from a sample mean: it is the typical error you would expect when using a sample mean as an estimate of the real population mean. The standard error is inversely proportional to the square root of the sample size. It tends to zero as the number of observations in the sample increases, so the sample mean represents the population mean more accurately (the "law of large numbers"). The standard deviation, in contrast, describes the variation between observations/individuals within a sample or population, and on average it does not shrink as the sample size increases. Because the variance of a mean of N independent observations is the population variance divided by N, the SE is estimated by dividing the SD by the square root of N.
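One way to see the square root at work is to simulate repeated sampling. The sketch below is an illustration only: the normal population (mean 100, SD 15), the replicate count, and the function name `simulate_se` are assumptions chosen for the example. It compares the spread of many sample means with σ/√n, and also reports the average within-sample SD.

```python
import math
import random
import statistics


def simulate_se(n, replicates=4000, mu=100.0, sigma=15.0, seed=42):
    """Draw `replicates` samples of size n from Normal(mu, sigma).

    Returns (SD of the sample means, average within-sample SD).
    """
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    return statistics.stdev(means), statistics.fmean(sds)


if __name__ == "__main__":
    for n in (5, 20, 80):
        empirical_se, average_sd = simulate_se(n)
        print(
            f"n={n:>2}  SD of sample means={empirical_se:5.2f}  "
            f"sigma/sqrt(n)={15.0 / math.sqrt(n):5.2f}  average SD={average_sd:5.2f}"
        )
```

The SD of the sample means should track σ/√n closely and fall as n grows, while the average within-sample SD should stay in the neighborhood of σ.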

  • @zenithbiswas2074
    @zenithbiswas2074 2 years ago

    Really helpful. Carry on.

    • @biostatisticsbydesign
      @biostatisticsbydesign 2 years ago

      Glad to hear it! I've been slacking lately but need to get back on it!

  • @1966lavc
    @1966lavc 3 years ago

    Dear teacher, could you show us a solved example for power of the test?

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      I have an example in my blog here: www.biostatisticsbydesign.com/blog/2019/1/11/power-analysis-an-underutilized-tool I hope that helps!
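For readers who want a quick worked example alongside the blog post, here is a rough sketch of a power calculation for comparing two group means. It is not the blog's SAS example: it uses the normal approximation with a common, known SD, and the function names and the numbers (difference of 5, SD of 10) are hypothetical. An exact t-based calculation would typically ask for one or two more subjects per group.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a mean difference `delta`
    with a two-sided test at level `alpha`, assuming a common SD `sigma`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power with n subjects per group (the negligible
    probability of rejecting in the wrong direction is ignored)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_diff = sigma * math.sqrt(2 / n)   # SE of the difference in means
    return STD_NORMAL.cdf(abs(delta) / se_diff - z_alpha)


if __name__ == "__main__":
    n = n_per_group(delta=5, sigma=10)   # -> 63 per group
    print(f"n per group: {n}, approximate power: {approx_power(n, 5, 10):.3f}")
```

Halving the detectable difference roughly quadruples the required sample size, which is why the expected difference is the most influential input.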

  • @tymothylim6550
    @tymothylim6550 3 years ago

    Great video! Thanks a lot!

  • @Photologistic
    @Photologistic 3 years ago

    Don’t blame Brady, it was Belichick. Great example for a stats video, thanks.

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      Thank you! I think Brady showed us last year he knows how to win without Belichick. Even though he is in Tampa Bay, I'm still a big fan!

  • @dolondawn
    @dolondawn 3 years ago

    I have spent all night trying to understand the concept of SEM. This is the simplest and best video; it clears all my doubts and provides the perfect definition: "The statistic that measures the amount of variation of the sample mean around the true population mean". That's it!! Thanks a ton.

  • @corradoblondi9792
    @corradoblondi9792 3 years ago

    Quite helpful thanks!

  • @thaleslopes6269
    @thaleslopes6269 3 years ago

    Well done ! great video.

  • @adithyavj1220
    @adithyavj1220 3 years ago

    Extremely useful. Thank you very much!

  • @hazel5400
    @hazel5400 3 years ago

    Hi! When my question asks me to find the point estimates for the population standard deviation, so which do i use? The standard deviation value or standard error value? Thank you so much for the video.

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      You would use the standard deviation that you calculated from your sample. That is the best estimate you have of the true population standard deviation.

  • @tamimahmed2450
    @tamimahmed2450 3 years ago

    Thanks for the nice video...just to know....What if the population is unknown? Can we determine sample size by power analysis then?

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      I'm not sure I know what you mean. Do you mean if the expected means are unknown or expected difference is unknown?

  • @rorybreaker23
    @rorybreaker23 4 years ago

    "The standard error is used to represent the precision of the estimate of the mean" Thank you so much! It's such a simple and concise answer that I haven't heard in other videos! Why don't they just say this? lol thanks so much, I'd like to give you more thumbs ups!

  • @marcoglara2012
    @marcoglara2012 4 years ago

    The formula for the standard error looks like it’s just a guess, a “rule of thumb”. It’s basically just reducing the standard deviation as the sample size increases. My question is: are we simply assigning a value to the statistic according to the sample size?

    • @AlbertoLugli
      @AlbertoLugli 4 years ago

      Exactly my question too: why is that sqrt of n down there? In other words, why take something that is already a mean over all the observations and divide it again by the number of observations? What does that value, which comes from a single sample and is basically the mean of a mean of distances from the mean, tell us about the population?

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      When you are comparing two populations, you are often comparing their means. The statistical tests used to compare means (t-test, ANOVA, etc.) use the measure of uncertainty around your estimate of the mean (the SE) as part of the calculation in determining whether the two means are significantly different from one another (the more uncertainty there is in the estimates, the harder it is to declare them significantly different). Because the sampling distribution of the mean (the distribution of the means of all possible samples of size n) becomes narrower as n increases, any one sample mean is more likely to fall close to the true population mean, and therefore the SE is smaller in these calculations. This is closely related to the law of large numbers. The formula is not a rule of thumb: it follows from the fact that the variance of a mean of n independent observations is the population variance divided by n.
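The step behind the square root can be written out explicitly. The following short derivation assumes that the n observations are independent draws from a population with variance σ² (an assumption added for this sketch); s denotes the sample standard deviation used when σ is unknown.

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^{2}}{n}, \\
\operatorname{SE}(\bar{X})
  &= \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}
   \approx \frac{s}{\sqrt{n}}.
\end{aligned}
```

Independence is what lets the variance of the sum split into a sum of variances; with correlated observations the SE would not shrink this quickly.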

  • @karthikm46
    @karthikm46 4 years ago

    If you have 5 treatments and each has 5 replicates, what is the appropriate way of reporting SEM: (i) calculate SEM based on all 25 measurements (5x5), or (ii) calculate SEM separately for the 5 treatments and average the SEMs?

    • @biostatisticsbydesign
      @biostatisticsbydesign 3 years ago

      Each treatment will have its own mean and SEM, so in this case the SEM would be calculated with N=5. The idea of an "average SEM" comes into play when you are using an Analysis of Variance (ANOVA), where one of the assumptions is equality of variances, therefore all of the treatments should have the same estimate for the variance, and if balanced (same number of replicates for all treatments) the SEMs would also be the same.
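The two options in this reply can be compared side by side. The sketch below is illustrative only: the data are invented, and `per_treatment_sem` and `pooled_sem` are hypothetical helper names. The pooled version takes the within-treatment mean square (the ANOVA error variance) as the common variance estimate, which is one common reading of an "average SEM" under the equal-variance assumption.

```python
import math
import statistics


def per_treatment_sem(groups):
    """SEM computed separately within each treatment: SD_i / sqrt(n_i)."""
    return {
        name: statistics.stdev(values) / math.sqrt(len(values))
        for name, values in groups.items()
    }


def pooled_sem(groups):
    """SEM from the pooled within-treatment variance (ANOVA mean squared
    error), assuming equal variances: sqrt(MSE / n_i) for each treatment."""
    ss_within = 0.0
    df_within = 0
    for values in groups.values():
        m = statistics.fmean(values)
        ss_within += sum((v - m) ** 2 for v in values)
        df_within += len(values) - 1
    mse = ss_within / df_within
    return {name: math.sqrt(mse / len(values)) for name, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical responses: 5 treatments with 5 replicates each.
    data = {
        "A": [10.1, 9.8, 10.4, 10.0, 9.7],
        "B": [12.3, 11.9, 12.8, 12.1, 12.4],
        "C": [9.2, 9.9, 9.5, 9.0, 9.4],
        "D": [11.0, 11.6, 10.7, 11.2, 11.5],
        "E": [13.1, 12.6, 13.4, 12.9, 13.0],
    }
    print("separate:", {k: round(v, 3) for k, v in per_treatment_sem(data).items()})
    print("pooled:  ", {k: round(v, 3) for k, v in pooled_sem(data).items()})
```

With a balanced design the pooled SEM is the same for every treatment. It is built from the average of the variances, so it is not simply the arithmetic mean of the separate SEMs.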

  • @gazzzada
    @gazzzada 4 years ago

    "Precision of estimate of the mean" thank u so much for this clear and short definition. Each word in it is full of sense

  • @mpen7307
    @mpen7307 4 years ago

    This made understanding beta very clear! Thanks

  • @nesrienmohamed7577
    @nesrienmohamed7577 4 years ago

    Thanks! I am watching this the night before my PhD comprehensive exam.

    • @biostatisticsbydesign
      @biostatisticsbydesign 4 years ago

      I'm sorry I didn't see this in time to wish you luck, but I hope you killed it! My comps were the hardest part of the PhD for me. It's all easy-peasy from here! :P

    • @mjm2b
      @mjm2b 3 years ago

      Oh lord im a month from those and i havent taken biostats just yet. I have a committee member based on stats, and i dont take his course until spring :/

  • @winvictorywin5612
    @winvictorywin5612 4 years ago

    Hello, I have airborne dust concentration data as PM10, PM2.5 and PM1. These data were taken before and during dust-producing work at a civil construction site, with N=5. How can I compare the before and during operation data? It seems that the percentage share of dust concentrations in the atmosphere differs between before and during operation, depending on particle size. Before operation: PM10 (particle size less than 10 microns) makes up 40% of total airborne dust, and PM2.5 (particle size less than 2.5 microns) makes up 10% of total airborne dust. During machine operation: PM10 makes up 60% and PM2.5 only 10%. It seems that the PM10 share increased due to the machine operation? Which test is suitable for analysing this type of data for discussion? How to use statistics? Any comparison among these particle sizes? thank u.

    • @biostatisticsbydesign
      @biostatisticsbydesign 4 years ago

      If you are just looking to compare before vs. after for a given particle size, then you should be able to use a paired t-test if you have multiple measurements, which I assume you do when you say N=5. In this case, you would be comparing the mean difference from before to after for the 5 trials, because data are paired. If the mean difference is significantly different from zero, then you have a change from before to after. If you are also interested in differences between particle sizes, you would need to use an ANOVA, where particle size is one factor, and time (before/after) is a second factor, and include "trial" as a random block to give you the same kind of power a paired t-test would provide.
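The paired comparison described in this reply can be sketched by hand. Everything specific below is an assumption for illustration: the five before/during values are invented (they are not the commenter's data), `paired_t` is a hypothetical helper, and the critical value 2.776 is the two-sided 5% point of Student's t with 4 degrees of freedom, which fits N=5 pairs only.

```python
import math
import statistics

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t, 4 df


def paired_t(before, after):
    """Return (mean difference, SE of the mean difference, t statistic, df)
    for paired measurements, where difference = after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se_diff, mean_diff / se_diff, n - 1


if __name__ == "__main__":
    # Hypothetical PM10 share (%) in 5 trials, before and during operation.
    before = [40, 38, 42, 41, 39]
    during = [60, 57, 63, 58, 62]
    mean_diff, se_diff, t_stat, df = paired_t(before, during)
    print(f"mean difference={mean_diff:.1f}, SE={se_diff:.2f}, t={t_stat:.1f}, df={df}")
    print("significant at 5%:", abs(t_stat) > T_CRIT_DF4)
```

The test works on the five differences rather than on the two sets of raw values, which is what removes trial-to-trial variation from the comparison.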

    • @winvictorywin5612
      @winvictorywin5612 4 years ago

      Biostatistics By Design thank u, Let me try the same.

  • @Sarani8
    @Sarani8 4 years ago

    Thanks for your video! Is there a difference between "standard error" (SE) and "standard error of the mean" (SEM), or are these the same thing?

  • @mindfullness7652
    @mindfullness7652 4 years ago

    Very helpful. Thank you!

  • @karlvonbatten1764
    @karlvonbatten1764 4 years ago

    :)

  • @izzthewizz6
    @izzthewizz6 4 years ago

    Thank you for sharing!

  • @basilatom
    @basilatom 4 years ago

    Super useful, clear, and easy to understand. Thank you!

  • @vipindangi8265
    @vipindangi8265 4 years ago

    Why so few views? It was great!!!

  • @capy_can_code
    @capy_can_code 4 years ago

    amazing video! thanks so much!

  • @Umar_P
    @Umar_P 5 years ago

    Thank you for this video

  • @dr.komalsharma1052
    @dr.komalsharma1052 5 years ago

    I finally understood these concepts now. I never understood this in med school, and I've been struggling with biostatistics mentioned in journals for years now. Thanks a lot!!! :) Also, I absolutely love your voice!

    • @biostatisticsbydesign
      @biostatisticsbydesign 5 years ago

      This is such a great compliment. Very much appreciated, thank you!!

  • @hermela4279
    @hermela4279 5 years ago

    Thanks

  • @ridhimarai8817
    @ridhimarai8817 5 years ago

    Great video THANKYOU 😇😇

  • @dr.alialchalabi
    @dr.alialchalabi 6 years ago

    How can I run 2 non-parametric variables analysis using SPSS?

    • @biostatisticsbydesign
      @biostatisticsbydesign 6 years ago

      Assuming your two variables are quantitative and you are looking to investigate the association between them, you can use a Spearman's rank correlation analysis. Though I don't use SPSS, any decent statistical package will have that analysis as an option.
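To make clear what a statistical package computes for this analysis, here is a small sketch of Spearman's rank correlation. It is not SPSS syntax: the helper names are hypothetical, and it covers only the coefficient (no p-value). The idea is to replace each variable by its ranks, with ties sharing the average rank, and then compute the ordinary Pearson correlation of the ranks.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: a monotone but non-linear relationship.
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]
    print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

Because it depends only on ranks, the coefficient is unaffected by monotone transformations and is less sensitive to outliers than Pearson's correlation.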

  • @joshcalvert4202
    @joshcalvert4202 6 years ago

    Wow! Smart AND hot!

  • @dr.alialchalabi
    @dr.alialchalabi 6 years ago

    Well done