An easier way to do sample size calculations
- Published Feb 9, 2025
- You just have to know a little bit of code. The code shown in this video can be found at: github.com/ver...
I've been shocked at how broadly useful Monte Carlo approaches are in general.
I remember one problem where I spent weeks figuring out the correct way to solve it. By that point the solution had become so convoluted that I decided it would be smart to write a Monte Carlo simulation to verify I hadn't made a mistake.
The simulation got the exact same results out to three decimal places, and took about 10 minutes to write.
The other great thing about Monte Carlo simulations is that they make all of your assumptions exceedingly clear, while equations tend to obfuscate them.
Right?? One time I was doing a technical interview for a company, and they gave me a probability question. I didn’t know how to do it but they also gave me a code editor, so I just ran a quick simulation and got the right answer from it
How are these videos so well made?!
great video man. really enjoy brushing up on my skills via your channel.
this video comes exactly at the right time for me as I'm trying to run a power analysis for maximum likelihood fitted sigmoid functions and I was really running out of ideas :))
Yes. I found this method myself after being unsatisfied with traditional power analysis. What's nice is how flexible it is, and how it can be used to quantify and challenge assumptions you have about your data / population.
Undergrad in math here, love this!
Yup, this is standard practice in particle physics. Eventually the technicality boils down to the modeling of the physical process being investigated, which may involve hundreds of gigabytes of equations. One of the reasons that this is necessary is that there can be signal-background interference in the particle physics processes. What also makes it extra worth it is that the same MC simulation will be used again during data analysis when the actual data collection reaches a checkpoint.
Hopefully, day-to-day business applications do not often involve complex modeling, and formulae for rough estimations may still be the most economical, especially when the signal and background do not interfere significantly. However, when heavy machinery, such as an MC simulation based on a complex model, is built, its value can exceed merely advising on sample size. For example, after the statistical analysis with real-life data is done, if the business wants to improve its operation, the model and simulation can be adjusted to provide outlooks for the improvements being considered.
I love your videos!!
When I was thinking about creating a statistical test, I thought about doing the same to find out how powerful my test could be!
This is similar to a method I use to show why we "fail to reject the null" instead of just rejecting it. If we change the criteria from the confidence interval not including the null to simply the p-value, then plot the returned sims as a histogram, we see when the null is actually true the p-value is simply a uniform random variable. The "falser" the null becomes, the more right tailed our p-value distribution becomes.
library("foreach")
# Simulate 10,000 two-sample t-tests with a small true difference (0.125 SD);
# note the foreach argument is .combine, so sims comes back as a numeric vector
sims <- foreach(i = 1:10000, .combine = c) %do% {
  groupA <- rnorm(30, mean = 0, sd = 1)
  groupB <- rnorm(30, mean = 0.125, sd = 1)
  test <- t.test(groupA, groupB, conf.level = 0.95)
  test$p.value
}
hist(sims, freq = FALSE)
That’s a great demo, I think I might use that for future TA office hours
This channel is so underrated🔥
Just one step away from using a Bayesian approach :-)
One plus of the mathematical formulae (which are not always equations but sometimes also inequalities) is that they are computationally fast. A Monte Carlo simulation requires more electrical power than most formulae. The downside of the formulae is primarily that they can be very technical to obtain in the first place, and they are only known to be valid under the assumptions they were derived from. What's the electrical power cost of spending some length of time working on a formula? I don't know.
I personally still prefer deriving the sample size needed for my estimators from concentration bounds given a certain level of control, which makes more intuitive sense to me. But I also like having other tools in my belt, so thank you for the video, great as usual 😀
Hey! I saw that simulations are used to estimate sample size for mixed models too, but it seemed a bit more complex. If you'd like to make a video on that, it would be super super useful :)
Very helpful, thank you! In your code you should replace the magrittr pipe: %>% with the new native pipe in R: |>
Just a thought for future videos, so that no one gets hung up with an error that "%>% doesn't exist" if they don't load the tidyverse.
I just learned how to replace magrittr's pipe with the native pipe for the keyboard shortcut. Will do, thank you!
You make good videos. Keep it up!
Hi man! Quite useful, thanks! I've just used it to estimate how many cross-validation folds I'd need to determine if a small improvement between two machine learning models is significant (which is about 45 for a power of 95% ... I'll need more training data '-' )
I'm just missing references to the original material (papers, books, etc); my advisor doesn't like me putting YouTube videos as references (it's not boring enough to work as a serious reference for serious academics hehehe)
It would be a nice detail if you included them in your next ones! It would also make it easier to learn more about the subject. Thanks!
Hey! Thanks for watching!
Most of the stuff in this video came from my own personal experience, so that's why there's not much in terms of references. But here's one that I usually reference in my own stuff: www.ncbi.nlm.nih.gov/pmc/articles/PMC2924739/. Pubmed will have plenty of references for the use of simulations in different contexts, and there are certainly textbooks you can use too. I'll try to post my references when I do use them
This video is a blessing.
Beautiful video!
I believe one of the key issues for a power analysis is the selection of a reasonable effect size. What's your suggestion?
this is kinda hard because it’s really dependent on the particular context. One approach that’s been suggested to me is to find the smallest value that’s practically meaningful, i.e. maybe a 20% increase in the response rate. From there you can increase it slightly to define medium and large effects, but at least the small size is pegged to something that would matter in the real world
@@very-normal @RinoLovreglio We used to talk about this a lot when I worked in an industrial lab. Some of the instruments used in an experiment could differentiate a 0.1% change. But we would not change our behavior unless we saw a change of 10% or more (the practical difference). So we would power for the practical difference, not the smallest achievable with our equipment.
Do you have some suggestions for testing differences between the variances instead of the mean following the same idea?
You could replace the t-test with the F-test to test the ratio of variances, and then alter the two variances in the data generation. I'm not sure about the case for a difference of variances though; I'm not aware of a hypothesis test for that.
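A minimal sketch of that swap, in the spirit of the video's simulation (the 30-per-group sample size and the 1.5x SD inflation in group B are made-up assumptions for illustration):

```r
# Power of the F-test for equality of variances, estimated by simulation.
# Group B's standard deviation is inflated by a hypothetical factor of 1.5.
set.seed(1)
rejected <- replicate(2000, {
  groupA <- rnorm(30, mean = 0, sd = 1)
  groupB <- rnorm(30, mean = 0, sd = 1.5)
  var.test(groupA, groupB)$p.value < 0.05  # F-test on the variance ratio
})
power_F <- mean(rejected)  # proportion of simulations that rejected
power_F
```

The only change from the t-test version is the test function and which parameter differs between groups; everything else about the recipe stays the same.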
@@very-normal I will definitely take a look at that solution. Thank you very much for your reply, and congrats on the nice job!
Well explained and great video production! This method makes sense for tests of one group (sample) against a value, or 2 groups against each other. How would you adapt this for something like a 2-way ANOVA? I am having trouble imagining how you would generate groups of random data to get power out of a test like that.
I’d prolly do it in a similar way to generating data for a linear regression. There are two categorical factors that I can generate, and then a two-way ANOVA is essentially a linear regression with both of the categorical variables as covariates, along with the interaction terms.
Sometimes you’d want to look at the main effects for the predictors, but at other times, researchers might also be interested in testing for interactions, so you’d power based on what the primary hypothesis is.
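A minimal sketch of that idea, powering on the factor A main effect (the cell size and the effect sizes, 0.5 for A and none for B or the interaction, are assumptions for illustration):

```r
# Power for the factor A main effect in a 2x2 ANOVA, via simulation.
# A two-way ANOVA is fit as a linear model with both factors + interaction.
set.seed(1)
n_per_cell <- 25
one_sim <- function() {
  d <- expand.grid(A = c(0, 1), B = c(0, 1), rep = seq_len(n_per_cell))
  d$y <- 0.5 * d$A + rnorm(nrow(d))  # only factor A has a true effect
  fit <- lm(y ~ factor(A) * factor(B), data = d)
  anova(fit)["factor(A)", "Pr(>F)"] < 0.05  # pull the main-effect p-value
}
power_A <- mean(replicate(2000, one_sim()))
power_A
```

To power on the interaction instead, you'd put a nonzero coefficient on the `A * B` term in the data generation and read off the `factor(A):factor(B)` row of the ANOVA table.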
Quick question here. With the MC approach, we need to know the difference we are looking for, right? In this scenario you had 0.5 as the difference to create the second sample and apply the test afterwards. Should we always try to have a specific difference in mind before running an experiment? Or how could I approach this if I'm not sure what difference I'm expecting? Thanks for the content!
Yes, you’re right, you’ll need to specify a treatment difference to do the simulation. The specific value of this difference will depend on your context. But since we don’t know what this difference is beforehand, it’s usually good practice to simulate a range of values that might span “minimally effective” to a large effect. For example, with a binary response, I might simulate new treatments that have a +10%, +20% and +30% increase in response rate over placebo. To get concrete values, you might have to read previous papers or consult a collaborator/expert on what are ranges they might consider
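The binary-response idea above can be sketched in a few lines; the 30% placebo rate, n = 100 per arm, and the use of `prop.test` are all illustrative assumptions, not recommendations:

```r
# Power across a range of assumed treatment response rates,
# compared against a hypothetical 30% placebo response rate.
set.seed(1)
n <- 100  # assumed patients per arm
power_at <- function(treatment_rate, n_sims = 1000) {
  mean(replicate(n_sims, {
    placebo   <- rbinom(1, n, 0.30)
    treatment <- rbinom(1, n, treatment_rate)
    prop.test(c(treatment, placebo), c(n, n))$p.value < 0.05
  }))
}
powers <- sapply(c(0.40, 0.50, 0.60), power_at)  # +10%, +20%, +30%
powers
```

Running the sweep makes the trade-off concrete: a minimally interesting effect may need far more patients than the optimistic one, and you pick n based on which scenario you need to cover.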
Thank you for doing your part.
The hard part is that you don't really know the true effect, and it heavily affects the sample size you need to get the same confidence
Yeah, the best you can really do is check your sims against a range of realistic effect sizes, and it gets computationally expensive fast
Hi, what books do you recommend for beginner, intermediate, and advanced levels of statistics?
What if I don't know how much the null and alternative distributions differ? Is 0.5 important? Sorry for not understanding
No need to apologize! It’s usually the case that we don’t know the precise alternative hypothesis. To account for this, you usually repeat this sample size calculation for different specific alternative hypotheses. 0.5 was a specific alternative hypothesis I used, but it’s not particularly special
"you can't" - I cackled
Can your recommend some books for someone who knows nothing about statistics? Where would you start if you had to start over?
Yeah I can try to think of something. It would help me to understand your goals for learning statistics, can you tell me a bit more about them?
@@very-normal I come from a physics background, so basically anything that has to do with experimental science and data. Also, I know that probability is connected to statistics in some way but I’m not sure how to breach into that either. I am curious for the sake of quantum physics
Awesome vid. Makes me wanna go monte carlo something.
But what's this about plugging variables into a function from some 95-star opensource library which magically gives us the right numbers? What's it do? Where's the code for it? Feels incomplete.
It’s a popular library for power and sample size calculations. The code for it is there, but I didn’t give it a lot of spotlight cuz it would mean I’d need to explain it.
It takes a bit to learn, but it gets the job done once you know how it works. It uses a different method for getting the sample size, which even I don’t entirely understand after reading their source code. Could be a future video topic
Idk if you saw my reply from the other video but.. Could you possibly look into doing a video on set theory? I feel as if that is a foundation on making statistics more accessible as it is a whole different language from basic xyz variables.
might as well to do a high level overview of measures at that point
What do you use to create your videos? Manim? What video editing program?
I use Final Cut Pro for the editing, and manim to produce my plot and equation animations
@@very-normal thanks! I love your work, great stuff!
Can you do a video on renewal processes or renewal theory? It's rare to find videos about it; I would really appreciate it.
Are there distribution-free methods for generating the data? If not, then I don't see the advantage of this approach over formulas. I guess this works even if you don't know the formula for the particular distributions you're working with.
I suppose you could do a bootstrap-type thing with a dataset you collect, but I'm not 100% sure about that. And yeah, you're right; from my own experience, they're a boon for sample size calculations for complicated experimental designs where the distribution won't be clear
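That bootstrap-type idea might look something like this sketch, where the `pilot` vector is fake stand-in data (in practice it would be real pilot measurements) and the 0.5 shift is an assumed treatment effect:

```r
# Distribution-free flavor: resample from pilot data instead of rnorm().
set.seed(1)
pilot <- rexp(40)  # skewed stand-in for a real pilot sample
shift <- 0.5       # assumed additive treatment effect
one_sim <- function(n) {
  groupA <- sample(pilot, n, replace = TRUE)           # bootstrap resample
  groupB <- sample(pilot, n, replace = TRUE) + shift   # resample + effect
  t.test(groupA, groupB)$p.value < 0.05
}
power_boot <- mean(replicate(2000, one_sim(30)))
power_boot
```

The resampling step inherits whatever shape the pilot data has, so you never have to commit to a named distribution; the cost is that the estimate is only as representative as the pilot sample.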
why did you choose an alternative with a difference of 0.05?
For that example, my variance was one and I wanted an example that gave me relatively low power compared to 80%
This may be a (very) stupid question, but your simulations require some a-priori knowledge of what your data actually looks like. How would you make an a-priori sample size selection when you haven't collected real-world data? You don't necessarily know how different (if at all) the conditions will be. So would you repeat this approach with the means of the gaussians slightly increasing until you reach an asymptote?
not a dumb question!
I’m not sure if this is a complete answer, but hopefully it helps. In my field of biostats, we have the luxury of planning our experiments (clinical trials), so we have a good idea of what data will be collected a priori. This means we’d know what data would be categorical, and ideally, we’d know realistic ranges for continuous variables. With this in mind, you can roughly simulate this data with simple random number generators.
What guides the simulation study more is what we consider to be ideal and realistic parameters that we’d want to detect for the experiment. You’re right that it’s impossible to know exactly how the world works, so a good simulation study will try to suss out sample sizes we might need for ideal and realistic simulations and try to pick one that covers our bases.
Another cool thing about simulations is that you can assess how badly you can violate a model’s assumptions (with carefully simulated data) before we might consider a more sophisticated one.
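As a small illustration of that last point, here is a sketch that stress-tests the t-test's type I error under skewed data; the lognormal choice and sample size are arbitrary assumptions:

```r
# How does the t-test's type I error hold up under skewed data?
# Both groups come from the same lognormal, so the null is true
# and roughly 5% of simulations should reject.
set.seed(1)
type1 <- mean(replicate(5000, {
  t.test(rlnorm(30), rlnorm(30))$p.value < 0.05
}))
type1
```

If the simulated rejection rate stays near the nominal 5%, the normality violation isn't hurting much at this sample size; if it drifts badly, that's the signal to reach for a more sophisticated model.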
@@very-normal Thank you for the answer, it does help!
StaTiSTiCS iS aLl liEs!!! Great vid
You should make a class on Udemy covering how to use statistics for different job titles. Maybe partner with ZeroToMastery? A lot of us have degrees but have to make a shift to a new field that needs more statistics. I'm biology but now I'm going towards Business Analyst and Project Management. I need help connecting the theory to the business world. Coding examples that use SQL, Python, and R are needed, too.
That’s always been a vague feeling I’ve had about my audience.
One of my long term goals is to ultimately make courses, but it feels very different from making these UA-cam videos. Thanks for the input, it definitely helps sculpt what I’d think about including in a course/s
Can you do a video on the bootstrap? Unless you already have
I do have one on the theory, but you reminded me that I haven’t gone back to do the code demo for it! Future video!
@@very-normal awesome. Also, could you do a video on nonparametric regression. Just the idea behind it? Maybe compare it to the parametric regression case?
Monte Carlo simulations feel like cheating or just dumb brute force, but I guess it's not if it works, which it does obviously.
That’s exactly how I felt when I first learned about it
I tried to run this code translated to Julia to see how much faster it would be
Let's just say that "waiting for it to finish" is a non-issue
Ooh always looking for a tool to let me wait less
@@very-normal Julia has some growing pains associated with it, but for operations like these it's perfect, with its simple syntax and JIT compilation
@@jeffreychandler8418 it's better to let him choose the tools he needs
1:50 - well man, in that case there should've been one person who spoke up.
You didn't have to come up with an answer; you could've just said you knew how, but needed to pull up a computer to calculate it.
Maybe you could've simplified the method to quickly explain what had to be done, and that would have been a satisfying answer in the middle of a class.
missed opportunity for my past self
It dropped. Time to fire up obsidian and take some notes.
obsidian users RISE UP
Amazing videos!!
Congratulations on the lessons and on how easy you are to understand :)
I am thinking about sampling from one population and testing the hypothesis like a bootstrap method, also as a kind of sample size lesson.
Is this approach right?
Thank you very much!
Aren't we now going from an arbitrary sample size to an arbitrary mu_A - mu_B?
That's also something that can be varied in a Monte Carlo simulation. If you fix the sample size and vary the true difference instead, you can plot the power function. But in this video, the true difference is fixed to 0.5, and the sample size is varied
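That fixed-n version might be sketched like this; the grid of differences and n = 30 per group are arbitrary choices for illustration:

```r
# Fix the sample size and sweep the true difference
# to trace out the power function of the t-test.
set.seed(1)
power_at_delta <- function(delta, n = 30, n_sims = 1000) {
  mean(replicate(n_sims, {
    t.test(rnorm(n), rnorm(n, mean = delta))$p.value < 0.05
  }))
}
deltas <- seq(0, 1, by = 0.25)
power_curve <- sapply(deltas, power_at_delta)
plot(deltas, power_curve, type = "b",
     xlab = "true difference", ylab = "estimated power")
```

At a difference of zero the curve sits near the 5% significance level (those rejections are type I errors), and it climbs toward 1 as the true difference grows.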
I knew the answer off the top of my head without the use of a laptop, does that mean I'm better than the half grads?
yes it does, im proud of you
@@very-normal t...thanks dad😭
Very gauss!
I wish I were able to understand everything and apply this kind of thing... :S I know statistics in a more broad way.
I feel dumb
Give it some time! Statistics is far from easy or intuitive. I didn’t understand a lot of it at first, even into my graduate studies. You’ll get it with time