what are the odds of me getting pinned in the comment section?
better than the lottery
When you mentioned the Cauchy not having a mean, it threw me for a loop. I had never thought about how the integrals involved in computing expectation values can just... not converge, and the quantity just isn't defined for that distribution.
Yeah, I remember the Cauchy being a recurring frustration in my grad math classes. I had no idea what my professor was talking about. One day I just decided to try a quick simulation on it and then it became clear
It's aaalways the Cauchy, lol.
Expectation (and likewise variance and other moments) is just an integral. So if that integral doesn't converge, the expected value just isn't defined. The Cauchy happens to be one of the few common distributions where even the first moment fails to converge.
Threw
@@ashlerh4103 But isn't a finite area a requirement for a function to be called a "distribution"?
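For reference, the standard Cauchy is a perfectly valid distribution, since its density has total area 1; it is the integral that would define the mean that blows up:
```
% Total area is finite, so the Cauchy is a legitimate distribution:
\int_{-\infty}^{\infty} \frac{1}{\pi(1+x^2)}\,dx
  = \frac{1}{\pi}\Big[\arctan x\Big]_{-\infty}^{\infty} = 1
% But the integral defining the first moment diverges:
\int_{-\infty}^{\infty} \frac{|x|}{\pi(1+x^2)}\,dx
  = \frac{2}{\pi}\int_{0}^{\infty} \frac{x}{1+x^2}\,dx
  = \frac{1}{\pi}\Big[\ln(1+x^2)\Big]_{0}^{\infty} = \infty
```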
Just wanted to thank you for your channel. This kind of quality material helps a lot :)
It's gambling
You have a genuine talent for distilling the essentials of a topic and explaining them clearly and succinctly. Wonderful work!
Seriously?
@@jonr6680 Genuine compliment without snark. Sometimes there is positivity on the internet.
Came across the video by accident but will definitely stay for longer. I'm honestly surprised that such a good video with clear explanation of the topic has such a small amount of views, definitely deserve more. Keep up the great work!
Great video. Would love one explaining markov chain monte carlo methods. Another place where assumptions can be sneakily violated is the CLT because it assumes finite variance, so the standard cauchy distribution again gives a counterexample.
Thanks! That’s a great idea, they’re so useful but it’s not something I see around a lot
@@very-normal Hi, I just wanted to blurt out that in my field of Psychology, Markov Chain Monte Carlo methods are used a lot for estimation of Item Response Models from Item Response Theory. Just a fun tidbit 🙂. Great video!
Great video.
One note on the part with the supercomputers: I have the feeling that statisticians' code can oftentimes be sped up by quite a bit if one just takes efficiency into account. In a vectorized language like R, you want to avoid looping over vectors and dataframes. For example, your central limit theorem simulation can handle many more samples, for example as follows:
```
N
```
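A rough sketch of the kind of vectorized approach being described (assuming R; the sample sizes, the exponential population, and names like n_sims are made up for illustration):
```
# Draw every simulated dataset at once as a matrix, then take row means,
# instead of looping over datasets one at a time.
n_sims <- 10000                                              # simulated datasets
n      <- 30                                                 # observations per dataset
draws  <- matrix(rexp(n_sims * n, rate = 1), nrow = n_sims)  # skewed population
xbars  <- rowMeans(draws)                                    # one sample mean per row
hist(xbars, breaks = 50, main = "Sampling distribution of the mean")
```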
Thanks! I’m always happy to get some helpful snippets of code, especially if it makes my work faster.
My limiting factor here was making the gif from the plots, those took forever to finish 💀
Another tip I was experimenting with recently is the use of GPUs. Since random numbers can be generated independently, it's an embarrassingly parallel task that's greatly accelerated by GPUs... I've got 5000-7000x speedups in some simple Monte Carlo simulations; the greater the number of generated numbers, the greater the speedup.
Quite insightful content, yes; but let's not underestimate the importance of good visuals in conveying a message (or simply hooking new audiences). Your channel does both wonderfully, great video!!
Great video! As a first year Grad Student, I loved the Casella Berger shoutout.
Thanks! Good luck through your trials and tribulations ahead, intrepid first-year! MS or PhD?
At my university, student managed portfolio analysts have to code and run Monte Carlo on stocks to see all likely moves in price a stock can make based on volatility. It’s crazy
That’s really cool! What university if you don’t mind me asking?
In that context, is assessing all likely moves/paths a variety of stocks can take meant to arrive at some average state of every stock in the portfolio? Are those paths and averages simulated independently or is there some type of Bayesian interdependence to the path likelihoods?
Very helpful for option pricing. Depends on your assumption around volatility or you can also feed in the market observed price and back out into a market implied volatility. It’s interesting stuff.
For my senior project for my Petroleum Engineering degree, at first I used Monte Carlo simulations to approximate oil and gas production rates using data from the surrounding offsets (nearby wells or ones with similar features). It was awesome 👌
I then created a machine learning model for more accurate forecasting :)
how do you know it was more accurate?
Was it more fit to your test data?
@@josephdaquila2479 Yes, not only in quantitative but also in behavioral: It did a good job of showing possible dips (days of no production). No overfitting.
@@SkegAudio which kind of model did you use afterwards?
Wow, sounds good man. At the risk of sounding like a noob (which I am): you basically used a Monte Carlo simulation to generate data, and that data was then used to create a machine learning model?
Thank you for sharing your knowledge! I'm curious and enthusiastic about data and statistics. I'm currently binge watching your videos in my spare time. Keep it up!
Monte Carlo is well known as a method to simulate the behavior of particles in physics. The most popular particle transport software is Monte Carlo N-Particle (MCNP), developed at Los Alamos National Laboratory. Yes, it's where the Manhattan Project took place. In fact, MC simulation was invented to overcome problems they faced when creating that weapon.
It's really fascinating that a method originally invented for weapons development during the war now has an immensely broad range of applications in the world, like forecasting, pharmaceutical development, finance, radiation science, etc. What a nice method (except for the comically long computational time)!
One issue I have with this video is that it first describes statistics as "we observe reality, make inferences about parameters and our models from it, and refine them in a feedback loop", which is a completely Bayesian framework, while in practice you then explain frequentist methods (like estimating power and type 1 or 2 errors; even the usage of the law of large numbers isn't entirely correct for that type of inference).
It's not your fault that statistics is often taught like this, but the philosophical framework you think you've set up is different from the one you actually use, leading to confusion in the long run: what does it really mean to do a null hypothesis test under the standard framework? If I can rule something out, can I also rule **in** that some parameters are a certain way? At the end of the day, what's the optimal way of deciding things about reality? Standard frequentist tests won't answer that for you.
If you think that the methods you're using allow you to say "oh, given this data (even simulated) the parameter must be within this range", then you've been misled and should instead look into Bayesian methods.
I originally intended that the diagram get across the idea that people learn from data and do more experiments, frequentist or Bayesian. But I definitely see that it's more properly Bayesian in the way I've described it.
I thought about talking about Bayesian stuff here but I decided against it. The Frequentist-vs-Bayesian topic deserves its own video
I really appreciate the feedback btw, thanks!
@@very-normal Sure! I hope you include something that's not seen in the usual videos, for example what's the likelihood principle.
If you want a brief introduction, the likelihood principle essentially says that assuming some model, if you observe some data you should arrive at the same conclusion no matter how you decided to gather it.
For example, in two experiments on the same coin I can decide to:
- Throw the coin 10 times
- Throw the coin until 3 heads in a row appear
Say I get the same exact result from these two different experiments, notice how the decision of when to stop doesn't influence the coin's behaviour itself.
One would then think that whatever decision you make about the coin (for example determining whether it's a fair coin) should depend only on what you've seen, no matter the experiment, assuming the results were the same.
It turns out, however, that frequentist methods don't respect this and will give you different p-values in the two cases, while Bayesian methods respect it!
This simplifies a lot of issues when deciding whether to stop midway through an experiment: Bayesians can do it without issue, while frequentists will get different p-values from that decision.
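A small sketch of the textbook variant of this point (assuming R): a fixed number of flips versus flipping until a fixed number of tails. It's a slightly different stopping rule than the 3-in-a-row one above, but it shows the same data producing different p-values.
```
# Same observed data in both cases: 9 heads and 3 tails, null hypothesis p = 0.5.

# Experiment 1: flip exactly 12 times and count heads (binomial model)
p_fixed_n <- 1 - pbinom(8, size = 12, prob = 0.5)    # P(9 or more heads) ~ 0.073

# Experiment 2: flip until the 3rd tail appears (negative binomial model)
p_stop_rule <- 1 - pnbinom(8, size = 3, prob = 0.5)  # P(9+ heads before 3rd tail) ~ 0.033

c(fixed_n = p_fixed_n, stop_at_3_tails = p_stop_rule)
```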
That’s definitely a key idea to include in a video, I appreciate the introduction.
YouTube needs more Bayesian content, and I’ll be a part of pushing that lol
I believe you forgot to mention that "parameter estimation" versus "state estimation" is one of the hottest upcoming skills as well
LOL, your intro got me subbed. Cause you're right!
Nice video! I have a PhD in physics, and some of my colleagues use MC very extensively in their work. In some fields this is the main if not the only viable simulation method.
Same in computer graphics, it's all Monte Carlo now :)
As a software engineering student I always wanted to do this in my stats classes. Just build up all the fancy distributions and tests from first principles (and the bog-standard PRNG you get in every language under the sun.)
Thought it was a crutch, but nice to see that even you galaxy-brain types like to get computational.
Love your videos!
It would be good to recommend a book to learn Montecarlo!
Phenomenal work as always, such a succinct explanation and the graphics complement it perfectly!
I really like that you covered this concept. Do you have any plan to cover a video about Markov Chain Monte Carlo (MCMC) soon?
When I was young, I always distanced myself from statistics, although a part of me loved its sibling, probability, because I love mathematics. Back then it seemed like it was all memorization and involved no critical thinking at all! Now that I'm studying machine learning and have finally accepted statistics, I realize that I've been in love with it all along but was just too arrogant to accept it. "I knew you'd come back to me", I sometimes hear it say. 😂
I don't get one thing: I studied this subject and I was wondering how to use it effectively. I mean, as the professor told us, we have some number of variables, each with a specific distribution. We then try to guess those distributions by looking at the frequency of the combinations of values of the variables, which depend on the distribution. Suppose we have two boolean variables, so the system has 4 possible states: TT, TF, FT, FF. We generate random samples from the probabilities and use them to see what value gets assigned to the whole system... but to be able to assign the correct value, don't we have to know the probabilities of those variables beforehand? Are we simulating those values to check after how many samples we reach the actual probability, which we already knew? I might be missing something... If we have some kind of system where you input data and the output is a state, then it makes sense to me to look for the probability of each state using Monte Carlo. Maybe this has nothing to do with finding THIS distribution, but rather something else?
This may not be the right answer for you, but it’s an educated guess. In this simulation, you have control over the success probability of these two Boolean/Bernoulli variables. These are the true underlying mechanisms. If you generate lots of samples from these two variables, you should see that the sample mean will approach the probability you chose.
I think the thing you are missing is some numerical result of interest to you. In what you described, you’re interested in the probability of heads (I think) and that happens to be something you can control. But there may be other values you could be interested in that can be generated by this system.
One example is the number of coin flips you need until you get your first heads. This number is of interest to gacha players: how many times (on average) might they need to pull until they get what they want? lol, relevant example to my life, but hopefully it helps clear things up
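A quick sketch of that "flips until the first heads" idea (assuming R; the 0.5 success probability is just an example):
```
# rgeom() counts failures before the first success, so total flips is that plus one.
p <- 0.5
flips_needed <- rgeom(100000, prob = p) + 1
mean(flips_needed)   # should land near 1/p = 2
```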
Happy to see Casella and Berger mentioned!
Recently learnt about MC simulations because they are a key part in testing algorithmic trading systems. Interesting stuff.
"on average you need it" cracks me up every time! hahaha
Damn this video really blew up. You deserve it
Chapter 5 Casella & Berger reference goes hard💀
It seems weird that the Cauchy distribution doesn't have a finite mean. It looks like the Normal distribution with fatter tails. Any reason why it doesn't have a finite mean?
The technical reason is how the expectation is defined: the resulting integral for the Cauchy pdf turns out to be undefined.
Another way I like thinking about it is in terms of the law of large numbers. It turns out that, thanks to those fat tails, super extreme outcomes are just common enough that a sample mean of Cauchys will never approach the middle value. It will approach it for a bit and then - bam - an outlier totally throws it off.
Even though the Normal is also bell-shaped, its values are concentrated around the mean to such an extent that outliers are improbable enough that this doesn’t happen
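A minimal sketch of what that looks like in simulation (assuming R; the sample size and seed are arbitrary): the Normal running mean settles down, while the Cauchy running mean keeps getting knocked off course by outliers.
```
set.seed(1)                                  # arbitrary seed for reproducibility
n <- 10000
normal_means <- cumsum(rnorm(n))   / (1:n)   # running mean of Normal draws
cauchy_means <- cumsum(rcauchy(n)) / (1:n)   # running mean of Cauchy draws
plot(cauchy_means, type = "l", ylab = "running mean")
lines(normal_means, col = "blue")            # hugs 0 while the Cauchy one jumps around
```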
@@very-normal thank you!
One thing I don't understand is how we obtain the expected value of a model, since metrics like bias, empirical SE, and coverage depend on it. If we have the expected value, why build a model that will then try to obtain a value close to the expected value? Is the expected value something measured empirically (like in a wet lab)? In the paper presented, what is being used as the expected value?
heyo, I’ll do my best to answer your question, based on my understanding. I’m not quite sure what you mean by “expected value,” but I’m interpreting it as the average of a numerical result of interest (bias, SE, coverage). These are all functions of the estimated treatment effect, which means that they also have probability distributions. These distributions are all influenced by the underlying data and are too complex to derive analytically. The population means of these distributions would be interpreted as the “true/population” bias, SE, coverage, etc.
The authors then estimate these using Monte Carlo simulations to produce a distribution for each of these metrics. The empirical means of these metrics are listed in the table, and they are implied to be good estimates of their respective expected (“population”) values.
In this case, “expected value” of the metric is the theoretical population value we’ll never see; the sample mean is the estimate we can get from data produced by Monte Carlo sims
Hope this helps a little bit, it’s a great question
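As a rough illustration of how metrics like these get estimated by simulation (assuming R and a deliberately simple setup: the sample mean of a Normal with true mean 2 plus a t-interval, not the models from the paper):
```
true_mu <- 2; n <- 50; n_sims <- 5000
estimates <- numeric(n_sims); covered <- logical(n_sims)
for (i in 1:n_sims) {
  x <- rnorm(n, mean = true_mu, sd = 1)
  estimates[i] <- mean(x)                            # the estimator under study
  ci <- t.test(x)$conf.int
  covered[i] <- ci[1] <= true_mu && true_mu <= ci[2]
}
c(bias     = mean(estimates) - true_mu,              # Monte Carlo estimate of bias
  emp_se   = sd(estimates),                          # empirical standard error
  coverage = mean(covered))                          # share of CIs containing true_mu
```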
I implemented a Monte Carlo sampler for a raytracer but never fully understood why it works. Great video
Hey, I’m a MS stats here. I love your channel! Can you maybe do a video talking about your career path? Did you do a MS or PhD in Stats?
Thanks! That could be a good video to do! I'll keep a note to myself about this comment.
Both my MS and PhD are in Biostatistics. Slightly more applied than Statistics, but a lot of my coursework was with Statistics MS and PhDs.
Responding to your other comment, I did a Ph.D because I really wanted the independent research skillset that Ph.Ds have. After going through most of my MS work, I felt like I had practical technical skills, but I felt like I could only do things after being told what to do. I liked the idea of being able to face a problem by myself, figure out a plan to tackle it, and then act on the plan. This isn't specific to Ph.Ds, but getting one gives you dedicated time to develop as an independent researcher, especially after you finish coursework.
I think it's perfectly valid to try industry first before going for a PhD. It can give you a better idea of areas you like/hate, and make your time in the Ph.D more focused once you get in. But, you'll definitely feel the drop in pay going from industry to academia lol. Hope this briefly answers your question, I'll think about a more thoughtful response in the meantime.
Cool that I found your channel!
I literally have a psychometrics (statistics for psychology research) test tomorrow and this was recommended to me now
Good luck!
Passed it! I got blessed by you, I think @@very-normal
It was a nightmare, really admire you guys for studying statistics 💀
Congrats! Honestly the stuff from my psych stats class kinda kicked my ass when I took it
A suggestion in code at 6:04:
```
observations = rnorm(10000, 2, 1)
xbars = cumsum(observations)/(1:10000)
```
is almost 900 times faster
What a time to be alive.
Really a nice video, you got a new subscriber :) I had a question if possible. I know you gave some examples, but I would like to understand the "intermediate" one better: should I think that the hypothesis test applied is the same for all the simulations (in which, for each one, we test 2 models which, for example, estimate an unknown parameter), or are we actually investigating the power of 2 statistical tests? My doubt arises from the fact that the concept of power usually refers to hypothesis tests and not to a "model"... but perhaps when you talk about power you mean what in machine learning models (for example) is usually called Recall (aka sensitivity)?
Thanks! I’m glad you liked it!
I can try to explain a bit more. The problem I was interested in was estimating a response rate for some hypothetical drug. I was comparing how different models perform on different types of data. Each of these models has a different way of estimating the response rate.
Each of the models contains a parameter which represents the response rate I want to estimate, so I can apply the same hypothesis test to all of them and see whether it was rejected based on the data I generate. Some models perform much more poorly when their assumptions aren’t met, and I was trying to quantify how badly their performance (i.e., power, type-I error) was affected compared to ideal situations.
And you’re right, power and sensitivity are very similar, but come from slightly different contexts. My feeling is that power is for decision making in hypothesis tests, sensitivity is for prediction tasks. They both condition on there being a true effect or actually having some condition.
As an aside, my simulation studies were actually Bayesian in nature, so my work was actually kinda like a Bayesian-frequentist hybrid. But that’s a whole other story lol, hope this helps to clarify
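For anyone curious, a stripped-down sketch of estimating power by simulation (assuming R; the 0.5 effect size, groups of 40, and a plain two-sample t-test are placeholders, not the Bayesian models from the actual study):
```
n_sims <- 2000; n <- 40; effect <- 0.5
reject <- replicate(n_sims, {
  treated <- rnorm(n, mean = effect)   # data generated under a true effect
  control <- rnorm(n, mean = 0)
  t.test(treated, control)$p.value < 0.05
})
mean(reject)   # rejection rate = Monte Carlo estimate of power
```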
Thanks for the reply! Now it's clearer to me. It would be interesting to understand in detail what type of data you generated and how, what type of clinical trial you simulated, what models you used, etc. You could make a little spin-off of this video, it would be super cool! :) @@very-normal
Yes! This was the biggest gap in my stats sequence in grad school. I use simulations all the time now.
Can you make a video on synthetic data generation for multivariate datasets?
You mean like multiple outcomes yeah?
Your content is amazing!
Nice video! Well explained and fun. Thanks
This an incredible video. Thank you for creating this content
Subbed. If you can work some poker or game theory math, that would be super interesting.
Hi,
Is there any good book you would recommend to get a better understanding of statistical modelling? I had a statistics course in my bachelor's and I've always used some of these tools in other courses. But this explanation seems to be one step further. Is it possible to find literature explaining these concepts in more detail? Thank you for the video, subscribed
Sure, I can try recommending something. What kind of problems do you usually work with?
Generally I’d say “Statistical Inference” by Casella and Berger, since it has solutions to help you check your understanding. I’ve also read a bit of “All of Statistics” by Larry Wasserman, though I have less experience with it.
Thanks for watching and subscribing!
Make a detailed video on degrees of freedom....
Please
lol I’ve had a script gathering dust for degrees of freedom because I’m still trying to make it make sense, but I’m working on it!
7:51
The mean is going to zero. It just needs more samples. No?
You’re right that it looks like it’s heading to zero, but outliers in Cauchy variables are common enough that this convergence won’t actually happen. The sample size is already at 10,000, so it’s pretty large already
The Law of Large Numbers more technically requires that the deviation away from the population mean eventually stays within any arbitrarily small range as the sample size grows. Occasional outliers will keep pushing this deviation out of that range
@@very-normal the devil lives in your assumptions of what is large
7 billion people in the world and there is data about each one. How about that for a sample?
Now I know all the most important skills ... but I still don't have them (pun on 0:14)
Very good video! What tool do you use to make the animations?
Thanks! For mathematical and notation animation, I use the manim Python library. And for most others, I use key framing in Final Cut Pro.
Great video. I love it when people break down and operationalise the statistical process of collecting and testing empirical data. I'll be sharing this with future cognitive science students when supervising their thesis projects.
I remember when I was looking at Monte Carlo risk assessments I could just re run the model to get the result I wanted. It was such a flawed way of looking at risk.
Perhaps you know this already, but just for clarity: yes, that would be deceptive. The way to fix it is to run the simulation many times and construct a whole probability distribution of the risk (from which you could take simplified measures like the mean and variance)... using one simulation to draw conclusions is like using one point from a probability distribution.
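A toy sketch of that point (assuming R; the compound Poisson-lognormal loss model and all its numbers are invented): summarize many replications instead of trusting a single run.
```
set.seed(42)                                      # arbitrary seed
one_run <- function() {
  n_claims <- rpois(1, lambda = 5)                # hypothetical number of loss events
  sum(rlnorm(n_claims, meanlog = 0, sdlog = 1))   # hypothetical event sizes
}
losses <- replicate(10000, one_run())
c(mean = mean(losses),
  sd   = sd(losses),
  p95  = unname(quantile(losses, 0.95)))          # a tail measure, not just the mean
```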
When you say "model", do you mean the type of statistical distribution that you're drawing from? It took me a little while to realize that the randomness we need to generate for the simulations shouldn't always be a normal distribution, and we have to generate samples for all sources of variation in the "model".
Thank you for putting things in English. Now I will binge watch all your videos 😅
I'm just here for the excellent memes. thank you
🫡
Very good video! You kinda slid over the topic of the power of a model....
Yeahhhhh I coulda spent more time on the intermediate section. But! The power calc deserves its own video, so that’s a future thing
I hope you'll please consider using a thicker font on some of your labels. They are very difficult to read.
Thanks for your feedback! I’ll look at some different fonts for future videos
i only know how to calculate mean, median, mode. my brain hurts. help.
Don’t worry, I know graduate students who don’t understand this stuff even after a quarter of learning. It takes time my dude
On average you need it. Decided to watch whole videos when I heard that lol
Shouldn't the argument inside the rnorm or rcauchy command be n instead of the 1 that it has? Great video by the way
The 1 lines up with the “n” argument in the function, so it ends up running the same. Good catch tho, I’ll try to write it out explicitly next time
Hi Christian, do you use measure theory at all?
Nah, I didn’t need to take the grad level probability classes at my university, so I’m only aware that it exists lol
@@very-normal Thank you; great video. I never even learned Monte Carlo sims, because they weren't introduced to me as a way to build statistical models. I was happy with this “skills in statistics” video. Keep it up please
great video
Damn this video really blew up. Nice that you start to get more views
I got lost midway through. It might help if you kept up with the concrete examples for each concept, like you did with the Normal/Cauchy plots. Liked your video's style.
Thanks for the feedback! I felt there was something more I could do with it, but I didn’t want to dwell on it, I’ll have a better viz for it next time
We want the MCMC
🫡
I'm here because of a YouTube suggestion; I was looking for information about Monte Carlo simulations to adjust predictions of cycle time in software development. It'd be great if you could make a video about applying Monte Carlo simulations to solve problems like mine. Someone told me it's better than taking the "average" but I don't really know why it's better
I’m not really familiar with software dev, but I can take a crack at an answer.
My best guess is that software deliverables have several steps that need to be done before they’re completed. Each of these steps takes time, but you don’t know how much ahead of time, because that’s how coding is. A Monte Carlo approach here might be to give each “task” a distribution on the time it’ll take, like a Poisson. Then, the sum over all these “time distributions” gives you the total time needed to finish a project.
By replicating this many times, you can get a sense of best- and worst-case scenarios (i.e. the variance of the total time), in addition to average behavior. This gives you more information to plan from, compared to a simple average of the past times you’ve taken to complete tasks
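A tiny sketch of that idea (assuming R; the four task means and the Poisson choice are made up): give each task a duration distribution, sum them, and replicate.
```
set.seed(7)                                              # arbitrary seed
task_means <- c(3, 5, 8, 2)                              # hypothetical mean days per task
totals <- replicate(10000,
  sum(rpois(length(task_means), lambda = task_means)))   # one simulated project total
quantile(totals, c(0.1, 0.5, 0.9))                       # optimistic, typical, pessimistic
```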
Bruh, all well and good, but the code lines not lining up with the *line numbers* on the terminal triggered me.
Man, it kills me too 💀 I love that the manim library can animate code typing for me, but I can’t get it to line them up. It’s better than watching me type with typos tho, that’s for sure
1:30s in and so many buzzwords are used
lol
my brain hurts.
You and me both my friend
I feel like I just don't get it. I’ve always felt like Monte Carlo is useless. I hoped this video would change my mind, but it didn’t.
I’m an actuary, have my FCAS designation, worked in stats for insurance for 9 years.
If I can define the parameters to build my simulated dataset, then my thumb is on the scale. I get to decide the parameters of the data that I generate. If I am the one who builds the simulated data, how is that data an appropriate way to measure the power and bias of a model? Your example of Normal vs Cauchy resonates with me. If the underlying process is actually a different type of random than the type I choose to build for my simulation, then any conclusions drawn from the simulated data are unreliable.
If I am pricing auto insurance, and I could use Monte Carlo to just decide how often different hypothetical drivers have hypothetical car accidents, then I would have a superpower. But if I decide what hypothetical drivers and accidents to create, then my simulation is a reflection of my worldview. Am I completely misunderstanding this?
Hey, thanks for watching! I’m sorry it didn’t fully resolve the questions you had. Based on what you said, it sounds like you have extensive statistical experience.
I don’t think I can give you a fully satisfactory answer to your questions, but I’ll try! For the case of power, I can simulate datasets with a given treatment effect, null or not, and perform hypothesis tests with some model of choice. It’s only really “appropriate” when the real world matches with the data I simulated, which is practically impossible. But, for a biostatistician, it at least helps plan how many people are needed for a new trial to be successful. It’s more a planning tool than anything.
You’re right that the simulation is really a reflection of how you think the world is. Any results you get from doing simulation are only applicable to whatever parameters you used to generate it, so Monte Carlo is pretty limited in that respect. As with the Normal vs Cauchy example you mentioned, you can try to see how a model’s performance is influenced by deviations from ideal conditions. I wouldn’t say that the results would be unreliable, but rather a reflection of the fact that the model can be negatively influenced by misspecification.
On a side note, do you have a textbook you like for actuarial statistics? It’s a totally new field to me, and you’re the first actuary I’ve encountered in my life lol
@@very-normal thanks for the thoughtful reply!
I will concede any point in the bio stats domain. Maybe that’s the difference. If you and I need different results, then we can use different tools, and it’s about using the right tool for the job. I can appreciate the massive task there is to build a power test for bio stats, and that it would be cost prohibitive to do that all with organic tests. Thank you for sharing some knowledge about it :)
If you want a nice intro to actuarial work (at least the branch that I know best!) I would point you towards the Casualty Actuarial Society (CAS) Intro to Ratemaking textbook by Werner and Modlin. It’s a free pdf available online. If the first chapter seems interesting, then you might enjoy more about the actuarial worldview! If the first chapter isn’t for you, then feel free to leave it alone. We are stats people who really dig into insurance problems.
I'm a mathematician, so I know all that, but it is very helpful and I'll definitely recommend it to friends
LETS GOOOOO THANK YOU
we in here bois
Hi I'm normal
but are you very normal
❤❤❤❤
FIRST!!!
'Professors are at the leading edge of research.' - That's a generalization. Most of the research and writing work is done by multiple PhDs and post-docs for the professor. Becoming a professor is more like a political achievement nowadays, rather than an academic one.
Terrible, and I have over 3 years of research-level stats studies
🆒
You’re Indian tho
Spotted many errors in this video unfortunately. The worst one is not understanding the Law of Large Numbers; it doesn't refer to a specific statistic like the mean, but to the asymptotic distribution of a sum of random variables.
what other errors were there?
AI will replace every statistician on earth. Can I get credit for being one of the people pointing this out as soon as I heard about AI a year ago? Yes, I know many did, but I just want credit because so many didn't understand the impact AI would have, and since I work in statistics, I immediately knew...
I’ll give you credit my dude
I thought AI needs statistics
@@very-normal thank you bro
What I find pathetic about these self-proclaimed geniuses is that they are always smarter than their boss but lack risk. Pathetic
Results and intelligence are not the same and you would do well to be able to hold them separately in your mind
@@Yuvraj. when society breeds poor mentalities so that politicians can divide and conquer, it creates reality and illusions. When the richest men on earth don't have college degrees and have people beneath them who know rocket science but can't build rockets, there is something wrong with the world
How to tell exactly how many samples are needed to achieve a desired accuracy in estimation of the mean?
One way you could try is to apply the Central Limit Theorem. You can decide how far away you can tolerate having your sample mean be from the unknown population mean, which will give you a “width” you want. Since more data will shrink the variance of the sample mean, you can solve for a sample size that gives you the width (ie distance from population mean) you want.
That being said, this won’t ever tell you exactly what the true population mean is, but it gives you a rough guess
@@very-normal This doesn't answer it, though. Such a bound is random because the sample mean is random
You’re right that the sample mean is random, but you can control how much it will vary around the population mean with greater sample size. It will still vary based on different data, but the variance you solve for will keep it close to the population mean with high probability
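A short sketch of that calculation (assuming R; the tolerated margin, the guessed population SD, and the 95% level are placeholder choices):
```
margin <- 0.5                        # how far the sample mean may stray from the truth
sigma  <- 2                          # rough guess at the population standard deviation
z <- qnorm(0.975)                    # two-sided 95% critical value
n <- ceiling((z * sigma / margin)^2) # CLT-based sample size
n                                    # about 62 with these numbers
```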