The most important skill in statistics

  • Published 23 Dec 2024

COMMENTS • 160

  • @youknowwhatlol6628
    @youknowwhatlol6628 1 month ago +6

    what are the odds of me getting pinned in the comment section?

  • @TheZectorian
    @TheZectorian 10 months ago +248

    When you mentioned the Cauchy not having a mean it through me for a loop. I had never thought about how the integrals involved in computing expectation values can just... not converge, so the quantity just isn't defined for that distribution.

    • @very-normal
      @very-normal 10 months ago +60

      Yeah, I remember the Cauchy being a recurring frustration in my grad math classes. I had no idea what my professor was talking about. One day I just decided to try a quick simulation on it and then it became clear

    • @kristianwichmann9996
      @kristianwichmann9996 10 months ago +3

      It's aaalways the Cauchy, lol.

    • @ashlerh4103
      @ashlerh4103 10 months ago +14

      Expectations (and likewise variances and other moments) are just integrals. So if the area under the curve is infinite then you get non-convergence of expected values. Cauchy is just one of the few that happens to not converge in the first moment.

    • @flubadubdubthegreat1272
      @flubadubdubthegreat1272 10 months ago +3

      Threw

    • @samuellongo9530
      @samuellongo9530 10 months ago +3

      @ashlerh4103 But isn't a finite area a requirement for a function to be called a "distribution"?

  • @chimurawill
    @chimurawill 3 months ago +1

    Just wanted to thank you for your channel. This kind of quality material helps a lot :)

  • @______4790
    @______4790 9 months ago +105

    It's gambling

  • @ElNick09
    @ElNick09 10 months ago +57

    You have a genuine talent for distilling the essentials of a topic and explaining them clearly and succinctly. Wonderful work!

    • @jonr6680
      @jonr6680 10 months ago +1

      Seriously?

    • @ElNick09
      @ElNick09 9 months ago +1

      @jonr6680 Genuine compliment without snark. Sometimes there is positivity on the internet.

  • @msiec
    @msiec 10 months ago +35

    Came across the video by accident but will definitely stay for longer. I'm honestly surprised that such a good video with a clear explanation of the topic has so few views; it definitely deserves more. Keep up the great work!

  • @atlas4074
    @atlas4074 10 months ago +23

    Great video. Would love one explaining Markov chain Monte Carlo methods. Another place where assumptions can be sneakily violated is the CLT because it assumes finite variance, so the standard Cauchy distribution again gives a counterexample.

    • @very-normal
      @very-normal 10 months ago +8

      Thanks! That’s a great idea, they’re so useful but it’s not something I see around a lot

    • @charlesbwilliams
      @charlesbwilliams 10 months ago +2

      @very-normal Hi, I just wanted to blurt out that in my field of Psychology, Markov Chain Monte Carlo methods are used a lot for estimation of Item Response Models from Item Response Theory. Just a fun tidbit 🙂. Great video!

  • @MrNitroklaus
    @MrNitroklaus 10 months ago +53

    Great video.
    One note on the part with the supercomputers: I have the feeling that statisticians' code can oftentimes be sped up by quite a bit if one just takes efficiency into account. In a vectorized language like R, you want to avoid looping over vectors and dataframes. For example, your central limit theorem simulation can work with many more samples, as follows:
    ```
    N
    ```

    • @very-normal
      @very-normal 10 months ago +17

      Thanks! I’m always happy to get some helpful snippets of code, especially if it makes my work faster.
      My limiting factor here was making the gif from the plots, those took forever to finish 💀

    • @joaopedrorocha4790
      @joaopedrorocha4790 9 months ago +1

      Another tip that I was experimenting with these days is the use of GPUs. Since random numbers can be generated independently, that's an embarrassingly parallel task that's greatly accelerated by GPUs... I've got 5000-7000x speedups in some simple Monte Carlo simulations; the more numbers generated, the greater the speedup.

  • @kevinquiroscanales6240
    @kevinquiroscanales6240 10 months ago +2

    Quite insightful content, yes; but let's not underestimate the importance of good visuals in conveying a message (or simply hooking a new audience). Both of which your channel does wonderfully, great video!!

  • @anthonybernardi4929
    @anthonybernardi4929 10 months ago +3

    Great video! As a first year Grad Student, I loved the Casella Berger shoutout.

    • @very-normal
      @very-normal 10 months ago +1

      Thanks! Good luck through your trials and tribulations ahead, intrepid first-year! MS or PhD?

  • @dlhfm4281
    @dlhfm4281 10 months ago +4

    At my university, student-managed portfolio analysts have to code and run Monte Carlo on stocks to see all the likely moves in price a stock can make based on volatility. It’s crazy

    • @daltonsilverman1974
      @daltonsilverman1974 10 months ago

      That’s really cool! What university if you don’t mind me asking?

    • @Chrisratata
      @Chrisratata 9 months ago

      In that context, is assessing all likely moves/paths a variety of stocks can take meant to arrive at some average state of every stock in the portfolio? Are those paths and averages simulated independently or is there some type of Bayesian interdependence to the path likelihoods?

    • @flubbernub808
      @flubbernub808 7 months ago

      Very helpful for option pricing. Depends on your assumption around volatility or you can also feed in the market observed price and back out into a market implied volatility. It’s interesting stuff.

  • @SkegAudio
    @SkegAudio 10 months ago +22

    For my senior project for my Petroleum Engineering degree, at first I used Monte Carlo simulations to get an approximation of oil and gas production rates using data from the surrounding offsets (surrounding wells, or wells with similar features). It was awesome 👌
    I then created a machine learning model for more accurate forecasting :)

    • @joelwillis2043
      @joelwillis2043 10 months ago +6

      how do you know it was more accurate?

    • @josephdaquila2479
      @josephdaquila2479 10 months ago +1

      Was it more fit to your test data?

    • @SkegAudio
      @SkegAudio 10 months ago +7

      @josephdaquila2479 Yes, not only quantitatively but also behaviorally: it did a good job of showing possible dips (days of no production). No overfitting.

    • @jaserogers997
      @jaserogers997 10 months ago +2

      @SkegAudio which kind of model did you use afterwards?

    • @Rch7780
      @Rch7780 10 months ago

      wow sounds good man, at the risk of sounding like a noob (which I am) you basically used a Monte Carlo simulation to generate data and that data was then used to create a machine learning model?

  • @muhammetmelikkolgesiz9252
    @muhammetmelikkolgesiz9252 10 months ago +5

    Thank you for sharing your knowledge! I'm curious and enthusiastic about data and statistics. I'm currently binge watching your videos in my spare time. Keep it up!

  • @rio_agustian_
    @rio_agustian_ 7 months ago +2

    Monte Carlo is well known as a method to simulate the behavior of particles in physics. The most popular particle transport software is Monte Carlo N-Particle (MCNP), developed at Los Alamos National Laboratory. Yes, it's where the Manhattan Project took place. In fact, MC simulation was invented to overcome the problems they faced when creating the weapon.
    It's really fascinating that a method originally invented for weapons development during the war now has an immensely broad range of applications, like forecasting, pharmaceutical development, finance, radiation science, etc. What a nice method (except for the comically long computational time)!

  • @mgostIH
    @mgostIH 10 months ago +14

    One issue I have with this video is that it first describes statistics as in "We observe reality and make inference about parameters and our models from it and refine it in a feedback loop", which is a completely Bayesian framework, while in practice you then explain frequentist methods (like estimating the power, type 1 or 2 errors, and even the usage of the law of large numbers isn't entirely correct for that type of inference).
    It's not your fault that statistics is often taught like this, but the philosophical framework you think you've set up is different from the one you actually use, leading to confusion in the long run, like what does it really mean to do a null hypothesis test under the standard framework? If I can rule out something, can I also rule **in** that some parameters are a certain way? At the end of the day, what's the optimal way of deciding things about reality? Standard frequentist tests won't answer that for you.
    If you think that the methods you're using allow you to say "Oh, given this data (even simulated) the parameter must be within this range" then you've been misled and should rather search for Bayesian methods.

    • @very-normal
      @very-normal 10 months ago +6

      I originally intended that the diagram get across the idea that people learn from data and do more experiments, frequentist or Bayesian. But I definitely see that it’s more properly Bayesian in the way I’ve described it.
      I thought about talking about Bayesian stuff here but I decided against it. The Frequentist-vs-Bayesian topic deserves its own video.
      I really appreciate the feedback btw, thanks!

    • @mgostIH
      @mgostIH 10 months ago +5

      @very-normal Sure! I hope you include something that's not seen in the usual videos, for example what the likelihood principle is.
      If you want a brief introduction, the likelihood principle essentially says that assuming some model, if you observe some data you should arrive at the same conclusion no matter how you decided to gather it.
      For example, in two experiments on the same coin I can decide to:
      - Throw the coin 10 times
      - Throw the coin until 3 heads in a row appear
      Say I get the same exact result from these two different experiments, notice how the decision of when to stop doesn't influence the coin's behaviour itself.
      One would then think that whatever decision you make about the coin (for example determining whether it's a fair coin) should depend only on what you've seen, no matter the experiment, assuming the results were the same.
      It turns out, however, that frequentist methods don't respect this and will get you different p-values for the two cases, while Bayesian methods respect it!
      This simplifies a lot of issues when deciding whether to stop midway through an experiment: Bayesians can do it without issue, while frequentists will get different p-values from that decision.

    • @very-normal
      @very-normal 10 months ago +4

      That’s definitely a key idea to include in a video, I appreciate the introduction.
      YouTube needs more Bayesian content, and I’ll be a part of pushing that lol
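
The stopping-rule point in this thread can be checked with exact arithmetic. This is only a minimal sketch in Python, and it swaps the "3 heads in a row" rule mentioned above for the textbook design of flipping until the 3rd tail, because that one has a closed form. The data (9 heads and 3 tails in 12 flips, ending on a tail), the one-sided test of a fair coin against a heads-biased one, and both function names are assumptions made for illustration.

```python
from math import comb


def binomial_p_value(n_flips, heads):
    """Fixed-n design: P(at least `heads` heads in n_flips fair flips)."""
    return sum(comb(n_flips, k) for k in range(heads, n_flips + 1)) / 2**n_flips


def negative_binomial_p_value(n_flips, tails):
    """Stop-at-the-`tails`-th-tail design: P(needing at least n_flips flips).

    That event is the same as seeing at most tails - 1 tails
    in the first n_flips - 1 fair flips.
    """
    return sum(comb(n_flips - 1, k) for k in range(tails)) / 2 ** (n_flips - 1)


# Same observed sequence under both designs: 12 flips, 9 heads, 3 tails.
p_fixed_n = binomial_p_value(12, 9)                    # 299 / 4096
p_stop_at_3_tails = negative_binomial_p_value(12, 3)   # 67 / 2048

print(round(p_fixed_n, 4), round(p_stop_at_3_tails, 4))
```

Both designs see the same 12 flips, yet the fixed-n p-value is about 0.073 and the stop-at-third-tail p-value is about 0.033, so they fall on opposite sides of 0.05.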

  • @mikiallen7733
    @mikiallen7733 10 months ago +7

    I believe you forgot to mention that "parameter estimation" versus "state estimation" is one of the hottest upcoming skills as well

  • @elishmuel1976
    @elishmuel1976 9 months ago

    LOL, your intro got me subbed. Cause you're right!

  • @yds6268
    @yds6268 10 months ago +9

    Nice video! I have a PhD in physics, and some of my colleagues use MC very extensively in their work. In some fields this is the main if not the only viable simulation method.

    • @eigentensor
      @eigentensor 10 months ago +3

      Same in computer graphics, it's all Monte Carlo now :)

  • @drako3659
    @drako3659 10 months ago +3

    As a software engineering student I always wanted to do this in my stats classes. Just build up all the fancy distributions and tests from first principles (and the bog-standard PRNG you get in every language under the sun.)
    Thought it was a crutch, but nice to see that even you galaxy-brain types like to get computational.

  • @santiagodm3483
    @santiagodm3483 10 months ago +4

    Love your videos!
    It would be good to recommend a book to learn Monte Carlo!

  • @keppr44
    @keppr44 10 months ago +3

    Phenomenal work as always, such a succinct explanation and the graphics complement it perfectly!

  • @joshstat8114
    @joshstat8114 10 months ago +1

    I really like that you covered this concept. Do you have any plans to make a video about Markov Chain Monte Carlo (MCMC) soon?

  • @leeris19
    @leeris19 5 months ago +1

    When I was young, I always distanced myself from statistics, although a part of me loved its brother probability, because I love mathematics. This is because for me back then it was all memorization and involved no critical thinking at all! Now that I'm studying machine learning and have finally accepted statistics, I realize that I've been in love with it all along but was just too arrogant to accept it. "I knew you'd come back to me", I sometimes hear it say. 😂

  • @drunky5247
    @drunky5247 10 months ago +1

    I don't get one thing: I studied this subject and I was wondering how to use it effectively. As the professor told us, we have some number of variables, each with a specific distribution. We then try to guess those distributions by looking at the frequency of the combinations of values of the variables, which depend on the distribution. Suppose we have two boolean variables, so we have 4 possible states for the system: TT, TF, FT, FF. We generate random samples for the probabilities and use them to see what value gets assigned to the whole system... but to be able to assign the correct value, don't we have to know the probability of those variables beforehand? Are we simulating those values to check after how many samples we reach the actual probability, which we already knew? I might be missing something... If we have some kind of system where you input data and the output is a state, then it's OK for me to look for the probability of each state using Monte Carlo. Maybe this has nothing to do with finding THIS distribution, but rather something else?

    • @very-normal
      @very-normal 10 months ago +2

      This may not be the right answer for you, but it’s an educated guess. In this simulation, you have control over the success probability of these two Boolean/Bernoulli variables. These are the true underlying mechanisms. If you generate lots of samples from these two variables, you should see that the sample mean will approach the probability you chose.
      I think the thing you are missing is some numerical result of interest to you. In what you described, you’re interested in the probability of heads (I think) and that happens to be something you can control. But there may be other values you could be interested in that can be generated by this system.
      One example is the number of coin flips you need until you get your first heads. This number is of interest to gacha players: how many times (on average) they might need to pull until they get what they want. lol relevant example to my life, but hopefully it helps clear things up
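
The "flips until the first heads" example in the reply above can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's actual code: the 10% success probability stands in for a hypothetical gacha pull rate, and the function name is made up here. The known answer for comparison is the geometric mean waiting time, 1/p.

```python
import random


def flips_until_first_heads(p, rng):
    """Count flips of a coin with heads probability p until the first heads."""
    flips = 1
    while rng.random() >= p:  # this flip was tails, which has probability 1 - p
        flips += 1
    return flips


rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.1                  # hypothetical success probability per pull
n_sims = 20_000

draws = [flips_until_first_heads(p, rng) for _ in range(n_sims)]
average = sum(draws) / n_sims

print(average, 1 / p)  # simulated average next to the theoretical 1/p
```

The success probability is something you set, but the waiting time is a result the system generates, and the same list of draws also gives its spread or any percentile you care about.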

  • @Siroitin
    @Siroitin 10 months ago +1

    Happy to see Casella and Berger mentioned!

  • @tr0wb3d3r5
    @tr0wb3d3r5 10 months ago +4

    Recently learnt about MC simulations because they are a key part in testing algorithmic trading systems. Interesting stuff.

  • @barrilha
    @barrilha 8 months ago

    "on average you need it" cracks me up every time! hahaha

  • @Unaimend
    @Unaimend 10 months ago +2

    Damn this video really blew up. You deserve it

  • @briskioO
    @briskioO 10 months ago +12

    Chapter 5 Casella & Berger reference goes hard💀

  • @EW-mb1ih
    @EW-mb1ih 6 days ago

    It seems weird that the Cauchy distribution doesn't have a finite mean. It looks like the Normal distribution with fatter tails. Any reason why it doesn't have a finite mean?

    • @very-normal
      @very-normal 6 days ago +1

      The technical reason is that, because of how the expectation is defined, the resulting integral for a Cauchy pdf turns out to be undefined.
      Another way I like thinking about it is in terms of the law of large numbers. It turns out that, thanks to those fat tails, super extreme outcomes are just common enough that a sample mean of Cauchys will never approach the middle value. It will approach it for a bit and then - bam - an outlier totally throws it off.
      Even though the Normal is also bell-shaped, its values are concentrated around the mean to such an extent that outliers are improbable enough that this doesn’t happen.

    • @EW-mb1ih
      @EW-mb1ih 6 days ago

      @very-normal thank you!
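
The law-of-large-numbers explanation in this thread is easy to try out. The following is a rough sketch in Python rather than the code from the video: the helper names are made up, the Cauchy draws use the inverse-CDF formula tan(π(U − 1/2)), and the "largest late jump" summary is just one assumed way to show an outlier throwing the running mean off.

```python
import math
import random
from itertools import accumulate


def running_means(values):
    """Mean of the first k values, for k = 1..len(values)."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]


def standard_cauchy(rng):
    """Inverse-CDF sampling: tan(pi * (U - 1/2)) for U ~ Uniform(0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def max_late_jump(means):
    """Largest change between consecutive running means in the second half."""
    late = means[len(means) // 2:]
    return max(abs(b - a) for a, b in zip(late, late[1:]))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 10_000

normal_means = running_means([rng.gauss(0, 1) for _ in range(n)])
cauchy_means = running_means([standard_cauchy(rng) for _ in range(n)])

print("normal:", normal_means[-1], max_late_jump(normal_means))
print("cauchy:", cauchy_means[-1], max_late_jump(cauchy_means))
```

Past the 5,000th draw the Normal running mean barely moves from one step to the next, while the Cauchy one still takes visible jumps whenever an extreme draw arrives.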

  • @priyesh123456789
    @priyesh123456789 10 months ago

    One thing I don't understand: how do we obtain the expected value of a model? Metrics like bias, empirical SE and coverage depend on it. If we have the expected value, why build a model that will then try to obtain a value close to the expected value? Is the expected value something measured empirically (like in a wet lab)? In the paper presented, what is being used as the expected value?

    • @very-normal
      @very-normal 10 months ago

      heyo, I’ll do my best to answer your question, based on my understanding. I’m not quite sure what you mean by “expected value,” but I’m interpreting it as the average of a numerical result of interest (bias, SE, coverage). These are all functions of the estimated treatment effect, which means that they also have probability distributions. These distributions are all influenced by the underlying data and are too complex to derive analytically. The population means of these distributions would be interpreted as the “true/population” bias, SE, coverage, etc.
      The authors then estimate these using Monte Carlo simulations to produce a distribution for each of these metrics. The empirical means of these metrics are listed in the table, and they are implied to be good estimates of their respective expected (“population”) values.
      In this case, the “expected value” of the metric is the theoretical population value we’ll never see; the sample mean is the estimate we can get from data produced by Monte Carlo sims.
      Hope this helps a little bit, it’s a great question
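
To make the reply above concrete: in a simulation you pick the true value yourself, so bias, empirical SE and coverage can all be measured against it. The sketch below is in Python and is not the setup from the paper discussed in the video; Normal data, the sample mean as the estimator, a 1.96-standard-error interval, and every name and number in it are assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_once(rng, true_mean, sd, n):
    """Draw one dataset; return the estimate and whether its 95% CI covers the truth."""
    sample = [rng.gauss(true_mean, sd) for _ in range(n)]
    estimate = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    covered = estimate - 1.96 * se <= true_mean <= estimate + 1.96 * se
    return estimate, covered


rng = random.Random(7)  # fixed seed so the run is repeatable
true_mean, sd, n, n_sims = 2.0, 1.0, 50, 2000

results = [simulate_once(rng, true_mean, sd, n) for _ in range(n_sims)]
estimates = [estimate for estimate, _ in results]

bias = statistics.fmean(estimates) - true_mean         # average error of the estimator
empirical_se = statistics.stdev(estimates)             # spread of estimates across sims
coverage = sum(covered for _, covered in results) / n_sims

print(bias, empirical_se, coverage)
```

Each printed number is itself a sample mean over simulated datasets, which is the sense in which it estimates a "population" value of the metric.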

  • @preston7376
    @preston7376 10 months ago +5

    I implemented a Monte Carlo sampler for a raytracer but never fully understood why it works. Great video

  • @prod.kashkari3075
    @prod.kashkari3075 9 months ago

    Hey, I’m a MS stats here. I love your channel! Can you maybe do a video talking about your career path? Did you do a MS or PhD in Stats?

    • @very-normal
      @very-normal 9 months ago +3

      Thanks! That could be a good video to do! I'll keep a note to myself about this comment.
      Both my MS and PhD are in Biostatistics. Slightly more applied than Statistics, but a lot of my coursework was with Statistics MS and PhDs.
      Responding to your other comment, I did a Ph.D because I really wanted the independent research skillset that Ph.Ds have. After going through most of my MS work, I felt like I had practical technical skills, but I felt like I could only do things after being told what to do. I liked the idea of being able to face a problem by myself, figure out a plan to tackle it, and then act on the plan. This isn't specific to Ph.Ds, but getting one gives you dedicated time to develop as an independent researcher, especially after you finish coursework.
      I think it's perfectly valid to try industry first before going for a PhD. It can give you a better idea of areas you like/hate, and make your time in the Ph.D more focused once you get in. But, you'll definitely feel the drop in pay going from industry to academia lol. Hope this briefly answers your question, I'll think about a more thoughtful response in the meantime.

  • @charlesbourgoigne2130
    @charlesbourgoigne2130 10 months ago +1

    Cool that I found your channel!

  • @Tyronlol
    @Tyronlol 10 months ago

    I literally have a psychometrics (statistics for psychology research) test tomorrow and this was recommended to me now

    • @very-normal
      @very-normal 10 months ago

      Good luck!

    • @Tyronlol
      @Tyronlol 10 months ago

      @very-normal Passed it! I got blessed by you, I think

    • @Tyronlol
      @Tyronlol 10 months ago

      It was a nightmare, really admire you guys for studying statistics 💀

    • @very-normal
      @very-normal 10 months ago

      Congrats! Honestly the stuff from my psych stats class kinda kicked my ass when I took it

  • @letstalkaboutmath2121
    @letstalkaboutmath2121 7 months ago

    A suggestion in code at 6:04:
    observations = rnorm(10000, 2, 1)
    xbars = cumsum(observations)/(1:10000)
    is almost 900 times faster
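
The same single-pass idea carries over to other languages. Here is a rough Python analogue of the R suggestion above, using `itertools.accumulate` in place of `cumsum`; the helper name `running_means` is an assumption, and no speed claim is made for this version.

```python
import random
from itertools import accumulate


def running_means(values):
    """Mean of the first k values for k = 1..len(values), computed in one pass."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]


rng = random.Random(0)  # fixed seed so the run is repeatable
observations = [rng.gauss(2, 1) for _ in range(10_000)]  # mean 2, sd 1
xbars = running_means(observations)

print(xbars[-1])  # should land close to the true mean of 2
```

The saving comes from reusing the cumulative sum instead of recomputing a fresh mean over the first k observations at every step.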

  • @ufuoma833
    @ufuoma833 10 months ago +2

    What a time to be alive.

  • @JimmyMGhill27
    @JimmyMGhill27 10 months ago

    Really a nice video, you got a new subscriber :) I have a question, if possible. I know you gave some examples, but I would like to better understand the "intermediate" one: should I think of the hypothesis test as being the same across all the simulations (where in each one we test 2 models that, for example, estimate an unknown parameter), or are we actually investigating the power of 2 statistical tests? My doubt arises from the fact that the concept of power usually refers to hypothesis tests and not to a "model"... but perhaps when you talk about power you mean what in machine learning models (for example) is usually called recall (aka sensitivity)?

    • @very-normal
      @very-normal 10 months ago +1

      Thanks! I’m glad you liked it!
      I can try to explain a bit more. The problem I was interested in was estimating a response rate for some hypothetical drug. I was comparing how different models perform on different types of data. Each of these models has a different way of estimating the response rate.
      Each of the models contains a parameter which represents the response rate I want to estimate, so I can apply the same hypothesis to all of them and see if it was rejected or not based on the data I generate. Some models perform much more poorly when their assumptions aren’t met, and I was trying to quantify how badly their performance (i.e. power, type-I error) was affected compared to ideal situations.
      And you’re right, power and sensitivity are very similar, but come from slightly different contexts. My feeling is that power is for decision making in hypothesis tests, sensitivity is for prediction tasks. They both condition on there being a true effect or actually having some condition.
      As an aside, my simulation studies were actually Bayesian in nature, so my work was actually kinda like a Bayesian-frequentist hybrid. But that’s a whole other story lol, hope this helps to clarify

    • @JimmyMGhill27
      @JimmyMGhill27 10 months ago

      @very-normal Thanks for the reply! Now it's clearer to me. It would be interesting to understand in detail what type of data you generated and how, what type of clinical trial you simulated and what models you used, etc. You could make a little spin-off of this video, it would be super cool! :)
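
Estimating power and type-I error by simulation, as discussed in this thread, boils down to counting how often a test rejects when you know the truth. The sketch below is in Python and is deliberately much simpler than the response-rate models described above: a two-sided z-test of a zero mean with known standard deviation, with the effect size, sample size, and function names all assumed for illustration.

```python
import math
import random
import statistics


def rejects_null(rng, true_mean, sd, n):
    """Two-sided z-test of H0: mean = 0 at the 5% level, treating sd as known."""
    sample_mean = statistics.fmean(rng.gauss(true_mean, sd) for _ in range(n))
    z = sample_mean / (sd / math.sqrt(n))
    return abs(z) > 1.96


def rejection_rate(rng, true_mean, sd=1.0, n=30, n_sims=4000):
    """Fraction of simulated datasets in which H0 is rejected."""
    return sum(rejects_null(rng, true_mean, sd, n) for _ in range(n_sims)) / n_sims


rng = random.Random(11)  # fixed seed so the run is repeatable

type_one_error = rejection_rate(rng, true_mean=0.0)  # H0 is true: rejections are false alarms
power = rejection_rate(rng, true_mean=0.5)           # H0 is false: rejections are correct

print(type_one_error, power)
```

Swapping the data generator for one that breaks the test's assumptions, while keeping the test fixed, is one way to quantify how much its error rates degrade.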

  • @shanetutwiler
    @shanetutwiler 10 months ago

    Yes! This was the biggest gap in my stats sequence in grad school. I use simulations all the time now.

  • @darkchoco7407
    @darkchoco7407 10 months ago +1

    Can you make a video on synthetic data generation for multivariate datasets?

    • @very-normal
      @very-normal 10 months ago +1

      You mean like multiple outcomes yeah?

  • @kamikamen_official
    @kamikamen_official 9 months ago

    Your content is amazing!

  • @AdolfoWatanabe
    @AdolfoWatanabe 9 months ago

    Nice video! Well explained and fun. Thanks

  • @damiaoribeiro620
    @damiaoribeiro620 10 months ago

    This is an incredible video. Thank you for creating this content

  • @AcesAndNates
    @AcesAndNates 10 months ago +1

    Subbed. If you can work some poker or game theory math, that would be super interesting.

  • @Paragonatore1670
    @Paragonatore1670 10 months ago

    Hi,
    Is there any good book that you would recommend for a better understanding of statistical modelling? I had a course in statistics during my bachelor's and I've always used some of these tools in other courses. But this explanation seems to go one step further. Is it possible to find literature explaining these concepts in more detail? Thank you for the video, subscribed

    • @very-normal
      @very-normal 10 months ago +1

      Sure, I can try recommending something. What kind of problems do you usually work with?
      Generally I’d say “Statistical Inference” by Casella and Berger since it has solutions to help you check your understanding. I’ve also read a bit of “All of Statistics” by Larry Wasserman, though I have less experience with it.
      Thanks for watching and subscribing!

  • @zohaibbaloch1566
    @zohaibbaloch1566 8 months ago

    Make a detailed video on degrees of freedom....
    Please

    • @very-normal
      @very-normal 8 months ago +1

      lol I’ve had a script gathering dust for degrees of freedom because I’m still trying to make it make sense, but I’m working on it!

  • @lavandolouca6630
    @lavandolouca6630 10 months ago

    7:51
    The mean is going to zero. It just needs more samples, no?

    • @very-normal
      @very-normal 10 months ago +1

      You’re right, it is going to zero, but outliers in Cauchy variables are common enough that this probably won’t happen. The sample size is already at 10,000, so it’s pretty large already
      The Law of Large Numbers more technically requires that the deviation away from the population mean stay within some arbitrary range with infinite sample size. Occasional outliers will push this deviation out of this range

    • @lavandolouca6630
      @lavandolouca6630 10 months ago

      @very-normal the devil lives in your assumptions of what is large
      7 billion people in the world and there is data about each one. How about that for a sample?

  • @TheOneMaddin
    @TheOneMaddin 10 months ago +2

    Now I know all the most important skills ... but I still don't have them (pun on 0:14)

  • @martinsanchez-hw4fi
    @martinsanchez-hw4fi 10 months ago

    Bei good video! What tool do you use to make the animations?

    • @very-normal
      @very-normal 10 months ago

      Thanks! For mathematical and notation animation, I use the manim Python library. And for most others, I use key framing in Final Cut Pro.

  • @brainboyben
    @brainboyben 10 months ago

    Great video. I love it when people break down and operationalise the statistical process of collecting and testing empirical data. I'll be sharing this with future cognitive science students when supervising their thesis projects.

  • @rowanrobinson
    @rowanrobinson 10 months ago +3

    I remember when I was looking at Monte Carlo risk assessments I could just re-run the model to get the result I wanted. It was such a flawed way of looking at risk.

    • @whatyouwantyouare
      @whatyouwantyouare 6 months ago +1

      Perhaps you know this already, but just for clarity: yes, that would be deceptive. The way to fix that is to run the simulation many times and construct a whole probability distribution of the risk (from which you could take simplified measures like mean and variance). Using one simulation to draw conclusions is like using one point from a probability distribution.

  • @adammontgomery7980
    @adammontgomery7980 6 months ago

    When you say "model", do you mean the type of statistical distribution that you're drawing from? It took me a little while to realize that the randomness we need to generate for the simulations shouldn't always be a normal distribution, and we have to generate samples for all sources of variation in the "model".

  • @FocusProj
    @FocusProj 10 months ago

    Thank you for putting things in English. Now I will binge watch all your videos 😅

  • @SILOETTE100page
    @SILOETTE100page 10 months ago

    I'm just here for the excellent memes. thank you

  • @paaabl0.
    @paaabl0. 10 months ago

    Very good video! You kinda slid over the topic of the power of a model....

    • @very-normal
      @very-normal 10 months ago

      Yeahhhhh I coulda spent more time on the intermediate section. But! The power calc deserves its own video, so that’s a future thing

  • @MagicAndReason
    @MagicAndReason 10 months ago

    I hope you'll please consider using a thicker font on some of your labels. They are very difficult to read.

    • @very-normal
      @very-normal 10 months ago +1

      Thanks for your feedback! I’ll look at some different fonts for future videos

  • @RezaArdhiansyah
    @RezaArdhiansyah 10 months ago +3

    I only know how to calculate mean, median, mode. my brain hurts. help.

    • @very-normal
      @very-normal 10 months ago +2

      Don’t worry, I know graduate students who don’t understand this stuff even after a quarter of learning. It takes time my dude

  • @toxickremedy
    @toxickremedy 10 months ago +1

    On average you need it. Decided to watch whole videos when I heard that lol

  • @niftkislay
    @niftkislay 10 months ago

    Shouldn't the argument inside the rnorm or rcauchy command be n instead of the 1 that it has? Great video, by the way

    • @very-normal
      @very-normal 10 months ago

      The 1 lines up with the “n” argument in the function, so it ends up running the same. Good catch tho, I’ll try to write it out explicitly next time

  • @AutoDisheep
    @AutoDisheep 10 months ago

    Hi Christian, do you use measure theory at all?

    • @very-normal
      @very-normal 10 months ago +1

      Nah, I didn’t need to take the grad level probability classes at my university, so I’m only aware that it exists lol

    • @AutoDisheep
      @AutoDisheep 10 months ago

      @very-normal Thank you; great video. I didn't even learn Monte Carlo sims, because I thought it was not introduced as a way to build statistical models. I was happy with this "skills in statistics" video. Keep it up please

  • @johnchessant3012
    @johnchessant3012 10 months ago +1

    great video

  • @Unaimend
    @Unaimend 10 months ago

    Damn this video really blew up. Nice that you start to get more views

  • @paulhax
    @paulhax 10 months ago

    I got lost midway through. Might help if you kept up with the concrete examples for each concept, like you did with the normal/Cauchy plots. Liked your video's style.

    • @very-normal
      @very-normal 10 months ago +1

      Thanks for the feedback! I felt there was something more I could do with it, but I didn’t want to dwell on it, I’ll have a better viz for it next time

  • @Med1e-ro4jb
    @Med1e-ro4jb 10 months ago +1

    We want the MCMC

  • @Hescar1
    @Hescar1 10 months ago

    I'm here because of a YouTube suggestion. I was looking for information about Monte Carlo simulations to adjust predictions of cycle time in software development. It'd be great if you could make a video about applying Monte Carlo simulations to solve problems like mine. Someone told me that it's better than taking the "average", but I don't really know why it's better

    • @very-normal
      @very-normal  10 months ago

      I’m not really familiar with software dev, but I can take a crack at an answer.
      My best guess is that software deliverables have several steps that need to be done before they’re completed. Each of these steps takes time, but you don’t know how much ahead of time because that’s how coding is. A Monte Carlo approach here might be to give each “task” a distribution on the time it’ll take, like a Poisson. Then, the sum over all these “time distributions” gives you the total time needed to finish a project.
      By replicating this many times, you can get a sense of best- and worst-case scenarios (ie the variance of the total time), in addition to average behavior. This gives you more information to plan from, compared to a simple average of past times you’ve had to complete tasks
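
      A minimal Python sketch of the approach described in this reply, using only the standard library. The task list and the mean durations are hypothetical numbers picked for illustration, and the Poisson choice simply follows the suggestion above; real task times may need a different distribution.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) value using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_project_days(task_rates, n_runs=10_000, seed=0):
    """Simulate the total project time n_runs times, one Poisson draw per task."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        totals.append(sum(sample_poisson(rate, rng) for rate in task_rates))
    return totals


# Hypothetical mean durations in days for four tasks (assumed, not from the thread).
task_rates = [2.0, 5.0, 3.0, 8.0]
totals = sorted(simulate_project_days(task_rates))

mean_days = statistics.mean(totals)
p05 = totals[int(0.05 * len(totals))]  # optimistic scenario
p95 = totals[int(0.95 * len(totals))]  # pessimistic scenario
print(f"mean: {mean_days:.1f} days, 5th percentile: {p05}, 95th percentile: {p95}")
```

      The gap between the 5th and 95th percentiles is the extra planning information that a single average of past cycle times would not show.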

  • @astarothgr
    @astarothgr 10 months ago

    Bruh, all well and good, but the code lines not lining up with the *line numbers* on the terminal triggered me.

    • @very-normal
      @very-normal  10 months ago

      Man, it kills me too 💀 I love that the manim library can animate code typing for me, but I can’t get it to line them up. It’s better than watching me type with typos tho, that’s for sure

  • @meowmix0008
    @meowmix0008 7 months ago

    1:30s in and so many buzzwords are used

  • @tonimorton
    @tonimorton 10 months ago

    my brain hurts.

    • @very-normal
      @very-normal  10 months ago +1

      You and me both my friend

  • @colossalsteve
    @colossalsteve 10 months ago

    I feel like I just don’t get it. I’ve always felt like Monte Carlo is useless. I hoped this video would change my mind, but it didn’t.
    I’m an actuary, have my FCAS designation, worked in stats for insurance for 9 years.
    If I can define the parameters to build my simulated dataset, then my thumb is on the scale. I get to decide the parameters of the data that I generate. If I am the one who builds the simulated data, how is that data an appropriate way to measure the power and bias of a model? Your example of Normal vs Cauchy resonates with me. If the underlying process is actually a different type of random than the type I choose to build for my simulation, then any conclusions drawn from the simulated data are unreliable.
    If I am pricing auto insurance, and I could use Monte Carlo to just decide how often different hypothetical drivers have hypothetical car accidents, then I would have a superpower. But if I decide what hypothetical drivers and accidents to create, then my simulation is a reflection of my worldview. Am I completely misunderstanding this?

    • @very-normal
      @very-normal  10 months ago +1

      Hey, thanks for watching! I’m sorry it didn’t fully resolve the questions you had. Based on what you said, it sounds like you have extensive statistical experience.
      I don’t think I can give you a fully satisfactory answer to your questions, but I’ll try! For the case of power, I can simulate datasets with a given treatment effect, null or not, and perform hypothesis tests with some model of choice. It’s only really “appropriate” when the real world matches with the data I simulated, which is practically impossible. But, for a biostatistician, it at least helps plan how many people are needed for a new trial to be successful. It’s more a planning tool than anything.
      You’re right that the simulation is really a reflection of how you think the world is. Any results you get from doing simulation are only applicable to whatever parameters you used to generate it, so Monte Carlo is pretty limited in that respect. As in the Normal vs Cauchy example you mentioned, you can try to see how a model’s performance is influenced by deviations from ideal conditions. I wouldn’t say that the results would be unreliable, but rather a reflection of the fact that the model can be negatively influenced by misspecification.
      On a side note, do you have a textbook you like for actuarial statistics? It’s a totally new field to me, and you’re the first actuary I’ve encountered in my life lol
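
      The power-planning use mentioned in this reply can be sketched roughly like this in Python. Everything numeric here is an assumption for illustration: a hypothetical treatment effect of 0.5, a known standard deviation of 1, normally distributed outcomes, and a two-sided two-sample z-test standing in for the "model of choice".

```python
import math
import random
from statistics import NormalDist


def simulate_power(effect, sd, n_per_arm, alpha=0.05, n_sims=2_000, seed=1):
    """Estimate the rejection rate of a two-sided two-sample z-test (sd assumed known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_arm)  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        control = sum(rng.gauss(0.0, sd) for _ in range(n_per_arm)) / n_per_arm
        treated = sum(rng.gauss(effect, sd) for _ in range(n_per_arm)) / n_per_arm
        if abs((treated - control) / se) > z_crit:
            rejections += 1
    return rejections / n_sims


# Power under the assumed effect for a few candidate trial sizes.
power_by_n = {n: simulate_power(effect=0.5, sd=1.0, n_per_arm=n) for n in (20, 64, 100)}

# With no treatment effect, the rejection rate should sit near alpha.
false_positive_rate = simulate_power(effect=0.0, sd=1.0, n_per_arm=64)

for n, power in power_by_n.items():
    print(f"n per arm = {n}: estimated power = {power:.2f}")
print(f"rejection rate under the null: {false_positive_rate:.3f}")
```

      As the thread points out, these numbers only describe the world that was simulated. Swapping `rng.gauss` for a heavier-tailed generator is one way to see how much the answer depends on that choice.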

    • @colossalsteve
      @colossalsteve 10 months ago

      @@very-normal thanks for the thoughtful reply!
      I will concede any point in the bio stats domain. Maybe that’s the difference. If you and I need different results, then we can use different tools, and it’s about using the right tool for the job. I can appreciate the massive task there is to build a power test for bio stats, and that it would be cost prohibitive to do that all with organic tests. Thank you for sharing some knowledge about it :)
      If you want a nice intro to actuarial work (at least the branch that I know best!) I would point you towards the Casualty Actuarial Society (CAS) Intro to Ratemaking textbook by Werner and Modlin. It’s a free pdf available online. If the first chapter seems interesting, then you might enjoy more about the actuarial worldview! If the first chapter isn’t for you, then feel free to leave it alone. We are stats people who really dig into insurance problems.

  • @MatheusC1729
    @MatheusC1729 10 months ago

    I'm a mathematician, so I know all that, but it is very helpful and I'll definitely recommend to friends

  • @MKhan-zo8xo
    @MKhan-zo8xo 10 months ago +2

    LETS GOOOOO THANK YOU

  • @TheThreatenedSwan
    @TheThreatenedSwan 10 months ago

    Hi I'm normal

    • @very-normal
      @very-normal  10 months ago +1

      but are you very normal

  • @ShrK_123
    @ShrK_123 7 months ago

    ❤❤❤❤

  • @TheZectorian
    @TheZectorian 10 months ago +1

    FIRST!!!

  • @kumardigvijaymishra5945
    @kumardigvijaymishra5945 10 months ago +1

    'Professors are at leading edge of research.' - That's a generalization. Most of the research and writing work is done by multiple PhD and post-docs for the professor. Becoming a professor is more like a political achievement nowadays, rather than an academic one.

  • @a.gholiha6884
    @a.gholiha6884 6 months ago

    Terrible, and I have over 3 years of research-level stat studies

  • @santiagonoaccorosende8421
    @santiagonoaccorosende8421 8 months ago

    Spotted many errors in this video, unfortunately. The worst one is not understanding the Law of Large Numbers: it doesn't refer to a specific statistic like the mean, but to the asymptotic distribution of a sum of random variables.

    • @very-normal
      @very-normal  8 months ago

      what other errors were there?

  • @bobthebuilder9416
    @bobthebuilder9416 10 months ago

    AI will replace every statistician on earth. Can I get credit for being one of the people pointing this out as soon as I heard about AI a year ago? Yes, I know so many did, but I just want credit because so many didn't understand the impact AI would have, and since I work in statistics, I immediately knew...

    • @very-normal
      @very-normal  10 months ago +3

      I’ll give you credit my dude

    • @incertosage
      @incertosage 10 months ago

      I thought AI needs statistics

    • @bobthebuilder9416
      @bobthebuilder9416 10 months ago

      @@very-normal thank you bro

  • @Abdulboatengi
    @Abdulboatengi 10 months ago

    What I find pathetic about these self-proclaimed geniuses is that they are always smarter than their boss but lack risk. Pathetic

    • @Yuvraj.
      @Yuvraj. 10 months ago

      Results and intelligence are not the same and you would do well to be able to hold them separately in your mind

    • @Abdulboatengi
      @Abdulboatengi 10 months ago

      @@Yuvraj. when society breeds poor mentalities so that politicians can divide and conquer- it creates reality and illusions. When the richest men on earth don’t have college degrees and have people beneath them that know rocket science but can’t build rockets- there is something with the world

  • @samowarow
    @samowarow 10 months ago

    How can you tell exactly how many samples are needed to achieve a desired accuracy in estimating the mean?

    • @very-normal
      @very-normal  10 months ago

      One way you could try is to apply the Central Limit Theorem. You can decide how far away you can tolerate having your sample mean be from the unknown population mean, which will give you a “width” you want. Since more data will shrink the variance of the sample mean, you can solve for a sample size that gives you the width (ie distance from population mean) you want.
      That being said, this won’t ever tell you exactly what the true population mean is, but it gives you a rough guess
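
      The calculation described in this reply can be written out as a short Python sketch. It assumes you have a guess for the population standard deviation `sd` (in practice this is often a pilot estimate) and that the CLT normal approximation is reasonable; the function name and the example numbers are made up for illustration. Solving z * sd / sqrt(n) <= margin for n gives n >= (z * sd / margin)^2.

```python
import math
from statistics import NormalDist


def required_sample_size(sd, margin, confidence=0.95):
    """Smallest n with z * sd / sqrt(n) <= margin under the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical inputs: sd guessed at 10, sample mean wanted within 1 unit of the truth.
n = required_sample_size(sd=10.0, margin=1.0)
print(n)  # 385

# Halving the margin roughly quadruples the required sample size.
n_tighter = required_sample_size(sd=10.0, margin=0.5)
print(n_tighter)  # 1537
```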

    • @samowarow
      @samowarow 10 months ago

      @@very-normal This doesn't answer the question, though. Such a bound is random because the sample mean is random

    • @very-normal
      @very-normal  10 months ago

      You’re right that the sample mean is random, but you can control how much it will vary around the population mean with greater sample size. It will still vary based on different data, but the variance you solve for will keep it close to the population mean with high probability
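
      A quick simulation can make the "high probability" part concrete. This is only a sketch under assumed values: normally distributed data with a true mean of 50 and sd of 10, and n = 385, which is what n = (z * sd / margin)^2 gives for a margin of 1 at 95% confidence. The function name is hypothetical.

```python
import random


def coverage_rate(true_mean, sd, n, margin, n_sims=2_000, seed=2):
    """Fraction of simulated samples whose mean lands within margin of the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= margin:
            hits += 1
    return hits / n_sims


rate = coverage_rate(true_mean=50.0, sd=10.0, n=385, margin=1.0)
print(f"share of sample means within the margin: {rate:.3f}")
```

      Each individual sample mean is still random, but the share that falls inside the margin should come out close to the 95% that was planned for.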