Terence Tao's central limit theorem, Double Factorials (!!) and the Moment Method

  • Published 23 Jul 2024
  • An animated video version of the moment method proof of the central limit theorem, which I first read about in Terence Tao's book “Topics in Random Matrix Theory” (see link to book below). The proof features the double factorial sequence as the bridge linking sums of random variables to Gaussians. According to Wikipedia, this proof method was originally discovered by Chebyshev in the 1800s. en.wikipedia.org/wiki/Method_.... A short written PDF that also contains the ideas of this proof is available here: www.cs.toronto.edu/~yuvalf/CL....
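
    A quick numeric sanity check of the Gaussian moment formula E[Z^(2k)] = (2k-1)!! discussed in the video (a hypothetical sketch, not code from the video; assumes numpy is installed):

        import numpy as np

        rng = np.random.default_rng(0)
        z = rng.standard_normal(10_000_000)  # samples of a standard Gaussian Z

        def double_factorial(m):
            # m!! = m * (m-2) * (m-4) * ... down to 1 or 2
            return 1 if m <= 0 else m * double_factorial(m - 2)

        for k in range(1, 5):
            empirical = np.mean(z ** (2 * k))    # Monte Carlo estimate of E[Z^(2k)]
            exact = double_factorial(2 * k - 1)  # (2k-1)!!
            print(f"E[Z^{2*k}] ~ {empirical:.3f}  vs  (2k-1)!! = {exact}")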
    ==================================================
    FOLLOW UP QUESTIONS
    ==================================================
    Possible questions to answer in follow-up videos. Let me know in the comments if you would like to see any of these!
    Q: Why is this in a random matrix book? A: There is a similar proof for something called the semi-circle law for random matrices. If you understand this proof, you can understand that one quite easily!
    Q: Why is a sum of Gaussians still a Gaussian? A: There is a cute way to see this using only moments! It is a cute counting argument with pair partitions which are labeled with some extra labels.
    Q: How is this related to the characteristic function/Fourier transform? A: The moments can be added together in a generating series to get things like the characteristic function.
    ============================================
    LINKS
    ============================================
    Link to Terence Tao’s book (from his terrytao blog). The moment method proof is section 2.2.3 starting on page 106.
    terrytao.files.wordpress.com/...
    Short pdf that includes the main proof ideas
    www.cs.toronto.edu/~yuvalf/CL....
    3Blue1Brown Central Limit Theorem playlist:
    • Central limit theorem
    But what is the Central Limit Theorem?
    • But what is the Centra...
    Why π is in the normal distribution (beyond integral tricks)
    • Why π is in the normal...
    Convolutions | Why X+Y in probability is a beautiful mess
    • Convolutions | Why X+Y...
    Why is the "central limit" a normal distribution?
    • A pretty reason why Ga...
    Image of Terence Tao is a public domain image from wikipedia: commons.wikimedia.org/wiki/Fi...
    A note I referenced on fair use of book covers psu.libanswers.com/faq/336502
    ==================================
    Info on how the video was made
    ==================================
    All animations were made in manim www.manim.community/ and I also used manim-voiceover for the voiceover voiceover.manim.community/en/.... Unfortunately, this took a really long time to do! One way I am hoping to make things a bit faster is to record the video as a slideshow using manim-slides www.manim.community/plugin/ma... ... I might try to do this on my next video so it doesn't take quite so long to record everything!
    All my manim code is available at github.com/mcnica89/manim if you want to copy/paste something out :)
    ===============
    Timestamps
    ===============
    0:00 Terence Tao
    0:38 What is the CLT
    2:40 But why?
    3:30 Double factorials
    4:30 Gaussian moments
    5:52 Moments of sums
    6:29 Pair partitions
    9:00 Two main results
    10:08 Part 1-Gaussian moments
    13:08 Solving the recurrence relation
    15:35 Part 2-Moments of sums
    17:17 k equals 1
    18:00 k equals 2
    21:47 k equals 3
    25:42 k equals 4
    29:41 Recap and overview
    #3blue1brown #some #some3 #probability #Gaussian #math #integration #integrationbyparts #combinatorics

COMMENTS • 187

  • @diribigal
    @diribigal 11 months ago +63

    I made it. Also, I was all ready to ask about Moment Convergence but you addressed it very well at the end. Great video showing the beautiful recursion here!

    • @mcnica89
      @mcnica89  11 months ago +5

      Congrats! 🎉 I think you are the first one to actually leave a comment :)

    • @diribigal
      @diribigal 11 months ago

      @@mcnica89 Actually, @mellifluousb.4549 was first.

  • @diffusegd
    @diffusegd 11 months ago +63

    If you're wondering about the "random matrix" theory mentioned somewhere:
    Normally when we consider random variables, they are real/complex valued. In particular, they are *commutative*: XY is distributed the same as YX.
    In the case of random matrices, since matrices themselves aren't commutative, random matrices aren't either. This causes problems with the usual independence properties. For example, for two Gaussian Unitary Ensembles (a common random matrix model), E(XYXY) != E(XXYY)
    (I just used the expectation of a random matrix here. This is defined via the eigenvalues: the expected average of the eigenvalues, i.e. the expected normalized trace.)
    It turns out that when you have non-commutative random variables, there still exists a central limit theorem!
    But the limit is not a normal distribution. It is a semicircular distribution!
    The proof follows the same way as the video, except where we had pair partitions, we instead count only the *non-crossing* pair partitions. It turns out the number of non-crossing pair partitions is given by the Catalan numbers, which also happen to be the moments of the semicircle distribution!
    And lastly, remember the Gaussian ensembles from before? It turns out that as the matrix size goes to infinity, their eigenvalues are semicircle distributed.

    • @mcnica89
      @mcnica89  11 months ago +18

      Yes that's right! You have basically described the followup video I'm planning to make. This current CLT video is a helpful warmup to that proof :) Don't forget to subscribe if you are interested in seeing that one next!
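
      A minimal numeric sketch of the semicircle claim in the comment above (hypothetical code, assuming numpy; the GUE construction here is one common convention):

          import numpy as np

          rng = np.random.default_rng(0)
          n = 2000
          # A GUE matrix: Hermitian with complex Gaussian entries
          g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
          h = (g + g.conj().T) / 2
          eig = np.linalg.eigvalsh(h) / np.sqrt(n)  # rescaled spectrum lies in ~[-2, 2]

          # Even moments of the empirical spectral distribution vs Catalan numbers,
          # which count non-crossing pairings (cf. (2k-1)!! counting all pairings)
          for k, catalan in zip(range(1, 5), [1, 2, 5, 14]):
              print(f"m_{2*k} ~ {np.mean(eig ** (2 * k)):.3f}   C_{k} = {catalan}")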

  • @kikivoorburg
    @kikivoorburg 11 months ago +21

    I made it through the proofs! I really appreciated that you laid out the direction we were going before actually proving each step - giving an overview beforehand makes it much clearer what we’re doing and why. Great job!

    • @mcnica89
      @mcnica89  11 months ago +1

      Thanks! Glad you liked it :)

  • @rnengy
    @rnengy 11 months ago +7

    I made it through to the end! As a physicist, the connection between pairing objects and Gaussian moments reminds me of Wick’s theorem. I had heard that quantum field theory and random matrix theory were related, and this just looks like the start of the rabbit hole - excited to see where this leads in future videos!

    • @mcnica89
      @mcnica89  11 months ago +2

      Thanks! Yes, this moment thing is closely related to the Wick formula for Gaussians. (And it is also related to the cumulant expansion and the fact that Gaussians have all cumulants zero except the mean and variance.)

  • @Qsdd0
    @Qsdd0 11 months ago +13

    This is pretty well-suited for someone at an undergrad level who has taken a solid probability course and remembers their partial integration. You sweep a little bit under the rug with the latter but I think we all survive. Being just a teensy bit more clear about E(X^0)=1 (that this is just the total probability) might be good for that level. Also maybe some comments about how we may reduce to the case of mean 0 and variance 1 would be good.

    • @mcnica89
      @mcnica89  11 months ago +1

      Thanks for the comment! It's good to know what people think about the level...it's hard to estimate this sometimes because you so easily forget what was easy/hard. (For example, I would have guessed that E[X^0] = E[1] = 1 would be no problem, but actually I see now that this can be tricky if you don't have a lot of practice with E, as you mentioned.)
      I did originally plan to do it for general mean and variance, but I found it was mostly just annoying and didn't add much to the deep understanding of the proof, so I cut it for time. (Of course, the actual usefulness of the CLT is that it works for any fixed mean and variance!) Thanks again for the feedback!

  • @mcnica89
    @mcnica89  11 months ago +23

    I have a few ideas for follow-up videos related to this video (subscribe if you are interested!). Let me know which of these three questions you would most like to see an answer to (or let me know if you have other questions that would make a good follow up!)
    Q1: Why is this in a random matrix book? A: There is a similar proof for something called the semi-circle law for random matrices. If you understand this proof, you can understand that one quite easily! This topic is a great introduction into the wonderful world of random matrices.
    Q2: Why is a sum of Gaussians still a Gaussian? A: There is a cute way to see this using only moments! It is a cute counting argument with pair partitions which are labeled with some extra labels.
    Q3: How is this related to the characteristic function/Fourier transform? A: The moments can be added together in a generating function to get things like the characteristic function.

    • @diribigal
      @diribigal 11 months ago +5

      Q1>Q3>>>Q2. Q2 sounds like a fun exercise for people who liked this video.

    • @acerbic-piglet
      @acerbic-piglet 11 months ago +1

      @@diribigal I endorse this ranking.

    • @tomkerruish2982
      @tomkerruish2982 11 months ago +1

      I favor Q1, but that's because I took several courses from Craig Tracy when I was at Davis.

    • @Mr0rris0
      @Mr0rris0 11 months ago

      Number 3 for sure
      Because I used to shoe horses
      Plus Renmedallion of stock market fame.
      Like what are they building in there? With the shapes on Euler's day off and such

    • @theodorevassilakis2061
      @theodorevassilakis2061 10 months ago

      Q2->Q3->Q1. Also would love to see anything you can say about quantitative results like Berry-Esseen

  • @surojitbiswas3573
    @surojitbiswas3573 10 months ago +2

    ONE OF THE BEST VIDEOS ON THE CLT. NICELY EXPLAINED, THANK YOU.

    • @mcnica89
      @mcnica89  10 months ago +1

      YOU ARE VERY WELCOME :)

  • @thefunpolice
    @thefunpolice 10 months ago +1

    A nice exposition. Made it to the end and enjoyed every moment.
    The CLT completely blew me away when I first heard of it; it made statistical analysis, which I had foolishly disregarded as a bit wishy-washy, suddenly seem beautiful and curious.
    From memory there's a nice description of the convolution proof of the CLT in Brad Osgood's excellent lectures on Fourier Methods (available on YouTube) for anyone who is interested in seeing that.

  • @joshdavis5224
    @joshdavis5224 11 months ago +2

    As a mathematician who is also currently taking mathematical probability and probability, this video is just what I have been needing! Thank you, good sir.

    • @mcnica89
      @mcnica89  11 months ago

      Happy to hear you were inspired! Probability truly has some of the nicest math problems and neat tricks I've ever seen :)

    • @jamesfullwood7788
      @jamesfullwood7788 10 months ago

      What's the difference between mathematical probability and probability? Perhaps you meant mathematical statistics and probability?

    • @joshdavis5224
      @joshdavis5224 10 months ago

      @@jamesfullwood7788 just look up the courses STAT 418 and STAT 581 from Rice University if you would like to know the difference

  • @columbus8myhw
    @columbus8myhw 11 months ago +1

    I really like how you "cut through the curtain" at the start and end and show us a more rough presentation. It allows for more off-the-cuff remarks that don't necessarily "fit" into a tightly scripted video.

  • @metcalfal
    @metcalfal 10 months ago +1

    Great video. Thanks for making it so clear.

    • @mcnica89
      @mcnica89  10 months ago

      You're very welcome!

  • @sdsa007
    @sdsa007 11 months ago +1

    Unlike convolutions, the most striking aspects of this video for me are the pair partitions and singletons and their notation for proving the central limit theorem.... there are a lot of useful math tricks... I am going to watch it again to learn them all! Thank you!

    • @mcnica89
      @mcnica89  11 months ago

      You're very welcome! Sorry if it was too fast the first time around...I definitely encourage you to watch the parts that felt too fast again to try and absorb it all :)

  • @jamesfullwood7788
    @jamesfullwood7788 10 months ago +1

    Amazing video!!! Can't wait to use this proof of the central limit theorem in my mathematical statistics course next semester!!!

    • @mcnica89
      @mcnica89  10 months ago

      Thank you so much! I'm also excited for your course. Actually the exercise to show that the sum of two Gaussians is still Gaussian could be a good one for your class too...it's very cute. Although I guess it's quite combinatorics-y so they might not like it. I can share a solution with you if you are interested.

    • @jamesfullwood7788
      @jamesfullwood7788 10 months ago

      I actually embrace any connection between probability theory and combinatorics I can find, so please feel free to share! @@mcnica89

  • @Mutual_Information
    @Mutual_Information 9 months ago +1

    4:50 whoa.. I never knew this!
    And btw, just discovered your channel - it's top tier! Instant subscribe

    • @mcnica89
      @mcnica89  9 months ago

      Thanks!!! I love your channel too! I've used your videos a few times before in courses I've taught (both for a data science class and a course on RL). It'd be fun to collaborate on something sometime if you are interested:)

    • @Mutual_Information
      @Mutual_Information 9 months ago

      @@mcnica89 Yea I think that's a fine idea. Seems like we have data science and RL concepts in common, so we should be able to find something mutually interesting. I have a project that'll keep me busy for the next 4 weeks or so but after that I'll have room

  • @rafaelschipiura9865
    @rafaelschipiura9865 11 months ago +4

    Very nice video, it was exactly what I was looking for to go with my prob course. You talk about calculating probabilities of series in the end, but in practice for any n > 4 the normal distribution is already pretty good, so it can be used for most summations, not just for a series. Talk about a small infinity. So I would love to see a video not about taking n to infinity, but about how fast the two random variables approach each other.

    • @mcnica89
      @mcnica89  11 months ago +2

      Yes that's right! You can use this method to see how big the error is for any finite n, and indeed it can be already quite small for not-too-big n values.

    • @rafaelschipiura9865
      @rafaelschipiura9865 11 months ago +1

      @@mcnica89 I know it's right, I had to write three demonstrations of this fact in different courses for my Stats major. 🤣 But it sounds way too good to be true to get an actual method to check if the Normal approximation is good or not, thanks!
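
      A small simulation of that convergence speed (a hypothetical sketch, assuming numpy and scipy; the measured distance shrinks roughly like 1/sqrt(n), in the spirit of Berry-Esseen):

          import numpy as np
          from scipy.stats import norm

          rng = np.random.default_rng(0)
          for n in [4, 16, 64, 256]:
              # normalized sums of n Uniform(0,1) variables (mean 1/2, variance 1/12)
              s = (rng.random((50_000, n)).sum(axis=1) - n / 2) / np.sqrt(n / 12)
              s.sort()
              ecdf = np.arange(1, len(s) + 1) / len(s)
              print(n, np.max(np.abs(ecdf - norm.cdf(s))))  # max gap vs Gaussian CDF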

  • @davids9856
    @davids9856 10 months ago +2

    Nice Proof. I think the result that the k-th moment of the Sum of Variables is the number of pairings is interesting in its own right. That was an eye opener.

    • @mcnica89
      @mcnica89  10 months ago

      I agree! It's a bit harder to motivate on its own though without the Gaussian connection.

  • @stanleydodds9
    @stanleydodds9 6 months ago +1

    The moment convergence proof of CLT (or proof outline, with rough details of moment convergence implying convergence in distribution) is what we were shown in school, probably because as mentioned it's the most elementary way to prove it, so it was easy to get through in one lesson. I'm sure I also saw further proofs of CLT at university (I know there are some ways to relax the conditions needed for the theorem to apply to a distribution), but I've definitely forgotten any details about those by now, so the moment convergence is always my go-to way to demonstrate it.

    • @mcnica89
      @mcnica89  6 months ago

      Cool! What does "in school" mean here? In university?

    • @stanleydodds9
      @stanleydodds9 6 months ago

      @@mcnica89 I learned it in year 12 or year 13 (6th form college) in the UK, before university. Not sure how it translates to other education systems - I think it's roughly the same as 12th grade in the US.

  • @joshuacoleman6122
    @joshuacoleman6122 11 months ago +3

    Will watch this again

  • @erictao8396
    @erictao8396 10 months ago +1

    This is a beautiful video!

    • @mcnica89
      @mcnica89  10 months ago

      Thanks so much! I spent a lot of time trying to get the partition diagrams to look good :)

  • @Prof.LamMath
    @Prof.LamMath 10 months ago +1

    Thank you, nice presentation

    • @mcnica89
      @mcnica89  10 months ago

      Glad you liked it!

  • @MatthewBouyack
    @MatthewBouyack 11 months ago +1

    Made it to the end! Excellent video!

  • @theodorevassilakis2061
    @theodorevassilakis2061 10 months ago +1

    Made it to the end and really appreciated the high quality presentation. I’d be interested to see your list of most impactful applications of CLT, whether in data science / computing or other areas of math.

    • @mcnica89
      @mcnica89  10 months ago +1

      Thanks! Probably the most common application is for statistical tests of the sampling distribution in stats. I can try to collect some interesting applications though, that'd be fun!

    • @theodorevassilakis2061
      @theodorevassilakis2061 10 months ago

      I look forward to seeing anything you decide to do in this category.
      Personally I'd be interested to see more examples of how the theoretical framework maps to practice. Often textbook applications tend to wave their hands a bit, especially since most probability spaces in practice are finite and don't actually admit, for instance, an actual infinite sequence of IID RVs as the statement of the theorem requires. Statisticians will often say something like "well, just repeat the experiment infinitely many times" but that tends to elide what the actual probability space being used is, etc.

  • @patrickt.4121
    @patrickt.4121 11 months ago +1

    The best video on the Gaushian distribution! 😁

    • @mcnica89
      @mcnica89  11 months ago +1

      Wow, thank you! EDIT: check out 31:00 if you haven't yet

  • @jacoboribilik3253
    @jacoboribilik3253 11 months ago +2

    Great video. I like that this CLT version is valid for sequences of independent r.v.s that are not necessarily identically distributed. One would generally prove Lindeberg's or Lyapunov's versions to deal with non-identically distributed r.v.s.

    • @mcnica89
      @mcnica89  11 months ago

      Yes this is one nice benefit! Great observation

  • @colinbackhurst1735
    @colinbackhurst1735 10 months ago +1

    I'm here as a SoME3 reviewer. An interesting subject and a good video. Oh and I made it to the end of the proof ;-)

    • @mcnica89
      @mcnica89  10 months ago

      Thanks! Congrats on making it through!!!!

  • @fibbooo1123
    @fibbooo1123 11 months ago +1

    Well done!

  • @mellifluousb.4549
    @mellifluousb.4549 11 months ago +2

    Congratulations on the video, I would say Q1= Q3 > Q2

  • @monishrules6580
    @monishrules6580 6 months ago +1

    Lovely video ❤❤

    • @mcnica89
      @mcnica89  6 months ago

      You are so kind. Thank you!

  • @mikeoffthebox
    @mikeoffthebox 11 months ago +1

    Very nice explanation!

    • @mcnica89
      @mcnica89  11 months ago

      Glad it was helpful!

  • @censoredamerican3331
    @censoredamerican3331 11 months ago +2

    I loved it

  • @johnlovesmath
    @johnlovesmath 9 months ago

    Great vid.

  • @holmavik6756
    @holmavik6756 7 months ago

    This is quite straightforward and neat, but the difficulties start to appear when we have to use a sample mean and/or the sample stdev to standardize ("studentize") the parent variable.

  • @acerbic-piglet
    @acerbic-piglet 11 months ago +3

    Lovely video - I did make it to the end. Would you have any recommendations for a book for someone looking to learn more probability theory who has knowledge similar to that of a first-year graduate student in pure mathematics? Though admittedly little in probability...
    I'm familiar with basics of real/complex analysis, as well as generating functions. I am better at algebra (algebraic topology, basic algebraic geometry, some lie theory) and have only interfaced with probability theory as a user in theoretical computer science. I might check out Terry Tao's book.
    Either way, thanks for putting this together.

    • @mcnica89
      @mcnica89  11 months ago +1

      Thanks for the kind words! I'm glad you liked the video :)
      At that level, there are two books I liked
      The first was "a first look at rigorous probability" by Rosenthal (or is the title something similar to that sentence but not exactly that?). This book is a nice gentle approach to everything.
      The second book is "Probability with Martingales" by Williams. The Martingales book presents things in an unusual sequence focusing on Martingales and how these can be used. My SoME2 video had one of these in it ua-cam.com/video/t8xqMxlZz9Y/v-deo.html . If you like that kind of trick, that book might be a fun one for you!

    • @acerbic-piglet
      @acerbic-piglet 11 months ago

      @@mcnica89 Thanks! I will check these out.

  • @kdpwil
    @kdpwil 11 months ago +1

    Great video! My only comment is that @27:44 in the video you state that the nE[x_i^4] term goes to zero when you normalize by square root of n. I didn't catch that when you normalize by the square root that you simultaneously raise the square root to the fourth power in the denominator. I had the impression that you were saying that the limit as n goes to infinity of the quantity nE[x_i^4]/sqrt(n) was zero when it clearly goes to infinity. You very nicely corrected this impression at @29:09.

    • @mcnica89
      @mcnica89  11 months ago

      Great feedback, thanks for the input! I was trying to strike a balance of giving intuition about what each term means as we go and also doing the actually detailed calculation...I can see exactly how the initial confusion arises. It probably would have been good to say something like "the factor of n goes to infinity but too slowly" or something similar
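
      A numeric illustration of that normalization (a hypothetical sketch, assuming numpy): after dividing by (sqrt(n))^4 = n^2, the n*E[X_i^4] term contributes only E[X_i^4]/n, which vanishes, and the 4th moment tends to 3 = 3!!:

          import numpy as np

          rng = np.random.default_rng(0)
          n, trials = 100, 100_000
          x = rng.choice([-1.0, 1.0], size=(trials, n))  # fair +/-1 flips: mean 0, variance 1
          s = x.sum(axis=1) / np.sqrt(n)                 # normalized sum

          # Exactly: E[S^4] = (3n(n-1) + n*E[X^4]) / n^2 = 3 - 2/n  ->  3 = 3!!
          print(np.mean(s ** 4), 3 - 2 / n)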

  • @davidwilkie9551
    @davidwilkie9551 9 months ago

    Like the "Gaps between Primes" video, there's multiple associations recognisable from placing functional images of implied action.
    "You just look at it, and you notice..", if you look at Euler's e-Pi-i 1-0-infinity, the transverse trancendental 2-ness picture-plane equivalence of a unit circle, to inside-outside Gaussian distribution looking at potential superimposed Infinity, of Eternity-now Interval collapsed Superspin= i-reflection standing wave modulation holography and potential reciprocation-recirculation of Dirac's number inversion techniques. Ie there's a resonance of pure-math instantaneous motion-potential pattern of sequence development in coordination of logarithmic condensation.., but this is parallel coexistence assessment that requires repetition of reading to make sense of, eventually. (Younger Math-Physics minds required)
    "Moment Convergence", the log-antilog inside-outside instantaneous con-sequence of e-Pi-i Superspin-spiral Superposition Totality of potential omnidirectional-dimensional holographic limit, @.dt line-of-sight = prime-cofactor wave-particle collapse-Convergence->1-0-infinity entangled probability range.
    Because it's "made-of-making" pure-math relative-timing ratio-rates motion all-ways all-at-once sync-duration positioning perspectives.
    Math-Physics WYSIWYG Bose-Einsteinian Quantum-fields Mechanism Condensation.
    The exercise of 2-ness in picture-plane Equations has, as a natural "e" unity-connection state, categorization of infinite possibilities for closed circuit resonance bonding perceived as sense-in-common Electron-photon-phonon-Proton Neutronic substantiation probability, quantization geometrically locked timing-phase chemistry.
    Superposition Totality, the Absolute Zero-infinity Moment of bending, inertia, Centre of Mass-Time, and Mind-Body attention in the Universal Thought Experimentalist's Intuitions of Math-Phys-Chem and Geometrical phase-locked coherence-cohesion harmonic objective-aspects.
    If the ancient world thought of a kind of jelly, then there's the first principles Origin of inside-outside ONE-INFINITY Singularity Conception of real-time Actuality.

  • @Yoseph-ph7hh
    @Yoseph-ph7hh 11 months ago +2

    amazing video!

  • @ti84satact12
    @ti84satact12 11 months ago +2

    What software packages did you use to create this AMAZING video? Math teacher here and I'm looking for software to help create more dynamic videos for my classes!

    • @mcnica89
      @mcnica89  11 months ago +2

      It's all done in manim www.manim.community/ and I also used manim-voiceover for the voiceover voiceover.manim.community/en/stable/. Unfortunately, this took a really long time. One way to make things a bit faster might be to record it as a slideshow using manim-slides www.manim.community/plugin/manim-slides/ ... I might try to do this on my next video so it doesn't take quite so long to record everything!
      PS The code for manim videos can all be found here github.com/mcnica89/manim
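
      For a sense of what manim code looks like, here is a minimal scene (a generic hypothetical example, not taken from the linked repo; assumes the manim community edition and a LaTeX install):

          from manim import Scene, MathTex, Write

          class GaussianMoments(Scene):
              def construct(self):
                  # Write out the key formula from the video
                  formula = MathTex(r"\mathbb{E}[Z^{2k}] = (2k-1)!!")
                  self.play(Write(formula))
                  self.wait()

      You would render it with something like: manim -pql file.py GaussianMoments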

  • @RiRiDingetjes
    @RiRiDingetjes 10 months ago +1

    I made it through, and I was wondering whether there's an answer as to why convergence in distribution doesn't imply moment convergence? Your enthusiasm is well appreciated btw :)

    • @mcnica89
      @mcnica89  10 months ago +2

      Thanks for the comment; glad you appreciated it!! So there is a special case where moment convergence and convergence in distribution are 100% equivalent to each other. That special case is when the possible outputs live in a bounded interval (e.g. if you know that the outputs are always in the range [1,6]). So in essence the only issue is what happens "near infinity"...the equivalence between moment convergence and convergence in distribution breaks for things that have badly behaved "probability tails" that don't go to zero fast enough at infinity. For things like Gaussians which go to zero very quickly it's not a problem! (And things that go to zero at least as fast are called "sub-Gaussian" and are also no problem!) I have some of this stuff planned for a follow-up video based on people's questions so far :)

  • @sl2357
    @sl2357 10 months ago +1

    Beautiful

  • @AnilKumarnn
    @AnilKumarnn 21 days ago +1

    I came from the 100 coin flips video that blew up on twitter.

  • @avi123
    @avi123 11 months ago +3

    25:13
    I think this only works if X has a finite 3rd moment; what happens if the 3rd moment is infinite?

    • @mcnica89
      @mcnica89  11 months ago +4

      Yes this is true! If the moments are not finite you can make things work by "truncation". That is, you replace X with X times Indicator(|X| ≤ M) for a large cutoff M.

  • @romaxromax
    @romaxromax 10 months ago +1

    Great video and an awesome complement to 3blue1brown from a fresh perspective. I wonder if the explanation of the algebra could be simpler using Pascal's triangle for the expansion terms and normalizing them by sqrt(n).

    • @mcnica89
      @mcnica89  10 months ago

      Thanks for the kind words! The Pascal's triangle idea is a great way to visualize things. However, I think it only works when there are two terms (i.e. (a+b)^k). Since we want to take the number of terms n to infinity, we need the more complicated setup with partitions!

  • @AdiBenIsrael
    @AdiBenIsrael 11 months ago

    Thanks for a nice video.

    • @mcnica89
      @mcnica89  11 months ago

      I'm glad you liked it!

  • @bernardoabf
    @bernardoabf 8 months ago

    That was a great presentation! I'm somewhat confused by the convergence discussion. It would seem to me that moment convergence wouldn't be enough, as it's possible for different distributions to have all the same moments (e.g. lognormal distribution). Isn't there an additional step missing in the implication? (For example, a sufficient condition could be that the moment-generating function is well defined, which is evidently true for the Normal distribution) I'm no expert in the subject and I'm asking just for clarification. Thanks!

    • @mcnica89
      @mcnica89  8 months ago +1

      Yes, you need the moments to not grow "too fast" (which is related to the MGF converging). Fortunately in the Gaussian case this is no problem. These details are covered in the book!

  • @phscience797
    @phscience797 10 months ago +1

    Thank you for this video -- I think you did well with explaining the important ideas of this cute proof to people who may not really have a solid background in probability theory yet. For me, this is what math videos do best: sharing nice insights in a non-standard manner. Looking forward to more!
    Not concerning your video, but the topic: I have to say that I am always confused about why many texts and even lecturers present the CLT as this rather arcane result, employing tools like convergence of moments and Lévy continuity (or even Stein functions) which are, while not super advanced, rather unnecessary technical complications for this proof. I wonder why the (in my opinion) "optimal" proof, which is elementary, short, explanatory, and strong, is not more common. (I'm referring to the one where you just swap the variables in X_1 + ... + X_n for Gaussians with the same variance, compose with a smooth function and check that the expectation is what it should be, giving precisely the Lindeberg condition.) Terence Tao dealt with this pet peeve well, I think, opting to use the CLT as a demonstration for some of the techniques he needs later.

    • @mcnica89
      @mcnica89  10 months ago

      Thanks for the detailed comment! Indeed, I also love the swapping trick proof. However, for me, the swapping trick hides too much....like I don't get that feeling of understanding what is special about the Gaussian specifically. This may be because the "magical" part of the Gaussian is kind of assumed as a hypothesis: that the sum of Gaussians is Gaussian, with no proof. Whereas with the moment/combinatorics proof you do get to see the magic more directly, in my opinion (although I agree that the first time you see moments they also feel removed from reality!). And the combinatorics also lets you directly see why the sum of Gaussians is Gaussian. I recorded a video for that but haven't posted it because I plan to make an animated version, but I'll let you see it here in an attempt to convince you why the combinatorics are nice! ua-cam.com/video/8151w69remM/v-deo.html

    • @phscience797
      @phscience797 10 months ago

      @@mcnica89 I see your point. When I first saw the "swapping proof", I actually had to think for like an hour until I noticed where it uses any properties of the normal distribution whatsoever (as you say, it is precisely that the sum of two independent Gaussians is Gaussian). Personally, I was then satisfied with knowing that the normal distribution is (in some sense) the *unique* distribution with this "closed under sum"-property (which the swapping proof of course also establishes). I never thought about where that comes from... Thank you very much for the link, looking forward to your explanation!

  • @Tb5hTbtb
    @Tb5hTbtb 14 days ago

    I really appreciate your great work, thank you! But I have a question: why did you divide by n^0.5 and not by n^0.3 when k = 3 at 25:00?

    • @mcnica89
      @mcnica89  11 days ago

      @@Tb5hTbtb thanks! it's n/(sqrt(n))^3 = 1/sqrt(n) there I think.

  • @Avianable
    @Avianable 10 months ago +1

    What is the background song? it's so mystical!

    • @mcnica89
      @mcnica89  10 months ago +1

      Haha, it was a CC0 background "ambient sci-fi" music track I downloaded from a game-dev assets website. I can dig up the exact link if you are actually interested!

  • @romaxromax
    @romaxromax 10 months ago +1

    Thanks!

    • @mcnica89
      @mcnica89  10 months ago

      Thank you so much! Much appreciated :)

  • @ranshen1486
    @ranshen1486 10 months ago +1

    35:50 I would like to see a more careful justification of the step from moment convergence to convergence in distribution, since different distributions may have identical moments. I guess some middle step like the mgf or cf might be needed to fill the gap here?

    • @mcnica89
      @mcnica89  10 months ago

      Great question! A few people have asked for that kind of thing, so I have roughly planned out another video that would go over some of that :) It basically turns into a question about approximating functions by polynomials. You can go through the mgf/cf but I don't think this is actually necessary.

    • @ranshen1486
      @ranshen1486 10 months ago

      @@mcnica89 Thanks for the response! I found Billingsley (1995, Probability and Measure), Theorems 30.1 and 30.2. Theorem 30.2 is basically what is used in this video, except that it has the extra condition "the limiting distribution is determined by its moments", which is true for the case of the standard normal. However, the proof of Theorem 30.1 still relies on the cf as a vehicle. I am very much interested in your upcoming video that may bypass the cf. Thanks again!

  • @fienysus9
    @fienysus9 10 months ago +1

    25:11 here on line 2, I guess it's missing a 3 in n^(3/2)? Great explanation and illustration!

    • @mcnica89
      @mcnica89  10 months ago

      Good eye! I think it's correct as written: the numerator is n and the denominator is n^{3/2} as you said. This simplifies to 1/sqrt(n) which is what is written.

  • @linco011235
    @linco011235 11 months ago +2

    What about other levy-T distributions?
    Gaussian maximizes entropy for a finite variance

    • @mcnica89
      @mcnica89  11 months ago +1

      Since the proof assumes variance 1, if you have infinite variance it will not work and you will converge to a Lévy stable distribution instead. I don't think there is a moment method proof of this, because the higher moments simply don't exist for these!

  • @ChrisBullock1978
    @ChrisBullock1978 10 months ago +1

    Question: you say any but not all are Gaussian distributions. Is there a way to determine if data is a Gaussian data set?

    • @mcnica89
      @mcnica89  10 months ago

      Determining if a given data set is Gaussian (or more generally, if a given data set comes from a given distribution) is a classical statistics problem! There are a few different "statistical tests" that let you test for this. I like the Kolmogorov-Smirnov test because it's very simple to explain: you plot the CDF of your data and the CDF of a Gaussian and you measure the distance between those two graphs. If it's small enough, your data passes the test!
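
      That test is one line in Python (a hypothetical sketch, assuming scipy; for real data you would first standardize it or fit its parameters):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          data = rng.standard_normal(1000)   # stand-in for your (standardized) data set

          # Max distance between the empirical CDF and the standard normal CDF
          statistic, p_value = stats.kstest(data, "norm")
          print(statistic, p_value)          # a large p-value is consistent with Gaussian data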

  • @officiallyaninja
    @officiallyaninja 11 months ago +2

    25:14
    Can't we split E[X_i^3] into E[X_i^2]E[X_i] and see that it's zero this way?

    • @mcnica89
      @mcnica89  11 months ago +1

      Great instinct! However, the splitting-up trick only works for two separate X_i and X_j when i is different from j, because of the independence assumption. It is not true for just X_i.

  • @n.bourbakki
    @n.bourbakki 10 months ago +1

    It may be important to state for the viewers that this only applies to RVs with finite variance. The stable distribution is the distribution that autoconvolves without this requirement. The Gaussian and Cauchy distributions are special cases of the stable distribution and both autoconvolve.

    • @mcnica89
      @mcnica89  10 months ago +1

      Ya, I agree! Those are the "technical conditions" mentioned early on. I didn't want to overwhelm a viewer who doesn't know a lot about it with too much information all at once. Of course, later on in the proof, the unit variance assumption is used in a crucial way too.

    • @n.bourbakki
      @n.bourbakki 10 months ago +1

      @@mcnica89- excellent video btw. Keep up the great work.

    • @mcnica89
      @mcnica89  10 months ago

      Thanks!

  • @gdclemo
    @gdclemo 10 months ago

    I made it. That kind of makes sense to me. Thanks for explaining the proof. There's something I don't understand though, which is kind of tangentially related. I believe that Fourier transform turns convolution into multiplication, right? f(x) convolved with g(x) in the Fourier domain is f(x)*g(x). So f(x) convolved with itself is the same as IFT[ FT[f(x)]^2 ], and f(x) convolved with itself n times equals IFT[ FT[f(x)]^n ] - am I right? And the FT of a Gaussian is a Gaussian. So take any probability distribution and get its FT. This will give you a function (call it h(x)) such that h(x)^n tends towards a Gaussian when n gets bigger and bigger? I can't see how that happens, so I must be making a mistake somewhere? (FT=Fourier transform, IFT=Inverse Fourier Transform)

    • @mcnica89
      @mcnica89  10 months ago

      Glad you made it! What you are describing is the "standard" proof of the CLT. The detail you are missing that puts it all together is that the function h(x) is actually h_n(x) = h(x/sqrt(n)) and depends on n, because you rescale by a factor of square root of n. And in fact what is true for any random variable is that the Taylor coefficients of the FT are the moments with powers of i inserted: FT[f](x) = h(x) = 1 + iE[X]x - (1/2)E[X^2]x^2 + O(x^3). So when you have mean zero with E[X] = 0 and unit variance with E[X^2] = 1, then after rescaling h_n(x) = 1 - (1/2)x^2/n + O(n^(-3/2)). Then you can see that lim h_n(x)^n = (1 - (1/2)x^2/n + ...)^n = e^(-x^2/2) by the limit definition of e. I was thinking of potentially going over some of this stuff in a future video connecting moments to FT/characteristic functions.
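
      A quick numeric check of that limit (a hypothetical sketch, assuming numpy; for fair +/-1 coin flips the FT is h(x) = cos(x)):

          import numpy as np

          x = np.linspace(-4, 4, 9)
          target = np.exp(-x**2 / 2)  # Fourier transform of a standard Gaussian
          for n in [10, 100, 10_000]:
              # normalized sum of n fair +/-1 flips has FT cos(x / sqrt(n))^n
              approx = np.cos(x / np.sqrt(n)) ** n
              print(n, np.max(np.abs(approx - target)))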

  • @magno5157
    @magno5157 11 months ago +2

    Now that you've teased that moment convergence can in some cases imply convergence in distribution, please don't leave us hanging.

    • @mcnica89
      @mcnica89  11 months ago +2

      Haha sure! There is actually a really great story about this called the "Hamburger moment problem" which is basically answering the question: "But when does it work?!?"

  • @tejasshetty519
    @tejasshetty519 9 months ago

    Also, it's the integral of that formula, not just that: when you integrate it you get the bell-shaped curve.

  • @berryesseen
    @berryesseen 9 months ago +1

    There is actually a simpler way to see this without going through all the moment calculations. We know that all the moments characterize the random variable. Equivalently, the cumulant generating function C(t) = log E[e^(tX)] also characterizes a random variable. Using the independence of X_1, ..., X_n, we write the cumulant generating function of the normalized sum (X_1 + ... + X_n)/sqrt(n). Then you will see that the first two cumulants match the Gaussian, and the rest of the cumulants converge to 0 as n goes to infinity. On the other hand, a Gaussian random variable's cumulants are all zero beyond the 2nd one. Therefore, the normalized sum must converge to the Gaussian.
    You can even say more using cumulants. If you take the inverse Fourier transform of the moment generating function (without the log), you will get probabilities. Then, using the cumulant gaps, you can characterize the probability gap to the Gaussian probability, which is known as the Berry-Esseen theorem. The gap scales with 1/sqrt(n) because the gap in the dominant third cumulant scales with 1/sqrt(n). The Edgeworth expansion is also derived in the same way.
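
    A small symbolic check of that cumulant argument (a hypothetical sketch, assuming sympy, using fair +/-1 coin flips, whose cumulant generating function is log cosh(t)):

        import sympy as sp

        t = sp.symbols("t")
        n = sp.symbols("n", positive=True)
        # CGF of the normalized sum of n fair +/-1 flips: n * log cosh(t / sqrt(n))
        K_n = n * sp.log(sp.cosh(t / sp.sqrt(n)))
        print(sp.series(K_n, t, 0, 7))
        # t**2/2 - t**4/(12*n) + t**6/(45*n**2) + O(t**7)
        # 2nd cumulant: 1 (Gaussian); all higher cumulants vanish as n -> infinity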

    • @mcnica89
      @mcnica89  9 months ago

      Ya, what you've described is (more or less) the "standard" proof of the CLT, which is indeed the standard because it is the "most efficient" in some sense. I wanted to highlight the alternate moment proof in my video because it's less well known and (in my view) a bit more intuitive for seeing what's actually special about Gaussians. And actually, by the moment/cumulant expansion, you can view the pairing proof presented here exactly as the fact that all but the 2nd cumulant vanish (since partitions with blocks of size 3+ have no contribution in the limit). So if you like cumulants, this is just a more visual/hands-on way to see them!

    • @berryesseen
      @berryesseen 9 months ago

      @@mcnica89 I agree. To a casual viewer, moments are easier to explain than cumulants, although they are in one-to-one correspondence and cumulants are preferable because of their nice properties, as I explained. When I saw "double factorials", I imagined something more interesting actually. But it turned out to be the same exact proof, just presented differently. I don't know why we need to refer to Terence Tao for that.

    • @mcnica89
      @mcnica89  9 months ago

      @@berryesseen Because he's the one who "showed me" the proof!

  • @gregoryconnor3134
    @gregoryconnor3134 9 months ago

    Why is the other term in the integration by parts equal to zero? That is not clear to me.

  • @BigAsciiHappyStar
    @BigAsciiHappyStar 10 months ago +2

    As the brother of Terence Tao and International Master at chess, I believe this video is definitely deserving of a double exclamation mark 😁😁😁😁

    • @mcnica89
      @mcnica89  10 months ago

      Hahaha thanks so much Mr.Tao! I'm glad you approve and am humbled to be awarded your brilliant move annotation :)

    • @thefunpolice
      @thefunpolice 10 months ago

      I like the fact that you exclaimed that point without any convolution, Trevor.

  • @devalapar7878
    @devalapar7878 11 months ago +1

    The sum of n random numbers gets closer to the Gauss curve if their variances are constant.

    • @mcnica89
      @mcnica89  11 months ago +1

      Two things are true:
      1. If the variance is constant, then the sum will converge to a Gaussian. This is the classical statement of the CLT and the one proven in this video.
      But also:
      2. As long as they are independent, the sum will converge to a Gaussian even if the variances are all different. The proof from this video actually generalizes to this case....instead of each "pair" contributing the same variance (which was exactly 1 in the video), each pair contributes a different variance. But you still get the same convergence to a Gaussian!

    • @devalapar7878
      @devalapar7878 11 months ago +1

      @@mcnica89 Oh thanks, I didn't know that. It's been almost 15 years since I had probability and stats.
      I assumed that they are independent. But I didn't know that there is a difference in regard to variance.
      And you are sure? I will check it to be safe.
      But thanks for the rapid answer.

    • @mcnica89
      @mcnica89  11 months ago +1

      Yes, it's true! You do need a condition that ensures the variances don't grow too fast (or else the sum X_1 +...+X_n can be basically just X_n, if the variance of X_n is much larger than the other variances). For example, a uniform upper bound on all the variances is fine. One of the more general conditions you can use is "Lindeberg's condition"...there is a nice Wikipedia page for that if you are interested (I won't link to it because YouTube sometimes deletes comments that have links).

    • @devalapar7878
      @devalapar7878 11 months ago

      @@mcnica89 That's what I wanted to write next, because otherwise some values could be infinite.
      Thanks for clarification.
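
      A small simulation of the non-identical-variance case discussed in this thread (a hypothetical sketch, assuming numpy; the scales are drawn from a bounded range so a Lindeberg-type condition holds):

          import numpy as np

          rng = np.random.default_rng(0)
          n, trials = 200, 100_000
          sigma = rng.uniform(0.5, 2.0, size=n)                  # different (bounded) scales
          x = rng.choice([-1.0, 1.0], size=(trials, n)) * sigma  # independent, non-identical
          s = x.sum(axis=1) / np.linalg.norm(sigma)              # divide by sqrt(total variance)

          # First four moments should approach the Gaussian values 0, 1, 0, 3 = 3!!
          print([round(float(np.mean(s ** k)), 3) for k in range(1, 5)])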

  • @micke_mango
    @micke_mango 11 months ago +2

    Gaushian?
    Is that how English-speaking people naturally pronounce Gaussian?
    Nice video!

  • @wesr9258
    @wesr9258 8 months ago

    I feel like it could use a proof that d_k can’t be something other than Gaussian, such as Poisson.

  • @MrSilversMathSheets
    @MrSilversMathSheets 11 months ago +1

    This is a nice video. I think the proof with the moment generating functions is easier, but this is interesting. It is an interesting coincidence (or is it a coincidence?) that the double factorial shows up both in the moments of the normal distribution and in the calculation of sqrt(2*pi). By the way, it is called the normal dist because it was discovered long before Gauss was born.

    • @mcnica89
      @mcnica89  11 months ago

      Yes the proof with the moment generating function is definitely more slick, but I find it harder to "see" what's really going on. Like I have less of an intuition of how the sum is melting into the Gaussian.
      I tried to avoid the word "normal" because it already has an everyday English meaning. So "normal distribution" can be kind of confusing to beginners I think. (Same thing with a "normal matrix"!). Also I'm planning to make a follow up video explaining why for random matrices the "normal" distribution is a semi-circle and not a Gaussian :)

  • @friedrichhayek4862
    @friedrichhayek4862 10 months ago +1

    You are missing the uniqueness proof: that the Gaußian distribution is the only one with this moment sequence.

    • @mcnica89
      @mcnica89  10 months ago

      Yup! This is part of the results near the end where I directed people to the book. The uniqueness for the Gaussian actually isn't that hard because the tail probabilities go to zero so fast, but the details can be a bit technical!

  • @venkybabu8140
    @venkybabu8140 10 months ago

    But why 8. Two square plates with an axile.

  • @valloway
    @valloway 11 months ago +1

    Woohoo go Mihai!

  • @rafaelschipiura9865
    @rafaelschipiura9865 11 months ago +2

    I made it to the end.

    • @mcnica89
      @mcnica89  11 months ago +2

      🎉👏 ❤️

  • @gyanprakashraj4062
    @gyanprakashraj4062 11 months ago

    MY FAN...

  • @SampleroftheMultiverse
    @SampleroftheMultiverse 11 months ago

    Because we live in a multiverse but we evolved to only see one verse at a time. For best results, sample the multiverse with care. 6:44

  • @codatheseus5060
    @codatheseus5060 11 months ago +1

    I like the balls it takes to make a video about something Tao did and still use 2pi in your formulas instead of tao

    • @mcnica89
      @mcnica89  11 months ago

      Haha amazing connection I did not think of. The whole proof works even if you replace it with "C", where C is whatever constant works :)

  • @samgoodwin89
    @samgoodwin89 11 months ago +1

    It’s a bit fast for me

    • @mcnica89
      @mcnica89  11 months ago

      Feel free to pause the video as needed!

  • @jakemetzger9115
    @jakemetzger9115 11 months ago +1

    "Gaussians... gaussians... gaussians..." 31:00

    • @mcnica89
      @mcnica89  11 months ago +1

      Haha nice! I think technically it's: "Gaussians....gaussians....*writes out the word Gaussians*....Gaussians"

  • @yash1152
    @yash1152 11 months ago +1

    4:46 this video REALLY should have a prerequisite screen at the start just due to sheer amount of symbols it is using
    but having more prerequisites is discouraged in SOME, i think... is that why u didnt include it?

    • @mcnica89
      @mcnica89  11 months ago

      To me a "prerequisite screen" would give off the attitude of somewhat excluding people....I tried hard to make at least the intro understandable with very minimal background. (Of course some parts get more complicated later on) I would hate for someone to leave the video feeling like I told them they are not allowed to watch it. And in my experience watching something like this can motivate you to learn some of the background on your own if you see it first and get excited to learn a cool new trick.

    • @yash1152
      @yash1152 11 months ago

      @@mcnica89 fair point given the target demographics and the economics of how YouTube works...
      it's just, to me personally a resource (be it a book, article, or video) is more valuable if it is clear upfront. the prerequisites align my mind into it.
      > _"To me a "prerequisite screen" would give off the attitude of somewhat excluding people...."_

    • @b43xoit
      @b43xoit 11 months ago

      Symbols come in numbers, not amounts.

  • @FatihKarakurt
    @FatihKarakurt 1 month ago

    Aha! moment.

  • @diegoyotta
    @diegoyotta 10 months ago

    G N A R L Y

  • @thephysicistcuber175
    @thephysicistcuber175 1 month ago

    Proof by Feynman Diagrams.

  • @johnchang8279
    @johnchang8279 11 months ago +1

    Gaushian?

    • @mcnica89
      @mcnica89  11 months ago

      Haha ya! That's the fun two-syllable way to pronounce it :) (a bit like the word "Haitian")

  • @TheDavidlloydjones
    @TheDavidlloydjones 4 months ago

    Six dice, five dice, four dice, three dice, two dice --
    But sorreeee:
    One die.

    • @mcnica89
      @mcnica89  4 months ago

      I've thought about the "dice" vs "die" issue before, but your comment prompted me to rethink it again. I am now even more certain that always using "dice" (both for plural and singular) is more clear....the word "die" already has a different meaning, so using "die" just adds potential confusion imo. Also, it seems modern dictionaries list "dice" as both the plural and the singular, e.g. dictionary.cambridge.org/us/dictionary/english/dice

  • @mrosskne
    @mrosskne 10 months ago +1

    x!! should mean (x!)!

    • @mcnica89
      @mcnica89  10 months ago +1

      Yes I agree! The notation is very confusing, but unfortunately we are stuck with it!

    • @mrosskne
      @mrosskne 10 months ago

      @@mcnica89 What? No we're not. Notation is created by humans. Just change it.

    • @mcnica89
      @mcnica89  10 months ago +1

      @@mrosskne I think the key point is that it's created by humanS plural. I am not able to change it by myself because there are already many people who use the existing x!! notation, see e.g. en.m.wikipedia.org/wiki/Double_factorial . It would be great if there was a movement to change this (like tau vs pi, for instance).

    • @samueldeandrade8535
      @samueldeandrade8535 2 months ago

      ​@@mcnica89 by Euler, you are reasonable. So patient.

    • @samueldeandrade8535
      @samueldeandrade8535 2 months ago

      I don't like the notation either, but I can't really be against it.

  • @TalsBadKidney
    @TalsBadKidney 6 months ago +1

    terence is a little punk

  • @b43xoit
    @b43xoit 11 months ago +1

    Unfortunate choice of symbology. Any reader would expect x!! to mean (x!)!.

    • @mcnica89
      @mcnica89  11 months ago

      Ya I agree!! It's definitely a weird one but that's what we are stuck with

  • @j9dz2sf
    @j9dz2sf 11 months ago +2

    Is there a kind of Γ function for the double factorial, i.e. its extension to ℂ?

    • @mcnica89
      @mcnica89  11 months ago

      Interesting question! I suppose you could generalize it by just doing integral of x to the k exp(-x^2/2) where k is any complex number? It's weird though cuz it has infinitely many zeros? I'm not sure!

    • @j9dz2sf
      @j9dz2sf 11 months ago +1

      @@mcnica89 Perhaps, since (2n+1)!! = (2n+1)!/(2^n · n!), and is therefore defined by single factorials, we could change all z! into Γ(z+1) and write ΓΓ(2z+2) = Γ(2z+2)/(2^z · Γ(z+1)). Then, with z' = 2z+2, we have ΓΓ(z') = Γ(z')/(2^((z'-2)/2) · Γ((z'-2)/2 + 1)).

    • @mcnica89
      @mcnica89  11 months ago

      @@j9dz2sf Yes I think that works too! I suppose there is a slight difference between generalizing the special moment sequence (which is the center of the video) vs generalizing the double factorials only because of the zeros in every other number. It depends on what you actually want to do with it.

    • @mcnica89
      @mcnica89  11 months ago +1

      Side note: How did you make the LaTeX \mathbb C appear in the comment? Is there a way to make LaTeX work in the comments?

    • @j9dz2sf
      @j9dz2sf 11 months ago +1

      @@mcnica89 I do formal proofs (i.e. on computer) and I use emacs as an editor. I can do utf8. No magic: I just cut and pasted. 🙂
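
      A quick numeric check of the ΓΓ formula proposed above (a hypothetical sketch; it agrees with m!! exactly at odd integers, the case the derivation covers):

          from math import gamma

          def gamma2(z):
              # ΓΓ(z) = Γ(z) / (2^((z-2)/2) * Γ(z/2)), using Γ((z'-2)/2 + 1) = Γ(z'/2)
              return gamma(z) / (2 ** ((z - 2) / 2) * gamma(z / 2))

          def double_factorial(m):
              return 1 if m <= 0 else m * double_factorial(m - 2)

          for m in [1, 3, 5, 7]:  # odd arguments, where the extension is exact
              print(m, gamma2(m + 1), double_factorial(m))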

  • @AnNguyen-hn5gq
    @AnNguyen-hn5gq 8 months ago +1

    Thanks!

    • @mcnica89
      @mcnica89  8 months ago

      Thank you so much!