I made a video covering a proof of the central limit theorem, that is, answering why there is a "central limit" at all. It's currently posted for early viewing on Patreon: www.patreon.com/posts/draft-video-on-i-87894319
I think the video has room for improvement, and decided to put it on a shelf for a bit while working on other projects before turning back to it. In the meantime, though, if you are curious about why all finite variance distributions will tend towards some universal shape, it offers an answer.
Also, you may be interested to know that a Gaussian is not the only distribution with the property described in this video, where convolving it with itself gives a (rescaled) version of the original distribution. The relevant search term here is "stable distributions", though all the others have infinite variance, so they don't satisfy the hypotheses of the CLT. Often when the CLT doesn't apply, it's because the independence assumption doesn't hold, but another way it can break is if you're starting with one of these infinite-variance cases.
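A quick way to see that infinite-variance failure mode for yourself - a minimal numpy sketch (the sample counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite variance: sample means concentrate as n grows (the CLT at work)...
for n in (10, 100, 10_000):
    means = rng.uniform(-1, 1, size=(500, n)).mean(axis=1)
    print(f"uniform, n={n:6d}: std of sample means = {means.std():.4f}")

# ...infinite variance: the mean of n standard Cauchy draws is itself
# standard Cauchy, so the spread never shrinks and there is no "central limit".
for n in (10, 100, 10_000):
    means = rng.standard_cauchy(size=(500, n)).mean(axis=1)
    iqr = np.subtract(*np.percentile(means, [75, 25]))
    print(f"cauchy,  n={n:6d}: IQR of sample means = {iqr:.4f}")
```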
please make a post there about the complete software stack you're using to make your videos!
9:08 the second part has "transformatoin" at the top
It's kinda hidden, but for people who prefer RSS to mailing lists, it's at /feed
Are the functions at 0:10 stable distributions? When you started talking about rotational symmetry I was expecting you to bring up a visual graph of one of those functions convolved with itself and explain why it doesn't have the special property, but instead 5:55 only shows trivial examples, and my curiosity about this question remained unanswered. Is it because the functions from 0:10 are stable distributions? If not, why weren't they shown, when they would have been much more interesting demonstrations of the Gaussian's specialness than trivial examples?
Grant, could you please make a video on when the discrete can be approximated by the continuous.
For example, in this series you showed that discrete random variables added together approach a continuous normal distribution, and you did both discrete and continuous convolutions. But what error formula would one get by assuming, say, that d6 dice are continuous-valued, computing the continuous convolution answer, and then taking discrete samples of that answer to match the actual discrete nature of d6 dice?
I find it much easier to integrate a 'nice' function than it is to simplify a discrete Σ sum.
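One way to poke at that question numerically: for two dice, the continuous answer sampled at the integers happens to match the discrete answer exactly (a numpy sketch; centering each die's density on [0.5, 6.5] is my own choice):

```python
import numpy as np

# Exact pmf of the total of two d6, via discrete convolution (totals 2..12).
d6 = np.ones(6) / 6
exact = np.convolve(d6, d6)

# Continuous stand-in: each die ~ Uniform(0.5, 6.5). Convolving two of those
# densities gives a triangle peaking at 7; sample it at the integers 2..12.
totals = np.arange(2, 13)
tri = (6 - np.abs(totals - 7)) / 36.0

print(np.abs(exact - tri).max())  # ~0: exact match for two dice
# (With three or more dice the piecewise-polynomial density no longer hits
# the pmf exactly at the integers, and a small error term appears.)
```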
I have to laugh. "Why the normal distribution?" was one of the questions that motivated me to get my M.S. Stat a couple of decades ago. I'm loving this series - it adds so much clarity to what I recall learning.
For you it's the normal distribution; for me it's probability distributions in general. But I'm pursuing that now, not 10 years ago, with more of a focus on machine learning and an inclination toward the mathematics and statistics side.
One level of brilliance is simply to be brilliant. Another level is to be able to explain and teach. Yet another level of brilliance is to be able to clearly visualize & present the advanced concepts.
Wow.
No words.
I love that this series actually started with the Borwein integrals video. Like, here's a very curious sequence of integrals and here's an interesting concept to explain it, and then five videos later we've dug so deep into convolutions that we got an intuitive explanation for one of the most important theorems in all of math. It's all interrelated!
Grant, this has been an absolute masterclass and I genuinely believe it has been your best work so far. Your visualisations have been top notch and it has brought concept space applied to mathematics to a level not seen before, all publicly accessible through UA-cam. You are making mathematics a better field for the entire world. Thanks for your hard work!
That made for a great lunch 😁.
In your last video you described the Gaussian as an “attractive point in the space of all functions” and I LOVED that phrasing - really made it make sense. However I don’t do enough real math to realize that could be the foundation of a proof. That’s pretty cool.
Agreed! :)
I'm at work eating my lunch and people around me sometimes ask, "Oh are you in school?" and I'm like, "Nope, just an engineer like you that likes learning the math that was never taught!"
The legend is here! 🙏🏽🛐
i also had lunch to this video
What a funny coincidence, I'm having dinner to this video (I decided 7 hours of watching brainrot content was enough and fired up something that's intellectually nourishing)
I like this related explanation:
Let X and Y be independent normal random variables, and write S = X+Y for their sum.
You can think of S as the dot product of the 2-d vectors (X,Y) and (1,1).
As Grant said, the key aspect of normal random variables is that if you draw a pair of them, the result is rotationally symmetric.
Now dot product is *also* rotationally symmetric (the dot product between two vectors only depends on their lengths and angle).
So the distribution of S would be the same if we rotated (1,1) to any other vector with length sqrt2; in particular, to (sqrt2,0).
But (X,Y) dotted with (sqrt2,0) is just sqrt2 X, so we see that S is distributed as (sqrt2 times) a normal random variable.
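A quick numerical sanity check of this (a numpy sketch; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)

s = x + y                  # (X, Y) . (1, 1)
rotated = np.sqrt(2) * x   # (X, Y) . (sqrt2, 0): the rotated direction

# Both should look like N(0, 2): mean ~0, standard deviation ~1.414.
print(s.mean(), s.std())
print(rotated.mean(), rotated.std())
```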
thank you for always making sure to show the "roadmap" before diving into the details! Knowing the broad outline beforehand really makes things easier to follow, and it's something that a lot of other explanatory videos/articles don't bother to do.
Thank you for bringing us amazing math content Grant! The world needs it! Enjoying my afternoon coffee while watching this one! :)
After all the cliffhangers, it's nice to get this series all wrapped up so neatly.
Wrapped up? He didn't prove the central limit theorem at all. Which is supposedly what this was all about. This video itself barely adds anything at all to the previous ones.
Moment generating functions are really not all that complicated - it's high school stuff really. And they give a much clearer intuition for why a Gaussian is the limit in the central limit theorem: it's the unique probability distribution that has a mean and a variance but no higher cumulants. In other words it's the simplest* distribution: the one that can be described by the least information. Anything else like skew or asymmetry is "averaged out". Sadly, Grant is so obsessed with representing things visually that he brushes over alternatives that are at times far clearer and more powerful ways of understanding this.
* [technically the simplest would be a point distribution where a single outcome has probability 1 and everything else probability 0, but that hardly counts as a distribution. And anyway, it's just a special type of Gaussian with width 0.]
@@QuantumHistorian This series is an excellent demonstration of the idea of limits, not just in that the videos are all about the central limit theorem, but also in that he's tending towards the proof of the central limit theorem without ever actually reaching it.
@@QuantumHistorian This series of videos is clearly not meant to give a fully technical answer but rather an intuitive view of why it's true. I also agree that the visual "trick" here does not seem to simplify the work much, given that the integral is already easy to compute using a trigonometric change of variables that arises naturally; but maybe I'm biased by my own experience.
You can probably argue that the purpose of the cliffhanger is to encourage the viewer to ponder upon a new solution. That's very much the format of his videos. 3Blue1Brown will never tell the viewer the answer but rather allow open-ended interpretation.
@@KingDuken That's not even remotely true. He starts with hints, but he almost always gives the full solution at the end. Look at the recent video on chords, or older ones on the chessboard puzzle or the Basel problem.
Now there are so many great explanations on this channel that together they really complete one's understanding.
I was studying statistics right now and saw this drop
For real. Happened twice now. With binomial and this.
They are watching..... There will come a time when they will order us....
I guess Grant calculated the time of day with the highest probability that the world population would study statistics and then release the video at that time, lol
Lucky you
Same
WONDERFUL THANKS FOR INSPIRING AN ENTIRE GENERATION TO GET AND UNDERSTAND THE TRUE BEAUTY OF MATHEMATICS
Congratulations on finally wrapping up this pseudo-series. They’re some of my favorite videos you’ve done!
honestly one of the best series on youtube
Binomials with the same p are stable under convolution, and Poisson distributions are as well. The normal distribution is not unique in that regard. Even Cauchy distributions are stable, without having any moments or satisfying the CLT.
If I had to pick an intuitive reason why the normal distribution shows up in the CLT, I enjoy the fact that the normal's cumulants are all zero from the third onward, so a standardized iid sum's cumulants all tend to those of the standard normal distribution whenever they exist.
Also, not every normalized limit is a normal distribution; normalized maxima, for example, converge to a Gumbel distribution.
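Those shrinking cumulants are easy to watch empirically - a sketch assuming numpy and scipy, with Exp(1) summands (skewness 2, excess kurtosis 6) as an arbitrary example:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)

# For a standardized sum of n iid Exp(1) draws, the 3rd- and 4th-order
# cumulant statistics should shrink like 2/sqrt(n) and 6/n, heading toward
# the normal's value of 0.
for n in (1, 4, 16, 64):
    sums = rng.exponential(size=(200_000, n)).sum(axis=1)
    z = (sums - n) / np.sqrt(n)  # Exp(1): mean 1, variance 1
    print(f"n={n:2d}: skew={skew(z):+.3f}  excess kurtosis={kurtosis(z):+.3f}")
```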
Thanks!
Please oh please do a video on the Kalman filter, given how indescribably important it is to our modern existence. The result that the convolution of two Gaussians is a Gaussian is at the heart of the Kalman filter's magic.
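For the curious, the one-dimensional heart of that magic fits in a few lines - a toy sketch (all numbers invented for illustration, not production filtering):

```python
def predict(mean, var, motion_mean, motion_var):
    # Time step = convolution of two Gaussians: means add, variances add.
    return mean + motion_mean, var + motion_var

def update(mean, var, meas_mean, meas_var):
    # Measurement step = product of two Gaussians: precision-weighted average.
    k = var / (var + meas_var)  # Kalman gain
    return mean + k * (meas_mean - mean), (1 - k) * var

state = (0.0, 1000.0)        # start out nearly ignorant
for z in [5.1, 4.9, 5.3]:    # noisy position measurements
    state = update(*state, z, meas_var=1.0)
    state = predict(*state, motion_mean=1.0, motion_var=0.5)
    print(f"mean={state[0]:.2f}  var={state[1]:.2f}")
```

The fact that both steps stay inside the Gaussian family is exactly why the filter only ever needs to track a mean and a variance.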
Yes…. So much yes to this, would intersect so many core bits of interest perfectly
My friend... I can't thank you enough for the "Essence of linear algebra" videos
I want to talk about a strange area of probability, where random variables no longer commute: Random Matrices
You can define the expectation of a random matrix to be the expectation of its trace, which essentially encodes the distribution of its eigenvalues.
It turns out there's a new kind of central limit theorem, known as the "free central limit theorem".
This theorem says that if you have "freely independent" random matrices, then the mean's eigenvalue distribution tends not towards a normal distribution, but towards a semicircular distribution.
In this probability theory (known as free probability theory), a free convolution exists, which essentially gives the distribution of eigenvalues of X+Y. It turns out the semicircle distribution convolved with itself is another semicircle, much like a normal distribution in classical probability.
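A numerical sketch of that (assuming numpy; the matrix size and count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 500, 40  # matrix size, number of summed matrices

# Sum k independent Wigner (symmetric Gaussian) matrices, normalized like a
# CLT sum; the eigenvalue histogram should hug a semicircle, not a bell curve.
total = np.zeros((N, N))
for _ in range(k):
    A = rng.standard_normal((N, N))
    total += (A + A.T) / np.sqrt(2)
eigs = np.linalg.eigvalsh(total / np.sqrt(k * N))

hist, edges = np.histogram(eigs, bins=30, range=(-2, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(np.maximum(4 - centers**2, 0)) / (2 * np.pi)
print(np.round(np.c_[centers, hist, semicircle], 3)[::5])  # empirical vs. theory
```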
Is this what's called the "Wigner semicircle law"?
What incredible content. I think about once a year I revisit the same list of statistics-oriented content.
Between Grant, Richard McElreath, and Josh Starmer, you really get your bases covered on great stats content.
Last time, just after I completed IFFT, you dropped a video on continuous convolution. Yesterday, I finished studying Bivariate Normal distribution and you dropped this. Perfect timing for me!
My abstract brain would have loved seeing that Gaussians are their own convolution shown via the Fourier transform: a convolution in coordinate space is multiplication in momentum space (spot the physicist), the FT of a Gaussian is a Gaussian, and the product of two Gaussians is a Gaussian, so the convolution of two Gaussians must also be a Gaussian.
But, this is an incredibly satisfying explanation. I'm not left wanting, and after being in the field for nearly a decade, I'm glad to see a frequent concept intuited so cleanly, without the need for arcane notation. ❤
My god, he drops a video relevant to the exact topic I'm taking, literally right after I finish it
Man I just love your videos
Even though I'm way past the time of having the genuine will and ability to learn abstract mathematics (living in a war-torn hell doesn't really help), they still give me a sad and lovely nostalgia for the things I love
I'm just really glad I learned about your channel and watched it grow without losing any of the great things that made it simply extraordinary
After having received my Bachelor's in Math this past December, I now just realized why we get that sqrt(2) when finding the convolution. The geometric visualization is extremely easy to understand! (I'm sure I derived this back in first year, but I must have forgotten lol)
Humanity will always be grateful for your superbly amazing, impactful, and meaningful work. I'm confident your viewers are the best candidates to improve our entire world. It's inspiring to see how your efforts can enhance our understanding of the world and empower people to engage with sophisticated ideas. With your powerful content, you hold the impressive potential to inspire and educate countless individuals, fostering a deeper appreciation for math and its importance in our lives. Such efforts unquestionably play a crucial role in advancing our society as a whole. Thanks a million, Sir 3Blue1Brown. You are genuinely enhancing our world with the most insightful visual content currently available. Please continue for good.
You make me finally understand why CLT works, thanks ❤
4:57 "For a mildly technical reason you need to divide by sqrt2" - dude that was the first thing I really understood so far! :D But this is more advanced than any math I've studied so I'm ok. I'm glad this seemed obvious to me.
Grant was and is my source of inspiration to master mathematics, and to become linguistically accurate! One of my heroes ❤.
Thanks so much for this--it makes it really clear. And the 3-dimensional model is really a lot more like a bell! (although I know that actual bells have a somewhat different shape). I've been using the concept of combining Gaussian (and uniform) distributions for a while now in my (Scala) library called Number. It keeps track of the error bounds on variables. If keeping track of relative bounds, it's easy: for multiplication, just add the relative bounds together; for functions like e^x or x^p, only very slightly more complex. But, for addition, we need to convert to absolute bounds and use the convolution that you've been describing.
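For flavor, a stripped-down Python analogue of that bookkeeping (hypothetical - not the actual Scala Number library; the `Approx` class and its fields are invented names):

```python
import math

class Approx:
    """A value carrying an absolute error bound, treated as a Gaussian width."""
    def __init__(self, value, abs_err):
        self.value, self.abs_err = value, abs_err

    def __mul__(self, other):
        # Multiplication: relative bounds add.
        rel = self.abs_err / abs(self.value) + other.abs_err / abs(other.value)
        v = self.value * other.value
        return Approx(v, abs(v) * rel)

    def __add__(self, other):
        # Addition works on absolute bounds; for Gaussian-style bounds the
        # convolution means the widths combine in quadrature.
        return Approx(self.value + other.value,
                      math.hypot(self.abs_err, other.abs_err))

x, y = Approx(2.0, 0.1), Approx(3.0, 0.2)
print((x * y).abs_err, (x + y).abs_err)
```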
You are partly the reason I am in love with statistics. Thank you. ❤
I was wondering about this topic for a while because I didn’t quite get this concept intuitively. And then 3blue1brown dropped this !!
It's been 7 years since I took calculus but this is a great way to revisit those concepts. Thank you!
Hi Grant,
I've been a long-time follower of yours, and I've noticed that the last few videos draw a lot of content from general undergraduate math. Math puzzles for the general public are limited, after all. I would suggest expanding your audience a bit by including content that tolerates more frequent mathematical derivations while keeping your original style of popularization, so that undergraduate non-math students can get a feel for the broader framework of mathematical thinking.
I have recently been interested in Brouwer's fixed point theorem. Its calculus proof connects the theorem's literal meaning (when you stir coffee, there must be a center of the vortex) with the idea that you cannot continuously retract a two-dimensional unit disk onto its boundary circle without tearing it. It is just an ordinary mapping argument connecting two distinct, plain intuitions, and the proof is very accessible to students who have only taken calculus, while letting them see a bigger picture: calculus can be used for more than just poles and integrals. I think examples like this would go well in your videos.
@3Blue1Brown
The moment generating function is directly tied to the characteristic function E(e^(itX)). In some classical textbooks (Casella and Berger) it is used to prove the CLT and the convergence to the normal distribution, as well as to derive the pdf of the normal (through a Taylor expansion, iirc). Since the cf is a Fourier transform of the pdf, the convolution of two independent normal RVs can also easily be proven to be normal via the characteristic function E(e^(it(X+Y))). It's been a decade and I still remember that that was how I was taught, because of how elegant it was.
Also, after a polar transformation, the (r, theta) coordinates corresponding to a normal distribution make its properties really clear, and why it's the only option. This is super neat maths.
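That cf identity is easy to check empirically - a numpy sketch (sample sizes and t values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(500_000)
y = 2.0 * rng.standard_normal(500_000)  # independent, different scale

def ecf(samples, t):
    # Empirical characteristic function E[e^(itX)]
    return np.exp(1j * t * samples).mean()

for t in (0.3, 0.7, 1.1):
    lhs = ecf(x + y, t)           # cf of the sum
    rhs = ecf(x, t) * ecf(y, t)   # product of the individual cfs
    print(f"t={t}: cf(sum)={lhs:.4f}  product={rhs:.4f}")
```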
Wow it's already in the playlist.. thank you. I wanted to study this for so long
Another important question is how fast the distribution of the normalized sum of iid random variables converges to that of the Gaussian. One way to quantify this is to ask for max_A |P[S in A] - P[W in A]|, where S is the sum and W is the Gaussian. This maximum scales as constant/sqrt(N), a result known as the Berry-Esseen theorem. The constant depends on the third moment and the variance. If you need an intuition for why the scaling is 1/sqrt(N), the answer lies in the gaps between the cumulants of S and W. Their first 2 cumulants are the same by design (mean and variance). Cumulants of W from the 3rd onward are all zero. Cumulants of S from the 3rd onward go as c/sqrt(N), d/N, … If you relate this gap to the inverse Fourier transform, you get probability gaps, and that c/sqrt(N) gap in the third cumulant leads to the scaling in the Berry-Esseen theorem.
The order of scaling (1/sqrt(N)) is also quite universal. You don’t necessarily need iid. For example it works for independent sums and Markov-1 sums. The dimension can be more than 1. You can even pass the sum through a smooth function.
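The rate is easy to see concretely for exponential summands, since a sum of N Exp(1) variables is exactly Gamma(N, 1) - a short scipy sketch (the choice of summand is an arbitrary example):

```python
import numpy as np
from scipy.stats import gamma, norm

# Standardize the Gamma(N, 1) sum and compare its CDF against the standard
# normal's on a grid. Berry-Esseen predicts max gap ~ C/sqrt(N), so
# sqrt(N) * gap should level off at a constant.
x = np.linspace(-5, 5, 2001)
for N in (4, 16, 64, 256, 1024):
    gap = np.abs(gamma.cdf(N + x * np.sqrt(N), a=N) - norm.cdf(x)).max()
    print(f"N={N:5d}: max CDF gap={gap:.5f}  sqrt(N)*gap={gap * np.sqrt(N):.4f}")
```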
FYI there is a small typo at 9:10 in the challenge problem, "The transformatoin of the line..."
Thank you for visualizing this connection!
This video is a joyous moment in maths communications, as all your videos are.
Yo everyone, replay at 6:37 to make the most-replayed graph a normal distribution
Sweety!! The guy with 25% brown 75% blue uploaded
Always enjoy your videos. It's nice to watch them and make some connections I might've missed from my time in school.
Wonderful video! The feeling I got (in high school) when I proved something by symmetry always cheered up my day! These are usually the most elegant approaches and the simplest in intuition. Much respect ❤
This is what I needed; I was working on my project on the central limit theorem in various scenarios.
The animation at 7:17, rotating your radius r to be perpendicularly aligned with the background x-y Cartesian grid, is super. Once again the animation provides a very immediate, visual, physically informed intuition: if you rotate to align with the grid, you preserve the area and simplify the computation. Just a small detail, but these animations are great - thank you very much!
It's almost like the feeling in linear algebra when you change to a natural (eigen) basis to decouple your vectors/directions, and then the computation proceeds orthogonally along the individual axes, not interfering with each other, making it much more literally straightforward. So, rotation for a better coordinate system. This was a cool video, thanks!
A mailing list! Awesome. I loved Tom Scott doing it and now you too? Amazing!
I have really loved math since the day I started watching your videos, not gonna lie!
The majority of your videos go over my head 😅 as I'm not a good student. But I come here and watch every video because of your presentation.
Thank you 😊
This is a very elegant explanation of what makes the normal curve so special, but it still seems a little
[puts on sunglasses]
...convoluted.
No one believed that math could be soooooooo beautiful before ur channel was created
Can't wait for the step 1 explanation. Because this is what I expected from the title.
The Step 1 explanation is the thing that I have been waiting for from this series... The series has been making a point about distributions approaching a normal distribution, and then the finale (or I think this is supposed to be the finale) skips the whole reasoning as to why they approach it in the first place. I hope he will make a video about it.
@@HoxTop same
The entropy explanation is really interesting and makes a lot of sense. As far as I can tell, it says the following: noticing that convolving many different distributions leads to a Gaussian is the same as noticing that repeatedly sampling the microstate of an equilibrium (maximal-entropy) system - which amounts to sampling N independent atomic distributions (or approximately independent, or not, depending on your system) - will, for large N, always correspond to the same value of a macrostate variable.
Very very cool. Never learnt convolutions that way!
My statistics and calculus professors love your videos.
Thank you very much for your hard work, the result is so pleasing. I’ve discovered your channel with the neural network series and I’ve been enjoying your videos ever since. You rekindled in me the taste for mathematics. Greetings and best regards from France
So great seeing this video finally come out just as I finished statistics
I've forgotten pretty much everything I learned in college, but one thing I kind of sort of remember is that one way to convolve two functions is to take their Laplace transform and then multiply them. Convolution in the time domain is multiplication in the frequency domain, basically.
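Same idea in its discrete form (FFTs standing in for the Laplace transform) - a numpy sketch with toy sequences:

```python
import numpy as np

# Two short sequences, zero-padded so the circular convolution from the FFT
# agrees with the ordinary (linear) one.
f = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0])
g = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

direct = np.convolve(f, g)[: len(f)]                       # time domain
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real  # frequency domain

print(direct)
print(via_fft)  # matches, up to floating-point dust
```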
@3Blue1Brown could you please do a series of videos for the time series analysis, I think we need a visual and intuitive explanation for a lot of things there! Thank you 😊
I'm a physics person, so maybe this is why, but I really like the Fourier space argument. (In Fourier space, a convolution becomes a product; near the origin, any normalised PDF with finite variance looks like (1-(x/s)^2)e^imx, where m is the mean and s is related to the standard deviation. Multiply a bunch of things of this form together and, by the definition of the exponential, you get out a Gaussian. And a Gaussian in Fourier space is a Gaussian in real space. It also neatly tells you why functions with infinite variance don't work, since they can't be written in that form.)
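You can also watch that conclusion play out in real space: repeatedly convolving even a flat density quickly hugs a bell curve - a numpy sketch (the grid resolution and five summands are arbitrary choices):

```python
import numpy as np

dx = 0.01
p = np.full(100, 1.0)  # uniform density on [0, 1): height 1
q = p.copy()
for _ in range(4):     # q becomes the density of a sum of 5 uniforms
    q = np.convolve(q, p) * dx

x = np.arange(len(q)) * dx      # support of the 5-fold sum: [0, 5]
mean, var = 2.5, 5 / 12         # 5*(1/2) and 5*(1/12)
gauss = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.abs(q - gauss).max())  # ~0.02: close after only five summands
```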
I haven't watched it yet, but thank you for this video - none of my university teachers ever explained this when I was studying probability!
Because they never apply it.
More generally, linear transformations of Gaussian-distributed random vectors are also Gaussian random vectors. This is one of the main reasons why Kalman filtering works. BTW, convolution is also a bilinear transformation on L^p spaces.
MashaAllah, my teacher - may God reward you and bless you for your beautiful explanation. I feel every word you say adds new information for me: a smooth, detailed, and deep explanation. I can't describe my admiration for you. Thank you very, very much... I hope to be like you someday.
I also like your use of Python.
I wish you success.
Although I wish I could watch all of your videos, the internet in my country is expensive, so I watch the important ones.
This question popped back into my head yesterday so good timing
I've been watching for a while now, Idk why I haven't subscribed till now, but I love your videos. I've always found it fascinating that there is an awesome maths channel with a logo that has relatively the same shape as one of my eyes :) (the brown spot is even in the right place too)
Freshman me would thank you a lot. "Why normal?" was the biggest unanswered question throughout my stats undergrad.
Love the videos! They have “re-sparked” my interest in math
When you upload a video I feel happy, because I learn a new concept
This channel is one of the most popular channels in the field of advanced maths..❤❤
After taking AP stats in my high school senior year, I'm glad this series tied up some loose ends of that course. Thanks for all the amazing insight!
By the way, I was wondering if you could possibly do a video based on a problem I solved and want to confirm my answers on. It goes like this: You have a line segment of any arbitrary length (it doesn't matter). If you cut it in two random places, what is the probability that the three new segments form a triangle without any excess length left over? Again, I believe I know the answer, but I still feel the need to have my results confirmed. I'm also curious if there is any extra insight that can be provided based on problems such as this one. Again, thanks for making this series, and I can't wait to hear what more spicy knowledge you have in store for us!
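(Spoiler for anyone who wants to compare answers: the classic result is 1/4, since the three pieces form a triangle iff no piece reaches half the total length. A quick Monte Carlo sketch - assuming numpy - agrees:)

```python
import numpy as np

rng = np.random.default_rng(6)
cuts = np.sort(rng.uniform(0, 1, size=(1_000_000, 2)), axis=1)
a = cuts[:, 0]               # first piece
b = cuts[:, 1] - cuts[:, 0]  # middle piece
c = 1 - cuts[:, 1]           # last piece

# Triangle inequality with total length 1: every piece must be < 1/2.
ok = (a < 0.5) & (b < 0.5) & (c < 0.5)
print(ok.mean())             # ~0.25
```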
@00:08: I was guessing it had to do with factorials and summations. Nevermind.
Here's a little idea that I figured out while thinking about catalysts in my high school chemistry class. There is a mysterious fact that's taught just for rote memorization in chemistry: catalysts lower the activation energy, but they don't shift equilibria. This is broadly explained as follows: if catalysts could shift equilibria, then it would be possible to add and remove catalysts from a reaction chamber, shift the equilibrium back and forth, and essentially build a perpetual motion machine from which you could generate power. This fact was mysterious to me until I realized that the distribution of energies of molecules bouncing around a reaction chamber approaches the normal distribution. So the amounts of each reactant and product are determined only by the relative differences in energy and the temperature, not by the ease of transition. This would not be true for any other distribution I can think of.
8:47 remember on screen what is shown is (mean, variance) but what is said aloud is (mean, standard deviation). Don't get them mixed up if you're still new to stats lmao
I don't know if it gets mentioned, but I think one of the beautiful aspects of the Gaussian is that it's an eigenfunction of the Fourier transform, meaning the Fourier transform of a Gaussian function is just a Gaussian function. That's another way to look at why the sum of two Gaussians is a Gaussian, because the convolution turns into a product, so of course the product of two Gaussians is going to remain a Gaussian.
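A quick numeric check of that eigenfunction fact (a numpy sketch; e^(-pi x^2) is used because it is exactly its own transform under the 2-pi-in-the-exponent convention, and the grid parameters are arbitrary):

```python
import numpy as np

n, dx = 4096, 0.05
x = (np.arange(n) - n // 2) * dx
g = np.exp(-np.pi * x**2)  # self-transform under F(k) = ∫ f(x) e^{-2πikx} dx

# Approximate the continuous Fourier transform with an FFT (the shifts put
# x=0 and k=0 where the FFT expects them; dx turns the sum into an integral).
G = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) * dx
k = np.fft.fftshift(np.fft.fftfreq(n, d=dx))

print(np.abs(G.real - np.exp(-np.pi * k**2)).max())  # tiny: the same Gaussian
```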
wayyyyy too beautiful i cannot process
Would the same apply to a Laplace transform?
@@iankrasnow5383 Yes
Great vid as always
Everyone wake up, new 3blue1brown video just dropped!
Clicked on the video knowing it would be over my head. Was not disappointed.
Thank you for the shoutout at the end! -Daksha
I can't wait to see what you are up to on the new channel. Take care!
I would love to see you extend this series on Gaussian distributions and the CLT to cases with correlation and/or dependency.
3b1b upload, yessss!!! gonna watch it later though
I did a whole stats degree and never fully got this. Thank you again, Grant.
I didn't understand anything but I enjoyed the video🙂
Another way to simplify the reason why the 3D visualisation needs to be scaled down by root 2:
Take a square with side length 1
The diagonal is root 2
Scale down by root 2 and you get 1
Or: s·cos 45° = s·sin 45° = s·√2/2, which is the same as s/√2
I'm not able to understand. Can you please elaborate?
@@thevinarokiarajl2149 Pythagoras
11:18 I have often wondered, "Why the Gaussian?" Finally, Grant explains it in Step 2, which, of course, depends on Step 1. There is a universal shape that the process of repeated convolution tends towards in the limit. What is that shape? Well, a Gaussian always results in another Gaussian after N convolutions, so the universal shape must be a Gaussian. Wow, I think my brain just did some Aristotelian gymnastics there. If you can wrap your head around that and convince yourself that the logic is true, then OK.
I'd love to see a video on deconvolution, and its applications. One noteworthy one is basic processing of an image from a telescope. The aperture (typically circular) applies a convolution of a rectangle function to the incoming light. Convolving the resulting image with the inverse of the rect function will remove the distortions caused by the aperture.
One strategy on smaller telescopes (especially using film instead of digital sensors) to avoid this is to put a filter on the aperture whose opacity follows a Gaussian, clearest in the center and darkest at the edge. This minimizes the distortions of the image coming through the telescope and avoids the need to process it afterward.
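A toy version of that inverse-filtering step, as a numpy sketch (the Wiener-style eps regularizer and all parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 256
signal = np.zeros(n)
signal[60], signal[130] = 1.0, 0.7  # two point sources
psf = np.exp(-0.5 * ((np.arange(n) - n // 2) / 3.0) ** 2)
psf /= psf.sum()

# Blur = circular convolution with the PSF, plus a little sensor noise.
H = np.fft.fft(np.fft.ifftshift(psf))
blurred = np.fft.ifft(np.fft.fft(signal) * H).real
blurred += rng.normal(0, 1e-3, n)

# Naive deconvolution divides the spectra; eps keeps the division from
# exploding where the PSF's spectrum is near zero (the ill-posed part).
eps = 1e-2
restored = np.fft.ifft(np.fft.fft(blurred) * np.conj(H) / (np.abs(H) ** 2 + eps)).real

print(signal.argmax(), blurred.argmax(), restored.argmax())  # peaks line up
```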
Am I the only one who does not find pleasure in statistical functions, and prefers topics that talk about deterministic functions and definite equations?
This isn't Statistics, it's Probability. There are no random processes at all in the video, everything Grant talked about in this entire series is entirely deterministic.
dang.
we didn't get that friend in the coffee shop being satisfied as to what circles have to do with populations in this video.
still really good for us to reach the end of it.
When I learned that the area under the Gaussian curve and Γ(1/2) are the same and equal to sqrt(π) I was blown away. It was like seeing an interesting cameo in my favourite movie.
At 5:44, to be super clear and specific: the properties that imply a 2D Gaussian are (i) being a function of x and y only through r, and (ii) independence, expressed as the functional equation g(r) = f(x)f(y).
You mention independence earlier, and it's on the screen in the upper right, but I think it's worth emphasizing that it's essential to the derivation.
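For completeness, here's the standard sketch of how that functional equation forces a Gaussian (assuming continuity):

```latex
\begin{align*}
f(x)f(y) &= g\!\left(\sqrt{x^2+y^2}\right)
  && \text{(independence + radial symmetry)}\\
k(u) + k(v) &= K(u+v)
  && k(u) := \ln f(\sqrt{u}),\quad K(s) := \ln g(\sqrt{s})\\
K(s) &= k(s) + k(0)
  && \text{(set } v = 0\text{)}\\
m(u) + m(v) &= m(u+v)
  && m(u) := k(u) - k(0) \quad \text{(Cauchy's equation)}\\
m(u) &= c\,u
  && \text{(by continuity)}\\
f(x) &= f(0)\,e^{c x^2}
  && c < 0 \text{ for integrability: a Gaussian.}
\end{align*}
```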
The gist is that the Gaussian distribution is a distribution whose convolution with itself is (a rescaled copy of) itself.
The CLT's final target has to satisfy the property that its convolution with itself is itself.
Therefore the CLT's target is the Gaussian.
This was one of the best (conclusion) videos I've ever seen, holy mama
Golly, I'd love a little on entropy and its application here. Important, even.
🤫@ 7:34 to 7:37 the truth was succinctly and eloquently spoken.
9:30 I just assumed this based on the fact that Galton boards are composed of smaller Galton boards. Might not be a smart assumption, but it feels right, and that's what matters.
I actually predicted much of the proof in the previous video - that the convolution of two Gaussians is still a Gaussian. In fact, it felt very intuitive; with the new visualizer of convolutions, with its diagonal slices and rotational symmetry, some of you should have gotten it too.
pompous mf
I agree. Some of us might have gotten it. Refreshers needed😅
No way I was just wishing for a video about this from you like 2 weeks ago
One of those cases when, amid the streams of information, you suddenly get to listen to a genuinely smart person
Thank you for this (once again) amazing video !
I think seeing this property as "the mean of 2 identical Gaussians is this same Gaussian" gives another intuitive reason for the CLT: there are a lot of links between convergence and fixed points! Not all fixed points are attractive, but finding a fixed point of some process might nevertheless give you the intuition that this process tends to transform everything into its fixed point.
(I'm not really used to talking about math in English, so I hope this is understandable.)
Awesome video as always! I don't think I've seen you do it yet, but I would love to see you tackle explaining how and why the RSA encryption algorithm works.