Wow, that's an incredible video! One of the best Some2 submissions in my opinion.
Beautiful video and brilliantly done. I paused at points so I could really drink it in. Great job. I’m in awe. And that was a heartwarming tribute to a great teacher.
17:45 Since the expected value of V is p, the left-hand side is the variance of V, which can be computed by summing the variances of the individual independent Bernoulli variables C_i, namely p_i (1 - p_i), which is exactly what we have at 18:43.
Nice Video!
(Or something like that. We have random vectors, but I think we can make an argument along these lines)
Lately I've been reading more about probabilistic proofs, and honestly they are so beautiful.
Makes it so beautiful once you see it all come together. Brilliant work!
Each time we add a new vector to the basis, the longest distance between opposite vertices grows to at least sqrt(n): by induction there were two opposite vertices at distance at least sqrt(n-1); adding the new basis vector to both of them gives two more points, and all four together form a parallelogram whose longest diagonal (the one opposite the obtuse angle) is at least sqrt((sqrt(n-1))^2 + 1^2) = sqrt(n).
Now take any point together with these two opposite vertices: the point lies at least sqrt(n)/2 away from one of them (if both distances were shorter, the triangle inequality would be violated). So the maximum distance is at least sqrt(n)/2.
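For the special case of the unit hypercube (my own sketch, not the general parallelotope of the video), this lower bound is easy to sanity-check numerically: the center sits at exactly sqrt(n)/2 from every vertex, and the induction step's diagonal identity holds exactly.

```python
import math

# Sanity check (unit-hypercube special case, an assumed simplification):
# the center of the n-cube is at distance sqrt(n)/2 from every vertex,
# so the max-over-points min-distance is at least sqrt(n)/2.
for n in range(1, 8):
    center = [0.5] * n
    # Distance from the center to any vertex, e.g. the origin:
    d = math.sqrt(sum(c * c for c in center))
    assert abs(d - math.sqrt(n) / 2) < 1e-12
    # The induction step's diagonal growth: sqrt((sqrt(n-1))^2 + 1^2) = sqrt(n)
    assert abs(math.sqrt(math.sqrt(n - 1) ** 2 + 1) - math.sqrt(n)) < 1e-12
```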
Awesome! I remember reading that the Kolmogorov probability axioms are just a set of rules, which we link to randomness because of the way we usually use them. This video showed me at least part of what the writers of that statement meant: our analysis of the axioms of probability can be applied in any situation where those axioms hold, even if the object of interest has nothing to do with randomness!
Very nice video. I always enjoyed probabilistic proofs when they arose in surprising areas. Erdős' bound on Ramsey numbers is a simple one, but the argument is still charming. More complicated subjects in discrete math often hinge on probabilistic tools like the Lovász local lemma or Szemerédi's regularity lemma (used in the proof of the Green-Tao theorem).
I think the best way to describe probability to a layperson is that it is a way to quantify how much we do not know about a system. The system could be actually random, or we may simply not know enough about it, so that we perceive it as random. This way of thinking ties in nicely with information theory and with direct applications such as telecommunications.
The Bayesian interpretation of probability is very reasonable; I just felt it wasn't "hands-on" enough for my purposes here. I'm simply more comfortable with the frequentist approach for an explanation.
@13:28 a pop team epic screenshot, I see you're a man of culture
Absolutely amazing video! Subscribed.
Basically, probability is used to make the proof more intuitive. But it's not really probability, it's just measuring some abstract volumes.
Well, this is a video I'll have to watch a second time to fully understand
But it is really cool
At 17:40 Jensen’s inequality should probably be mentioned. The move from proving a bound on E[d] to proving one on E[d^2] is not trivial, since in general (E[d])^2 does not equal E[d^2]. Fortunately for this problem, by Jensen’s inequality we have (E[d])^2 ≤ E[d^2], so a bound on E[d^2] also gives a bound on E[d].
Actually, the reason for the equivalence is simpler and hinted at in the video. If 𝔼[D²] ≤ A², then there exists a vertex for which D² ≤ A² (I can't be bothered to properly typeset the bound), which means that D ≤ A, which is what we had set out to prove.
So the two formulations are equivalent because "D² ≤ A² ⇔ D ≤ A" in this context.
@@syst3ms413 Yes, I agree the two formulations are equivalent for the proof of this particular problem. Of course the two expectations are not equal, just wanted to highlight that. I’m sure you know this 🙂 Nice video and great proof!
Thanks for mentioning it, I came looking for such a comment.
Btw you can also use Variance instead of Jensen: we know Var(d) ≥ 0, and Var(d) = E[d²] - E[d]², so E[d²] ≥ E[d]². We can even get a strict inequality because d is not constant, cheers
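That variance route is easy to illustrate numerically. Here is a small sketch (my own made-up discrete distribution for d, not the d of the video) showing Var(d) = E[d²] - (E[d])² ≥ 0, with strict inequality because d is not constant:

```python
import random

random.seed(0)
# A made-up non-constant random "distance" d, just to illustrate the identity
# Var(d) = E[d^2] - (E[d])^2 >= 0 (strict here, since d takes several values).
samples = [random.choice([1.0, 2.0, 3.0]) for _ in range(100_000)]
e_d = sum(samples) / len(samples)
e_d2 = sum(x * x for x in samples) / len(samples)
variance = e_d2 - e_d ** 2
assert variance > 0        # strict: d is not constant
assert e_d2 >= e_d ** 2    # the Jensen-style bound E[d^2] >= (E[d])^2
```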
Simple: the point at maximum distance is always equidistant from opposing vertices, and the midpoint between opposing vertices of a parallelotope is by definition the center, i.e. the average position of all vertices. Proof by symmetry
What a cool proof!
What a lovely epilogue!
Very Interesting Great Job
Love this, it's very visual.
Deserves more views!
Great video, and your professor sounds amazing!
Sort of unrelated, but one of my favorite counterintuitive facts: Which point inside the unit square maximizes the product of distances to the four vertices? (It's not the center.)
Halfway in between two of the vertices?
As in, on the perimeter
@@iamrepairmanman I agree. Center gives you 1/4, midpoint of a side gives you 5/16 (which is 1/16 more).
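Those two values check out numerically. A short sketch (corner coordinates are just the standard unit-square layout, an assumption on my part):

```python
import math

def product_of_distances(x, y):
    """Product of distances from (x, y) to the unit square's four corners."""
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    prod = 1.0
    for cx, cy in corners:
        prod *= math.hypot(x - cx, y - cy)
    return prod

center = product_of_distances(0.5, 0.5)    # (1/sqrt(2))^4 = 1/4
edge_mid = product_of_distances(0.5, 0.0)  # 0.5 * 0.5 * 1.25 = 5/16
assert abs(center - 0.25) < 1e-12
assert abs(edge_mid - 5 / 16) < 1e-12
assert edge_mid > center                   # the center is not the maximizer
```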
excellent video and effort!
one note: at 3:30 the conjecture should be "the furthest a point inside of it can be from its CLOSEST vertex is sqrt(n)/2", right?
You can interpret it as being kind of the same thing. If a point is at a distance x from its closest vertex, then it's at a distance (at least) x from every other vertex. Maybe that wasn't really explicit though.
Awesome video! +1 sub
Nice video!
Good video. One note: around 4:30 it would be better to place the equations elsewhere on screen to make room for subtitles. It would greatly benefit the hearing impaired to be able to read the equations at the same time as the subtitles.
Bravo!
I'd say this can be understood as a rounding argument. Instead of rounding to the closest vertex, round the coordinates probabilistically using Bernoulli random variables: a value closer to 1 has a higher probability of becoming 1, and so on.
I think that here by rounding you actually mean "finding the closest vertex", which yes, is the object of the problem.
@@syst3ms413 No, I meant a random rounding. Let p = (p_1, ..., p_n), and let X_i ~ Be(p_i) be independent. Consider the random vertex V of the parallelotope with coordinates (X_1, ..., X_n) in the basis of the edge vectors. Then E[|V - p|^2] can be computed as you did. The interpretation is that V is a "random rounding" of p: the i-th coordinate becomes 1 with probability p_i, so a coordinate close to 1 has a high propensity to be rounded to 1. It makes intuitive sense to me that this way one gets a vertex close (not necessarily closest) to p with high probability.
As a general philosophy, I think of probability as the mathematics of the very complex, rather than mathematics of randomness (of course this is just philosophy). Thus, no wonder, it appears as a method rather than just as a subject.
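A minimal Monte Carlo sketch of this random-rounding idea (my own illustration, assuming the unit-cube setting, where the random vertex is simply V = (X_1, ..., X_n)): the average squared distance from p to its random rounding is the sum of the Bernoulli variances.

```python
import random

random.seed(1)
p = [0.9, 0.2, 0.7, 0.5]   # an arbitrary point in the unit cube (made up)
n, trials = len(p), 200_000

total_sq = 0.0
for _ in range(trials):
    # Random rounding: coordinate i becomes 1 with probability p_i
    v = [1 if random.random() < pi else 0 for pi in p]
    total_sq += sum((vi - pi) ** 2 for vi, pi in zip(v, p))

mc_estimate = total_sq / trials
exact = sum(pi * (1 - pi) for pi in p)  # sum of Bernoulli variances
assert abs(mc_estimate - exact) < 0.01  # Monte Carlo matches the formula
assert exact <= n / 4                   # since each p_i(1 - p_i) <= 1/4
```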
Okay, I see your point. Sure, why not!
Should the conjecture at 3:30 also specify a maximal edge length of 1 in the parallelotope?
Yes, it should, my bad.
Funny yellow dog does wacky and uncharacteristic math problem
Without getting very far into this, I'm gonna guess that you get an unexpected answer in higher dimensions due to the same effect that can cause the hyperspheres at the corners of a hypercube to become bigger than the cube itself. I forget the full description of the phenomenon but I do know there are a few videos about it.
Well no, for once the result in higher dimensions is actually a clean generalization of the lower-dimensional cases.
@@syst3ms413 found that out lol 😆
1:13 Wouldn't there be two off-center points where the minimum distance is maximized?
The distance from the vertices on the shorter diagonal would increase if I moved the point along the longer diagonal, so the center can't be at a maximum.
What did I misunderstand?
You are right, but the max distance is less than in a square
Aaah, the argument was about the upper bound being sqrt(2)/2. Now the sides being
What apps did you use for making this video?
@@merouan3922 I mainly used Manim, the same library used by 3b1b. There was barely any other editing needed, and for that I used OpenShot, but I don't recommend it; it was pretty bad
alright aside from a brilliant illustration of a topic i've almost never seen discussed, your memes are fucking HILARIOUS keep it up m8
I follow the proof, but I don't quite follow how having at least one vertex within distance √n/2 of the point shows that the maximum distance is √n/2. Wouldn't you want to show there are no points with distance greater than √n/2?
You fix an arbitrary point inside the parallelotope, then choose a vertex of the parallelotope at random. If the average distance to the fixed point is at most sqrt(n)/2, then at least one of the vertices is close enough. So for an arbitrary point we have shown that there is a vertex within distance sqrt(n)/2 of it.
As the previous comment says, the point I fixed in the beginning is perfectly arbitrary. If I can show the theorem for an arbitrary point, it means it's true for all points.
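The whole averaging argument fits in a small sketch (unit-cube case with a made-up point p; the Bernoulli(p_i) vertex choice is the one discussed above): the expectation E[d²] = Σ p_i(1 - p_i) is at most n/4, and a minimum can never exceed an average, so some vertex is within sqrt(n)/2.

```python
import itertools
import math

p = [0.3, 0.8, 0.5, 0.1, 0.6]  # an arbitrary fixed point (made up)
n = len(p)

# With vertex coordinates X_i ~ Bernoulli(p_i), E[d^2] = sum p_i(1 - p_i):
expected_d2 = sum(pi * (1 - pi) for pi in p)
assert expected_d2 <= n / 4    # each term is at most 1/4

# The minimum over vertices can't exceed the average, so some vertex
# has d^2 <= E[d^2] <= n/4, i.e. d <= sqrt(n)/2:
best = min(
    math.sqrt(sum((vi - pi) ** 2 for vi, pi in zip(v, p)))
    for v in itertools.product([0, 1], repeat=n)
)
assert best <= math.sqrt(n) / 2
```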
At 11:08, without looking up the formula somewhere else, should there be a division by n? And maybe a limit as n goes to infinity?
Wait, I bet n is the number of dimensions we’re using.
cool
Probability is real. The math you use to operate on it is hard and true. As long as you don't actually have to work with a dataset somewhere, you can give a perfectly accurate analysis. The expected value of a Bernoulli distribution *is* p, because that's what it's defined as. You only become unsure if your sample size is anything below infinity.
Just because matrix operations are all over video processing doesn't mean we can't use Graphics Processing Units for machine learning. It's just branding at that rate.
There's promise here, but honestly I was very lost, with lots of confusing words. This seems like it would be interesting if I had a bachelor's, or maybe if I were working on my master's in mathematics. Or maybe I'm just dumb; I'm no genius, lol. But I felt like things were difficult to relate to each other.
Don’t feel bad. I majored in math years ago and I didn’t follow everything. There are a lot of prerequisites to be able to follow everything.
Yes, this is indeed slightly more advanced than many of the other submissions