You just simply explained a complex topic that I spent 3 hours on reading a textbook into 10-minute video. Your ability to condense and concisely explain these topics in your videos have been phenomenal. Great job!
Thank you! Glad this is working for you
You have no idea how often your explanations blow my mind. It is an "aha" moment every single time, a concept clicks so well! Please keep up this amazing work.
Thank you and I will! I've got something big in the works :)
@@Mutual_Information Do you plan on doing one for EM derivation of GMMs?
@@karanshah1698 EM yes, GMMs, yes eventually. Using them together?? No I didn’t think of that, hm
Great intuition, great visualizations! Mathematicians will also say that you need to assume that X is integrable in order for Jensen's inequality to hold. Jensen also has far reaching consequences in theoretical probability, and even analysis in general. Can't wait for more!
This means a lot getting your comment here. Much appreciated!
And yes! There are unfortunately rigor qualifications that I omit to keep the vid light. In the case of integrability, I hadn’t thought of that, so thanks for pointing it out :)
You sure have a gift for teaching ! Plus, what a slick production . It takes a lot of hard work and skill to make something look as simple and obvious as you do. Awesome.
I appreciate that!
Intuition is indeed what helps at least me to understand (not just short-term memory) a concept, great work, thank you!
Humans transform short-term memory into long-term memory through understanding and prediction.
Congrats on the launch of your channel and first video, DJ! This was awesome!
Thanks Sumaya! More coming :)
Thanks for the intuition-nurturing graphics. Very helpful!
Glad it was helpful!
Doing information theory at uni right now. This video gave me an intuitive understanding for why the inequality holds. Cheers
I was curious about Jensen’s inequality, having seen it in the context of EM. You did a great job of providing even more context and explaining the intuition. The animation makes it so easy to understand why it is true. Simply outstanding. This is hands-down the best video I’ve seen on the subject. Thank you! Just subscribed.
Thanks - great to have you! This was my first vid. I've gotten a lot of useful feedback since, but glad this one still lands
Wow. The way you simplified the concept. I was amazed.😍
Great explanation, I'm in the process of figuring out cross-entropy and your video helped me with the Jensen's inequality concept! Keep it up!
Thank you! Happy to help
Huh.. it clicked in like 3 seconds after seeing the comparison with line. And why it's true is so obvious now. Amazing. Thanks.
Probably one of the most underrated inequalities... Shows up everywhere (mostly Machine-Learning these days, but I also encountered this in neutron transport and rendering of images)
Very interesting. Keep ‘em coming DJ!
First comment :) will do!
Great explanation. Instead of the average, it's better to think of a weighted average; this easily conveys the idea behind the formal definition of a convex function :)
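That weighted-average view can be sketched numerically (a toy example of my own, not from the video): convexity means that for any weight w in [0, 1] and points a, b, f of the weighted average never exceeds the weighted average of f.

```python
# Toy illustration of convexity as a weighted average:
# f is convex iff, for all weights w in [0, 1] and points a, b,
#   f(w*a + (1 - w)*b) <= w*f(a) + (1 - w)*f(b)

def f(x):
    return x ** 2  # x^2 is convex

a, b = -1.0, 3.0
violations = 0
for i in range(101):
    w = i / 100                          # sweep weights over [0, 1]
    lhs = f(w * a + (1 - w) * b)         # f of the weighted average
    rhs = w * f(a) + (1 - w) * f(b)      # weighted average of f
    if lhs > rhs + 1e-12:
        violations += 1

print(violations)  # 0: the defining inequality holds for every weight
```

Jensen's inequality is exactly this statement generalized from two points with weights (w, 1-w) to an arbitrary distribution over points.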
As a tired econometrics student with a dull lecturer, this helped a bunch, thanks
Great video, hopefully there is a followup videos on how Jensen's Inequality becomes the important part of EM, KLDiv and so on.
Yes! The EM algorithm will be covered, but later this year. If you're curious in the meantime, I linked to some sources in the description where Jensen's Inequality is used. In Cover's book, there is a section, "Jensen's Inequality and Its Consequences," which shows how foundational it is for Information Theory.
Feels great with some visual input. I just realised that I need to go through a few examples just to prove it to myself....
Nice animation making things a lot more intuitive, thanks.
Thank you very much for this wonderful presentation. A lot of effort must have gone into getting an animation that feels intuitive.
very clear explanation with high energy .. I like it
lol old video with lots of energy.. I've chilled out a bit since, but thank you
Amazing! Thank you very much!
Thank you very much!
Great Video ...Thanks for the efforts you put in these Videos ..🙂
Very clear explanation. Keep up the good work
Great visualization, thank you very much for your effort to break it down so well!!! :)
Thanks. Very interesting. I listened to a talk from Nassim Taleb where he talked about Jensen's Inequality.
Yea it shows up a lot. Glad you enjoyed it
Your channel looks great, thanks!
Nicely and Intuitively explained! Thanks
Very good intuitive explanation.
Very good explanation without making it complicated, thanks a lot!
Course man - glad you liked it
You are awesome. I have been binge watching your videos.🖖
Excellent, thank you!
You deserve more likes!
This was awesome! Could you make a video about where this is applied? You talked about how it affects ML, but could you show an example? Thank you!
btw the algorithm showed me this video so hopefully ur on the rise! Honored to be this early
Yea I’ll do a video on the EM algorithm, where this shows up. Also, variational inference, eventually.
And I’m happy to hear that! I hope you’re right but we’ll see.
Great visualisation, really good job! Thank you very much!
This was an awesome explanation. Thank you! Out of curiosity, how did you make the animation? By the way, that was also really well made!
Hey, thanks! To answer your question, I stitch together a bunch of graphs made in Altair using a personal library. Altair is a very nice plotting library.
Looks like he is using manim made by 3blue1brown
nice quality and explanation, really helped me out
You basically took an esoteric formula and explained it in a stupid-people friendly way. Thank you
There's no need to put yourself down in such a way. You can be thankful for the content this guy is putting out on YT by liking, subscribing and hitting the share button... stop dragging yourself over the coals.
Thanks for such great visuals
Most OP explanation of all time. My intuition so far had come from showing it with the definition of convexity. This was awesome; relating it to N sampling was the key. Learning Online Convex Opt, any tips :p prof is planning a 50% avg midterm
Excellent explanation indeed!
Awesome video!
Thank you! More coming - one every 3 weeks.
Amazing video! Subscribed.
Nice visualization!
Awesome video! Really easy to follow
Great video! Thanks for sharing your knowledge!
I loved this concept ❤❤!!
Great explanation!
Then for a concave function, I expect 'greater than or equal to' instead of 'less than or equal to'.
Brilliant explanation!
Perfect presentation, thank you!!
great vid! but it would be even better if you could show the formal proof and connect it with the visualization you showed
incredibly clear and helpful! thx a lot
You are awesome...this explanation is so cool.
Glad you like it!
Thanks...you've explained it clearly
Liked and commented. Thank you and more please!
I got something good cookin :)
This is so good. Thank you.😊
And thank you too Satish!
Can you upload some content about estimation maximization with mixed poisson?
Hm, sorry but that's unlikely. It's just too specific. The topics I've picked are already fairly niche. If I go into a very specific subtopic, it'll appeal to very few folks (unless there is something particularly fascinating about it)
3:19 in and I get it. Thanks!
Gem of a video...
Cool inequality! If we take f(x) = x² then this inequality tells us that for any real a, b: a² + 2ab + b²
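The comment appears cut off, but the special case is easy to reconstruct: with f(x) = x² and X a fair coin flip over {a, b}, Jensen gives ((a+b)/2)² ≤ (a²+b²)/2, i.e. a² + 2ab + b² ≤ 2a² + 2b², which rearranges to (a−b)² ≥ 0. A quick numerical check (my own sketch, not from the comment):

```python
import random

random.seed(0)
for _ in range(1000):
    a = random.uniform(-10, 10)
    b = random.uniform(-10, 10)
    # f(E[X]) <= E[f(X)] with f(x) = x^2 and X uniform over {a, b}
    assert ((a + b) / 2) ** 2 <= (a ** 2 + b ** 2) / 2
```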
great channel
Thank you for great explanation :)
GREAT VISUALISATION TY
Great vid bruv
Heh. 4:55 "what mathematicians will ask, and engineers probably won't, is 'why?'." That definitely matches my own experience.
Just found your channel through your comment on 3b1b's video, very nice explanation. Btw, is the opposite inequality true for concave functions?
Glad you’re here! His channel is a huge inspiration.
And yep, with concave functions you get the opposite. The negative of a convex function is a concave function, and that reverses the inequality.
This is relevant to extractor theory used in cryptography (usually as min-entropy, not the Shannon entropy you assume here) - how does your function change the entropy per bit of the input data?
My friend, you are amazing. Really, I feel bad about how much time I wasted trying to understand this when you explain it amazingly in 5 minutes.
Can you also make a video on VAEs and variational inference, the ELBO and all that?
Please, it's a topic that's really hard for a lot of people, and it looks like it's exactly in your domain.
Please
Thanks! And I do have plans to make a video on variational inference, but it may take a while. There are a few videos in front of it. But it's coming!
Awesome! So ... given a random variable X I can use Jensen's inequality to estimate the local curvature of a function 'f' ?
Love the videos!
Thanks!
Is it possible you could make a video on Variational Inference and the intuition behind the loss function?
Would you explain the Variational Inference technique? I found it hard to learn it by myself😭
So what would happen if the space of points below(!) the function is convex? Will we get a different inequality?
Yep, then that would be a concave function and the inequality would be reversed. A pretty common example of that is the log(x) function.
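A quick Monte Carlo sanity check of that flip (my own sketch, using nothing beyond the standard library): for the concave log, the sample mean of log X stays below the log of the sample mean.

```python
import math
import random

random.seed(42)
xs = [random.uniform(0.1, 10.0) for _ in range(100_000)]

mean_of_log = sum(math.log(x) for x in xs) / len(xs)  # approximates E[log X]
log_of_mean = math.log(sum(xs) / len(xs))             # approximates log E[X]

# Reversed Jensen for concave f: E[log X] <= log E[X]
assert mean_of_log <= log_of_mean
```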
I never understood why this inequality was true or what it really meant the first time I saw it, and my engineering prof said don't worry about the intuition just know how to use it (bleh), but UA-cam somehow knew to suggest this to me months later! Thanks for the clear explanation
Glad you enjoyed it!
Very Very nice!
If you have a concave function, does the inequality sign just get flipped?
Thank you very much for the content!
Yes it does!
Subscribed
Thank You boss
Thanks !
Thank you for this..
For sure - if you'd like, you can do me a solid and tell anyone into ML/stats about the channel :)
What software did you use for this cool moving graph thingy??
Thanks for the love! Answered you on Twitter :)
Can't thank you enough!
Hey, nice video. I thought we would see the
f((a+b)/2)
Glad you enjoyed it!
Holy shit, this is a great video!
Thank you - it means a lot !
I wish I had friends like you! What's on the bookshelf? =)
The intuition follows immediately from grokking the idea of convex functions!
Some of the coolest tricks in mathematics come from manipulation of inequalities. There's a brilliant little maths book named "The Cauchy-Schwarz Master Class" which I recommend for anyone wanting to master these dark arcana.
Sometimes, like in this case, an animation is just unbeatable, however!
Thank you! I’ve heard that book recommended a few times but have never checked it out. I’ll order it!
I don't like the chosen function for the visualization because it is everywhere increasing. From this video i m not convinced that this reasoning would be valid for say a parabola segment.
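For what it's worth, the inequality does hold for a non-monotonic parabola segment too; here is a quick Monte Carlo check (my own sketch, not from the video), sampling X over a range where x² both decreases and increases:

```python
import random

random.seed(1)

def f(x):
    return x ** 2  # parabola: decreasing on [-2, 0], increasing on [0, 3]

# X uniform over a range where f is NOT everywhere increasing
xs = [random.uniform(-2.0, 3.0) for _ in range(100_000)]

e_x = sum(xs) / len(xs)                  # approximates E[X]
e_fx = sum(f(x) for x in xs) / len(xs)   # approximates E[f(X)]

assert f(e_x) <= e_fx  # Jensen still holds: f(E[X]) <= E[f(X)]
```

Monotonicity never enters the argument; only convexity (the chord lying above the curve) does.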
Information theory couldn't exist without Jensen's inequality? Why?
The visual explanation is excellent. However, it might seem that it does not hold if one picks a convex function that is below the straight line. For completeness, it is worth remarking that the equality holds for any straight line. Because of this, for all convex functions f(x) there is always a straight line ax+b for which f(x)>=ax+b for all x. (Namely, the line you pick has to be the tangent of the convex function at E[X]).
My question was not answered by the author, maybe you can help me. My doubt is precisely about the equality holding for any straight line... why is it true? I can understand it holds for linear functions of the type f(x) = c·x, in the sense of addition and multiplication conservation -> hence f(E(x)) = E(f(x)). Functions of the type f(x) = ax+b are straight lines, but not linear functions in this sense, so we cannot prove that f(E(x)) = E(f(x)).
Try to use f(x) ~ exp(k) and the transformation Y = ax + b, you'll see that f(E(x))=a/k+b and E(f(x))=exp(kb/a)*a/k, which do not hold for any straight line, but just for b=0.
I’m not sure what you’re saying. The equality is satisfied when f is an affine transformation. This is trivial because E is a linear operator:
E(f(X)) = E(aX+b) = a E(X)+b=f(E(X))
where the first step applies the definition that f is affine (a straight line), the second, that E is linear, and the third using again the definition of f being affine.
In your example you're mixing up letting f be an exponential with letting it be an affine transformation, so you get something weird. You can't apply both transformations at the same time; that is not something Jensen's inequality talks about in any way.
If the function is convex, you get an inequality. If the function is strictly convex (unless X is constant a.s.) you get strict inequality. And in the opposite edge case that the function is affine (which is also convex), you get strict equality.
@@rocamonde Of course you are correct; in my example I made a mistake calculating E[f(X)], and for that reason the equality does not hold. My approach was to see the theorem from the properties of affine functions, but in fact it is much simpler to see it from the linearity of expectation. Thank you, your explanation was clear!
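The tangent-line argument from the thread above can be written as a two-line proof (assuming f is differentiable at μ = E[X]; when it isn't, convexity still guarantees a supporting line at μ):

```latex
f(x) \;\ge\; f(\mu) + f'(\mu)\,(x - \mu) \quad \text{for all } x
\;\;\Longrightarrow\;\;
\mathbb{E}[f(X)] \;\ge\; f(\mu) + f'(\mu)\,\mathbb{E}[X - \mu] \;=\; f(\mathbb{E}[X]).
```

The first inequality is the tangent (or supporting) line lying below the convex curve; taking expectations of both sides and using E[X − μ] = 0 gives Jensen's inequality directly.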
Outstanding explanation!
Thanks!
Thank you! 3rd donation ever :)
Amazing explanation!