The Principle of Maximum Entropy

  • Published Jul 2, 2024
  • The machine learning consultancy: truetheta.io
    Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
    Want to work together? See here: truetheta.io/about/#want-to-w...
    What's the safest distribution to pick in the absence of information? What about when you have some, though only partial, information? The Principle of Maximum Entropy answers these questions well, and as a result it is a frequent guiding rule for selecting distributions in the wild. (A short numerical sketch of the idea appears after the timestamps below.)
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    Sources
    Chapters 11-12 of [2] were primary sources - this is where I ironed out most of my intuition on this subject. Chapter 12 of [1] was helpful for understanding the relationship between the maximum entropy criterion and the form of the distribution that meets it. [3] was useful for a high-level perspective and [4] was helpful for determining the list of maximum entropy distributions.
    Also, thank you to Dr. Hanspeter Schmid of the University of Applied Sciences and Arts, Northwestern Switzerland. He helped me interpret some of the more technical details of [2] and prevented me from attaching an incorrect intuition to the continuous case - much appreciated!
    [1] T. M. Cover and J. A. Thomas. Elements of Information Theory. 2nd edition. John Wiley, 2006.
    [2] E. T. Jaynes. Probability theory: the logic of science. Cambridge university press, 2003.
    [3] Principle of Maximum Entropy, Wikipedia, en.wikipedia.org/wiki/Princip...
    [4] Maximum Entropy Distribution, Wikipedia, en.wikipedia.org/wiki/Maximum...
    Timestamps:
    0:00 Intro
    00:41 Guessing a Distribution and Maximum Entropy
    04:16 Adding Information
    06:40 An Example
    08:00 The Continuous Case
    10:26 The Shaky Continuous Foundation
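A rough numerical sketch of the principle described above (my own illustration, not code from the video; the digits-with-a-constrained-average setup echoes the video's example, but the target mean of 3.0 and the use of scipy are my own choices). Among all distributions over the digits 0-9 with a fixed mean, the maximum entropy one has the form p(d) ∝ exp(λ·d), so a one-dimensional search for λ recovers it:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Digits 0..9; find the max-entropy distribution with a fixed mean.
digits = np.arange(10)
target_mean = 3.0  # assumed constraint value, not taken from the video

# The max-entropy solution has the form p(d) ∝ exp(lam * d);
# solve for the lam whose implied mean matches the constraint.
def mean_for(lam):
    w = np.exp(lam * digits)
    p = w / w.sum()
    return p @ digits

res = minimize_scalar(lambda lam: (mean_for(lam) - target_mean) ** 2,
                      bounds=(-5, 5), method="bounded")
lam = res.x
p = np.exp(lam * digits); p /= p.sum()

print("lambda  =", round(lam, 4))
print("p       =", np.round(p, 4))
print("mean    =", round(p @ digits, 4))
print("entropy =", round(-np.sum(p * np.log(p)), 4))
```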

COMMENTS • 109

  • @pedrobianchi1929 · 3 years ago · +39

    The principles explained here appear everywhere: thermodynamics, machine learning, information theory. Very fundamental.

  • @equanimity26 · 1 year ago · +4

    An amazing video. Proving once again why the internet is a blessing to humanity.

  • @kaishang6406 · 11 months ago · +4

    In the recent 3b1b video about the normal distribution, there's a mention that the normal distribution maximizes entropy. Then I immediately saw it here in your video, which displays the normal distribution as the one that maximizes entropy while constraining the mean and variance - the only two parameters of the normal distribution. That is very nice.
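A quick check of the claim in this comment (my own sketch; the comparison distributions and the unit variance are arbitrary choices): matching a few densities to the same mean and variance and comparing their closed-form differential entropies, the normal comes out on top.

```python
import numpy as np

# Differential entropies (closed forms) for distributions matched to
# mean 0 and variance 1. The normal should have the largest entropy.
sigma2 = 1.0

h_normal  = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # N(0, 1)
b = np.sqrt(sigma2 / 2)                               # Laplace scale giving variance 1
h_laplace = 1 + np.log(2 * b)
a = np.sqrt(3 * sigma2)                               # Uniform(-a, a) with variance 1
h_uniform = np.log(2 * a)

print("normal :", round(h_normal, 4))   # ~1.419
print("laplace:", round(h_laplace, 4))  # ~1.347
print("uniform:", round(h_uniform, 4))  # ~1.242
```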

  • @arongil · 11 months ago · +6

    I can't get over how much fun you make learning about stats, ML, and information theory---not to mention that you teach it with skill like Feynman's and a style that is all your own.

    • @Mutual_Information · 11 months ago

      That's quite a compliment - Feynman is a total inspiration for many, myself included. His energy about the topics makes you *want* to learn about them.

  • @ilyboc · 2 years ago · +10

    It blew my mind that those famous distributions come naturally as the ones that give maximum entropy when we set the domain and constraints in a general way. Now I kind of know why they are special.

    • @MP-if2kf · 2 years ago · +1

      Definitely very cool. In many cases there are other fascinating characterizations too. For example: assume a continuous distribution on positive support with the memorylessness property --> solve the resulting differential equation --> find that it MUST be the exponential.
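A tiny check of the memorylessness property mentioned in the reply above (my own sketch; the rate and the values of s, t are arbitrary): for the exponential, P(X > s + t | X > s) = P(X > t).

```python
import numpy as np

rate, s, t = 0.7, 1.3, 2.0           # arbitrary values for the check

def survival(x):                     # P(X > x) for Exponential(rate)
    return np.exp(-rate * x)

lhs = survival(s + t) / survival(s)  # conditional survival P(X > s+t | X > s)
rhs = survival(t)                    # unconditional survival P(X > t)
print(lhs, rhs)                      # both ≈ 0.2466
```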

  • @NowInAus · 1 year ago · +2

    Really stimulating. Your last example looked to be heading towards information in the variables. Got me hooked

  • @alec-lewiswang5213 · 1 year ago · +3

    I found this video very helpful! Thanks for making it! The animated visuals especially are great :)

  • @dobb2106 · 3 years ago · +4

    I’m glad I clicked on your comment, this channel is very well presented and I look forward to your future content.

  • @maniam5460 · 3 years ago · +15

    You know that feeling when you find a criminally overlooked channel and you’re about to get in on the ground level of something that’s gonna blow up? This is you now

    • @Mutual_Information · 3 years ago · +3

      That is quite nice of you - thank you! I hope you're right, but for now, I'm working on my patience. It can take quite a while to get noticed on YouTube. I'm trying to keep my expectations realistic.

  • @HarrysKavan · 2 years ago · +2

    I didn't expect much and wasn't disappointed. What a great video. I wish you lots more followers!

  • @nandanshettigar873 · 1 year ago · +4

    Great video, love the level of complexity and the focus on fundamentals. I feel this just gave me some fresh inspiration for my research.

  • @outtaspacetime · 2 years ago · +2

    This one saved my life!

  • @garvitprashar3671 · 3 years ago · +2

    I have a feeling you will become famous someday because the video quality is really good...

  • @mattiascardecchia799 · 2 years ago · +2

    Brilliant explanation!

  • @murilopalomosebilla2999 · 2 years ago · +2

    The quality of your content is amazing!

  • @ckq · 1 year ago · +2

    This channel is so amazing. I had a fuzzy understanding of a lot of these concepts, but this clarifies it.
    For example, my intuition suggested that for a given mean and variance, the maximum entropy estimate would be a beta-binomial distribution, but I wasn't really able to prove it to myself. 7:00

  • @antoinestevan5310 · 3 years ago · +1

    I do not think I would produce any interesting analysis today. I simply... appreciated it a lot! :-)

  • @NoNTr1v1aL · 2 years ago · +2

    Amazing video!

  • @sivanschwartz3813 · 3 years ago · +1

    Thanks a lot for this great and informative video!!
    One of the best explanations I have come across.

  • @tylernardone3788 · 2 years ago · +2

    Great video! Great channel! I'm working my way through that Jaynes book [2] and absolutely love it.

    • @Mutual_Information · 2 years ago · +2

      That is a heroic move! He has some wild insights on probability theory. Guy was a complete beast.

  • @arnold-pdev · 1 year ago · +2

    Great video!

  • @nerdsofgotham · 3 years ago · +2

    Been 20 years since I last did information theory. This seems closely related to the asymptotic equipartition principle. Excellent video.

    • @Mutual_Information · 3 years ago · +1

      Oh, I'm sure they're related in some mysterious and deep way I don't yet understand, if only because that's a big topic in source [1] for this video :)

  • @sirelegant2002 · 5 months ago · +1

    Incredible lecture, thank you so much

  • @kenzilamberto1981 · 3 years ago · +1

    Your video is easy to understand - I like it.

  • @YiqianWu-dh8nr · 1 month ago

    I went through the overall line of reasoning - the many plain, easy-to-understand descriptions in place of complex mathematical formulas let me at least understand the underlying principle. Thank you!

  • @mCoding · 3 years ago · +6

    Another fantastic video! I would love to improve my knowledge about the Jeffreys prior for a parameter space.

    • @Mutual_Information · 3 years ago · +1

      Thank you! Always means a lot. And yea, now that I've covered the Fisher Information, I can hit that one soon. Appreciate the suggestion - it's on the list!

    • @derickd6150 · 1 year ago

      @@Mutual_Information I do believe (and I hope this is the case) that we are going through a boom in science channels right now. It seems that the youtube algorithm is identifying that sub-audience that loves this content and recommending these types of channels to them sooner and sooner. So I really hope it happens to you very soon!

  • @albertoderfisch1580 · 2 years ago · +10

    woah this is such a good explanation. I just randomly discovered this channel but I'm sure it's bound to blow up. Just a bit of critique: Idk if this is only meant for college students but if you want to get a slightly broader audience you could focus a bit more on giving intuition for the concepts.

    • @Mutual_Information · 2 years ago · +11

      Thank you very much! Yea the level of technical background I expect of the audience is an important question. I’m partial to keeping it technical. I think it’s OK not to appeal to everyone. My audience will just be technical and small :)

    • @zorro77777 · 2 years ago

      @@Mutual_Information ++ with Alberto le Fisch: Prof. Feynman - "If you cannot explain something in simple terms, you don't understand it." And I am sure you understand, so please explain it to us! :)

    • @diegofcm6201 · 1 year ago

      His channel already puts lots of effort into doing exactly that. The final bit of the video, explaining the FUNDAMENTAL breakage of "label invariance" when going from discrete to continuous, just blew my mind - seriously some of the best intuition I've ever received about anything.

    • @mino99m14 · 1 year ago · +1

      @@zorro77777 Well, he also said something like "If I could summarise my work in a sentence, it wouldn't be worth a Nobel Prize." Which means that although something can be simplified, that doesn't mean it won't take a long time to explain. It's not that easy, and he is not your employee, you know. Also, in that quote he meant being able to explain to physics undergrads, whom you would expect to have some knowledge already.

  • @TuemmlerTanne11 · 3 years ago · +3

    Btw, I don't know if Mutual Information is a good channel name. The term is pretty loaded, and I can't just say "do you know Mutual Information" like I can say "do you know 3blue1brown"... It also makes it harder to find your channel, because if someone looks up mutual information on YouTube you won't show up at the top. Maybe that's your strategy, though: to have people find your channel when they search for Mutual Information on YouTube ;) Anyway, I'm sure you have thought about this, but that's my take.

    • @Mutual_Information · 3 years ago · +1

      I fear you may be correct. I've heard a few people say they tried to find my channel but couldn't when they searched. But part of me thinks I've gone too far to change it now. There's actually quite a bit of work I'd have to do to make a title change, and if the cost is my channel being a little bit more hidden, I think that's OK.
      Weirdly, I'm kinda enjoying the small channel stage (being a bit presumptuous that I'll eventually not be in this stage :) ). It's less pressure, it gives me time to really nail the feedback, and it's easier to have 2-way communication with the viewers. Don't get me wrong, I'd like to grow the channel, but I'm OK with leaving some growth hacks on the table.
      That said, I'm not totally set on "Mutual Information." I'd like to feel it out a bit more. As always, appreciate the feedback!

  • @praveenfuntoo · 2 years ago · +1

    I was able to apply this equation in my work - thanks for making it accessible.

  • @kylebowles9820 · 3 months ago

    I think it would be up to the parametrization to care about area or side length depending on the problem case in that example. I'd like my tools to do their own distilled thing in small, predictable, usable pieces.

  • @Septumsempra8818 · 1 year ago · +2

    WOW!

  • @ckq · 1 year ago

    12:45, I think taking the log would be useful in the squares scenario since then the squaring would become a linear transformation rather than non-linear

  • @NoNTr1v1aL · 2 years ago · +3

    When will you make a video on Mutual Information to honor your channel's name?

    • @Mutual_Information · 2 years ago · +2

      Haha it's coming! But I got a few things in queue ahead of it :)

  • @dermitdembrot3091 · 2 years ago · +2

    Your videos are great! I am curious about the connection between maximum entropy and Bayesian inference. They seem related. Let's think about Bayesian inference in a variational way, where you minimize a KL divergence between the approximate and true posterior, KL(q(z)||p(z|x)), where z is e.g. the vector of all our unknown digits and x the digit mean. Minimizing this KL divergence is equivalent to maximizing the sum of
    (1) H(q(z)), an entropy maximization objective
    (2) -CE(q(z),p(z)), a negative cross-entropy term with the prior distribution p(z). This term is constant for a uniform prior
    (3) E_q(z) log p(x|z), a likelihood term that produces constraints. In our digits case p(x|z) is deterministically 1 if the condition x=mean(z) is fulfilled and 0 otherwise. All z with log p(x|z) = log 0 = -infty must be given a probability of 0 by q(z) to avoid the KL objective reaching negative infinity. On the other hand, once this constraint is fulfilled, all remaining choices of q attain E_q(z) log p(x|z) = E_q(z) log 1 = 0, so the entropy term gets to decide among them.

    • @dermitdembrot3091 · 2 years ago · +2

      further, if we choose the digits to be i.i.d. ~ q(z_1) (z_1 being the first digit), as
      the number of digits N goes to infinity, the empirical mean, mean(z), will converge almost surely to the mean of q(z_1), so in the limit, we can put the constraint on the mean of q(z_1) instead of the empirical mean, as done by the maximum entropy principle. Digits being i.i.d. should be an unproblematic restriction due to symmetry (and due to entropy maximization).

    • @Mutual_Information · 2 years ago · +3

      Wow yes you dived right into a big topic. Variational inference is a big way we get around some of the intractability naive bayesian stats can yield. You seem to know that well - thanks for all the details
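For reference, here is the decomposition described two comments up, written out (a standard variational-inference identity; the notation is mine):

```latex
% KL between the approximate and true posterior, expanded:
\begin{aligned}
\mathrm{KL}\big(q(z)\,\|\,p(z\mid x)\big)
  &= \mathbb{E}_{q(z)}\big[\log q(z) - \log p(z\mid x)\big] \\
  &= \log p(x)
     \;-\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x\mid z)\big]}_{\text{(3) likelihood term}}
     \;+\; \underbrace{\mathrm{CE}\big(q(z),\, p(z)\big)}_{\text{(2) cross-entropy with prior}}
     \;-\; \underbrace{H\big(q(z)\big)}_{\text{(1) entropy}}
\end{aligned}
% Since log p(x) does not depend on q, minimizing the KL is the same as
% maximizing H(q) - CE(q, p) + E_q[log p(x|z)], i.e. points (1)-(3) above.
```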

  • @regrefree · 3 years ago · +1

    Very informative. I had to stop and go back many times because you are speaking and explaining things very fast :-p

    • @Mutual_Information · 3 years ago · +2

      I've gotten this feedback a few times now. I'll be working on it for the next vids, though I still talk fast in the vids I've already shot.

  • @sukursukur3617 · 2 years ago · +2

    Imagine you have a raw data set and you want to build a histogram. You don't know the bin range, the bin start and end locations, or the number of bins. Can an ideal histogram be built using the maximum entropy law?

    • @Mutual_Information · 2 years ago · +1

      I've heard about this and I've actually seen it used as an effective feature-engineering preprocessing step in a serious production model. Unfortunately, I looked and couldn't find the exact method and I forget the details. But there seems to be a good amount of material on the internet for "entropy based discretization." I'd give those a look
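Not necessarily the exact method referred to above, but a minimal sketch of one entropy-flavored choice (my own illustration; the data, bin count, and use of quantiles are assumptions): for a fixed number of bins, the entropy of the bin-count distribution is maximized when each bin holds roughly the same number of points, which is just quantile (equal-frequency) binning.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(size=1000)      # some skewed raw data
k = 8                                # number of bins (a free choice)

# Equal-frequency ("max entropy") edges via quantiles vs. equal-width edges.
quantile_edges = np.quantile(data, np.linspace(0, 1, k + 1))
width_edges    = np.linspace(data.min(), data.max(), k + 1)

def bin_entropy(edges):
    counts, _ = np.histogram(data, bins=edges)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print("equal-frequency bins:", bin_entropy(quantile_edges))  # ≈ log(k)
print("equal-width bins    :", bin_entropy(width_edges))     # smaller
```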

  • @janerikbellingrath820 · 1 year ago · +3

    nice

  • @alixpetit2285 · 2 years ago · +2

    Nice video, do you think that set shaping theory can change the approach to information theory?

    • @Mutual_Information · 2 years ago · +1

      I don't know anything about set shaping theory, so.. maybe! Whatever it is, I think it could only *extend* information theory. I believe the core of information theory is very much settled.

    • @informationtheoryvideodata2126 · 2 years ago · +1

      Set shaping theory is a new theory, but the results are incredible; it could really change information theory.

  • @kabolat · 1 year ago · +2

    Great video! Thanks a lot. A little feedback: the example you give in 12:00-13:00 is a bit hard to follow without visualization. The blackboard and simulations you use are very helpful in general. It would be great not to leave that section as a still frame with talking only. Even some bullet points would be nice.

    • @Mutual_Information · 1 year ago · +1

      Thanks - useful, specific feedback is in short supply, so this is very much appreciated. I count yours as a "keep things motivated and visual"-type of feedback, which is something I'm actively working on (but not always great about). Anyway, it's a work in progress and hopefully you'll see the differences in upcoming videos. Thanks again!

  • @MP-if2kf · 2 years ago · +1

    One thing is bothering me... The justification for using entropy seems circular. In the first case, where no information is added, we are implicitly assuming that the distribution of the digits is discrete uniform, because we are choosing the distribution based on the number of possible sequences corresponding to a distribution. This is only valid if every sequence is equally likely, but that is only true if we assume the distribution is uniform.
    Things are a bit more interesting when we add the moment conditions. I guess what we are doing is conditioning on distributions satisfying the moment conditions, and choosing among these the distribution with the most possible sequences. We seem to be using a uniform prior (distribution for the data), in essence. My question is: why would this be a good idea? What actually is the justification for using entropy? Which right now in my mind is: why should we use the prior assumption that the distribution is uniform when we want to choose a 'most likely' distribution?
    Don't feel obliged to respond to my rambling. Just wanted to write it down. Thank you for your video!

    • @Mutual_Information · 2 years ago · +1

      lol, doesn't sound like rambling to me.
      I see your point about it being circular. But I don't think that's the case, in fact. Let's say it wasn't uniformly distributed.. maybe odd numbers are more likely. Now make a table of all sequences and their respective probabilities. Still, you'll find that sequences with uniform counts have a relative advantage.. it may not be as strong, depending on what the actual distribution is.. but the effect of "there are more sequences with nearly even counts" is always there, even if the distribution of each digit isn't uniform. It's that effect we lean on.. and in the absence of assuming anything about the digit distribution, that leads you to the uniform distribution. In other words, the uniform distribution is a consequence, not an assumption.

    • @MP-if2kf · 2 years ago

      @@Mutual_Information I have to think about it a bit more. In any case, thank you for your careful reply! Really appreciate it.

  • @boar6615 · 8 months ago · +1

    Thank you so much! the graphs were especially helpful, and the concise language helped me finally understand this concept better

  • @kristoferkrus · 7 months ago

    Hm, I tried to use this method to find the maximum entropy distribution when you know the first three moments of the distribution, that is, the mean, the variance, and the skewness, but I end up with an expression that either leads to a distribution completely without skewness or one with a PDF that goes to infinity, either as x approaches infinity or as x approaches minus infinity (I have an x^3 term in the exponent), and which therefore can't be normalized. Is that a case this method doesn't work for? Is there some other way to find the maximum entropy distribution when you know the first three moments?

    • @kristoferkrus · 7 months ago

      Okay, I think I found the answer to my question. According to Wikipedia, this method works for the continuous case if the support is a closed subset S of the real numbers (which I guess means that S has a minimum and a maximum value?), and it doesn't mention the case where S = R. But suppose that S is the interval [-a, +a], where a is very large; then this method works. And I realized that the solution you get from this method is a distribution that is very similar to a normal distribution, except for a tiny increase in density just by one of the two endpoints to make the distribution skewed, which is not really the type of distribution I imagined.
      I believe the reason this doesn't work for S = R is that there is no maximum entropy distribution satisfying those constraints, in the sense that if you have a distribution that satisfies the constraints, you can always find another distribution that also satisfies them but has higher entropy. Similarly, if you let S = [-a, a] again, you can use this method to find a solution, but if you let a → ∞, the limit of the solution you get from this method is a normal distribution.
      But as you let a → ∞, the kurtosis of the solution will also approach infinity, which may be undesired. So if you want to prevent that, you may also constrain the kurtosis, maybe by putting an upper limit on it or by choosing it to take a specific value. When you do this, all of a sudden the method works again for S = R.
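Writing out the form behind the observation above (my own recap; the λ's are the Lagrange multipliers for the three moment constraints):

```latex
% Maximum entropy density with the first three moments constrained:
p(x) \;\propto\; \exp\!\left(\lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3\right).
% On S = \mathbb{R}, any \lambda_3 \neq 0 lets the cubic term dominate as
% x \to +\infty or x \to -\infty, so \int_S p(x)\,dx diverges and the density
% cannot be normalized; on a bounded S = [-a, a] the integral is finite and
% the construction goes through.
```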

  • @johnbicknell8424 · 2 years ago · +1

    Great video. Are you aware of a way to represent the entropy as a single number, not a distribution? Thanks!

    • @Mutual_Information · 2 years ago · +2

      Thanks! And to answer your question, the entropy *is* a single number which measures a distribution.

  • @SuperGanga2010 · 1 year ago · +1

    Is the shaky continuous foundation related to the Bertrand paradox?

    • @Mutual_Information · 1 year ago

      I am not aware of that connection. When researching it, I just discovered that these ideas weren't intended for the continuous domain. People extended them into the continuous domain, but then certain properties were lost.

  • @abdjahdoiahdoai · 2 years ago · +2

    Do you plan to make a video on expectation maximization? lol, funny you put an information theory textbook on the desk for this video.

    • @Mutual_Information · 2 years ago · +2

      Glad you noticed :)
      Yes EM is on the list! I have a few things in front of it but it's definitely coming.

    • @abdjahdoiahdoai · 2 years ago · +1

      @@Mutual_Information nice

  • @Gggggggggg1545.7 · 3 years ago · +1

    Another great video. My only comment would be to slow down slightly, to give more time to digest the words and graphics.

    • @Mutual_Information · 3 years ago

      Thank you and I appreciate the feedback. I’ve already shot 2 more vids so I won’t be rolling into those, but I will for the one I’m writing right now. Also working on avoiding the uninteresting details that don’t add to the big picture.

  • @MP-if2kf · 2 years ago · +2

    Cool video! You lost me at the lambdas though... They are chosen to meet the equations... what do they solve exactly?

    • @MP-if2kf · 2 years ago

      Are they the Lagrange multipliers?

    • @MP-if2kf · 2 years ago

      I guess I get it, the lambda is just chosen to get the maximal entropy distribution given the moment condition...

    • @MP-if2kf · 2 years ago

      Amazing video, I will have to revisit it some times though

    • @MP-if2kf · 2 years ago

      I only didn't understand the invariance bit...

    • @Mutual_Information · 2 years ago · +2

      The invariance bit is something that I really didn't explore well. It's something I only realized while I was researching the video.
      The way I would think about it is.. the motivating argument for max entropy doesn't apply over the continuous domain b/c you can't enumerate "all possible sequences of random samples".. so if you use the max entropy approach in the continuous domain anyway, you are doing something which imports a hidden assumption you don't realize. Something like.. minimizing the KL-divergence from some reference distribution.. idk.. something weird.
      As you can tell, I think it's OK to not understand the invariance bit :)

  • @desir.ivanova2625 · 2 years ago · +2

    Nice video!
    I think there's an error in your list at 10:42 - the Cauchy distribution is not in the exponential family.

    • @Mutual_Information · 2 years ago · +4

      Thank you! Looking into it, I don't believe it's an error. I'm not claiming here that these are within the exponential family. I'm saying these are max entropy distributions under certain constraints, which is a different set. You can see the Cauchy distribution listed here: en.wikipedia.org/wiki/Maximum_entropy_probability_distribution
      But thank you for keeping an eye out for errors. They are inevitable, but extra eyes are my best chance at a good defense against them.

    • @desir.ivanova2625 · 2 years ago · +3

      @@Mutual_Information Thanks for your quick reply! And thanks for the link - I can see that indeed there's a constraint (albeit a very strange one) for which Cauchy is the max entropy distribution.
      I guess then, I was confused by the examples in the table + those that you then list -- all distributions were exponential family and Cauchy was the odd one out. Also, please correct me if I'm wrong, but I think if you do moment matching for the mean (i.e. you look at all possible distributions that realise a mean parameter \mu), then the max entropy distribution is an exponential family one. And the table was doing exactly that. Now, we can't do moment matching for the Cauchy distribution as none of its moments are defined. So that was the second reason for my confusion.

    • @Mutual_Information · 2 years ago · +2

      Thanks, that makes a lot of sense. To be honest, I don't understand the max entropy / exponential family connection all that well. There seem to be these bizarre distributions that are max entropy but aren't exponential family. I'm not sure why they're there, so I join you in your confusion!
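For anyone following this thread: if I'm reading the Wikipedia table linked above correctly, the constraint under which the standard Cauchy shows up as a maximum entropy distribution is an expectation of log(1 + x²), roughly:

```latex
% Among densities on \mathbb{R} satisfying
\mathbb{E}\!\left[\log\!\left(1 + X^{2}\right)\right] \;=\; \log 4,
% the entropy-maximizing density is the standard Cauchy
p(x) \;=\; \frac{1}{\pi\,\left(1 + x^{2}\right)},
% i.e. a form \exp\!\left(-\lambda \log(1 + x^{2})\right) with \lambda = 1 --
% a "moment" constraint, just not one of the usual polynomial moments.
```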

  • @mino99m14 · 1 year ago · +1

    Great video!
    Maybe it's just me, but the explanation of the equation at 3:10 is a bit misleading. Specifically the part where you say to transform the counts into probabilities. For a moment I thought you meant that
    nd/N is the probability of having a string with nd copies of d, and I was very confused. What it is actually saying is that if we have a string of N single-digit numbers in which there are n0 copies of 0, n1 copies of 1, and so on for all digits (which means n0+n1+…+n9 = N), then the probability of getting the digit d is nd/N. I got confused because the main problem was about strings of size N, and these probabilities just consider a single string of length N with nd copies of each digit d.
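For anyone else pausing at that step, here is the counting argument behind the 3:10 expression written out (a standard derivation, in my notation; W is the number of length-N strings with digit counts n_0, ..., n_9):

```latex
W \;=\; \frac{N!}{n_0!\, n_1! \cdots n_9!},
\qquad p_d \;=\; \frac{n_d}{N}.
% Stirling's approximation, \log n! \approx n \log n - n, gives
\frac{1}{N}\log W \;\approx\; -\sum_{d=0}^{9} \frac{n_d}{N}\log\frac{n_d}{N}
\;=\; -\sum_{d=0}^{9} p_d \log p_d \;=\; H(p),
% so (for large N) picking the counts realized by the most strings is the
% same as picking the probabilities with the highest entropy.
```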

    • @Mutual_Information · 1 year ago · +1

      Yes, there's a change of perspective on the problem. I tried to communicate that with the table, but I see how it's still confusing. You seem to have gotten through it with just a good think on the matter

    • @mino99m14 · 1 year ago · +1

      @@Mutual_Information It's alright. Having the derivation of the expression helped me a lot. I appreciate that you take part of your time to add details like these to your videos 🙂...

  • @manueltiburtini6528 · 1 year ago · +1

    Then why is logistic regression also called Maximum Entropy? Or am I wrong?

    • @Mutual_Information · 1 year ago · +1

      You're not wrong - it's for the same reason. If you optimize the NLL and leave open the function which maps from W'x (coefficients times features) to probabilities, and then maximize entropy, the function you get is the softmax! So logistic regression comes from maxing entropy.
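A sketch of the connection just described (my own paraphrase; the weight/feature notation is illustrative, not a quote of the video): maximizing the conditional entropy of p(y | x) subject to matching the empirical expectations of the per-class features yields a log-linear model, whose normalization is exactly the softmax.

```latex
% Max-entropy p(y|x) under feature-expectation constraints is log-linear:
p(y \mid x) \;=\; \frac{\exp\!\left(w_y^{\top} x\right)}
                        {\sum_{y'} \exp\!\left(w_{y'}^{\top} x\right)},
% which is multiclass logistic regression -- hence the alternative name
% "maximum entropy (MaxEnt) classifier."
```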

  • @TuemmlerTanne11 · 3 years ago · +2

    Honestly your videos get me excited for a topic like nothing else. Reminder to myself not to watch your videos if I need to do anything else that day... Jokes aside, awesome video again!

    • @Mutual_Information · 3 years ago · +2

      Thank you very much! I'm glad you like it and I'm happy to hear there are others like you who get excited about these topics like I do. I'll keep the content coming!

  • @kristoferkrus · 1 year ago

    What do you mean that the entropy only depends on the variable's probabilities and not its values? You also said that the variance does depend on its values, but I don't see why the variance would while the entropy would not. You say that you can define the entropy as a measure of a bar graph, but so can the variance.

    • @Mutual_Information · 1 year ago · +2

      entropy = - sum p(x) log p(x).. notice only p(x) appears in the equation - you never see just "x" in that expression. For the (discrete) variance.. it's sum of p(x)(x-E[x])^2.. notice x does appear on its own.
      When I say the bar graph, I'm only referring to the vertical heights of the bars (which are the p(x)'s).. you can use just that set of numbers to compute the entropy. For the variance, you'd need to know something in addition to those probabilities (the values those probabilities correspond to).
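A tiny numeric illustration of that reply (my own sketch; the probabilities and value labels are arbitrary): relabel the outcomes while keeping the same probabilities, and the entropy is unchanged but the variance is not.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])          # probabilities (all the entropy needs)
values_a = np.array([1.0, 2.0, 3.0])   # one labeling of the outcomes
values_b = np.array([1.0, 2.0, 30.0])  # same probabilities, different labels

entropy = -np.sum(p * np.log(p))       # depends on p only

def variance(values):
    mean = p @ values
    return p @ (values - mean) ** 2    # depends on p AND the values

print("entropy            :", entropy)            # same for either labeling
print("variance (labels a):", variance(values_a))
print("variance (labels b):", variance(values_b))
```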

    • @kristoferkrus · 1 year ago · +1

      @@Mutual_Information Ah, I see! I don't know what I was thinking. For some reason, I thought probability when you said value. It makes total sense now. Great video by the way! Really insightful!

  • @SystemScientist · 9 months ago · +1

    Supercool

  • @zeio-nara · 2 years ago · +1

    It's too hard, too many equations, I didn't understand anything. Can you explain it in simple terms?

    • @Mutual_Information · 2 years ago · +1

      I appreciate the honesty! I'd say.. go through the video slowly. The moment you hit something confusing.. something specific!.. ask it here and I'll answer :)

  • @whozz · 1 year ago

    6:13 In this case, the gods have nothing to do with 'e' showing up there haha. Actually, we could reformulate this result in any other proper base b, and the lambdas would just get shrunk by a factor of ln(b).
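Written out (a one-line identity, in my notation):

```latex
\exp\!\left(-\textstyle\sum_i \lambda_i f_i(x)\right)
\;=\; b^{-\sum_i \lambda_i' f_i(x)},
\qquad \lambda_i' \;=\; \frac{\lambda_i}{\ln b}.
```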

  • @piero8284 · 9 months ago

    The math gods work in mysterious ways 🤣

  • @bscutajar · 11 months ago · +1

    Great video, but one pet peeve is that I found your repetitive hand gestures somewhat distracting.

    • @Mutual_Information · 11 months ago

      Yea they're terrible. I took some shit advice of "learn to talk with your hands" and it produced some cringe. It makes me want to reshoot everything, but it's hard to justify how long that would take. So, here we are.

    • @bscutajar · 11 months ago

      @@Mutual_Information 😂😂 don't worry about it man, the videos are great. I think there's no reason for any hand gestures since the visuals are focused on the animations.

    • @bscutajar · 11 months ago

      @@Mutual_Information Just watched 'How to Learn Probability Distributions' and in that video I didn't find the hand gestures distracting at all, since they were mostly related to the ideas you were conveying. The issue in this video is that they were a bit mechanical and repetitive. This is a minor detail though - I love your videos so far!

  • @ebrahimfeghhi1777 · 2 years ago · +1

    Great video!