Mutual Information, Clearly Explained!!!

StatQuest with Josh Starmer

Published 25 Jan 2025

COMMENTS • 192

@statquest 2 years ago ⁺⁵
To learn more about one common way to create histograms of continuous variables, see: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357
To learn more about....
R-squared = ua-cam.com/video/2AQKmw14mHM/v-deo.html
Entropy = ua-cam.com/video/YtebGVx-Fxw/v-deo.html
To learn more about Lightning: lightning.ai/
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
@SelinDrawz 1 year ago ⁺⁵⁷
Thank u daddy stat quest for carrying me through my university course
@statquest 1 year ago ⁺⁵
Ha! :)
@faizalrafi 1 year ago ⁺²⁵
I am binge-watching this series. Very clear and concise explanations for every topic, given in the most interesting way!
@statquest 1 year ago ⁺¹
Glad you like them!
@PunmasterSTP 10 months ago
Same here!
@mohammadeslami7462 6 months ago ⁺⁸
Superb!!! I recommend this channel to everyone.
@statquest 6 months ago
Thanks!
@filandavid3747 4 months ago ⁺¹
Mesmerizing! U are a beacon of hope for us struggling engineers here in China xxx
@statquest 3 months ago
Thanks!
@Geneu97 11 months ago ⁺³
Thank you for being a content creator
@statquest 11 months ago ⁺¹
Thanks!
@PunmasterSTP 10 months ago ⁺¹
Not just a creator of any content either. A creator of *exceptional* content!
@PunmasterSTP 10 months ago ⁺²
Mutual information, clearly explained? More like "Magnificent demonstration, you deserve more fame!" 👍
@statquest 10 months ago ⁺¹
Thanks! 😃
@Fan-vk9gx 1 year ago ⁺⁸
Super! I have been struggling with copulas, mutual information, etc. for a while, and this is exactly what I was looking for! Thank you, Josh! This video is really helpful!
@statquest 1 year ago ⁺¹
Glad it was helpful!
@raizen74 1 year ago ⁺²
Superb explanation! Your channel is great!
@statquest 1 year ago
Glad you think so!
@isaacfernandez2243 1 year ago ⁺⁴
Dude, you don't even know me, and I don't really know you either, but oh boyy, I fucking love you. Thank you. One day I will teach people just like you do.
@statquest 1 year ago
Thanks! :)
@ian-haggerty 9 months ago ⁺³
Entropy === The expectation of the surprise!!! I'll never look at this concept the same again
@statquest 9 months ago ⁺¹
bam! :)
@kenmayer9334 2 years ago ⁺⁴
Awesome stuff, Josh. Thank you!
@statquest 1 year ago
My pleasure!
@GGWPTrader 1 year ago ⁺¹
OMG i never see this channel, how many hours would be saveeddd.. new subs here, thanks alottt for ur vids
@statquest 1 year ago
Welcome!
@dragoncurveenthusiast 1 year ago ⁺²
Your explanations are awesome!
@statquest 1 year ago ⁺¹
Glad you like them!
@こよい-e7n 8 months ago ⁺¹
I love this video. Simple and clear.
@statquest 8 months ago
Thanks!
@espedaire 4 months ago ⁺¹
It would be awesome if there were links to "if you are not familiar with XYZ, check out the quest", for noobs trying to figure out what we don't know. Keep up the great work!
@statquest 4 months ago ⁺¹
Those links are at the bottom of the description, but I'll also add them here:
R-squared = ua-cam.com/video/2AQKmw14mHM/v-deo.html
Entropy = ua-cam.com/video/YtebGVx-Fxw/v-deo.html
@espedaire 4 months ago ⁺¹
❤ There's a lot of junk appended by yt to the description, I had to look hard to find it just now
@smilefaxxe2557 9 months ago ⁺¹
Great explanation, thank you! ❤🔥
@statquest 9 months ago
Glad it was helpful!
@adityaagrawal2397 1 year ago ⁺¹
Just started learning ML, am assured now that the journey will be smooth with this channel
@statquest 1 year ago
Good luck! :)
@MegaNightdude 1 year ago ⁺³
Great stuff. As always.
@statquest 1 year ago
Thank you very much! :)
@stepavancouver 1 year ago ⁺¹
An interesting explanation and nice sense of humor 👍
@statquest 1 year ago
Thank you!
@VaibhaviDeo 1 year ago ⁺²
you are the best god sent really stay blessed
@statquest 1 year ago
Thank you!
@zachchairez4568 1 year ago ⁺²
Great job! Love it!
@zachchairez4568 1 year ago ⁺¹
Liking my own comment to double like your video :)
@statquest 1 year ago
Double bam! :)
@erkanbey4504 28 days ago ⁺¹
that was quite useful brother thanks
@statquest 28 days ago
Thanks!
@samjudelson 5 months ago ⁺⁴
Someone hit me on the head with a club, and now I'm good at stats. That's what they call... bam bam.
@statquest 5 months ago
ha! :)
@ian-haggerty 9 months ago
Seriously though, I think the KL divergence is worth a mention here.
Mutual information appears to be the KL divergence between the actual (empirically derived) joint probability mass function, and the (empirically derived) probability mass function assuming independence.
I know that's a lot of words, but my brain can't help seeing these relationships.
@statquest 9 months ago
One day I hope to do a video on the KL divergence.
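The relationship described in this thread can be checked numerically. The following is a minimal sketch, not code from the video: the toy data and the function names `mutual_information` and `kl_divergence` are assumptions made for illustration. It computes mutual information directly, then computes the KL divergence between the empirical joint distribution and the product of the empirical marginals, and the two values agree.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (natural log) from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def kl_divergence(p, q):
    """KL(p || q) for two dicts that map outcomes to probabilities."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


# Hypothetical toy data, invented for this sketch.
likes_popcorn = ["yes", "yes", "yes", "no", "no"]
loves_troll2 = ["yes", "yes", "no", "no", "no"]
n = len(likes_popcorn)

# Empirical joint distribution and empirical marginals.
joint = {k: c / n for k, c in Counter(zip(likes_popcorn, loves_troll2)).items()}
px = {k: c / n for k, c in Counter(likes_popcorn).items()}
py = {k: c / n for k, c in Counter(loves_troll2).items()}

# The distribution we would expect if the two variables were independent.
independent = {(x, y): px[x] * py[y] for x in px for y in py}

mi = mutual_information(likes_popcorn, loves_troll2)
kl = kl_divergence(joint, independent)
print(mi, kl)  # the two numbers match
```

Under this view, mutual information is zero exactly when the joint distribution already equals the product of its marginals.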
@harishankarkarthik3570 8 months ago
The calculation at 8:27 seems incorrect. I plugged it into a calculator and got 0.32. The log is base 2 right?
@statquest 8 months ago
At 8:07 I say that we are using log base 'e'.
@bernardtiongingsheng85 1 year ago ⁺¹
Thank you so much! It is really helpful. I really hope you can explain KL divergence in the next video.
@statquest 1 year ago
I'll keep that in mind.
@Maciek17PL 2 years ago ⁺¹
Amazing as always!!!
@statquest 1 year ago ⁺¹
Thank you!
@sasha297603ha 10 months ago ⁺¹
Love it, thanks!
@statquest 10 months ago
Thank you!
@Malyosh-m6i 11 months ago ⁺²
Two sigmas are like two for loops, such that, for every index of the outer sigma, the inner sigma makes a complete iteration.
@statquest 11 months ago ⁺¹
bam!
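The "two for loops" reading of the double sum can be written out literally. This is a minimal sketch with a made-up joint probability table (the numbers are assumptions for illustration, not values from the video): the outer loop plays the role of the outer sigma, and the inner loop runs to completion for each outer index.

```python
import math

# Hypothetical joint probability table p(x, y); rows index x, columns index y.
joint = [
    [0.4, 0.1],
    [0.1, 0.4],
]

# Marginal probabilities: row sums give p(x), column sums give p(y).
p_x = [sum(row) for row in joint]
p_y = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]

mi = 0.0
for i in range(len(joint)):            # outer sigma: every value of x
    for j in range(len(joint[0])):     # inner sigma: every value of y
        p = joint[i][j]
        if p > 0:                      # a zero-probability cell contributes nothing
            mi += p * math.log(p / (p_x[i] * p_y[j]))

print(round(mi, 4))  # about 0.1927 with the natural log
```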
@arash2229 1 year ago ⁺¹
Thank youuuu. you explain everything clearly
@statquest 1 year ago
Glad it was helpful!
@felipevaldes7679 2 years ago ⁺¹
I love this channel
@statquest 2 years ago
BAM! :)
@felipevaldes7679 1 year ago ⁺¹
@@statquest lol, very on-brand too.
@Lynxdom 1 year ago ⁺¹
You got a like just for the musical numbers!
@statquest 1 year ago ⁺¹
bam!
@liam_42 9 months ago
Hello, that's a great video and it has helped me understand a lot about Mutual Information as well as your other video about entropy. I do have a question.
At 11:13 the answer you get after calculation is 0.5004 and it is explained that it is close to 0.5. However when I do the math (( 4 ÷ 5 ) × log ( 5 ÷ 4 ) + ( 1 ÷ 5 ) × log( 5 ) ) the answer I get is 0.217322... Am I missing something? Because from what I understood, the closer you get to 0.5, the better it is but it is not confirmed by my other examples. Is there a maximum to mutual information?
Thank you for your video.
@statquest 9 months ago
The problem is that you are using log base 10 instead of the natural log (log base 'e'). I talk about this at 8:07 and in this other video: ua-cam.com/video/iujLN48gumk/v-deo.html
@liam_42 9 months ago ⁺¹
@@statquest Thank you for your answer. That explains a lot.
@rosss6989 8 months ago
I have the same doubt. When both columns are equal it says the mutual information is 0.5, so what is the maximum value of mutual information, and in which scenario?
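The arithmetic in this thread is easy to reproduce. The sketch below evaluates the expression quoted above, (4 ÷ 5) × log(5 ÷ 4) + (1 ÷ 5) × log(5), with three different log bases; the function name `quoted_expression` is an assumption made for illustration.

```python
import math


def quoted_expression(log):
    """Evaluate (4/5) * log(5/4) + (1/5) * log(5) with the given log function."""
    return (4 / 5) * log(5 / 4) + (1 / 5) * log(5)


natural = quoted_expression(math.log)    # log base e, as used in the video
base10 = quoted_expression(math.log10)   # log base 10
base2 = quoted_expression(math.log2)     # log base 2 (result is in bits)

print(round(natural, 4))  # 0.5004
print(round(base10, 6))   # 0.217322
print(round(base2, 4))    # 0.7219
```

Changing the base only rescales the result by a constant factor (dividing the natural-log value by ln 10 or ln 2), so values are comparable only when they use the same base. The value 0.5004 is therefore specific to this example's 4/5 versus 1/5 split and the natural log; it is not a general ceiling.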
@aleksandartta 1 year ago ⁺¹
1) How should we choose the number of bins? Does a larger number of bins give less mutual information?
2) What if the label (output value) is numerical?
Thanks in advance
@statquest 1 year ago ⁺²
1) Here's how a lot of people find the best number (and width) of the bins: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357
2) Then you make a histogram of the label data.
@user-oq1yk2fq2f 4 months ago
So does this mean that we are comparing two variables, one a feature and one the output, and the output is taken from the test data? And basically we are tuning the model, using mutual information just to learn which features are more useful for getting more accurate predictions? And after this we check our tuned model on the test set? And why do we want to reduce the number of attributes? Is it because fewer attributes make the calculations faster, so we can train on our data in less time?
@statquest 4 months ago
That's the main idea. There are a lot of reasons you might want to reduce the number of variables in your model. 1) sometimes collecting data can be very expensive 2) fewer variables can mean we need less data to fit the model.
@ruiqili1818 9 months ago
Your explanations are always awesome! I wonder how to explain Normalized Mutual Information?
@statquest 9 months ago
I believe it's just a normalized version of mutual information (so scale it to be a value between 0 and 1).
@Lara-qo5dc 8 months ago
This is great! Do you know if you can interpret a NMI value in percentages, something like 7% of information overlaps, or 7% of group members overlap?
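One way to picture the scaling mentioned in this thread is to divide mutual information by a quantity built from the two entropies. Several normalizations are in use (for example the minimum, the arithmetic mean, or the geometric mean of the entropies); the sketch below assumes the arithmetic mean, and the function names and toy data are invented for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy (natural log) of a list of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (natural log) from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def normalized_mi(xs, ys):
    """MI divided by the mean of the two entropies (one assumed convention)."""
    denom = (entropy(xs) + entropy(ys)) / 2
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


# Hypothetical toy data.
a = ["yes", "yes", "no", "no", "no"]
identical = normalized_mi(a, a)  # a variable compared with itself
unrelated = normalized_mi(["yes", "yes", "no", "no"], ["yes", "no", "yes", "no"])

print(identical, unrelated)  # approximately 1 and exactly 0
```

With this convention the value is the fraction of the average entropy that the two variables share, which is not the same as the fraction of group members that overlap.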
@pablovivas5234 2 years ago ⁺¹
Keep it up. Great content
@statquest 2 years ago
Thank you!
@AI_ML_DL_LLM 1 year ago
3 more things: 1- it would have been great if you could make a comparison with correlation too here, 2- discuss the minimum and maximum value of the MI, 3- the intuition of this specific formula
@statquest 1 year ago
Thanks! I'm not really sure you can compare Mutual Information to correlation because correlation doesn't work at all with discrete data. I mention this at 1:20.
@666shemhamforash93 1 year ago ⁺¹
Amazing as always! Any update on the transformer video?
@statquest 1 year ago ⁺¹
Still working on it.
@marahakermi-nt7lc 1 year ago
thankss joshh 😍😍 at 1:30, since the response variable is not continuous and takes on 0 or 1 (yes/no), can we model it with logistic regression?
@statquest 1 year ago ⁺¹
Yep!
@IshanGarg-y1u 6 months ago
In the case of continuous variables, how do we decide the number of bins and the boundaries?
@statquest 6 months ago
It probably depends on the dataset. Usually with things like that I like to plot histograms to make decisions.
@buckithed 11 months ago ⁺¹
Fire🔥🔥🔥
@statquest 11 months ago
BAM! :)
@avnibar 7 months ago
Hi, thank you Josh. I have one question. Is the MI score affected by imbalanced data?
@statquest 7 months ago
Presumably - pretty much everything is affected by imbalanced data. This is because you have a much better estimate for one class and a much worse estimate for the other.
@dhanrajm6537 11 months ago
hi, what will be the base of the logarithm when calculating entropy? I believe it was mentioned in the entropy video that for 2 outputs (yes/no or heads/tails) the base of the logarithm will be two. Is there any generalization to this statement?
@statquest 11 months ago
Unless there is a specific reason to use a specific base for the log function, we use log base 'e'.
@pranabsarmaiitm2487 1 year ago
awesome!!! Now waiting for a video on Chi2 Test of Independence.
@statquest 1 year ago ⁺¹
I'll keep that in mind.
@noazamstein5795 1 year ago
is there a good and stable way to calculate mutual information for numeric variables *where the binning is not good*, e.g. highly skewed distributions where the middle bins are very different from the edge bins?
@statquest 1 year ago
Hmm... off the top of my head, I don't know, but I wouldn't be surprised if there was someone out there publishing research papers on this topic.
@Ewan-t6v 7 months ago ⁺¹
you are a genius
@statquest 7 months ago
:)
@RaviPrakash-dz9fm 1 year ago
Can we have videos about all the gazillion hypothesis tests available!!
@statquest 1 year ago
I'll keep that in mind.
@GMD023 2 years ago ⁺¹
Off topic question... but will ChatGPT replace us as data scientists/analysts/statisticians? I just discovered it tonight and it blew me away. I basically learned HTML and CSS in a day with it. I'm worried it will massively reduce jobs in our field. I did a project that would normally take all day in a few minutes... scary stuff.
@insomniacookie2315 2 years ago ⁺¹
Well, if you really want his opinion, watch the AI Buzz #1 Josh uploaded three weeks ago. It’s in this channel.
As for my opinion, obviously nobody knows yet, but it will soon be a new ground-level for anybody else. For some that all they can do is basic things ChatGPT does far better, they are in danger; for others that can make more values out of ChatGPT (or any tools to come), they are in far better shape. Which do you think you and fellow data scientists are?
And even for the basic stuff, there should be at least someone to check whether ChatGPT has done some absurd work or not, right? Maybe at least for a few years or so.
@ayeshavlogsfun 2 years ago ⁺²
Just out of curiosity, how did you learn HTML and CSS in a day?
And what specific task did you solve?
@toom2141 2 years ago
I don't think ChatGPT is that impressive after all. It makes so many mistakes and is not able to do really complicated stuff. Totally overhyped!
@statquest 2 years ago ⁺¹
See: ua-cam.com/video/k3b9Mvtt6lU/v-deo.html
@GMD023 2 years ago ⁺¹
@@statquest thank you! This is great. I'm also starting my first job today post college as a research data specialist! Your videos always helped me throughout my data science bachelors, so thank you!
@StackhouseBK 1 month ago ⁺¹
you are amazing
@statquest 1 month ago
Thank you!
@murilopalomosebilla2999 1 year ago ⁺¹
Excellent content as always!
@statquest 1 year ago
Much appreciated!
@ronakbhatt4880 7 months ago
Can't we use the correlation coefficient instead of mutual information for continuous variables?
@statquest 7 months ago
If you have continuous data, use R-squared.
@devenkapadia5330 4 months ago
Can this mutual information value be greater than 0.5, I mean closer to 1??
@statquest 4 months ago
In theory the range of possible values goes from 0 to positive infinity.
@6nodder6 1 year ago
Is it weird that my prof. gave me the mutual information equation as one that uses entropy? We were given "I(A; B) = H(B) - sum_b P(B = b) * H(A | B = b)" with no mention of the equation you showed in this video
@statquest 1 year ago
That is odd. Mutual information can be derived from the entropy of two variables. It is the average of how the surprise in one variable is related to the surprise in another. However, this is the standard formula. See: en.wikipedia.org/wiki/Mutual_information
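The entropy-based form and the double-sum formula can be connected in a few steps. This is a sketch of the standard derivation, using the conditional probability p(a | b) = p(a, b) / p(b) (notation introduced here for illustration):

```latex
\begin{align}
I(A;B) &= \sum_{a}\sum_{b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)} \\
       &= \sum_{a}\sum_{b} p(a,b)\,\log\frac{p(a \mid b)}{p(a)} \\
       &= -\sum_{a} p(a)\,\log p(a) \;+\; \sum_{b} p(b) \sum_{a} p(a \mid b)\,\log p(a \mid b) \\
       &= H(A) \;-\; \sum_{b} P(B=b)\,H(A \mid B=b)
\end{align}
```

Because mutual information is symmetric, the same steps with the roles swapped give H(B) minus the weighted entropies H(B | A = a). In either version the leading entropy and the conditional entropies refer to the same variable, so a form that pairs H(B) with H(A | B = b), as quoted above, may contain a transcription slip.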
@archithiwrekar4021 1 year ago
Hey, so what if our dependent variable (here, Loves Troll 2) is continuous? Can we use Mutual Information in that case? By binning, aren't we just converting it into a categorical variable?
@statquest 1 year ago
You could definitely try that.
@romeo72899 1 year ago
Can you please make a video on Latent Dirichlet Allocation
@statquest 1 year ago ⁺¹
I'll keep that in mind! :)
@wowZhenek 1 year ago
Josh, thank you for the awesome easily digestible video. One question. Is there any specific guideline about binning the continuous variable? I'm fairly certain that depending on how you split it (how many bins you choose and how spread they are) the result might be different.
@statquest 1 year ago
To learn more about one common way to create histograms of continuous variables, see: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357
@wowZhenek 1 year ago
@@statquest Josh, thank you for the link, but I guess I formulated my question incorrectly. The question was about not creating the histogram but actually choosing the bins. You split your set in 3 bins. Why 3? Why not 4 or 5? Would the result change drastically if you split in 5 bins? What if the distribution of the variable you are splitting is not normal or uniform? Etc
@statquest 1 year ago ⁺¹
@@wowZhenek When building a histogram, choosing the bins is the hard part, and that is what that article describes - a special way to choose the number and width of bins specifically for Mutual Information. So take a look. Also, because we are using a histogram approach, it doesn't matter what the underlying distribution is. The histogram doesn't make any assumptions.
@wowZhenek 1 year ago ⁺¹
@@statquest oh, yeah, I didn't look inside the URL you gave because you described it as "one common way to create histograms of continuous variables" which seemed very much distant from what I was actually asking about. Now that I checked the link, damn, what a comprehensive abstract. Thank you very much!
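The effect of the bin count raised in this thread can be seen on a small example. The following is a minimal sketch using plain equal-width bins, which is a simplification and not the bin-selection method from the linked article; the heights, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (natural log) from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical data: heights in cm, and whether each person loves Troll 2.
heights_cm = [150, 155, 162, 167, 171, 175, 179, 184, 190, 195]
loves_troll2 = ["no", "no", "no", "yes", "yes", "yes", "yes", "no", "no", "no"]

mi_by_bins = {
    k: mutual_information(equal_width_bins(heights_cm, k), loves_troll2)
    for k in (1, 2, 3, 5)
}
for k, mi in mi_by_bins.items():
    print(k, round(mi, 4))
```

On this made-up dataset, one or two bins give a mutual information of 0, three bins give about 0.67, and five bins give about 0.40, so the choice of bins can change the result considerably.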
@andrewdouglas9559 1 year ago
It seems information gain (defined via entropy) and mutual information are the same thing?
@statquest 1 year ago ⁺¹
They are related, but not the same thing. For details, see: en.wikipedia.org/wiki/Information_gain_(decision_tree)
@andrewdouglas9559 1 year ago ⁺¹
@@statquest Thanks, I'll check it out. And also thanks for all the videos. It's an incredible resource you've produced.
@usamahussain4461 6 months ago
this is a nice tutorial and with different useful scenarios.
But I didn't completely grasp the intuition of something never changing telling nothing about something that does. I understand it mathematically but hoping for a more intuitive explanation, because even if something does not change, there are some matches between the features.
@statquest 6 months ago
Say I ask a bunch of people what their favorite color is and how old they are. Some of the people are young, some are middle aged and some are old, but everyone loves the color green. Now, if I told you that someone in that group loved the color green, what would that tell you about that person's age? Nothing. Since everyone loves green (it never changes) it can't differentiate between young, middle aged and old people.
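The favorite-color example can be run through the formula directly. This is a minimal sketch with invented survey answers (six hypothetical people, two in each age group); the function name is an assumption for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (natural log) from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical survey: everyone loves green, ages vary.
favorite_color = ["green"] * 6
age_group = ["young", "young", "middle", "middle", "old", "old"]

mi_color_age = mutual_information(favorite_color, age_group)
mi_age_age = mutual_information(age_group, age_group)

print(mi_color_age)          # 0.0: the constant column says nothing about age
print(round(mi_age_age, 4))  # 1.0986, i.e. ln(3): age fully determines age
```

Every term in the sum is p(x, y) × log(1) for the constant column, because the joint probability equals the product of the marginals when one marginal is always 1.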
@9erik1 1 year ago ⁺¹
6:18 not small bam, big bam... thank you very much...
@statquest 1 year ago
BAM!!! :)
@jozefinagramatikova4889 1 month ago
So when we don't have categorical features can we just use R^2?
@statquest 1 month ago
Yep
@jozefinagramatikova4889 1 month ago
@@statquest But doesn't R^2 show only linear relationship?
@statquest 1 month ago
@@jozefinagramatikova4889 When used with linear regression, then yes. However, R-squared can be applied to any model, even models that make non-linear fits, and in that case, it can evaluate a non-linear relationship.
@jozefinagramatikova4889 1 month ago
@@statquest Thank you very much! So, is Mutual Information used more often compared to R^2 for feature selection (when we don't have categorical features) and why?
@statquest 1 month ago
@@jozefinagramatikova4889 If you don't have categorical features, I think R^2 is more popular.
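The point about R-squared and non-linear fits can be illustrated with a tiny example. This is a minimal sketch with made-up data that follow y = x², using R² = 1 − SS(residual) / SS(total); the helper names are assumptions, and the "curved model" is simply the known generating rule rather than a fitted model.

```python
def r_squared(ys, predictions):
    """R^2 = 1 - (sum of squared residuals) / (sum of squares around the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical data with a purely non-linear (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
line_r2 = r_squared(ys, [slope * x + intercept for x in xs])
curve_r2 = r_squared(ys, [x ** 2 for x in xs])  # predictions from a curved model

print(line_r2, curve_r2)  # 0.0 for the straight line, 1.0 for the curve
```

The straight line explains none of the variation here, while the curved model explains all of it, so the R² value depends on the model being evaluated as well as on the data.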
@eltonsantos4724 1 year ago ⁺³
So cool. Dubbed in Portuguese
@statquest 1 year ago
Thank you very much! :)
@AI_ML_DL_LLM 1 year ago ⁺¹
maybe next video on this: KL divergence
@statquest 1 year ago ⁺¹
It's on the list.
@viranchivedpathak4231 1 year ago ⁺¹
DOUBLE BAM!!
@statquest 1 year ago
Thanks!
@Chuckmeister3 1 year ago
What does it mean if mutual information is above 0.5? If 0.5 is perfectly shared information...
@statquest 1 year ago
As you can see in the video, perfectly shared information can have MI > 0.5. So 0.5 is not the maximum value.
@Chuckmeister3 1 year ago
@@statquest Is MI then somehow influenced by the size of the data or the number of categories? The video seems to suggest it should be around 0.5 for perfectly shared information (at least in this example). With discrete data using 15 bins I get some values close to 1.
Thanks for these great videos.
@statquest 1 year ago
@@Chuckmeister3 Yes, the size of the dataset matters.
@yurigansmith 1 year ago
@@Chuckmeister3 Interpretation from coding theory (natural log replaced by log to base 2): Mutual information I(X;Y) is the amount of bits wasted if X and Y are encoded separately instead of jointly encoded as vector (X,Y). Statement holds on average and only asymptotically, i.e. for optimal entropy coding (e.g. arithmetic encoder) with large alphabets (asymptotically for size -> oo). It's the amount of information shared by X and Y measured in bits. Mutual information can become arbitrarily large, depending on the size of the alphabets of X and Y (and the distribution p(x,y) of course). But it can't be greater than the separate entropies H(X) and H(Y), respectively the minimum of both. You can think of I(X;Y) as the intersection of H(X) and H(Y).
ps: I think the case of perfectly shared information is if there's a (bijective) function connecting each symbol of X with each symbol of Y, so that the relation between X and Y becomes deterministic. In that case H(X)=H(Y)=I(X;Y). The other extreme is X and Y being statistically independent: In that case I(X;Y) = 0.
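The bound described in this thread, and the way the value for perfectly shared information grows with the number of categories, can be checked with a short sketch. The data below are invented: each symbol of X is paired one-to-one with a relabeled symbol of Y, and all symbols are equally likely; the function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy (natural log) of a list of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (natural log) from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Perfectly shared information: Y is a one-to-one relabeling of X.
mi_by_k = {}
for k in (2, 4, 16):
    xs = list(range(k))
    ys = [f"symbol_{v}" for v in xs]
    mi_by_k[k] = mutual_information(xs, ys)
    print(k, round(mi_by_k[k], 4), round(entropy(xs), 4))  # MI equals H(X) = ln(k)

# A noisier hypothetical pairing: MI stays below both entropies.
a = ["yes", "yes", "yes", "no", "no"]
b = ["yes", "yes", "no", "no", "no"]
noisy_mi = mutual_information(a, b)
bound = min(entropy(a), entropy(b))
print(round(noisy_mi, 4), round(bound, 4))
```

With equally likely categories the perfectly shared value is ln(k), so it passes 0.5 already at two categories and keeps growing as categories (or bins) are added.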
@poLirLANCER 1 year ago ⁺¹
awesome
@statquest 1 year ago
Thanks!
@BorisNVM 1 year ago ⁺¹
this is cool
@statquest 1 year ago
Thanks!
@yourfutureself4327 1 year ago ⁺¹
i'm more of a 'Goblin 3: the frolicking' man myself
@statquest 1 year ago
bam!
@Tufelkind 10 months ago ⁺¹
It's like FoodWishes for stats
@statquest 10 months ago
:)
@AlexanderYap 2 years ago
If I want to calculate the correlation between Likes Popcorn and Likes Troll 2, can I use something like Chi2? Similarly between Height bins and Likes Troll 2. What's the advantage of calculating the Mutual Information?
@statquest 2 years ago
The advantage is that we have a single metric that works on continuous, discrete, and mixed variables, and we don't have to make any assumptions about the underlying distributions.
@viajedali7663 3 months ago ⁺¹
tiny bam
@statquest 3 months ago
:)
@rogerc23 2 years ago
Ummm I know I have a cold right now but did anyone only hear an Italian girl speaking ?
@statquest 1 year ago
?
@user-hl6xe8dz9x 1 month ago ⁺²
puop poopup pooh
@statquest 1 month ago
:)
@AxDhan 1 year ago ⁺¹
small bam = "bamsito"
@statquest 1 year ago ⁺¹
Ha! :)
@sera-masumi 4 months ago ⁺¹
2:11 baam
@sera-masumi 4 months ago ⁺¹
8:48 double baaaam
@sera-masumi 4 months ago ⁺¹
9:28 tiny baaaam
@statquest 4 months ago
ha! :)
@TommyMN 1 year ago
If I could I'd kiss you on the mouth, wish you did a whole playlist about data compression
@statquest 1 year ago
Ha! I'll keep that topic (data compression) in mind.
@FREELEARNING 2 years ago
Great content. But just don't sing, you're not up to that.
@statquest 1 year ago
Noted! :)
@VaibhaviDeo 1 year ago
i will fite you if you tell daddy stat quest what to do what not to do
@igorg4129 2 years ago ⁺¹
I was always interested in how we should think if we wanted to invent such a technique. I mean, OK, let's say I "suspect" that the probabilities here should do the job, and say my goal is to end up with some "flag" from 0 to 1 that indicates the strength of a relationship. But how should I think in order to decide what goes in the denominator vs. the numerator, when to use a log, etc.? There should be something like a "thinking algorithm".
P.S.
Understanding this would be very helpful for understanding the existing fancy formulas
@statquest 2 years ago ⁺¹
I talk more about the reason for the equation in my video on Entropy: ua-cam.com/video/YtebGVx-Fxw/v-deo.html
@joshuasirusstara2044 1 year ago ⁺²
that small bam
@statquest 1 year ago
:)
