Explaining the Kullback-Leibler divergence through secret codes
- Published 14 May 2018
- Explains the concept of the Kullback-Leibler (KL) divergence through a 'secret code' example. The KL divergence is a directional measure of separation between two distributions (although it is not a 'distance').
This video is part of a lecture course which closely follows the material covered in the book, "A Student's Guide to Bayesian Statistics", published by Sage, which is available to order on Amazon here: www.amazon.co.uk/Students-Gui...
For more information on all things Bayesian, have a look at: ben-lambert.com/bayesian/. The playlist for the lecture course is here: • A Student's Guide to B...
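The directional measure described above can be made concrete with a small numeric sketch. The distributions below, P = (1/2, 1/4, 1/4) and Q = (1/4, 1/2, 1/4) over three letters, are assumed for illustration (they match the numbers discussed in the comments, but the video's exact values may differ):

```python
import math

# Assumed example distributions over the letters a, b, c
P = {"a": 0.5, "b": 0.25, "c": 0.25}
Q = {"a": 0.25, "b": 0.5, "c": 0.25}

def kl_divergence(p, q):
    """D_KL(p || q) in bits: sum over x of p(x) * log2(p(x) / q(x))."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p)

print(kl_divergence(P, Q))  # 0.25 bits
print(kl_divergence(Q, P))  # also 0.25 here, but in general the two directions differ
```

For this particular pair the two directions happen to coincide, but KL divergence is not symmetric in general, which is why it is a directional measure rather than a distance.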
I am ridiculously overjoyed to have found this remarkably clear & concise explanation of the Kullback-Leibler divergence, thanks Ben Lambert!
You are just too good. Every sentence conveys maximum information and a complete argument!! I have never heard even the best of professors be so precise and complete in their arguments!! 👍👍
This is the best explanation I've found yet. Thanks!
The codes you present are uniquely decodable but they are not instantaneously decodable. Would it not have been better to say, for example for P, c(a) = 0, c(b) = 10, c(c) = 11?
Yes, you are right. We are using the principle of entropy coding, but the main point is that the expected code length will be the same.
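The prefix-free (instantaneously decodable) property suggested in this thread can be checked mechanically. A minimal sketch, assuming the codebook c(a) = 0, c(b) = 10, c(c) = 11 proposed in the comment:

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of another, i.e. the code is
    instantaneously decodable."""
    words = list(codes.values())
    return not any(
        w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words
    )

# Codebook proposed in the comment for language P
prefix_code = {"a": "0", "b": "10", "c": "11"}

# A uniquely decodable but NOT prefix-free alternative (hypothetical),
# where "0" is a prefix of "01"
non_prefix_code = {"a": "0", "b": "01", "c": "11"}

print(is_prefix_free(prefix_code))      # True
print(is_prefix_free(non_prefix_code))  # False
```

Both codebooks assign the same lengths (1, 2, 2 bits), which is why the expected length, and hence the KL calculation, is unaffected by the choice.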
Great explanation, thank you!
wow, wonderful explanation!
Thank you for explaining this :)
Fantastic. Great job
very insightful, cheers!
great tutorial! thanks
Amazing! Great explanation :-)
This is a great explanation. Thanks.
Thank you for this explanation! Could you do some about the Jensen Shannon Divergence and the relation of it with Mutual Information ?
Nice explanation! It's Kullback LEIbler divergence, though. Spelled with ei, not ie :)
The expected length for Q should use two times one quarter, rather than one half, which gives Q an expected length of three over two. It is the same for both languages, P and Q: three over two. All of this presents no particular divergence, which is an alternate application of the Kullback-Leibler divergence. Given such a symmetrical solution, we might have employed an equation to begin with.
Thanks for the nice tutorial. If my understanding is correct, you showed us two ways of calculating the information loss from P(X) to Q(X). But it seems that the second way is independent of L(X), since the length L is not used at all. Does that mean the encoding length does not affect the KL divergence, or is my understanding incorrect? Thanks a lot!
Nice observation. I think it is pre-assumed that P(x) is ideal for the coding, and if we use Q(x) for that coding we shall deviate by 1/4. So I think the explicit coding length is not required. I might be wrong, though.
What is the relationship between log_2(p(x)) and the encoding length of x? Intuitively, the higher p(x), the shorter the encoding length. How is this relation formulated concisely in mathematics?
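The relation asked about here is the Shannon code-length formula: for an optimal code, the ideal codeword length is L(x) = -log_2 p(x) bits, so more probable symbols get shorter codes, and the expected length equals the entropy H(P). A small sketch, assuming the example distribution P = (1/2, 1/4, 1/4):

```python
import math

P = {"a": 0.5, "b": 0.25, "c": 0.25}  # assumed example distribution

# Ideal codeword lengths: L(x) = -log2 p(x)
lengths = {x: -math.log2(p) for x, p in P.items()}
print(lengths)  # a -> 1 bit, b -> 2 bits, c -> 2 bits

# The expected codeword length equals the entropy H(P)
expected_length = sum(P[x] * lengths[x] for x in P)
print(expected_length)  # 1.5 bits
```

These lengths match the 1-, 2-, and 2-bit codewords discussed in the comments above.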
Mind blown!
Good explanation! But I think this is just information coding, not ciphering.
Isn't the example toooo specific, as the probabilities are very particular? Though the idea is fairly correct, and the two things will match if the number of letters is large instead of only 3.
Shouldn't the code for language P be a=0, b=10, c=11? And likewise for Q, a=10, b=0, c=11? That way if the first digit seen is 0, you know it is the commonest letter; if it is 1, you now select between the two rarer letters
you mean LEIbler
Did he not make a mistake? E[L(Q)|P] = 1/4 (not 1/2) × 2 + 1/2 × 1 + 1/4 × 2
P has ½ probability for letter a and Q has ¼ probability for letter a, and you are given Language P, so I believe the video is correct
This isn't really helpful... I still don't understand the secret code example and how the KL divergence relates to it.