Explaining the Kullback-Liebler divergence through secret codes

  • Published 14 May 2018
  • Explains the concept of the Kullback-Leibler (KL) divergence through a ‘secret code’ example. The KL divergence is a directional measure of separation between two distributions (although it is not a 'distance').
    This video is part of a lecture course which closely follows the material covered in the book, "A Student's Guide to Bayesian Statistics", published by Sage, which is available to order on Amazon here: www.amazon.co.uk/Students-Gui...
    For more information on all things Bayesian, have a look at: ben-lambert.com/bayesian/. The playlist for the lecture course is here: • A Student's Guide to B...
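    As a rough sketch of the idea, the snippet below computes the KL divergence for a toy three-letter alphabet. The probabilities used for the two 'languages' P and Q are an assumption reconstructed from the comment thread below, not necessarily the exact numbers in the video; the coding interpretation mirrors the 'secret code' framing, in which encoding letters drawn from P with the code built for Q costs, on average, D(P || Q) extra bits per letter.

        import math

        # Toy three-letter "languages". These probabilities are inferred from the
        # comments below and are only an assumption about the video's example.
        P = {"a": 0.5, "b": 0.25, "c": 0.25}
        Q = {"a": 0.25, "b": 0.5, "c": 0.25}

        def kl_divergence(p, q):
            # D(p || q) = sum over x of p(x) * log2(p(x) / q(x)), in bits
            return sum(p[x] * math.log2(p[x] / q[x]) for x in p)

        # Coding view: an ideal code gives each letter a length of -log2(probability),
        # so using Q's code on letters drawn from P wastes D(P || Q) bits per letter.
        cost_with_P_code = -sum(P[x] * math.log2(P[x]) for x in P)   # entropy of P
        cost_with_Q_code = -sum(P[x] * math.log2(Q[x]) for x in P)   # cross-entropy

        print(kl_divergence(P, Q))                      # 0.25 bits
        print(cost_with_Q_code - cost_with_P_code)      # the same 0.25 bits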

COMMENTS • 27

  • @wendylanger · 3 years ago +2

    I am ridiculously overjoyed to have found this remarkably clear & concise explanation of the Kullback-Liebler divergence, thanks Ben Lambert!

  • @anusaxena971 · 2 years ago

    You are too good, simply perfect. Every sentence conveys maximum information and a complete argument!! I have never heard even the best of professors be so precise and complete in their arguments!! 👍👍

  • @akash_goel · 3 years ago +1

    This is the best explanation I've found yet. Thanks!

  • @olivierpaalvast1213 · 5 years ago +12

    The codes you present are uniquely decodable but they are not instantaneously decodable. Would it not have been better to say, for example for P, c(a) = 0, c(b) = 10, c(c) = 11?

    • @usmanshabbir4902 · 2 years ago

      Yes, you are right, we are using the principle of entropy coding. But the main point is that the expected code length will be the same.
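      For what it's worth, a minimal sketch of the prefix-free (instantaneously decodable) code proposed in this thread is given below; the code table is the commenter's suggestion, not necessarily the one used in the video. Because no codeword is a prefix of another, each letter can be emitted the moment its codeword is complete, and the codeword lengths (1, 2, 2 bits) still match -log2 of the assumed probabilities (1/2, 1/4, 1/4), so the expected length is unchanged.

          # Prefix-free code for language P, as suggested in the comment above
          # (an assumption, not necessarily the code used in the video).
          CODE_P = {"a": "0", "b": "10", "c": "11"}

          def encode(message, code):
              return "".join(code[ch] for ch in message)

          def decode(bits, code):
              reverse = {v: k for k, v in code.items()}
              out, buffer = [], ""
              for bit in bits:
                  buffer += bit
                  if buffer in reverse:        # complete codeword: emit it immediately
                      out.append(reverse[buffer])
                      buffer = ""
              return "".join(out)

          bits = encode("abcab", CODE_P)            # "01011010"
          print(decode(bits, CODE_P) == "abcab")    # True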

  • @user-is3mv9ol8i · 5 years ago +1

    Great explanation, thank you!

  • @thangbom4742 · 5 years ago

    wow, wonderful explanation!

  • @danielshamaeli3696 · 5 years ago +1

    Thank you for explaining this :)

  • @eminmammadov6525 · 5 years ago +1

    Fantastic. Great job

  • @adampax · 4 years ago +1

    very insightful, cheers!

  • @PieroSavastano · 5 years ago

    great tutorial! thanks

  • @polinactiveaccount7737 · 5 years ago

    Amazing! Great explanation :-)

  • @posthocprior · 1 year ago

    This is a great explanation. Thanks.

  • @lima073 · 1 year ago

    Thank you for this explanation! Could you do one about the Jensen-Shannon divergence and its relation to mutual information?

  • @nikolaassteenbergen7270 · 5 years ago +10

    Nice explanation! It's Kullback LEIbler divergence, though. Spelled with ei, not ie :)

  • @BlAcKpHrAcK · 4 years ago +1

    The Q of L should be two times one quarter, rather than one half, which solves Q to three over two. It is the same for both languages P and Q, three over two. All of this presents no particular divergence which is an alternate application of the Kullback Liebler Divergence. Given a symmetrical solution, we might have employed an equation to begin with.

  • @jakejing1118 · 5 years ago +1

    Thanks for the nice tutorial. If my understanding is correct, you showed us two ways of calculating the information loss from P(X) to Q(X). But it seems that the second way is independent of L(X), since the length, L, is not used at all. Does that mean the encoding length does not affect the KL divergence, or is my understanding not correct? Thanks a lot!

    • @wahabfiles6260 · 4 years ago

      Nice observation. I think it is pre-assumed that P(x) gives the ideal coding, and if we use Q(x) for that coding we deviate by 1/4. So I think the arbitrary coding length is not required. I might be wrong, though.
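      A possible way to see why the lengths seem to drop out: the idealised code lengths are themselves set by the probabilities, L_P(x) = -log2 p(x) and L_Q(x) = -log2 q(x), so once that substitution is made the two calculations give the same quantity. A sketch of the identity, under that assumption:

          D_{\mathrm{KL}}(P \,\|\, Q)
            = \mathbb{E}_{P}\bigl[L_Q(x) - L_P(x)\bigr]
            = \sum_{x} p(x)\bigl(-\log_2 q(x)\bigr) - \sum_{x} p(x)\bigl(-\log_2 p(x)\bigr)
            = \sum_{x} p(x)\,\log_2 \frac{p(x)}{q(x)}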

  • @p.z.8355 · 1 year ago

    What is the relationship between log_2(p(x)) and the encoding length of x? Intuitively, the higher p(x), the shorter the encoding length. How is this relation formulated mathematically in a concise way?
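    One standard way to state that relation, from Shannon's source-coding result rather than from the video itself: the ideal codeword length for a symbol is the negative log of its probability, so more probable symbols get shorter codewords, and the expected length of the ideal code equals the entropy of the distribution.

        L^{*}(x) = -\log_2 p(x)
        \qquad\text{(a real prefix code needs } \lceil -\log_2 p(x) \rceil \text{ bits)},
        \qquad
        \mathbb{E}_{p}\bigl[L^{*}(x)\bigr] = -\sum_{x} p(x)\,\log_2 p(x) = H(P)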

  • @prab436 · 5 years ago

    Mind blown!

  • @michaelkonstantinov7857 · 4 years ago

    Good explanation! But I think this is just information coding, not ciphering.

  • @sayandebnath115 · 3 years ago

    Isn't the example too specific, since the probabilities are very particular? Though the idea is fairly correct, and the two things will match if the number of letters is large instead of only 3.

  • @DarrelFrancis · 2 years ago

    Shouldn't the code for language P be a=0, b=10, c=11? And likewise for Q, a=10, b=0, c=11? That way, if the first digit seen is 0, you know it is the commonest letter; if it is 1, you then select between the two rarer letters.

  • @2137kg · 5 years ago +4

    you mean LEIbler

  • @amirhosseinmaleki9802 · 5 years ago

    Didn't you make a mistake? E[L(Q)|P] = 1/4 (not 1/2) x 2 + 1/2 x 1 + 1/4 x 2

    • @tylertyler82 · 5 years ago +2

      P has ½ probability for letter a and Q has ¼ probability for letter a, and you are given language P, so I believe the video is correct.
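      Writing the reply out explicitly, and assuming codeword lengths of 1, 2, 2 bits under P's code and 2, 1, 2 bits under Q's code (matching the probabilities 1/2, 1/4, 1/4 and 1/4, 1/2, 1/4 discussed in this thread):

          \mathbb{E}\bigl[L_Q \mid P\bigr] = \tfrac{1}{2}\cdot 2 + \tfrac{1}{4}\cdot 1 + \tfrac{1}{4}\cdot 2 = 1.75 \text{ bits},
          \qquad
          \mathbb{E}\bigl[L_P \mid P\bigr] = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5 \text{ bits},

      and the difference, 0.25 bits, is the KL divergence D_{\mathrm{KL}}(P \,\|\, Q).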

  • @malharjajoo7393 · 4 years ago +4

    This isn't really helpful... I still don't understand the secret code example and how it relates to the KL divergence.