Kullback-Leibler divergence (KL divergence) intuitions

  • Published 29 Sep 2024

COMMENTS • 11

  • @cabbagecat9612
    @cabbagecat9612  1 year ago

    demo code: github.com/szhaovas/blog-ytb/blob/master/NES/kl_demo.py

  • @alexkelly757
    @alexkelly757 3 months ago

    Thank you. This really helped.

  • @blackkyurem7166
    @blackkyurem7166 3 months ago

    What do you mean by the statement that the “positive and negative log ratios will cancel each other out?”
    Attempting to verify this, suppose we have X∈{1, 2, 3, 4} and two simple PMFs:
    - P(X), with probabilities 0.1, 0.2, 0.3, and 0.4 respectively
    - Q(X), with probabilities 0.25, 0.25, 0.25, and 0.25 respectively
    But ln(0.1/0.25) + ln(0.2/0.25) + ln(0.3/0.25) + ln(0.4/0.25) = -0.487109, not 0. Perhaps I'm doing something wrong or misinterpreting the video, but I don't see why this should be true.

    • @cabbagecat9612
      @cabbagecat9612  3 months ago

      Since the area under the curve of a PDF is 1, if P(x1) is very large at some point x1, then the values at other points, P(x2), P(x3), etc., have to be smaller so that the total area under the curve does not exceed 1. So if P(x1) > Q(x1), there must be other points xi where P(xi) < Q(xi), and the positive log-ratio at x1 will be cancelled out by the negative log-ratios at those xi (I write xi because there can be an arbitrary number of such points).
      To be honest, I haven't proven whether the positive and negative log-ratios cancel out "exactly" to 0, but they definitely cancel to some extent, which is not what we want here: points where P(xi) > Q(xi) and points where P(xj) < Q(xj) should both contribute towards a larger divergence between P and Q, not work against each other.
      My math is a bit rusty, but here's a sketch of the proof idea:
      - First of all, you would need to integrate log(P(x)/Q(x)) over x from -inf to +inf. (You can't pick specific x values here, because you don't know in advance where P(x) or Q(x) is large.)
      - log(P(x) / Q(x)) = log(P(x)) - log(Q(x))
      - Then the integral splits into something like ∫ log(P(x)) dx - ∫ log(Q(x)) dx.
      - Both ∫ P(x) dx and ∫ Q(x) dx would be 1, because the area under the curve of a PDF is 1. I'm not sure exactly how taking the log changes things (log(P(x)) is no longer a bell curve, so its integral isn't simply 1), but the point is that whatever ∫ log(P(x)) dx - ∫ log(Q(x)) dx works out to, it doesn't track the actual "distance" between the distributions P and Q, so on its own it can't serve as a divergence.
      Hope this helps.
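
For the discrete example in the question above, here is a quick numeric check (a sketch of my own, not the linked demo code). It computes both the raw, unweighted sum of log ratios that the question evaluates and the KL divergence proper, which weights each log ratio by P(x) and therefore comes out non-negative:

    # Quick check of the numbers in the question above.
    import math

    p = [0.1, 0.2, 0.3, 0.4]      # P(X)
    q = [0.25, 0.25, 0.25, 0.25]  # Q(X)

    raw_sum = sum(math.log(pi / qi) for pi, qi in zip(p, q))   # what the question computed
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))   # KL(P || Q)

    print(f"unweighted sum of log ratios: {raw_sum:.6f}")  # about -0.487109, as in the question
    print(f"KL(P || Q):                   {kl:.6f}")       # about  0.106440, always >= 0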
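
For the continuous case discussed in the reply, here is a rough numerical sketch of my own (not the video's code; the helper gaussian_pdf is hypothetical). It uses two equal-variance Gaussians: on a grid symmetric about the midpoint of the two means, the unweighted integral of log(p/q) cancels to roughly 0 no matter how far apart the means are, while the p-weighted integral, which is the KL divergence, grows with the gap:

    # Two equal-variance Gaussians: unweighted vs. p-weighted integral of log(p/q).
    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        # hypothetical helper, not part of the video's demo code
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    sigma = 1.0
    for gap in [0.5, 1.0, 2.0]:
        mu_p, mu_q = -gap / 2, gap / 2
        x = np.linspace(-10.0, 10.0, 200_001)      # grid symmetric about the midpoint (here 0)
        dx = x[1] - x[0]
        p = gaussian_pdf(x, mu_p, sigma)
        q = gaussian_pdf(x, mu_q, sigma)
        log_ratio = np.log(p) - np.log(q)
        unweighted = np.sum(log_ratio) * dx        # integral of log(p/q) dx, cancels to ~0
        kl = np.sum(p * log_ratio) * dx            # integral of p(x) log(p/q) dx
        closed_form = gap ** 2 / (2 * sigma ** 2)  # known KL for equal-variance Gaussians
        print(f"gap={gap}: unweighted ~ {unweighted:+.4f}, KL ~ {kl:.4f}, closed form = {closed_form:.4f}")

In other words, the plain log-ratios really do cancel in this setting, and it is the weighting by p(x) in the KL definition that turns them into a meaningful measure of distance.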

  • @KianSartipzadeh
    @KianSartipzadeh 6 months ago

    Thank you for making a very intuitive video about the KL divergence 🙏

  • @priyankjain9970
    @priyankjain9970 10 months ago

    This is probably the best and simplest explanation. Thanks @CabbageCat for the video
    👍

  • @Blu3B33r
    @Blu3B33r 8 months ago

    Amazing explanation and the code is such a smart idea
    Thank you for sharing 🙏

  • @sunasheerbhattacharjee4760
    @sunasheerbhattacharjee4760 8 months ago

    I think the points on the PDF curves are not probability values, since for a continuous random variable the probability at any single point is 0. It is the integral between two points that gives a probability. Hence, when you integrate over the whole range, the area under the curve comes out to 1 (a probability cannot exceed 1).

    • @cabbagecat9612
      @cabbagecat9612  8 months ago

      You are right, it was a sloppy use of terms. Should have been probability density.

    • @sunasheerbhattacharjee4760
      @sunasheerbhattacharjee4760 8 months ago +1

      @cabbagecat9612 Nonetheless, it was a great effort explaining the concept
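
On the density-vs-probability point above, here is a small sketch of my own (not from the video; gaussian_pdf is a hypothetical helper) showing that the values read off a PDF can exceed 1, while probabilities, which come from integrating the density, never do:

    # A narrow Gaussian: pointwise density values exceed 1, yet the total area is 1.
    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        # hypothetical helper, not part of the video's demo code
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    sigma = 0.1
    x = np.linspace(-5.0, 5.0, 1_000_001)
    dx = x[1] - x[0]
    p = gaussian_pdf(x, 0.0, sigma)

    print(f"peak density p(0):      {p.max():.3f}")                          # about 3.989, > 1
    print(f"total area under curve: {np.sum(p) * dx:.6f}")                   # about 1.0
    print(f"P(-0.1 < X < 0.1):      {np.sum(p[np.abs(x) < 0.1]) * dx:.4f}")  # about 0.68, an actual probability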

  • @TongLi-g2f
    @TongLi-g2f 9 months ago

    thank you so much :)