The Kullback-Leibler (KL) divergence (also called relative entropy), denoted $D_\text{KL}(P \parallel Q)$, is a type of statistical distance: a measure of how much a model probability distribution $Q$ is different from a true probability distribution $P$. Mathematically, it is defined as

$$D_\text{KL}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
A simple interpretation of the KL divergence of $P$ from $Q$ is the expected excess surprise from using $Q$ as a model instead of $P$ when the actual distribution is $P$. While it is a measure of how different two distributions are and is thus a “distance” in some sense, it is not actually a metric, which is the most familiar and formal type of distance. In particular, it is not symmetric in the two distributions (in contrast to variation of information), and it does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalisation of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalised Pythagorean theorem (which applies to squared distances).
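The definition and the asymmetry noted above can be illustrated with a short sketch for discrete distributions; the function name `kl_divergence` and the example distributions are illustrative, not from the original text:

```python
import math

def kl_divergence(p, q):
    """Compute D_KL(P || Q) = sum_x P(x) * log(P(x)/Q(x)) in nats.

    p and q are sequences of probabilities over the same outcomes.
    Assumes q[x] > 0 wherever p[x] > 0 (absolute continuity);
    terms with p[x] == 0 contribute nothing, by convention 0 * log 0 = 0.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two example distributions over three outcomes (hypothetical values)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

forward = kl_divergence(p, q)   # D_KL(P || Q)
backward = kl_divergence(q, p)  # D_KL(Q || P)
# Both are non-negative, zero only when the distributions coincide,
# and in general forward != backward (no symmetry).
```

Note that the divergence is computed in nats here because `math.log` is the natural logarithm; using base-2 logarithms would give the value in bits.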

Relative entropy is always a non-negative real number, with value $0$ if and only if the two distributions in question are identical. It has diverse applications, both theoretical, such as characterising the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference; and practical, such as in applied statistics, fluid mechanics, neuroscience, bioinformatics, and machine learning.
