The Kullback-Leibler (KL) divergence (also called relative entropy), denoted $D_\text{KL}(P \parallel Q)$, is a type of statistical distance: a measure of how much a model probability distribution $Q$ is different from a true probability distribution $P$. Mathematically, it is defined as

$$D_\text{KL}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
A simple interpretation of the KL divergence of $P$ from $Q$ is the expected excess surprise from using $Q$ as a model instead of $P$ when the actual distribution is $P$. While it is a measure of how different two distributions are and is thus a “distance” in some sense, it is not actually a metric, which is the most familiar and formal type of distance. In particular, it is not symmetric in the two distributions (in contrast to variation of information), and it does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalisation of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalised Pythagorean theorem (which applies to squared distances).
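The definition and the asymmetry noted above can be illustrated with a short sketch for discrete distributions; the function name `kl_divergence` and the example distributions are illustrative, not from the original text:

```python
import math

def kl_divergence(p, q):
    """Compute D_KL(P || Q) = sum_x P(x) * log(P(x)/Q(x)) in nats.

    p and q are sequences of probabilities over the same outcomes.
    Assumes q[x] > 0 wherever p[x] > 0 (absolute continuity);
    terms with p[x] == 0 contribute nothing, by convention 0 * log 0 = 0.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two example distributions over three outcomes (hypothetical values)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

forward = kl_divergence(p, q)   # D_KL(P || Q)
backward = kl_divergence(q, p)  # D_KL(Q || P)
# Both are non-negative, zero only when the distributions coincide,
# and in general forward != backward (no symmetry).
```

Note that the divergence is computed in nats here because `math.log` is the natural logarithm; using base-2 logarithms would give the value in bits.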

Relative entropy is always a non-negative real number, with value $0$ if and only if the two distributions in question are identical. It has diverse applications, both theoretical, such as characterising the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference; and practical, such as in applied statistics, fluid mechanics, neuroscience, bioinformatics, and machine learning.
