Information Entropy

Suppose we have a discrete random variable $X$ taking values in a finite set $A$. We define the information entropy of $X$ to be

$$H(X) = - \sum_{a \in A} \mathbb{P}(X = a) \log_2 \left ( \mathbb{P}(X = a) \right ),$$

with the convention that $0 \log_2 0 = 0$. Because the logarithm is base 2, entropy is measured in bits.
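As a quick worked example, a fair coin flip, where $A = \{\text{heads}, \text{tails}\}$ and each outcome has probability $1/2$, has entropy

$$H(X) = - \left ( \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{2} \log_2 \tfrac{1}{2} \right ) = 1 \text{ bit}.$$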

Entropy measures how spread out the distribution is: it is maximised, at $\log_2 |A|$ bits, when $X$ is uniform on $A$, and it is $0$ when $X$ is deterministic.
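
To make this concrete, here is a minimal Python sketch of the definition (the function name `entropy` is my own, and it assumes NumPy is available), evaluated on a few distributions over four outcomes:

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p.

    Equivalent to -sum(p * log2(p)), written with log2(1/p) so that
    a deterministic distribution cleanly returns 0.0.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop zero-probability outcomes (convention: 0 log2 0 = 0)
    return float(np.sum(p * np.log2(1.0 / p)))

# Uniform over four outcomes: the maximum, log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# Heavily skewed: far less uncertain, so far lower entropy.
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24

# Deterministic: no uncertainty at all.
print(entropy([1.0, 0.0, 0.0, 0.0]))  # 0.0
```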