Information entropy
machine-learning
maths
Information Entropy
Suppose we have a discrete random variable $X$ that takes values in a finite set $A$. We define the information entropy of $X$ to be
$$H(X) = - \sum_{a \in A} \mathbb{P}(X = a) \log_2 \left ( \mathbb{P}(X = a) \right ),$$
with the convention that $0 \log_2 0 = 0$. Entropy measures how spread out the distribution of $X$ is: it is $0$ when $X$ is deterministic and reaches its maximum of $\log_2 |A|$ when $X$ is uniform on $A$, so the higher the entropy, the closer $X$ is to uniform.
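As a quick sanity check on the formula, here is a minimal Python sketch (the function name `entropy` and the example distributions are my own choices, not from the original note) that computes $H(X)$ from a list of probabilities:

```python
import math

def entropy(probs):
    """Information entropy H(X) in bits of a discrete distribution.

    `probs` holds the probabilities P(X = a) for each a in A; zero
    probabilities are skipped, matching the convention 0 * log2(0) = 0.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: uniform on 4 outcomes, log2(4)
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: deterministic outcome
print(entropy([0.5, 0.25, 0.25]))         # 1.5: between the two extremes
```

The examples illustrate the range noted above: the uniform distribution attains the maximum $\log_2 |A|$, and a point mass attains the minimum of $0$.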