Joint Entropy


Suppose we have two random variables $X$ and $Y$ over domains $A$ and $B$, respectively. Their joint entropy is defined as

$$H(X, Y) = - \sum_{a \in A} \sum_{b \in B} \mathbb{P}[X = a, Y = b] \log_2(\mathbb{P}[X = a, Y = b]).$$

This is simply the information entropy of the joint distribution, treating the pair $(X, Y)$ as a single random variable over $A \times B$.
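
As a minimal sketch of the definition, here is a small Python function (the name `joint_entropy` and the NumPy-array representation of the joint distribution are our assumptions, not part of the note) that evaluates $H(X, Y)$ in bits from a table of joint probabilities:

```python
import numpy as np

def joint_entropy(joint: np.ndarray) -> float:
    """Joint entropy H(X, Y) in bits for a joint probability table.

    joint[i, j] = P[X = a_i, Y = b_j]; entries must be nonnegative
    and sum to 1. Zero-probability cells contribute nothing, by the
    standard convention 0 * log2(0) = 0.
    """
    p = joint[joint > 0]  # drop zero cells to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

# Example: a fair coin X and an exact copy Y = X.
# P[X = a, Y = b] is 1/2 on the diagonal and 0 elsewhere,
# so H(X, Y) = H(X) = 1 bit.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(joint_entropy(joint))  # 1.0
```

In the example, $Y$ carries no information beyond $X$, so the joint entropy collapses to the entropy of a single fair coin, 1 bit.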