Joint Entropy
machine-learning
Suppose we have two random variables $X$ and $Y$ taking values in domains $A$ and $B$, respectively. The joint entropy is defined by
$$H(X, Y) = - \sum_{a \in A} \sum_{b \in B} \mathbb{P}[X = a, Y = b] \log_2(\mathbb{P}[X = a, Y = b]).$$

This is the information entropy of the joint distribution of $X$ and $Y$.
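As a minimal sketch, the double sum can be computed directly from a table of joint probabilities. The helper `joint_entropy` below is a hypothetical name, not from the source; it assumes the joint distribution is given as a 2-D array with rows indexed by $A$ and columns by $B$.

```python
import numpy as np

def joint_entropy(joint_p):
    """Joint entropy H(X, Y) in bits, given a 2-D array of the joint
    probabilities P[X = a, Y = b] (rows index A, columns index B)."""
    p = np.asarray(joint_p, dtype=float).ravel()
    p = p[p > 0]  # outcomes with P = 0 contribute nothing to the sum
    return -np.sum(p * np.log2(p))

# Example: two independent fair coins. Each of the four joint outcomes
# has probability 1/4, so H(X, Y) = -4 * (1/4) * log2(1/4) = 2 bits.
joint_p = [[0.25, 0.25],
           [0.25, 0.25]]
print(joint_entropy(joint_p))  # 2.0
```

Skipping zero-probability terms mirrors the usual convention $0 \log_2 0 = 0$, so the sum matches the definition above on any finite joint distribution.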