# Probability

Pages in this section:
- Bayes' rule (last edited 2026-01-28): For two events $A$ and $B$, we have the following equality on their conditional probabilities: $\mathbb{P}[A \vert B] = \frac{\mathbb{P}[B \vert A] \, \mathbb{P}[A]}{\mathbb{P}[B]}$.
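As a numerical illustration (a hypothetical diagnostic-test example with made-up numbers, not taken from the page itself), Bayes' rule can be applied directly:

```python
# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B).
# Hypothetical numbers: A = "has condition", B = "test positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.99      # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # prints 0.1667
```

Even with a 99% accurate test, a positive result here only implies about a 1-in-6 chance of the condition, because the prior is small.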
- Bayesian network (last edited 2026-02-05): Let $G = (V,E)$ be a directed acyclic graph and let $X = (X_v)_{v \in V}$ be a set of random variables. We say $(G,X)$ forms a Bayesian network if the probability density function is given by $\mathbb{P}[X] = \prod_{v \in V} \mathbb{P}[X_v \vert \{X_u\}_{(u,v) \in E}]$.
- Bayesian network if and only if it satisfies the local Markov property (last edited 2026-01-28): Let $G = (V,E)$ be a directed acyclic graph and $X = \{X_v\}_{v \in V}$ a set of random variables. $(G,X)$ is a Bayesian network if and only if it satisfies the local Markov property.
- Chain rule (probability) (last edited 2026-02-05): For random variables $A_k$ for $k \in \{1, 2, \ldots, n\}$ we have $$\mathbb{P}[A_1, A_2, \ldots, A_n] = \prod_{k=1}^n \mathbb{P}[A_k \vert A_1, A_2, \ldots, A_{k-1}].$$ This follows from the definition of conditional probability.
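The $n = 2$ case can be checked numerically on a made-up joint table (the table and helper names are illustrative):

```python
from itertools import product

# Hypothetical joint distribution over two binary variables (a1, a2).
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def p1(a1):
    """Marginal P[A1 = a1]."""
    return sum(p for (x, _), p in joint.items() if x == a1)

def p2_given_1(a2, a1):
    """Conditional P[A2 = a2 | A1 = a1]."""
    return joint[(a1, a2)] / p1(a1)

# Chain rule: P[A1, A2] = P[A1] * P[A2 | A1] for every outcome.
for a1, a2 in product([0, 1], repeat=2):
    assert abs(joint[(a1, a2)] - p1(a1) * p2_given_1(a2, a1)) < 1e-12
```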
- Conditional entropy (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. The conditional entropy is defined by $H(X \vert Y) = -\sum_{a \in A} \sum_{b \in B} \mathbb{P}[X = a, Y = b] \log(\mathbb{P}[X = a \vert Y = b])$.
- Conditional Independence (last edited 2026-02-05): Suppose we have random variables $X$, $Y$, and $Z$ over domains $A$, $B$, and $C$. We say $X$ is conditionally independent of $Y$ given $Z$ if for all $a \in A$, $b \in B$ and $c \in C$ we have $\mathbb{P}[X = a, Y = b \vert Z = c] = \mathbb{P}[X = a \vert Z = c] \, \mathbb{P}[Y = b \vert Z = c]$.
- Conditional probability (last edited 2026-02-05): For two events $A$ and $B$, the conditional probability of $A$ happening given $B$ has happened is $\mathbb{P}[A \vert B] = \mathbb{P}[A \cap B] / \mathbb{P}[B]$.
- Ergodic Markov chain (last edited 2026-02-05): A Markov chain is said to be ergodic if it is both aperiodic and irreducible.
- Finite Markov Decision Process (last edited 2025-05-14): This summarises the environment that an actor in a discrete Markovian universe experiences. It is given by finite sets of states and actions, together with transition probabilities and rewards.
- Game theory (last edited 2026-02-05): Game theory is the study of systems with more than one rational player.
- If two variables are independent conditional entropy excludes the dependent (last edited 2025-12-05): Suppose we have two independent random variables $X$ and $Y$ over different domains $A$ and $B$. Then the conditional entropy does not depend on the independent variable: $H(X \vert Y) = H(X)$.
- If two variables are independent joint entropy is additive (last edited 2025-12-05): Suppose we have two independent random variables $X$ and $Y$ over different domains $A$ and $B$. Then the joint entropy is additive: $H(X, Y) = H(X) + H(Y)$.
- Independent component analysis (last edited 2024-03-10): Independent component analysis is a form of linear dimension reduction. The goal is to form a linear map to features which are independent of one another. Strictly, if your previous features were $X_1, X_2, \ldots, X_n$ and you map to $Y_1, Y_2, \ldots, Y_m$, then we want the mutual information $I(Y_i; Y_j)$ to be zero for all $i \neq j$.
- Independent events (last edited 2026-02-05): Suppose we have two events $A$ and $B$. These events are independent if $\mathbb{P}[A \cap B] = \mathbb{P}[A] \, \mathbb{P}[B]$.
- Independent identically distributed samples (last edited 2026-02-05): This means that the samples are independent events drawn from the same probability distribution.
- Irreducible Markov chain (last edited 2026-02-05): A Markov chain given by $P \in M_{N,N}(\mathbb{R})$ is irreducible if the directed graph on $V = \{1, 2, \ldots, N\}$ given by non-zero values of $P$ has a single strongly connected component.
- Joint distribution (last edited 2026-02-05): Given two random variables $X$ and $Y$ over domains $A$ and $B$, the joint distribution of $X \oplus Y$ is over $A \times B$ and is given by $\mathbb{P}[(X \oplus Y) = (a, b)] = \mathbb{P}[X = a, Y = b]$.
- Kullback–Leibler divergence (last edited 2026-02-05): Given two probability distributions $P$ and $Q$ over $A$, the Kullback–Leibler divergence is the expected value of the log difference between $P$ and $Q$, with the probabilities for each value given by $P$: $D_{KL}(P \Vert Q) = \sum_{a \in A} P(a) \log \left( \frac{P(a)}{Q(a)} \right)$.
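A small sketch with two made-up distributions, showing the divergence is non-negative, zero only when the distributions agree, and not symmetric:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_a P(a) * log(P(a) / Q(a)), in nats.
    Assumes q[a] > 0 wherever p[a] > 0."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

# Two hypothetical distributions over the same two-element domain.
p = {"x": 0.5, "y": 0.5}
q = {"x": 0.9, "y": 0.1}

assert kl_divergence(p, p) == 0          # zero against itself
assert kl_divergence(p, q) > 0           # non-negative (Gibbs' inequality)
assert abs(kl_divergence(p, q) - kl_divergence(q, p)) > 1e-6  # asymmetric
```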
- Local Markov property (last edited 2026-02-05): Let $G = (V,E)$ be a directed acyclic graph and $X = \{X_v\}_{v \in V}$ a set of random variables. We say $(G,X)$ satisfies the local Markov property if for all $v, w \in V$ such that $(w,v) \not \in E$ and there is no path from $v$ to $w$, $X_v$ is conditionally independent of $X_w$ given $\{X_u\}_{(u,v) \in E}$.
- Marginalisation (probability) (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over domains $A$ and $B$ respectively. If we know their joint distribution $\mathbb{P}[X, Y]$ then we can calculate either $X$'s or $Y$'s (marginal) distribution, e.g. $\mathbb{P}[X = a] = \sum_{b \in B} \mathbb{P}[X = a, Y = b]$.
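Marginalising a small made-up joint table (the table and names are illustrative):

```python
# Hypothetical joint distribution P[X, Y] over A = {0, 1}, B = {"u", "v"}.
joint = {(0, "u"): 0.2, (0, "v"): 0.1, (1, "u"): 0.3, (1, "v"): 0.4}

def marginal_x(a):
    """P[X = a] = sum over b of P[X = a, Y = b]."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_y(b):
    """P[Y = b] = sum over a of P[X = a, Y = b]."""
    return sum(p for (_, y), p in joint.items() if y == b)

assert abs(marginal_x(0) - 0.3) < 1e-12
assert abs(marginal_y("u") - 0.5) < 1e-12
# Each marginal is itself a probability distribution.
assert abs(marginal_x(0) + marginal_x(1) - 1.0) < 1e-12
```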
- Markov chain (last edited 2026-02-05): A Markov chain is specified by a number of states $N$ and a transition probability matrix $P \in M_{N,N}(\mathbb{R})$. Intuitively, think of this as a state machine which at state $i$ has probability $p_{i,j}$ of transitioning to state $j$.
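The state-machine intuition can be simulated directly (a hypothetical two-state chain; the matrix and seed are made up):

```python
import random

# Hypothetical 2-state chain: row i of P holds the probabilities p_{i,j}.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(state, rng):
    """Sample the next state using the transition row P[state]."""
    return rng.choices(range(len(P)), weights=P[state])[0]

rng = random.Random(0)
state, visits = 0, [0, 0]
for _ in range(100_000):
    state = step(state, rng)
    visits[state] += 1

# Long-run visit frequencies approach this chain's stationary
# distribution, (5/6, 1/6).
freqs = [v / sum(visits) for v in visits]
```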
- Markov decision process (last edited 2026-02-05): A Markov decision process is defined by a set of states, a set of actions, transition probabilities, and rewards.
- Mutual information (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. Then the mutual information is defined to be $I(X; Y) = H(X) - H(X \vert Y)$.
- Mutual information is symmetric (last edited 2026-01-28): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. We have that mutual information is symmetric: $I(X; Y) = I(Y; X)$.
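The symmetry can be checked numerically on a made-up joint distribution, computing the mutual information in both orders via the identity $H(X \vert Y) = H(X, Y) - H(Y)$:

```python
import math

# Hypothetical joint distribution of (X, Y) over {0, 1} x {0, 1}.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    """Shannon entropy in bits: H = -sum over p of p * log2(p)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals of X and Y.
px = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
py = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}

# Conditional entropies via H(X | Y) = H(X, Y) - H(Y).
h_x_given_y = entropy(joint) - entropy(py)
h_y_given_x = entropy(joint) - entropy(px)

# Both orderings give the same mutual information.
mi_xy = entropy(px) - h_x_given_y
mi_yx = entropy(py) - h_y_given_x
assert abs(mi_xy - mi_yx) < 1e-9
```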
- Perfect information (last edited 2026-02-05): A game with perfect information is one in which all information that affects the game is known to all players.
- Periodic Markov chain (last edited 2026-02-05): A Markov chain is said to be periodic if it has a periodic state, and aperiodic otherwise.
- Periodic state (markov chain) (last edited 2026-02-05): For a Markov chain given by $P \in M_{N,N}(\mathbb{R})$, a state $i$ with $1 \leq i \leq N$ is said to be aperiodic if $\gcd\{n \geq 1 : (P^n)_{i,i} > 0\} = 1$, and periodic otherwise.
- Probability distribution (last edited 2026-02-05): Given some domain $D$, a probability distribution on $D$ is a function $p: D \rightarrow [0,1] \subset \mathbb{R}$ such that $\sum_{d \in D} p(d) = 1$.
- Simulated annealing ending probability (last edited 2026-01-28): In the Simulated Annealing algorithm, with some unspecified assumptions, we have a formula for the probability of ending in a given state.
- Standard deviation (last edited 2026-02-05): For some random variable $X$ given by the probability density function $f: D \rightarrow \mathbb{R}$ and expectation $\mu = \mathbb{E}[X]$, the standard deviation is $\sigma = \sqrt{\mathbb{E}[(X - \mu)^2]}$.
- Stationary distribution (Markov Chains) (last edited 2026-02-05): For a Markov chain given by $P \in M_{N,N}(\mathbb{R})$, a stationary distribution is a probability distribution given by a vector $\pi \in M_{1,N}(\mathbb{R})$ (this is also a probability matrix) such that $\pi P = \pi$.
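A stationary distribution can be found numerically by iterating $\pi \mapsto \pi P$ on a made-up $3 \times 3$ chain (a sketch; it assumes the chain is ergodic so the iteration converges):

```python
# Hypothetical 3-state transition matrix P (each row sums to 1).
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6, 0.2],
     [0.25, 0.25, 0.5]]

def step_distribution(pi, P):
    """One application of pi -> pi P (row vector times matrix)."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Power iteration: repeatedly apply P until pi stops changing.
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(1000):
    pi = step_distribution(pi, P)

# The fixed point satisfies the stationary condition pi P = pi.
assert all(abs(x - y) < 1e-10 for x, y in zip(pi, step_distribution(pi, P)))
assert abs(sum(pi) - 1.0) < 1e-10
```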
- Symmetric Markov chain (last edited 2026-02-05): A Markov chain given by $P \in M_{N,N}(\mathbb{R})$ is called symmetric if $p_{i,j} = p_{j,i}$ for all $i$ and $j$.
- Symmetric Markov chains have a uniform stationary distribution (last edited 2026-01-28): A symmetric Markov chain has a uniform stationary distribution on its $N$ states.
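The lemma can be verified on a made-up symmetric chain: symmetry together with stochastic rows forces the columns to sum to 1, so the uniform vector is fixed by $\pi \mapsto \pi P$.

```python
# Hypothetical symmetric transition matrix (p_ij = p_ji, rows sum to 1).
P = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
N = len(P)

# Apply pi -> pi P to the uniform distribution.
uniform = [1 / N] * N
after = [sum(uniform[i] * P[i][j] for i in range(N)) for j in range(N)]

# The uniform distribution is unchanged, i.e. it is stationary.
assert all(abs(x - 1 / N) < 1e-12 for x in after)
```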
- Uniform distribution (last edited 2026-02-05): For a domain $D$ of size $\vert D \vert = N$, the uniform distribution on $D$ is the probability distribution given by $p(d) = 1/N$ for all $d \in D$.
- Variance (last edited 2026-02-05): For some random variable $X$ given by the probability density function $f: D \rightarrow \mathbb{R}$ and expectation $\mu = \mathbb{E}[X]$, the variance is $\mathrm{Var}[X] = \mathbb{E}[(X - \mu)^2]$.