# Probability

Pages in this section:
- Bayes' rule (last edited 2026-01-28): For two events $A$ and $B$, we have the following equality on their conditional probabilities: $\mathbb{P}[A \vert B] = \frac{\mathbb{P}[B \vert A] \, \mathbb{P}[A]}{\mathbb{P}[B]}$.
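As a numerical illustration (a hypothetical diagnostic-test example with made-up numbers, not taken from the page itself), Bayes' rule can be applied directly:

```python
# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B).
# Hypothetical numbers: A = "has condition", B = "test positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.99      # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # prints 0.1667
```

Even with a 99% accurate test, a positive result here only implies about a 1-in-6 chance of the condition, because the prior is small.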
- Bayesian network (last edited 2026-02-05): Let $G = (V,E)$ be a directed acyclic graph and let $X = (X_v)_{v \in V}$ be a set of random variables. We say $(G,X)$ forms a Bayesian network if the probability density function is given by $\mathbb{P}[X] = \prod_{v \in V} \mathbb{P}[X_v \vert \{X_u\}_{(u,v) \in E}]$.
- Bayesian network if and only if it satisfies the local Markov property (last edited 2026-01-28): Let $G = (V,E)$ be a directed acyclic graph and $X = \{X_v\}_{v \in V}$ a set of random variables. $(G,X)$ is a Bayesian network if and only if it satisfies the local Markov property.
- Chain rule (probability) (last edited 2026-02-05): For random variables $A_k$ for $k \in \{1, 2, \ldots, n\}$ we have $$\mathbb{P}[A_1, A_2, \ldots, A_n] = \prod_{k=1}^n \mathbb{P}[A_k \vert A_1, A_2, \ldots, A_{k-1}].$$ This follows from the definition of conditional probability.
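The $n = 2$ case can be checked numerically on a made-up joint table (the table and helper names are illustrative):

```python
from itertools import product

# Hypothetical joint distribution over two binary variables (a1, a2).
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def p1(a1):
    """Marginal P[A1 = a1]."""
    return sum(p for (x, _), p in joint.items() if x == a1)

def p2_given_1(a2, a1):
    """Conditional P[A2 = a2 | A1 = a1]."""
    return joint[(a1, a2)] / p1(a1)

# Chain rule: P[A1, A2] = P[A1] * P[A2 | A1] for every outcome.
for a1, a2 in product([0, 1], repeat=2):
    assert abs(joint[(a1, a2)] - p1(a1) * p2_given_1(a2, a1)) < 1e-12
```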
- Conditional entropy (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. The conditional entropy is defined by $H(X \vert Y) = -\sum_{a \in A} \sum_{b \in B} \mathbb{P}[X = a, Y = b] \log(\mathbb{P}[X = a \vert Y = b])$.
- Conditional Independence (last edited 2026-02-05): Suppose we have random variables $X$, $Y$, and $Z$ over domains $A$, $B$, and $C$. We say $X$ is conditionally independent of $Y$ given $Z$ if for all $a \in A$, $b \in B$ and $c \in C$ we have $\mathbb{P}[X = a, Y = b \vert Z = c] = \mathbb{P}[X = a \vert Z = c] \, \mathbb{P}[Y = b \vert Z = c]$.
- Conditional probability (last edited 2026-02-05): For two events $A$ and $B$, the conditional probability of $A$ happening given $B$ has happened is $\mathbb{P}[A \vert B] = \mathbb{P}[A \cap B] / \mathbb{P}[B]$.
- Ergodic Markov chain (last edited 2026-02-05): A Markov chain is said to be ergodic if it is both aperiodic and irreducible.
- Finite Markov Decision Process (last edited 2025-05-14): This summarises the environment that an actor in a discrete Markovian universe experiences. It is given by finite sets of states and actions, together with transition probabilities and rewards.
- Game theory (last edited 2026-02-05): Game theory is the study of systems with more than one rational player.
- If two variables are independent conditional entropy excludes the dependent (last edited 2025-12-05): Suppose we have two independent random variables $X$ and $Y$ over different domains $A$ and $B$. Then the conditional entropy does not depend on the independent variable: $H(X \vert Y) = H(X)$.
- If two variables are independent joint entropy is additive (last edited 2025-12-05): Suppose we have two independent random variables $X$ and $Y$ over different domains $A$ and $B$. Then the joint entropy is additive: $H(X, Y) = H(X) + H(Y)$.
- Independent component analysis (last edited 2024-03-10): Independent component analysis is a form of linear dimension reduction. The goal is to form a linear map to features which are independent of one another. Strictly, if your previous features were $X_1, X_2, \ldots, X_n$ and you map to $Y_1, Y_2, \ldots, Y_m$, then we want the mutual information $I(Y_i; Y_j)$ to be zero for all $i \neq j$.
- Independent events (last edited 2026-02-05): Suppose we have two events $A$ and $B$. These events are independent if $\mathbb{P}[A \cap B] = \mathbb{P}[A] \, \mathbb{P}[B]$.
- Independent identically distributed samples (last edited 2026-02-05): This means that the samples are independent events drawn from the same probability distribution.
- Irreducible Markov chain (last edited 2026-02-05): A Markov chain given by $P \in M_{N,N}(\mathbb{R})$ is irreducible if the directed graph on $V = \{1, 2, \ldots, N\}$ given by non-zero values of $P$ has a single strongly connected component.
- Joint distribution (last edited 2026-02-05): Given two random variables $X$ and $Y$ over domains $A$ and $B$, the joint distribution of $X \oplus Y$ is over $A \times B$ and is given by $\mathbb{P}[(X \oplus Y) = (a, b)] = \mathbb{P}[X = a, Y = b]$.
- Kullback–Leibler divergence (last edited 2026-02-05): Given two probability distributions $P$ and $Q$ over $A$, the Kullback–Leibler divergence is the expected value of the log difference between $P$ and $Q$, with the probabilities for each value given by $P$: $D_{KL}(P \Vert Q) = \sum_{a \in A} P(a) \log \left( \frac{P(a)}{Q(a)} \right)$.
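A small sketch with two made-up distributions, showing the divergence is non-negative, zero only when the distributions agree, and not symmetric:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_a P(a) * log(P(a) / Q(a)), in nats.
    Assumes q[a] > 0 wherever p[a] > 0."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

# Two hypothetical distributions over the same two-element domain.
p = {"x": 0.5, "y": 0.5}
q = {"x": 0.9, "y": 0.1}

assert kl_divergence(p, p) == 0          # zero against itself
assert kl_divergence(p, q) > 0           # non-negative (Gibbs' inequality)
assert abs(kl_divergence(p, q) - kl_divergence(q, p)) > 1e-6  # asymmetric
```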
- Local Markov property (last edited 2026-02-05): Let $G = (V,E)$ be a directed acyclic graph and $X = \{X_v\}_{v \in V}$ a set of random variables. We say $(G,X)$ satisfies the local Markov property if for all $v, w \in V$ such that $(w,v) \not \in E$ and there is no path from $v$ to $w$, $X_v$ is conditionally independent of $X_w$ given $\{X_u\}_{(u,v) \in E}$.
- Marginalisation (probability) (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over domains $A$ and $B$ respectively. If we know their joint distribution $\mathbb{P}[X, Y]$ then we can calculate either $X$'s or $Y$'s (marginal) distribution, e.g. $\mathbb{P}[X = a] = \sum_{b \in B} \mathbb{P}[X = a, Y = b]$.
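Marginalising a small made-up joint table (the table and names are illustrative):

```python
# Hypothetical joint distribution P[X, Y] over A = {0, 1}, B = {"u", "v"}.
joint = {(0, "u"): 0.2, (0, "v"): 0.1, (1, "u"): 0.3, (1, "v"): 0.4}

def marginal_x(a):
    """P[X = a] = sum over b of P[X = a, Y = b]."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_y(b):
    """P[Y = b] = sum over a of P[X = a, Y = b]."""
    return sum(p for (_, y), p in joint.items() if y == b)

assert abs(marginal_x(0) - 0.3) < 1e-12
assert abs(marginal_y("u") - 0.5) < 1e-12
# Each marginal is itself a probability distribution.
assert abs(marginal_x(0) + marginal_x(1) - 1.0) < 1e-12
```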
- Markov chain (last edited 2026-02-05): A Markov chain is specified by a number of states $N$ and a transition probability matrix $P \in M_{N,N}(\mathbb{R})$. Intuitively, think of this as a state machine which at state $i$ has probability $p_{i,j}$ of transitioning to state $j$.
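The state-machine intuition can be simulated directly (a hypothetical two-state chain; the matrix and seed are made up):

```python
import random

# Hypothetical 2-state chain: row i of P holds the probabilities p_{i,j}.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(state, rng):
    """Sample the next state using the transition row P[state]."""
    return rng.choices(range(len(P)), weights=P[state])[0]

rng = random.Random(0)
state, visits = 0, [0, 0]
for _ in range(100_000):
    state = step(state, rng)
    visits[state] += 1

# Long-run visit frequencies approach this chain's stationary
# distribution, (5/6, 1/6).
freqs = [v / sum(visits) for v in visits]
```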
- Markov decision process (last edited 2026-02-05): A Markov decision process is defined by a set of states, a set of actions, transition probabilities, and rewards.
- Mutual information (last edited 2026-02-05): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. Then the mutual information is defined to be $I(X; Y) = H(X) - H(X \vert Y)$.
- Mutual information is symmetric (last edited 2026-01-28): Suppose we have two random variables $X$ and $Y$ over different domains $A$ and $B$. We have that mutual information is symmetric: $I(X; Y) = I(Y; X)$.
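The symmetry can be checked numerically on a made-up joint distribution, computing the mutual information in both orders via the identity $H(X \vert Y) = H(X, Y) - H(Y)$:

```python
import math

# Hypothetical joint distribution of (X, Y) over {0, 1} x {0, 1}.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    """Shannon entropy in bits: H = -sum over p of p * log2(p)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals of X and Y.
px = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
py = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}

# Conditional entropies via H(X | Y) = H(X, Y) - H(Y).
h_x_given_y = entropy(joint) - entropy(py)
h_y_given_x = entropy(joint) - entropy(px)

# Both orderings give the same mutual information.
mi_xy = entropy(px) - h_x_given_y
mi_yx = entropy(py) - h_y_given_x
assert abs(mi_xy - mi_yx) < 1e-9
```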
- Perfect information (last edited 2026-02-05): A game with perfect information is one in which all information that affects the game is known to all players.
- Periodic Markov chain (last edited 2026-02-05): A Markov chain is said to be periodic if it has a periodic state, and aperiodic otherwise.
- Periodic state (markov chain) (last edited 2026-02-05): For a Markov chain given by $P \in M_{N,N}(\mathbb{R})$, a state $i$ with $1 \leq i \leq N$ is said to be aperiodic if $\gcd\{n \geq 1 : (P^n)_{i,i} > 0\} = 1$, and periodic otherwise.
- Probability distribution (last edited 2026-02-05): Given some domain $D$, a probability distribution on $D$ is a function $p: D \rightarrow [0,1] \subset \mathbb{R}$ such that $\sum_{d \in D} p(d) = 1$.
- Simulated annealing ending probability (last edited 2026-01-28): In the Simulated Annealing algorithm, with some unspecified assumptions, we have a formula for the probability of ending in a given state.
- Standard deviation (last edited 2026-02-05): For some random variable $X$ given by the probability density function $f: D \rightarrow \mathbb{R}$ and expectation $\mu = \mathbb{E}[X]$, the standard deviation is $\sigma = \sqrt{\mathbb{E}[(X - \mu)^2]}$.
- Stationary distribution (Markov Chains) (last edited 2026-02-05): For a Markov chain given by $P \in M_{N,N}(\mathbb{R})$, a stationary distribution is a probability distribution given by a vector $\pi \in M_{1,N}(\mathbb{R})$ (this is also a probability matrix) such that $\pi P = \pi$.
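A stationary distribution can be found numerically by iterating $\pi \mapsto \pi P$ on a made-up $3 \times 3$ chain (a sketch; it assumes the chain is ergodic so the iteration converges):

```python
# Hypothetical 3-state transition matrix P (each row sums to 1).
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6, 0.2],
     [0.25, 0.25, 0.5]]

def step_distribution(pi, P):
    """One application of pi -> pi P (row vector times matrix)."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Power iteration: repeatedly apply P until pi stops changing.
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(1000):
    pi = step_distribution(pi, P)

# The fixed point satisfies the stationary condition pi P = pi.
assert all(abs(x - y) < 1e-10 for x, y in zip(pi, step_distribution(pi, P)))
assert abs(sum(pi) - 1.0) < 1e-10
```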
- Symmetric Markov chain (last edited 2026-02-05): A Markov chain given by $P \in M_{N,N}(\mathbb{R})$ is called symmetric if $p_{i,j} = p_{j,i}$ for all $i$ and $j$.
- Symmetric Markov chains have a uniform stationary distribution (last edited 2026-01-28): A symmetric Markov chain has a uniform stationary distribution on its $N$ states.
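The lemma can be verified on a made-up symmetric chain: symmetry together with stochastic rows forces the columns to sum to 1, so the uniform vector is fixed by $\pi \mapsto \pi P$.

```python
# Hypothetical symmetric transition matrix (p_ij = p_ji, rows sum to 1).
P = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
N = len(P)

# Apply pi -> pi P to the uniform distribution.
uniform = [1 / N] * N
after = [sum(uniform[i] * P[i][j] for i in range(N)) for j in range(N)]

# The uniform distribution is unchanged, i.e. it is stationary.
assert all(abs(x - 1 / N) < 1e-12 for x in after)
```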
- Uniform distribution (last edited 2026-02-05): For a domain $D$ of size $\vert D \vert = N$, the uniform distribution on $D$ is the probability distribution given by $p(d) = 1/N$ for all $d \in D$.
- Variance (last edited 2026-02-05): For some random variable $X$ given by the probability density function $f: D \rightarrow \mathbb{R}$ and expectation $\mu = \mathbb{E}[X]$, the variance is $\mathrm{Var}[X] = \mathbb{E}[(X - \mu)^2]$.