Reinforcement-Learning

Pages in this section

  • Action-advantage function (RL)
    Last edited: 2025-05-22

Using the value function and quality function (RL) of a policy $\pi$, we can work out how advantageous taking a particular action is, given that we are in a particular state.
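In standard RL notation (completing the excerpt above; symbols follow the usual convention), the action-advantage function is written in terms of the quality function $q^\pi$ and the value function $V^\pi$:

$$A^\pi(s, a) = q^\pi(s, a) - V^\pi(s)$$

A positive advantage means action $a$ does better than the policy's average behaviour in state $s$.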

  • Bellman equation
    Last edited: 2026-02-05

The Bellman equation is used to determine the optimal value function for a given Markov decision process. It defines this value function recursively as follows:
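For reference, a common form of this recursion is the one below. It assumes a reward function $R(s, a)$, transition probabilities $P(s' \mid s, a)$, and discount factor $\gamma$, which are standard but not defined in this excerpt:

$$V^*(s) = \max_{a \in A_s} \left( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \right)$$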

  • Learning rate convergence
    Last edited: 2026-01-28

    # Statement

    Lemma

    Given a Markov decision process $M$, let $V_t(s)$ be the value estimate for a state $s$ at the $t$-th iteration. If we update this using the following update rule:
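A typical update rule of this shape is the temporal-difference update with learning rate $\alpha_t$ (a sketch in standard notation; the lemma's exact rule may differ):

$$V_{t+1}(s) = (1 - \alpha_t)\, V_t(s) + \alpha_t \left( r_t + \gamma V_t(s') \right)$$

Convergence results of this kind usually require the Robbins–Monro conditions $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$.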

  • Policy (MDP)
    Last edited: 2025-05-14

In a Markov decision process, a policy is how an actor will behave in a given situation, given by $\pi: S \rightarrow A$ where $\pi(s) \in A_s$. This concept can extend to become a probabilistic policy. Let $\mathcal{A}$ be the set of probability distributions over $A$. Then a probabilistic policy is given by $\pi: S \rightarrow \mathcal{A}$ where if $\pi(s)(a)$ is non-zero then $a \in A_s$.
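The two kinds of policy can be sketched concretely as follows. This is a minimal illustration, not from the source; the state and action names are hypothetical placeholders:

```python
import random

# A deterministic policy maps each state to a single action:
# the function pi : S -> A from the definition above.
deterministic_policy = {
    "s0": "left",
    "s1": "right",
}

# A probabilistic policy maps each state to a distribution over actions;
# only actions available in that state get non-zero probability.
probabilistic_policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"right": 1.0},
}

def sample_action(policy, state):
    """Sample an action from a probabilistic policy at a given state."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return random.choices(actions, weights=weights)[0]
```

Note that the deterministic case is the special case where each state's distribution puts probability 1 on a single action.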

  • Quality function (RL)
    Last edited: 2025-05-22

Similar to the value function, a quality function accounts for both state and action. The function $q: S \times A \rightarrow \mathbb{R}$ represents the quality of taking action $a \in A$ when you are in state $s \in S$.
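In the usual expectation form (using the return $G_t$; standard notation not shown in this excerpt), the quality function of a policy $\pi$ is:

$$q^\pi(s, a) = \mathbb{E}_\pi\!\left[\, G_t \mid S_t = s,\, A_t = a \,\right]$$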

  • Return (RL)
    Last edited: 2025-05-22

In a Markov decision process, the return $G_t$ at time step $t$ is defined to be the discounted sum of rewards:
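The standard definition this excerpt leads into, with discount factor $\gamma \in [0, 1)$ and reward $R_{t+k+1}$ received at each subsequent step, is:

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$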

  • Value function (RL)
    Last edited: 2025-05-14

A value function is a mapping of states to a long-term worth for an actor in a Markov decision process. This is defined by $V: S \rightarrow \mathbb{R}$.
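In expectation form, the value of a state under a policy $\pi$ is commonly written in terms of the return $G_t$:

$$V^\pi(s) = \mathbb{E}_\pi\!\left[\, G_t \mid S_t = s \,\right]$$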