Reinforcement Learning
Pages in this section
- Action-advantage function (RL) (Last edited: 2025-05-22): Using the value function and Quality function (RL) of a policy $\pi$, we can work out how advantageous taking a particular action is for us, given we are in a state.
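The relationship between these three quantities is standard; a minimal sketch, assuming the usual definitions of the state-value function $v^{\pi}$ and quality function $q^{\pi}$:

```latex
% Advantage of action a in state s under policy \pi:
% how much better the action is than the state's baseline value.
A^{\pi}(s, a) = q^{\pi}(s, a) - v^{\pi}(s)
```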
- Bellman equation (Last edited: 2026-02-05): The Bellman equation is used to determine the optimal value function for a given Markov decision process. It defines this value function recursively.
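The linked page's equation is not reproduced in this excerpt; as a reference point, the standard Bellman optimality equation, assuming transition probabilities $P(s' \mid s, a)$, reward $R(s, a)$, and discount factor $\gamma$ (notation assumed here, the linked page may use a different convention), reads:

```latex
% Optimal value of s: best action's immediate reward
% plus discounted expected optimal value of the successor.
V^{*}(s) = \max_{a \in A_s} \Big( R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{*}(s') \Big)
```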
- Learning rate convergence (Last edited: 2026-01-28): Lemma: given a Markov decision process $M$, let $V_t(s)$ be the value estimate for a state $s$ at the $t$-th iteration, updated under the learning-rate rule stated in the lemma.
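The excerpt cuts off before the update rule itself; one common rule such convergence lemmas concern (an assumption here, not taken from the linked page) is a learning-rate-weighted backup, whose convergence classically requires the Robbins-Monro conditions on the step sizes $\alpha_t$:

```latex
% Step-size-weighted update toward a sampled backup target,
% with the classical step-size conditions for convergence.
V_{t+1}(s) = (1 - \alpha_t) V_t(s) + \alpha_t \big( r + \gamma V_t(s') \big),
\qquad
\sum_{t} \alpha_t = \infty, \quad \sum_{t} \alpha_t^2 < \infty
```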
- Policy (MDP) (Last edited: 2025-05-14): In a Markov decision process, a policy is how an actor will behave in a given situation, given by $\pi: S \rightarrow A$ where $\pi(s) \in A_s$. This concept can extend to become a probabilistic policy. Let $\mathcal{A}$ be the set of probability distributions over $A$. Then a probabilistic policy is given by $\pi: S \rightarrow \mathcal{A}$, where if $\pi(s)(a)$ is non-zero then $a \in A_s$.
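Both kinds of policy can be sketched as plain functions; a minimal illustration, where the states, actions, and probabilities are hypothetical placeholders rather than anything from the linked page:

```python
import random

# Deterministic policy pi: S -> A, one action per state.
def greedy_policy(state):
    table = {"s0": "left", "s1": "right"}
    return table[state]

# Probabilistic policy pi: S -> distributions over A; every action
# with non-zero probability must be available in that state (A_s).
def stochastic_policy(state):
    dist = {"s0": {"left": 0.8, "right": 0.2},
            "s1": {"left": 0.5, "right": 0.5}}
    return dist[state]

# Acting under a probabilistic policy means sampling from pi(s).
def sample_action(policy, state):
    dist = policy(state)
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]
```

A deterministic policy is then just the special case where one action in each state has probability 1.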
- Quality function (RL) (Last edited: 2025-05-22): Similar to the value function, a quality function accounts for both state and action. The function $q: S \times A \rightarrow \mathbb{R}$ represents the quality of taking action $a \in A$ when you are in state $s \in S$.
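Under the usual expected-return definition (an assumption here, not stated in this excerpt), the quality function conditions on the first action as well as the state:

```latex
% Expected return when starting in s, taking action a,
% and following policy \pi thereafter.
q^{\pi}(s, a) = \mathbb{E}_{\pi}\big[\, G_t \mid S_t = s,\ A_t = a \,\big]
```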
- Return (RL) (Last edited: 2025-05-22): In a Markov decision process, the return $G_t$ at time step $t$ is defined to be the discounted sum of rewards.
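The discounted sum referred to above is standardly written as follows, assuming rewards $R_{t+k+1}$ and a discount factor $\gamma \in [0, 1)$ (notation assumed, not taken from the linked page):

```latex
% Return: rewards further in the future are weighted down by \gamma^k.
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```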
- Value function (RL) (Last edited: 2025-05-14): A value function is a mapping of states to a long-term worth for an actor in a Markov decision process. This is defined by $V: S \rightarrow \mathbb{R}$.
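One standard way to compute such a $V$ for a fixed policy is iterative policy evaluation; a minimal sketch on a tiny hypothetical two-state MDP (all state names, rewards, and numbers here are illustrative, not from the linked page):

```python
# Tiny hypothetical MDP: transitions[state][action] is a list of
# (probability, next_state, reward) triples.
transitions = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "go": [(1.0, "s0", 0.0)]},
}
policy = {"s0": "go", "s1": "stay"}  # fixed deterministic policy
gamma = 0.9  # discount factor

def evaluate_policy(transitions, policy, gamma, sweeps=500):
    """Iterative policy evaluation: repeatedly apply the Bellman
    expectation backup V(s) <- sum_p p * (r + gamma * V(s'))."""
    V = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s in transitions:
            V[s] = sum(p * (r + gamma * V[s2])
                       for p, s2, r in transitions[s][policy[s]])
    return V
```

For this toy chain the iterates converge to the fixed point of the Bellman expectation equation: staying in `s1` forever earns $2/(1-\gamma) = 20$, so `s1` is worth 20 and `s0` is worth $1 + 0.9 \cdot 20 = 19$.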