Reinforcement-Learning

Pages in this section

  • Action-advantage function (RL)
    Last edited: 2025-05-22

Using the value function and quality function (RL) of a policy $\pi$, we can work out how advantageous taking a particular action is, given that we are in a particular state.
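In standard RL notation (completing the excerpt above; symbols follow the usual convention), the action-advantage function is written in terms of the quality function $q^\pi$ and the value function $V^\pi$:

$$A^\pi(s, a) = q^\pi(s, a) - V^\pi(s)$$

A positive advantage means action $a$ does better than the policy's average behaviour in state $s$.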

  • Bellman equation
    Last edited: 2026-02-05

The Bellman equation is used to determine the optimal value function for a given Markov decision process. It defines this value function recursively as follows:
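For reference, a common form of this recursion is the one below. It assumes a reward function $R(s, a)$, transition probabilities $P(s' \mid s, a)$, and discount factor $\gamma$, which are standard but not defined in this excerpt:

$$V^*(s) = \max_{a \in A_s} \left( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \right)$$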

  • Learning rate convergence
    Last edited: 2026-01-28

    # Statement

    Lemma

    Given a Markov decision process $M$, let $V_t(s)$ be the value estimate for a state $s$ at the $t$-th iteration. If we update this using the following update rule:
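A typical update rule of this shape is the temporal-difference update with learning rate $\alpha_t$ (a sketch in standard notation; the lemma's exact rule may differ):

$$V_{t+1}(s) = (1 - \alpha_t)\, V_t(s) + \alpha_t \left( r_t + \gamma V_t(s') \right)$$

Convergence results of this kind usually require the Robbins–Monro conditions $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$.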

  • Policy (MDP)
    Last edited: 2025-05-14

In a Markov decision process, a policy is how an actor will behave in a given situation, given by $\pi: S \rightarrow A$ where $\pi(s) \in A_s$. This concept can extend to become a probabilistic policy. Let $\mathcal{A}$ be the set of probability distributions over $A$. Then a probabilistic policy is given by $\pi: S \rightarrow \mathcal{A}$ where if $\pi(s)(a)$ is non-zero then $a \in A_s$.
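The two kinds of policy can be sketched concretely as follows. This is a minimal illustration, not from the source; the state and action names are hypothetical placeholders:

```python
import random

# A deterministic policy maps each state to a single action:
# the function pi : S -> A from the definition above.
deterministic_policy = {
    "s0": "left",
    "s1": "right",
}

# A probabilistic policy maps each state to a distribution over actions;
# only actions available in that state get non-zero probability.
probabilistic_policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"right": 1.0},
}

def sample_action(policy, state):
    """Sample an action from a probabilistic policy at a given state."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return random.choices(actions, weights=weights)[0]
```

Note that the deterministic case is the special case where each state's distribution puts probability 1 on a single action.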

  • Quality function (RL)
    Last edited: 2025-05-22

Similar to the value function, a quality function accounts for both state and action. The function $q: S \times A \rightarrow \mathbb{R}$ represents the quality of taking action $a \in A$ when you are in state $s \in S$.
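In the usual expectation form (using the return $G_t$; standard notation not shown in this excerpt), the quality function of a policy $\pi$ is:

$$q^\pi(s, a) = \mathbb{E}_\pi\!\left[\, G_t \mid S_t = s,\, A_t = a \,\right]$$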

  • Return (RL)
    Last edited: 2025-05-22

In a Markov decision process, the return $G_t$ at time step $t$ is defined to be the discounted sum of rewards:
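The standard definition this excerpt leads into, with discount factor $\gamma \in [0, 1)$ and reward $R_{t+k+1}$ received at each subsequent step, is:

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$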

  • Value function (RL)
    Last edited: 2025-05-14

A value function is a mapping of states to a long-term worth for an actor in a Markov decision process. This is defined by $V: S \rightarrow \mathbb{R}$.
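In expectation form, the value of a state under a policy $\pi$ is commonly written in terms of the return $G_t$:

$$V^\pi(s) = \mathbb{E}_\pi\!\left[\, G_t \mid S_t = s \,\right]$$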