Finite Markov Decision Process
This summarizes the environment that an actor in a discrete Markovian universe experiences. It is given by:
- States: A finite set of states $S$ that the actor can be in.
- Actions: For each state $s \in S$, a finite set of actions $A_s$; it is sometimes convenient to write $A := \cup_{s \in S} A_s$ for the set of all actions.
- Rewards: A finite set of real values $R \subset \mathbb{R}$ that the actor can receive for taking an action in a state.
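To make these pieces concrete, here is a minimal Python sketch of the three components for a made-up two-state example; the state names, action names, and reward values are illustrative assumptions, not part of the definition above.

```python
# A toy finite MDP's components (all names and values are illustrative).

# States: a finite set S.
S = {"low", "high"}

# Actions: a finite set A_s for each state s, and their union A.
A_s = {
    "low": {"wait", "recharge"},
    "high": {"wait", "work"},
}
A = set().union(*A_s.values())

# Rewards: a finite set of real values R ⊂ ℝ.
R = {-1.0, 0.0, 1.0}
```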
We assume the actor works in discrete time steps $t \in \mathbb{N}$: at time $t$ it is in state $s_t \in S$, takes action $a_t \in A_{s_t}$, and as a result receives reward $r_{t+1} \in R$ and moves to state $s_{t+1}$. The actor deterministically chooses $a_t$ when in state $s_t$, but the next state and reward are governed by a probability distribution
$$
p(s_{t+1}, r_{t+1} \vert s_t, a_t): S \times R \times S \times A \rightarrow [0,1]
$$
Read this as: the probability of ending up in state $s_{t+1}$ with reward $r_{t+1}$, given that the actor is in state $s_t$ and takes action $a_t$. This is what determines how the world progresses. Notice it is Markovian: the distribution depends only on the current state and action, not on the earlier history, and it does not change with $t$.
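Since $p$ is a probability distribution over the pair $(s_{t+1}, r_{t+1})$ for each fixed $(s_t, a_t)$, we have $\sum_{s', r} p(s', r \vert s, a) = 1$. Continuing the toy example above, a sketch of the dynamics as a lookup table, with that normalization checked explicitly (all probabilities are made up):

```python
# The dynamics p(s', r | s, a) for the toy example: for each (s, a),
# a distribution over (next state, reward) pairs.
p = {
    ("low", "wait"):     {("low", 0.0): 1.0},
    ("low", "recharge"): {("high", -1.0): 1.0},
    ("high", "wait"):    {("high", 0.0): 1.0},
    ("high", "work"):    {("high", 1.0): 0.7, ("low", 1.0): 0.3},
}

# For each fixed (s, a), p(·, · | s, a) is a probability distribution over
# (s', r), so its values must sum to 1.
for (s, a), dist in p.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)
```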
It is sometimes useful to think of the state the actor will be in at time step $t$ as a random variable, written $S_t$, and similarly the reward as $R_t$.
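Putting it together, a sketch of sampling the random variables $S_{t+1}$ and $R_{t+1}$ to roll out a short trajectory, assuming the `p` table from the previous sketch and an arbitrary deterministic rule for choosing actions:

```python
import random

def step(s, a):
    """Sample (S_{t+1}, R_{t+1}) from the p(s', r | s, a) table defined above."""
    pairs = list(p[(s, a)].keys())
    probs = list(p[(s, a)].values())
    s_next, r_next = random.choices(pairs, weights=probs)[0]
    return s_next, r_next

def act(s):
    """A deterministic (and entirely arbitrary) choice of action in each state."""
    return "work" if s == "high" else "recharge"

# Roll out a short trajectory: s_0, a_0, r_1, s_1, a_1, r_2, ...
s = "low"
for t in range(5):
    a = act(s)
    s_next, r = step(s, a)
    print(f"t={t}: state={s}, action={a}, reward={r}, next state={s_next}")
    s = s_next
```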