Policy (MDP)
reinforcement-learning
In a Markov decision process (MDP), a policy specifies how an actor behaves in each state. A deterministic policy is a function $\pi: S \rightarrow A$ with $\pi(s) \in A_s$, where $A_s \subseteq A$ is the set of actions available in state $s$. The concept extends to probabilistic policies. Let $\mathcal{A}$ be the set of probability distributions over $A$. A probabilistic policy is then a function $\pi: S \rightarrow \mathcal{A}$ such that $\pi(s)(a) > 0$ only if $a \in A_s$; here $\pi(s)(a)$ is the probability of choosing action $a$ in state $s$.
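As a rough illustration of the two definitions, the sketch below represents both kinds of policy as Python dictionaries over a small finite MDP and checks the constraints $\pi(s) \in A_s$ and "non-zero probability only on $A_s$". The state and action names, the `ADMISSIBLE` table, and all function names are hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy MDP: each state s maps to its admissible action set A_s.
ADMISSIBLE = {
    "low": ["wait", "recharge"],
    "high": ["wait", "search"],
}

# Deterministic policy pi: S -> A, with pi(s) in A_s.
deterministic_policy = {"low": "recharge", "high": "search"}

# Probabilistic policy pi: S -> distributions over A,
# stored as {state: {action: probability}}.
stochastic_policy = {
    "low": {"wait": 0.25, "recharge": 0.75},
    "high": {"wait": 0.5, "search": 0.5},
}


def is_valid_deterministic(policy, admissible):
    """Check that every state has an action and that pi(s) is in A_s."""
    return all(s in policy and policy[s] in admissible[s] for s in admissible)


def is_valid_stochastic(policy, admissible, tol=1e-9):
    """Check that pi(s) is a distribution whose support lies inside A_s."""
    for s, actions in admissible.items():
        dist = policy.get(s)
        if dist is None:
            return False
        if any(p < 0 for p in dist.values()):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
        # Non-zero probability is only allowed on admissible actions.
        if any(p > 0 and a not in actions for a, p in dist.items()):
            return False
    return True


def sample_action(policy, state, rng=random):
    """Draw one action a with probability pi(state)(a)."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

A deterministic policy can be viewed as the special case of a probabilistic one that puts probability 1 on a single admissible action in each state, so `{"low": {"recharge": 1.0}, "high": {"search": 1.0}}` passes `is_valid_stochastic` as well.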