Action-advantage function (RL)

reinforcement-learning

Using the value function $v_{\pi}$ and the quality function $q_{\pi}$ of a policy $\pi$, we can work out how advantageous it is to take a particular action $a$ in a state $s$, compared with simply following $\pi$ from that state.

$$
a_{\pi}(s,a) = q_{\pi}(s,a) - v_{\pi}(s)
$$
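
As a rough illustration of the definition, the sketch below computes the advantage for a tiny tabular problem. It assumes a finite set of states and actions, made-up values for `q` and `pi`, and the standard identity $v_{\pi}(s) = \sum_a \pi(a \mid s)\, q_{\pi}(s,a)$ to obtain the state values. The names `state_value` and `advantages` are hypothetical helpers, not part of any library.

```python
def state_value(q_row, pi_row):
    """v_pi(s): the policy-weighted average of q_pi(s, a) over actions.

    Assumes q_row[a] = q_pi(s, a) and pi_row[a] = pi(a | s) for one state s.
    """
    return sum(p * q for p, q in zip(pi_row, q_row))


def advantages(q, pi):
    """Return a_pi(s, a) = q_pi(s, a) - v_pi(s) for every state and action."""
    result = []
    for q_row, pi_row in zip(q, pi):
        v = state_value(q_row, pi_row)
        result.append([q_sa - v for q_sa in q_row])
    return result


# Illustrative numbers only: 2 states, 2 actions.
q = [[1.0, 3.0],    # q_pi(s0, a0), q_pi(s0, a1)
     [0.0, 2.0]]    # q_pi(s1, a0), q_pi(s1, a1)
pi = [[0.5, 0.5],   # pi(a | s0)
      [0.25, 0.75]] # pi(a | s1)

adv = advantages(q, pi)
print(adv)  # [[-1.0, 1.0], [-1.5, 0.5]]
```

A positive entry means the action is better than the policy's average behaviour in that state, and a negative entry means it is worse. Under these assumptions the advantages in each state average to zero when weighted by $\pi$.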