Action-advantage function (RL)
reinforcement-learning
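Using the value function $v_{\pi}$ and the Quality function (RL) $q_{\pi}$ of a policy $\pi$, we can work out how advantageous it is to take a particular action $a$ given that we are in a state $s$. The advantage is the difference between the value of taking $a$ in $s$ and the value of $s$ itself under $\pi$.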
$$ a_{\pi}(s,a) = q_{\pi}(s,a) - v_{\pi}(s) $$
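As a rough illustration, the definition can be evaluated on a small tabular example. This is a minimal sketch under assumptions not stated in the note: the state and action spaces are finite, $q_{\pi}$ is given as a table, and $v_{\pi}(s)$ is obtained from the standard identity $v_{\pi}(s) = \sum_a \pi(a \mid s)\, q_{\pi}(s,a)$. The function name `advantage` and the numbers in `q` and `policy` are made up for the example.

```python
def advantage(q, policy):
    """Return a_pi(s, a) = q_pi(s, a) - v_pi(s) for every state and action.

    q[s][a]      : action value q_pi(s, a) (hypothetical table)
    policy[s][a] : probability pi(a | s) of taking action a in state s
    """
    adv = []
    for q_s, pi_s in zip(q, policy):
        # Assumed identity: v_pi(s) is the policy-weighted average of q_pi(s, .)
        v_s = sum(p * q_sa for p, q_sa in zip(pi_s, q_s))
        adv.append([q_sa - v_s for q_sa in q_s])
    return adv


# Two states, two actions; values chosen only for illustration.
q = [[1.0, 3.0], [0.0, 2.0]]
policy = [[0.5, 0.5], [0.25, 0.75]]

adv = advantage(q, policy)
# adv == [[-1.0, 1.0], [-1.5, 0.5]]
```

A positive entry means the action does better than the policy's average behaviour in that state, and a negative entry means it does worse. Under the identity assumed above, the advantages in each state average to zero when weighted by $\pi(a \mid s)$.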