Main sources: papers and tutorials (Thomas Simonini's Deep Reinforcement Learning Course with Tensorflow, Arthur Juliani's Simple Reinforcement Learning with Tensorflow series), and OpenAI Spinning Up in Deep RL.
Concepts
Policy gradient methods attempt to learn functions that directly map an observation to an action.
observation -> action
Q-Learning attempts to learn the value of being in a given state and taking a specific action there.
state, action -> value
The Bellman equation states that the expected long-term reward for a given action is equal to the immediate reward from the current action combined with the discounted expected reward from the best future action taken at the following state.
$$ Q(s, a) = r + \gamma \max_{a'} Q(s', a') $$
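As a small worked illustration of the equation above, the sketch below computes the right-hand side for a single transition. The helper name `bellman_target` and all the numbers (reward, discount factor, next-state Q-values) are made up for the example.

```python
def bellman_target(reward, next_q_values, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a') for a single transition."""
    return reward + gamma * max(next_q_values)


# Hypothetical values: reward 1.0 and three actions available in the next state.
target = bellman_target(reward=1.0, next_q_values=[0.5, 2.0, 1.5], gamma=0.9)
print(target)
```

Only the largest next-state value (2.0 here) enters the target, so the result is roughly 1.0 + 0.9 * 2.0.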
The Bellman equation can be used to implement the Q-Table algorithm:
```python
import gym
```
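To make the idea concrete without depending on a particular gym version, here is a minimal sketch of tabular Q-learning. It assumes a tiny hand-written environment (`ChainEnv`, a hypothetical 5-state corridor standing in for a gym environment) and illustrative hyperparameters; the update is the Bellman equation above applied with a learning rate.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1.0 on reaching state 4."""

    n_states = 5
    n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train_q_table(env, episodes=500, lr=0.8, gamma=0.95,
                  epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    # One row per state, one column per action, initialised to zero.
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, else act greedily
            # (ties between equal Q-values are broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = env.step(a)
            # Bellman target; there is no future term once the episode has ended.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q_table = train_q_table(ChainEnv())
# Greedy action for every non-terminal state.
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
print(greedy_policy)
```

The table has exactly `n_states * n_actions` entries, which is what makes the method easy to implement for small problems.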
However, this approach does not scale, since the capacity of a table is limited.
```python
import gym
```
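A common way around the table's limited capacity is to approximate Q with a parametric function whose size does not depend on the number of states. The sketch below is only an illustration of that idea under simple assumptions: a linear model, a hypothetical feature map `features`, and made-up hyperparameters. A neural network would play the same role as the weight vectors here.

```python
def features(state, n_states=5):
    """Hypothetical feature map: normalised position plus a bias term."""
    return [state / (n_states - 1), 1.0]


def q_values(weights, state):
    """Q(s, a) = w_a . phi(s), computed for every action a."""
    phi = features(state)
    return [sum(w * x for w, x in zip(w_a, phi)) for w_a in weights]


def td_update(weights, s, a, r, s_next, done, lr=0.1, gamma=0.95):
    """One semi-gradient Q-learning step towards the Bellman target."""
    target = r if done else r + gamma * max(q_values(weights, s_next))
    error = target - q_values(weights, s)[a]
    # For a linear model the gradient of Q(s, a) w.r.t. w_a is phi(s).
    for i, x in enumerate(features(s)):
        weights[a][i] += lr * error * x
    return error


# Two actions, two weights per action, regardless of how many states exist.
weights = [[0.0, 0.0] for _ in range(2)]
error = td_update(weights, s=3, a=1, r=1.0, s_next=4, done=True)
print(error, q_values(weights, 3))
```

Only the weights of the action that was taken are changed, and the estimate for that action moves towards the target.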