Artificial intelligence, computers, machine learning, Linux, programmers
Two-armed Bandit
Last updated: 2020-06-14   |   Word count: 719   |   Estimated reading time: 4 min
  1. Concepts
    1. Learning a Policy
    2. Policy Gradients
    3. Value Functions
    4. ε-Greedy Policy
    5. Policy Loss Equation
  2. The Multi-Armed Bandit
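The outline above covers value functions, ε-greedy action selection, and a policy loss for the multi-armed bandit. What follows is a minimal sketch, not the author's code. It assumes a hypothetical four-armed bandit: the `THRESHOLDS` values and the rule "reward +1 if a standard normal draw exceeds the arm's threshold, else -1" are illustrative assumptions. It contrasts two learners. The first is ε-greedy selection over sample-average action values Q(a). The second is a softmax policy trained on the policy loss L = -log π(a) · r, which is plain REINFORCE with no baseline. In that second learner, arms are sampled from π itself rather than chosen ε-greedily, so the gradient stays on-policy.

```python
import math
import random

# Hypothetical bandit (an assumption, not from the post): pulling arm k pays +1
# if a standard normal draw exceeds THRESHOLDS[k], else -1, so the arm with the
# lowest threshold (index 3) is the best one.
THRESHOLDS = [0.2, 0.0, -0.2, -5.0]


def pull(arm, rng):
    return 1.0 if rng.gauss(0.0, 1.0) > THRESHOLDS[arm] else -1.0


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the highest-valued arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda k: values[k])


def softmax(prefs):
    """Turn arm preferences into action probabilities pi(a)."""
    m = max(prefs)  # shift by the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_value_based(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection over sample-average action values Q(a)."""
    rng = random.Random(seed)
    n = len(THRESHOLDS)
    q, counts = [0.0] * n, [0] * n
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        r = pull(a, rng)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental mean of the rewards seen so far
    return q, counts


def train_policy_gradient(steps=2000, lr=0.05, seed=0):
    """Softmax policy trained by gradient descent on L = -log(pi(a)) * r."""
    rng = random.Random(seed)
    n = len(THRESHOLDS)
    prefs, counts = [0.0] * n, [0] * n
    for _ in range(steps):
        pi = softmax(prefs)
        a = rng.choices(range(n), weights=pi)[0]  # sample an arm from pi
        r = pull(a, rng)
        counts[a] += 1
        # For a softmax policy, dL/dprefs[k] = -r * (1[k == a] - pi[k]);
        # stepping against that gradient gives the update below.
        for k in range(n):
            indicator = 1.0 if k == a else 0.0
            prefs[k] += lr * r * (indicator - pi[k])
    return prefs, counts
```

Calling `train_value_based()` and `train_policy_gradient()` should both end up pulling the lowest-threshold arm most often, though the exact numbers depend on the seed. Shortening `THRESHOLDS` to two entries gives the two-armed case from the title.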