One doc tagged with "reinforce"

Policy Gradients

Optimizing the policy directly: understanding the REINFORCE algorithm, stochastic policies, and the Policy Gradient Theorem.