One doc tagged with "reinforce"

Policy Gradients

Optimizing the policy directly: understanding the REINFORCE algorithm, stochastic policies, and the Policy Gradient Theorem.