Policy Gradients
Optimizing the policy directly: understanding the REINFORCE algorithm, stochastic policies, and the Policy Gradient Theorem.
Optimizing the policy directly: understanding the REINFORCE algorithm, stochastic policies, and the Policy Gradient Theorem.