Proximal Policy Optimization

model-free reinforcement learning algorithm


Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs.

PPO algorithms retain some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity.[1] This is achieved by replacing TRPO's constrained optimization with a clipped surrogate objective function.[2]
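The clipped surrogate objective introduced in the PPO paper [1] can be sketched for a single timestep as follows; the function name and scalar formulation are illustrative, not a fixed API:

```python
import math

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Illustrative single-timestep PPO clipped surrogate loss.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - epsilon, 1 + epsilon], and the objective takes the minimum of
    the clipped and unclipped terms, which bounds how far a single
    update can move the policy.
    """
    # Ratio of new to old action probabilities, computed from log-probs.
    ratio = math.exp(log_prob_new - log_prob_old)
    # Clip the ratio to the trust-region-like interval.
    clipped_ratio = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    # PPO maximizes min(r * A, clip(r) * A); negate to obtain a loss
    # suitable for gradient-descent minimization.
    return -min(ratio * advantage, clipped_ratio * advantage)
```

For example, with a positive advantage and a ratio of 2, the clipped term 1 + epsilon = 1.2 dominates, so further increases in the ratio yield no extra objective value; this is the mechanism that keeps updates "proximal" to the old policy.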

References


  1. ^ Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). "Proximal Policy Optimization Algorithms". arXiv:1707.06347.
  2. ^ "Proximal Policy Optimization". OpenAI. 2017.
