**Proximal Policy Optimization (PPO)** is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs.

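PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity (empirically).^{[1]} They achieve this by optimizing a different objective function. TRPO solves a constrained problem that limits how far each update may move the policy. PPO instead uses a surrogate objective that discourages large policy changes and can be optimized with ordinary first-order methods such as stochastic gradient ascent.^{[2]}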
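The most widely used variant in the original paper is the clipped surrogate objective. With probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$ and advantage estimate $\hat{A}_t$, it is

$$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right].$$

The following is a minimal NumPy sketch of evaluating this objective on a batch. It is not a full training loop. The function name `ppo_clip_objective` is hypothetical. It assumes the log-probabilities of the taken actions and the advantage estimates have already been computed elsewhere, and it uses $\epsilon = 0.2$ as an assumed default.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Batch-averaged clipped surrogate objective L^CLIP (to be maximized).

    Hypothetical helper for illustration. Assumes:
      logp_new   -- log pi_theta(a_t | s_t) for each sampled action
      logp_old   -- log pi_theta_old(a_t | s_t) for the same actions
      advantages -- precomputed advantage estimates A_t
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio r_t(theta), computed in log space for stability
    ratio = np.exp(logp_new - logp_old)

    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages

    # The elementwise minimum is a pessimistic bound: moving the ratio
    # outside [1 - eps, 1 + eps] cannot increase the objective further.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Usage: two sampled actions, one with positive and one with negative advantage
value = ppo_clip_objective(
    logp_new=np.log([0.9, 0.3]),
    logp_old=np.log([0.5, 0.5]),
    advantages=[1.0, -1.0],
)
# Ratios are 1.8 and 0.6; clipping yields terms 1.2 and -0.8, so value is approximately 0.2
```

In a real implementation, `logp_new` would come from the current policy network and the objective would be maximized by automatic differentiation over several epochs of minibatches. The clipping removes the incentive to push the ratio beyond the interval, which is how PPO approximates TRPO's trust region without an explicit constraint.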

### References

1. Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). "Proximal Policy Optimization Algorithms". arXiv:1707.06347.
2. "Proximal Policy Optimization". *OpenAI*. 2017.