This is a minimalistic implementation of Proximal Policy Optimization (PPO), clipped version, for the Atari Breakout game on OpenAI Gym. It is less than 250 lines of code. It runs the game environments in multiple processes to sample efficiently, and advantages are calculated using Generalized Advantage Estimation (GAE).
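To give a flavor of the two core pieces, here is a minimal NumPy sketch of GAE and the PPO clipped surrogate objective. This is illustrative only and is not code from the repository; the function names, signatures, and default hyperparameters (`gamma=0.99`, `lam=0.95`, `clip=0.2`) are assumptions for the example.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one sampled trajectory.

    `rewards` has length T; `values` are V(s_t) for t = 0..T-1, and
    `last_value` is the bootstrap estimate V(s_T). Returns length-T advantages.
    (Hypothetical helper, not the repository's actual code.)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(T)):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages

def ppo_clip_loss(ratio, advantage, clip=0.2):
    """PPO clipped surrogate objective.

    `ratio` is pi_new(a|s) / pi_old(a|s). The objective is maximized,
    so we return its negative as a loss to minimize.
    """
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return -np.mean(np.minimum(ratio * advantage, clipped * advantage))
```

In training, the advantages from `gae` would be normalized and plugged into `ppo_clip_loss` alongside the probability ratios from the current and old policies.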

The code for this tutorial is available on GitHub at labml/rl_samples, and the web version of the tutorial is available on my blog.

If you have any questions or comments, please find me on Twitter, @vpj.