Testing our architectures

We tested our proposed architectures on several tasks, including the puzzle game Sokoban and a spaceship navigation game. Both games require forward planning and reasoning, which makes them well suited to testing our agents' abilities.

In Sokoban, the agent has to push boxes onto targets. Because boxes can only be pushed, never pulled, many moves are irreversible: for instance, a box pushed into a corner can never be pulled back out of it.
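To make the irreversibility concrete, here is a minimal sketch of the push-only move rule together with the simplest kind of dead end, a box wedged into a corner. The grid encoding, the level, and the names `parse`, `step` and `is_corner_deadlock` are illustrative assumptions, not the environment used in the experiments.

```python
# Push-only move rule for Sokoban on a grid of (row, col) cells.
# Level symbols follow the common text convention (an assumption here):
# '#' wall, '@' player, '$' box, '.' target, ' ' floor.

DIRS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(level):
    """Split a list of strings into wall, box and target sets plus the player cell."""
    walls, boxes, targets, player = set(), set(), set(), None
    for r, row in enumerate(level):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            elif ch == "$":
                boxes.add((r, c))
            elif ch == ".":
                targets.add((r, c))
            elif ch == "@":
                player = (r, c)
    return walls, frozenset(boxes), targets, player


def step(walls, boxes, player, direction):
    """Return (boxes, player) after one move; both are unchanged if the move is blocked."""
    dr, dc = DIRS[direction]
    nxt = (player[0] + dr, player[1] + dc)
    if nxt in walls:
        return boxes, player
    if nxt in boxes:
        beyond = (nxt[0] + dr, nxt[1] + dc)
        # A box only ever moves away from the player (a push); there is no pull.
        if beyond in walls or beyond in boxes:
            return boxes, player
        boxes = (boxes - {nxt}) | {beyond}
    return boxes, nxt


def is_corner_deadlock(walls, targets, box):
    """True if an off-target box has a wall on one vertical and one horizontal side."""
    if box in targets:
        return False
    r, c = box
    vertical = (r - 1, c) in walls or (r + 1, c) in walls
    horizontal = (r, c - 1) in walls or (r, c + 1) in walls
    return vertical and horizontal


LEVEL = [
    "######",
    "# $@.#",
    "#    #",
    "######",
]
walls, boxes, targets, player = parse(LEVEL)

# Pushing left wedges the box into the top-left corner: no later move frees it.
stuck_boxes, _ = step(walls, boxes, player, "left")
print(is_corner_deadlock(walls, targets, (1, 1)))  # True

# Walking around the box and pushing it right instead solves the level.
b, p = boxes, player
for move in ["down", "left", "left", "up", "right", "right"]:
    b, p = step(walls, b, p, move)
print(b == targets)  # True
```

A single careless push on the first move is enough to make this level unsolvable, which is why looking ahead before acting matters in Sokoban.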

In the spaceship task, the agent must stabilise a craft by activating its thrusters a fixed number of times. It must also contend with the gravitational pull of several planets, which makes this a highly nonlinear and complex continuous control task.
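The source of the nonlinearity can be sketched with a simple point-mass model. Everything here is an assumption for illustration: 2-D Newtonian inverse-square gravity, planets given as `(x, y, mass)`, thruster activations modelled as instantaneous velocity changes, semi-implicit Euler integration, and arbitrary values for `G`, `dt` and the step counts. The task's actual dynamics are not specified in the text.

```python
import math

# Hypothetical 2-D dynamics: a point-mass craft, planets given as (x, y, mass),
# inverse-square gravity, and thruster burns modelled as instantaneous velocity
# changes. G, dt and the step counts are arbitrary illustrative choices.
G = 1.0


def gravity(pos, planets):
    """Total acceleration at `pos` from all planets (undefined at a planet centre)."""
    ax = ay = 0.0
    for px, py, mass in planets:
        dx, dy = px - pos[0], py - pos[1]
        r = math.hypot(dx, dy)
        ax += G * mass * dx / r**3
        ay += G * mass * dy / r**3
    return ax, ay


def coast(pos, vel, planets, dt, steps):
    """Integrate the motion under gravity with semi-implicit Euler steps."""
    (x, y), (vx, vy) = pos, vel
    for _ in range(steps):
        ax, ay = gravity((x, y), planets)
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
    return (x, y), (vx, vy)


def fly(pos, vel, planets, burns, dt=0.01, coast_steps=100):
    """Fire the thrusters once per entry in `burns`, coasting under gravity after each."""
    for bx, by in burns:
        vel = (vel[0] + bx, vel[1] + by)
        pos, vel = coast(pos, vel, planets, dt, coast_steps)
    return pos, vel
```

Because the pull of each planet falls off with the square of the distance, the effect of a burn depends on where the craft is when it fires, so, in general, doubling a burn does not double its effect. With no planets, two equal and opposite burns bring the craft to rest; near planets, the same pair of burns leaves it drifting.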

To limit trial and error in both tasks, each level is procedurally generated and the agent can attempt it only once. This encourages the agent to try out different strategies 'in its head' before testing them in the real environment.
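Why a single attempt rewards imagining first can be shown with a deliberately tiny, hypothetical task. The sketch below is plain exhaustive search with a perfect, hand-written model (`toy_model`), swapped in purely for illustration; it is not the architecture the agents use. Failed plans are discarded in imagination, and only a plan predicted to succeed would be executed in the one real attempt.

```python
from itertools import product

# Toy one-shot task (hypothetical): positions on a line, a pit at 3 and a goal
# at 5. "step" moves +1 and "jump" moves +2; landing in the pit or past the
# goal is an irreversible failure, which the model signals by returning None.
PIT, GOAL = 3, 5


def toy_model(x, action):
    """Perfect one-step model of the toy task."""
    x += 2 if action == "jump" else 1
    return None if x == PIT or x > GOAL else x


def plan_in_imagination(model, state, actions, horizon, is_goal):
    """Simulate every action sequence of length `horizon` with `model`; return the
    first one that reaches the goal, trimmed at that point, or None."""
    for plan in product(actions, repeat=horizon):
        s = state
        for t, action in enumerate(plan):
            s = model(s, action)
            if s is None:  # an imagined failure costs nothing in the real task
                break
            if is_goal(s):
                return plan[: t + 1]
    return None


plan = plan_in_imagination(toy_model, 0, ("step", "jump"), 3, lambda s: s == GOAL)
print(plan)  # ('jump', 'jump', 'step')
```

Exhaustive enumeration grows as the number of actions raised to the horizon, so it is only practical for toy problems like this one; it serves here just to show the benefit of rejecting failing strategies before the single real attempt.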