"Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man," a new paper published in volume 30 of the Journal of Artificial Intelligence Research, describes a very successful experiment in teaching an AI agent to play Ms. Pac-Man.

The researchers had agents play 50 games using policies obtained by different reinforcement learning (RL) methods. They found that policies learned with the cross-entropy method performed better than hand-crafted policies. As they explain, the basic idea of the cross-entropy method is to select the most successful actions and then modify the distribution of actions so that it becomes more peaked around these selected actions.
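To make the "select the best, then sharpen the distribution" idea concrete, here is a minimal sketch of the generic cross-entropy method. It assumes, for illustration only, that a policy is described by a set of on/off flags (one per candidate rule) and that each flag is sampled from an independent probability. The function name `cross_entropy_rule_selection`, the parameters (`n_samples`, `elite_frac`, `smoothing`), and the toy scoring function are all hypothetical; in a real setup the score would come from playing games, and this is not a reproduction of the paper's exact parameterization.

```python
import random

def cross_entropy_rule_selection(score_fn, n_rules, n_samples=100, elite_frac=0.1,
                                 n_iters=50, smoothing=0.6, seed=0):
    """Generic cross-entropy method over independent Bernoulli probabilities.

    score_fn maps a tuple of 0/1 flags (hypothetical: rule used or not) to a score.
    Returns the learned probability of switching each flag on.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_rules  # start undecided about every flag
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # 1. Sample candidate flag settings from the current distribution.
        samples = [tuple(1 if rng.random() < p else 0 for p in probs)
                   for _ in range(n_samples)]
        # 2. Keep only the highest-scoring ("elite") samples.
        samples.sort(key=score_fn, reverse=True)
        elite = samples[:n_elite]
        # 3. Move each probability toward its frequency among the elite,
        #    which makes the distribution more peaked around successful choices.
        for i in range(n_rules):
            freq = sum(s[i] for s in elite) / n_elite
            probs[i] = smoothing * freq + (1 - smoothing) * probs[i]
    return probs

# Hypothetical usage: pretend this flag setting is the best possible rule set.
TARGET = (1, 0, 1, 1, 0)

def toy_score(flags):
    # Stand-in for the average score of games played with these flags.
    return sum(f == t for f, t in zip(flags, TARGET))

probs = cross_entropy_rule_selection(toy_score, n_rules=len(TARGET))
print([round(p, 2) for p in probs])
```

The smoothing step is a common design choice: blending the elite frequency with the previous probability slows the update so that a single lucky batch of samples does not lock the distribution in too early.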

During the game, the AI agent must decide which way to go, and these decisions are governed by rule-based policies. When the agent has to make a decision, she checks her rule list, starting with the highest-priority rules. In Ms. Pac-Man, ghost avoidance has the highest priority, because ghosts will eat her. The next rule says that if there is an edible ghost on the board, the agent should chase it, because eating ghosts earns the most points.
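The priority-ordered decision procedure described above can be sketched as a list of (condition, action) rules that is scanned from the top until one fires. This is an illustrative sketch only: the `Observation` fields, the rule names, and the fallback `eat_dots` rule are hypothetical simplifications, not the paper's actual observations or rule set.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

@dataclass
class Observation:
    # Hypothetical, heavily simplified view of the board.
    dangerous_ghost_direction: Optional[str]  # where a threatening ghost is, if any
    edible_ghost_direction: Optional[str]     # where the nearest edible ghost is, if any
    nearest_dot_direction: str

@dataclass
class Rule:
    name: str
    condition: Callable[[Observation], bool]
    action: Callable[[Observation], str]

RULES: Sequence[Rule] = [
    # Highest priority: move away from a dangerous ghost.
    Rule("avoid_ghost",
         lambda o: o.dangerous_ghost_direction is not None,
         lambda o: OPPOSITE[o.dangerous_ghost_direction]),
    # Next: chase an edible ghost, since that earns the most points.
    Rule("chase_edible_ghost",
         lambda o: o.edible_ghost_direction is not None,
         lambda o: o.edible_ghost_direction),
    # Hypothetical fallback: head for the nearest dot.
    Rule("eat_dots", lambda o: True, lambda o: o.nearest_dot_direction),
]

def decide(obs: Observation, rules: Sequence[Rule] = RULES) -> str:
    """Return the action of the first rule, in priority order, whose condition holds."""
    for rule in rules:
        if rule.condition(obs):
            return rule.action(obs)
    raise ValueError("no rule applies")

# Ghost to the left and an edible ghost above: avoidance wins, so flee right.
print(decide(Observation("left", "up", "right")))
# No threat: chase the edible ghost.
print(decide(Observation(None, "up", "right")))
```

Because only the first matching rule fires, reordering the list changes the behavior, which is why the priority order matters as much as the rules themselves.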

One rule that the researchers found surprisingly effective was that the agent should not turn back if all directions are equally good. This rule keeps Ms. Pac-Man from retracing paths where the dots have already been eaten, which would earn no points.
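A minimal sketch of the "don't turn back" rule as a tie-breaker is shown below. It assumes, hypothetically, that each legal direction already has a numeric desirability score, and it reads "all directions are equally good" literally as "all scores are identical". The function name `choose_direction` and the scoring scheme are illustrative, not taken from the paper.

```python
from typing import Dict

OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

def choose_direction(scores: Dict[str, float], heading: str) -> str:
    """Pick a direction, refusing to reverse when every option scores the same.

    scores: hypothetical desirability score for each legal direction.
    heading: the direction Ms. Pac-Man is currently moving.
    """
    if len(scores) > 1 and len(set(scores.values())) == 1:
        # All directions are equally good: rule out turning back,
        # since the path behind her has already been cleared of dots.
        candidates = [d for d in scores if d != OPPOSITE[heading]]
        return candidates[0]
    # Otherwise simply take the best-scoring direction (reversing is allowed).
    return max(scores, key=scores.get)

print(choose_direction({"left": 1.0, "right": 1.0}, heading="right"))  # keeps going right
print(choose_direction({"left": 2.0, "right": 1.0}, heading="right"))  # reverses: left is better
```

Note that the rule only acts as a tie-breaker: when turning back is strictly better, for example to escape a ghost, the sketch still allows it.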

