AI research and video games are a match made in heaven. Researchers get a ready-made virtual environment with predefined goals they can control completely, and the AI agent gets to romp around without doing any damage. Sometimes, though, they do break things.

A case in point is a paper published this week by a trio of machine learning researchers from the University of Freiburg in Germany. They were exploring a particular method of teaching AI agents to navigate video games (in this case, desktop ports of old Atari titles from the 1980s) when they noticed something odd. The software they were testing had found a bug in the port of the retro video game Q*bert that allowed it to rack up near-infinite points.

As the trio describe in the paper, published on pre-print server arXiv, the agent was learning how to play Q*bert when it discovered an “interesting solution.” Normally, in Q*bert, players jump from cube to cube, with this action changing the platforms’ colors. Change all the colors (and dispatch some enemies), and you’re rewarded with points and sent to the next level. The AI found a better way, though:

“First, it completes the first level and then starts to jump from platform to platform in what seems to be a random manner. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit).”

This quirk in the paper was shared on Twitter by AI researcher Miles Brundage. Wired reporter Tom Simonite joined in the conversation and tagged in Q*bert designer Warren Davis to see if he’d ever stumbled across this bug before. Davis said he’d not worked on that particular version of the game but commented: “This certainly doesn’t look right, but I don’t think you’d see the same behavior in the arcade version.”

You can see what the bug looks like below, when the cubes start flashing:

Whatever the case, this doesn’t seem to be an exploit that any human has discovered before. If the AI agent could think, it would probably be wondering why it’s supposed to bother jumping on all these boxes when it’s found a much more efficient way to score points.

It’s important to note, though, that the agent is not approaching this problem in the same way that a human would. It’s not actively looking for exploits in the game with some Matrix-like computer vision. The paper is actually a test of a broad category of AI research known as “evolutionary algorithms.” The approach is pretty much what it sounds like: algorithms are pitted against one another to see which can complete a given task best, and the survivors are then given small tweaks (or mutations) to see if they fare better. This way, the algorithms slowly get better and better.
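To make that loop concrete, here is a minimal Python sketch of the select-and-mutate cycle described above. It is an illustration under stated assumptions, not the method from the Freiburg paper: the `fitness` function is a made-up stand-in for a game score, each "algorithm" is just a short list of numbers, and the names `fitness`, `mutate`, and `evolve` and all parameter values are hypothetical.

```python
import random


def fitness(candidate):
    # Hypothetical stand-in for a game score: higher is better,
    # with the best possible score (0) when every value is 0.
    return -sum(x * x for x in candidate)


def mutate(candidate, rng, scale=0.1):
    # Apply a small random tweak (mutation) to every value.
    return [x + rng.gauss(0.0, scale) for x in candidate]


def evolve(generations=50, population_size=20, survivors=5, dims=3, seed=0):
    rng = random.Random(seed)
    # Start from a population of random candidates.
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(dims)]
        for _ in range(population_size)
    ]
    history = []
    for _ in range(generations):
        # Pit the candidates against one another: rank them by score.
        ranked = sorted(population, key=fitness, reverse=True)
        best = ranked[:survivors]
        history.append(fitness(best[0]))
        # Keep the survivors unchanged and refill the population
        # with mutated copies of randomly chosen survivors.
        children = [
            mutate(rng.choice(best), rng)
            for _ in range(population_size - survivors)
        ]
        population = best + children
    return max(population, key=fitness), history
```

Calling `evolve()` returns the best candidate found and the best score recorded in each generation. Because the survivors are carried over unchanged in this sketch, the best score never gets worse from one generation to the next, which is the "slowly get better and better" behavior in miniature.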

Evolutionary algorithms are not the most powerful or widely used form of AI at the moment, but they are making something of a comeback. The ability to crack Q*bert could be read as a good omen that they are going to be very useful in the future.