Rationale and Idea Origin

Developing an AI to play Super Mario Bros. offers a challenging problem that is possible (though difficult) to accomplish through mere observation of the video output. Unlike similar AIs for Mario-like games (see the Mario AI Championship and learnfun/playfun by Tom Murphy), we do not have direct access to the game’s memory due to our choice of using an original, unmodified NES console. This means we lack basic information like Mario’s location, the map, the location of enemies, etc. All information used in the AI is derived visually, the same way a human would play the game.

We approached the idea believing that Super Mario Bros. is a fairly deterministic environment whose side-scrolling makes edge and obstacle detection straightforward. Interfacing with the NES is also fairly straightforward, as the controller is nothing more than some buttons and a parallel-to-serial shift register integrated circuit. Importantly, we believed that the game would provide a challenging problem with a fairly large array of possible solutions.
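To illustrate why the controller interface is simple, here is a minimal Python sketch of what the shift register sends to the console after a latch pulse. It is a software model for illustration only, not our hardware implementation. The button order and the active-low data line reflect the standard NES controller as we understand it; the names BUTTON_ORDER and serialize_buttons are hypothetical.

```python
# Order in which a standard NES controller shifts out its buttons
# (assumed from the usual controller behavior, not taken from our design files).
BUTTON_ORDER = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


def serialize_buttons(pressed):
    """Return the 8 bits shifted out after a latch, first bit first.

    The data line is active-low: 0 means pressed, 1 means released.
    `pressed` is any collection of button names from BUTTON_ORDER.
    """
    return [0 if name in pressed else 1 for name in BUTTON_ORDER]


if __name__ == "__main__":
    # Holding Right and A (walk right and jump).
    print(serialize_buttons({"A", "Right"}))
```

An AI that drives the console only needs to present these eight bits in step with the console's latch and clock signals.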

Logical Structure

At a high level, our AI works by reading each video frame into a buffer and analyzing it against a set of precomputed kernels and colors to look for areas of interest. The program then uses a kernel and color matching algorithm with an error threshold to attempt to identify the location of enemies, walls, and pipes, while keeping false positives to a minimum. With this information, the program makes decisions about when to make Mario jump and run so that he avoids enemies and obstacles. We chose to define "success" as Mario making it through the first level, alive, as quickly as possible. Mario makes no attempt to obtain coins or powerups, though he frequently does so through dumb luck.
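The idea of matching a kernel against the frame with an error threshold can be sketched in software as follows. This is a simplified Python illustration, not our hardware design: it assumes a grayscale frame stored as a list of rows, uses the sum of absolute differences as the error measure, and the names match_kernel and max_error are hypothetical.

```python
def match_kernel(frame, kernel, max_error):
    """Return (x, y) top-left positions where kernel matches frame.

    frame and kernel are lists of rows of integer intensities.
    A position matches when the sum of absolute pixel differences
    is at most max_error (an assumed error measure).
    """
    kh, kw = len(kernel), len(kernel[0])
    fh, fw = len(frame), len(frame[0])
    hits = []
    for y in range(fh - kh + 1):
        for x in range(fw - kw + 1):
            error = 0
            for ky in range(kh):
                for kx in range(kw):
                    error += abs(frame[y + ky][x + kx] - kernel[ky][kx])
                if error > max_error:
                    break  # already over the threshold, stop early
            if error <= max_error:
                hits.append((x, y))
    return hits


if __name__ == "__main__":
    frame = [[0] * 5 for _ in range(4)]
    frame[1][2], frame[1][3], frame[2][2] = 9, 8, 9  # slightly noisy pattern
    kernel = [[9, 9], [9, 0]]
    print(match_kernel(frame, kernel, max_error=2))
```

Raising max_error tolerates more video noise but increases false positives, which is the trade-off described above.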

Algorithm Design

Mario’s strategy is fairly simple: walk to the right, and jump to avoid danger and obstacles. If an enemy is detected immediately to the right of Mario, he does a small jump (timed well for jumping on or over enemies, and for bouncing on their heads when they come in groups). If multiple enemies are detected, he runs faster before jumping (to try to clear a large group, and to decrease the chance of getting caught between two enemies without time to jump away). If a pipe or a pit of doom is detected, Mario does a large jump to try to clear it.
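The rules above can be written as a small decision function. The Python sketch below is illustrative only: the priority between rules (pipes and pits checked first) and the jump size used for groups of enemies are assumptions, and the names Action and choose_action are hypothetical.

```python
from typing import NamedTuple


class Action(NamedTuple):
    run: bool   # hold the run button while walking right
    jump: str   # "none", "small", or "large"


def choose_action(enemy_count, pipe_ahead, pit_ahead):
    """Pick Mario's next action from the detections just to his right."""
    if pipe_ahead or pit_ahead:
        # Assumed priority: clearing a pipe or pit comes first.
        return Action(run=False, jump="large")
    if enemy_count > 1:
        # Run faster before jumping; the jump size here is an assumption.
        return Action(run=True, jump="small")
    if enemy_count == 1:
        return Action(run=False, jump="small")
    return Action(run=False, jump="none")  # nothing ahead, keep walking


if __name__ == "__main__":
    print(choose_action(enemy_count=2, pipe_ahead=False, pit_ahead=False))
```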

Mario also detects the blocks comprising the squares at the end of the level and jumps accordingly. Sadly, he makes no attempt to earn 5000 points when jumping on the flagpole. He also is unaware of the warp pipe that allows you to skip most of the level.

To simplify some of our computations, we assume Mario is in the center of the screen, which is true since Mario never turns around or stops in our algorithm. This approach is sufficient to solve the relatively basic environment of world 1-1, but would obviously fail in more challenging environments like world 1-2. (We also didn’t develop the kernels necessary to detect the underground enemies and bricks, so our AI frequently beats world 1-1 only to repeatedly die by slamming into the first Goomba in world 1-2.)

Configuring the timing of Mario’s jumps and the thresholds for enemy/obstacle detection was quite a challenge, in part because measuring success is difficult and time-consuming. Sometimes Mario would consistently make it through most of the level before dying on an arrangement of four Goombas towards the end of the level. A slight tweak to the Goomba detection parameter might get him past that part, at the cost of consistently dying on the first Goomba. In other words, we had difficulty choosing parameters without overfitting to a particular obstacle or situation.

NTSC and VGA Standards

The NES produces a 240p NTSC signal, which we convert to VGA for display on an attached monitor. However, the converted video scrolled vertically at a constant rate, a problem that did not appear when we tested the same hardware with other NTSC video sources. After much experimentation and debugging, we believe the cause is that the ADV7181B video decoder on board the DE2 expects 262.5 scanlines in normal NTSC-M mode, while the NES’ PPU, or Picture Processing Unit, produces only 262. The ADV7181B extracts sync pulses using a predictive algorithm, which claims to correct for improper or noisy sync generation but is unable to properly handle the PPU’s method of video generation. This problem could likely be corrected using a timebase corrector or other high-quality video processing equipment, but those options were beyond what we were willing to spend. We did experiment with cheap external NTSC-to-VGA converters, but the resulting color distortion sacrificed information we needed to reliably detect obstacles and enemies. We also spent a lot of time trying different configurations of the ADV7181B (which has many settings that can be accessed over I2C) and modifying the DE2_TV module provided to us by Terasic. While this problem may be solvable (or at least correctable), we would encourage future 5760 groups to consider it carefully before embarking on another retro-video-game-console-related project.

Despite this problem, we chose to simply have our system deal with a constantly scrolling video source. We are fortunate that Super Mario Bros. is an entirely horizontally-scrolling game, so we primarily use information about the location of objects in the X direction, and can extract some information in the Y direction by comparing the location of objects to the location of the black horizontal bar that marks the end of the frame. Still, the rollover is a problem because for a brief period of time, some pixels are not visible on the screen. Since our system only uses the video on the screen, and because the exact timing of the rollover depends on when exactly the game is started, this creates a headache and some slightly random behavior, but is not an insurmountable problem.
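One way to picture how the black bar serves as a vertical reference is the following Python sketch. It is a simplified software model, not our hardware logic: it assumes a grayscale frame stored as a list of rows, that the bar is the first row whose pixels are all dark, and that the picture wraps around the buffer. The names find_bar_row, rows_below_bar, and dark_threshold are hypothetical.

```python
def find_bar_row(frame, dark_threshold=16):
    """Return the index of the first all-dark row, or None if there is none."""
    for row_index, row in enumerate(frame):
        if max(row) <= dark_threshold:
            return row_index
    return None


def rows_below_bar(object_row, bar_row, frame_height):
    """Distance from the bar down to the object, wrapping at the bottom.

    Because the picture rolls, an object stored above the bar in the
    buffer actually belongs to the part of the picture that wrapped around.
    """
    return (object_row - bar_row) % frame_height


if __name__ == "__main__":
    frame = [[200] * 4 for _ in range(6)]
    frame[4] = [0, 0, 0, 0]            # the black bar sits at row 4
    bar = find_bar_row(frame)
    print(bar, rows_below_bar(1, bar, len(frame)))
```

The result is stable while the picture scrolls, except during the brief rollover when the object or the bar is not fully visible.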

Another issue with reading video from the NES is that the PPU utilizes a “shortcut” method of modulation, causing vertical lines to appear slightly jagged and flicker. The conversion from 256x240 to 640x480 also results in different scale factors in the X and Y directions, which makes developing kernels for pattern recognition difficult as well.
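The mismatch in scale factors is easy to quantify. The short Python sketch below assumes the full 256x240 picture maps exactly onto 640x480, and uses a 16x16 sprite (the size of a Goomba on the NES) as the example; the function name scaled_size is hypothetical.

```python
def scaled_size(width, height, src=(256, 240), dst=(640, 480)):
    """Scale a source-pixel size to output pixels, per axis."""
    x_scale = dst[0] / src[0]   # 640 / 256 = 2.5
    y_scale = dst[1] / src[1]   # 480 / 240 = 2.0
    return (width * x_scale, height * y_scale)


if __name__ == "__main__":
    print(scaled_size(16, 16))  # a square sprite becomes 40.0 x 32.0
```

A square sprite is therefore no longer square after conversion, and the non-integer horizontal factor means source pixels do not all map to the same number of output pixels, so kernels have to be built from the converted image rather than from the original sprite data.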

NIOS II vs Hardware

We chose to implement this project using custom hardware rather than instantiating a NIOS II processor on the FPGA and then programming in C. While coding in C is simpler, the VGA controllers typically used with the NIOS are vastly different from the custom controller we used to display the NTSC signal from the NES. Using custom hardware also gives us direct access to the VGA buffer, so we can read from and write to the screen at the same time. This allows for greater parallelism in the computation, so that we can detect multiple types of obstacles simultaneously. It would not have been possible to read the video input and generate an updated output fast enough if we were using a NIOS II CPU.