Applying Artificial Intelligence to

[ About | Videos | Download | Run | The Game | Details ]

About

This article describes a Java plug-in for the Nintaco NES / Famicom emulator that automatically plays the puzzle game Hatris.

Videos

Check out the AI playing through the final 10 levels:

Watch the AI blast through all levels in fast motion:

Here are all 6 Hatris cutscenes that were collected while testing the AI:

Download

HatrisAI_2017-08-06.zip

The .zip contains:

src - The source tree.

- The source tree. HatrisAI.jar - The compiled binary.

- The compiled binary. lgpl-2.1.txt - The free software license.

Run

Prerequisites

Nintaco - the NES / Famicom emulator.

Hatris (U) [!].nes - the Hatris ROM.

Launch the Plug-in

Start Nintaco and open Hatris (U) [!].nes . Extract HatrisAI.jar from the downloaded .zip . Open the Run Program window by selecting Tools | Run Program... Enter the path to the file in the JAR name field or browse to it using the Find JAR... button. Press Load JAR to load it. Press Run. From the Hatris main menu, select a starting Shop and Stage. Enabling or disabling music will launch gameplay and the AI will take over from there.

Set the Speed

To watch it play faster, select Machine | Speed | Max.

The Game

Rules

Hatris is a puzzle game somewhat similar to Tetris. Randomly selected hat pairs drop into the playfield. The hats within the pair can be swapped and the pair can be shifted left and right, positioning it above any of the 6 playfield columns. As hats land, they pile up and if any pile reaches the top, it's Game Over. To thwart that, a contiguous stack of 5 of the same hats will vanish. The goal in each level is to eliminate a specified number of stacks of 5.

If the pair lands on the playfield floor (the heads) or onto piles of equal height, then the hats will land together. However, if only one of the hats encounters a pile, then the second hat will remain free to be positioned independently.

As hats descend, horizontal movement is initially restricted only by the playfield walls. But, further down, tall piles can further confine them, particularly limiting the placement of the second hat. Pile height ordering strategies can be employed to maximize placement options. For instance, the player can attempt to keep the piles sorted by height or they can be maintained in a U-shaped pattern.

There are 6 types of hats and each of them have different heights and nesting characteristics. Typically, hats of the same type nest compactly whereas mismatches vary to the detriment of the player. This is why matching only takes place vertically; the playfield is not a grid; hats will not lineup horizontally or diagonally. However, as mentioned above, piles interact with each other indirectly as a consequence of the height ordering.

Helpers

Hatris was developed by Alexey Pajitnov, the inventor of Tetris, and his close friend Vladimir Pokhilko.

Avatars of Alexey and Vladimir appear in the game in the form of helpers and in the cutscenes. Alexey is stationed above the upper-left of the playfield. When summoned, he slides down a pole and then the player can use him to remove up to 5 hats from the bottoms of any of the piles. Vladimir appears to the lower-right of the playfield and he has the ability to swap any 2 piles, potentially contributing to the sorting order.

Helpers are a limited resource. Acquiring one involves eliminating 5 stacks of 5 identical hats. Specifically, removing 5 stacks of crowns, derbies or wizard hats earns an Alexey. And, clearing 5 stacks of baseball caps, cowboy hats or top hats earns a Vladimir. Every time a stack vanishes, one of the progress bars to the right of the playfield advances. When the bar fills up, it resets and a helper is added to the Helper Pool, depicted directly above the progress bars. The last helper earned is the first helper that may be used (FIFO). The pool has a maximum capacity of 8 helpers.

When a helper is summoned, the currently falling hats vanish. And, after the helper is finished, the next pair drops. This effect can actually be used as a strategy. If the current piles cannot accommodate a particular randomly selected pair, a helper could potentially be employed to eliminate it immediately.

Conquering Hatris

The main menu enables the player to start on any of the 60 levels. The Shop is the most significant digit of the level, ranging from 0 to 5, and the Stage is the least significant digit of the level, ranging from 0 to 9. Oddly, during play, those digits are displayed together, denoted simply as the Shop.

Baseball caps, cowboy hats and top hats are introduced in level 0. Derbies are first seen in level 1. Wizard hats follow in level 3. And, crowns appear in levels 6 and above.

When playing from the beginning, the number of stacks of 5 identical hats to eliminate remains at a constant 15 in every stage. To get through all 60 levels, the player needs to assemble 60 × 15 = 900 stacks in total. And, since hats spawn in pairs, that means surviving through a minimum of 5 × 900 / 2 = 2250 spawns. Well, almost.

Normally, at the end of each level, the playfield, the helper progress bars, the Helpers Pool and the random hat sequence remains undisturbed; it's one long continuum. However, every 10 levels (at the completion of each Shop), there is a cutscene (see Videos). After the cutscene, the playfield is cleared and then it is populated with 6 × Shop randomly selected hats, distributed equally among the columns. The rest remains unchanged. Depending on the prior contents of the playfield, that effect may be beneficial or detrimental to the player.

Refer to the Hatris Instruction Manual for more details.

Details

RAM and ROM Maps

The plug-in uses the Nintaco API to manipulate CPU Memory and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Debugging Tools and the information has been added to the Data Crystal ROMhacking.net wiki. In the source, they appear as constants within the Addresses interface.

AI Algorithm

When a hat pair spawns, the plug-in examines every possible placement of the current pair and the next pair. A placement is simply which piles each hat ends up in. As noted above, the piles do not necessarily need to be adjacent and landing the first hat restricts the options for the second. All of the possible placements are determined using nested loops (see Searcher).

Landing a hat onto a pile has consequences: the hat will nest with the one beneath it, increasing the pile height and if it completes an identical stack of 5, then the stack will vanish from the pile. For each valid placement of the current pair and its associated playfield consequences, the plug-in tries each valid placement of the next pair, evaluating the combined consequences. This chain of searches is represented by SearchChain.

Each of the combined consequences is fed to an evaluation function, which scores the contents of the playfield. The combo with the highest score wins out and the current pair is placed accordingly. The results of the search chain only affect the current pair. When the next pair spawns, it is evaluated with the next-next pair and so on.

In addition to examining every pair placement, when a helper is available, the plug-in also evaluates using it. For Alexey, this means it tries all possible ways to remove up to 5 hats from the bottom of piles. For Vladimir, it explores all of the potential pile swaps. Since using a helper annihilates the falling hats, the option is treated as one of the two search chain steps as opposed to introducing more. And, the same evaluation function applies.

Evaluation Function

The evaluation function scores a playfield using a series of metrics:

The normalized complement of the sum of the squares of the number of hats in each column. In other words, the fewer the hats on the playfield and the more evenly they are distributed among the columns, the better. The normalized complement of the sum of the squares of the pile heights. Said another way, the shorter the piles, the better. The normalized sum of the squares of the run lengths > 2. This metrics inspects each pile, looking for contiguous spans of identical hats. Spans of 2 or more hats contribute to the sum. The normalized sum of the squares of the pile top run lengths > 2. This metric is identical to prior one, except that only the contiguous spans at the very top of the piles contribute to the sum. The normalized complement of the total number of mismatches. A mismatch refers to 2 different adjacent hats within a pile. The normalized complement of the pile heights permutation distance. This metric aims to keep the piles in height descending order. With the help of a lookup table, the minimal number of pile swaps required to completely sort them by height is determined. The normalized sum of the adjacent pile height differences. This metric has the same goal as the prior one, but it does it in a different way. It simply compares the heights of neighboring piles. The normalized cash value. In Hatris, the player's score is denoted as cash and the more cash the better.

Metrics (1) and (2) attempt to minimize and even out the amount of material in the playfield. Metrics (3), (4) and (5) look for conformity or discord within the piles. Metrics (6) and (7) try to keep the piles in sorted order. And, metric (8) aims for a high score.

These metrics are combined into a single number via a weighted sum (see EvaluationFunction).

Machine Learning

The road to wisdom?—Well, it's plain

and simple to express:

Err

and err

and err again,

but less

and less

and less.

— Piet Hein

To optimize the weights used in the evaluation function, a Trainer was created that executes the following algorithm:

W := a weight vector, the elements initialized randomly repeat indefinitely: S := a randomly generated sequence of hat pairs R := W rated by running the AI across S repeat a few times: Z := a set of weight vectors, each a randomly generated variant of W for each weight vector, V, in Z: T := V rated by running the AI across S if T is better than R: R := T W := V

It's a variation of Hill Climbing. Here are more details:

W starts off as a randomly initialized vector of weights and it is incrementally improved through random steps. At some point, the outer-most loop is terminated and the current W at that time is the result.

starts off as a randomly initialized vector of weights and it is incrementally improved through random steps. At some point, the outer-most loop is terminated and the current at that time is the result. Weight vectors are rated by the number of pair spawns until Game Over.

The length of the "repeat a few times" loop is randomly set between 11 and 33 iterations (arbitrarily selected values).

Each V is derived from W by randomly adjusting each weight by up to ±5% (arbitrarily selected).

is derived from by randomly adjusting each weight by up to ±5% (arbitrarily selected). The inner-most for-loop that iterates over Z is actually performed in parallel. Each V is rated on an independent thread and the ratings are compared after all threads finish.

Here are the results, rounded to 5 decimal place:

Metric Weight (%) # Hats2 0.00100 Pile Heights2 0.00055 Run Lengths2 1.78585 Top Run Lengths2 0.06811 # Mismatches 88.31852 Permutation Distance 0.00048 Height Differences 0.06392 Cash 9.76156

Each metric is normalized to a number in between 0 and 1 in attempt to make them comparable. And, if they are, then the total number of mismatches (different adjacent hats) is by far the most significant. Total cash (the score) comes in a distant second. And, the run lengths (identical hat spans) comes in as an even more distant third. The height ordering metrics, although not that significant percentagewise, are still relevant to Vladimir and potentially to long-term play.

AI Strength

After training the AI, the number of spawns required to reach Game Over were recorded over a large batch of simulated runs (see MeasureStats). Below is the average and standard deviation:

μ = 327

σ = 221

That's abysmal. As mentioned above, 2250 is the theoretical minimum number of spawns required to defeat all 60 levels. However, when interfaced with the real game, the AI demonstrated a surprising degree of proficiency; it appeared to be able to get through every stage without issue.

The AI was clearly taking advantage of some aspect of the real game missing in the simulation. One possibility was how the playfield is reset after the cutscenes, a feature that the simulation ignores. But, that happens so sparely that the long-term effect should not be that significant. Another possibility is that the real game starts out with only 4 hat types. But, all successive levels after the fifth one use 6 hat types. That also seems insignificant in the long-term.

The most likely candidate was the pseudorandom number generator (PRNG) used to produce the hat pair sequence. It could be rigged to make the game easier to play, which would actually be consistent with some of the other Tetris variants of the era. For example, Super Tetris 2 + Bombliss, developed and published by Bullet Proof Software in 1992 (3 years after this game) for the Super Famicom, is able to detect when the player is getting close to topping out and it will respond by providing extra long-bars.

There are many simply ways that the PRNG could be rigged to aid the player. For instance, it could occasionally produce needed pairs by inspecting the tops of adjacent piles. And, it could be setup to do this with increasing frequency as a function of the maximum pile height. Another simple option is to generate repeated pairs that could easily be collected by the player. In fact, the latter does appear to happen every now and then. But, that may just be a statistical fluke.

The PRNG could be analyzed empirically. At the very least, that would reveal if it were rigged or not. But, the data probably could not explain the exact mechanisms behind it. The only way to get a definitive answer would be to peek at the code...

All of the variables that appear in the following 6502 code comments are single-byte global variables. The PRNG depends on 2 such variables, designated as h1 and h2, which are seeded on game launch:

seedRNG: 8CB2: LDY #$00 8CB4: LDA $8CDC,Y 8CB7: STA $03C0,Y 8CBA: INY 8CBB: CPY #$28 8CBD: BNE $8CB4 8CBF: LDY #$00 8CC1: LDA $8D04,Y 8CC4: STA $03F0,Y 8CC7: INY 8CC8: CPY #$10 8CCA: BNE $8CC1 8CCC: LDA #$04 8CCE: STA $03E5 8CD1: LDA #$3F 8CD3: STA $0304 8CD6: LDA #$85 8CD8: STA $0305 8CDB: RTS

Below is the main PRNG subroutine:

randomizeHats: 88D8: LDA $0300 88DB: EOR $0304 88DE: ROR A 88DF: STA $0304 88E2: LDA $0300 88E5: EOR $0305 88E8: ROL A 88E9: STA $0305 88EC: LDA $0304 88EF: STA $0080 88F1: SEC 88F2: SBC #$06 88F4: BCS $88EF 88F6: LDA $0080 88F8: CLC 88F9: ADC #$01 88FB: STA $0080 88FD: LDX $0080 88FF: CPX #$07 8901: BCS $8901 8903: LDA $0305 8906: STA $0081 8908: SEC 8909: SBC #$06 890B: BCS $8906 890D: LDA $0081 890F: CLC 8910: ADC #$01 8912: STA $0081 8914: RTS

frameCounter is incremented once per frame (since startup). h1 and h2 are XORed with frameCounter and then they are circularly shifted right and left, respectively. hat1 and hat2 are set to h1 and h2 mod 6 plus 1, respectively. Those are hat types, the output of the routine.

The main PRNG subroutine is only called by the following:

pickHats: 889E: JSR $88D8 88A1: LDA $03A8 88A4: BNE $889D 88A6: LDA $03A9 88A9: CMP #$06 88AB: BCS $889D 88AD: LDA #$03 88AF: CMP $0080 88B1: BEQ $889E 88B3: CMP $0081 88B5: BEQ $889E 88B7: LDA $03A9 88BA: CMP #$03 88BC: BCS $889D 88BE: LDA #$02 88C0: CMP $0080 88C2: BEQ $889E 88C4: CMP $0081 88C6: BEQ $889E 88C8: LDA $03A9 88CB: BNE $889D 88CD: LDA #$05 88CF: CMP $0080 88D1: BEQ $889E 88D3: CMP $0081 88D5: BEQ $889E 88D7: RTS

The loop just makes sure that the early stages use fewer hat types. No surprises there.

The initialization subroutine, which is invoked once immediately after starting a new game from the main menu, provides the first clue:

init: 8C60: LDY #$00 8C62: LDA #$00 8C64: STA $00A0 8C66: LDA #$60 8C68: STA $00A1 8C6A: LDA #$00 8C6C: STA ($A0),Y 8C6E: INY 8C6F: BNE $8C6A 8C71: INC $00A1 8C73: LDX $00A1 8C75: CPX #$7C 8C77: BNE $8C6A 8C79: JSR $A1CE 8C7C: JSR $8CB2 8C7F: JSR $8F14 8C82: JSR $A139 8C85: LDA #$FF 8C87: STA $032A 8C8A: JSR $B722 8C8D: LDA #$00 8C8F: STA $032A 8C92: JSR $889E 8C95: LDA $0080 8C97: CMP $0081 8C99: BNE $8C9E 8C9B: JSR $889E 8C9E: LDA $0080 8CA0: STA $03D1 8CA3: LDA $0081 8CA5: STA $03D9 8CA8: LDA #$0F 8CAA: CLC 8CAB: ADC $03A9 8CAE: STA $03A3 8CB1: RTS

It seeds the PRNG and then it picks a random pair to begin the sequence. However, it check if the both hats in the pair match. If they do, it picks a second time. nextType1 and nextType2 are the result, the first pair in the sequence.

That same logic appears in the pair spawn subroutine, which is called every time the sequence needs to be advanced:

spawn: 86CA: LDA $03F0 86CD: BNE $86C9 86CF: LDA $03C0 86D2: BEQ $86C9 86D4: LDA #$00 86D6: STA $03C0 86D9: LDA #$FF 86DB: STA $03C8 86DE: LDA #$00 86E0: STA $03C2 86E3: LDA #$08 86E5: PHA 86E6: JSR $8B3B 86E9: JSR $94EA 86EC: JSR $8894 86EF: PLA 86F0: SEC 86F1: SBC #$01 86F3: BNE $86E5 86F5: LDA #$A9 86F7: STA $03D3 86FA: STA $03DB 86FD: LDA #$68 86FF: STA $03D4 8702: LDA #$80 8704: STA $03DC 8707: LDA #$02 8709: STA $03D2 870C: LDA #$03 870E: STA $03DA 8711: LDA $03D1 8714: STA $03D0 8717: LDA $03D9 871A: STA $03D8 871D: JSR $889E 8720: LDA $0080 8722: CMP $0081 8724: BNE $8729 8726: JSR 889E 8729: LDA $0080 872B: STA $03D1 872E: LDA $0081 8730: STA $03D9 8733: JSR $89F8 8736: JSR $8A2E 8739: LDA #$00 873B: STA $03E0 873E: LDA #$01 8740: STA $03E1 8743: LDA #$00 8745: STA $03D5 8748: STA $03DD 874B: RTS

hatType1 and hatType2 is the current pair, which receives its values from the next pair. Random hats are picked and, as with the initialization routine, if the 2 hats match, a second pick occurs. Finally, nextType1 and nextType2 are updated with the results.

Since the number of frames between spawns is determined by the player's actions, this means that the game actually taps into the inherently random nature of the agent it is interfaced with. The main PRNG subroutine provides equally probable results. But, the check in the initialization and spawn subroutines skews the odds.

Since there are 6 hat types, there are 6 × 6 = 36 ways to assign h1 and h2 . And, there are 6 ways that h1 equals h2. Consequentially, the odds of both hats matching on the first pick is 1/6. The same applies to the second pick. Hence, the odds of a match after the check is reduced to 1/36.

Reducing the odds of both hats matching makes the game easier to play. The reason is that the number of columns in the playfield exactly equals the number of hat types. Consequentially, for the playfield to accommodate any particular hat type, there must be a column assigned to that type. Since there is no way to position both hats in a pair onto the same column, matching hats forces the player to contaminate at least one column with the wrong type.

The AI was further trained with the PRNG logic added to the simulator and new stats were measured:

μ = 4945

σ = 4844

The new average is more than twice the theoretical minimum number of spawns to needed to beat all 60 levels. However, the standard deviation reveals a very wide spread. The AI will play well enough most of the time, but it's not always guaranteed to get through all the levels. That depends on the luck of the draw.

Copyright © 2017 meatfighter.com

This project is free software; you can redistribute it and/or modify it under the terms of LGPLv2.1.