Applying Artificial Intelligence to Magic Jewelry

[ About | Videos | Download | Run | Details ]

About

This article describes a Java plug-in for the Nintaco NES / Famicom emulator that automatically plays Magic Jewelry, the unlicensed Famicom port of Sega's Columns. The plug-in is capable of playing perpetually at 2 different speeds.

Videos

Watch the score, the level and the jewelry counters all eventually rollover in the following fast motion video:

Check out the AI playing through the first 10 levels at normal speed here:

Download

MagicJewelryAI_2017-07-09.zip

The .zip contains:

src - The source tree.

- The source tree. MagicJewelryAI.jar - The compiled binary.

- The compiled binary. lgpl-2.1.txt - The free software license.

Run

Prerequisites

Nintaco - the NES / Famicom emulator.

Magic Jewelry (Unl) [!].nes - the Magic Jewelry ROM.

Launch the Plug-in

Start Nintaco and open Magic Jewelry (Unl) [!].nes . Extract MagicJewelryAI.jar from the downloaded .zip . Open the Run Program window by selecting Tools | Run Program... Enter the path to the file in the JAR name field or browse to it using the Find JAR... button. Press Load JAR to load it. Press Run. Finally, press Start on the Magic Jewelry title screen to begin the game and to let the AI take over.

Set the Speed

To see it play faster, first, press the Stop button. In the Arguments field, enter one word: fast. Then, press Run. Finally, press Start on the title screen to begin a new game and to return control to the AI.

To watch it play even faster, select Machine | Speed | Max.

Details

RAM Map

The plug-in uses the Nintaco API to manipulate CPU Memory and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Hex Editor and the information has been added to the Data Crystal ROMhacking.net wiki. In the source, they appear as constants within the Addresses interface.

AI Algorithm

When a column spawns, the plug-in examines every possible lock combination of the current column and the next column. A valid lock position is one in which column bottom is supported by either the playfield floor or a jewel already present in the playfield. All of the possible lock positions of a single column are computed using breath-first search (see Searcher).

Locking a column into the playfield has consequences: 3 or more identical jewels lined up horizontally, vertically or diagonally vanish, unsupported jewels drop, and that process potentially repeats chain-reaction style. For each valid lock position of the current column and its associated playfield consequences, the plug-in tries each valid lock position of the next column, evaluating the combined consequences. This chain of searches is represented by SearchChain.

Each of the combined consequences is fed to an evaluation function, which scores the contents of the playfield. The combo with the highest score wins out and the current column is moved into position accordingly. The results of the search chain only affect the current column. When the next column spawns, it is paired up with the next-next column and so on.

Evaluation Function

The evaluation function scores a playfield using a series of metrics:

The number of equally colored jewel pairs immediately neighboring each other horizontally, vertically or diagonally. The number of equally colored jewel pairs separated by 1 cell in between and neighboring each other horizontally, vertically or diagonally. The number of equally colored jewel pairs separated by 2 cells in between and neighboring each other horizontally, vertically or diagonally. The number of equally colored jewel pairs separated by 3 cells in between and neighboring each other horizontally, vertically or diagonally. The number of equally colored jewel pairs separated by 4 cells in between and neighboring each other horizontally, vertically or diagonally. The number of jewels in the playfield, the fewer the better. The edge penalty. Jewels along the playfield sides are penalized since they can only potentially line up with 5 other jewels as opposed to up to 8 when placed centrally. Jewels in the corners are penalized even more since they can only line up with at most 3 others. The spawn distance penalty. Columns are penalized based on their proximity to the spawn coordinates. The idea is to keep the jewel stacks as short and even as possible. And, of course, a column locked at the spawn coordinates ends the game; so, the further away from that point, the better.

These metrics are combined into a single number via a weighted sum (see EvaluationFunction).

Machine Learning

To optimize the weights used in the evaluation function, a Trainer was created that executes the following algorithm:

W := a weight vector, the elements initialized randomly repeat indefinitely: S := a randomly generated sequence of columns R := W rated by running the AI across S repeat a few times: Z := a set of weight vectors, each a randomly generated variant of W for each weight vector, V, in Z: T := V rated by running the AI across S if T is better than R: R := T W := V

It's a variation of Hill Climbing. Here are more details:

W starts off as a randomly initialized vector of weights and it is incrementally improved through random steps. At some point, the outer-most loop is terminated and the current W at that time is the result.

starts off as a randomly initialized vector of weights and it is incrementally improved through random steps. At some point, the outer-most loop is terminated and the current at that time is the result. The length of S is randomly set between 20,000 and 40,000 columns (arbitrarily selected values).

is randomly set between 20,000 and 40,000 columns (arbitrarily selected values). Weight vectors are rated by totaling up the number of jewels remaining in the playfield after each column in S is placed and the consequences of the placement take effect. In other words, it's a rating of the clearing efficiency of the AI over short runs of play for a given set of weights. A lower total is a higher rating.

is placed and the consequences of the placement take effect. In other words, it's a rating of the clearing efficiency of the AI over short runs of play for a given set of weights. A lower total is a higher rating. The length of the "repeat a few times" loop is randomly set between 5 and 11 iterations (arbitrarily selected values).

Each V is derived from W by randomly adjusting each weight by up to ±5% (arbitrarily selected).

is derived from by randomly adjusting each weight by up to ±5% (arbitrarily selected). The inner-most for-loop that iterates over Z is actually performed in parallel. Each V is rated on an independent thread and the ratings are compared after all threads finish.

Here are the results, rounded to 1 decimal place:

Metric Weight (%) # Adjacent 0 43.0 # Adjacent 1 12.7 # Adjacent 2 1.5 # Adjacent 3 2.8 # Adjacent 4 0.5 # Jewels 33.6 Edge Penalty 4.2 Spawn Distance 1.6

Each metric is normalized to a number in between 0 and 1 in attempt to make them comparable. And, if they are, then the most significant metric by far is the number of equally colored, immediately neighboring jewel pairs. The second most significant metric is the number of jewels in the playfield, the fewer the better. In third place is matching jewels separated by either an empty space or another jewel. Interestingly, matching jewels separated by 3 cells in between beat out the 2 cells version.

AI Strength

The AI was trained by optimizing short-term clearing efficiency. But, how well does that translate to long-term survival? How far can the AI really go?

In attempt to find out, the AI was setup to run at high speed outside of Nintaco and the number of jewels in the playfield was measured after each column placement (see HistogramMaker). This enabled the construction of the probability curve below where the x-axis is the playfield jewels count and the y-axis is the probability of finding the AI in that state at any given moment.

It resembles a bell curve, which makes sense since it needs a certain amount of material in the playfield to form lines and once it reaches critical mass, it's able to form those lines at an increasing rate. The average number of jewels in the playfield is 11.454. And, there is a statistical expectation that the playfield will be fully cleared out every 51,577 columns; meaning, the AI effectively resets the entire game fairly often.

During the run, it never accumulated more than 47 jewels in the playfield. But, to put an upper bound on how far the AI can go, the statistical expectation of filling up the playfield completely needs to be determined. In other words, what is 1/y as x approaches 78? Fitting the curve might yield an answer.

The Excel Solver worked out constants for a Gaussian fit:

Excel put the center of the Gaussian at 10.973. That low number and the crossover points indicate that the measured curve is not symmetrical; it is skewed slightly toward the y-axis. It also means that the Gaussian falls off at a faster rate than the measured curve and consequentially, it cannot be used as a predictor of what happens as x grows toward a full playfield.

Instead, the chart below depicts log 10 of the probability curve.

To the right of the maximum, the curve becomes nearly linear.

Plugging in x = 78 into the quadratic fit formula above and inverting the log yields a probability of y = 10-16.4244 or an expectation of 26,570,516,703,550,708 columns until game over. But, that assumes that the AI will really fight to the very last cell before topping out. Assume instead that it cannot survive beyond 54 jewels; meaning, once it fills up to the point where there are only 4 empty rows, it's no longer able to cope and immediately succumbs. Plugging in x = 54 yields y = 10-9.78905 or an expectation of 6,152,477,016 columns. That's a pretty wide spread. The AI apparently will survive somewhere between billions and quadrillions of columns. Narrowing the range would require a better understanding of how the AI actually behaves toward a top out.

Also, in the actual game, the player is occasionally rewarded with wild columns, which look like 3 X's instead of jewels. If a wild column lands on the playfield floor, it vanishes. But, if it lands on a jewel, then all jewels of that color vanish from the entire playfield, in addition to the X's.

The AI is able to handle wild columns, evaluating all the combinations as it would with a normal column. However, the program that gathers data for the probability curve (HistogramMaker) never simulates wild columns. In other words, the AI will actually play more powerfully under real conditions than the curve suggests.

Copyright © 2017 meatfighter.com

This project is free software; you can redistribute it and/or modify it under the terms of LGPLv2.1.