One particularly tricky problem encountered during the development of Star Versus was detecting screen wrap. The solution involved discovering a neat trick that exploits the NES’s 6502 processor.

Movement engine

The core engine of Star Versus moves all objects every frame. Most objects, when they reach the edge of the screen, should wrap around to the other side. However, certain slow-moving or long-lived objects (such as asteroids, bonus items, or special attacks) are destroyed instead. In addition, the nebula level disables wrapping entirely, which prevents players from crossing the edge and destroys any other object that reaches it. Essentially, the engine needs a way to detect when any object crosses a screen edge, and that check needs to be very efficient, since it potentially runs for every object on every frame.

The coordinate system imposed by the NES hardware puts 0,0 at the top-left corner of the screen. Each object has an X and Y position, which equals the top-left pixel of its rendered sprite. These X and Y are each a single byte, with values between 0 and 255. The screen is 256 pixels across, so any X value is valid, and wrapping happens as a natural consequence of 8-bit math. On the other hand, the screen is only 240 pixels high with a gameplay HUD at the top, so objects are only allowed to have a certain range of Y values. The engine must always check for invalid Y values and adjust them, which trivially lets it determine when wrapping happens in the Y direction. This only leaves the problem of detecting screen wrap for X.

Every step of the engine, for each object, X and Y deltas are calculated based upon the direction of movement, and those deltas are added to their corresponding positions. For example, a ship at X position $41 moving right at speed 2 would have an X delta of $02, and a ship moving left at speed 2 would have an X delta of $fe. Importantly, the code is the same regardless of the direction of movement. Here is the straightforward implementation of X movement without wrap detection (note that the NES has two registers named “X” and “Y”; these are unrelated to the coordinate system having X and Y dimensions):

    ; The X register holds the object to move
    ; The Y register holds the direction
    lda x_delta_table,y      ; Load the delta into the accumulator.
    clc                      ; Clear the carry flag, as needed by adc.
    adc object_x_position,x  ; Add the x position to the accumulator.
    sta object_x_position,x  ; Store the sum to the x position.
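To see why one code path serves both directions, the 8-bit addition can be modeled in a few lines of Python. This is only a sketch of the arithmetic, not code from the game, and the function name `move_x` is made up for illustration.

```python
def move_x(position, delta):
    """Model the 8-bit add performed by adc: keep only the low byte of the sum."""
    return (position + delta) & 0xFF


# Moving right at speed 2: the delta is $02.
print(hex(move_x(0x41, 0x02)))  # 0x43

# Moving left at speed 2: the delta is $fe, the two's-complement encoding of -2.
print(hex(move_x(0x41, 0xFE)))  # 0x3f

# Crossing the right edge simply wraps around, with nothing to signal it.
print(hex(move_x(0xFF, 0x02)))  # 0x1
```

Because a negative delta is stored as 256 minus its magnitude, adding it and discarding the ninth bit is the same as subtracting.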

How X Wrap Can Happen

The two cases that need to be checked are when an object crosses the right side or the left side, by either having a large position and adding a positive delta or having a small position and adding a negative delta, respectively:

    position  delta  result  status
    $ff       $02    $01     wrapped (on right side)
    $01       $fe    $ff     wrapped (on left side)

It appears at first glance as though the carry flag will work here. After all, it signals exactly this situation: two values being added and summing to more than 255. However, the carry flag produces false positives, because a negative delta causes a carry for nearly every position. It works properly for moving right, but not for moving left. One way to fix this would be to branch on the sign of the delta using bmi and handle left movement separately, but it would be preferable to avoid branching or duplicating code if possible.
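The size of the carry problem can be checked with a small Python model of adc. This is a sketch, not game code: all function names here are hypothetical, and the carry-in is assumed to be zero, as it is after clc.

```python
def adc_carry(a, operand):
    """Model adc after clc: return the 8-bit result and the carry flag."""
    total = a + operand
    return total & 0xFF, total > 0xFF


def to_signed(byte):
    """Interpret a byte as a two's-complement value in -128..127."""
    return byte - 256 if byte >= 0x80 else byte


def really_wrapped(position, delta):
    """Ground truth: the move left the 0..255 range of screen columns."""
    return not 0 <= position + to_signed(delta) <= 255


def false_positives(delta):
    """Count positions where carry is set although no wrap happened."""
    return sum(
        1
        for position in range(256)
        if adc_carry(position, delta)[1] and not really_wrapped(position, delta)
    )


print(false_positives(0x02))  # 0: carry is reliable when moving right
print(false_positives(0xFE))  # 254: carry fires on nearly every leftward move
```

For the delta $fe, the only positions that do not set carry are $00 and $01, which are exactly the two that really wrap.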

Signed vs Unsigned

This mismatch comes from mixing unsigned and signed numbers. Position values are unsigned (positive up to 255), while the movement delta is signed (negative down to -128 or positive up to 127). The carry flag is only meaningful for unsigned arithmetic; for signed arithmetic the overflow flag needs to be used. However, doing so requires positions to be coerced into signed values, losslessly, so that both operands are signed.
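Checking the overflow flag on the raw, unsigned position does not help, because signed overflow happens at the boundary between $7f and $80, which is the middle of the screen. The Python sketch below models the flag using the usual sign-bit rule for 6502 addition. The function name is hypothetical and the carry-in is assumed to be zero.

```python
def adc_overflow(a, operand):
    """Model adc after clc: return the 8-bit result and the overflow flag."""
    result = (a + operand) & 0xFF
    # V is set when both inputs share a sign bit that the result lacks.
    overflow = ((a ^ result) & (operand ^ result) & 0x80) != 0
    return result, overflow


# A move to the right in the middle of the screen: no wrap, yet overflow is set.
print(adc_overflow(0x7F, 0x02))  # (129, True)

# A move across the right edge: a wrap, yet overflow is clear.
print(adc_overflow(0xFF, 0x02))  # (1, False)
```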

Hypothetically, if instead of the screen using a coordinate system from 0 to 255 it went from -128 to 127, the position would be a signed value and the overflow flag would work. Though the NES hardware doesn’t support this, we can simulate it with only a single instruction. By using eor to toggle the high bit of the position before adding the delta, the overflow flag is set exactly when a screen wrap happens.

Conceptually, this bit flip can be thought of as swapping the left and right halves of the screen, making left-most positions from near-zero into extreme negatives (close to -128), and right-most positions into extreme positives (almost 127).
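The effect of the flip on a few representative positions can be listed with a short Python sketch. The helper names are made up for illustration.

```python
def to_signed(byte):
    """Interpret a byte as a two's-complement value in -128..127."""
    return byte - 256 if byte >= 0x80 else byte


def flip(position):
    """Model eor #$80: toggle the high bit of the position."""
    return position ^ 0x80


for position in (0x00, 0x01, 0x7F, 0x80, 0xFE, 0xFF):
    flipped = flip(position)
    print(f"${position:02x} -> ${flipped:02x} (signed {to_signed(flipped)})")

# Read as a signed byte, the flipped value is always the position minus 128,
# so the left-to-right order of screen columns is preserved.
assert all(to_signed(flip(p)) == p - 128 for p in range(256))
```

The left edge $00 becomes -128 and the right edge $ff becomes 127, so the two screen edges line up with the two limits of the signed range.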

The new implementation looks like this:

    lda object_x_position,x  ; Load the x position into the accumulator.
    eor #$80                 ; Flip the high bit.
    clc                      ; Clear the carry flag, as needed by adc.
    adc x_delta_table,y      ; Add the delta to the accumulator,
                             ; setting overflow if appropriate.
    eor #$80                 ; Flip the high bit again.
    sta object_x_position,x  ; Store the sum to the x position.
    bvc NoWrapHappened       ; Branch if there was no overflow.
    ; If code gets here, a wrap occurred

This code is very efficient, and handles all of the engine’s requirements. It’s especially nice that, within this sequence, the overflow flag is only affected by the adc instruction: the eor and sta that follow leave it untouched, so no special care needs to be taken to preserve its state until after the position calculation is complete. Discovering this particular chunk of code was very satisfying during development.
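The claim that overflow is set exactly when a wrap happens can be checked exhaustively, since there are only 65,536 combinations of position and delta. The Python sketch below models the eor, adc, eor sequence. It is a model of the arithmetic rather than the game's code, and the function names are hypothetical.

```python
def adc(a, operand, carry_in=0):
    """Model 6502 adc in binary mode: return result, carry and overflow."""
    total = a + operand + carry_in
    result = total & 0xFF
    carry = total > 0xFF
    # V is set when both inputs share a sign bit that the result lacks.
    overflow = ((a ^ result) & (operand ^ result) & 0x80) != 0
    return result, carry, overflow


def to_signed(byte):
    """Interpret a byte as a two's-complement value in -128..127."""
    return byte - 256 if byte >= 0x80 else byte


def move_x_with_wrap(position, delta):
    """Mirror the sequence: eor #$80, clc, adc delta, eor #$80."""
    a = position ^ 0x80
    a, _carry, overflow = adc(a, delta)
    a ^= 0x80
    return a, overflow


mismatches = 0
for position in range(256):
    for delta in range(256):
        new_position, wrapped = move_x_with_wrap(position, delta)
        true_sum = position + to_signed(delta)
        expected_wrap = not 0 <= true_sum <= 255
        if wrapped != expected_wrap or new_position != (true_sum & 0xFF):
            mismatches += 1

print(mismatches)  # 0
```

The second eor restores the coordinate system, so the stored position is the same one the plain addition would have produced.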