After having fun reverse engineering the Comprehend Adventure game (Recomprehend) I was looking for a new DOS game reverse engineering project. Over the years various people have reversed a lot of the old popular games and published specifications and tools for them. The website shikadi.net for example has a of load of information about games I played as a kid.

I discovered that the Blizzard (then Silicon and Synapse) game The Lost Vikings didn’t seem to have had any serious reverse engineering attempts. It was released in 1993 at the tail end of the DOS era, and was a game I really enjoyed playing when I was younger. The Lost Vikings is a puzzle/platformer where the player controls three vikings with different skills. The vikings need to combine their skills to solve their way through a number of themed worlds including a spaceship, a prehistoric world and an Egyptian world. The following image shows the first level of the game (source: Strategy Wiki):

The game seemed like it would be relatively simple to reverse. The levels are tile map based and contain some simple puzzles such as buttons that activate or disable objects, movable boxes, and a crane that can pick up objects. Indeed, much of the reverse engineering project was straight forward. There is a single data pack file, which contains compressed file chunks. The chunks encode various game assets such as sprites, maps, sounds, etc. I’ve written a collection of tools which can be used to view the game assets: The Lost Vikings Tools.

The Virtual Machine

One interesting aspect of how the game engine works is that the objects in the game use a template class system. There is a chunk in the data file for each world that defines a set of object classes. For example, in the first level image above both of the hazard doors are class type 0x4f. Reverse engineering the object class code lead to this function:

Ignoring for now the alternative path into location 0x142a2, this code is using si as an index into the object classes array structure (each class template is 0x15 bytes). The word at offset 0x3 in this structure is used as an address in the ES segment. The ES segment actually contains the entire object class template chunk data. The code then moves into the loop at address 0x142a6. This loop fetches the next byte from the ES segment and uses it as an index to a function table. At each step in the loop bx contains the address in the ES segment, and si contains the current opcode.

An interesting thing to note here is that the loop is using an unconditional jump and therefore will never terminate. At least, not normally. It appears that each object class has some form of opcode based program associated with it, which is interpreted by this function. My initial approach was to just start looking at some of the functions in the call table. The first one looks like this in IDA’s graph mode:

IDA’s stack analysis has failed here, and with good reason. Popping ax as the first instruction in a function is pretty weird. The x86 call instruction pushes the instruction pointer (ip) onto the stack, and the ret instruction pops it to return to the calling function. The pop instruction here is effectively discarding the return address, which means the next ret instruction will jump back two stack frames instead of one. This opcode will exit the infinite interpreter loop.

To see how the opcode is really implemented we need to switch to the linear view in IDA:

I’ve annotated the names of the functions with their corresponding opcode numbers to make this code section clearer. There is a little bit of clever code reuse going on here. It’s easiest to start by looking at opcode 0x03. Recall the the loop which is handling opcodes has the object classes chunk data loaded in the ES segment, and bx is being used as the VM instruction pointer. Opcode 0x03 is therefore an unconditional jump instruction. It loads the word at the current instruction pointer, sets the instruction pointer to it, and then returns to the opcode handler loop.

Working backwards, opcode 0x05 skips the next word operand and stores the next address into an array. Reversing other functions shows that word_28522 is the index of the current object instance, so this opcode can store a single address for each object in the level. It then restores the value of bx and falls through into the code for opcode 0x03 (jump). So this opcode appears to be a very limited call instruction with only a single level of stack.

Opcode 0x00 is storing the current instruction pointer in a global array. Note that this is different to opcode 0x05 which is storing the address after the call address operand. It then falls into the call opcode handler. The instruction however won’t actually be executed because of the initial pop instruction. Instead the calling loop will exit. The address saved by this instruction is used as the alternate entry path to the function calling loop that I ignore earlier. Opcode 0x00 is there acting as a sort of co-routine style exit. It stores the current position in the program, and exits the opcode handler loop, allowing the game engine to do other things such as update graphic, receive input, etc before returning where the virtual machine program left of. This allows a virtual machine program to perform complex tasks without stalling the rest of the game engine.

Disassembling Opcodes

At this point I had a general feel for how the virtual machine opcode handlers worked. Looking at the function call table there are around 215 implemented opcodes, so rather than just reversing them in order I started writing a simple script to decompile a program for a given object class so that I could focus on just the opcodes being called by objects in the first level.

At this point I am mostly trying to determine how many operands an opcode has, and what its basic functionality is. If an opcode handler is a short then I will generally try to fully understand what it does. If it is more than a few blocks of code then I will just try to figure out how many operands it has and whether or not it does a conditional jump or call. The reason for this is that sometimes an opcode’s function because apparent from surrounding context in the disassembled program, which is often quicker than reversing the code.

The following is a simple opcode:

This opcode take a word operand and stores it in a global. Reversing other functions shows that this global is a general purpose temporary storage or register. The virtual machine only has one such register, but it also has this opcode:

This opcode stores the current value of the general purpose register at the address in the data segment (DS) specified by the word operand. The programs in the game’s data file use this opcode to write to specific parts of the game data such as the inventory slots for the vikings. A few DS offsets are also used as additional temporary values when more than one is required. Offsets 0x0206 and 0x0208 are often used as temporary x and y coordinates of an object for example.

What makes this opcode and its counterpart opcode 0x53 which reads from DS into the general purpose register interesting is that it can read and write any game data in DS, not just what was intended by the original game creators. It makes extending the original game an interesting prospect.

A more complex opcode is 0x14:

At first glance this looks straight forward to reverse. It fetches a byte operand, and calls a pair of sub-functions. It then fetches another byte operand and calls the same sub-functions again. Looking at the first sub-function (sub_15473) though things get a bit more complicated:

IDA’s stack analysis has failed again, and it shows four exits from this block which I have cropped out because they are all incorrect. What is actually happening is that the first byte operand is passed into this function in ax, which is then used as an index into a jump table. Although ax is and’ed with 7, there are actually only 4 valid entries in the jump table. Looking at type 0 in the jump table we see:

Which loads an immediate word operand. This function snippet does a retn, which returns from the top-level opcode 0x14 handler. It’s not hard to see why IDA is struggling with disassembling this code correctly.

The other entries in the jump table load values in different ways, and then each also do a retn. Type 1 takes a byte operand which is used an object field index. Object has fields for things like the x and y offset, flags, etc. Type 2 takes a word operand and loads a value from DS (like opcode 0x53 above). Type 3 takes a byte operand and loads the field of a target object for actions such as collision detection.

Moving back to the top-level opcode 0x14 handler the next thing which is called is sub_15470. This is only three bytes before the first sub-function which was called. Again it is a clever code optimisation. This sub-function simply shifts ax right by 3 and then falls into the code we just reversed above. So the first byte operand of opcode 0x14 is used to encode how two further arguments are fetched. Reversing the handling of the second byte operand shows that it functions the same way.

After fetching all its (variable length) operands the handler for opcode 0x14 then calls sub_13809. I won’t paste the code here because the function is somewhat large, and calls a bunch of other sub-functions, all of which are also large. This is one of those instances where I just didn’t bother reversing what the function did because I was later able to guess it from context.

CISC Virtual Machine

The instructions in The Lost Vikings virtual machine are variable length. In the case of opcode 0x14, and other similar opcodes, determining the length requires decoding some of the operands. Having variable length instructions means that programs need to be iteratively disassembled, determining at least the number of operands for each new opcode in the program. If the opcodes where all fixed length it would be possible to skip opcodes that were unknown.

There are also instructions which perform common operations that many programs use. For example many programs add or subtract from the current position of the object based on its direction. This could be encoded as several instructions, but opcode 0x91 takes a single word operand as an offset in DS and then adds or subtracts the current temporary register value from it based on the current objects horizontal direction (flag 0x0040). The following pseudo code illustrates the operation of opcode 0x90.

if (this.flags & 0x0040) DS[operand] -= var; else DS[operand] += var;

This results in far fewer bytes in the encoding, especially since this operation is performed by several of the programs. Back in the DOS days when disk space and computation time where at a premium this makes perfect sense. For the modern day reverser it simply adds a bunch more opcodes that need to be understood.

Disassembly Tool

I wrote a simple Python script which can disassemble the virtual machine programs into something that looks vaguely like C. It works by linearly decoding instructions in the program, storing jump and call addresses. When it hits a function which terminates or redirects code-flow, such as a return or unconditional jump, it stops and checks to see if there are any unvisited addresses in the program.

The developers have used similar optimisation tricks in the virtual machine programs to the ones used in game binary. For example, some exit paths of subroutines in the virtual machine programs are reused by jumping to them from other subroutines. The disassembler works around this by decoding the instructions twice, so two subroutines may have code with the same addresses.

The disassembler is able to squash accesses to the temporary register to make the resulting code more readable. For example the virtual machine programs often contain sequences like:

var = 0x1000; obj->field_10 = var;

Which the disassembler will refactor to:

obj->field_10 = 0x1000;

The disassembler does not attempt to refactor code blocks into constructs like if or while blocks, so the resulting programs are spaghetti code. Some of the larger programs become a reversing project in their own right.

A complete program

I started with the program for the gun turret in the top-left of the first (object class 0x04). It seemed like it should be a fairly simple program. The gun turret shoots bullets at a fixed interval, and it cannot be destroyed by the vikings. The following shows the output of my dissassembler. Above each opcode is a comment showing the address in square brackets, the opcode in parens and then the operands.

The turret program looks like this:

main_374d: { // [374d] (19) 37bc this.field_21 = 0x37bc; this.field_22 = 0x0001; // [3750] (97) 00 01 if (0x0001 & (1 << 0)) var = 0x0001; // [3753] (00) this.save_state(); // [3754] (9c) 18 08 if (!(var)) this.db_flags &= ~(1 << 12); label_3757: // [3757] (2f) vm_func_2f(); // [3758] (00) this.save_state(); // [3759] (01) nop(); // [375a] (51) 0000 // [375d] (8c) 30 3798 if (this.field_30 != 0x0000) call sub_3798; // [3761] (52) 1c // [3763] (77) 0000 3757 if (this.update_time != 0x0000) jump label_3757; // [3768] (19) 37c8 this.field_21 = 0x37c8; this.field_22 = 0x0001; // [376b] (52) 1e // [376d] (57) 0206 g_tmp_a = this.xoff; // [3770] (51) 0004 // [3773] (91) 0206 if (this.flags & 0x0040) g_tmp_a -= 0x0004; else g_tmp_a += 0x0004; // [3776] (52) 20 // [3778] (57) 0208 g_tmp_b = this.yoff; // [377b] (51) fff8 // [377e] (5a) 0208 g_tmp_b += 0xfff8; // [3781] (51) 001e // [3784] (56) 1c this.update_time = 0x001e; // [3786] (02) 3030 vm_func_02(0x3030); // [3789] (14) 12 0206 0208 00 0000 0000 05 new.x = g_tmp_a; new.y = g_tmp_b; new.unknown_a = 0x0000; new.unknown_b = 0x0000; new.unknown_c &= 0x801; new.type = 0x05; spawn_obj(new); // [3795] (03) 3757 jump label_3757; } sub_3798: { // [3798] (51) 000a // [379b] (73) 30 37a5 if (this.field_30 == 0x000a) jump label_37a5; // [379f] (51) 0000 // [37a2] (56) 30 this.field_30 = 0x0000; // [37a4] (06) return; label_37a5: // [37a5] (52) 32 // [37a7] (69) 0a 37ba if (this.argument < this.field_32) jump label_37ba; // [37ab] (52) 32 // [37ad] (5c) 0a this.argument -= this.field_32; // [37af] (51) 0000 // [37b2] (56) 30 this.field_30 = 0x0000; // [37b4] (51) 0000 // [37b7] (56) 32 this.field_32 = 0x0000; // [37b9] (06) return; label_37ba: // [37ba] (0e) vm_func_0e(); // [37bb] (10) this.update(); }

There are a few things to note here:

The this.save_state() function is opcode 0x00. It acts a yield allowing the program to save where it was and return to the game engine code.

My dissasembler names some of the object fields that I had worked out. In this program you can see the xoff, yoff, flags and update_time are named.

Similarly DS offsets that are known are also named. g_tmp_a and g_tmp_b are DS offsets 0x0206 and 0x0208.

Opcode 0x14 is used here. I’ve named it spawn_obj(), because from context it is used to spawn a new bullet object each time the turret fires.

Opcode 0x19 is assigning to object fields 0x21 and 0x22. Notably field 0x21 is being assigned a value that looks like a program address. From reading the code/experimentation this address is actually an offset for instructions in a smaller secondary virtual machine which controls how the objects graphics are displayed. I’ve not yet reversed this code.

Refactoring the program makes it easier to understand what is going on:

main_374d: { /* Turret initialisation */ this.set_graphics_prog(0x37bc); if (0x0001 & (1 << 0)) var = 0x0001; this.save_state(); if (!(var)) this.db_flags &= ~(1 << 12); label_3757: /* * Main loop for the turret. This yields to the main * game engine on each iteration. */ while (1) { vm_func_2f(); this.save_state(); /* * The functionality of this is unknown. The this.argument * field can be set for an object instance in the level * header. It may be used to make some turrets behave * slightly differently. */ if (this.field_30 != 0x0000) { if (this.field_30 != 0x000a) { this.field_30 = 0x0000; } else { if (this.argument < this.field_32) { vm_func_0e(); this.update(); } else { this.argument -= this.field_32; this.field_30 = 0x0000; this.field_32 = 0x0000; } } } /* If the update time reaches zero then fire a bullet */ if (this.update_time == 0x0000) { this.set_graphics_prog(0x37c8); /* Reset the update time */ this.update_time = 0x001e; vm_func_02(0x3030); /* * Spawn a bullet object 4 pixels in front of the * turret and 8 pixels above its center line. */ if (this.flags & OBJ_FLAG_FLIP_HORIZONTAL) new.x -= 4; else new.x += 4; new->y = this.yoff - 8; new->unknown_a = 0x0000; new->unknown_b = 0x0000; new->unknown_c &= 0x801; new->type_d = 0x05; spawn_obj(new); } } }

Note that there are still a few sections of the program and some opcodes that I have not identified the function of, but the behaviour of the turret can be easily seen. The update_time object field is used to encode the rate the turret fires at. The program itself does not (at least obviously) decrement this value. Possibly one of the unknown opcodes performs this operation, or some part of the main game engine may do it.

Game Hacking

Being able to view the original programs used by the game is interesting, but the cool part about having a virtual machine implementing the object behaviour is the possibility of modifying or creating new programs. This would be much easier if I were to write a compiler (todo list), but for now this requires poking values manually in a hex editor.

Again I chose to modify the gun turret program since it is quite simple. My first idea was to change the object type that the turret shoots. I had mixed success with this. Attempting to make it shoot some objects, like the green alien, will just result in the game crashing or freezing. Other object types just result in the turret shooting nothing. I was able to make it shoot fire arrows (these are normally a power-up that one of the vikings can use). It is also easy to make the turret shoot more quickly, by lowering the value of the update_time field.

The second modification I made a bit more ambitious: making the turret walk instead of shoot. The opcode (0x14) for spawning the bullet is quite long:

14 12 0206 0208 00 0000 0000 05

The virtual machine (quite generously) provides a single byte nop instruction (0x01), which can be used to remove this instruction. The instructions above have already set DS[0x0206] to the current x position of the turret plus or minus 4 based on its direction, so I just needed to add instructions to set the current x position to that value. This requires only two instructions:

53 0206 // var = DS[0x0206]; 56 1e // this.x = var;

With this change the turret will walk, but will not be obstructed by anything. I did attempt to make it do collision detection and turn around, but I couldn’t get it work properly. More reversing required.

I put together a small video showing the modifications:

I’ve also made my dissassembler tool available:

https://github.com/RyanMallon/TheLostVikingsTools/blob/master/dissassembler/lv_vm_disasm.py