Unbuckle your seatbelts and stretch your legs ladies and gentlemen. This is a long ride.



(Or you could skip straight to the fun: http://lf94.github.io/kdl2viewer/)





I recently bought a Super Game Boy (commonly abbreviated as SGB). The SGB is literally a Game Boy CPU crammed into a SNES cartridge. In order for Game Boy emulators to emulate the SGB, they have to either emulate the SNES as a whole, or simulate parts of it. For me, emulators like bgb simulate it well enough to be used for development. Only several games really make use of the SGB's capabilities anyways: Kirby's Dreamland 2, Space Invaders, Hercules, Megaman, etc.Why did I personally buy it though? Because playing games on a large TV is way easier on the eyes than a tiny screen. This makes, say, writing music on LSDJ way easier, especially if you spend hours at a time composing.Familiar with the cute little Dreamland citizen, I spent the next 2-3 days 100%ing the game with my girlfriend. Entirely cute, surprisingly difficult at times, and one epic final boss battle - I would easily rate this game 8/10. While playing I had few moments where I was just saying to myself "wow, this is actually a Game Boy game." Kirby's Dreamland 2 could have easily been an NES or SNES release ( given some cosmetic changes ). After some searching, turns out there's a Kirby's Dreamland 3 for SNES, except it's rare and worth at least $80!What led me to the reverse engineering? While playing there was a level that resembled a woman. I was curious what other Easter eggs the level designers created, or if there were any test or unused levels. Having experience with Game Boy assembly (my work-in-progress game is entirely in it) and reverse engineering Nintendo 64 games, I decided what the heck.It was time to pull up the rolling chair, and get to work.

I have never reverse engineered a Game Boy game before. I have never reverse engineered a level format before. I have never even written my own level format. It took a few minutes to figure out a vector of attack. I knew looking for something that changed the level felt like the right direction. OK, what changes levels…Option 3 might sound weird, but for majority of Game Boy games, "lazy loading" a level from ROM is a very common pattern.I don't know where the code that detects we've entered a door is, or selected a world, so option 3 it was. In retrospective this pretty naive, and probably should've taken a different approach, but it worked.The smart approach would've been to set a read breakpoint on where the joypad inputs are stored. From there, I would follow routines that check to see if more of a level should be loaded (because remember, levels are "lazy loaded"). Paired with dumping RAM between levels and reviewing the differences, this could've been effective.But no, I didn't do that. Instead, knowing that the Game Boy's working RAM (WRAM) is really small, I just eyeballed RAM changes. Yeah. Within 2 seconds I found values only changing when parts of the map were changing. I set a write breakpoint, and I was in the game's engine.

1.3 Working my way backwards

It turns out this format was way more complex than I initially thought. The people at HAL Laboratory did an excellent job at designing the level engine for this game. From "file system" design, to compression formats, to the lazy level loading. Just fantastic stuff - which made it all harder for me of course.





1.3.1 One fourth of a tile? What?









Instead of storing the tiles like [1,2,3,4], [1,2,3,4], the game instead says, "Hey, I need tile 2!", and goes like this:

Array1[2] Array2[2] Array3[2] Array4[2]



These arrays are essentially the "currently loaded graphics arrays". In WRAM, they're located at $C500, $C600, $C700 and $C800.



First thing I spent a few hours on was changing 1/4th of a tile. Not just any tile - every single tile that looked alike. In my head I was going, "what kind of format is this…I guess they're trying to save space?"The reality was, it was easier to create 2x2 tiles this way!Instead of storing the tiles like [1,2,3,4], [1,2,3,4], the game instead says, "Hey, I need tile 2!", and goes like this:Which creates a 16x16 graphic.These arrays are essentially the "currently loaded graphics arrays". In WRAM, they're located at $C500, $C600, $C700 and $C800.

1.3.2 Compression…format…in a Game Boy game? I set another write breakpoint on one of the graphics arrays' bytes to see what wrote it there. It was some routine that appeared to be checking for types! Could this be the level file format?!



No.



This was something I totally didn't expect to have to deal with: a compression format. The only other time I had seen a compression format in a Game Boy game is while reading a demo's description, saying they used a LZMA-type (similar to 7z) of compression. Oh frig. I have never dealt with compression before, but I persisted…

…and it honestly wasn't that bad.



The assembly detailed everything that was going on (it has to!). Sure it wasn't easy to follow at first, but as I watched it eat and digest every byte by byte, I could slowly see what it was doing with different data. Patience was the key here.

The compression format goes like this:



%TTTNNNNN, [Data]



Where T is "Type bit" and N is "Number".

There are 9 types. Default:

Read and write N bytes. Type 20:

Repeat next byte, N + 1 times Type 40:

Imagine we had data like this: 0209 What "42" would do, is it would write this out: 020902090209 It repeated 2 bytes N + 1 times. Type 60:

Next byte is written N + 1 times, but also has 1 added to the byte itself. Example: Start byte is 7E, the next is 7F, the next is 80, and so on… Type 80:

Copy N bytes, starting from the address of the next 2 bytes. Type A0:





I knew that data at $D900 was tile data, or at least was mostly used for it, but how does that data actually get there? What does it look like? It turns out, using 8 lines of assembly, it's procedurally generated. 1995 Game Boy procedurally generated graphics.



ld hl, $D900

.loop

ld b, $08

.loop2

rrc l

rla

dec b

jr nz, loop2

ldi [hl], a

inc a

jr nz, .loop



It looks like this:





Without this table, the tiles in the game are not complete, and have white spots all over them. Copy N bytes, starting from the address of the next 2 bytes. The difference here is, it reads the source byte, and that source byte is actually an index, which starts at $D900. It then writes the byte from the array to the destination.I knew that data at $D900 was tile data, or at least was mostly used for it, but how does that data actually get there? What does it look like? It turns out, using 8 lines of assembly, it's procedurally generated. 1995 Game Boy procedurally generated graphics.It looks like this:Without this table, the tiles in the game are not complete, and have white spots all over them. Type C0:

Copy N bytes, walking backwards. This basically "vertically" mirrors the top half. The next 2 bytes are again, the address to start at. Type F0:

So this is sort of an "expansion" byte, as in, storing a number in a nibble isn't enough, so this indicates that we need to use the next byte as the number.



The type byte itself translates into 1 of the 6 types above. The format is %111TTTTT, where T is the type. Doing a logical left shift 3 times gets the type. Type FF:

End of file



The routine for the compression format is located at $0708, and takes the following parameters: de - destination, hl - source.



I set a read breakpoint on the FF byte (end of file) to see where this routine would bring me after it was completed.

1.3.3 The golden table Having broken through the metaphorical barrier of compression code, it's nice to be on the other side. The real level loading format code was here. Around this point, I was about 8-10 hours in… It's definitely not as easy as I'm making it out to be in this article.



I took time to see what was happening in the code around my position. This meant lots of scrolling up, scrolling down, staring, and just thinking. I saw that $0708 (compression routine) was being called all over. There was another routine, $05DD, which changes the ROM bank right before the decompression. At each call, I set an execution breakpoint to stop before they were called to see what their destination and source were. I inspected every source but none really meant much. Then it dawned on me…lets see where the sources are being read from.



At the very top of the routine with all the decompression routine calls, there was 'call $1564'. Ah, the goldmine. $1564 contained the code that calculated the level index for the…level table. Yeah baby. I finally found the very start of this madness.



The level table starts at ROM bank 8, at $511F in the European version of the game. There are 176 levels, or more technically speaking, "level parts", in the entire game. How did I find that out? I eyeballed the last level entry, of course. There's nothing in the table that indicates its the end - like a regular array in other languages. You just know the size.



The guys and I on IRC had some fun warping around from level to level, trying to find unused content, but we stopped after 10 minutes or something. Honestly, if the guys at HAL went through all the trouble to create decent compression, why waste it on unused content? We'll talk more about this later. When things cooled down, I began my descent down address mountain.





1.3.4 The level format At this point, exhaustion began to settle in. It's not easy reverse engineering. There is a lot of plain old watching data being moved around. What really helps is seeing patterns you've already learned in higher languages, that you can now "see" at this low level. No, not because a compiler wrote translated high level code to assembly, but because a person with those higher level concepts wrote it.



You may or may not want to skip my adventure reversing the format. Find the heading "Properly rendering a level" if you do.



Instead of reading the assembly instruction for instruction, I start by selecting the first level entry, which happens to be the first level of the game. The entry leads to what looks like a bunch of gibberish, more entries, and who knows what else. I started with swapping the entries, since I knew what they looked like. Taking an entry from another level entry, it turns out the levels would swap graphics. OK, fantastic, now I know what's there. I did the same for the next 2, but the game crashed. That left me the graphics format to figure out first. The graphics



The address from the previous section led me to another address. At this point I wanted to figure out what was here, and not another address away. So I breakpointed after the address and began to watch the engine churn the data some more. I was surprised to find it writing to 4 different places in WRAM (working RAM, where, well, the work is done), instead of VRAM. Wait a minute… 4 places… maybe this is the data I was modifying at the very beginning. With some quick validation, it turns out that was the case. It was a translation-type table (or more correctly put, a map) that takes the tile number, and gives you the VRAM tile number, but four of them. Why four? Because four tiles equals one 2x2 "block". Here's a diagram of how I model it in my head:









I went back to the address we first encountered in this section and read breakpointed that. Within a minute I could clearly see it writing decompressed data to VRAM. So we have the graphics bit sorted out… or do we?



The address from the previous section led me to another address. At this point I wanted to figure out what was here, and not another address away. So I breakpointed after the address and began to watch the engine churn the data some more. I was surprised to find it writing to 4 different places in WRAM (working RAM, where, well, the work is done), instead of VRAM. Wait a minute… 4 places… maybe this is the data I was modifying at the very beginning. With some quick validation, it turns out that was the case. It was a translation-type table (or more correctly put, a map) that takes the tile number, and gives you the VRAM tile number, but four of them. Why four? Because four tiles equals one 2x2 "block". Here's a diagram of how I model it in my head:I want to also remind you that all of this data is still compressed. I would let the game do its decompression routine and then afterwards read or modify the data. In this specific case, the game would write out the data to $CF00, and then write it into 4 "sections": $C500, $C600, $C700, and $C800.I went back to the address we first encountered in this section and read breakpointed that. Within a minute I could clearly see it writing decompressed data to VRAM. So we have the graphics bit sorted out… or do we? I will not be wasteful, I will not be wasteful, I WILL NOT BE WASTEFUL



Those guys at HAL really out did themselves. Instead of being wasteful, at the beginning of every pixel data (or "tile data") block, there is a single byte that tells the location of where to start in VRAM. At first I was insanely confused by the code. The engine would get its complement, increment it by 1, swap its nibbles, then add $96 and $30. I'm not an assembly expert. When there's something I can't understand, I get help. A member from #gbdev on irc.efnet.net explained what was happening - it was actually subtracting that byte, by adding! Crazy, and unintuitive, but he gave an excellent example using decimals:



What do you get when you add 40 + 99? You get 139. Strip off the hundreds digit. 39. Whoa.



…



At least I find it neat.



Just to briefly summarize the above paragraph: the game essentially does $9630 + -(byte << 4).



I had only realized what this byte had done way after I was already rendering levels. Most of the levels, I would say 160/176 were rendering properly. Implementing this change made all levels render properly.