The Nintendo DS boot process involves three parts – two BIOS ROMs for the ARM CPUs and a firmware image. While it is feasible to high-level emulate all three components and directly load a game ROM on start-up, such a task seems daunting early in a DS emulator’s life, as it did for mine. Thus, in order to gain a more comprehensive understanding of the DS, I decided that my first goal would be to low-level emulate booting from the BIOS. This turned out to be far harder than I expected. It was definitely an interesting experience, and I learned a lot from it, so I’d like to share some of the grisly details of the DS BIOS (here I will refer to both BIOS ROMs collectively unless stated otherwise).

The basics

First of all, what does the DS BIOS do? It is not simply a boot ROM like that present in the Game Boy. Rather, the BIOS also provides many useful services to games through the use of SWIs (software interrupts). These functions range from waiting for a specific interrupt to occur to providing decompression methods. Some of these services can be difficult to high-level emulate well, so many emulators require their users to dump the BIOS files from their DS. This provides more impetus to low-level emulate the boot process, at least in the beginning.

The BIOS also contains an exception vector table – simply put, a list of branch instructions leading to specific regions in the BIOS. The ARM architecture provides a form of exception handling – whenever an exception is fired, the CPU jumps to a specific instruction in the exception vector table depending on the type of exception. However, the DS mostly handles a subset of the supported exceptions – reset, IRQ (interrupt request), and SWI. Other exceptions are either not possible due to a lack of hardware or simply lead to a debugging function.
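
As a rough illustration, the lookup from exception type to vector address can be sketched like this. The offsets are the standard ARM ones and the two base addresses are the reset addresses mentioned in this post; the function and table names are made up for the example.

```python
# Standard ARM exception vector offsets, relative to the vector base.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined": 0x04,
    "swi": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}

ARM9_VECTOR_BASE = 0xFFFF0000  # ARM9 vectors sit at the top of the address space
ARM7_VECTOR_BASE = 0x00000000  # ARM7 vectors sit at the bottom


def exception_vector(base, kind):
    """Return the address the CPU jumps to when `kind` is raised."""
    return (base + VECTOR_OFFSETS[kind]) & 0xFFFFFFFF
```

For instance, exception_vector(ARM9_VECTOR_BASE, "irq") gives 0xFFFF0018, which is where the ARM9 lands whenever an interrupt request fires.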

Next we’ll examine exactly how booting works, as well as the difficulties that arise from the process.

Booting the ARM9

Execution begins at memory address 0xFFFF0000, where the reset exception vector is located. This address holds a branch instruction that goes to this code:

Some explanation is required. For those unfamiliar with the ARM architecture, the ARM9 and ARM7 each have sixteen 32-bit registers. Most of these are general-purpose, but some are usually only used for specific tasks, such as PC (program counter), SP (stack pointer), and LR (link register). The first two are self-explanatory, but the LR is interesting – when a function is called in ARM assembly, the return address is stored in LR. The first two lines of code store the value 4 into LR if it was equal to zero when the reset vector was called. I’m not sure exactly why this is necessary, but perhaps it has something to do with debugging – a value of zero might indicate problems with null pointers.

The rest of the code has to do with the POSTFLG I/O register. This register is zero when the DS first powers on and is set to one after the booting process is complete. Thus, if the reset vector is called in the middle of game execution (likely due to stray pointer bugs), the BIOS prepares to enter the exception debugging function. This “safeguarding” code is also present in the ARM7 BIOS, and it seems that accidentally calling the reset vector was a common enough issue that Nintendo believed it was necessary to provide some protections. What follows next is simply hardware initialization – setting stack pointers and the like.

The CP15

The only interesting part of the boot process here is the CP15, a coprocessor included within the ARM9 that handles memory configuration as well as other functions not particularly relevant to the DS. The important parts here are TCM (tightly-coupled memory) and the cache. Despite the ARM9 having a 66 MHz clock speed compared to the ARM7’s 33 MHz, the ARM9 has a severe hardware bug that makes accessing most memory much slower than it should be. Depending on the memory type, the ARM9 can end up being around two to four times slower than the ARM7. The TCM and cache are the only ways to alleviate this issue, as they can be accessed at the full 66 MHz clock speed. The BIOS has to configure both by interacting with the CP15, which mainly consists of data transfers. As TCM is used in part to configure how interrupts are handled, it’s necessary to at least emulate this part; the cache can be ignored for the purposes of booting.
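
To give an idea of what emulating the TCM side involves, here is a minimal sketch of decoding a TCM region register and routing accesses to a TCM buffer. It assumes the layout I understand the ARM946E-S to use (base address in bits 12-31, size encoded in bits 1-5 as 512 << N); the class, the mirroring behaviour, and the example value are illustrative assumptions, not taken from the BIOS.

```python
def decode_tcm_register(value):
    """Split a CP15 TCM region register into (base, virtual size in bytes).

    Assumed layout: bits 12-31 = base address, bits 1-5 = N where
    the region covers 512 << N bytes.
    """
    base = value & 0xFFFFF000
    size = 512 << ((value >> 1) & 0x1F)
    return base, size


class Tcm:
    """A tightly-coupled memory block mapped at a configurable address."""

    def __init__(self, reg_value, physical_size):
        self.base, self.virtual_size = decode_tcm_register(reg_value)
        self.data = bytearray(physical_size)

    def contains(self, addr):
        return self.base <= addr < self.base + self.virtual_size

    def read8(self, addr):
        # Assumption: the physical memory mirrors across the virtual region.
        return self.data[(addr - self.base) % len(self.data)]

    def write8(self, addr, value):
        self.data[(addr - self.base) % len(self.data)] = value & 0xFF
```

The memory bus would call contains() first on every ARM9 access and only fall through to main memory when it returns False, so that TCM takes priority over whatever else is mapped at that address.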

There isn’t much else to say about the ARM9. Throughout the initialization procedure, the two CPUs use the IPCSYNC register to synchronize with each other. This works by sending a four-bit value (ranging from 0 to 15) to the other processor and waiting in a busy loop until the other CPU sends a certain value back. At a certain point, the ARM9 sticks itself in a busy loop until the ARM7 has finished all of its tasks, which brings us to our next topic…
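
The handshake can be modelled with very little code. The sketch below assumes the commonly documented register layout (bits 8-11 hold the value being sent, bits 0-3 the value received from the other CPU); the class and the wait_for helper are hypothetical names for illustration only.

```python
class IpcSync:
    """Models the pair of IPCSYNC registers shared by the two CPUs."""

    def __init__(self):
        self.sent = {"arm9": 0, "arm7": 0}

    def write(self, cpu, value):
        # Assumed layout: bits 8-11 carry the value sent to the other CPU.
        self.sent[cpu] = (value >> 8) & 0xF

    def read(self, cpu):
        other = "arm7" if cpu == "arm9" else "arm9"
        # Bits 0-3: value received; bits 8-11: value this CPU last sent.
        return self.sent[other] | (self.sent[cpu] << 8)


def wait_for(ipc, cpu, expected, step_other_cpu, max_steps=1000):
    """Busy-wait until `cpu` receives `expected`, running the other CPU meanwhile."""
    for _ in range(max_steps):
        if (ipc.read(cpu) & 0xF) == expected:
            return True
        step_other_cpu()
    return False
```

In a real emulator the busy loop lives in the BIOS code itself; the point of the sketch is only that a write on one side must become visible in the low four bits on the other side, otherwise both CPUs spin forever.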

Booting the ARM7

The ARM7 mostly has the same hardware initialization procedures, save for the fact that it doesn’t have a CP15 (and by extension, TCM and cache). The other little difference is that the ARM7 begins execution at 0x00000000 instead of 0xFFFF0000. However, the ARM7 has additional responsibilities, such as setting up the RTC (Real Time Clock) and loading the firmware and game cartridge into memory. In fact, the ARM7 is the only CPU with access to the RTC and the firmware, as well as other hardware such as WiFi and the touchscreen.

The RTC isn’t strictly necessary to emulate, but I chose to do so anyway. The DS interacts with this by bitbanging an I/O register – only a single bit can be transferred at a time. The actual functionality of the RTC isn’t that complicated; the only tricky part is figuring out how to put these bits together. It’s worth noting that the time function on the RTC is used as a PRNG for the game cart code – more on that later.
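
Putting the bits together amounts to a small shift register. Here is a minimal sketch; the bit order (least significant bit first) is an assumption on my part and should be checked against hardware documentation, and the class name is invented for the example.

```python
class BitReceiver:
    """Collects bits written one at a time and assembles them into bytes."""

    def __init__(self):
        self.bits = 0
        self.count = 0
        self.bytes_received = []

    def push_bit(self, bit):
        # Assumption: bits arrive least significant bit first.
        self.bits |= (bit & 1) << self.count
        self.count += 1
        if self.count == 8:
            self.bytes_received.append(self.bits)
            self.bits = 0
            self.count = 0
```

Each write to the RTC register that toggles the clock line would call push_bit with the current data bit; once a full byte has arrived it can be interpreted as a command or a parameter.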

The firmware lies outside of the ARM7’s memory range and is instead accessible via the Serial Peripheral Interface (SPI) bus (this is also how the ARM7 pulls data from the touchscreen). The ARM7 will first configure the SPI by setting a control register, and then it begins data transfers by writing to a strobe register. One byte is sent or received at a time – a seemingly slow process, but the bulk of the data is only received during start-up. Once again, not a difficult process.
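
A byte-at-a-time firmware read can be sketched as a tiny state machine. This assumes the firmware chip behaves like a conventional SPI flash, where opcode 0x03 followed by a three-byte address starts a sequential read; the class and method names are hypothetical.

```python
class FirmwareFlash:
    """Toy SPI flash: one byte in, one byte out per transfer."""

    READ = 0x03  # conventional SPI flash "read data" opcode (assumed here)

    def __init__(self, image):
        self.image = bytes(image)
        self.deselect()

    def deselect(self):
        """Chip select released: forget the command in progress."""
        self.command = None
        self.addr_bytes = 0
        self.address = 0

    def transfer(self, value):
        """Exchange one byte with the chip and return the byte it sends back."""
        if self.command is None:
            self.command = value
            return 0
        if self.command == self.READ:
            if self.addr_bytes < 3:
                # The address arrives most significant byte first.
                self.address = (self.address << 8) | value
                self.addr_bytes += 1
                return 0
            out = self.image[self.address % len(self.image)]
            self.address += 1
            return out
        return 0  # other commands are not modelled in this sketch
```

Every write to the strobe register would call transfer() once, and the returned byte is what the ARM7 reads back from the same register afterwards.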

This pales in comparison to what happens next.

Loading the secure area

The game cartridge also lies outside accessible memory and is accessed through I/O registers. To interact with the cart, the DS sends an eight-byte command, and several cycles later, it reads one word (four bytes) of data. Once the DS has read a word, the cartridge will automatically send more data until a sufficient amount has been transferred. This process by itself is similar to using the SPI.
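
In emulator terms, a cartridge transfer boils down to an eight-byte command plus a payload that is drained one word at a time. The following is a minimal sketch with invented names; padding a short final word with 0xFF is an assumption, and the delay of "several cycles" is left out entirely.

```python
class CartTransfer:
    """One cartridge command and the data the cart sends back for it."""

    def __init__(self, command, payload):
        if len(command) != 8:
            raise ValueError("cartridge commands are eight bytes long")
        self.command = bytes(command)
        self.payload = bytes(payload)
        self.position = 0

    def busy(self):
        """True while there are still words left to read."""
        return self.position < len(self.payload)

    def read_word(self):
        """Return the next 32-bit little-endian word of the response."""
        chunk = self.payload[self.position:self.position + 4]
        self.position += 4
        # Assumption: reads past the end of the payload return 0xFF bytes.
        return int.from_bytes(chunk.ljust(4, b"\xff"), "little")
```

The I/O register handler would create one of these when the command is written and report the transfer as finished once busy() returns False.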

The start-up process works as follows: The DS sends a “dummy” command that probably acts as an activation signal on real hardware. Next, it retrieves the cartridge header – the first 512 bytes, containing information such as the locations in memory where the ARM9/ARM7 should begin execution of the game. After that, the DS retrieves the chip ID, which indicates certain things like the chip manufacturer and the size of the chip. Although the chip ID is not stored anywhere in the ROM, a fake but realistic one can be used instead in emulation. So far so good.
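
Pulling the interesting fields out of the header is straightforward. The sketch below uses the commonly documented offsets (game title at 0x00, the ARM9 block at 0x20, the ARM7 block at 0x30, each holding ROM offset, entry address, RAM address, and size); the function name and the shape of the returned dictionary are my own choices for illustration.

```python
import struct

HEADER_SIZE = 0x200  # the header is the first 512 bytes of the cartridge
FIELDS = ("rom_offset", "entry_address", "ram_address", "size")


def parse_header(header):
    """Extract the title and the ARM9/ARM7 load information from a header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header must be at least 512 bytes")
    title = bytes(header[0x00:0x0C]).rstrip(b"\x00").decode("ascii", errors="replace")
    # Each CPU has four consecutive little-endian 32-bit fields.
    arm9 = struct.unpack_from("<4I", header, 0x20)
    arm7 = struct.unpack_from("<4I", header, 0x30)
    return {
        "title": title,
        "arm9": dict(zip(FIELDS, arm9)),
        "arm7": dict(zip(FIELDS, arm7)),
    }
```

The entry addresses are where each CPU starts executing the game once the boot process hands over control.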

Then the DS sends a command indicating that all further commands will be encrypted, which is where the problems begin.

Nintendo saw fit to encrypt most game ROMs using the Blowfish algorithm. However, this only applies to the first 2 KB of an 8 KB region called the “secure area.” The first eight bytes of the secure area consist of the doubly encrypted string “encryObj” – if decrypted successfully, this string is overwritten in memory with the word “0xE7FFDEFF”, but if not, the entire 2 KB is overwritten (as you can imagine, this doesn’t bode well for running the game). ROM dumps, however, tend to have this region already decrypted, so the emulator has to manually re-encrypt the secure area if one wishes to boot from the BIOS. Furthermore, as indicated above, the commands sent to the cartridge are also encrypted, so the emulator must decrypt them as well.
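
The check that follows decryption can be sketched on its own, leaving Blowfish and its key tables aside. This assumes the buffer passed in has already been decrypted; writing the word twice to cover all eight bytes, and filling the 2 KB with that same word on failure, are assumptions for the example rather than something I have confirmed.

```python
import struct

SECURE_AREA_ID = b"encryObj"
DESTROYED_WORD = 0xE7FFDEFF
ENCRYPTED_SIZE = 0x800  # only the first 2 KB of the secure area is encrypted


def finalize_secure_area(secure_area):
    """Validate a decrypted secure area (a bytearray) in place.

    Returns True if the ID string was found, False if the area was wiped.
    """
    word = struct.pack("<I", DESTROYED_WORD)
    if bytes(secure_area[:8]) == SECURE_AREA_ID:
        # Success: hide the ID string (assumed: both words are replaced).
        secure_area[:8] = word * 2
        return True
    # Failure: the whole encrypted region becomes unusable (fill value assumed).
    secure_area[:ENCRYPTED_SIZE] = word * (ENCRYPTED_SIZE // 4)
    return False
```

The same constant also gives an emulator a quick way to tell whether a ROM dump is already decrypted: if the secure area starts with 0xE7FFDEFF, it needs re-encrypting before the BIOS can boot it.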

With command encryption enabled, the DS will then retrieve the chip ID either once or twice (which must match the ID fetched earlier) depending on the value of the RTC. It will then load the 8 KB secure area in a random order using four 2 KB blocks. The BIOS then deactivates command encryption, synchronizes with the ARM9, and prepares to execute firmware code.

To say there is room for error here is an understatement. My initial implementations of CorgiDS were marred by bugs on both the CPU and cartridge side. For example, a particularly silly bug was my misunderstanding of the behavior of the carry flag. On the Game Boy Z80, the carry flag is set whenever there is a borrow from a subtraction operation x – y; in simpler terms, the carry flag is set when x < y. I had assumed this to also be the case for the DS, and only after vigorous debugging did I find out the ARM architecture uses the opposite behavior: the carry flag is set when there is NO borrow, or when x >= y. This wasn’t the only issue: many stupid bugs that only surfaced when the BIOS got to this part cost me a couple of weeks of figuring out why nothing was working. Many times I wanted to completely give up, to shove this project aside as a failed experiment.
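
The difference is easy to see side by side. In this sketch (function names are mine), the ARM version computes the subtraction the way the hardware adder does, as x + NOT(y) + 1, and takes the carry out of bit 32, which is why the flag ends up meaning "no borrow".

```python
def arm_sub(x, y):
    """32-bit ARM-style subtraction: carry is set when there is NO borrow."""
    total = (x & 0xFFFFFFFF) + (~y & 0xFFFFFFFF) + 1
    result = total & 0xFFFFFFFF
    carry = total > 0xFFFFFFFF  # carry out of the adder, equivalent to x >= y
    return result, carry


def gb_sub(x, y):
    """8-bit Game Boy-style subtraction: carry is set when there IS a borrow."""
    result = (x - y) & 0xFF
    carry = x < y
    return result, carry
```

With x = 3 and y = 5, arm_sub leaves the carry clear while gb_sub sets it; with x = 5 and y = 3 the two are reversed. Getting this backwards breaks every conditional branch that depends on an unsigned comparison.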

But finally, after completing this last test of will, CorgiDS began to execute the firmware. The elation I felt from seeing this happen was simply unmatched.

Modern consoles may have more complicated boot-up processes, up to sporting a custom operating system. However, the DS is not to be taken lightly either. Its innocuous exterior hides quirky and downright buggy hardware that games are more than happy to take advantage of, and the BIOS only touches a small portion of it. Nevertheless, booting successfully represents a major step in any emulator, as it indicates a large chunk of the hardware has already been emulated. Once this step has been completed, the project is finally worthy of being called an emulator, even if there’s far more work ahead.

(The header image for this post is a screenshot of CorgiDS running the “hello world” example that comes with libnds)