Decode

I fetched my first 8 bits from the memory address 0x0100 , next I need to decode the operation. While I’ve seen pretty sophisticated ways the decode CPU instructions, it’s actually very often done using just a single switch/case statement where each operation is its own case.

So let’s say I’ve got some function to fetch 8 bits from memory and store them in a variable called op :

uint16_t PC = 0x0100;

uint8_t op = fetchOp(PC);

I then construct a switch/case to handle each possible operation:

switch (op) {

case 0:

// NOP aka no operation

break;

case 0x76:

// HALT

break;

…

}

Some operations take arguments, for example the operation 0xC6 takes another 8 bits and adds them to the value stored in register A. So, in case of a 0xC6 instruction the CPU needs to also fetch the next 8 bits. I can reuse the same fetchOp function for that since its only job is to fetch 8 bits from memory.

…

case 0xC6:

uint8_t value = fetchOp(PC + 1);

// add value to register A

…

As you can see, I will have to advance the program counter (PC) by one in order to read the next 8 bits. This is usually done inside the fetchOp function, so that every read increments the counter automatically:

uint8_t fetchOp() {

return memory[PC++];

}

With fetchOp implemented like that it’s no longer necessary to advance the counter outside of the function.

Execute

Now that instructions can be fetched and decoded, all that’s left to do is to execute them. All a CPU needs for that is access to some memory as well as its internal registers. Let’s have a closer look at the instruction set for the DMG CPU.

8 Bit Load Commands

These are operations where 8 bits of data are going to be copied from one place to another. Some examples:

Load the contents of one register into another register (e.g. A to B)

Load an arbitrary value of 8 bits into a given register

Load the contents of a location in memory into a register

Load the contents of a register into a location in memory

16 Bit Load Commands

Although the Gameboy’s CPU is an 8 bit CPU it has some limited ability to work with junks of 16 bit data. All memory addresses are 16 bit long and so is the program counter (PC) and the stack pointer (SP) since they point to locations in memory. In fact, as already mentioned all registers are described as 16 bit registers AF, BC, DE, HL, SP and PC and can be fed directly with 16 bit values. In total there are four 16 bit load commands available:

Load an arbitrary value of 16 bits into a register (e.g. BC)

Load the contents of the HL register into the stack pointer (SP)

Push the contents of a register onto the stack

Pop the contents of the stack into a register

The last two commands may require further explanation. I already introduced the stack pointer register (SP) by mentioning it’s pointing to a special location in memory called the stack. A stack is basically exactly what it sounds like: A place to put some data aside for later use. It’s often used together with branching or subroutines. Those are the times when a program needs to save its registers on the stack before doing something else and popping them off the stack when it’s done.

8 Bit Arithmetical and Logical Operations

Besides copying and moving around data, a CPU usually does a lot of arithmetical and logical calculations. It’s no surprise that the DMG CPU has quite a few instructions dedicated to those kinds of tasks:

Adding the value of a given register to the value stored in register A (the accumulator)

Adding an arbitrary value of 8 bits to the value stored in register A

Add the contents of a location in memory to the value stored in register A

The same operations as above but for subtraction instead of addition

Increment or decrement a value stored in a register

Increment or decrement the contents of a location in memory

Logical operations AND, OR, and XOR (exclusive or) of a register with register A

Logical operations AND, OR, and XOR of an arbitrary value with register A

Logical operations AND, OR, and XOR of the contents of a location in memory with register A

Addition and subtraction with respect to certain flags stored in register F

The last point is kind of important and prone to errors, at least in my experience. The flag register (F) holds four flags:

The Zero flag

The Subtraction flag

The Half Carry flag

The Carry flag

When dealing with arithmetical and logical operations, these flags are often being set, reset and even incorporated into the calculation itself. The Zero flag for example is being set to 1 when the result of an operation has been 0 and reset otherwise. The Subtraction flag is set to 1 when the preceding operation was a subtraction and reset otherwise. The Carry and Half Carry flags are used to indicate that a certain type of carrying has occurred during the operation. Remember when you did addition and subtraction by hand in school? You had to carry numbers from one column to the next. That’s exactly what carry means here, too.

16 Bit Arithmetical Operations

There’s a limited amount of 16 bit arithmetical operations available on the DMG CPU. These not that different from their 8 bit counterparts except they operate on the 16 bit registers BC, DE, HL and SP — and they disregard all the flags completely which makes them much easier to implement.

Rotate and Shift Operations

Another useful operation with binary data is bit shifting and rotation. Let’s take the 8 bit expression 00000001 ( 1 in decimal) and shift it to the left by one. The result is 00000010 which is 2 in decimal. Shift it another time to the left and you get 00000100 ( 4 in decimal). Each shift left is equivalent to a times 2 operation. It’s apparent how this can be useful.

Rotation is basically the same as shift with the only difference being that the the data is wrapped around at the edges of the 8 bit values. For example, shifting 11000000 left would result in 10000000 while rotating 11000000 left results in 10000001 .

The DMG CPU supports the following rotate and shift commands:

Rotate the contents of a given register left or right

Rotate the contents of a given register left or right through the carry flag (in the example above the leftmost bit would be rotated into the carry flag and whatever value was stored in the carry flag would be rotated into the rightmost bit: 11000000 0 → 10000000 1 )

→ ) Rotate the contents of a location in memory left or right

Rotate the contents of a location in memory left or right through the carry flag

Shift the contents of a given register left or right into the carry flag

Shift the contents of a given register right into the carry flag without changing the leftmost bit

Shift the contents of a given memory location left or right into the carry flag

Shift the contents of a given memory location right into the carry flag without changing the leftmost bit

Swap the leftmost 4 bits and the rightmost 4 bits of the contents of a given register ( 11110000 → 00001111 )

→ ) Swap the leftmost 4 bits and the rightmost 4 bits of the contents of a location in memory

Single Bit Operations

The Gameboy’s CPU also supports a few operations on single bits instead of whole bytes:

Set a single bit of a register to 1

Reset a single bit of a register (set it to 0 )

) Set a single bit of a byte value at a location in memory

Reset a single bit of a byte value at a location in memory

Test a single bit of a register (the Zero flag will be set to 1 if the bit is 0 )

if the bit is ) Test a single bit of a value at a location in memory

CPU Control Operations

There are a couple of commands to control the CPU itself:

HALT. Halt all operations on the CPU until an interrupt occurs.

NOP. Don’t do anything. This is a useful command to control timing — a topic that will be discussed further down.

Complement the carry flag: 1 becomes 0 and 0 becomes 1

becomes and becomes Reset the carry flag

Enable or disable interrupts. Interrupts will be discussed in a future post.

Jump Operations

I already mentioned branching and subroutines in connection to the stack. Branching and subroutines are implemented using jump, call and return operations. The DMG CPU supports the following operations:

Jump to an arbitrary location in memory. This sets the program counter (PC) to that location.

Jump to the location stored within the register HL

Jump to an arbitrary location if a certain condition is met (i.e. Zero flag set, Zero flag reset, Carry flag set, Carry flag reset)

Jump to a location relative to the current program counter (PC)

Jump to a location relative to the current program counter (PC) if a certain flag-related condition is met

CALL. Push the address of the next instruction onto the stack and execute a jump to an arbitrary location

Push the address of the next instruction onto the stack and execute a jump to an arbitrary location if a certain condition is met

RETURN. Pop the location of the next instruction from the stack and jump there

Pop the location of the next instruction from the stack and jump there if a certain condition is met

Pop the location of the next instruction from the stack and jump there while also enabling all interrupts

That’s it for the DMG CPU’s instruction set. With these operations implemented, every Gameboy game can be executed — in theory. There’s a few things missing like an actual memory implementation and some necessary interrupts.

Timing

I briefly mentioned timing when discussing the NOP operation. Getting the timing right is an important aspect when building an emulator. I want to make sure my emulated CPU runs as closely as possible to the correct clock speed.

The DMG CPU is clocked at 4.19 MHz which is a bit misleading since every operation takes at least 4 clock cycles to complete. This means that, at best, you can only get one operation executed every 4 ticks. This is also why some consider the Gameboy’s CPU to only run at an actual speed of 1.05 MHz.

Regardless of whether you want to consider the CPU to be clocked at 1 MHz or 4 MHz, it’s important to count clock cycles for each operation. Some operations take considerably longer to execute than others and you want to have this reflected by an emulator as precisely as possible. This will become apparent when implementing graphics and sound output. For example: a horizontal line to the display is generated once every 116 ticks considering the CPU to be clocked at 1 MHz, or once every 464 ticks considering the clock speed to be 4 MHz. I will be discussing this in more detail when talking about the graphics implementation.

Final Thoughts

This post has been fairly technical but in order to complete this project, it’s important for me to understand the actual core of the Gameboy.

If you’d like to see my current implementation of the DMG CPU you can do so here. Keep in mind that I wouldn’t consider this a particularly elegant implementation, though. I’d probably do a few things differently next time. Most and foremost I’d strongly consider making use of C’s union type for the register implementation:

union {

struct {

uint8_t A;

uint8_t F;

};

uint16_t AF;

}

Anyway, the implementation works and that’s the most important thing for now.