What I Learned About Emulation

Well, technically, everything. I went into the project knowing only what emulators do; I didn’t know the first thing about how one is actually built. And that is what I wanted to know. I wanted to literally crack the code of how you get a computer to run software as if it were some old, dead hardware. I’m not going to list every little concept I learned. If you want to learn everything, the best way to do so is to write your own emulator. All it requires is programming knowledge and good Google-fu. I’m just going to cover the interesting parts.

OPCODES

This was the first thing I had to wrap my head around. Just what is an opcode? I saw the term everywhere, but it felt like nobody ever said what it was. Even the guide I praised so highly above talks a lot about how to implement opcodes without ever explaining what they are. I’m going to spare you the quick Google search that leads to a Wikipedia article (lifesaver, I know). Opcode is short for operation code: the part of a machine code instruction that specifies which operation to perform. The rest of the instruction, if there is any, holds the operands, such as which register or memory address the operation should use.
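To make the "part of the instruction" idea concrete: every CHIP-8 instruction is 2 bytes long, and it is common to split it into 4-bit pieces (nibbles). The first nibble usually selects the operation and the remaining bits are operands. The sketch below is a hypothetical `decode` helper that uses the field names (X, Y, N, NN, NNN) found in many CHIP-8 references; it is an illustration, not the code from my emulator.

```python
def decode(opcode):
    """Split a 16-bit CHIP-8 instruction into commonly used fields (hypothetical helper)."""
    return {
        "op": (opcode & 0xF000) >> 12,  # first nibble: which kind of operation
        "x": (opcode & 0x0F00) >> 8,    # second nibble: usually a register index VX
        "y": (opcode & 0x00F0) >> 4,    # third nibble: usually a register index VY
        "n": opcode & 0x000F,           # last nibble: a 4-bit constant
        "nn": opcode & 0x00FF,          # last byte: an 8-bit constant
        "nnn": opcode & 0x0FFF,         # last 12 bits: a memory address
    }

fields = decode(0x6A8A)  # 6XNN means "set VX to NN"
print(fields["op"], hex(fields["x"]), hex(fields["nn"]))  # 6 0xa 0x8a
```

For 0x6A8A, the leading 6 is the operation ("load a constant into a register"), X = 0xA names register VA, and NN = 0x8A is the value to load.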

Let’s give a quick example from the CHIP-8 instruction set. One opcode is:

0x00E0 — Clear Screen

What does this mean? When a CHIP-8 program contains the instruction 0x00E0, the expected effect is that the screen gets cleared. So from an emulator’s point of view, you step through memory reading one instruction at a time, and when you reach 0x00E0, you know you have to go to whatever you are using to display graphics and blank the entire screen. In my code, I did this:

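```python
if self.opcode == 0x00E0:
    # Clear the screen: turn every pixel off.
    for i in range(len(self.graphics)):
        self.graphics[i] = 0
    # Tell the main loop that the screen needs to be redrawn.
    self.draw_flag = True
```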

So when I run across that opcode, I set every pixel to 0 (CHIP-8 pixels are either on or off, and 0 means off) and set a flag telling the program that the screen needs to be redrawn.
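The "step through memory reading one instruction at a time" part is usually called the fetch-decode-execute cycle. Here is a minimal sketch of one cycle, assuming programs are loaded at address 0x200 (the conventional CHIP-8 start address), that each instruction is two bytes stored high byte first, and hypothetical names `Chip8`, `pc`, and `cycle`. Only 00E0 is handled here; a real emulator would dispatch on all 35 opcodes.

```python
class Chip8:
    def __init__(self):
        self.memory = [0] * 4096
        self.graphics = [0] * (64 * 32)  # one entry per pixel on the 64x32 display
        self.pc = 0x200                  # program counter: address of the next instruction
        self.opcode = 0
        self.draw_flag = False

    def cycle(self):
        # Fetch: combine two consecutive bytes into one 16-bit instruction.
        self.opcode = (self.memory[self.pc] << 8) | self.memory[self.pc + 1]
        self.pc += 2  # every CHIP-8 instruction is 2 bytes long
        # Decode and execute: only 00E0 (clear screen) is handled in this sketch.
        if self.opcode == 0x00E0:
            for i in range(len(self.graphics)):
                self.graphics[i] = 0
            self.draw_flag = True

chip = Chip8()
chip.graphics[5] = 1                 # pretend one pixel is lit
chip.memory[0x200] = 0x00            # the instruction 0x00E0, split into two bytes
chip.memory[0x201] = 0xE0
chip.cycle()
print(hex(chip.opcode), hex(chip.pc), sum(chip.graphics), chip.draw_flag)  # 0xe0 0x202 0 True
```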

For those of you who know some assembly language, you are probably already familiar with opcodes. Well, kind of. Here is an example of an x86 instruction:

```asm
MOV eax, ebx  ; for those curious: copies the contents of the EBX register into EAX
```

See that MOV there? That is what is known as a mnemonic: a human-readable name for an opcode. Opcodes are part of the reason assembly languages were created, because it is not easy to remember every single machine language instruction. Imagine having to type something like 0x00E0 every time you wanted to clear the screen. The minuscule CHIP-8 instruction set alone has 35 opcodes, and memorizing those is already quite an undertaking. Imagine doing that for an instruction set with 100 opcodes! 200 opcodes! You just wouldn’t. That is why we have assembly languages (and higher-level languages). It is much easier to remember that MOV moves data around than to remember some arbitrary hexadecimal number.
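Going in the other direction shows why mnemonics help. The sketch below is a tiny, partial disassembler that turns raw CHIP-8 instructions back into readable text. The mnemonics (CLS, RET, JP, LD, ADD, DRW) follow the convention of Cowgod’s widely circulated CHIP-8 technical reference; other references spell them differently, only a handful of the 35 instructions are covered, and `disassemble` is a hypothetical name.

```python
def disassemble(opcode):
    """Return a mnemonic for a few CHIP-8 instructions (partial, for illustration)."""
    x = (opcode & 0x0F00) >> 8
    y = (opcode & 0x00F0) >> 4
    n = opcode & 0x000F
    nn = opcode & 0x00FF
    nnn = opcode & 0x0FFF
    if opcode == 0x00E0:
        return "CLS"                        # clear the screen
    if opcode == 0x00EE:
        return "RET"                        # return from a subroutine
    kind = opcode >> 12                     # first nibble picks the operation family
    if kind == 0x1:
        return f"JP {nnn:#05x}"             # jump to address NNN
    if kind == 0x6:
        return f"LD V{x:X}, {nn:#04x}"      # VX = NN
    if kind == 0x7:
        return f"ADD V{x:X}, {nn:#04x}"     # VX += NN
    if kind == 0xA:
        return f"LD I, {nnn:#05x}"          # I = NNN
    if kind == 0xD:
        return f"DRW V{x:X}, V{y:X}, {n}"   # draw an N-byte sprite at (VX, VY)
    return f"UNKNOWN {opcode:#06x}"

for op in (0x00E0, 0x6A8A, 0xA2F0, 0xD015):
    print(f"{op:04X}  {disassemble(op)}")
# 00E0  CLS
# 6A8A  LD VA, 0x8a
# A2F0  LD I, 0x2f0
# D015  DRW V0, V1, 5
```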

EMULATING REGISTERS AND MEMORY ISN’T AS SCARY AS IT SOUNDS

I went into this absolutely clueless about how people were representing registers and memory. I could easily see what the opcodes were doing. Addressing and RAM, however, are those scary words you learn about in your intro CS classes, learn how to use, and then forget the details of because binary is still scary.

Well, after creating the emulator, the fears I had about all of this seem almost funny. Let’s look at what registers and memory actually are in the CHIP-8 emulator.

The CHIP-8 has 16 general-purpose registers, named V0 through VF. Each register is 1 byte (8 bits) wide. How can we store 16 indexed, 1-byte values in a programming language? That sounds exactly like an array of characters, and in fact, the guide listed above represents the registers exactly that way. However, this emulator was written in Python, and representing byte-size data in Python isn’t exactly the smoothest thing in the world. What Python does have is the integer. A Python int is far bigger than a byte (it can grow to arbitrary size), but as long as our code emulates 8-bit overflow itself, for example by keeping only the low 8 bits of every result, there is no reason we can’t use it. So we would initialize the registers like this:
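Here is a sketch of what "emulating 8-bit overflow" can look like. It defines its own register list and implements two real CHIP-8 instructions: 7XNN (add NN to VX, with no carry flag) and 8XY4 (add VY to VX and set VF to 1 on carry, otherwise 0). The helper names `add_immediate` and `add_registers` are hypothetical. Masking with `& 0xFF` keeps only the low 8 bits, which is exactly how an 8-bit register wraps around.

```python
V = [0] * 16  # V0..VF; each entry is meant to stay within 0..255

def add_immediate(x, nn):
    """7XNN: VX += NN, wrapping at 8 bits; VF is left alone."""
    V[x] = (V[x] + nn) & 0xFF

def add_registers(x, y):
    """8XY4: VX += VY; VF becomes 1 if the true sum exceeded 255, else 0."""
    total = V[x] + V[y]
    V[x] = total & 0xFF                 # keep only the low 8 bits
    V[0xF] = 1 if total > 0xFF else 0   # carry flag

V[0x1] = 0xF0
V[0x2] = 0x20
add_registers(0x1, 0x2)   # 0xF0 + 0x20 = 0x110, which does not fit in a byte
print(hex(V[0x1]), V[0xF])  # 0x10 1
```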

```python
V = [0] * 16  # registers V0 through VF, all starting at 0
```

What if we wanted to set the value of the VE register to 0x8A? Well, it would be as simple as this:

```python
V[0xE] = 0x8A  # index 0xE is register VE
```

We have 16 working registers. Simple as that.
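Tying this back to opcodes: the CHIP-8 instruction 6XNN means "set VX to NN", so the assignment above is what executing the instruction 0x6E8A boils down to. A minimal sketch, with a hypothetical `execute_6xnn` helper:

```python
V = [0] * 16

def execute_6xnn(opcode):
    """6XNN: load the constant NN into register VX."""
    x = (opcode & 0x0F00) >> 8   # which register (0x0..0xF)
    nn = opcode & 0x00FF         # the 8-bit value to store
    V[x] = nn

execute_6xnn(0x6E8A)
print(hex(V[0xE]))  # 0x8a
```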

Now you may be thinking: alright, registers are easy, but how in the world do I represent 4KB of memory? Let’s look at what memory actually is. In this sense, memory is just a collection of values, each reachable by an address. The fact that we reach memory through an address means that it is indexed. 4KB means there are 4096 of these addresses (for the uninitiated, a KB here is 1024 bytes, not 1000. Strictly speaking, 1024 bytes is a kibibyte, but that is what people generally mean, especially when talking about binary addressing).
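The numbers line up neatly in binary: 4 × 1024 = 4096 = 2^12, so addresses run from 0x000 to 0xFFF and fit in exactly 12 bits. That is why CHIP-8 instructions that take an address, such as 1NNN (jump), reserve exactly three hex digits (NNN) for it. A quick check, with illustrative constant names:

```python
KIB = 1024                 # one kibibyte
MEMORY_SIZE = 4 * KIB      # 4096 bytes
LAST_ADDRESS = MEMORY_SIZE - 1

print(MEMORY_SIZE, hex(LAST_ADDRESS))   # 4096 0xfff
print(MEMORY_SIZE == 2 ** 12)           # True
print(LAST_ADDRESS.bit_length())        # 12: every address fits in 12 bits
```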

Wait a second. This sounds familiar. 4096 indexed, 1-byte values. That’s the same problem we ran into with our registers. In fact, it is exactly the same problem: registers are just a small, easily accessible form of memory. So to make our memory, we just need a much larger array. In Python, it looks like this:

```python
memory = [0] * 4096  # addresses 0x000 through 0xFFF
```

That’s it. You now have 4KB of addressable memory. Well, sort of: since this is Python, the list takes up a lot more than 4KB. Each element is a reference to a Python int object (8 bytes per reference on a typical 64-bit CPython build), so the list alone is roughly 32KB. However, since that is minuscule on modern computers, it is perfectly reasonable to take the memory hit and keep the code as simple as possible.
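If the overhead ever does matter, Python’s built-in `bytearray` is a compact alternative: it stores one byte per element and raises `ValueError` for any value outside 0 to 255, so it also catches overflow bugs for you. The sketch below also copies a program into memory at 0x200, the address where CHIP-8 programs conventionally begin; `load_program` and the sample bytes are hypothetical (a real emulator would read a ROM file opened in binary mode).

```python
memory = bytearray(4096)  # 4096 zero bytes, one byte per element

def load_program(memory, program, start=0x200):
    """Copy program bytes into memory starting at `start` (0x200 by convention)."""
    if start + len(program) > len(memory):
        raise ValueError("program does not fit in memory")
    memory[start:start + len(program)] = program

load_program(memory, bytes([0x00, 0xE0, 0x12, 0x00]))  # CLS, then JP 0x200
print(hex(memory[0x200]), hex(memory[0x201]), len(memory))  # 0x0 0xe0 4096

try:
    memory[0] = 256  # too big for one byte
except ValueError as err:
    print("rejected:", err)
```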

So I hope this makes all of that memory and register stuff a lot less scary. It is pretty much the same concept as the variables we’ve been coding with this entire time: a way to store values, just sometimes with a few more of those scary binary and hexadecimal numbers.