Write a NES Emulator with JavaScript - Part 1

Background

After getting my JavaScript Chip-8 emulator working, I thought it’d be a good idea to take on the next challenge: the Nintendo Entertainment System (also known as the Famicom in parts of Asia).

In my naïvety, I failed to realise the massive jump between emulating the Chip-8 and the NES. I had a lot of struggles, so I decided to write this series as a means of documenting the development and as a guide to anyone who wants to have some fun and potentially lose some hair.

Getting Started

To write an emulator in JavaScript, you only need the basics: a web browser such as Chrome, a text editor, a keyboard, hands, etc. More important are some intrinsic attributes: perseverance, commitment, etc.

You should keep a few browser tabs open during development: one for your project, and several for documentation on the component you’re working on. Planning is also important, as I’ve re-written some portions of my own code, which I perhaps could have avoided if I hadn’t rushed in. Resist the temptation to start coding, and become intimate with the component first.

The NES overview

The NES is basically a purpose-built computer, which includes the following components:

Central Processing Unit (CPU)

Picture Processing Unit (PPU)

Pseudo Audio Processing Unit (pAPU)

Random-access Memory (RAM)

Game-pak Read Only Memory (Game ROM) and on-board memory mappers (referred to as simply mappers)

Controller interfaces

Some other bits and pieces

Today, or possibly tonight if you’re not in a today part of the world, we’ll take a look at the NES’s CPU and touch on some JavaScript. If you want, go and learn some basics of assembly. It will help, but it’s not required (you will learn it as you go otherwise).

The NES CPU - Ricoh 2A03

A company named Ricoh developed the NES’s CPU, the 2A03 (and its PAL counterpart, the 2A07), a chip that is based on the 6502 processor. This custom chip added support for audio and dropped support for Binary-Coded Decimal (BCD), which is usually standard on a 6502.

Luckily, this chip uses the 6502 instruction set, which was widely popular (used in the Apple II, Commodore 64, Terminator and Bender). This popularity means there is a load of information about it online, which is good news for you.

While explaining this processor, I won’t assume too much existing knowledge, so if you don’t know what you’re doing, read carefully and if you do know, then skim-read and/or send me suggestions on how to improve the explanations.

You will need to know about binary (base 2) and hexadecimal (base 16) numbers, as well as some basic computer concepts such as bytes. There are already thousands of articles about them. Google can be your friend.

How processors work

A computer processor is what keeps track of and executes the instructions required for a program to run. It is the raw deal, handling computer instructions with a bunch of logic gates, which are composed of transistors. For example, the JavaScript code you’re used to running in the browser will eventually be executed down the line as machine instructions.

Let’s take a look at all the bits and pieces that are relevant to know when emulating a CPU.

Registers

A processor has a few small storage buckets on board, which are used for temporary storage of small pieces of data. These buckets are distinct from external RAM and ROM, but they function like tiny pieces of RAM in that they can be read and/or written. They are named registers, and almost all of the processor’s work relies on them. Most processors have some registers in common: the Program Counter (PC), also known as the instruction pointer, and the Stack Pointer (SP).

The program counter is used to keep track of where the processor is currently executing instructions (explained next) from. It holds the position in memory of the current instruction, and it can either increment to the next instruction (most common), or be modified by an instruction to point to a different location, and therefore a new subsequent instruction (like a goto statement).

Imagine you are given a list of chores. You do the items in order, from top to bottom, leaving a pin next to your current item. When it begins to rain, you decide to skip mowing the lawn, but you know you will come back to it last when the weather clears up. Your pin is functioning like a program counter.

The stack pointer is used to keep tabs on the stack, which lives in RAM. As the stack grows (often downwards in address space) or shrinks, the stack pointer is updated to point to the current top of the stack.

I can’t think of an analogy right now for a stack pointer that doesn’t sound completely absurd. Suggestions welcome!

In addition to these registers, processors generally have a number of general-purpose registers. For example, a register referred to as the accumulator is generally used for instructions related to math.

Memory

Memory is not strictly part of the processor (the NES’s work RAM, for example, is a separate chip on the board). However, the two are very closely related, so memory will be discussed here.

Memory is a contiguous run of addressable buckets, each of which can be read and, if the hardware supports it, written.

A processor can address memory, which is usually in the form of RAM and/or ROM. The addressable space is limited by the size of the address bus. For example, an 8 bit address bus can address up to 256 bytes of memory, as it can only hold values between 0 and 255.
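The arithmetic behind that limit is just powers of two. Here is a minimal sketch of it; the helper name addressableBytes is made up for illustration.

```javascript
// Hypothetical helper: how many distinct addresses fit on a bus that is
// `bits` wires wide. Each extra wire doubles the count.
function addressableBytes(bits) {
    return Math.pow(2, bits);
}

var eightBit = addressableBytes(8);    // 256 addresses: 0 to 255
var sixteenBit = addressableBytes(16); // 65536 addresses: 0 to 65535
```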

The memory holds space which can be written (in the case of RAM) and read from relatively quickly when compared to HDD access. The RAM holds the space for the stack, general-purpose memory, and depending on the system, some parts of the memory may be mapped to other things, such as a port for modifying a sub-component like a video controller.

Addressing Modes

When a processor wants to address some memory, it uses an addressing mode to determine where the value is located, either in memory or encoded in the instruction itself.

Different processors have different addressing modes, but often they share similar modes. Processors will generally have an absolute mode, which treats the operand as an address, an immediate mode, which treats the operand as a literal value, and possibly modes which add a register’s value to the operand in order to get a new address.
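To make the difference between these modes concrete, here is a rough sketch. The mode names, the resolveOperand function and the tiny memory array are all invented for this example and don’t correspond to any particular processor.

```javascript
// Illustrative only: a tiny memory and three made-up mode names.
var memory = new Uint8Array(256);
memory[0x42] = 7;
memory[0x45] = 9;

// Returns the value an instruction would operate on, given its operand.
function resolveOperand(mode, operand, indexRegister) {
    switch (mode) {
        case "immediate":
            // The operand is the value itself.
            return operand;
        case "absolute":
            // The operand is the address of the value.
            return memory[operand];
        case "absoluteIndexed":
            // The operand plus a register's value is the address.
            return memory[operand + indexRegister];
        default:
            throw new Error("Unknown addressing mode: " + mode);
    }
}

var immediateValue = resolveOperand("immediate", 0x42, 0);     // 0x42
var absoluteValue = resolveOperand("absolute", 0x42, 0);       // 7
var indexedValue = resolveOperand("absoluteIndexed", 0x42, 3); // 9
```

The same operand byte (0x42) produces three different values depending on the mode, which is why the mode has to be known before an instruction can be executed.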

Endianness

A CPU is either little-endian or big-endian. This describes how numbers larger than one byte are stored. In a little-endian system, the number is stored back-to-front (least significant byte first), e.g. 0x1986 is stored as 0x86 in the first byte and 0x19 in the following. Big-endian is simply the opposite of this.
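In JavaScript, splitting and rebuilding a little-endian 16-bit number comes down to a shift and a mask. This is only a sketch, and the function names are my own.

```javascript
// Split a 16-bit number into two bytes, least significant byte first.
function toLittleEndian(value) {
    return [value & 0xFF, (value >> 8) & 0xFF];
}

// Rebuild a 16-bit number from a low byte followed by a high byte.
function fromLittleEndian(lowByte, highByte) {
    return (highByte << 8) | lowByte;
}

var bytes = toLittleEndian(0x1986);                 // [0x86, 0x19]
var rebuilt = fromLittleEndian(bytes[0], bytes[1]); // 0x1986
```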

Instruction Set

Each processor has a list of predefined instructions, which are generally specific to that processor (however, for compatibility, many processor instruction sets extend others). These instructions are described by a number, and their possible arguments’ values by other numbers.

For example, a very basic processor might have the following instructions: load into accumulator, store accumulator into memory, and logical AND. This means that a simple program might look like (in pseudo-code)…

Load from address 0x2021 into register A

Perform a logical AND with A’s value and 0x6

Put register A’s value into address 0x42

As a list of assembly instructions, this might look like so…

LDA $2021
AND #$6
STA $42

If this was assembled, it might look something like this…

AD 21 20
29 06
85 42

I’ve added some new lines there, to help with readability. As you can see, each instruction is encoded as a number (known as the operation code, or opcode) followed by its argument, if it has one (notice that this processor is little-endian, that is, 0x2021 was stored as 21 20).

One thing to realise is that the instructions and their arguments are simply a stream of numbers, like any other part of memory. So long as the PC starts at the correct spot (a system may have a fixed starting address), it can process each instruction one-by-one.
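To show how little magic is involved, here is a sketch of a loop that runs the seven bytes from the example above. It is not how our final CPU will look. The starting address of 0x8000 and the value 0x0F placed at 0x2021 are assumptions made for the example, and processor flags are ignored entirely.

```javascript
// A sketch of a fetch-decode-execute loop for the three instructions
// used in the example above. All names here are illustrative.
var ram = new Uint8Array(0x10000);
var program = [0xAD, 0x21, 0x20, 0x29, 0x06, 0x85, 0x42];
var start = 0x8000; // assumed starting address for this sketch

// Copy the program into memory, like any other data.
for (var i = 0; i < program.length; i++) {
    ram[start + i] = program[i];
}
ram[0x2021] = 0x0F; // the value the first instruction will load

var pc = start; // program counter
var a = 0;      // accumulator

while (pc < start + program.length) {
    var opcode = ram[pc++]; // fetch the opcode, then advance the PC
    switch (opcode) {
        case 0xAD: // LDA absolute: a two-byte little-endian address follows
            a = ram[ram[pc] | (ram[pc + 1] << 8)];
            pc += 2;
            break;
        case 0x29: // AND immediate: a one-byte literal follows
            a = a & ram[pc];
            pc += 1;
            break;
        case 0x85: // STA zero page: a one-byte address follows
            ram[ram[pc]] = a;
            pc += 1;
            break;
        default:
            throw new Error("Unhandled opcode 0x" + opcode.toString(16));
    }
}
// ram[0x42] now holds 0x0F & 0x06, which is 0x06.
```

Notice that the program and the data it touches sit in the same array. The only thing that makes a byte an instruction is the PC landing on it.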

It’s also worth mentioning that the assembly instructions from above are a subset of the 6502’s instruction set. Sneaky.

A processor generally has instructions for copying into and from registers, basic arithmetic, bit manipulation, comparisons, stack operations, flag modification, interrupts and branching. The latter three are worth explaining in more detail.

Processor Flags

A processor can have flags which modify how the processor behaves and/or define system state. These can be stored as a register, with each bit the on/off switch for a flag.

For example, certain interrupts (covered next) can be enabled/disabled via flags, and certain arithmetic instructions can set flags, such as if the last operation resulted in an overflow, negative number, zero or there was a carry-out.

Interrupts

During normal instruction execution, a processor may receive an interrupt, which is a signal to tell the processor that it needs to interrupt normal execution and handle some other thing.

For example, an interrupt may be triggered by a video controller, to tell the processor that a frame has been painted.

There are a couple of different types of interrupts, such as maskable, which can be turned off via a flag, and non-maskable, which is always handled and can’t be disabled. Some interrupts can also be triggered by software.

When an interrupt occurs, a processor generally saves the PC and possibly some extra data, such as processor flags and then jumps to that interrupt’s vector, which is an address associated with the interrupt type.

The doorbell analogy is often used to help explain an interrupt. Imagine that you’re making dinner, and then you hear the doorbell. You must stop what you’re doing (unless you’re rude) and attend to the door. After the door issue is attended, you must return to making dinner.

Branching

Branching is the low-level construct which backs an if or a for statement in a high-level programming language. For example, if you want to have a loop in assembly, you would branch back to the top of the loop body for as long as the loop’s exit condition isn’t met (in order to run the loop body again). This branch is achieved by modifying the PC when a branch condition holds. The PC can also be modified by other means.

There are also subroutines which point the PC to a new section of code. What sets the subroutine jump apart from a normal branch is that there are generally arguments pushed onto the stack, as well as the prior value of the PC, so when the subroutine returns, it knows where to resume instructions.

In addition to these, there is usually a non-conditional jump, which can be used for just changing the PC to a new address to execute some new code.
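As a rough sketch of what a conditional branch does to the PC, assume a 6502-style relative branch, where the argument is a single signed byte added to the address of the following instruction. The function names here are invented for the example.

```javascript
// Interpret a byte (0 to 255) as a signed two's complement offset (-128 to 127).
function toSigned(byte) {
    return byte < 0x80 ? byte : byte - 0x100;
}

// Returns the new PC after a conditional branch. `pc` is assumed to already
// point at the instruction following the branch.
function branch(pc, offsetByte, conditionMet) {
    if (!conditionMet) {
        return pc; // fall through to the next instruction
    }
    return (pc + toSigned(offsetByte)) & 0xFFFF;
}

var taken = branch(0x8010, 0xFB, true);     // 0xFB is -5, so 0x800B
var notTaken = branch(0x8010, 0xFB, false); // stays at 0x8010
```

A negative offset like this one is how a loop jumps back to the top of its body.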

Knowing about branches isn’t strictly required for building your emulator, but I will walk you through debugging your emulator via test programs, in which case knowing about branches is most definitely useful.

Clock

The CPU needs to have something responsible for the fetch-decode-execute loop, and that is the processor’s clock. The speed of the clock affects the speed that the processor can execute instructions.

The clock oscillates on and off, and the rate at which it does so is measured in Hertz. Each instruction takes a pre-determined number of cycles to execute, though it can end up taking more if certain conditions are met.
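To get a feel for the numbers, here is a small sketch that totals up cycles and converts them to time. The clock rate is a figure I’m bringing in from outside this article (the NTSC NES CPU runs at roughly 1.79 MHz), and the helper names are made up. The cycle counts are the standard 6502 ones for the three instructions from the earlier example.

```javascript
// Assumed figure: the NTSC NES CPU runs at roughly 1.79 MHz.
var CPU_HZ = 1789773;

// Hypothetical helper: how long a number of cycles takes, in microseconds.
function cyclesToMicroseconds(cycles, hz) {
    return (cycles / hz) * 1e6;
}

// Add up the cycles a list of instructions costs.
function totalCycles(instructions) {
    var cycles = 0;
    for (var i = 0; i < instructions.length; i++) {
        cycles += instructions[i].cycles;
    }
    return cycles;
}

var spent = totalCycles([
    { name: "LDA absolute", cycles: 4 },
    { name: "AND immediate", cycles: 2 },
    { name: "STA zero page", cycles: 3 }
]); // 9 cycles
var elapsed = cyclesToMicroseconds(spent, CPU_HZ); // about 5 microseconds
```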

The NES’s CPU

Now that we have a bit of an idea of processors in general, let’s look in more detail at the NES’s CPU.

Overview

The NES’s CPU is little-endian and uses the same instruction set as the 6502 with some minor differences, such as decimal (BCD) mode having no effect on arithmetic.

Registers

The chip has three special-purpose (PC, SP and P) and three general-purpose registers (A, X and Y). All of the registers are 8 bits wide, except for the PC, which is 16 bits.

The SP holds the least significant byte of an address in the page starting at offset 0x100 in memory, and it points to the next free slot on the stack. This means that the stack is located in memory from 0x100 to 0x1FF. The stack grows downwards in memory space, meaning that pushing onto the stack will decrement the stack pointer, and popping will increment it.
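Here is a minimal sketch of pushing and popping, assuming a plain array for RAM. The starting SP of 0xFF is picked for the example, and the names are my own.

```javascript
var ram = new Uint8Array(0x800); // 2 KiB, enough to hold the stack page
var sp = 0xFF;                   // assumed starting value for this sketch

// Write at the slot the SP points to, then move the SP down.
function push(value) {
    ram[0x100 + sp] = value & 0xFF;
    sp = (sp - 1) & 0xFF; // decrement, wrapping from 0x00 to 0xFF
}

// Move the SP back up, then read the slot it now points to.
function pop() {
    sp = (sp + 1) & 0xFF; // increment, wrapping from 0xFF to 0x00
    return ram[0x100 + sp];
}

push(0x12);
push(0x34);
var first = pop();  // 0x34, the last value pushed
var second = pop(); // 0x12
```

Masking the SP with 0xFF keeps it at 8 bits, so the stack can never leave the 0x100 to 0x1FF page.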

The P register holds the processor flags. These are listed from least to most significant bit…

Carry flag - Set if the last instruction resulted in an over or underflow. Used for arithmetic on numbers larger than one byte, where the next instruction is carry-flag aware.

Zero flag - Set if the last instruction resulted in a value of 0

Interrupt Disable - Set to disable responding to maskable interrupts

Decimal Mode - Set to enable BCD mode. This doesn't affect the 2A03 so flipping this value doesn't do anything.

Break Command - Set to indicate a `BRK` instruction was executed

Unused bit

Overflow flag - Set when an invalid two's complement number is the result of an operation. An example is adding 2 positive numbers which results in the sign bit being set, making the result a negative.

Negative flag - Set if the number is negative, determined by checking the sign bit (bit 7)

The A register, known as the accumulator, is used by all arithmetic and logic instructions.
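Since each flag is a single bit of P, bit masks are a natural fit. The sketch below follows the bit order listed above. The constant and function names are my own choice, and only the zero and negative flags are updated.

```javascript
// One mask per flag, from least to most significant bit.
var FLAG_CARRY = 1 << 0;
var FLAG_ZERO = 1 << 1;
var FLAG_INTERRUPT_DISABLE = 1 << 2;
var FLAG_DECIMAL = 1 << 3;
var FLAG_BREAK = 1 << 4;
var FLAG_UNUSED = 1 << 5;
var FLAG_OVERFLOW = 1 << 6;
var FLAG_NEGATIVE = 1 << 7;

// Return a copy of `p` with `flag` set or cleared.
function setFlag(p, flag, on) {
    return on ? (p | flag) : (p & ~flag & 0xFF);
}

// Update the zero and negative flags from an 8-bit result.
function updateZeroAndNegative(p, result) {
    p = setFlag(p, FLAG_ZERO, (result & 0xFF) === 0);
    p = setFlag(p, FLAG_NEGATIVE, (result & 0x80) !== 0);
    return p;
}

var afterZero = updateZeroAndNegative(0, 0x00);     // only the zero flag: 0x02
var afterNegative = updateZeroAndNegative(0, 0x90); // only the negative flag: 0x80
```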

The X and Y registers are used for holding things such as iteration counts or as offsets to addressing modes. The X register in addition can be used to set or get the SP.

Memory

The NES has a 16 bit address bus, and thus can address up to 64 KiB of memory, from 0x0 to 0xFFFF. Different ranges of the addressable memory have different names, each of which describes its purpose.

Many addresses that the CPU can read are mapped to other subsystems of the NES, such as the PPU and the ROMs present in the game paks. There are also many parts which are mirrored: reading or writing certain address ranges actually accesses the same underlying memory at a consistent offset (due to not all address lines being decoded).

The first 256 addressable slots are referred to as Zero-Page and are handy as they can be addressed with a single-byte argument.

From 0x100 to 0x1FF lives the stack. The SP will wrap if it exceeds its capacity, so it can never write outside of this range.

From 0x200 to 0x7FF lives general-purpose RAM that can be read and written by loaded programs.

From 0x800 to 0x1FFF mirrors the previous chunk of memory, that is, from 0x0 to 0x7FF.

The PPU’s registers (explained in another part) are mapped to 0x2000 to 0x2007. 0x2008 to 0x3FFF mirrors these registers every 8 bytes.

0x4000 to 0x401F has other mapped data, such as Direct Memory Access (DMA) for copying sprites (which will be explained sometime in the future, or get Googlin’ if you’re the curious type, and you should be).

Between 0x4020 and 0x5FFF is Expansion ROM, and 0x6000 to 0x7FFF holds SRAM, which is used to save game state by supported game paks. Writing and reading from this space allows the program authors to keep saved state even when the game pak is removed, by using a little piece of RAM and a battery.

Finally, we have the game pak’s data. The game pak’s ROM consists of 16 KiB banks. It is addressable at 0x8000, and if the program only contains one bank, it will be mirrored at 0xC000. This is to ensure that the reset vector points to a valid instruction. These two banks of memory can be extended using mappers (by replacing the addressable portions with different banks), which is backed by special bank-switching technology inside the game paks themselves.
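Mirroring sounds fiddly, but in code it mostly comes down to masking off the address bits that aren’t decoded. The sketch below assumes RAM occupies 0x0 to 0x7FF and is mirrored up to 0x1FFF, that the eight PPU registers at 0x2000 repeat up to 0x3FFF, and that no mapper is involved. Both helper names are hypothetical.

```javascript
// Hypothetical helper: reduce a CPU address to the address that actually
// backs it, for the two mirrored regions.
function unmirror(address) {
    address &= 0xFFFF;
    if (address < 0x2000) {
        // 2 KiB of RAM, repeated through to 0x1FFF.
        return address & 0x07FF;
    }
    if (address < 0x4000) {
        // Eight PPU registers, repeated through to 0x3FFF.
        return 0x2000 + (address & 0x0007);
    }
    return address; // everything else is left alone in this sketch
}

// Hypothetical helper: offset into the game pak's ROM for an address at or
// above 0x8000, with no mapper. A single 16 KiB bank shows up twice.
function prgOffset(address, bankCount) {
    var offset = address - 0x8000;
    return bankCount === 1 ? offset & 0x3FFF : offset;
}

var ramMirror = unmirror(0x0802);           // 0x0002
var ppuMirror = unmirror(0x3FFF);           // 0x2007
var singleBank = prgOffset(0xFFFC, 1);      // 0x3FFC, inside the only bank
var twoBanks = prgOffset(0xFFFC, 2);        // 0x7FFC, inside the second bank
```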

Interrupts

The processor has three types of triggerable interrupts:

Maskable Interrupts (IRQ) - Generated by certain memory mappers or via the BRK instruction. Can be disabled via a processor flag. The vector is 0xFFFE and 0xFFFF.

Non-maskable Interrupts (NMI) - When the PPU generates a V-blank (hold your horses for part 2), this interrupt is triggered. As the name suggests, it can’t be disabled using a processor flag (but it can be disabled via the PPU). The vector is 0xFFFA and 0xFFFB.

Reset - Caused when the system first starts or the physical reset button is pressed on the NES. The vector is 0xFFFC and 0xFFFD.

Priority is given to the interrupts in the order of reset, NMI and then IRQ. There is a latency of 7 cycles before an interrupt is handled.
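Putting the pieces together, handling an interrupt looks roughly like the sketch below. It uses the standard 6502 vector addresses (NMI at 0xFFFA, reset at 0xFFFC, IRQ at 0xFFFE). The CPU object and function names are invented for illustration, and details such as cycle timing and exactly which flag bits get pushed are glossed over.

```javascript
var VECTOR_NMI = 0xFFFA;
var VECTOR_RESET = 0xFFFC;
var VECTOR_IRQ = 0xFFFE;

// A minimal CPU state for the sketch; field names are illustrative.
function createCpu() {
    return { pc: 0, sp: 0xFF, p: 0, mem: new Uint8Array(0x10000) };
}

function push(cpu, value) {
    cpu.mem[0x100 + cpu.sp] = value & 0xFF;
    cpu.sp = (cpu.sp - 1) & 0xFF;
}

// Save the PC and flags, then jump to the address stored at `vector`.
function interrupt(cpu, vector) {
    push(cpu, cpu.pc >> 8);   // high byte of the return address
    push(cpu, cpu.pc & 0xFF); // low byte of the return address
    push(cpu, cpu.p);         // processor flags
    cpu.p |= 0x04;            // set Interrupt Disable while handling
    cpu.pc = cpu.mem[vector] | (cpu.mem[vector + 1] << 8);
}

var cpu = createCpu();
cpu.pc = 0x8123;
cpu.mem[VECTOR_NMI] = 0x00;     // handler at 0x9000, stored little-endian
cpu.mem[VECTOR_NMI + 1] = 0x90;
interrupt(cpu, VECTOR_NMI);     // cpu.pc is now 0x9000
```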

Addressing Modes

The 2A03 has 11 addressing modes. There is a link at the end of the article that describes them.

Instruction Set

Luckily, the 2A03 shares the same instruction set as the 6502, on which it is based. It would be silly for me to reproduce the entire list here, so I will link to it at the end of this article.

Remember that the Decimal Mode instructions don’t serve a purpose on the NES.

Enough, let’s see some JavaScript

Let’s begin by describing how we will architect our emulator. I am going to use Browserify for its awesome support of require() in the browser and also Beefy as a development server. This means you will need Node (and NPM) installed.

You can generate a package.json file via npm init. You can add Browserify and Beefy as development dependencies if you wish (npm install --save-dev browserify beefy). A global install is not recorded in package.json, so if you also want the beefy command available everywhere, install it globally as well with npm install -g beefy. You (or others) can then use npm install to get the dependencies for your project in the future.

We’re not going to produce any screen output at this stage, so it doesn’t really matter what HTML you use (if any). You can create a canvas element if you want, but we won’t touch it until another day :)

We will create each component as its own file, and we will load them via require(). We will create a global object to hang our NES off, so we can poke around with it via the browser console as it’s running.

We can start the development server with beefy main.js. Then, simply visit the page in your web browser to see your code running.

One thing to keep in mind is that the original hardware knew no concept of separation of concerns. It simply wasn’t feasible to try and build hardware like that. Due to this, the resulting code also has some intermingling going on. To let components reach each other, we will pass in a reference to our NES object to each component we construct.
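As a sketch of that wiring only, here is what passing the NES object around could look like. The CPU constructor, its read method and the ram property are placeholders I’ve made up for the example, not the design we will end up with.

```javascript
// A sketch of the wiring only. CPU and its `read` method are placeholders.
function CPU(nes) {
    this.nes = nes; // keep a handle on the whole machine
}

CPU.prototype.read = function (address) {
    // Reach across to another part of the system through the shared NES object.
    return this.nes.ram[address & 0x07FF];
};

function NES() {
    this.ram = new Uint8Array(0x800);
    this.cpu = new CPU(this); // pass ourselves to each component
}

var machine = new NES();
machine.ram[0x10] = 0xAB;
var seen = machine.cpu.read(0x10); // 0xAB
```

Each component only ever holds one reference, to the NES, and finds its siblings through it.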

Entry point

We will call this file main.js, as it’s the entry point into our program…

(function() {
    var NES = require("./nes");
    window.nes = new NES();
})();

Make sure you’ve created a sibling file called nes.js…

(function() {
    function NES() {
        // Not much right now. :)
    }
    module.exports = NES;
})();

That’s it!

…for today. There is probably a lot to take in, so take the time to re-read it and try to understand the concepts.

I’m not an emulation expert, but you might be, so I encourage you to give me constructive criticism on how to improve this article.

Documentation

These are the pages you need to have open while developing your CPU emulator.

NesDev. If you only use one documentation this year, make it this one.

6502 Instruction Set Reference

6502 Addressing Modes

This page. Feel free to ping me at @alexdickson if you’re stuck.

What’s next

In the next article, I will look into how the NES programs are generally encoded and we will begin our CPU.

Want to discuss this post? Just mention me @alexdickson.