Console Emulation via Common Lisp
Brit Butler

2012-10-01

Enter cl-6502







Let's emulate one.....with Lisp!

BUT WHY?!?

Goals: Definition
Extensible - Easy to arbitrarily toggle logging of reads/writes to memory, or branching opcodes, or...



Concise - NOT CODE GOLF. Dense but elegant. The minimal executable spec.



Performant - Able to comfortably run at least 2 emulated CPUs/core on modern hardware.



Correct - Implements the 6502 faithfully enough to run Mega Man 2, Super Mario Bros 3, etc.

Goals: Status
Extensible, Concise? Check.
Extensible - All opcodes are methods. CLOS makes arbitrary wrappers (think decorators or defadvice) trivial.

Concise - Py65: 2091 SLOC. cl-6502: 879 SLOC.

Correct, Performant? Verdict still out, looking good.
Correct - Py65: ~8000 SLOC. cl-6502: ~200 SLOC. Lots of manual testing, no large programs.

Performant - ZERO profiling, ZERO optimization. Currently runs 2x the speed of the NES on a single core.

* - All SLOC figures courtesy of sloccount.
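To make the "arbitrary wrappers" point concrete, here is a rough Python analogy using a decorator. It is not cl-6502 code: in CLOS the same effect would come from something like an :around method, and the names `logged`, `trace`, and the dict-based `cpu` are made up for this sketch.

```python
import functools

def logged(log):
    """Wrap an opcode function so each call is recorded in `log` first."""
    def decorate(opcode_fn):
        @functools.wraps(opcode_fn)
        def wrapper(cpu):
            # Record which opcode ran and where the PC was, then run it.
            log.append((opcode_fn.__name__, cpu["pc"]))
            return opcode_fn(cpu)
        return wrapper
    return decorate

trace = []

@logged(trace)
def nop(cpu):
    """Hypothetical NOP opcode: only advances the program counter."""
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
    return cpu

cpu = nop({"pc": 0x0600})
```

The opcode body knows nothing about logging, so the wrapper can be added or removed without touching it.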

Why Lisp? (or, a few good points)
My favorite tool. Never found a better way to stay in flow than Emacs+SBCL+SLIME.

Universal syntax is still (for now) the one true path to a metaprogramming culture.



It's truly agnostic, which enabled me to grow with the language. Procedural, OO, FP, Logic, Dataflow, whatever!

It's the fucking playdoh of PLs. Mold the language, mold the program.

On emulators, portability, and methods of emulation
It's basically a VM, duh. Simulate hardware and run a program inside the simulated hardware.
Portability thinking seems to be: Write it in C. After all, C compilers are ported everywhere.
Interpreted

Dynamic Recompilation

Static Recompilation

I stayed with interpretation for simplicity's sake. Crawl, then walk. For a great overview, see this paper.
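For anyone who hasn't seen one, an interpreting emulator is just a fetch/decode/execute loop. A minimal Python sketch follows, using a made-up two-opcode machine (0x00 halts, 0x01 adds the next byte to an accumulator). These are not 6502 opcodes and this is not how cl-6502 is written.

```python
def run(memory, pc=0):
    """Interpret a made-up two-instruction machine until it halts."""
    acc = 0
    while True:
        opcode = memory[pc]              # fetch
        if opcode == 0x00:               # decode: 0x00 means halt
            return acc
        elif opcode == 0x01:             # 0x01 means "add next byte to acc"
            acc = (acc + memory[pc + 1]) & 0xFF   # execute, wrap to 8 bits
            pc += 2
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pc:#06x}")

result = run([0x01, 0x05, 0x01, 0x07, 0x00])
```

Recompilation approaches replace this loop with translated native code; the loop is the simple starting point.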

Enter the 6502
What comprises this thing? Data
Accumulator, X, Y - general purpose 8-bit registers

Program Counter - 16-bit register allowing for 64k addressable memory

Stack Pointer, Status Register - special purpose 8-bit registers
What runs on this thing? Code
56 "instructions" (functions/primitives)

151 "opcodes" (variants for different addressing modes)

Take note pedants: I interchange these terms in talk and code. Probably incorrect.
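The register set above is small enough to write down in a few lines. A Python sketch, assuming a plain dataclass; the field names, the initial values, and the `advance_pc` helper are illustrative choices, not cl-6502's definitions.

```python
from dataclasses import dataclass

@dataclass
class Cpu:
    a: int = 0        # Accumulator, 8-bit
    x: int = 0        # X index register, 8-bit
    y: int = 0        # Y index register, 8-bit
    pc: int = 0       # Program Counter, 16-bit
    sp: int = 0xFF    # Stack Pointer, 8-bit (initial value is an assumption)
    status: int = 0   # Status Register, 8-bit

    def advance_pc(self, n=1):
        # A 16-bit PC wraps at 64k, which is why only 64k is addressable.
        self.pc = (self.pc + n) & 0xFFFF

cpu = Cpu(pc=0xFFFF)
cpu.advance_pc()
```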

An introductory 6502 program

;; clear the carry status bit
CLC
;; accumulator, Y = 0
LDA #$00
LDY #$00
;; loop incrementing Y. once Y wraps to zero, proceed to 0x08
loop:
INY
BNE &loop
;; accumulator = accumulator - ([0x0001] + (1 - carry bit)), halt.
SBC $0001
BRK

Leaves the Accumulator holding 86, or 255 - LDA (0xa9).
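The arithmetic behind that result, as a small Python check. It assumes the program is loaded at address 0, so the byte at 0x0001 is the LDA opcode 0xa9, and that the carry is still clear when SBC runs. The `sbc` helper is a sketch of binary-mode subtraction only.

```python
def sbc(acc, operand, carry):
    """Binary-mode SBC: A - M - (1 - C), wrapped to 8 bits."""
    return (acc - operand - (1 - carry)) & 0xFF

# A = 0, the byte at 0x0001 is 0xa9 (the LDA opcode), carry was cleared by CLC.
result = sbc(0x00, 0xA9, 0)
```

With the carry clear, SBC subtracts one extra, which is why the answer is 255 - 0xa9 rather than 256 - 0xa9.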

zomg machine codes!

Ben Fry's Distellamap project is fantastic.

Addressing Modes, or Pointers redux

None of this "Declare a variable" nonsense. 6502 has 13 addressing modes. No instruction supports every mode. Indirect mode only used by one instruction.
Arg == Byte
No args - Implied, Accumulator

One arg - Immediate, Zero-page, Zero-page{x,y}, Indirect{x,y}, Relative

Two args - Absolute, Absolute{x,y}, Indirect
Some opcodes use address, some use byte at address.
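The grouping above is effectively a lookup table from addressing mode to operand size. A Python sketch; the mode names as strings and the `instruction_length` helper are invented here for illustration.

```python
# Operand bytes per addressing mode, following the grouping above.
OPERAND_BYTES = {
    "implied": 0, "accumulator": 0,
    "immediate": 1, "zero-page": 1, "zero-page-x": 1, "zero-page-y": 1,
    "indirect-x": 1, "indirect-y": 1, "relative": 1,
    "absolute": 2, "absolute-x": 2, "absolute-y": 2, "indirect": 2,
}

def instruction_length(mode):
    """One opcode byte plus however many operand bytes the mode takes."""
    return 1 + OPERAND_BYTES[mode]
```

A table like this is what lets an interpreter know how far to advance the PC after each opcode.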

"Register" modes
Immediate - PC, 1 byte. Only in arithmetic, logic, comparisons and loading.
LDA #$00 ; 0xa9, Load 0 into the Accumulator
SBC #$2a ; 0xe9, Subtract 42 from the Accumulator
Accumulator - A, 0 bytes. Only in bitwise shifts and rotations.
ASL A ; 0x0a, shifts Accumulator left
ROR A ; 0x6a, rotates Accumulator right

"Zero-page" modes, 1 byte

Fast access to anything in bottom page of RAM.

Like absolute and indirect, it comes with -x and -y variants.

ORA $51 ; 0x05, Bitwise Or [81] with Accumulator.
STY $1b, X ; 0x94, Store Y register at [27+X].
LDX $1b, Y ; 0xb6, Load [27+Y] into X register.
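Computing a zero-page address is a one-liner. A Python sketch with a hypothetical `zero_page` helper; the wrap to 8 bits reflects the 6502 keeping indexed zero-page accesses inside page zero.

```python
def zero_page(operand, index=0):
    """Effective address for zero-page and zero-page,X / zero-page,Y modes."""
    # The sum wraps at 0xFF, so the access never leaves the bottom page.
    return (operand + index) & 0xFF

addr = zero_page(0x1B, 4)   # e.g. STY $1b, X with X = 4
```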

"Absolute" modes, 2 bytes.



Specify any address in memory.



DEC $dead, X ; 0xde, Decrement [57005+X].
LDX $beef, Y ; 0xbe, Load [48879+Y] into X register.
CMP $1234 ; 0xcd, Compare [4660] with Accumulator, set flags.
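The two operand bytes are stored low byte first, so DEC $dead, X sits in memory as de ad de. A Python sketch of the address calculation; `absolute` is a made-up helper name.

```python
def absolute(lo, hi, index=0):
    """Effective address for absolute and absolute,X / absolute,Y modes."""
    # Operand is little-endian: low byte first, then high byte.
    return (((hi << 8) | lo) + index) & 0xFFFF

addr = absolute(0xAD, 0xDE, index=1)   # DEC $dead, X with X = 1
```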

"Indirect" modes
Indirect mode, 2 bytes. Only used by the JMP instruction.
JMP ($1234) ; 0x6c, Set PC to the word at [4660].
Indirect-x,y: 1 byte. Slightly simplified: get word at (zero-page + register).
ADC ($bc, X) ; 0x61, Add Indirect-X(188) to Accumulator.
AND ($ad), Y ; 0x31, Bitwise And Indirect-Y(173) with Accumulator.

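A less simplified Python sketch of the three indirect address calculations. Helper names are hypothetical, memory is a plain 64k bytearray, and the original chip's page-boundary quirk in JMP indirect is deliberately ignored.

```python
def read_word(memory, addr):
    """Little-endian 16-bit read, as used by JMP (addr)."""
    return memory[addr] | (memory[(addr + 1) & 0xFFFF] << 8)

def indirect_x(memory, operand, x):
    """(zp, X): add X to the zero-page operand, then read the pointer there."""
    zp = (operand + x) & 0xFF
    return memory[zp] | (memory[(zp + 1) & 0xFF] << 8)

def indirect_y(memory, operand, y):
    """(zp), Y: read the pointer at the zero-page operand, then add Y."""
    base = memory[operand] | (memory[(operand + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF

memory = bytearray(0x10000)
memory[0x1234], memory[0x1235] = 0x78, 0x56   # pointer for JMP ($1234)
memory[0xBE], memory[0xBF] = 0x00, 0x20       # pointer for ($bc, X), X = 2
memory[0xAD], memory[0xAE] = 0x00, 0x30       # pointer for ($ad), Y
```

The difference between the two indexed forms is only when the register gets added: before the pointer lookup for X, after it for Y.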
Other modes
Implied mode, 0 bytes. Instruction has no operand or knows operand location.

Tons of these, Stack+Status register handling, etc.
NOP ; Do nothing
DEX ; or INY, to decrement/increment the specified register.
Relative mode, 1 byte. Trickiest to test because it's stateful.

Moves PC forward/back. Used in all branching instructions.
BCC &19 ; 0x90, Move PC forward by 25 when :carry == 0.
BNE &fd ; 0xd0, Move PC back by 3 (256 - 0xfd) when :zero == 0.
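The operand of a branch is a signed 8-bit offset, applied to the PC after the operand byte has been read. A Python sketch with a made-up `branch_target` helper; the first check below uses the loop from the introductory program, where BNE sits at 0x06-0x07 and INY at 0x05.

```python
def branch_target(pc_after_operand, offset_byte):
    """Apply a signed 8-bit relative offset to the PC following the branch."""
    # Bytes 0x80..0xff are negative in two's complement.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc_after_operand + offset) & 0xFFFF

back = branch_target(0x0008, 0xFD)      # the BNE in the intro program's loop
forward = branch_target(0x0002, 0x19)   # a BCC &19 placed at address 0
```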

On dispatch strategies
Which method goes with this byte in the array?
Opcodes take a CPU, figure out their addressing mode, modify its state and return it.

Store an array of methods, indexed by opcode. Opcode takes no args. `find-method` makes this annoying.

Have a giant switch/case statement. Totally unacceptable. What is this, C? :)
Wound up doing dispatch via opcodes just like a switch/case interpreter... But using Lisp's EQL-specialized methods! \o/
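Python has no EQL specializers, so the closest analogy is a table keyed by opcode byte that handlers register themselves into. A sketch of that idea; `OPCODES`, `opcode_handler`, and `step` are invented names, the CPU is a plain dict, and status flags are left out.

```python
OPCODES = {}

def opcode_handler(byte):
    """Register the decorated function as the handler for one opcode byte."""
    def register(fn):
        OPCODES[byte] = fn
        return fn
    return register

@opcode_handler(0xEA)
def nop(cpu):
    return cpu

@opcode_handler(0xC8)
def iny(cpu):
    cpu["y"] = (cpu["y"] + 1) & 0xFF   # flag updates omitted in this sketch
    return cpu

def step(cpu, memory):
    """Fetch one opcode byte and dispatch on its value."""
    opcode = memory[cpu["pc"]]
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
    return OPCODES[opcode](cpu)

cpu = step({"pc": 0, "y": 0xFF}, bytes([0xC8]))
```

Each handler is defined on its own, and the dispatcher never needs a hand-written switch.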

On Assembly and Disassembly
Once you have an opcodes array full of metadata, disassembly is *trivial*.



Assembly is more involved even though I was lazy and didn't write a proper parser, just regexes. Still not so bad. 2-pass assembler. First sets labels, second resolves uses. Supports labels, constants, comments. Here's a small example.
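As a rough Python illustration of the two-pass idea (not cl-6502's assembler): the tiny instruction tables cover only three mnemonics, and `assemble` is a made-up function that assumes the code starts at address 0.

```python
SIZES = {"INY": 1, "BNE": 2, "BRK": 1}            # bytes per instruction
CODES = {"INY": 0xC8, "BNE": 0xD0, "BRK": 0x00}   # opcode bytes

def assemble(lines):
    labels, pc = {}, 0
    # Pass 1: record the address of every label.
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = pc
        else:
            pc += SIZES[line.split()[0]]
    # Pass 2: emit bytes, resolving label uses to relative offsets.
    out = []
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        out.append(CODES[parts[0]])
        if parts[0] == "BNE":
            next_pc = len(out) + 1                 # PC after the operand byte
            target = labels[parts[1].lstrip("&")]
            out.append((target - next_pc) & 0xFF)
    return bytes(out)

program = assemble(["loop:", "INY", "BNE &loop", "BRK"])
```

The first pass exists only so that a branch can refer to a label defined later in the file.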

Get ready to poop your pampers







The following code has been rated GH-MA*

* - [Github Mature] For gratuitous use of parentheses.

And screwing around with EVAL-WHEN with its weak phase separation.

An executable spec, or Enter defopcode
Remember when I said I'll Explain?

A Defopcode Aside
A simple tweak to defopcode and opcodes would return compiled closures instead of executing directly.

Not that it would give a big speedup.

Too much learning+engineering for a JIT compiler right now.

Fast interpreters are possible!
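One way to picture "return closures instead of executing directly", sketched in Python. This is a guess at the shape of the idea, not the defopcode tweak itself; `compile_iny` and `compile_program` are hypothetical and handle a single one-byte opcode.

```python
def compile_iny():
    """Return a closure that performs INY when called later."""
    def run(cpu):
        cpu["y"] = (cpu["y"] + 1) & 0xFF
        return cpu
    return run

def compile_program(code):
    """Decode every byte once, up front, into a list of closures."""
    table = {0xC8: compile_iny}
    return [table[byte]() for byte in code]

steps = compile_program(bytes([0xC8, 0xC8]))
cpu = {"y": 0}
for run_step in steps:
    cpu = run_step(cpu)
```

Decoding happens once at "compile" time, so running the program is just calling the closures in order.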

Interesting hurdles



Decimal mode - Dodged that bullet. Not in the NES! ;)



Status bits - Sort of ad-hoc and messy. No defaults for you!



Relative addressing - Oh God, Oh God.
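For the status bits, the register layout itself is fixed even if handling it gets messy. A Python sketch using the standard 6502 bit positions; the `set_zn` helper is an invented example of the kind of update most opcodes need.

```python
# Status register bits, from bit 0 (carry) up to bit 7 (negative).
CARRY, ZERO, INTERRUPT, DECIMAL, BREAK, UNUSED, OVERFLOW, NEGATIVE = (
    1 << i for i in range(8)
)

def set_zn(status, value):
    """Update the zero and negative flags from an 8-bit result."""
    status &= ~(ZERO | NEGATIVE) & 0xFF   # clear both flags first
    if value == 0:
        status |= ZERO
    if value & 0x80:                      # bit 7 set means negative
        status |= NEGATIVE
    return status
```

Which flags each instruction touches varies, which is where the ad-hoc feeling comes from.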

Emulator hurdles
Truthfully, CPU emulation isn't so bad. Whole systems are what's complicated. A few reasons...
Concurrency+Synchronization for performance reasons. Graphics card in one thread, everything else in another, overhead keeping them 'in step' with each other.

Documentation doesn't exist in many cases. Reversing may be necessary.

In the case of the NES, the memory map is ... kinda nuts.

...and a number of cartridges had custom hardware. SNES was even worse! :P