31 December 2013 (programming brainfuck retro)

What would the Christmas holidays be without some silly hacking? This year, I had this most random idea to try to write something for the Commodore 64, the result of which is available on GitHub.

I grew up on the C64 (I think I was 7 when we got one), but interestingly, I never got into the whole low-level programming thing. Instead, I started using the Graphics BASIC extension after growing out the built-in BASIC interpreter. So I never learned too much about the details of the 6510 CPU, or all the other driver chips like the SID or the VIC-II; but everyone knew even back then that you can't get anything serious done with the performance of BASIC interpreters.

These days, of course, you have very nice cross-compiler tools so you can get both the performance and the advantages of a somewhat-higher-level programming language. My idea for a project was to use the CC65 C cross-compiler to write a compiler for Brainfuck.

So after reading up on the 6502 processor family, it seemed very straightforward to implement the Brainfuck instructions by just storing the cell pointer in the X register and loading to / storing from the A register to do cell arithmetics. Of course, to get good performance, we compile bunches of the same Brainfuck instruction together, so e.g. +++ is compiled into

LDA mem,X ; 4 cycles CLC ; 2 cycles ADC 3 ; 2 cycles STA mem,X ; 5 cycles

instead of the slower

INC mem,X ; 7 cycles INC mem,X INC mem,X

(see e.g. here for instruction timings).

So I wrote some C code that generates code like the above, machine code byte by byte. The only interesting part is compiling code for the [ and ] Brainfuck instructions. First of all, the 6502 only has conditional branch instructions for near jumps, whereas in Brainfuck the loop body can be arbirarily large. And second, since the compiler is to be on-line in the sense that it emits the machine code as it reads the Brainfuck source without ever holding the full source in memory (otherwise the source could take up too much of the already limited 64 kilobytes of memory), we have actually no idea when processing a [ how to jump over the loop. Because of these factors, the code I generate for [ is like this:

loop: LDA mem,X BNE enter JMP $0000 enter: ... ; rest of the program

where the argument to the JMP instruction is zeroed out initially. We store the loop address in a stack in the compiler, and when we encounter the matching ] , apart from emitting the code for JMP loop , we also edit loop+6 to store the current target address.

So that takes care of a traditional compiler. However, my compiler also has a transpiler mode, where it emits C64 BASIC from Brainfuck. Writing this was a lot of fun because I finally got to learn the details of how BASIC programs were stored on these low-powered home computers. Turns out programs were pre-tokenized line by line, so for example, the following line:

10 IF X% = 0 THEN GOTO 100

is stored as

10 <IF> X% <=> 0 <THEN> <GOTO> 100

So the variable name X% and number literals 10 , 0 and 100 are stored as text, but the keywords IF , THEN, GOTO and the = operator are stored as single-byte identifiers. I think it's interesting that number literals are stored as text instead of numbers.

The overall structure of the BASIC transpiler is otherwise the same as the compiler. But typing LIST and seeing my generated BASIC code for the first time was so much more satisfying than looking at the generated machine code in the monitor.

In closing, here's a screencast of the program in action.