About two and a half years ago, a wave of interest in electronics swept across the Budapest office of Intentional. We desperately wanted to create something tangible from first principles. The idea we settled on was to design and eventually build a CPU that uses Brainfuck as its machine language.

Looking back, it really was a case of people with insufficient knowledge trying to use inappropriate tools. But damn if we didn't have fun during the process! After filling a couple of notebooks with sketches, we ended up with an initial design in Logisim. It had horrible timing problems, of course, since too much of it was asynchronous. Before ironing out all the wrinkles, though, I remember Maya pointing at one of the lines tangled up on the screen, and saying "You guys realise this single line will be 16 wires if we actually want to solder this together, right?" So basically we gave up on building the thing. Later on, Maya and Encsé went on to enroll to a bachelor's program in EE as a hobby; and I decided to stick to discrete logic, ordered a bunch of 7400 TTL's and some LEDs and seven-segment displays and started wiring together much simpler circuits on breadboards. I never got to soldering, not to mention getting access to anything that could produce PCB's, Then as I moved to Singapore, I left all my electronics stuff at home, and put the whole electronics thing on the backburner indefinitely.

Then, a couple months ago I discovered the Papilio FPGA platform, which has this collection of nice IO daughterboards (called "wings") that snap right into it, no soldering or even wiring required. I ordered one with the LogicStart IO board which features, among other, more advanced stuff, eight toggle switches and four seven-segment displays. Perfect for my baby steps into the world of FPGA's!

So what else could my Hello World project have had been, than the Brainfuck CPU.

Basic design

We can use the Harvard architecture: since the Brainfuck language has no reflection capabilities, the program can be stored in ROM with no programmable access. Memory is implemented as a RAM of 32K 8-bit bytes. The CPU also has several internal registers:

Internal state: As we will see, it takes several clock cycles for the CPU to execute a given Brainfuck opcode. Rather unfortunately, I ended up with 9 states, so I need 4 bits to store it.

Program counter ( PC ): directly connected to the address pin of the ROM.

): directly connected to the address pin of the ROM. Opcode register: loaded from the data pin of the ROM in the Fetch state.

state. Memory pointer ( idx ): a 15-bit register that is directly connected to the address pin of the RAM.

): a 15-bit register that is directly connected to the address pin of the RAM. Data out register: for simplicity's sake, new values for RAM[idx] , and the write enable leg of the RAM, are buffered

, and the write enable leg of the RAM, are buffered Depth counter ( DC ): Used in the implementation of the [ and ] opcodes (see below)

Output is implemented by an 9-bit signal: 8 bits of data and an enable bit. When a . opcode is encountered, the CPU sets these 9 bits, and enters a special state until it receives an acknowledgment signal. Input is implemented similarily. On the actual board, the output signals are connected to the seven-segment display, the input signals are fed from the eight toggle switches, and the directional "mini-joystick" is used to acknowledge input/output.

Implementing [ and ]

Compared to a normal machine language, it's really just [ and ] that requires special handling. Everything else is just straightforward manipulation of either idx or RAM[idx] via incrementing/decrementing; or pushing data between RAM[idx] and the IO port. [ and ] are tricky because we need to search for their matching pairs, and pre-processing the Brainfuck program to attach pair addresses would be against the spirit of this (self-imposed) challange.

One solution would be to maintain a stack in a separate RAM, and push PC into it whenever a [ is encountered. In that case, ] is a simple matter of popping PC if RAM[idx] does not equal 0. However, here we've basically changed [ / ] from a while loop to a do while loop. So if RAM[idx] is 0 when we first enter the [ , we have to scan the program forward to find its matching ] .

For simplicity's sake, I decided not to worry about performance and skip the stack part, and just implement scanning in both directions. Scanning is where the DC register is used (explained here for [ , but ] is similar): if the opcode is [ and RAM[idx] is 0, DC is set to 1, and the CPU enters a special skip-forward state from the next opcode. In this state, only [ and ] opcodes have any effect: they increment and decrement, respectively, the DC register. When DC gets to 0, we know we've found the matching ] , and so we can go back to the regular fetch-execute cycle.

Lava → VHDL → Xilinx

I originally planned to implement the whole thing in VHDL, and compile that using the Xilinx synthesizer tools (since the Papilio One board uses a Xilinx FPGA chip). However, I've found VHDL to be quite a horrible language from a software programmer's point of view. The whole point, for me, of going from physical chips to FPGA's was to enable using abstractions to manage complexity; so why settle for a poor language like VHDL? Fortunately, there's a whole family of Haskell-embedded DSLs for hardware description called Lava. Of these, Kansas Lava seemed the only one actively maintained, and it already had support for a Xilinx dev board; so adding support for the Papilio was straightforward (see my kansas-lava-papilio package).

The complete code for my Brainfuck CPU (including IO via the LogicStart daughterboard) is available on GitHub. There are quite some rough edges left to file off; I'd say the most pressing is adding the ability to synthesize the program ROM separately from the CPU definition.

Videos

This first video shows a simple countdown (actually, count-up ) program: ,[>+.<-] . I had to record these slightly out-of-focus, otherwise the seven-segment LEDs were hard to read.

Next up is "Hello world!":

And the final one shows Maya's solution to the 9-digit problem. This one really shows how slow this naïve, stackless implementation is.