Moonforth

This is a detailed interactive guide to building a Forth for an unusual architecture, the DCPU-16. Ever wanted to learn how the system on the Philae comet lander works, ported to an imaginary vintage computer used to control a spaceship in a futuristic game by the creator of Minecraft? Then this is the guide for you!

You can download this guide and contribute.

Forth for the DCPU-16

Why Forth?

Forth is an incredible language, and not very widely understood in the 21st century. It's seen as a bit of a dinosaur, or a relic from the golden age of computing when assembly was written by hand. Indeed it was initially developed for what are now vintage computers, but then so was C, and so were other languages like Lisp, which is older than both Forth and C. The reason usually given for Forth's decline is that it is extremely low-level without having the performance advantages of C. But if you don't need high-level constructs like garbage collection, and you don't need to eke every last cycle of performance out of your architecture, then perhaps you will be receptive to Forth's capabilities:

what was most important was that nothing was hidden, there were no complex data structures around with “don't-look-at-this” parts (think on garbage collection in Lua, for example, and Lua's tables - beginners need to be convinced to see these things abstractly, as the concrete details of the implementation are hard), and everything - code, data, dictionaries, stacks - were just linear sequences of bytes, that could be read and modified directly if we wished to. We had total freedom, defining new words was quick, and experiments were quick to make; that gave us a sense of power that was totally different from, say, the one that a Python user feels today because he has huge libraries at his fingertips.
Eduardo Ochs

The key to Forth is that it gives you usable malleability at any level you like, including poking around in the very deep internals of your computer. As Ochs says, nothing is hidden in a Forth, and this means that you have the power to do things that are very difficult in other languages. The freedom to tinker is important.

Why the DCPU-16?

To learn Forth, it is a good idea to implement a Forth—to become a Forthwright—and learn how the language works under the covers. This necessarily means writing a Forth for a specific architecture. There are many architectures to choose from. You might pick the Z80 or 6502 for vintage charm, or the x64 for cutting edge performance. Instead, we pick the DCPU-16.

The DCPU-16 was created by Markus "Notch" Persson in 2012 for the game 0x10c. Notch, having been responsible for Minecraft (one of the best selling and most widely known computer games of all time), had a fanatical following when 0x10c was announced. Since 0x10c was to be a successor to Minecraft, and since it concentrated on vintage computing with a 16-bit CPU, programmers descended upon it in droves and wrote their own emulators, assemblers, even entire operating systems, for the architecture before the game was even released.

The pressure that this put on Notch led him to give up working on the game. He didn't think that he could keep up with the fans. Though the specifications for the DCPU-16 were posted online, the game was abandoned, and later all of the official material for the architecture was taken offline. It's a shame, because 0x10c could have sparked a vintage computing renaissance, and it looked very fun to play. But the specifications are still available, hosted all over the web by the many fans, so the DCPU-16 can still be targeted.

The DCPU-16 is the novelty choice. It means that a Forth written for it is not going to be taken too seriously, not run in earnest, not even run on a chip since there is no existing DCPU-16 in silicon. Because it was designed for a game, the DCPU-16 is also quite clear and simple and reasonable. This makes its assembly code easier to understand, at least for assembly. In other words, it's a great architecture for learning and having fun, and since Forth has these qualities too, the two should work together well. As the creator of Forth says:

I encourage people to write their own Forth. The standard doesn't mean that you cannot invent something. [...] You can do three things with a computer. You can try to make money and that is unlikely. You can try to become famous and that never happens. And you can have fun and that always works.
Chuck Moore

Let's have fun with Forth.

How Forth works

Forth is a simple language. It is the synthesis of a few ideas:

somewhere to store code and data (a dictionary), a way of passing information between bits of code (a stack), a way of invoking bits of code from other code (an evaluation function) and a way of knowing where to come back to when it's done (a return stack). A few primitive operations to read and write memory, and a few global variables so that code can find the stacks and dictionary, and that's it done.
Frank Carver

The interface is just as simple. Words are read in a REPL, and either compiled or interpreted depending on the current REPL state. When a word is compiled, it's added to the dictionary—the mapping of words to their procedures. When a word is interpreted, then it is looked up in the dictionary and its procedure is entered. The stack is used to send parameters between procedures.
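The compile-or-interpret behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the dictionary is a Python dict, procedures are Python functions rather than threaded code, numbers cannot appear inside definitions, and QUAD is a hypothetical word invented for the example (DUP and ADD are explained later in this guide).

```python
stack = []

# The dictionary maps each word to its procedure. DUP and ADD are primitives.
dictionary = {
    "DUP": lambda: stack.append(stack[-1]),
    "ADD": lambda: stack.append(stack.pop() + stack.pop()),
}

def make_word(body):
    """Build a procedure that enters each compiled procedure in turn."""
    def run():
        for procedure in body:
            procedure()
    return run

def repl(text):
    """Read words one at a time, compiling or interpreting each."""
    name, body = None, None            # body is a list only while compiling
    words = iter(text.split())
    for word in words:
        if word == ":":                # start compiling: next token is the name
            name, body = next(words), []
        elif word == ";":              # finish: add the new word to the dictionary
            dictionary[name] = make_word(body)
            name, body = None, None
        elif body is not None:         # compile state: look up now, run later
            body.append(dictionary[word])
        elif word in dictionary:       # interpret state: look up and enter
            dictionary[word]()
        else:                          # anything else is treated as a number
            stack.append(int(word))

repl(": QUAD DUP ADD DUP ADD ; 3 QUAD")
print(stack)  # [12]
```

Note that the words inside the definition are looked up at compile time, so the compiled body is a list of procedures rather than a list of names.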

The compilation of words into their dictionary forms is what gives Forth its flavour, more so than the stacks or postfix notation. These forms, the structure of the procedures in the dictionary, are called threaded code—not to be confused with the "multitasking" sense of threading which is most common in contemporary computing. A better word for it would be woven code. Compilation in Forth is the act of threading code into the dictionary. Here is a good explanation of Forth threads:

They are a framework for talking about code compilation and meta-programming. [...] A compiled forth word is just a consecutive array of fixnums, most of which represent pointers to other words. This has always been one of the advantages of forth. Because of the transparency in the threading of the program into memory, forth allows fine control over many programming tradeoffs, including one of the most important: Execution speed versus program size. Threaded code lets us optimise our abstractions as close to our problems as possible, resulting in extremely fast, small programs. But just as lisp macros are about much more than just efficiency, so are forth threads. As much as lisp programmers, forth programmers tend to think of themselves as implementors instead of mere users. Forth and lisp are both about control—making your own rules.
Doug Hoyte, in Let Over Lambda pp.290–1

There are several ways to thread code, and each implementation must pick one. We're going to use indirect threaded code (ITC), which was classically the most popular way to write a Forth, and is the easiest to understand. To show how it works, we're going to need to explore the Forth VM.

The Forth VM

Forth is a way of interacting with a Von Neumann machine in a sensible way. And a Forth implementation is a fully reflective and introspectable glue layer between a Von Neumann CPU assembler and the Forth VM. Writing a Forth implementation means moulding the underlying architecture into something more tractable.

The Forth VM consists of two main instruction registers, two registers that point to stacks, and what is sometimes described as an "inner interpreter" called NEXT that makes it all run. This inner interpreter is not really an interpreter, it's more like a return from a subroutine. In the assembly code implementation of the Forth VM, we'll return at the end of subroutines but we'll jump to NEXT at the end of sections of Forth VM code.

To explain the Forth VM in more detail we'll need to look at a specific example. Here is a very simple Forth program:

: DOUBLE DUP ADD ;

This is a bit like the following in JavaScript syntax:

function double() { dup(); add(); }

We're defining a word called DOUBLE which calls the word DUP and then the word ADD in its definition. DUP duplicates the top item of the parameter stack, and then ADD adds the top two items together. So for example:

DUP [5] -> [5 5]
ADD [5 5] -> [10]
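The same stack effects can be checked with a small Python model. This is a sketch that uses a Python list as the parameter stack; the function names simply mirror the Forth words and are not part of any real Forth.

```python
# A toy parameter stack; the top of the stack is the end of the list.
stack = []

def dup():
    """Duplicate the top item: [5] -> [5 5]."""
    stack.append(stack[-1])

def add():
    """Replace the top two items with their sum: [5 5] -> [10]."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def double():
    """The equivalent of : DOUBLE DUP ADD ;"""
    dup()
    add()

stack.append(5)
double()
print(stack)  # [10]
```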

But what the words actually do, which is the interesting part to the Forth user, is not relevant at the Forth VM level. At this level, we're interested in the actual memory structure of the threaded code, and the execution model.

Execution tokens

This is a simplified version of what threaded code looks like inside the computer for DOUBLE, DUP, and ADD in an ITC Forth like ours:

Key      Value
DOUBLE   DOUBLE-XTO
         DUP
         ADD
         DOUBLE-TAIL
DUP      DUP-XTO
ADD      ADD-XTO

This is just a representation of RAM, the memory of the computer. The value column is the consecutive sequence of integers that makes up the memory array. The key column holds indices into that array, and computer people start counting at zero, not one. So DOUBLE might represent the number 0 (1st row), DUP the number 4 (5th row), and so on.

DOUBLE is a Forth word, and it points to the value DOUBLE-XTO followed by DUP, ADD, and DOUBLE-TAIL. We call these four items the execution tokens (XTs) of DOUBLE. Similarly, DUP has a single XT called DUP-XTO, and ADD has the single XT called ADD-XTO. Note that the first XT of DOUBLE, DUP, and ADD all end with the suffix -XTO, which is short for XT ZERO. Traditionally this is called the Code Field Address or CFA, but we do not follow that tradition here.

When DUP is being used as a value, as it is in the 2nd row, then we say this is an XT. But when it's being used as a key, as it is in the 5th row, we say this is the definition of the word DUP and it points to the DUP-XTO XT. Every Forth word is designed as a list of XTs like this.
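To make the key and value distinction concrete, here is a sketch of the simplified memory layout as a Python list. The strings stand in for the integers that real RAM would hold, and the `keys` mapping and `definition` helper are illustrative names, not part of any Forth.

```python
# RAM as a flat list: the list index is the key, the element is the value.
ram = [
    "DOUBLE-XTO",   # 0  <- key DOUBLE
    "DUP",          # 1     DUP used as a value, so it is an XT
    "ADD",          # 2
    "DOUBLE-TAIL",  # 3
    "DUP-XTO",      # 4  <- key DUP
    "ADD-XTO",      # 5  <- key ADD
]

# Each word name is just a label for an index into RAM.
keys = {"DOUBLE": 0, "DUP": 4, "ADD": 5}

def definition(word, length):
    """Return the list of XTs that starts at the word's key."""
    start = keys[word]
    return ram[start:start + length]

print(definition("DOUBLE", 4))  # ['DOUBLE-XTO', 'DUP', 'ADD', 'DOUBLE-TAIL']
print(ram[keys["DUP"]])         # DUP-XTO
```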

Here is a slightly more detailed look at the same thing:

Key              Value
NEXT             WORD-FROM-CODE
                 NEXT-WORD
                 JUMP-INTO-XTO
DOUBLE           DOUBLE-XTO
                 DUP
                 ADD
                 DOUBLE-TAIL
DOUBLE-XTO       R-PUSH
                 WORD-AFTER-XTO
                 NEXT
DOUBLE-TAIL      DOUBLE-TAIL-XTO
DOUBLE-TAIL-XTO  R-POP
                 NEXT
DUP              DUP-XTO
DUP-XTO          DUP-MACHINE-CODE
                 NEXT
ADD              ADD-XTO
ADD-XTO          ADD-MACHINE-CODE
                 NEXT

We've added what looks like a new word, NEXT, and we've also defined the -XTO and -TAIL XTs. Note how all of the -XTO and -TAIL XT sections end with NEXT as a value. NEXT is the only key here which is neither a word (DOUBLE, DUP, ADD) nor an execution token (DOUBLE-XTO, DOUBLE-TAIL, DUP-XTO, ADD-XTO). That's because NEXT is the core piece of VM machinery that allows us to return from these XTs into the next piece of code.

The way that a CPU generally works is that it has a Program Counter (PC) register which is a key pointing to some value, and the value is a number which the CPU takes as machine code to execute. When it has been executed, the PC is usually incremented to the next number, so the adjacent cell—the next value below in the value column—is executed next, unless the machine code said to jump to another place. If you're familiar with assembly programming, this means that a jump is just a way to set the PC register.
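The fetch, advance, and execute cycle can be illustrated with a toy machine in Python. The three-instruction set below (INC, JMP, HALT) is entirely made up for this sketch and has nothing to do with real DCPU-16 opcodes; the point is only that a jump is an assignment to the PC.

```python
def run(memory):
    """Execute a toy program and return the accumulator and the PC trace."""
    pc = 0
    acc = 0
    trace = []
    while True:
        op, arg = memory[pc]   # fetch the value that the PC points to
        trace.append(pc)
        pc += 1                # default: move on to the adjacent cell
        if op == "INC":
            acc += arg
        elif op == "JMP":
            pc = arg           # a jump simply overwrites the PC
        elif op == "HALT":
            return acc, trace

program = [
    ("INC", 1),     # 0
    ("JMP", 3),     # 1: skip over cell 2
    ("INC", 100),   # 2: never executed
    ("INC", 10),    # 3
    ("HALT", 0),    # 4
]

acc, trace = run(program)
print(acc, trace)  # 11 [0, 1, 3, 4]
```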

The two main registers in the Forth VM are the code register (CODE-R), and the word register (WORD-R). For historical reasons these registers are usually called IP and W respectively. CODE-R contains a key that points to the next word to be executed as its value. WORD-R contains a key that points to the XTO of the current word being executed as its value. So for example, CODE-R could be set to 4 which is the blank 5th key just underneath DOUBLE, which points to DUP. And WORD-R would be set to 3 which is the 4th key, DOUBLE, which points to DOUBLE-XTO.

How NEXT works

Let's say we have the following Forth code:

5 DOUBLE DUP

This means to put 5 on the stack, giving us [5], double it giving us [10], and then duplicate it giving us [10 10]. We'll assume that we have already put 5 on the stack, and now we're going to call DOUBLE. We start with CODE-R set to a key which points to DOUBLE, because that's the next word that we want to execute. WORD-R can be set to anything to start with; it doesn't matter because we're about to use NEXT to set it for us. We call NEXT. If you look at the table above, you'll find that NEXT is defined as the sequence WORD-FROM-CODE, NEXT-WORD, and JUMP-INTO-XTO. Usually NEXT is implemented in assembly, and sometimes the assembly is so compact that it can be done in a single CPU instruction. But let's understand what is really going on here in the Forth VM independent of any specific CPU architecture implementation:

WORD-FROM-CODE uses CODE-R as a key to get a value that we store in WORD-R. So for example if CODE-R points to DOUBLE, then we set WORD-R to DOUBLE. DOUBLE itself points to DOUBLE-XTO. In C terminology, we're doing a pointer dereference on CODE-R, so it would look something like word_r = *code_r. In DCPU-16 assembly it looks like SET WORD_R, [CODE_R].

NEXT-WORD increments CODE-R so that it points to the next word in our current code. Our input code is "5 DOUBLE DUP", and CODE-R currently points to DOUBLE. The next thing that it's going to point to is DUP, so we increment CODE-R to point to DUP. CODE-R is now the key that points to the value DUP.

JUMP-INTO-XTO uses WORD-R as a key to get a value that we jump to. At the moment WORD-R is set to DOUBLE. When we look up DOUBLE in memory, we find that it points to DOUBLE-XTO, so we jump to DOUBLE-XTO.

There are lots of different possible notations for expressing this. Here is one way of doing it:

WORD-R = GET(CODE-R)
CODE-R = NEXT(CODE-R)
JUMP(GET(WORD-R))
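As one more notation, here are the same three steps as a runnable Python sketch. The addresses 10, 20, 100, and 200 are arbitrary values chosen for the illustration, and the function returns the jump target instead of actually jumping.

```python
# Hypothetical addresses for the two words and for their XTO machine code.
DOUBLE, DUP = 10, 20
DOUBLE_XTO, DUP_XTO = 100, 200

# Sparse RAM: cells 0 and 1 hold the code "DOUBLE DUP";
# each word's own cell holds the address of its XTO.
ram = {0: DOUBLE, 1: DUP, DOUBLE: DOUBLE_XTO, DUP: DUP_XTO}

def next_step(code_r):
    """Perform one NEXT and return the new (CODE-R, WORD-R, jump target)."""
    word_r = ram[code_r]    # WORD-FROM-CODE: WORD-R = GET(CODE-R)
    code_r = code_r + 1     # NEXT-WORD:      CODE-R = NEXT(CODE-R)
    target = ram[word_r]    # JUMP-INTO-XTO:  JUMP(GET(WORD-R))
    return code_r, word_r, target

print(next_step(0))  # (1, 10, 100): WORD-R is DOUBLE, and we jump to DOUBLE-XTO
```

Calling `next_step(1)` afterwards would pick up DUP in the same way, which is exactly what happens once DOUBLE has finished.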

Here is another:

WORD-R [...]

And here is a gallery of explanations of ITC NEXT from around the web:

*ip++ -> w jump **w++
[IP] TO W CELL +TO IP [W] JUMP
W <- [IP] IP <- IP+1 X <- [W] JMP [X]
(ip)+ -> w jmp (w)+
lodsl jmp *(%eax)
(IP) -> W IP+2 -> IP (W) -> X JP (X)
W <= (IP++) PC <= (W)
w = *ip++; jmp *w;
mov @ip+,w mov @w+,pc
IP )+ W MOV W )+ ) JMP
mov W, [IP] add IP, ONECELL jmp [W]
cfa = *ip++; ca = *cfa; goto *ca;

(Sources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Spot the odd ones out!)

The notation doesn't really matter as long as you understand it. What we're doing in NEXT is setting WORD-R to point to the XTO of the next word, and then CODE-R to point to the word after next. You can think of it like two people skipping past one another: CODE-R was ahead, and then we set WORD-R to go past it, and then CODE-R goes past WORD-R. Then we go and do whatever WORD-R is pointing to, which will end with another call to NEXT so that we can do this over again. It's a bit like looping using tail recursion.

Primitive and regular words

There are two kinds of Forth word: primitive, and regular. In our example, DUP and ADD are primitive words, and DOUBLE is a regular word. Primitive words are quite straightforward:

Key      Value
DUP      DUP-XTO
DUP-XTO  DUP-MACHINE-CODE
         NEXT

So a primitive word is one which only has one XT that points to some machine code followed by NEXT. The DUP-XTO value is actually the machine code implementation of DUP itself, and nothing else needs to be called. We finish with NEXT as usual. But things are a bit more complicated for regular words:

Key              Value
DOUBLE           DOUBLE-XTO
                 DUP
                 ADD
                 DOUBLE-TAIL
DOUBLE-XTO       R-PUSH
                 WORD-AFTER-XTO
                 NEXT
DOUBLE-TAIL      DOUBLE-TAIL-XTO
DOUBLE-TAIL-XTO  R-POP
                 NEXT

We find that a regular word is one which has a list of XTs, starting with an XTO and followed by XTs which are words. Even DOUBLE-TAIL is a primitive word, with its own XTO. And instead of the XTO being an arbitrary block of machine code, it always has the contents R-PUSH, WORD-AFTER-XTO, and NEXT. These three bits of machine code are usually given the assembly label ENTER, or in old Forth DOCOL. ENTER is actually a pretty good name, but it does hide the implementation details at the VM level. What's happening here is as follows:

R-PUSH copies CODE-R and pushes it onto the return stack. We do this so that we know, at the end of executing the current word (DOUBLE in this case), where we're coming back to. In the "5 DOUBLE DUP" example, when we execute DOUBLE then CODE-R would be set to the DUP that comes afterwards, so that's where we'll come back to.

WORD-AFTER-XTO sets CODE-R to the value that comes after the current value of WORD-R. This means that CODE-R is moving from the outer layer of code, "5 DOUBLE DUP" to the insides of DOUBLE. Since WORD-R points to DOUBLE-XTO, CODE-R will now be pointing to the DUP value that is inside the definition of DOUBLE.

To make this clear, imagine that the code that we're executing is in memory just before the definition of DOUBLE. We'll leave out the 5, so we're just putting "DOUBLE DUP" before the code, and we'll also add a word called QUIT so that the interpreter doesn't run into the code that follows. Here is where CODE-R starts before WORD-AFTER-XTO and where it is afterwards:

Key                 Value            Register
                    DOUBLE
                    DUP              CODE-R is here before WORD-AFTER-XTO
                    QUIT
DOUBLE              DOUBLE-XTO       WORD-R is here
                    DUP              CODE-R moves to here after WORD-AFTER-XTO
                    ADD
                    DOUBLE-TAIL
DOUBLE-XTO (ENTER)  R-PUSH
                    WORD-AFTER-XTO
                    NEXT
DOUBLE-TAIL (EXIT)  DOUBLE-TAIL-XTO
DOUBLE-TAIL-XTO     R-POP
                    NEXT

What's being done is that we're moving CODE-R to point inside the word that we're currently executing. WORD-AFTER-XTO moves CODE-R to point to the memory cell directly after the current WORD-R. In a regular Forth word, that cell always holds a Forth word; a primitive word has no such cell of its own. DOUBLE is a regular Forth word, so it has Forth words (DUP and ADD) inside it. This is why only the XTO of regular Forth words needs to contain WORD-AFTER-XTO.

The DUP and ADD inside the DOUBLE definition are then executed, and then we come to DOUBLE-TAIL. This is usually called EXIT, or in old Forth ;P. (;P may have been named by Chuck Moore, who favours extremely short and cryptic words in Forth.) Once again EXIT is quite a good name, but it hides the VM details. EXIT is the dual of ENTER, and it works as you might anticipate:

R-POP pops the value at the top of the return stack, and puts that value into CODE-R. In other words, we're going back to where CODE-R was originally before it entered this word. We have now done the opposite of the earlier R-PUSH, restoring the state of CODE-R to how it was immediately before the R-PUSH was called. Note that R-POP is the dual to both R-PUSH and WORD-AFTER-XTO combined, because the R-POP actually sets the value of CODE-R and reverses what WORD-AFTER-XTO and any subsequent calls did.

Then we call NEXT, and the Forth VM continues with its normal business.
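Putting NEXT, ENTER, and EXIT together, the whole walk through "DOUBLE DUP QUIT" can be simulated in Python. This is a sketch with one deliberate simplification: the XTO cell of each word holds a Python function directly, standing in for the address of a block of machine code. The `define` helper, the `reg` dictionary, and the MAIN label are names invented for the illustration.

```python
ram, keys = [], {}
stack, rstack = [], []          # parameter stack and return stack
reg = {"CODE-R": 0, "WORD-R": 0, "running": True}

def define(name, *cells):
    """Append cells to RAM and remember the key of the first one."""
    keys[name] = len(ram)
    ram.extend(cells)

def enter():                    # DOUBLE-XTO, usually called ENTER
    rstack.append(reg["CODE-R"])          # R-PUSH
    reg["CODE-R"] = reg["WORD-R"] + 1     # WORD-AFTER-XTO

def exit_():                    # DOUBLE-TAIL-XTO, usually called EXIT
    reg["CODE-R"] = rstack.pop()          # R-POP

def dup():
    stack.append(stack[-1])

def add():
    stack.append(stack.pop() + stack.pop())

def quit_():
    reg["running"] = False

def next_():
    reg["WORD-R"] = ram[reg["CODE-R"]]    # WORD-FROM-CODE
    reg["CODE-R"] += 1                    # NEXT-WORD
    ram[reg["WORD-R"]]()                  # JUMP-INTO-XTO

# Primitive words: a single cell holding the machine code.
define("DUP", dup)
define("ADD", add)
define("DOUBLE-TAIL", exit_)
define("QUIT", quit_)
# Regular word: ENTER followed by the keys of the words in its body.
define("DOUBLE", enter, keys["DUP"], keys["ADD"], keys["DOUBLE-TAIL"])
# The outer code "DOUBLE DUP QUIT".
define("MAIN", keys["DOUBLE"], keys["DUP"], keys["QUIT"])

stack.append(5)                 # the 5 is assumed to be on the stack already
reg["CODE-R"] = keys["MAIN"]
while reg["running"]:
    next_()                     # stands in for each XT ending with a jump to NEXT
print(stack)  # [10, 10]
```

The loop replaces the jump to NEXT at the end of every XT. Python has no computed jump, so each function returns and the loop calls NEXT again, which has the same effect.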

DCPU-16 assembler

We can implement NEXT, along with the ENTER and EXIT execution tokens, quite easily in just a handful of lines of DCPU-16 assembly. When we write code that we want to reuse, we'll put a capitalised bold module name in front of it that begins with an underscore, like _MODULE. Then later on we can refer to the code using this module name.

In assembly, we use colon-prefixed words to temporarily label memory locations, a bit like the keys in the examples given above. The instructions that follow the labels are then compiled into machine code, and the address of that machine code is stored into the label. There are only a few DCPU-16 instructions, and we won't even be using all of them, so it's a good idea to read about them and learn them to understand the code. It should be fairly self-explanatory anyway.
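The way that labels pick up addresses can be sketched in Python. This is a rough illustration, not how any particular DCPU-16 assembler works: it assumes that every instruction occupies exactly one memory cell, which is not true of real DCPU-16 machine code, and the label names and sample program are made up.

```python
def label_addresses(lines):
    """Map each :label to the address of the instruction that follows it."""
    labels, address = {}, 0
    for line in lines:
        tokens = line.split(";")[0].split()       # drop the comment, then split
        while tokens and tokens[0].startswith(":"):
            labels[tokens.pop(0)[1:]] = address   # a label names the next address
        if tokens:
            address += 1                          # assumed: one cell per instruction
    return labels

source = [
    ":START",
    ":FIRST SET A, 1   ; two labels can share one address",
    "       ADD A, 2",
    ":LAST  SET PC, LAST",
]

print(label_addresses(source))  # {'START': 0, 'FIRST': 0, 'LAST': 2}
```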

_NEXT:

:NEXT
:WORD_FROM_CODE
    SET J, [I] ; WORD-R [...]

Now we can embed this code within a simple test program, and then actually run it directly in the browser! Any interactive code sample has a button below it called "Run Code". When you click that button, the interface will expand and a row of new buttons and some new text will appear beneath. The buttons control the DCPU-16 emulator and work as follows:

Expand — does a macro expansion on the code.

Labels — toggles showing memory labels next to the RAM dump.

Step — processes the next instruction pointed to by the PC register, dumps the output, and stops.

Run — just sets the CPU to go full steam ahead, running as many instructions as it can as fast as it can. This is not very fast by modern CPU standards, but it's extremely fast for interactive input, so expect to go through many tens of thousands of CPU cycles per second. Changes to Stop, which enables you to stop the running CPU.

Note that until you hit "Run Code", the modules and the code samples are editable within the browser. That means you can make your own modifications to experiment. When you hit Run Code, the current state is then frozen for that particular code sample.

The assembler and emulator used here is a modified version of Mappum DCPU-16 by Matt Bell. There are several modifications:

The CPU and RAM dump now shows the current assembly label for the I, J, and PC registers. I is being used for CODE-R, and J is being used for WORD-R.

The keyboard driver was broken, and is now fixed.

There was no include facility, so an .INCLUDE macro is now added. All instances of the macro are expanded recursively in a preprocessing phase.

A .LINK macro has been added, explained below.

A .BREAKPOINT macro has been added, explained below.

The ability to callback on a running CPU has been added.

The .INCLUDE macro is especially useful as it allows us to embed modules previously defined with the bold capitalised labels starting with an underscore.
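Recursive expansion in a preprocessing phase might look something like the following Python sketch. It is not the code used by the emulator here. The syntax (a line of the form `.INCLUDE NAME`), the `expand` function, and the module names `_OUTER` and `_INNER` are all assumptions made for the illustration.

```python
def expand(source, modules, active=()):
    """Recursively replace each `.INCLUDE NAME` line with that module's text."""
    expanded = []
    for line in source.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == ".INCLUDE":
            name = parts[1]
            if name in active:           # guard against a module including itself
                raise ValueError("circular include: " + name)
            body = expand(modules[name], modules, active + (name,))
            expanded.extend(body.splitlines())
        else:
            expanded.append(line)
    return "\n".join(expanded)

# Hypothetical modules: _OUTER includes _INNER.
modules = {
    "_OUTER": "SET A, 1\n.INCLUDE _INNER",
    "_INNER": "SET B, 2",
}

program = expand(".INCLUDE _OUTER\nSET C, 3", modules)
print(program)
```

The `active` tuple records which modules are currently being expanded, so that a module which includes itself, directly or indirectly, raises an error instead of recursing forever.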

Here is the first interactive test program, used to try _NEXT: