Implementation of the Hecl Programming Language

David N. Welton

davidw@dedasys.com

2006-03-15

What is Hecl, and why I'm writing it

Have you ever wondered what it's like to create your own programming language? This article covers the design, and above all, implementation of the Hecl programming language. It explains, in detail, how we go about taking some text in a file, and making the computer carry out the instructions written therein. Since Hecl is not a fancy, complicated language, but rather a small, simple, humble language, it should be pretty easy to follow. If you are working on your PhD in computer science, you probably won't find this interesting, as you most likely already know more than I do about the subject. But for everyone else, I hope to provide an interesting glimpse into what goes into the making of a programming language.

A form created with Hecl, running on the J2ME emulator.

As I said above, Hecl is a small, simple language. It's that way for a reason - first of all, because I wanted to have it up and running quickly, but most of all, because I wanted something that would run in the space constraints of my J2ME mobile phone. Furthermore, despite - or perhaps because of - having a simple design, Hecl is quite flexible and extensible, meaning that it could grow into a lot of different niches. And, of course, it makes a good subject for an article, rather than the entire book that would be necessary to describe more complicated languages.

I started working on Hecl when a couple of things happened: I bought a cell phone that ran J2ME, and I decided it was probably time to learn some Java. This was a fortuitous combination, as I'm not very good at learning things without a project to hack on as an incentive, and the cell phone provided just what I needed. Java doesn't hold a lot of fascination for me - it's got some nice aspects, but most of my programming needs are covered either by dynamic languages like Tcl or Ruby, or C for the fast, low-level bits. However, J2ME is a very interesting space to work in, because it's already out there on millions of phones, and many of the phones that support it can't run anything else. The Symbian platform offers a bit more - you can port regular C code to it, but when I purchased my phone, that was still limited to relatively expensive models. Time will tell which platform will win, but in the meanwhile, I liked the idea of writing something that will work on millions of devices. And while there are some other scripting languages for Java, there only a few others that can run on the J2ME platform, because it imposes quite a few limits on what you can do with Java. For example, the reflection API, which is heavily used by languages like Groovy, does not exist.

The Hecl code that creates the above form.

With a goal in mind, and fingers itchy to starting laying down code, my last stop was to consider the language's overall design. In reality, this wasn't so hard. I really like the Tcl programming language, especially its syntax, which borrows more from Lisp than a lot of people realize or give it credit for. Tcl is, on the surface, a pretty simple language, which in turn allows it to be incredibly flexible. It's possible to write an object (classes, inheritance, and so on) system in Tcl itself, and even to add new control structures (like 'if' and 'while') that are written in Tcl itself! It is also an easy language to learn - it "scales down", something that I place a lot of value on because it means that the language is accessible to more people, even if they aren't great programmers or don't have a degree in computer science. Tcl isn't perfect, but since I'm not trying to write a Tcl clone in Java, I am also free to take what I like, improve what I don't like, and be creative where I need to be. Once again, the decision to abandon Tcl compatibility was, in part, driven by the need to keep things small, and in part to maximize the fun value of the project - it's more exciting to work without pre-established limits. There is already a version of Tcl written in Java called Jacl, but it is far too large for my phone's scarce memory.

The rest of this article deals with the technical details of the Hecl language. For those interested in seeing exactly how the code described fits into the rest of the system, it's possible to browse Hecl's code on line here: https://github.com/davidw/hecl - in particular the core classes, located here: https://github.com/davidw/hecl/tree/master/core/org/hecl

How Hecl works

Parser

So - how does the computer recognize and act on some scribblings in a file? The first thing that Hecl must do is read what's in the file, and turn it into small chunks that it can make sense of. This portion of Hecl is the parser, defined in Parse.java. As mentioned previously, Hecl is a simple language. So in order to more easily discuss what's happening, let's look at a bit of Hecl code:

set txt [getprop $tf text] if { ne $txt "" } { # a comment stringitem label "You wrote:" text $txt puts "hello" ; puts "world" ; }

Hecl's parser only recognizes a few characters as special: "" [] {} $ # ; and newline. The parser reads each line of text, and breaks it into pieces according to simple rules:

Commands and their arguments are divided into words by spaces.

A command is ended by either a newline or a semicolon - ';'.

{curly braces} group the arguments inside, which are passed to the command 'as is'. This includes backslashes, newlines, and anything else

"double quotes" group everything between them into one word, but allow for substitution, so that you can do puts "The variable a is equal to $a"

[square brackets] evaluate the command inside and return its value. So where in a language like C, you might do foo = bar() , in Hecl, you would do set foo [bar] .

, in Hecl, you would do . $dollar returns a reference to the value contained in the variable named.

\ backslash quotes the next character. For instance "foo\[bar" would contain a literal left bracket instead of trying to evaluate bar as a command, as an unquoted brace would do.

would contain a literal left bracket instead of trying to evaluate bar as a command, as an unquoted brace would do. # is the comment character. Everything after # on a line is ignored.

That's all! Hecl's syntax rules can be described by just 8 rules. Even if that looks a little complex, think about what it would take you to describe the syntax of a language like Ruby, Python, or C (or Perl, for real masochists!)

With that in mind, let's look at the snippet of code above. The first line can be broken up like so:

A command, set . An argument to that command, txt A third argument, some text to be evaluated, [getprop $tf text] . Hecl doesn't evaluate it immediately though, it just stores it and makes a note that it should be evaluated.

The second bit of code, also a single command, is similar.

A command, if , followed by two arguments, which happen to contain more Hecl code: ne $txt "" # a comment stringitem label "You wrote:" text $txt puts "hello" ; puts "world" ;

This is important to understand, because it's very different from how many languages work. In Hecl, if is an ordinary command just like any other, whereas in C or Python it is part of the syntax, rather than a function that users can define or modify.

In any case, what Hecl does as it reads in the source code is create an "internal representation" of the various commands and variables it is reading about. It does this so that it can store the whole thing, and then run it when it's ready, rather than trying to execute it piece by piece. This speeds things up some when code gets executed multiple times - the code in a while loop, for example - because Hecl doesn't have to re-parse it each time. When taken at as a whole, the parsed Hecl program could be viewed as a tree, with each node a part of the program as described below.

Hecl "Things"

This brings up a discussion of how Hecl stores values, which we need to cover before we come back to our discussion of parsing and Hecl's internal representation.

Since the term "object" has been used far too much in the realm of computer science, for Hecl I decided to revert to the equally vague, but more anglo-saxon term "thing". All values in Hecl are covered by the Thing class, which holds another value internally depending on what type of value it's holding. As far as more or less standard types of things, Hecl has StringThings , IntThings , ListThings , and HashThings , as well as DoubleThings for non-J2ME versions of Hecl. These are all pretty standard data types/structures found in most scripting languages. [1] Hecl also has an ObjectThing , which can contain any Java object, with the caveat that if it's transformed into its string representation, it will lose the reference to the object it contains.

Where things get tricky is in the internal types, where Hecl has CodeThings , GroupThings , and SubstThings , which, along with plain old StringThings , cover the internal representation of Hecl code. SubstThings are the simplest. When the parser encounters a variable with a dollar sign or ampersand, it creates a SubstThing to place in Hecl's internal view of the script. To understand how this case works, consider the following script:

set i 0 while { < $i 100 } { puts $i incr $i }

Obviously, Hecl needs to reexamine the value contained in 'i' each time through the script rather than substituting it once! SubstThing lets Hecl do that, because it knows it's dealing with something that really stands for something else - "$i" in this case refers to an integer, but a different one each time through the loop, so it must be substituted each time through.

Things get a bit more complex, though, because Hecl, like many scripting languages, lets you do "string interpolation". Consider this example:

puts "The value of 'i' is still $i and 1 + 1 is still [+ 1 1]"

We use the GroupThing class to handle this case. As the name suggests, it's a group of other things, as shown below:

GroupThing s evaluate their components when Hecl needs to know what the GroupThing itself contains.

Things are implemented in Java like so: a Thing class that contains a RealThing , which is its actual value:

public class Thing { public RealThing val;

CodeThing , GroupThing , StringThing and all the others implement the RealThing interface, so that Hecl can deal with them more or less interchangeably. The implementation of IntThing is a good starting point, because it's pretty easy to understand what an IntThing is and does.

public class IntThing implements RealThing { private int val;

We see that an IntThing holds an int - the real value that we are actually interested in after all this indirection! There is a convenience method in IntThing for the creation of new Things representing integers:

public static Thing create(int i) { return new Thing(new IntThing(i)); }

Which gets used a lot like so: Thing foo = IntThing.create(56); However, in terms of importance, the heart of what makes Hecl's types dynamic are two methods:

private static void setIntFromAny(Thing thing) throws HeclException { RealThing realthing = thing.val; if (!(realthing instanceof IntThing)) { thing.setVal(new IntThing(thing.getStringRep())); } } public static int get(Thing thing) throws HeclException { setIntFromAny(thing); IntThing getint = (IntThing) thing.val; return getint.val; }

What's happening here is that when you need to get an int from a Thing , you do call get like so: int foo = IntThing.get(foothing); as you can see, this code examines the thing's internal value, and in case it's not already an IntThing , turns it into a string and tries to parse it into an integer. Of course, that might fail, because if you try and get an integer out of a string "bleagh", it's just not going to work out! That's why both methods are capable of throwing HeclException , which we'll cover shortly.

Evaluation

Prior to understanding how Hecl code is actually evaluated, let's return to our discussion of parsing for just a moment. Now that you know how Hecl actually stores things internally, it's easy to understand what happens when Hecl code is parsed - it gets turned into Things:

puts "foo bar" -> StringThing (puts) and a GroupThing (foo bar). puts $i -> StringThing (puts) and a SubstThing (i) set i [+ $i 10] -> StringThing (set), StringThing (i), followed by a CodeThing. The CodeThing is parsed up like so: + $i 10 -> StringThing (+), SubstThing (i), StringThing(10)

You may be wondering why the 10 above is a StringThing when it's pretty obvious that it's actually a number and intended to be used like one. Your intuition is correct, but Hecl initially parses everything as strings, and only transforms them into other types of things when the code is evaluated.

To understand how Hecl is actually evaluated once we have a representation stored in memory, we need to examine CodeThings , which are the real motor of Hecl. You've probably guessed from the name that they contain code, which is indeed correct. CodeThings are divided into "stanzas", which are equivalent to individual commands. For instance, in the following code, the actual code would be transformed into a CodeThing with two stanzas, puts $i and incr $i . Each time the CodeThing is asked to run, it loops through the Stanzas, and in turn, calls their run method.

The actual evaluation takes place in the stanzas. To Hecl, the first stanza looks something like this:

StringThing (puts) SubstThing (i)

In order to do something with that, it needs to look up "puts" as a command. The Interp class contains a hash table that maps command names such as "puts", "if", "proc", "+", "list" and so forth, to the respective Java classes that implement that code. The "puts" command maps to the PutsCmd class, which is very, very simple:

class PutsCmd implements Command { public void cmdCode(Interp interp, Thing[] argv) throws HeclException { System.out.println(argv[1].getStringRep()); } }

All command classes implement the Command interface, and therefore must have a cmdCode method. But let's not get ahead of ourselves...

The stanza that contains the puts command also has something to pass to it as an argument, in this case a SubstThing . For each of the command's arguments in the stanza, Hecl checks what type it is. If it's something simple like an IntThing or StringThing , it gets passed on to the command sans modification. SubstThing s GroupThing s and other CodeThing s are recursively called in order to find out what value they actually represent, so that the actual command might be called like so:

puts 23

The arguments (including the command name) are passed to the cmdCode method as an array of Things: Thing[] argv array. In the case of PutsCmd , all it does is print out the string representation of the first argument.

This is how all Hecl commands work, even if and while , which take code as arguments, evaluate it, and act accordingly.

This simple strategy isn't blazingly fast, but it is very, very flexible. You can replace code on the fly, rename commands, and do all kinds of other interesting tricks. Additionally, it's easy to implement this system in the constrained J2ME environment.

Variables

Astute observers will have noted that when we covered SubstThings above, we didn't really explain how they work. How does Hecl go from a variable name - say, "foo", to a value contained therein, like 42? Hecl uses a simple hash table lookup scheme where each "stack level" has its own hash table that associates names to values. What this means is that inside a 'proc', a new hash table is created to hold local values. To see this in action:

set a 0 puts "a is $a" proc foo {} { set a 10 puts "in foo, a is $a" } foo puts "a is $a"

when run, gives us:

a is 0 in foo, a is 10 a is 0

In terms of program flow, you can see in the above figure that at 1, the global variable lookup table is in effect. When program evaluation enters the foo procedure, the global lookup table is replaced by a new one for the procedure being executed. At 3, when the procedure exits, the hash table that was created in the interpreter to hold its variables is set to null and left for Java's garbage collector to clean up. The stack itself is implemented with the convenient java.util.Stack class.

The variable hash table lookup routines are contained in the Interp class (along with the command lookup hash table).

Interp

Indeed, the Interp class is the central data structure that holds everything together. It contains the program's state in terms of variables, existing commands and procedures, exceptions, and so on. For Java developers looking to include Hecl in their own code, one of the handy aspects of Interp is that you can have more than one of them at a time. Imagine, for instance, a server that evaluates Hecl code for clients. Each authenticated client could have their own separate Hecl interpreter without interfering with the others!

The most commonly used methods in an interpreter are those which exist for modifying its state.

Make a command visible to Hecl:

interp.commands.put("newcommandname", new ClassThatImplementsCommand());

Create and set a variable:

interp.setVar("temperature", new Thing(99));

Evaluate code:

interp.eval(new Thing("puts {hello world}"));

Set the result from a command:

interp.result = new Thing("ok, like, that worked, and stuff");

Errors and Exceptions

So far, we've described how things should work. Life, however, has a habit of not going exactly as we'd planned, and that applies to Hecl programs as well. We need a way to deal with errors. Java itself has a featureful exception mechanism, so we take advantage of that in Hecl. Anything that could cause an error in Hecl throws a HeclException, so that this code, in Stanza.java, can catch that and deal with it appropriately:

try { interp.result = new Thing(""); tmpcommand.cmdCode(interp, newargv);

This is the actual command being executed.

} catch (HeclException e) { if (newargv[0] != null) { e.where(newargv[0].getStringRep()); } throw e;

We tell the exception what command it happened in, so that we can get a stack trace to show the chain of commands that failed.

} catch (Exception e) { String msg; msg = e.getMessage(); if (msg == null) { msg = "(null exception of type " + e.getClass() + ")"; } else { msg = "Exception of type " + e.getClass() + ": " + msg; } throw new HeclException(msg); }

And in the case of a regular Java exception, we catch it and create a new HeclException , so that it can be dealt with in Hecl itself. Hecl has a command, catch , that can be used to catch exceptions and deal with them programmatically. It works by evaluating the code passed to it, catching any exceptions that code may have thrown, and then simply returning those exceptions to the caller of the command in the form of a Hecl variable. In other words:

set cmds " some code to evaluate " catch $cmds result

This will evaluate the code in cmds and place any results in the result variable. For programs that must not halt, even on errors, catch affords the programmer a means by which he or she may attempt to handle errors gracefully.

Exceptions have another purpose, besides errors. If you think about a while loop, it might look like this:

while { true } { if { > $somevar 100 } { break } do_other stuff incr $somevar }

How does the break command work? It stops execution of the loop when it's run, yet it's not an error. It throws a HeclException itself, tagging it as a break exception: throw new HeclException(HeclException.BREAK) . At that point, it's up to commands implementing loops to catch it. Here's what the while command looks like:

while (Thing.isTrue(interp.eval(argv[1]))) { try { interp.eval(argv[2]); } catch (HeclException e) { if (e.code == HeclException.BREAK) { break; } else if (e.code == HeclException.CONTINUE) { } else { throw e; } } }

Remember, while is a command with two arguments - the condition, and the code. As long as the condition is true, the loop keeps executing - each time through, the code is evaluated. However, we also look out for the two special exceptions and act on them. Break causes the loop to end, while continue causes the loop to short-circuit back to the evaluation of the condition. If, however, the HeclException is not one of these special types, the error is thrown again and percolates its way up the stack.

Mixing Hecl and Java

As discussed at the beginning of this article, Hecl isn't meant as a replacement for Java, but rather to work alongside it. It's unlikely that an entire software system would be written entirely in Hecl at this point in time (although it's likely to evolve in this direction), so it's important to discuss the integration of Hecl and Java. There are two ways of doing this:

Calling Hecl from Java

If you have a program written in Java, and you'd like to make it scriptable, adding a Hecl interpreter to it is very simple. This would be an appropriate strategy if you wanted to let users change some variables, or create callbacks that are executed at certain points in the life cycle of your program. A simple example might be a web server that evaluates Hecl scripts when certain URL's are requested.

The place to look at for an example of this technique is the Hecl command line itself, specifically the file commandline/Hecl.java . Here's the main method:

public static void main(String[] args) { try { int i; Interp interp = new Interp(); /* Add the standard packages in. */ new HeclFile().loadModule(interp); new org.hecl.fp.HeclFloat().loadModule(interp); new org.hecl.load.HeclLoad().loadModule(interp); Vector argv = new Vector(); for (i = 0; i < args.length; i++) { argv.addElement(new Thing(args[i])); } interp.setVar("argv", ListThing.create(argv)); if (args.length > 0) { HeclFile.sourceFile(interp, args[0]); } else { Hecl.commandLine(interp); } } catch (Exception e) { e.printStackTrace(); } }

Simple enough. First, the interpreter is created, some modules loaded in, and the command line arguments examined. If there is a command line argument, it's treated as the name of a file containing Hecl code to evaluate, which is done via the HeclFile.sourceFile method.

If there aren't any command line arguments, Hecl runs the commandLine method, a "REPL", or read-eval-print loop, reading Hecl commands from standard input, evaluating them, printing the result, and so on. That code is pretty simple too:

while (true) { System.out.print("hecl> "); System.out.flush(); line = buff.readLine(); /* Exit on end of file. */ if (line == null) { System.exit(0); } try { interp.eval(new Thing(line)); if (interp.result != null && Compare.compareString(interp.result, new Thing("")) != 0) { System.out.println(interp.result); } } catch (HeclException he) { System.out.println(he); } }

The key points are that to utilize Hecl code from Java, you need to initialize the interpreter, and then use it to evaluate code.

Creating Java Extensions for Hecl

The other model is to write your program in Hecl, and add Java extensions to perform specific tasks. For instance, to create the charts for some programming language popularity statistics that I ran (http://www.dedasys.com/articles/language_popularity.html) I wrote the program in Hecl, and wrote a Hecl extension that provides an interface to the JFreeChart library.

This model is very popular with mature languages like Tcl, Ruby and Python, where the scripting language contains all the functionality necessary to write a fully functioning system, and extensions (usually written in C), are needed only to deal with specific external libraries or devices.

In any case, it's easy to do with Hecl - the basic trick is to write a class that implements the org.hecl.modules.HeclModule interface, like so:

public class JFreeChartHecl implements Command, HeclModule { public void cmdCode(Interp interp, Thing[] argv) throws HeclException { ... command implementation ... } public void loadModule(Interp interp) throws HeclException { interp.commands.put("barchart", this); interp.commands.put("savechart", this); } public void unloadModule(Interp interp) throws HeclException { interp.commands.remove("barchart"); interp.commands.remove("savechart"); } }

In this case, the module also implements a command - the basic goal of any extension to Hecl will be to add one or more commands that provide access to some external library or resource

This can then be compiled into a .jar file, and then loaded into Hecl with the load command:

load JFreeChartHecl { /home/davidw/hecllib/build/jfreechart/ /home/davidw/downloads/jfreechart/lib/jfreechart-1.0.0-rc1.jar /home/davidw/downloads/jfreechart/lib/jcommon-1.0.0-rc1.jar }

Which tries to load the named class and its dependencies from the specified URL's. Simple (local) files are transformed into URL's. In this case, it tries to load JFreeChart, which it finds in the jfreechart/build directory of hecllib. That in turn requires code from jfreechart and jcommon, which are loaded from the respective jar files.

Conclusions, the Future, and Further Reading

As you've seen from this tour through Hecl, an interpreter is an involved piece of software that requires many decisions. But that's what makes it so much fun - there is a lot of room both for both clever technical solutions, and elegant design - although as they say, "beauty is in the eye of the beholder"! In any case, Hecl is a small and simple enough language that anyone with a bit of Java knowledge should be able to use it for their own experiments. The software, by the way, is covered by the very liberal Apache license, Version 2 (which can be found here http://www.apache.org/licenses/LICENSE-2.0), meaning that you can even include it in proprietary, commercial offerings, so by all means, use it in your own software.

Hecl's Future

I aim to keep Hecl small and simple enough to run on as many cell phones as possible, meaning that the core itself will stay as small as possible. I am a big believer in the "make it, make it right, make it fast" strategy, and we are still making Hecl, so there is lots of room for people to help out in terms of adding features and extensions, and, since Hecl is at such an early stage, even experimenting with the language itself. Creating a programming language has been very challenging, creative endeavor, and those interested in participating in shaping Hecl's future are welcome.

Recommended reading: