I've always been a hobby programmer. I got started with PHP/HTML/CSS back around 2004 and quickly expanded to Ruby, which became my favorite programming language. I was quite involved in the ruby-talk mailing list even though I was only around 15 or 16 at the time. I even managed to get a tech reviewer credit for the first edition of O'Reilly's Ruby Cookbook, and wrote some official documentation for the Ruby website that is still live today and has been translated into a dozen languages.

But still, I was a hobbyist, self-taught, and never chose to pursue programming as a career. This has left some peculiar holes in my programming knowledge. I never worked with large teams, so for a long time I didn't even bother with anything but ad-hoc version control. On the other hand, my main interest in programming has always been the theory and implementation of programming languages, and there, I had some expertise. I spent long hours browsing Lambda the Ultimate, reading academic papers and the source code of many, many programming language implementations. I implemented many a toy Lisp. But still, I worked mainly in high-level languages, and there were all those holes in my knowledge.

This article will be a mixture of technical article and philosophical essay. I want to talk concretely about implementing my programming language, but also touch on what it's like to be a long-time hobbyist programmer embarking on a serious project. I apologize if it reads a bit scatter-brained; you could say I was seized by a creative raptus when I wrote this. I wanted to touch on a lot of technical details, but at the same time I also wanted to talk about what it's like trying to go from amateur to, well, not professional in the literal sense (nobody's paying me to do this work), but professional in the sense of serious and disciplined, producing robust and functional code that amounts to more than a toy.

In late 2017, I decided to get serious about implementing a programming language. Naturally, I chose a Lisp, since it's so easy to parse (parsing being the bane of my existence) and so infinitely extensible. I wanted more performance and some lower-level experience, so I chose to implement it in Crystal. Crystal is a relatively immature language, but it's extremely heavily inspired by Ruby, down to its syntax and standard library, and Ruby is still the language I'm most familiar with. Unlike Ruby, though, Crystal compiles to native code and has static typing with type inference and fancy union types. Getting it up and running on Windows was a challenge in itself; the recommended method is to run it via WSL. I had some trouble, but with the help of the community I finally got it set up and got cracking.

My first attempt was a rather naive tree-walking interpreter. At that point, I could get simple programs like the following working:

```
Welcome to alisp!
> (def f (fn (x) (+ x 1)))
=> #<fn (x)>
> (print (f 2))
3
=> nil
```

Then I added tail call optimization, which also seemed to be working without blowing the stack:

```
> (def tc (fn (i) (if (eq? i 100000) i (tc (+ i 1)))))
=> #<fn (i)>
> (tc 0)
=> 100000
```

The technique I use for tail-call optimization has two parts. First, function calls in tail position must be explicitly annotated with the tailcall primitive. This is done in a separate analysis module, and it turns out to be surprisingly tricky. If Crystal had supported pattern matching, it would have been a whole lot easier. I think the code speaks for itself:

```crystal
def annotate_tc(s : LispObject)
  # Automatically annotating tail calls can be tricky, because we need to
  # properly recurse while terminating at the right points. In particular:
  # * The last expr in a function body is in tail position (TP)
  # * If a function call is in TP, we need to annotate it
  # * If an if expr is in TP, we need to annotate each branch
  # * If a fn literal is in TP, we need to recursively call annotate_tc.
  #   However, we will not do this from within annotate_tc; rather,
  #   Interpreter#eval_fn, which evaluates a function literal and
  #   returns a Lisp::Lambda object, will call
  #   annotate_tc once it evaluates the inner function
  # * Otherwise, we need to return the value unchanged
  #
  # This function receives only the function body, and returns another
  # properly annotated function body.
  return s unless s.is_a?(List)
  s = s.as(List)
  last = s[-1]
  return s unless last.is_a?(List)
  return s if last.empty?
  last = last.as List
  if last.first == sym("if")
    s[-1] = annotate_tc_if last
    return s
  elsif funcall?(last)
    s[-1].as(List).unshift sym("tailcall")
    return s
  else
    return s
  end
end

def annotate_tc_if(s : List) : LispObject
  if s.size < 3
    error("if must have at least 2 args")
  end
  _then = s[2]
  if funcall?(_then)
    s[2].as(List).unshift sym("tailcall").as(Lisp::Symbol)
  else
    s[2] = annotate_tc s[2]
  end
  if s.size == 4
    if funcall? s[3]
      s[3].as(List).unshift sym("tailcall").as(Lisp::Symbol)
    else
      s[3] = annotate_tc s[3]
    end
  end
  s
end
```
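Stripped of Crystal's type casts, the annotation pass is a small recursive rewrite over nested lists. Here is a language-neutral sketch in Python, where s-expressions are plain nested lists and symbols are strings; the helper `funcall` and the treatment of special forms are simplifying assumptions for illustration, not the actual implementation:

```python
# Sketch of the tail-call annotation pass over nested Python lists.
# Assumption: any non-empty list whose head isn't a special form is a call.

def funcall(expr):
    return isinstance(expr, list) and len(expr) > 0 and \
        expr[0] not in ("if", "fn", "quote")

def annotate_tc(body):
    """Annotate the expression in tail position of a function body."""
    if not isinstance(body, list) or not body:
        return body
    last = body[-1]
    if isinstance(last, list) and last and last[0] == "if":
        body[-1] = annotate_tc_if(last)
    elif funcall(last):
        # Mark the call by prefixing the tailcall primitive.
        body[-1] = ["tailcall"] + last
    return body

def annotate_tc_if(expr):
    """Both branches of an `if` in tail position are in tail position."""
    for i in (2, 3):  # then-branch, optional else-branch
        if i < len(expr):
            if funcall(expr[i]):
                expr[i] = ["tailcall"] + expr[i]
            else:
                expr[i] = annotate_tc(expr[i])
    return expr

print(annotate_tc([["f", ["g", "x"]]]))
# → [['tailcall', 'f', ['g', 'x']]]
```

Note that the inner call `["g", "x"]` is deliberately left unannotated: only the outermost call in tail position may reuse the stack frame.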

You may notice all the casts. `LispObject` is a union type, but acts more like a tagged `void*` pointer. It's defined something like this (the exact definition is subject to change, as plans are underway to migrate to a bytecode VM):

```crystal
alias LispObject = (String | Int64 | List | Nil | Bool | Builtin |
                    Env | Hash(LispObject, LispObject) | Lisp::Symbol |
                    Lambda | Quoted | Quasiquoted | Trampoline)
```

This may be my inexperience with Crystal showing, but I've found the compiler to be pretty dumb when it comes to narrowing types. There are plenty of fragments of this sort throughout my code:

```crystal
if args[0].is_a? Lisp::Symbol
  key = args[0].as(Lisp::Symbol).value
  # ...
end
```

Here the compiler should be able to prove that within this if branch, `args[0]` is a `Lisp::Symbol`, narrowing its type accordingly, but it currently isn't smart enough to do that, so I have to cast manually. Anyway, that's a bit of a detour.

The second part of tail-call optimization makes use of trampolines. Trampolines are an old concept, and they're not far from what they sound like: if we want to make a call in tail position without growing the stack, we can instead return a trampoline object representing the call to be made. The function responsible for executing calls must then loop until the result is no longer a trampoline, thus avoiding growing the call stack. The relevant part of the code looks like this:

```crystal
val = nil
fn.body.each { |e| val = self.eval(e) }
while val.is_a?(Trampoline) # Reuse current stack frame
  fn = val.fn
  args = val.args
  if fn.is_a?(Builtin)
    # NOTE: forgetting to set val here caused me hours of frustration.
    val = self.eval_call_builtin(fn, val.args)
  elsif fn.is_a?(Macro)
    params = fn.args.list.map &.to_s
    @env.env = fn.env.merge Hash.zip(params, val.args)
    val = nil
    fn.body.each { |e| val = self.eval(e) }
  elsif fn.is_a?(Lambda)
    params = fn.args.list.map &.to_s
    args.map! { |a| self.eval(a) }
    @env.env = fn.env.merge Hash.zip(params, val.args)
    val = nil
    fn.body.each { |e| val = self.eval(e) }
  else
    error "cannot tailcall uncallable #{fn}"
  end
end
```
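Separated from the interpreter's environment handling, the trampoline pattern itself reduces to a short driver loop. Here is a minimal Python sketch of the idea; the `Trampoline` class and the `countdown` function are invented for illustration:

```python
# Minimal trampoline sketch: a tail call returns a Trampoline object instead
# of calling the function directly; the driver loop keeps unwrapping
# trampolines in the same stack frame until a real value comes back.

class Trampoline:
    def __init__(self, fn, args):
        self.fn = fn
        self.args = args

def run(fn, *args):
    val = fn(*args)
    while isinstance(val, Trampoline):  # reuse the current stack frame
        val = val.fn(*val.args)
    return val

# A deliberately deep recursion that would blow the stack if called directly:
def countdown(i):
    if i == 0:
        return "done"
    return Trampoline(countdown, (i - 1,))

print(run(countdown, 1_000_000))  # → done, without a stack overflow
```

The trade-off is that every call site in tail position must cooperate by returning a `Trampoline` rather than calling directly, which is exactly why the annotation pass described earlier is needed.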

Note the comment about forgetting to set `val`, which caused me a lot of frustration as the call stack exploded for seemingly no reason. As mentioned, I'm quite well-read in programming language theory, but as a practical programmer, I've always been a bit of a dabbler, an amateur. I make silly mistakes all the time that a working programmer wouldn't. On the other hand, I don't think the average working programmer, having no interest in programming language implementation, has any clue how to implement TCO either, although I'm sure they could learn it if they set their mind to it. This creates a curious asymmetry: on the one hand, I know a lot. On the other, there's so much that would have become second nature had I been a professional programmer. It's a bit like learning a foreign language without living in a country where it's spoken: sometimes there are weird gaps in your vocabulary. You can converse fluently about Sartre but forget the names of basic items of cutlery.

Returning to the technical side of things, there are many upsides to Crystal that I haven't touched on. Its compile-time metaprogramming is intuitive and, while technically less powerful, feels more potent than Ruby's runtime metaprogramming. Making DSLs that resolve to efficient code at compile time is extremely easy in Crystal. It's as intuitive as Ruby, but far less magical and mysterious, and far less obscure than C++ templates or Template Haskell. Here's how I currently define builtin (i.e., defined-in-Crystal) functions in Alisp:

```crystal
builtin("read", 1) do |i, args|
  expect String, args[0]
  str = args[0].as(String)
  value = Parser.new.parse(Lexer.new.lex(str))
end.help("(read <str>)",
  "  parses a string into a list representing an s-expression")
```

And all of this is accomplished quite simply:

```crystal
macro typecheck(val, type)
  {{val}}.is_a? {{type}}
end

macro expect(type, arg)
  unless typecheck({{arg}}, {{type}})
    error("Type error: expected " + {{type.stringify}} +
          ", got " + {{arg}}.class.to_s)
  end
end

def builtin(name, arity, &fn : BuiltinProc)
  f = Builtin.new(name, arity, fn)
  DefaultEnv[name] = f
end
```
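The shape of this DSL translates naturally to other languages, even without compile-time macros. As a point of comparison, here is a hypothetical Python equivalent using a decorator and a registry dict; all the names (`DEFAULT_ENV`, `Builtin`, the sample `inc` builtin) are invented for this sketch and are not part of Alisp:

```python
# Hypothetical Python analog of the builtin-definition DSL: each builtin is
# registered in a global environment together with its arity and help string.

DEFAULT_ENV = {}

class Builtin:
    def __init__(self, name, arity, fn):
        self.name, self.arity, self.fn = name, arity, fn
        self.help_text = ""

    def help(self, signature, description):
        # Mirrors the chained .help(...) call: store docs on the object.
        self.help_text = signature + "\n" + description
        return self

def builtin(name, arity):
    def register(fn):
        b = Builtin(name, arity, fn)
        DEFAULT_ENV[name] = b
        return b
    return register

@builtin("inc", 1)
def _inc(args):
    # Runtime type check standing in for the expect macro.
    if not isinstance(args[0], int):
        raise TypeError("expected Int, got " + type(args[0]).__name__)
    return args[0] + 1

_inc.help("(inc <n>)", "  returns <n> plus one")

print(DEFAULT_ENV["inc"].fn([41]))  # → 42
```

The key difference is that Crystal's `expect` macro expands to an inline type test at compile time, whereas the Python version pays for the check (and the dictionary dispatch) at runtime.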

The call to `.help` simply sets a string value in the `Builtin` object, which is a class mostly wrapping what is essentially a function pointer. This enables helpful REPL sessions like:

```
> (help help)
(help <fn>)
  prints documentation about a function
=> nil
> (help eq?)
(eq? ...)
  returns true if all arguments are equal, otherwise false
=> nil
```

Another thing that's quite nice about Crystal is how easy it is to write spec tests out of the box. I'm no full convert to the TDD/full-code-coverage/agile train, but having some regression tests to run after each change is a godsend, and Crystal makes it very easy with a nice DSL built into the standard library. Here's an extract:

```crystal
require "spec"
require "../parser/parser.cr"
require "../lispobject.cr"
require "pretty_print"

include Lisp

[...snipped utility code...]

describe Parser do
  it "should correctly parse literals" do
    parse("#t").should eq(true)
    parse("-1").should eq(-1)
    parse("a").should eq(sym("a"))
    parse("inc!").should eq(sym("inc!"))
    parse("+").should eq(sym("+"))
    parse("@").should eq(sym("@"))
    parse("!?@-+a1").should eq(sym("!?@-+a1"))
    parse("1").should eq(1)
    parse("\"a\"").should eq("a")
    parse("#f").should eq(false)
    parse("'a").should eq(make_list(sym("quote"), sym("a")))
  end

  it "should correctly parse lists" do
    parse("(a b c)").should eq(make_list(sym("a"), sym("b"), sym("c")))
    parse("(a (+ 1 2))").should eq(make_list(sym("a"), make_list(sym("+"), 1_i64, 2_i64)))
  end

  it "should correctly implement quoting lists" do
    parse("'(a)").should eq(make_list(sym("quote"), make_list(sym("a"))))
  end

  it "should correctly ignore comments" do
    parse("a; 1").should eq(sym("a"))
    parse(";").should eq(make_list)
  end

  [...snipped further tests...]
end
```

And then I can run it very easily with `crystal spec spec/parser_spec.cr` and see what I just broke. (There's always something; even writing this post, I accidentally broke some things that were previously working. Don't worry, I have a backup.)

One of the things that separates a mature programming language from a toy is robust error handling. My language is not quite there yet, far from it. It has no stack traces yet, but I've reworked the parser to include file, line, and column information. And because Lisp code is also its main data structure, I figured I could save some memory by attaching this information only to lists, even though the lexer adds it to every token. This is achieved by subclassing `Lisp::List`, itself a wrapper around a native Crystal `Array(LispObject)`, and storing the information in a special structure:

```crystal
class ParserInfo
  property file = ""
  property line = 1_i64
  property col = 0_i64

  def initialize(@file, @line, @col)
  end

  def inspect
    "#<ParseInfo file = '#{@file}' @ #{@line}:#{@col}>"
  end
end

class ParserList < Lisp::List
  property parse_info : ParserInfo

  def initialize(@list, @parse_info : ParserInfo)
  end
end
```
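The same "locations on lists only" design can be sketched in a few lines of Python; the class names mirror the Crystal code above, but this is an illustration under that assumption, not the real implementation:

```python
# Sketch of attaching source locations only to lists: a ParseInfo record
# plus a list subclass carrying it. Atoms stay as plain values, so
# programmatically generated lists and literals carry no location overhead.

from dataclasses import dataclass

@dataclass
class ParseInfo:
    file: str
    line: int
    col: int

    def __repr__(self):
        # Matches the inspect format the parser pretty-prints.
        return f"#<ParseInfo file = '{self.file}' @ {self.line}:{self.col}>"

class ParserList(list):
    def __init__(self, items, parse_info):
        super().__init__(items)
        self.parse_info = parse_info

expr = ParserList(["a", 1], ParseInfo("testfile", 2, 1))
print(expr, expr.parse_info)
# → ['a', 1] #<ParseInfo file = 'testfile' @ 2:1>
```

Because `ParserList` subclasses the plain list type, everything downstream that expects an ordinary list keeps working, and only error-reporting code needs to check for the extra attribute.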

This way, I figure I don't need to store this extra information in programmatically generated lists (stack traces would be far more helpful there), and I don't attach it to literals either, because in practice, literals only generate errors during parsing, at which point I have full access to the `ParserInfo` attached to every `Token`. The newly rewritten parser, which has not yet been fully integrated with the naive interpreter (itself in the process of a major rewrite), can generate structures such as this:

```
; Lisp source
(a '(b "c" 1 #t nil #f))

; ==> pretty-printed Crystal AST
#<Lisp::Parser::ParserList:0x7fffee3b2e00
 @list=
  [#<sym a>,
   #<Lisp::Parser::ParserList:0x7fffee3b2e20
    @list=
     [#<sym quote>,
      #<Lisp::Parser::ParserList:0x7fffee3b2e60
       @list=[#<sym b>, "c", 1_i64, true, nil, false],
       @parse_info=#<ParseInfo file = 'testfile' @ 2:1>>],
    @parse_info=#<ParseInfo file = 'testfile' @ 2:1>>],
 @parse_info=#<ParseInfo file = 'testfile' @ 1:0>>
```