January 13, 2012 Update: There are still a few seats left in the compilers class for January 17-20, 2012. More details here.

In the spring of 1995, I took a course on compiler design. At the time, I was just a first year Ph.D. Computer Science student making my way through various course requirements. Before that, I was a mathematician working on computational physics software--writing a lot of finely tuned C code for solving differential equations on big supercomputers. Although I already considered myself to be a pretty knowledgeable programmer, I think compilers was probably the one course that had the most profound impact on my later work. In fact, this is the course that inspired me to look at the use of scripting languages for controlling scientific software. It also directly led to the Swig project, first implemented in the summer of 1995. Last, but not least, this is how I ultimately ended up in the world of Python.

I think the great thing about compilers was how it simply tied so many topics together in one place: everything from mathematical theory, clever algorithms, programming language semantics, computer architecture, software design, clever hacking, and even the nature of computation itself. As part of that course, we had to write our own compiler--a tangled mess of C code that turned a subset of Pascal into executable code that would actually run on a Sun Sparcstation. To be sure, the code was a horrible disaster. However, simply having written a working compiler was definitely one of the most memorable parts of graduate school.

In 2001, I had an opportunity to revisit the topic of compilers. At the time, I was an assistant professor at the University of Chicago and an opportunity to teach compilers came up. I jumped at it. I also used the opportunity to experiment with writing a compiler in Python instead of C. As a bit of context, a lot of people had been asking me about the idea of rewriting Swig in Python (instead of C++). I wasn't so sure. In fact, I really didn't even know how to do it given doubts about Python's performance as well as a general lack of sufficiently powerful parsing tools at the time. Long story short--this is how the PLY project came into existence. I used it in the class and had about 25 students write a compiler for an even more powerful subset of Pascal, creating runnable code for the Sparc.

Fast forward 11 years. I've long since left the University, but I still continue to teach quite a few classes--especially various sorts of Python classes. Over the past year or so, students and I have often discussed the idea of having some kind of advanced project course. Something that would be quite a bit harder and involve much more coding. I think you might see where this is going.

Write a Compiler (in Python)

So, today is the first day of another compiler experiment. Over the course of 4 days, I'm going to attempt to take six students through a compiler writing project similar to the one at the University. It's basically a nine-stage project:

1. Lexing and tokenizing
2. Parsing and parse trees
3. Type checking
4. Intermediate code generation
5. Simple optimization (constant folding, etc.)
6. Relations
7. Control flow
8. Functions
9. Output code in RPython (from the PyPy project)
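To give a flavor of the very first stage, here is a toy tokenizer for a tiny Pascal-like language, written with nothing but the re module (a rough sketch only--the class itself builds on PLY, and the names TOKEN_SPEC and tokenize are just illustrative):

```python
import re

# Token names paired with regular expressions (illustrative subset)
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('ASSIGN', r':='),
    ('PLUS',   r'\+'),
    ('TIMES',  r'\*'),
    ('SEMI',   r';'),
    ('SKIP',   r'\s+'),
]

# Combine the patterns into one master regex with named groups
MASTER_RE = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(text):
    # Scan the input, yielding (token_type, token_text) pairs,
    # discarding whitespace matched by the SKIP rule
    for m in MASTER_RE.finditer(text):
        if m.lastgroup != 'SKIP':
            yield (m.lastgroup, m.group())

print(list(tokenize('x := 2 * y;')))
# [('ID', 'x'), ('ASSIGN', ':='), ('NUMBER', '2'),
#  ('TIMES', '*'), ('ID', 'y'), ('SEMI', ';')]
```

PLY automates exactly this kind of machinery (plus line tracking and error handling), which is why it made the class project tractable.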

One interesting thing about using Python for such a project is that you can use the internals of Python itself to explore important concepts. For example, if you want to see what happens when you compile a regular expression, you can just try it:

>>> import sre_parse
>>> sre_parse.parse(r"[a-zA-Z_][a-zA-Z0-9_]*")
[('in', [('range', (97, 122)), ('range', (65, 90)), ('literal', 95)]), ('max_repeat', (0, 65535, [('in', [('range', (97, 122)), ('range', (65, 90)), ('range', (48, 57)), ('literal', 95)])]))]
>>>

Or, if you want to look at how Python makes an AST:

>>> import ast
>>> node = ast.parse("a = x + 2*y")
>>> ast.dump(node)
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=BinOp(left=Name(id='x', ctx=Load()), op=Add(), right=BinOp(left=Num(n=2), op=Mult(), right=Name(id='y', ctx=Load()))))])"
>>>
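Once you have a tree like that, nearly every later stage of a compiler (type checking, code generation) is some kind of tree traversal. Python's ast module even gives you a visitor class to play with. Here is a small sketch that walks the same expression and records the names and operators it encounters (NodeLister is just an illustrative name, not something from the course):

```python
import ast

class NodeLister(ast.NodeVisitor):
    # Record operator and variable names in the order they are visited
    def __init__(self):
        self.seen = []
    def visit_BinOp(self, node):
        self.seen.append(type(node.op).__name__)
        self.generic_visit(node)   # continue into left/right operands
    def visit_Name(self, node):
        self.seen.append(node.id)

tree = ast.parse("a = x + 2*y")
lister = NodeLister()
lister.visit(tree)
print(lister.seen)
# ['a', 'Add', 'x', 'Mult', 'y']
```

A type checker or code generator for the class project follows the same shape: one method per node type, dispatched over the tree.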

Or, if you want to see what kind of code Python generates:

>>> def fact(n):
...     if n <= 1:
...         return 1
...     else:
...         return n*fact(n-1)
...
>>> import dis
>>> dis.dis(fact)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               1 (<=)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_CONST               1 (1)
             15 RETURN_VALUE

  5     >>   16 LOAD_FAST                0 (n)
             19 LOAD_GLOBAL              0 (fact)
             22 LOAD_FAST                0 (n)
             25 LOAD_CONST               1 (1)
             28 BINARY_SUBTRACT
             29 CALL_FUNCTION            1
             32 BINARY_MULTIPLY
             33 RETURN_VALUE
             34 LOAD_CONST               0 (None)
             37 RETURN_VALUE
>>>
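Even the "simple optimization" stage has a direct analog inside Python. CPython's own compiler performs constant folding, which you can observe with dis: the arithmetic below is reduced to a single constant at compile time, so no multiplications appear in the bytecode (seconds_per_day is just an illustrative name):

```python
import dis

def seconds_per_day():
    # CPython folds 60 * 60 * 24 into the constant 86400 when this
    # function is compiled -- the bytecode contains no BINARY_MULTIPLY
    return 60 * 60 * 24

dis.dis(seconds_per_day)
print(seconds_per_day.__code__.co_consts)   # the folded constant is here
```

This is precisely the transformation the students implement by hand on their own intermediate representation.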

By looking at what Python does itself, I think the concepts can be related back to the work the students will be doing on their own project. It might be an interesting way to explore important ideas without getting completely bogged down in a theory-heavy exposition (as one might find in a compilers textbook). I don't have any grand illusions about the students running off afterwards to do research in compilers. However, I think it will be an interesting experiment where everyone still learns a lot.

Follow the Project

Due to time constraints of the project, I won't be blogging during the week. However, you can follow me on Twitter for updates to see how it's going. I will be posting a more detailed followup describing the project and how it worked out after it's over.

If you would like to write a compiler yourself, there are still some seats left in a second running of the project, January 17-20. Click here for more information.