An Industrial-Grade BF Compiler

6 minute read

What do you expect from your compiler? The best compilers offer:

Highly optimised output

Short compile times

Readable syntax errors

Helpful warnings

Support for lots of architectures

Being a wildly overambitious developer, I set out to make bfc provide this whole feature set. Let’s take a look at the result.

Optimisations

Like any self-respecting LLVM frontend, bfc outputs LLVM IR directly. To my surprise, BF compilers using a C backend (such as this one) were routinely beating bfc’s performance.

LLVM provides an excellent set of tips for frontend compiler writers. Re-reading this document, I noticed a new note had been added about pass ordering. It turns out that LLVM’s default optimisations are tuned for C, and bfc massively benefits from running the optimisations again.

In other words:

$ bfc-1.3 some_program.bf --dump-llvm > raw_ir.ll $ opt -S -O3 raw_ir.ll -o opt_once.ll $ opt -S -O3 opt_once.ll -o opt_twice.ll

opt_twice.ll was often much faster than opt_once.ll !

After fixing the LLVM passes, and reusing the BF benchmark from the previous blog post, we see a significant speedup on all test programs:

Short Compile Times

LLVM provides command line tools for optimising ( opt ) and for writing object files ( llc ). These are great for exploring LLVM.

However, bfc wrote its LLVM IR to temporary files and shelled out to run these commands. This is slower and brittle: users with LLVM libraries could compile bfc, but suffered unhelpful errors when bfc could not find opt on $PATH .

bfc now uses the LLVM APIs directly for writing object files. This is not as well documented (the Kaleidoscope tutorial does not cover it), but reading the source of llc and rustc shows examples. You can see the API calls used by bfc here.

For maximum performance, bfc users can specify --opt or --llvm-opt to produce debug builds with fewer optimisations.

Syntax Errors

‘Syntax error’ is not helpful, and programmers expect better. A compiler should highlight where the error occurred, and provide a clear description of what is wrong.

bfc now provides friendly syntax errors, showing the filename, line number, column number and highlighting the offending syntax.

Warnings

bfc performs static analysis and compile time execution. During this process, it may find code that looks incorrect.

We want to warn the user when this occurs. This is difficult, because bfc can dramatically change the AST as it optimises.

In v1.3.0 bfc gained position annotations to its AST. Optimisation passes now preserve position information wherever possible.

However, some optimisations aggressively reorder BF instructions. If those instructions are not consecutive, bfc cannot preserve position information. Instead, I added simpler optimisations that run earlier and can keep position information.

Here’s an example of a warning:

Cross-Compilation

No self-respecting compiler can only target a single architecture. With LLVM, it’s easy to support a remarkable range of architectures.

Previously, bfc always compiled to 32-bit x86. bfc now compiles to the host architecture by default. Users with x86-64 machines will see a modest performance improvement from this.

bfc now also supports cross-compilation by specifying an LLVM target triple:

$ bfc hello_world.bf --target=x86_64-pc-linux-gnu

Scraping The Bottom Of The Barrel

Having implemented all these features, bfc now has the dubious title of the most overengineered sophisticated BF compiler available.

There’s not even much scope for further polish. There are a few niche optimisations we lack, such as scan loops ( [>] ) and integer division.

Some BF implementations also provide a # instruction, to print the current cell values and aid debugging. bfc does not yet provide this either.

I hope you’ve enjoyed this series on BF compilation. BF is a fantastic playground, and LLVM is an incredible feat of engineering. If you encounter a bug or limitation in bfc, don’t hesitate to file a bug.