It has been a very long time since I posted here — life has been busy, including a new baby at home. But I really want to share about my recent experience tackling several performance problems in Agda. Agda, as I hope you know already, is a very elegant dependently typed pure functional programming language. It supports Unicode, so you can write → instead of -> for function space; and many other cool symbols. It has user-defined mixfix notation, so you can define if_then_else_ (you write underscores in Agda to show where the arguments go) with the expected syntax. It compiles to Haskell, although I get the impression that many people just use Agda as an advanced type checker, and do not bother compiling executables. Agda has very good inference for implicit arguments, which can help make code shorter and more readable. It also features definition of terminating functions by recursive equations. So you can write beautiful definitions like the following for infix vector append:

_++𝕍_ : ∀ {ℓ} {A : Set ℓ}{n m : ℕ} → 𝕍 A n → 𝕍 A m → 𝕍 A (n + m)

[] ++𝕍 ys = ys

(x :: xs) ++𝕍 ys = x :: (xs ++𝕍 ys)

You might object to naming the function _++𝕍_ instead of _++_, but Agda does not support type classes or other approaches to operator overloading, and I prefer never to have to worry about symbols clashing from different included files. This definition is from my standard library (not the Agda standard library; if prompted for username and password, enter “guest” for both). There is a very nice emacs mode, too.

With all these great features, working well together, Agda provides just about the most elegant programming experience I have had in 21 years of coding. I think it is a fantastic language, with much to emulate and learn from. These accolades aside, my purpose in this post is to discuss some grotesque workarounds for performance problems inherent in the implementation and maybe even the language. To be direct: Agda’s type checker has abysmal performance. Suppose we create an Agda source file defining test to be a list containing 3000 copies of boolean true. Agda takes 12.5 seconds to type check this file on my laptop. If we give the same example to OCaml, it takes 0.2 seconds, in other words, heading towards two orders of magnitude slower. Now, is Agda’s type checker doing fancier things than OCaml’s? Undoubtedly. But not on this example! I am willing to accept some overhead in general for fancier type checking even on code that does not use fancy types. And OCaml has been around for quite some time and is engineered by one of the best language implementors on the planet. Fine. So let Agda be 2 times slower. Let it be 5 times slower. But 60 times slower? That is not good.

I ran into the performance issues with Agda’s type checker while tilting at the windmill I’ve been dueling off and on the past three years: parsing by rewriting. Without going into detail here, let me just say I’ve been working, with very patient colleague Nao Hirokawa, to create a new approach to parsing based on term rewriting. The idea as it stands now is that one runs an automaton approximating the language of your CFG over the input string, to create a formal artifact called a run. Then one applies confluent run-rewriting rules to this run, where those rules are derived from the productions of the grammar and will rewrite every string in the language to a term built from the start symbol and then containing the parse tree as a subterm. I love the approach, because it is inherently parallelizable (because the run-rewriting rules are confluent), and because we can resolve ambiguities in the grammar by adding rewrite rules. The trickiest part is coming up with the right approximating automaton, and this is still not at all ready for prime time (despite the fact that I inflicted it on my undergraduate class this semester).

Anyhow, since I have been teaching undergrad PL this semester using Agda (more on this in a later post), and since Agda does not have a parser generator (perhaps because FPers seem to prefer parser combinators), I decided I would make my parsing-by-rewriting tool, called gratr, target Agda as a backend. After a fair bit of hacking I had this working, only to discover that for even pretty small grammars, I was generating several thousand line Agda source files, which Agda absolutely could not handle. Imagine my disappointment! Beautiful (if not yet ready for big grammars) approach to parsing, targeting Agda, and the Agda type checker could not handle my generated parsers’ source files, except for the tiniest of grammars. I was depressed, and determined to find a way around this problem to get working parsers from the medium-small grammars I wanted to use for class, as well as research.

Enter –use-string-encoding. A little experimentation revealed that while Agda chokes checking even very simple terms when they get even moderately big, it will affirm that double-quoted strings indeed have type string in time seemingly independent of string size. Oh the fiendery. Let us encode all our parsing data structures — that means automaton and run-rewriting rules both — as strings, get them past the Agda type checker, and then decode at runtime to get our parser. It is gross, it is nasty, but it might just work. Of course, no one wants to decode their parsing data structures every time a string is parsed, but that was the price I decided I’d be willing to pay to get my parsers running in Agda.

I spent a month or so — while learning to care, with my wife, for a very fussy newborn — implementing this. I finally had code in gratr to dump all the data structures as strings, and code in Agda to decode those strings and plug them into my existing parsing-by-rewriting infrastructure. Agda could type check the file containing the strings in a second or two, even when the strings were huge (megabytes-long files, due to the unfortunately large automata my approach is currently producing). The moment of truth arrives: let us actually compile the entire shebang to an executable (not just type check that one file). Agda type-checking chokes. I cannot believe what I am seeing. What is happening? I can type check the files containing the string-encoded data structures almost instantly, but type-checking the wrapper file defining the main entry point (which is just based off the way Haskell sets up code for compilation to executables) is running seemingly forever. A little quick experimentation reveals: big strings encoding the data structures makes type checking that file super slow. What gives! Further head scratching leads me to suspect that for some reason, when Agda is instantiating a module somewhere in my setup, it is actually trying to normalize the fields of a record, where those fields are calling the decode functions on the string encodings. This is the step that could take a second or two at runtime, with ghc-optimized executables, but will likely take forever with Agda’s compile-time interpreter. How to make Agda stop doing this?

Here’s a hack for this:

postulate runtime-identity : ∀{A : Set} → A → A

{-# COMPILED runtime-identity (\ _ x -> x ) #-}

The idea is that we will interpose the postulate called runtime-identity to block the redex of decoder applied to string encoding. At compile time, Agda knows nothing about runtime-identity, and hence will not be able to reduce that redex. But at runtime, runtime-identity will be compiled to the identity function in Haskell, and hence the redex will reduce.

Delightfully, this worked. Compiling the emitted parsers with the string encodings is now very quick, maybe 10 seconds to get all the way through Agda to Haskell to an executable. Awesome! Now let’s just run the executable. Heh heh. There is no way that could not work, right?

Wrong, of course. Running the executable on a tiny input string takes 7 minutes and then I kill it. Oh my gosh, I am thinking. I just spent 5 weeks of precious coding time (in and around other duties, especially new childcare duties, with my patient wife looking on and wondering…) to get this gross hack working, and now the runtime performance is unacceptable. I almost despair.

But hark! Reason calls. It cannot be taking a ghc-optimized executable that long to decode my string-encoded data structures. After all, encoding the data structures to strings from my OCaml gratr implementation is instantaneous. Sure, decoding could be a bit longer, but forever longer? That can’t be right. So how can we figure out what is going on?

Easy: profile the compiled code with ghc’s profiling features. Agda just compiles down to (almost unreadable) Haskell, which is then compiled by ghc, so we can just profile the code with ghc. I had never used ghc’s profiler, but it was very simple to invoke and the results were easily understandable. Where is the time going? Here is the scary line:

d168 MAlonzo.Code.QnatZ45Zthms 322 7928472 10.9 36.0 55.0 53.5

The last numbers are showing that over half the time of the executable is going into function d168 in nat-thms.agda. A function in nat-thms.agda? That contains a bunch of lemmas and theorems about natural-number operations. I hardly expect my parser to be grunting away there. What is d168? Well, it is the Agda-emitted version of this lemma:

<-drop : ∀ {x y : ℕ} → (x < (suc y) ≡ tt) → x ≡ y ∨ x < y ≡ tt

This function looks to take linear time in the size of x, which could be the length of the emitted string encoding in this case. Where on earth is this called from? And why is its evaluation getting forced anyway in Haskell’s lazy evaluation model? <-drop is getting called in

wf-< : ∀ (n : ℕ) → WfStructBool _<_ n

This is the proof that the _<_ ordering on natural numbers is well-founded. The string-decoding functions have to use well-founded recursion for Agda to see they are terminating. You recursively decode some part of the string, and then need to continue on the residual part of the string that has not been decoded yet, which is returned by your recursive call. Agda cannot see that the residual you are recursing on is a subterm of the starting input string, so it cannot confirm the function is structurally terminating. The solution is to use well-founded recursion. And this is taking, as far as I can tell, time quadratic in the size of the input string to be decoded. These strings are long, so a quadratic time operation (with lots of recursion and pattern matching) is going to kill us.

What is the solution? Strip out the well-founded recursion and just disable Agda’s termination checker. I do this, cross my fingers, compile, run, and … it works! Hot diggety.

So those are the three performance problems we tackled here in Agda: slow type checking (just avoid the type checker altogether by encoding big data structures as strings and decoding at runtime), unwanted compile-time evaluation (interpose postulated runtime-identity to block the redexes), and super slow well-founded recursion (punt and disable the termination checker). I am interested in any similar experiences readers may have had….