One of the first choice to make when starting the development of a compiler, or any program manipulating syntax with binders (e.g. programs with functions and variables), is how to represent these terms. Some specialized programming languages, like Beluga, have built-in facilities for this. But if you choose a general-purpose one, like OCaml, then you have plenty of choices: Named terms, De Bruijn indices, Locally nameless, Higher-order abstract syntax (HOAS)… each with their idiosyncrasies, inconvenients etc. This issue is pervasive in PL research and almost formed a subfield of itself. If you are interested, there is a wealth of literature on the topic; a documented discussion happened on LtU some years back (already), which is probably a good starting point, although of course some has happened since then.

Today I will be looking at implementing in OCaml one of the most recent and praised ones: Parametric HOAS (PHOAS). For some reason that I detail hereafter, the traditional way of implementing PHOAS can be cumbersome, and I propose an alternative here, which happens to be one of my very first uses of first-class modules.

Parametric HOAS, the traditional way

The idea of Parametric HOAS goes back Geoffrey Washburn and Stephanie Weirich, at least, and was named this way and formalized in Coq by Adam Chlipala (I recommend the latter, which is a good read). In a nutshell, you encode binders in the language by OCaml binders (like in HOAS), but the term data structure is parameterized by the type 'x of variables. We will call these pre-terms:

type 'x t = (* pre-term *) | Lam of ('x -> 'x t) | App of 'x t * 'x t | Var of 'x

For instance, this is a perfectly valid pre-term:

let ex : float t = Lam (fun x -> App (Var 3.14, Var x))

Its variables have floats dangling from them. But for a pre-term to become a real term, it has to make no assumption on the type of variables. Let’s encode this with an explicit universal quantification in a record:

type tm = {t : 'x. 'x t}

This parametrization rules out the previous pre-term. It is also what makes it impossible to define so-called “exotic” terms. In fact, there are as many inhabitant to this type than there are λ-terms.

Examples

This is an example of a real term:

let ex : tm = {t = Lam (fun x -> Lam (fun y -> App (Var x, Var y)))}

So, each term comes in a little package that seals the representation of variables to “unknown”. If you open this little box, you can set it however you want, as long as, when you close it back, it is back to “unknown”. This is how you define recursive functions on terms: a main function opens the package, and an auxiliary one works with the pre-term inside, instantiating variables as it pleases. Here, the classic example function counting the number of variables in a term:

let count : tm -> int = let rec aux : unit t -> int = function | Var () -> 1 | App (m, n) -> aux m + aux n | Lam f -> aux (f ()) in fun {t} -> aux t

When I go under a λ-abstraction, I fill the open holes with () (because here there is no information to carry); so during the recursion, the pre-term can have variables instantiated with ().

The issue

This is all well and good, but this encoding can be quite cumbersome for practical reasons. Look again at the definition of pre-terms. Imagine that there is not 3 but 30 constructors, with not one but five kinds of variables, i.e. five parameters: each time, I would have to pass them to each subterm. Who wants to read 30 lines of code of this kind?

type ('x, 'y, 'z, 't, 'u) t = | Var1 of 'x | Var2 of 'y | App of ('x, 'y, 'z, 't, 'u) t * ('x, 'y, 'z, 't, 'u) t

Variable types are never changed, and just passed on untouched to the next subterm. It is like defining a recursive function which never let one of its arguments vary: you want to lambda-drop it. In Coq, this is easy thanks to sections, which give you the illusion of lambda-dropping, but OCaml does not have this feature. Nonetheless, let us do exactly this: lambda-drop a parametric type definition into… a module.

First-class modules to the rescue

First, we are going to factor out all these occurrences of the ‘x type parameter by lifting type t into a functor.

module type X = sig type x end module Tm (X : X) = struct open X type t = | Lam of (x -> t) | App of t * t | Var of x end

Pre-term is not a parametric type, it is a functor (a parametric structure). Note that the type of variables is “lambda-dropped” to the parameter of the functor, so the definition of type t is much less verbose; and it would stay as concise with 5 different kinds of variables. For instance, this is a valid pre-term:

module Pt = T (struct type x = float end) let ex : Pt.t = Pt.(Lam (fun x -> App (Var x, Var 3.14)))

OCaml does not let us instantiate functors in type definitions, so we must do this in two steps: first declare the module by instantiating the functor, and then giving an inhabitant to it.

Once again, For a pre-term to become a real term, it has to make no assumption on its “variable type” module. Let’s encode this by a functor, from the structure defining variables to a structure defining an inhabitant of the pre-term type:

module type TM = functor (X:X) -> sig val t : Tm(X).t end

Now, thanks to first-class modules, we have the ability to treat an inhabitant of this module type as an usual value:

type tm = (module TM)

Here it is, that’s our type of terms.

Examples

Let me give you an example of term:

let ex : tm = (module functor (X:X) -> struct module T = Tm(X) let t = T.(Lam (fun x -> Lam (fun y -> App (Var x, Var y)))) end : TM)

The value is a first-class module that is a functor, just like the TM module type dictates. Yes, it is substantially more to write than in the previous section, but the overhead is fixed. Note that the type annotation on the module is necessary (I don’t know why).

A function on this term representation, e.g. the count example from before, has to first unpack the first class module Tm and instantiate it with the right kind of variable X, grab the pre-term t contained in it; Then an auxiliary recursive function can traverse this pre-term. All in all, we get:

let count : tm -> int = fun (module Tm) -> let module X = struct type x = unit end in let module TmX = Tm(X) in let module TX = T(X) in let open TX in let rec aux : T(X).t -> int = function | Lam f -> aux (f ()) | App (t, u) -> aux t + aux u | Var () -> 1 in aux TmX.t

Again, it seems like a lot to type, but the overhead is constant, so it has better chances to scale.

Epilogue

Here was my implementation of PHOAS with “lambda-lifting” of type arguments, thanks to first-class modules. I guess that this trick could be useful for other large type definitions involving constant parameters, for instance, do you know the recursion scheme programming pattern? Also, try to encode the typed λ-calculus this way, using GADTs; you will need type x to be parametric on a type ‘a, therefore encoding rank-2 polymorphism. I did not get there yet.

As a bonus, here is the boring but still relevant identity function on λ-terms, which has the advantage of returning a term (unlike count ):

let id : tm -> tm = fun (module Tm) -> (module functor (X:X) -> struct let t = let module TmX = Tm(X) in let module TX = T(X) in let open TX in let rec id : t -> t = function | Lam f -> Lam (fun x -> id (f x)) | App (t, u) -> App (id t, id u) | Var x -> Var x in id TmX.t end)