Around 2 years ago, for an earlier project of mine (which has not seen its light yet!) in which I had to build a language with variables and prove its properties, I surveyed a number of ways to handle binders. For some background, people have noticed that, when proving properties about a language with bound variables, one often ends up spending most of the effort proving “uninteresting” things about names, substitution, weakening, etc. For formal theorem proving to be practical, the effort has to be reduced. A number of approaches were proposed. Among them, the “locally nameless” style appeared to be interesting and promising.

With the only reference I used, Engineering Formal Metatheory by Aydemir et al., which outlines the principle ideas of the approach, I imagined how it works and tried to implement my own version in Agda. My first few implementations, however, all ended up in a mess. It appeared that there was an endless number of properties to prove. Besides the complexity of the language I was implementing, there must be something I got wrong about the locally nameless representation. Realising that I could not finish the project this way, I eventually decided to learn from the basics.

I started with the tutorial by Chargueraud, with complete code in Coq. I would then follow his footprints using Agda. The task is a classical one: define untyped λ-calculus and its reduction rules, and prove the Church-Rosser theorem.

Term

From an abstract level, there is nothing too surprising. We define a syntax for untyped λ-calculus that distinguishes between free and bound variables:



data Term : Set where bv : (i : BName) → Term fv : (x : FName) → Term ƛ : (e : Term) → Term _·_ : (e₁ : Term) → (e₂ : Term) → Term

where BName = ℕ represents bound variables by de Bruin indexes, while FName is the type of free variables. The latter can be any type that supports equality check and a method that generates a new variable not in a given set (in fact, a List ) of variables. If one takes FName = String , the expression λ x → x y , where y occurs free, is represented by ƛ (bv 0 · fv "y") . For ease of implementation, one may take FName = ℕ as well.

Not all terms you can build are valid. For example, ƛ (bv 1 · fv "y") is not a valid term since there is only one ƛ binder. How to distinguish the valid terms from invalid ones? I would (and did) switch to a dependent datatype Term n , indexed by the number of enclosing binders, and let BName = Fix n . The index is passed top-down and is incremented each time we encounter a ƛ . Closed terms are then represented by the type Term 0 .

The representation above, if works, has the advantage that a term that can be build at all must be valid. Choosing such a representation was perhaps the first thing I did wrong, however. Chargueraud mentioned a similar predicate that also passes the “level” information top-down, and claimed that the predicate, on which we will have to perform induction on to prove property about terms, does not fit the usual pattern of induction. This was probably why I had so much trouble proving properties about terms.

The way to go, instead, is to use a predicate that assembles the information bottom up. The predicate LC (“locally-closed” — a term is valid if it is locally closed) is defined by:



data LC : Term → Set where fv : ∀ x → LC (fv x) ƛ : (L : FNames) → ∀ {e} → (fe : ∀ {x} → (x∉L : x ∉ L) → LC ([ 0 ↦ fv x ] e)) → LC (ƛ e) _·_ : ∀ {e₁ e₂} → LC e₁ → LC e₂ → LC (e₁ · e₂)

A free variable alone is a valid term. A application f · e is valid if both f and e are. And an abstraction ƛ e is valid if e becomes a valid term after we substitute any free variable x for the first ( 0 -th) bound variable. There can be an additional constraint on x , that it is not in L , a finite set of “protected” variables — such co-finite quantification is one of the features of the locally nameless style.

The “open” operator [ n ↦ t ] e substitutes the term t for the n -th bound variable in e . It is defined by



[_↦_] : ℕ → Term → Term → Term [ n ↦ t ] (bv i) with n ≟ i ... | yes _ = t ... | no _ = bv i [ n ↦ t ] (fv y) = fv y [ n ↦ t ] (ƛ e) = ƛ ([ suc n ↦ t ] e) [ n ↦ t ] (e₁ · e₂) = [ n ↦ t ] e₁ · [ n ↦ t ] e₂

note how n is incremented each time we go into a ƛ . A dual operator,



[_↤_] : ℕ → FName → Term → Term

instantiates the n -th bound variable to a term.

β Reduction and Church-Rosser Theorem

Small-step β reduction can be defined by:



data _β→_ : Term → Term → Set where β-red : ∀ {t₁ t₂} → Body t₁ → LC t₂ → ((ƛ t₁) · t₂) β→ [ 0 ↦ t₂ ] t₁ β-app₁ : ∀ {t₁ t₁' t₂} → LC t₂ → t₁ β→ t₁' → (t₁ · t₂) β→ (t₁' · t₂) β-app₂ : ∀ {t₁ t₂ t₂'} → LC t₁ → t₂ β→ t₂' → (t₁ · t₂) β→ (t₁ · t₂') β-ƛ : ∀ L {t₁ t₁'} → (∀ x → x ∉ L → ([ 0 ↦ fv x ] t₁) β→ ([ 0 ↦ fv x ] t₁')) → ƛ t₁ β→ ƛ t₁'

where β-red reduces a redux, β-app₁ and β-app₂ allows reduction respectively on the right and left hand sides of an application, and β-ƛ goes into a ƛ abstraction — again we use co-finite quantification.

Given _β→_ we can define its reflexive, transitive closure _β→*_ , and the reflexive, transitive, symmetric closure _β≣_ . The aim is to prove that _β→*_ is confluent:



β*-confluent : ∀ {m s t} → (m β→* s) → (m β→* t) → ∃ (λ u → (s β→* u) × (t β→* u))

which leads to the Church-Rosser property:



β-Church-Russer : ∀ {t₁ t₂} → (t₁ β≣ t₂) → ∃ (λ t → (t₁ β→* t) × (t₂ β→* t))

At an abstract level, the proof follows the classical route: it turns out that it is easier to prove the confluence of a “parallel reduction” relation _⇉_ which allows β reduction to happen in several places of a term in one step. We then prove that _β→*_ is equivalent to _⇉*_ , thereby proving the confluence of _β→*_ as well. All these can be carried out relatively nice and clean.

The gory details, however, hides in proving the infrastructutal properties supporting the abstract view of the proofs.

Infrastructures

Confluence, Church-Rosser… these are the interesting stuffs we want to prove. However, we often end up spending most of the time proving those infrastructure properties we have to prove — which is why there have been so much recent research hoping to find better representations that simplify them. The locally nameless style is supposed to be such a representation. (Another orthogonal topic is to seek generic representations such that the proofs can be done once for all languages.)

In my code, most of these properties are piled in the file Infrastructure.agda . They range from stuffs you might expect to have:



open-term : ∀ k t {e} → LC e → e ≡ [ k ↦ t ] e close-var-open-aux : ∀ k x e → LC e → e ≡ [ k ↦ fv x ] ([ k ↤ x ] e)

to stuffs not that obvious:



open-var-inj : ∀ k x t u → x ∉ fvars t → x ∉ fvars u → [ k ↦ fv x ] t ≡ [ k ↦ fv x ] u → t ≡ u open-term-aux : ∀ j t i u e → ¬ (i ≡ j) → [ j ↦ t ] e ≡ [ i ↦ u ] ([ j ↦ t ] e) → e ≡ [ i ↦ u ] e

The lemma open-var-inj is one of the early lemmas that appeared in Chargueraud’s tutorial, which might give one the impression that it is an easy first lemma to prove. On the contrary, it is among the tedious ones — I needed a 40-line proof (most of the cases were simply eliminated by contraction, though).

It takes experience and intuition to know what lemmas are needed. Without promise that it will work, I would think something must have gone wrong when I found myself having to prove weird looking lemmas like:



close-var-rec-open : ∀ x y z t i j → ¬(i ≡ j) → ¬(y ≡ x) → y ∉ fvars t → [ i ↦ fv y ] ([ j ↦ fv z ] ([ j ↤ x ] t)) ≡ [ j ↦ fv z ] ([ j ↤ x ] ([ i ↦ fv y ] t))

which is not easy to prove either.

The Bottom Line?

So, is the locally nameless representation what it claims to be — a way to represent binders that simplifies the infrastructural proofs and is easier to scale up? When I was struggling with some of the proofs in Infrastructure.agda I did wonder whether the claim is true only for Coq, with cleverly designed proof tactics, but not for Agda, where everything is done by hand (so far). Once the infrastructural proofs are done, however, the rest was carried out very pleasantly.

To make a fair comparison, I should re-implement everything again using de Bruin notation. That has to wait till some other time, though. (Any one want to give it a try?)

It could be the case that, while some proofs are easily dismissed in Coq using tactics, in Agda the programmer should develop some more abstractions. I did feel myself repeating some proof patterns, and found one or two lemmas that do not present in Chargueraud’s tutorial which, if used in Agda, simplifies the proofs a bit. There could be more, but at this moment I am perhaps too involved in the details to see the patterns from a higher viewpoint.

The exercise does pay off, though. Now I feel I am much more familiar with this style, and am perhaps more prepared to use it in my own project.

Code

A zip file containing all the code.

References