Trees That Shrink

Posted on 19 June 2018

This blog post is also available as an IHaskell notebook.

I read this paper a while ago and people seemed pretty excited about it, although I couldn’t see why. Fortunately, someone posed me an interesting problem recently and in the process of tackling it I think I understand now.

Suppose we have a simple algebraic data type representing the lambda calculus with de Bruijn indices, which are a way of avoiding the messy problem of variable names:

data Expr a = Lit a | Var Int | Abs ( Expr a) a) | App ( Expr a) ( Expr a) a) (a) deriving ( Show )

So far so good! Now we want to have let bindings in this language, and we’ve decided that we want named variables as well while we’re at it because de Bruijn indices are not so much fun to work with. We dutifully define another data type:

data Expr' a = Lit' a | Var' String | Abs' ( Expr' a) a) | App' ( Expr' a) ( Expr' a) a) (a) | Let' String ( Expr' a) ( Expr' a) a) (a) deriving ( Show )

Let bindings can be easily desugared into lambda abstractions as follows:

let <n> = <x> in <y> <=> (

-> y) x

which will help us to write a simpler evaluator. However, we’d also like to make sure the desugaring has been implemented correctly, perhaps by converting to some intermediate state where both the name and the correct de Bruijn index coexist peacefully. We have a couple of options, none of which are great:

Define a third data type and then write an indexing pass that converts Var String to Var (String, Int) and then a desugaring pass that converts that to Expr a . Work entirely within the bigger data type, forget about indexing, and throw errors whenever a Let is encountered after a desugaring pass. Combine the desugaring and indexing passes into one, and forget about keeping track of the desugaring.

Let’s implement the third:

import qualified Data.Map.Strict as Map type Env = Map.Map String Int desugarAndAnonymise :: Env -> Expr' a -> Expr a = case expr of desugarAndAnonymise env exprexpr Lit' a -> Lit a Var' name -> Var (env Map.! name) name(envname) Abs' expr' -> let expr' = Map.map succ env env'Map.mapenv in Abs (desugarAndAnonymise env' expr') (desugarAndAnonymise env' expr') App' f x -> App (desugarAndAnonymise env f) (desugarAndAnonymise env x) f x(desugarAndAnonymise env f) (desugarAndAnonymise env x) Let' n v expr' -> desugarAndAnonymise env ( App' ( Abs' expr') v) n v expr'desugarAndAnonymise env (expr') v)

That wasn’t a lot of fun to write, I have no idea if I did the conversion from names to indices correctly, and there’s no easy way to check if I did.

These problems are (barely) manageable in this case, but what if we want to add more syntax sugar or share this data type with other libraries that have different use cases? We’d either have to write variations on a theme over and over again or say goodbye to type safety. It also becomes harder and harder to decompose our functions into smaller ones that only do one thing. There has to be a better way!

This is the subject of Trees that Grow by Shayan Najd and Simon Peyton Jones. They noticed the need for this approach when looking at GHC’s abstract syntax tree type but the idiom is generally applicable.

The first insight is that defining different data types for each use case is the wrong approach. Conceptually all these data types are the same type, just with different annotations or decoration, and we should be able to define the base data type with enough extensibility to accommodate all the other use cases.

The second insight is that we can use some of GHC’s fancier features to ease the pain of working with this ridiculously extensible data type, such as type families and pattern synonyms.

The third insight is that this can be made to work with other language features, such as generalised abstract data types and existentials! We won’t use this here, but it’s great to know that it’s possible.

Let’s see how we can use it to solve our problem. The first thing to do is turn on some language extensions, as with anything moderately fun in Haskell:

{-# LANGUAGE TypeFamilies #-} {-# LANGUAGE PatternSynonyms #-}

The next steps are:

Add a type parameter to the data type declaration. Add a field to each constructor with a new data type that uses this parameter (and any others). Add an extra constructor to enable extending this data type, also with a new data type as in step 2. Define type families for each of the new data types you declared in steps 2 and 3.

And we’re done! This is what that looks like for our example:

data ExpX i a i a = LitX ( XLit i a) a i a) a | VarX ( XVar i a) i a) | AbsX ( XAbs i a) ( ExpX i a) i a) (i a) | AppX ( XApp i a) ( ExpX i a) ( ExpX i a) i a) (i a) (i a) | ExpX ( XExp i a) i a) type family XLit i a i a type family XVar i a i a type family XAbs i a i a type family XApp i a i a type family XExp i a i a

The next thing to do is to reconstruct our original data type with no extensions. To do this we’ll need to use the Data.Void package:

import Data.Void void :: Void = error "Attempt to evaluate void" void

Now we can define ExpUD (UD for “undecorated”) using Int for our Var like we originally wanted and Void for all other extension points. It might get frustrating to construct and work with these values by hand, so we can use pattern synonyms to ease this pain.

type ExpUD a = ExpX UD a data UD type instance XLit UD a = Void type instance XVar UD a = Int type instance XAbs UD a = Void type instance XApp UD a = Void type instance XExp UD a = Void pattern LitUD :: a -> ExpUD a pattern LitUD a <- LitX _ a _ a where LitUD a = LitX void a void a pattern VarUD :: Int -> ExpUD a pattern VarUD i <- VarX i where VarUD i = VarX i pattern AbsUD :: ExpUD a -> ExpUD a pattern AbsUD a <- AbsX _ a _ a where AbsUD a = AbsX void a void a pattern AppUD :: ExpUD a -> ExpUD a -> ExpUD a pattern AppUD f a <- AppX _ f a f a_ f a where AppUD f a = AppX void f a f avoid f a

This is unfortunately a bit boilerplate-y, but now we can define that intermediate data type that uses named variables as well as de Bruijn indices! As a bonus, we can now have named parameters to lambdas, which would not have been possible with the approach we tried to use above.

type ExpAnn a = ExpX Ann a data Ann type instance XLit Ann a = Void type instance XVar Ann a = ( String , Int ) type instance XAbs Ann a = String type instance XApp Ann a = Void type instance XExp Ann a = Void pattern LitAnn :: a -> ExpAnn a pattern LitAnn a <- LitX _ a _ a where LitAnn a = LitX void a void a pattern VarAnn :: String -> Int -> ExpAnn a pattern VarAnn s i <- VarX (s,i) s i(s,i) where VarAnn s i = VarX (s, i) s i(s, i) pattern AbsAnn :: String -> ExpAnn a -> ExpAnn a pattern AbsAnn s a <- AbsX s a s as a where AbsAnn s a = AbsX s a s as a pattern AppAnn :: ExpAnn a -> ExpAnn a -> ExpAnn a pattern AppAnn f a <- AppX _ f a f a_ f a where AppAnn f a = AppX void f a f avoid f a

Now for my favourite part, where we add let bindings! We achieve this by using a tuple to represent let <name> = <expr> in <expr> as (<name>, <expr>, <expr>) , and we use named variables and parameters in this representation.

type ExpLet a = ExpX Let a data Let type instance XLit Let a = Void type instance XVar Let a = String type instance XAbs Let a = String type instance XApp Let a = Void type instance XExp Let a = ( String , ExpLet a, ExpLet a) a,a) pattern LitLet :: a -> ExpLet a pattern LitLet a <- LitX _ a _ a where LitLet a = LitX void a void a pattern VarLet :: String -> ExpLet a pattern VarLet s <- VarX s where VarLet s = VarX s pattern AbsLet :: String -> ExpLet a -> ExpLet a pattern AbsLet s a <- AbsX s a s as a where AbsLet s a = AbsX s a s as a pattern AppLet :: ExpLet a -> ExpLet a -> ExpLet a pattern AppLet f a <- AppX _ f a f a_ f a where AppLet f a = AppX void f a f avoid f a pattern LetLet n v e <- ExpX (n,v,e) n v e(n,v,e)

Now instead of writing a single giant pass, we can write smaller ones!

We can write a desugarer that preserves names and rewrites our let bindings as follows:

desugar :: Env -> ExpLet a -> ExpAnn a = case expr of desugar env exprexpr LitLet a -> LitAnn a VarLet name -> VarAnn name (env Map.! name) namename (envname) AbsLet name expr' -> let name expr' = Map.map succ env env'Map.mapenv = Map.insert name 0 env' env''Map.insert nameenv' in AbsAnn name (desugar env'' expr') name (desugar env'' expr') AppLet f x -> AppAnn (desugar env f) (desugar env x) f x(desugar env f) (desugar env x) LetLet n v expr' -> desugar env ( AppLet ( AbsLet n expr') v) n v expr'desugar env (n expr') v)

We can also write an anonymiser that throws the names away:

anonymise :: ExpAnn a -> ExpUD a = case expr of anonymise exprexpr LitAnn a -> LitUD a VarAnn _ i -> VarUD i _ i AbsAnn _ e -> AbsUD (anonymise e) _ e(anonymise e) AppAnn f x -> AppUD (anonymise f) (anonymise x) f x(anonymise f) (anonymise x)

And finally an evaluator that operates on undecorated expressions:

eval :: [a] -> ExpUD a -> a [a] = case expr of eval env exprexpr LitUD a -> a VarUD i -> env !! i env AbsUD f -> eval env f eval env f AppUD f x -> let f x = eval env x x'eval env x in eval (x' : env) f eval (x'env) f

Let’s see it in action!

= AbsLet "i" ( VarLet "i" ) identity = ( AbsLet "x" ( AbsLet "y" ( VarLet "x" ))) konst))) . anonymise . desugar Map.empty $ AppLet ( AppLet konst ( LitLet 1 )) ( LitLet 2 ) eval []anonymisedesugar Map.emptykonst ()) (

2

Awesome! We have composable compiler passes that are easier to write and to think about. Even with this small example, I hope the benefits are clear.

Edit: Edward Kmett points out that using Void makes this approach unnecessarily footgun-prone and suggests using strict fields and () instead. This allows for simpler pattern synonyms and comes with the additional benefit that we can disable constructors with Void so we can actually have trees that shrink:

data ExpX i a i a = LitX ! ( XLit i a) a i a) a | VarX ! ( XVar i a) i a) | AbsX ! ( XAbs i a) ( ExpX i a) i a) (i a) | AppX ! ( XApp i a) ( ExpX i a) ( ExpX i a) i a) (i a) (i a) | ExpX ! ( XExp i a) i a) type family XLit i a i a type family XVar i a i a type family XAbs i a i a type family XApp i a i a type family XExp i a i a type ExpUD a = ExpX UD a data UD type instance XLit UD a = () () type instance XVar UD a = Int type instance XAbs UD a = () () type instance XApp UD a = () () type instance XExp UD a = Void pattern LitUD :: a -> ExpUD a pattern LitUD a = LitX () a () a pattern VarUD :: Int -> ExpUD a pattern VarUD i = VarX i pattern AbsUD :: ExpUD a -> ExpUD a pattern AbsUD a = AbsX () a () a pattern AppUD :: ExpUD a -> ExpUD a -> ExpUD a pattern AppUD f a = AppX () f a f a() f a type ExpAnn a = ExpX Ann a data Ann type instance XLit Ann a = () () type instance XVar Ann a = ( String , Int ) type instance XAbs Ann a = String type instance XApp Ann a = () () type instance XExp Ann a = Void pattern LitAnn :: a -> ExpAnn a pattern LitAnn a = LitX () a () a pattern VarAnn :: String -> Int -> ExpAnn a pattern VarAnn s i = VarX (s, i) s i(s, i) pattern AbsAnn :: String -> ExpAnn a -> ExpAnn a pattern AbsAnn s a = AbsX s a s as a pattern AppAnn :: ExpAnn a -> ExpAnn a -> ExpAnn a pattern AppAnn f a = AppX () f a f a() f a type ExpLet a = ExpX Let a data Let type instance XLit Let a = () () type instance XVar Let a = String type instance XAbs Let a = String type instance XApp Let a = () () type instance XExp Let a = ( String , ExpLet a, ExpLet a) a,a) pattern LitLet :: a -> ExpLet a pattern LitLet a = LitX () a () a pattern VarLet :: String -> ExpLet a pattern VarLet s = VarX s pattern AbsLet :: String -> ExpLet a -> ExpLet a pattern AbsLet s a = AbsX s a s as a pattern AppLet :: ExpLet a -> ExpLet a -> ExpLet a pattern AppLet f a = AppX () f a f a() f a pattern LetLet n v e <- ExpX (n,v,e) n v e(n,v,e)

The definitions of desugar , anonymise , and eval are unchanged.

Thanks to Andy Chu and Peter Bhat Harkins for comments and feedback.