Reducing boilerplate in finally tagless style

Introduction

Typed Tagless, a.k.a tagless-final or finally tagless, is an approach to embedding DSLs and modeling data in general, advocated by Oleg Kiselyov. Instead of defining a set of algebraic data types to describe data or terms, thus focusing on how data is constructed, the approach focuses on data consumption, defining a canonical eliminator for every constructor that we would otherwise define.

For instance, instead of defining lists as

data List a = Nil | Cons a ( List a) a (a)

we would define a class

class List rep a where rep a nil :: rep rep cons :: a -> rep -> rep reprep

which of course corresponds to the Böhm-Berarducci (or Church) encoding of the above algebraic type.

Oleg has written extensively on the merits of this approach. In this article, I want to discuss a certain aspect of writing transformations in the finally tagless style.

The use case: language-integrated query

Oleg, together with Kenichi Suzuki and Yukiyoshi Kameyama, have published a paper Finally, Safely-Extensible and Efficient Language-Integrated Query. In this paper, they employ the finally tagless approach to embed, optimize, and interpret SQL queries in OCaml.

Here are some excerpts from their OCaml code:

(* Base Symantics *) module type Symantics_base = sig Symantics_base = ... (* lambda abstract *) val lam : ('a repr -> 'b repr) -> ('a -> 'b) repr lam : ('a repr -> 'b repr) -> ('a -> 'b) repr (* application *) val app : ('a -> 'b) repr -> 'a repr -> 'b repr app : ('a -> 'b) repr -> 'a repr -> 'b repr ... end (* Symantics with list operations *) module type SymanticsL = sig SymanticsL = include Symantics Symantics (* comprehension *) val foreach : ( unit -> 'a list repr) -> foreach : (-> 'arepr) -> list repr) -> 'b list repr ('a repr -> 'brepr) -> 'brepr (* condition *) val where : bool repr -> ( unit -> 'a list repr) -> 'a list repr where :repr -> (-> 'arepr) -> 'arepr (* yield singleton list *) val yield : 'a repr -> 'a list repr yield : 'a repr -> 'arepr (* empty list *) val nil : unit -> 'a list repr nil :-> 'arepr (* not empty *) val exists : 'a list repr -> bool repr exists : 'arepr ->repr (* union list *) val (@%) : 'a list repr -> 'a list repr -> 'a list repr (@%) : 'arepr -> 'arepr -> 'arepr (* the table constructor which take a table name and table contents *) val table : ( string * 'a list ) -> 'a list repr table : (* 'a) -> 'arepr end

(‘Symantics’ is not a typo; it’s a portmanteau of ‘syntax’ and ‘semantics’.)

Transformations

A SQL trasnformation (such as transforming a subquery to a join) is represented by an ML functor, i.e. a function mapping one SymanticsL to another, which interprets the term slightly differently than the original one. I say slightly, because normally a transformation touches only a few relevant methods. The others are transformed mechanically following the Reflection-Reification pattern (RR). Informally speaking, we leave the irrelevant methods unchanged, applying the minimal transformation that makes them typecheck.

The question is, how to avoid mentioning irrelevant methods when defining a transformation?

This question is not idle. The language-integrated query code contains about 40 methods and 13 transformations. Pause for a second and imagine the amount of boilerplate that would have to be written if we needed to define every single method for every transformation. As we will see below, ML modules make this a non-issue. In Haskell, however, it is an issue, exhibited in Oleg’s own Haskell example (although easy to miss for a class that only contains 3 methods).

In OCaml, the RR is defined as a transformation of the whole module:

module OL(X:Trans) OL(X:Trans) with type 'a repr = 'a X.from) = struct (F:SymanticsL'a repr = 'a X.from) = include O(X)(F) O(X)(F) open X let foreach src body = foreach src body = fun () -> bwd (src ())) ( fun x -> bwd (body (fwd x)))) fwd (F.foreach (() -> bwd (src ())) (x -> bwd (body (fwd x)))) let where test body = where test body = fun () -> bwd (body ()))) fwd (F.where (bwd test) (() -> bwd (body ()))) let yield e = fmap F.yield e yield e = fmap F.yield e let nil () = fwd (F.nil ()) nil () = fwd (F.nil ()) let exists e = fmap F.exists e exists e = fmap F.exists e let (@%) e1 e2 = fmap2 F.(@%) e1 e2 (@%) e1 e2 = fmap2 F.(@%) e1 e2 let table (name,data) = table (name,data) = fwd @@ F.table (name, data) end

When they define a transformation, they first transform the module in this mechanical fashion, and then override the few relevant methods:

module AbsBeta_pass(F:SymanticsL) = struct AbsBeta_pass(F:SymanticsL) = module X0 = struct X0 = type 'a from = 'a F.repr 'a from = 'a F.repr type 'a term = Unknown : 'a from -> 'a term 'a term = Unknown : 'a from -> 'a term | Lam : ('a term -> 'b term) -> ('a -> 'b) term let fwd x = Unknown x (* generic reflection *) fwd x = Unknown x let rec bwd : type a. a term -> a from = function (* reification *) bwd :a. a term -> a from = | Unknown e -> e fun x -> bwd (f (fwd x))) | Lam f -> F.lam (x -> bwd (f (fwd x))) end open X0 X0 module X = Trans_def(X0) X = Trans_def(X0) open X (* optimization *) module IDelta = struct IDelta = let lam f = Lam f lam f = Lam f let app e1 e2 = app e1 e2 = match e1 with e1 | Lam f -> f e2 | _ -> fmap2 F.app e1 e2 end end (* Combine the concrete optimization with the default optimizer *) module AbsBeta(F:SymanticsL) = struct AbsBeta(F:SymanticsL) = module M = AbsBeta_pass(F) M = AbsBeta_pass(F) include OL(M.X)(F) (* the default optimizer *) OL(M.X)(F) include M.IDelta (* overriding `lam` and `app` *) M.IDelta end

How can we do this in Haskell?

Explicit dictionaries

An explicit dictionariy (a data type containing methods as its fields) seems like a great fit for Symantics . The RR transformation would be a simple function mapping one record to another. To define a transformation, we would override the relevant methods via record update.

However, explicit dictionaries are not that well suited for the finally tagless style. In OCaml, you can include one module into another (notice include Symantics in the OCaml code above). This “unpacks” the contents of one module into another, so that when you open the second module, the contents of the first module is available, too.

This is important for the finally tagless style. One of its strength is extensibility, which is achieved through such inclusion. Consequently, deep inclusion chains are common. With Haskell’s data types, unpacking such chains manually at every use site will quickly become unwieldy.

Type classes

Type classes are better suited for inclusion. If we declare

class Symantics1 rep => Symantics2 rep where { ... } reprep

and impose a Symantics2 rep constraint on a function definition, the methods of Symantics1 become available without any additional effort.

But then we don’t have good support for RR. Type class instances are not first class citizens; we can’t declare a function that transforms one instance into another. Nor can we create one instance from another by overriding a few methods… Or can we?

We can achieve our goal by using default method signatures.

We define the RR transformation simultaneously with the class itself:

class Symantics rep where rep lam :: (rep a -> rep b) -> rep (a -> b) (rep arep b)rep (ab) lam :: RR t rep => (t rep a -> t rep b) -> t rep (a -> b) defaultt rep(t rep at rep b)t rep (ab) = fwd $ lam $ bwd . f . fwd lam ffwdlambwdfwd app :: rep (a -> b) -> rep a -> rep b rep (ab)rep arep b app :: RR t rep => t rep (a -> b) -> t rep a -> t rep b defaultt rept rep (ab)t rep at rep b = fwd $ bwd f `app` bwd x app f xfwdbwd fbwd x foreach :: rep [a] -> (rep a -> rep [b]) -> rep [b] rep [a](rep arep [b])rep [b] foreach :: RR t rep => t rep [a] -> (t rep a -> t rep [b]) -> t rep [b] defaultt rept rep [a](t rep at rep [b])t rep [b] = fwd $ foreach (bwd a) (bwd . b . fwd) foreach a bfwdforeach (bwd a) (bwdfwd) ...

The implementation of RR is straightforward:

class RR t rep where t rep fwd :: rep a -> t rep a rep at rep a bwd :: t rep a -> rep a t rep arep a

Now let’s define the AbsBeta pass in Haskell.

data AbsBeta rep a where rep a Unknown :: rep a -> AbsBeta rep a rep arep a Lam :: ( AbsBeta rep a -> AbsBeta rep b) -> AbsBeta rep (a -> b) rep arep b)rep (ab) instance Symantics rep => RR AbsBeta rep where reprep = Unknown fwd = \ case bwd Unknown t -> t Lam f -> lam (bwd . f . fwd) lam (bwdfwd) instance Symantics rep => Symantics ( AbsBeta rep) where reprep) = Lam lam = app f x case f of Unknown f' -> fwd $ app f' (bwd x) f'fwdapp f' (bwd x) Lam b -> b x b x

All the methods not mentioned in the last instance get their default implementations based on RR, which is exactly what we wanted.

Associated types

Apart from methods, ML/OCaml modules can also define types. This is used in the Language-integrated query paper and code in the following way:

( * Base Symantics * ) module type Symantics_base = sig sig type 'a repr ( * representation type * ) 'a repr (representation : (unit -> 'a repr) -> 'a obs val observe(unit'a repr)'a obs ...

In Haskell, we can replicate that with an associated type:

class SymanticsObs rep where rep type Obs rep :: * -> * observe :: rep a -> Obs rep a rep arep a observe :: RR t rep => t rep a -> Obs rep a defaultt rept rep arep a = observe . bwd observeobservebwd

The default definition for observe saves us from redefining it for derived representations, but what about Obs itself? We would like to write, in the spirit of default method signatures,

class SymanticsObs rep where rep type Obs rep :: * -> * type Obs (t rep) = rep (t rep)rep

However, GHC would not let us to. Since recently, GHC does support default type declarations, but they need to be of the general form type Obs rep = ... .

Nevertheless, we can create a type family that will extract the rep from t rep for us:

type family Peel ( rep :: * -> * ) :: ( * -> * ) where Peel (t rep) = rep (t rep)rep class SymanticsObs rep where rep type Obs rep :: * -> * type Obs rep = Obs ( Peel rep) reprep) observe :: rep a -> Obs rep a rep arep a observe :: RR t rep => t rep a -> Obs rep a defaultt rept rep arep a = observe . bwd observeobservebwd

Now we can say

instance ( Symantics rep, SymanticsObs rep) => SymanticsObs ( AbsBeta rep) rep,rep)rep)

without having to define either type Obs or observe explicitly.

Conclusion

Extensions such as default method signatures, default associated types, and type families can significantly reduce the boilerplate when defining transformations in the finally tagless style.

Update. Although I missed it on the first reading of the paper, /u/rpglover64 on reddit points out that the authors themselves acknowledge the boilerplate problem which this article addresses: