Datatypes are a conundrum for designers of type theories and advanced programming languages. For doing formal reasoning, we rely heavily on datatype induction, and so datatypes seem to be essential to the language design. And they are a terrible nuisance to describe formally, adding a hefty suitcase to the existing baggage we are already coping with. For example, in the “Sep3” Trellys design we have been developing for the past 12 months now, we are contending with a programming language with Type:Type and dependent types, plus a distinct predicative higher-order logic, with syntax for proofs, formulas, and predicates. This language is getting a bit big! And then one has to describe datatypes, and you need an Ott magician to make it all work (fortunately we do have one, but still). What a hassle!

Can’t we just get rid of those darn datatypes? The Pi-Sigma language of Altenkirch et al. and recent proposals of McBride et al. are out to do this, or at least minimize what one needs to accept as a primitive datatype. In this post, I want to talk about the problems you encounter if you lambda-encode your datatypes, and why I think we might be able to get around these in Sep3 Trellys.

There are two terrible snarls that we face with lambda-encoding datatypes, and they both have to do with eliminations (that is, with the ways we are allowed to use data that we have, or assume to have, already constructed). The first snarl is multi-level elimination. Suppose you want to allow the construction of data by recursion on other data (standard), together with construction of types by recursion on data (exotic: “large eliminations”). Then you have a big problem right away with using lambda-encoded data, because lambda-encoding with either the Scott or Church encoding requires fixing a level for elimination of your data. Let me say more about this, looking for the moment just at the well-known Church encoding. In this encoding, a numeral like 2 is encoded as λ f. λ x. f (f x). I am writing this Curry-style (without typing annotations) in the hopes of making it more readable. The type you want to assign to this term is then ∀ X : Type. (X → X) → X → X. The numeral 2 has been encoded as its own iterator: given a function f of type X → X for some type X, and a starting point x of type X, the numeral 2 will apply f twice starting with x. The problem with this encoding is that it is forced to pick a level of the language for X. Here, we said that X is a type, so values computed by 2 must be at the term level of the language. So we cannot use this encoding of 2 if we want to compute at the type level. For that, we would need the type ∀ X : Kind. (X → X) → X → X, and that is quite bad, since if we are going to have impredicative quantification over kinds as well as types, we will lose normalization (I don’t have a precise reference for this, but heard it through the type-theory grapevine, attributed to Thorsten Altenkirch, that impredicativity is a trick we can only play once).
Anyhow, even if we had such impredicativity, we would need to assign multiple types to this encoded 2, and that sounds very worrisome: users are carefully crafting annotations to show a single typing, and it would become quite wild, a priori, to be crafting annotations to achieve multiple typings.

The second snarl is dependent elimination. If we want to use Church-encoded 2 for writing dependently typed functions, or (what is the same under the Curry-Howard isomorphism) proving theorems, the simple polymorphic type we just considered is not good enough. From an encoded natural number n, we want to produce data of type P n, for some type constructor P. So instead of ∀ X : Type. (X → X) → X → X, we really want something like ∀ P : Nat → Type. (∀ m : Nat. P m → P (S m)) → P Z → P n. But this looks problematic, as we are trying to define the type Nat itself! What types can we possibly give to S and Z in that definition? Their types need to use Nat itself, which we are trying to define. Now we could use recursive types to try to get around that problem, but the situation is actually even worse: the pseudo-type I wrote is supposed to be something like the definition for Nat. That is, we would expect our encoding of 2 to have that type. But then we see that even as a pseudo-type, it does not make sense. We actually want 2 to have the type ∀ P : Nat → Type. (∀ m : Nat. P m → P (S m)) → P Z → P 2. That is, its type should mention 2 itself! In general, we want a natural number n to have type ∀ P : Nat → Type. (∀ m : Nat. P m → P (S m)) → P Z → P n. How can something’s type mention the thing itself? This reminds me of very dependent function types (see Section 3 of “Formal Objects in Type Theory Using Very Dependent Types” by Jason Hickey, for example). It seems pretty exotic.
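As a sanity check on what the dependent elimination type should look like, here is a sketch in Lean 4, phrased against Lean’s built-in Nat rather than a lambda-encoded one (the definition names are mine):

```lean
-- The simple polymorphic type: a numeral as its own iterator.
def Iter : Type 1 := ∀ (X : Type), (X → X) → X → X

-- The dependent elimination type we would want the numeral 2 to have:
-- note that the scrutinee (here the literal 2) appears in its own type.
def WantedTypeOfTwo : Type 1 :=
  ∀ (P : Nat → Type), (∀ m, P m → P (m + 1)) → P 0 → P 2

-- Lean's primitive recursor for Nat has exactly this dependent shape:
#check @Nat.rec
```

For a primitive datatype, a system like Lean simply generates such a recursor alongside the type; the snarl above is that a lambda-encoded 2 would have to *be* a term of something like WantedTypeOfTwo, with its own value occurring in its own type.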

Now in the context of our Sep3 Trellys design, there is actually a little light here through all these complex difficulties for lambda encodings. First of all, we have Type : Type in the programming part of Sep3. So we could hope that the problem with multi-level elimination would go away. It actually doesn’t with our current design, because we have adopted a predicative higher-order logic on the logic side, which has actually forced us to add a universe hierarchy on the programming side. So we have Type 0 : Type 0, and then Type 0 : Type 1 : Type 2 … Now it occurs to me that all this trouble would just go away, and the language design would get much simpler, if we just bit the bullet and adopted an impredicative higher-order logic. I immediately lose a lot of confidence that our logical language would be normalizing — but that’s what we do careful metatheoretic study for. There’s no reason to assume that impredicativity will cost us consistency, so let’s not be pessimistic.

If we do not stratify the logical side of the language, we need not stratify the programming side, and so we won’t have multiple levels at which to eliminate an encoded piece of data. This would be great, as it would (almost) knock down the first objection to lambda-encoding datatypes. There’s another issue one would have to worry about, though, and that is eliminating data in proofs (since Sep3 distinguishes proofs and terms). I am going on a bit too long, and my thinking isn’t settled yet, but there’s one very nice thing we could hope for about eliminating data in proofs. Instead of using the Church encoding, we could use the Scott encoding. Here, 2 is encoded as λ s. λ z. s 1, where 1 is in turn encoded as λ s. λ z. s 0, and 0 as λ s. λ z. z. Data are encoded as their own case-statements, rather than as their own iterators. Why is this relevant? Well, I have to note first that now we really will need recursive types of some kind for this, because intuitively, we want the (simple) type for Scott-encoded 2 to be Nat, where Nat = ∀ X : Type. (Nat → X) → X → X. Note the reference to Nat in the type assigned for Nat. We need to use a recursive type to allow Nat to be defined in terms of itself like this. But here’s the cool observation: Garrin Kimmell and Nathan Collins proposed for Trellys (both Sep3 and U. Penn. Trellys) that we base induction on an explicit structural subterm ordering in the language. We have an atomic formula t < t′ for expressing that the value of terminating t is structurally smaller than the value of terminating t′. The obvious but beautiful observation is that with the Scott encoding, we really do have 1 < 2. Note that this is not true with the Church encoding, since λ f. λ x. f x (the Church numeral 1) is not a subterm of λ f. λ x. f (f x) (the Church numeral 2).

Ok, I’ve gone on too long and haven’t gotten to address the problem with dependent elimination yet. But suffice it to say that lambda-encoding datatypes, which has always been eminently attractive, is starting to look like a possibility for Sep3.