The Need for Partial Type Annotations

Consider some of the interesting things that have been done with Haskell’s type system over the last decade or so:

Session types for enforcing compatibility of communication protocols.

Information flow controls for managing confidential information within software.

Reflecting values in types for propagating runtime preferences in a program (Oleg’s implicit configurations).

and much more. It’s fairly easy to dream up problems for which clever type system trickery can help enforce the correctness of a program. This is a common theme in Haskell programming, and a major reason it’s frequently used for high-assurance environments.

In particular, many of these techniques share three characteristics: (1) they involve carrying information in phantom types; (2) they rely heavily on the type system to propagate information around by the mechanism of type inference; and (3) if the resulting types are written down explicitly, they can be rather fearsome to behold, sometimes being quite a bit longer than the corresponding terms. There are several possible reasons for property (3). One could argue that it’s because we lack, for types, the same quality tools for abstraction that are commonly available for terms; or if we have them, we tend not to use them so often. It may be the inherent complexity of the information we’re dealing with. Likely it’s a combination of all of those. Whatever the cause, though, I think it’s fairly difficult to argue that it’s not true.
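To make characteristics (1) and (2) concrete, here is a minimal sketch in the spirit of the information-flow example. All of the names (Labeled, Public, Secret, combine, reveal) are hypothetical and invented for illustration; they are not taken from any particular library. The level parameter is a phantom type, and the un-annotated binding relies on inference to carry the label along.

```haskell
module Main where

-- Hypothetical confidentiality labels. They have no values and exist
-- only at the type level.
data Public
data Secret

-- 'level' is a phantom parameter: it never appears on the right-hand side.
newtype Labeled level a = Labeled a

public :: a -> Labeled Public a
public = Labeled

secret :: a -> Labeled Secret a
secret = Labeled

-- Two values can be combined only if they carry the same label.
combine :: (a -> b -> c) -> Labeled level a -> Labeled level b -> Labeled level c
combine f (Labeled x) (Labeled y) = Labeled (f x y)

-- Only public data may be extracted.
reveal :: Labeled Public a -> a
reveal (Labeled x) = x

-- No annotation here: inference propagates the label from the
-- arguments and concludes that this is a 'Labeled Public Int'.
total = combine (+) (public 1) (public (2 :: Int))

main :: IO ()
main = print (reveal total)
```

Replacing either argument with one built by secret should make reveal total a type error. In a toy like this the inferred type is short; the trouble described in (3) starts when the phantom parameter grows into a large type-level structure.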

Thesis: In its current form, Haskell type annotation is in conflict with advanced type system hackery.

As I think more about the issue, I come to believe this more strongly. And furthermore, I come to believe that it’s severely limiting the things we do in practice with our types. There are many things one can do with building fancy types to capture program properties… but only if you never intend to write a type annotation. The instant you want to annotate types, things get a lot more hairy. The symptoms of this disease are fairly easy to recognize.

Symptom #1: The documentation advises new users not to even try to understand the types.

This is troubling to see, since types often capture precisely the most fundamental information you need to know to get started with an unfamiliar API: what parameters are required, what sort of data it works with, and so on. Indeed, it runs directly counter to the idea (which I still believe) that in Haskell one should often “design by types”: that is, start out by writing some type annotations, and then go from there. But if the interesting information in a type is buried in several lines of an incomprehensible phantom type, then new users can be more confused than enlightened by looking at type annotations. Thus, we’ve lost a key benefit of having the type system in place.

Symptom #2: Top-level API elements lack type annotations completely.

While type inference is quite helpful, it’s also the case that annotating top-level API elements is an easy win for libraries and reusable code. That is, unless the types have gotten out of control. The more advanced type-system cleverness a piece of code contains, the less likely one is to see type annotations on it.

Symptom #3: Users are urged to get help from GHCi if they need to write type annotations.

I’ve seen this a time or two, as well. If the type system hackery escapes a library and infects user code as well, the user sometimes insists on annotating types. Then they are occasionally urged to ask GHCi for help. Indeed, GHC now helpfully provides the inferred type when giving its warning for top-level definitions that lack a type annotation. But what purpose could there be in writing a type annotation if even the library author doesn’t understand why the term has that particular type? (Note that I’m not arguing against the GHC warning; indeed, I think there are several cases where it’s very helpful. But it should not be necessary.)

Symptom #4: A type-based technique for solving a common problem is published and presented at a workshop or conference, but rarely used.

This problem doesn’t arise in writing papers for conferences on doing impressive things with types; it arises later, when one attempts to use the type hackery and still keep a clean and modular code base. As a result, we stay in a world where programmers know they need to be careful with a certain problem (e.g., making sure two pieces of code speak a compatible protocol), are even aware of type-based techniques to solve it, but simply prefer to take the chance and do the work by hand because they see the alternative as too work-intensive.

Recommended Treatment:

Here is where I get a bit more speculative. Suppose that we make a few simplifying assumptions.

First, we assume that the information carried in these types really, truly is the sort of thing that we need to propagate around to check for consistency; that is, it wasn’t a mistake to put it in the type system, and it really ought to be there. Certainly, there are cases in which information that’s not relevant externally ends up getting leaked in phantom types. Those should be fixed, rather than worked around. We’re only concerned with situations where we need the type information. (Related to this, polymorphism is not the answer. We’re not interested in declaring that our term is polymorphic in the complicated type. Rather, it has a specific value that matters, which we want the type system to infer and check when it’s used, but which we just don’t want to write out explicitly.)

Second, suppose that the information we are carrying around just really is that complicated. It may be that there’s good work to be done in building type-level abstractions (or showing how to use existing abstractions more effectively) that can bundle up some of this type-hackery more effectively and encapsulate it to make it easier to use… but I’m just not going that direction in this article.

Finally, suppose we’re okay with being less than explicit about these types. Indeed, we want to be less than explicit, because being explicit was so odious that we were considering abandoning this awesome new type system technique entirely. But we don’t want to be completely implicit. That’s the problem. Currently, I have a choice between saying nothing at all about the type of a term, or else giving an authoritative answer about the type. What I really want is something in-between.

In a phrase, what I want is a partial type annotation.

This raises a lot of issues, which I’m not going into. What kinds of partial information are we allowed to give? What’s a reasonable syntax? What effects might this have on the decidability of type checking or type inference? I don’t have the answers to these questions; my point is merely to argue that this is worth thinking about.
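As one concrete point in this design space, GHC’s PartialTypeSignatures extension (available since GHC 7.10) allows a signature to contain _ wildcards, which the compiler fills in by inference. The sketch below assumes that extension and is only an illustration, not necessarily the design argued for here. The names Labeled, Public, mapLabeled and so on are hypothetical. The annotation states the part a reader cares about (the payload is an Int) and leaves the phantom label to be inferred and checked.

```haskell
{-# LANGUAGE PartialTypeSignatures #-}
-- By default GHC warns about each wildcard and reports what it inferred;
-- this flag silences that warning for the sketch.
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main where

-- Hypothetical type-level label with no values.
data Public

-- 'level' is a phantom parameter.
newtype Labeled level a = Labeled a

public :: a -> Labeled Public a
public = Labeled

-- Transform the payload while keeping the label unchanged.
mapLabeled :: (a -> b) -> Labeled level a -> Labeled level b
mapLabeled f (Labeled x) = Labeled (f x)

-- Only public data may be extracted.
reveal :: Labeled Public a -> a
reveal (Labeled x) = x

-- Partial annotation: the payload type is written out, and the wildcard
-- leaves the label to inference (here it is filled in as 'Public').
answer :: Labeled _ Int
answer = mapLabeled (+ 1) (public 41)

main :: IO ()
main = print (reveal answer)
```

The wildcard is not polymorphism: answer has one specific label, which is still checked at the use site in reveal. It simply goes unwritten in the signature.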

PS: An interesting and related question is: supposing one has a nice syntax for partial type annotations, can one then modify GHC’s error reporting to use it by generalizing away those parts of the reported types that are not relevant to a detected error?