One of my pet peeves is when someone makes like they’re going to talk about software design, and then… talks about comments. Or they focus on the details about how functions get written.

At the risk of using a tired metaphor, this is a bit like when an architect wants to do interior design. It’s not that the insides of a room don’t matter (a poor architect can absolutely create rooms humans have no use for), but it’s also not the important part. Furniture can be rearranged; load-bearing walls cannot.

A function is the quintessential abstraction boundary. It’s one of the few cases where we should be able to ignore the details. (Performance still matters, of course, but even that’s something we can understand in a black-box fashion.) We can write better and worse quality functions, but the internal details aren’t really relevant to the software’s overall design. It should be isolated.

I think design is about everything that’s left over after we remove all the function bodies. It’s worth taking a moment to think about what that looks like. Here are some of my own immediate thoughts:

Everything that’s left is just types. Even in dynamic languages, they’re just less detailed types: classes that have certain methods, functions that take a certain number of arguments, and the documentation associated with these things. The classical proscription against global state is largely about ensuring those types are meaningful. Any global state is something that could be used anywhere, so it’s harder to guess how any function works. Absent global state, the only things a function is working on are in its type signature. More modern aversions to deeply chaining method calls (e.g. the “Law of Demeter”) are likewise attempts to meaningfully communicate what a function works with. I’m not a strong fan of these rules because they get misapplied. (Note to self: write a future post on this.) But there’s a solid kernel of a good idea here: a function should tend to operate on its actual stated arguments more than “literally anything it can possibly get to from its arguments.”
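A small sketch of the global-state point, in Python. The names here (`TAX_RATE`, `price_with_global`, `price_explicit`) are made up for illustration. The first function’s signature hides one of its inputs, while the second states everything it works on:

```python
# Hypothetical example: all names are illustrative, not from a real codebase.

TAX_RATE = 0.25  # module-level (global) state


def price_with_global(net: float) -> float:
    """The signature claims one input, but the result also depends on TAX_RATE."""
    return net * (1 + TAX_RATE)


def price_explicit(net: float, tax_rate: float) -> float:
    """Everything this function works on appears in its signature."""
    return net * (1 + tax_rate)
```

With the bodies removed, only `price_explicit` still tells a reader what the computation depends on.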

Humans like to give names to things. It’s part of how we go about understanding the world. In one of Feynman’s books he jokes about the uselessness of just knowing the name of something, but I think he was quite wrong there. The name is the start: it’s the idea you start attaching your understanding to. It’s the tool for communicating with other people, and looking things up. And in programming, everything we give a name to is, or is associated with, a type.

I was recently helping a kid learning to write some Python. They came across a situation where an interface was called for—but Python doesn’t really have interfaces. Inheritance wasn’t called for, as there was not even any shared implementation to speak of. After feeling a bit stupid for a minute (and after a quick google, deciding abc was too much to introduce), I ended up advising they just write up a “template” class in a comment. At least that gave them something to copy & paste, to start each new implementation of that not-interface. It’s really irksome when you can’t give a name to something important. Of course, it’s equally irksome when you have to provide a name for something that’s more structural. Java’s lack of function types comes to mind.
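For what it’s worth, Python 3.8 and later ship `typing.Protocol`, which is one possible way to give such a not-interface a name without any inheritance. A minimal sketch, assuming a hypothetical `Shape` interface and a `Square` implementation invented purely for illustration:

```python
from typing import Protocol


class Shape(Protocol):
    """A named, structural interface: anything with these methods qualifies."""

    def area(self) -> float: ...
    def name(self) -> str: ...


class Square:
    # No inheritance from Shape. Matching the method signatures is enough
    # for a static checker such as mypy to accept a Square as a Shape.
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side

    def name(self) -> str:
        return "square"


def describe(shape: Shape) -> str:
    """Accepts any object that structurally satisfies Shape."""
    return f"{shape.name()} with area {shape.area()}"
```

Note that the check only happens in a static type checker. At runtime, Python will happily pass anything to `describe`, so this is more of a named template than an enforced contract.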

Thinking about design in terms of types

UML, for all its faults, at least got this part right. When it comes to design, we want to figure out what the types are, what their relationships are, and give them names. (I’ve mentioned before what UML got wrong, but in brief: it’s too OO, and design is iterative, while UML is pretty darn up-front. Good designs are the result of refactoring as we learn.)

C programmers end up with a similar design methodology, one I usually refer to as “representations first.” C has a primitive enough type system that thinking about types is really more thinking about representation than anything else.

One of the toxic parts of really early “static types” debates was that most people’s exposure to types was only in the pursuit of performance. C, pre-templates C++, pre-generics Java, and even Common Lisp compilers all basically used types just to make code faster. The idea that types could do anything else was foreign to many programmers for a long time.

Kernighan and Pike write that “the design of the data structures is the central decision in the creation of a program. Once the data structures are laid out, the algorithms tend to fall into place, and the coding is comparatively easy.”

This quote is from “The Practice of Programming,” which also goes on to say: “One aspect of this point of view is that the choice of programming language is relatively unimportant to the overall design. We will design the program in the abstract and then write it in C, Java, C++, Awk, and Perl.”

There, I start to disagree. First, equating “data structures” with “types” is a very C way of thinking. The example that follows in the book is too simple to really criticize, but in general the sentiment they’re expressing here is sometimes derisively referred to as “you can write C in any language!” (A joking reference to earlier complaints that “you can write Fortran in any language.” I suppose it comes as no surprise I now sometimes hear people complain that some people can write Java in any language.)

Functional programming traditions, especially the branch that gave us Standard ML and Haskell, are also very focused on type-first programming. One of Haskell’s most basic innovations (well, I’d guess it probably wasn’t first, but compared to other relatively mainstream languages) was the ability to write the type of a function separately from writing the function body.

map :: (a -> b) -> [a] -> [b] ...

On the one hand, I’ve routinely felt the benefit of this language design. Writing down the function’s type (or more typically, a whole collection of functions’ types) before writing a single line of a body is very nice. It lets you think things through and do a beneficial amount of planning. There’s a reason Haskell programmers start by writing a function’s type, even though it could just be inferred.

On the other hand, I’ve occasionally lamented the lack of an explicit name for the arguments of a function. While I might not have trouble remembering argument order for map specifically, it is quite nice when your IDE can give you map(fn, lst) as a quick reminder of a function’s arguments and order. The Haskell declaration style has no canonical name for each parameter of a function because it’s immediately pattern matching. Trade-offs, I guess.

Deeper and more intricate types

Even with dynamic languages these days, we’re structuring our programs in terms of types. Just transitioning to a static language at all, though, is arguably a step down. We actually have research suggesting this: the most well known paper about static vs dynamic types and productivity is about a pretty impoverished static type system.

For static types to start to pay off and help us design programs, we need them to be capable enough. A good starting point is supporting all three kinds of types we might want to design. But since no language does that well, we’ll have to settle for some of them. This class of language includes things like plain C, and Java before version 5.

The benefit of these languages is that more and more of the task of programming gets mapped out by the type. With an interface, we know an implementation has to meet a certain minimum. With a data type, we know a function will proceed by pattern matching over a certain set of cases. This approach also offers greater machine understanding of the code, allowing automated refactorings to work reliably.
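To make the “data type” half concrete, here is a rough Python sketch. The `Num`, `Add`, and `Expr` names are hypothetical, and `isinstance` checks stand in for real pattern matching. Once the cases are written down as types, the shape of any function over them is largely decided: one branch per case.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


# The data type: an expression is exactly one of these cases.
Expr = Union[Num, Add]


def evaluate(expr: Expr) -> int:
    """One branch per case of Expr; the type maps out the function body."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    raise TypeError(f"not an Expr: {expr!r}")
```

Adding a third case to `Expr` immediately tells you which functions need a new branch, which is the kind of design information the types are carrying.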

The next step up in terms of types helping us design programs is parametricity. This arrived with Java generics, C++ templates, and so on. This feature originated with functional programming languages, and was eventually ported over.

Contrary to what many focus on (power isn’t what we’re always after: we should want properties), parameterized functions (and types) aren’t merely “generic programming.” It’s not just about the flexibility of using a function with multiple different types. It’s also about properties: if we’re parametric over a type, we restrict what we’re able to do with variables of that abstract type. I think this one-two punch (of more generally useful code and more powerful correctness properties) was responsible for the take-over of static types in the functional programming community. It’s seriously impressive that all it takes to show map is correct is just one type check and one property test.
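A rough Python approximation of that last claim, using `typing.TypeVar`. The names `map_list` and `identity_law_holds` are hypothetical, and Python’s checkers do not enforce parametricity the way Haskell’s type system does, so treat this as an illustration rather than a proof. Because the function knows nothing about `A` or `B`, the only way it can produce a `B` is by calling `fn` on an `A` taken from the input. The identity check then pins down length and order.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_list(fn: Callable[[A], B], items: List[A]) -> List[B]:
    """Parametric in A and B: a well-typed body cannot inspect or invent elements."""
    return [fn(item) for item in items]


def identity_law_holds(items: List[A]) -> bool:
    """The single property test: mapping the identity function changes nothing."""
    return map_list(lambda x: x, items) == items
```

The type check rules out implementations that fabricate elements, and the property test rules out ones that drop, duplicate, or reorder them.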

Researchers continue to look for more ways types can help structure programs. Rust’s type system has similarities to linear types, managing lifetimes. With many modern languages doing garbage collection and exceptions, we can easily forget that important parts of design are questions like “how do we do resource management?” and “how do we handle errors?” For many languages, the right answer is to universally answer these questions, so they’re no longer relevant. But when we need to be more nuanced, encoding more of these decisions in types means we can achieve much greater understanding of the design from just considering the types. It also helps automate the work: having to allocate space for a function to return data into is just more boilerplate in C, but is (in effect) handled for us in Rust.

There’s also plenty of active research into dependent types. My favorite reason to be excited about this area of research is the ability to generate implementations (or at least partial implementations) from types. If you’ve never witnessed this kind of interaction, I recommend watching this video by Edwin Brady about Idris 2. It demos a level of IDE support for a language that we don’t really experience today. Partially, this is a lack of good support for such things from IDEs (and language tooling), but partially it’s also a new feature enabled by the (not even that complicated) addition of dependent types to the language.

While I often harp on programmers paying too much attention to the power and not the properties of an abstraction, type systems are an area where I suspect the opposite happens too often. People sometimes act like types are just about correctness: they over-focus on properties. Types are machine-readable descriptions of program design. That’s powerful.

End notes