Note: This article assumes some introductory Haskell knowledge.

Introduction

Just as algebra is fundamental to the whole of mathematics, algebraic data types (ADTs) are fundamental to many common functional programming languages. They’re the primitives upon which all of our richer data structures are built, including everything from sets, maps, and queues, to bloom filters and neural networks.

Algebraic data types and mathematical algebra have some similar looking operations. In fact, it’s common in type theory to use the algebraic notation to define algebraic data types, rather than the Haskell-style data type declarations:

Haskell Math Notes data Void 0 0 0 It’s impossible to construct a term of this type (needs LANGUAGE EmptyDataDecls ) data Unit = Unit 1 1 1 The type with just one term data Bool = True | False 1 + 1 1 + 1 1 + 1 Also known as 2 ( 3 , 4 , etc are exactly as you’d imagine) data Maybe a = Just a | Nothing a + 1 a + 1 a + 1 Read “ + + + ” as “Either” data Either a b = Left a | Right b a + b a + b a + b data (a, b) = (a, b) a × b a \times b a × b Read “ × \times × ” as “And” a -> b b a b ^ a b ​ a ​ ​ More on this in a moment

In this essay, we’ll explore this coincidence and what that means for us as programmers. We’ll follow the path all the way to calculus.

Counting inhabitants

As a first foray into the equivalence of algebraic data types and the algebra most of us are more familiar with, we’ll start with simple arithmetic. Specifically, we’ll start by counting the number of inhabitants (values) of a given type. Looking back at the examples:

There are 0 ways to construct Void . Unit , the type constructor, has 1 inhabitant ( Unit , the data constructor). Bool has 2. We can count the data constructors by hand, but we can also just simplify the algebraic expression to 2. Either a b has as many as a and b , combined. Again, the number of inhabitants looks exactly the same as the algebraic form, a + b a + b a + b . (a, b) has an inhabitant for each combination of a s and b s, a × b a \times b a × b . a -> b has b a b ^ a b ​ a ​ ​ . Why? Think about data Tri = Zelda | Link | Ganon . Why are there eight inhabitants of Tri -> Bool , but nine of Bool -> Tri ? It helps to write out each possible function.

Algebraic manipulations

Counting inhabitants is interesting and useful, but the connection runs much deeper! We can use it to gain insight into the nature of our datatypes. Let’s look at an example.

Consider data Choice a = LeftChoice a | RightChoice a , i.e. a + a a + a a+a. We can manipulate this using the familiar rules of algebra to the equivalent 2 × a 2 \times a 2×a, or type Choice' a = (Bool, a) .

The manipulation was done using a rule we learned in grade school, before we’d ever heard of algebraic data types. What’s more, the rules of algebra were established centuries before algebraic data types were even invented. Yet the result makes sense. Our Choice data type tags each a as either a “left a ” or a “right a .” Clearly it’s equivalent to Choice ’s use of a boolean tag. Isn’t it remarkable that algebraic manipulations agree with our intuition?

Exercise for the reader: Explain why Bool -> a is not equivalent to Either a a , but is equivalent to (a, a) both algebraically and intuitively.

Discovering I could do this felt very much like learning algebra for the first time – elegant, powerful, exciting. It raises the questions of what else one can do? How might we use this new tool? What new powers does it give us? Where does it fail?

Taylor series

We’re going to play fast and loose with notation for a moment. Bear with me.

Let’s start with the familiar List data type.

data List a = Nil | Cons a ( List a ) -- or, with Haskell's special syntax data [ a ] = [] | a : [ a ]

We can repeatedly expand the definition by expressing it with algebraic notation.

L = 1 + a L = 1 + a ( 1 + a L ) = 1 + a + a 2 ( 1 + a L ) = 1 + a + a 2 + a 3 ( 1 + a L ) = 1 + a + a 2 + a 3 + … \begin{align} L & = 1 + a L \\ & = 1 + a \left(1 + a L\right) \\ & = 1 + a + a^2 \left(1 + a L\right) \\ & = 1 + a + a^2 + a^3 \left(1 + a L\right) \\ & = 1 + a + a^2 + a^3 + \ldots \end{align} ​ L ​ ​ ​ ​ ​ ​ ​ = 1 + a L ​ = 1 + a ( 1 + a L ) ​ = 1 + a + a ​ 2 ​ ​ ( 1 + a L ) ​ = 1 + a + a ​ 2 ​ ​ + a ​ 3 ​ ​ ( 1 + a L ) ​ = 1 + a + a ​ 2 ​ ​ + a ​ 3 ​ ​ + … ​ ​

Remarkably, this makes sense. The last line tells us that a list is either [] , [a] , [a, a] , or [a, a, a] , and so on.

Neat trick, but let’s push our luck. The Taylor series for 1 / ( 1 − a ) 1 / \left(1 - a\right) 1/(1−a) is 1 + a + a 2 + a 3 + … 1 + a + a^2 + a^3 + \ldots 1+a+a​2​​+a​3​​+…, which we can use to bypass the manual expansion.

L = 1 + a L L ( 1 − a ) = 1 L = 1 / ( 1 − a ) L = 1 + a + a 2 + a 3 + … \begin{align} L & = 1 + a L \\ L \left(1 - a\right) & = 1 \\ L & = 1 / \left(1 - a\right) \\ L & = 1 + a + a^2 + a^3 + \ldots \end{align} ​ L ​ L ( 1 − a ) ​ L ​ L ​ ​ ​ = 1 + a L ​ = 1 ​ = 1 / ( 1 − a ) ​ = 1 + a + a ​ 2 ​ ​ + a ​ 3 ​ ​ + … ​ ​

Let’s pause for a moment to remember that we’re dealing with types. And the expression 1 / (1 - a) contains both a negative and a fractional type, neither of which have a meaning yet. Let’s consider what they could possibly mean. Algebra tells us that (1 - a) + a = 1 . So, this data types combines either a 1 - a or an a , to get just 1 . What? The fractional type is just as unintelligible – given a 1 / a and an a , we have… 1 . In either case it seems we get out less than we put in! Some research has been done on deciphering some meaning from this mess, but I can’t use negative and fractional types without adopting the additional language semantics they propose. One final note – though we took some morally objectionable steps, using types we don’t have any justification for, we came out fine on the other end.

For fun, let’s try our luck with binary trees.

data BinaryTree a = Leaf a | Branch ( BinaryTree a ) ( BinaryTree a )

Expansion by hand

T = a + T 2 = a + ( a + T 2 ) 2 = a + a 2 + 2 a T 2 + T 4 = . . . = a + a 2 + 2 a 3 + 5 a 4 + 1 4 a 5 + … \begin{align} T & = a + T^2 \\ & = a + (a + T^2)^2 \\ & = a + a^2 + 2aT^2 + T^4 \\ & = ... \\ & = a + a^2 + 2 a^3 + 5 a^4 + 14 a^5 + \ldots \end{align} ​ T ​ ​ ​ ​ ​ ​ ​ = a + T ​ 2 ​ ​ ​ = a + ( a + T ​ 2 ​ ​ ) ​ 2 ​ ​ ​ = a + a ​ 2 ​ ​ + 2 a T ​ 2 ​ ​ + T ​ 4 ​ ​ ​ = . . . ​ = a + a ​ 2 ​ ​ + 2 a ​ 3 ​ ​ + 5 a ​ 4 ​ ​ + 1 4 a ​ 5 ​ ​ + … ​ ​

Taylor series

T = a + T 2 T 2 − T + a = 0 T = 1 − 1 − 4 a 2 T = a + a 2 + 2 a 3 + 5 a 4 + 1 4 a 5 + … \begin{align} T & = a + T^2 \\ T^2 - T + a & = 0 \\ T & = \frac{1 - \sqrt{1 - 4a}}{2} \\ T & = a + a^2 + 2 a^3 + 5 a^4 + 14 a^5 + \ldots \end{align} ​ T ​ T ​ 2 ​ ​ − T + a ​ T ​ T ​ ​ ​ = a + T ​ 2 ​ ​ ​ = 0 ​ = ​ 2 ​ ​ 1 − √ ​ 1 − 4 a ​ ​ ​ ​ ​ ​ = a + a ​ 2 ​ ​ + 2 a ​ 3 ​ ​ + 5 a ​ 4 ​ ​ + 1 4 a ​ 5 ​ ​ + … ​ ​

(By the way, those coefficients are the Catalan numbers)

This describes a binary tree as being either a single leaf, the tree with two leaves, one of the two trees with three leaves, or one of the five trees with four leaves, and so on. Finally a reward for our diligence! We can answer the question of how many ways there are to produce a binary tree with five leaves (14), or 11 (16796)!

As you might expect, an alternate definition of binary trees which can be empty produces almost exactly the same result. Try it for yourself!

data BinaryTree a = Leaf | Branch a ( BinaryTree a ) ( BinaryTree a )

Trouble

Examining the Nat data structure shows that there are issues we haven’t yet considered.

data Nat = Zero | Succ Nat

N a t = 1 + N a t \text{Nat} = 1 + \text{Nat} N a t = 1 + N a t

This equation is clearly inconsistent, since x = 1 + x x = 1 + x x=1+x is false for all possible values of x x x. Why does this go wrong? Is it valid to just expand to Nat = 1 + 1 + 1 + 1 + ... (which is right), even though we can’t isolate Nat algebraically (and thus solve for it with Taylor series)?

Poking holes in things

It’s almost time for calculus on data types, and I want to get there just as desperately as you, dear reader, but we’ll require some motivation first. And so, it’s time for a brief interlude to think about poking holes in things. Yes, holes.

What do I mean by a hole? We want to find the places in our data structure where data could go, and remove it.

How many holes of type a can we find in a tuple?

a 2 a^2 a ​ 2 ​ ​ has 2 (a, a) -> (_, a), (a, _) a 3 a^3 a ​ 3 ​ ​ has 3 (a, a, a) -> (_, a, a), (a, _, a), (a, a, _) a × b a \times b a × b has 1 (a, b) -> (_, b) (remember we’re looking for holes of type a ) a + a a+a a + a has 2 Either a a -> Left _, Right _ a + b a+b a + b has 1 Either a b -> Left _ b b b has 0 ( a + b ) × b (a + b) \times b ( a + b ) × b has 1 (Either a b, b) -> (Left _, b)

We can make a few observations:

Data types with no a s (constants) have no holes of type a .

s (constants) have no holes of type . The number of holes in a sum is the sum of the number of holes in each side.

The number of holes in a product is the number of positions in the product times the number of ways to make a hole in each of those positions.

And with that we’ve finally made it. Ready for the big reveal? The rules we just saw are mysteriously similar to the familiar rules of differentiation. Differentiation tells us how to poke holes in data types. Read ∂ ∂ a \frac{\partial}{\partial a} ​∂a​​∂​​ as “the of holes of type a ”:

There are no holes in a constant ∂ a c = 0 \partial_a c = 0 ∂ ​ a ​ ​ c = 0 Sums are straightforward ∂ ∂ a f ( a ) + g ( a ) = ∂ ∂ a f ( a ) + ∂ ∂ a g ( a ) \frac{\partial}{\partial a} f(a) + g(a) = \frac{\partial}{\partial a} f(a) + \frac{\partial}{\partial a} g(a) ​ ∂ a ​ ​ ∂ ​ ​ f ( a ) + g ( a ) = ​ ∂ a ​ ​ ∂ ​ ​ f ( a ) + ​ ∂ a ​ ​ ∂ ​ ​ g ( a ) As are products ∂ ∂ a f ( a ) × g ( a ) = ∂ ∂ a f ( a ) × g ( a ) + f ( a ) × ∂ ∂ a g ( a ) \frac{\partial}{\partial a} f(a) \times g(a) = \frac{\partial}{\partial a} f(a) \times g(a) + f(a) \times \frac{\partial}{\partial a} g(a) ​ ∂ a ​ ​ ∂ ​ ​ f ( a ) × g ( a ) = ​ ∂ a ​ ​ ∂ ​ ​ f ( a ) × g ( a ) + f ( a ) × ​ ∂ a ​ ​ ∂ ​ ​ g ( a ) The chain rule still applies for composition ∂ ∂ a ( f ∘ g ) = ( ∂ ∂ a f ∘ g ) × ∂ ∂ a g \frac{\partial}{\partial a} \left(f \circ g\right) = \left(\frac{\partial}{\partial a} f \circ g\right) \times \frac{\partial}{\partial a} g ​ ∂ a ​ ​ ∂ ​ ​ ( f ∘ g ) = ( ​ ∂ a ​ ​ ∂ ​ ​ f ∘ g ) × ​ ∂ a ​ ​ ∂ ​ ​ g

Let’s take a moment to marvel at the fact that differentiation, a tool developed by Newton and Leibniz over 300 years ago for physics, has made an unexpected appearance in the most discrete of settings. The correspondence was noticed in 2001 by Conor McBride.

The zipper

The zipper is a way to “focus” on one element of a data structure, in order to edit it efficiently. The name is meant to evoke “going up and down in the structure … analogous to closing and opening a zipper in a piece of clothing.”

I’ll demonstrate with a list.

: -> : -> : -> : -> [] | | | | 1 2 3 4

Let’s focus on the 3 .

[] <- : <- : <- FOCUS -> : -> [] | | | | 1 2 3 4

We can move left:

[] <- : <- FOCUS -> : -> : -> [] | | | | 1 2 3 4

Or right:

[] <- : <- : <- : <- FOCUS -> [] | | | | 1 2 3 4

With each operation taking constant time.

What did we just accomplish? Normally editing an arbitrary list element is O(n), but each zipper operation is O(1).

Relating this back to holes, we see that our zipper was constructed by poking a hole in the list.

Reminder what lists look like:

L = 1 + a L L = 1 + aL L = 1 + a L

What results from poking out the a s?

∂ L ∂ a = ∂ ∂ a ( 1 + a L ) = L + a ∂ L ∂ a \begin{align} \frac{\partial L}{\partial a} & = \frac{\partial}{\partial a} \left(1 + a L\right) \\ & = L + a \frac{\partial L}{\partial a} \end{align} ​ ​ ∂ a ​ ​ ∂ L ​ ​ ​ ​ ​ ​ = ​ ∂ a ​ ​ ∂ ​ ​ ( 1 + a L ) ​ = L + a ​ ∂ a ​ ​ ∂ L ​ ​ ​ ​

∂ L ∂ a = L 1 − a = L 2 \begin{align} \frac{\partial L}{\partial a} & = \frac{L}{1 - a} \\ & = L^2 \end{align} ​ ​ ∂ a ​ ​ ∂ L ​ ​ ​ ​ ​ ​ = ​ 1 − a ​ ​ L ​ ​ ​ = L ​ 2 ​ ​ ​ ​

Looking back at the diagrams above, we can see that L^2 is exactly the type of our focused structure, with one list pointing left and one pointing right.

What just happened? Differentiating a data structure gave us the type of its context, which can be used as a zipper. In fact, differentiation can be used to build zippers for arbitrary ADTs – trees, maps, etc.

Disappointing news

Can we extend the analogy to integration? Why does differentiation make a surprise appearance here? Unfortunately, I can’t answer either question.

I frankly have no clue how to interpret integration. As for why the rules for differentiation and hole-poking are the same, the best I can do is to point out that our goal is nearly the same in either case. In differentiation, the goal is to see how a function varies as one of its variables varies. In hole-poking, the goal is to see how a data structure changes as we examine one type variable. In either case, we need to see how our data changes as we vary the variable, in each place it occurs.

Summary

Algebraic data types are aptly named, for they bear a remarkably strong resemblance to the algebra we’re familiar with. This allows us to count the inhabitants of a type and manipulate data types between equivalent forms. Further, we can extend the analogy as far as calculus, where differentiation helps us build zippers for any data type.

Epilogue

I called our data type manipulations a “coincidence,” but the connection actually rests on solid mathematical ground. Our tricks can be justified by the fact that ADTs form a semiring.

We explored ADTs thoroughly, but only had time to scratch the surface of zippers. The rabbit hole is very deep, but I can send you straight to the bottom. Rather than taking the data structure view, Oleg Kiselyov advocates for zippers as the delimited continuation of a traversal, and has gone so far as creating a zipper filesystem, ZipperFS, and zippers with several holes. This is just one example of the large body of work on zippers.