Arrow = Category + Applicative? (Part IIa)

I’m labeling this part IIa because I haven’t finished everything I wanted to cover in this piece, but there are enough interesting ideas that it’s worth posting something, and enough time has passed that people are asking about the next part!

Summary So Far

In the previous installment of this series, we established a one-way relationship between Applicative and Arrow, while assuming a common Category instance. In particular, we assumed a data type

data a :~> b = ...

and instances

instance Category (:~>) where
  id  = ...
  (.) = ...

instance Arrow (:~>) where
  arr   = ...
  first = ...

and satisfying the axioms

[C1] f = f . id = id . f
[C2] f . (g . h) = (f . g) . h

[A1] arr id = id
[A2] arr (f . g) = arr f . arr g
[A3] first (f . g) = first f . first g
[A4] first (arr f) = arr (f `cross` id)
[A5] first f . arr (id `cross` g) = arr (id `cross` g) . first f
[A6] arr fst . first f = f . arr fst
[A7] arr assoc . first (first f) = first f . arr assoc

And we showed that we can always write the following Applicative instance, universally quantified over any domain for the Category:

instance Applicative ((:~>) a) where
  pure x  = arr (const x)
  f <*> x = arr (uncurry (flip ($))) . first x . arr swap . first f . arr dup
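A minimal Haskell sketch of this instance, assuming only the standard Control.Category and Control.Arrow modules. The wrapper FromArrow and the helper dup are hypothetical names, not part of the post (base's Control.Applicative also ships a WrappedArrow newtype for the same purpose). Here swap comes from Data.Tuple, and `.` is the Category composition.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), Kleisli (..))
import Data.Tuple (swap)

-- Hypothetical wrapper: an arrow whose input type is fixed at `a`.
newtype FromArrow ar a b = FromArrow { runFromArrow :: ar a b }

-- Duplicate the input so that both operands of <*> can see it.
dup :: a -> (a, a)
dup x = (x, x)

instance Arrow ar => Functor (FromArrow ar a) where
  -- fmap post-composes a pure function lifted with arr.
  fmap g (FromArrow x) = FromArrow (arr g . x)

instance Arrow ar => Applicative (FromArrow ar a) where
  pure x = FromArrow (arr (const x))
  -- Run f on one copy of the input, then x on the other, then apply.
  FromArrow f <*> FromArrow x =
    FromArrow (arr (uncurry (flip ($))) . first x . arr swap . first f . arr dup)

-- Example with the Kleisli list arrow, to make the order of effects visible.
pairs :: [(Int, Int)]
pairs = runKleisli (runFromArrow paired) 1
  where
    paired = pure (,) <*> FromArrow (Kleisli (\n -> [n, n + 1]))
                      <*> FromArrow (Kleisli (\n -> [n * 10]))
```

In GHCi, `pairs` evaluates to `[(1,10),(2,10)]`: as in the definition above, the effects of the left operand of `<*>` run before those of the right one.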

And, furthermore, that it will automatically satisfy all four of the axioms for Applicative functors.

[F1] v = pure id <*> v
[F2] u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
[F3] pure f <*> pure x = pure (f x)
[F4] u <*> pure y = pure ($ y) <*> u

This demonstrates that Arrow is at least as specific as the combination of Category and a universally quantified Applicative instance: if you have the first, then you automatically have the second as well. We now want to explore the opposite direction…

The Inverse Correspondence

We will need a specific Arrow instance now to work with, defined in terms of an arbitrary Applicative instance. Here’s the one we’ll be using:

instance Arrow (:~>) where
  arr f   = pure f <*> id
  first f = pure (,) <*> f . arr fst <*> arr snd
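A minimal sketch of these two definitions as ordinary Haskell functions, assuming the standard Control.Category class. The names arrA and firstA are hypothetical, chosen to avoid clashing with Control.Arrow. Instantiated at plain functions, where the Category and Applicative instances are the familiar ones, arrA f behaves like f and firstA f applies f to the first component.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- arr: fmap the function over the identity morphism.
arrA :: (Category k, Applicative (k b)) => (b -> c) -> k b c
arrA f = pure f <*> id

-- first: project out the left component, run f on it, and re-pair it with
-- the untouched right component. (.) binds tighter than (<*>), so this is
-- pure (,) <*> (f . arrA fst) <*> arrA snd.
firstA :: (Category k, Applicative (k (b, d))) => k b c -> k (b, d) (c, d)
firstA f = pure (,) <*> f . arrA fst <*> arrA snd
```

For example, `firstA (* 2) (3, "x")` gives `(6, "x")`. The constraint on firstA mentions the pair type because arrA fst and arrA snd live in the Applicative for inputs of type `(b, d)`.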

I’ll again try to convince you that these definitions make logical sense. Keep in mind the following,

pure f <*> x = f <$> x = fmap f x

So the definition of arr is very general: it doesn’t even really need Applicative — just Functor. That ought to be comforting, since in a sense all that Functor says is that we can naturally translate from pure Haskell functions to a different context. It’s not surprising that there would be a strong connection between the lifting of functions given by fmap, and that given by Category and Applicative. Indeed, arr is that connection. The Category “id” value on the right of the fmap serves to tie the input from the Category with the input to the function f, giving precisely what we want: a lifting of f into the Category.

Another way to understand arr’s definition is to recall that in the ((->) t) functor, whose objects are just Haskell functions from a fixed domain type, fmap is precisely function composition. The functor we have here looks very close to that one, and indeed, fmap has a similar meaning. The difference is that in our applicative functor, fmap composes pure Haskell functions not with other functions, but with morphisms in our Category, giving a morphism as the result. Lifting a function is done by composing it with the identity morphism.

The definition of first is basically self-evident if you’re familiar with Applicative. If you’re confused, note that (.) binds more tightly than <*>… we’ll be using that a lot. So this just unpacks a tuple, applies the given Category element on the left, and packs it back up using the tuple constructor.

Failure

Our stated goal was that by defining the correspondence above, things will just work out and we’ll have established that all of the axioms for Arrow hold. This should be easy to check! The Applicative laws give us a really nice tool to reason about equivalence of two expressions. You may notice that I’ve stated them slightly differently here than in the previous post, by swapping the sides of a few of them. You should read the four axioms [F1] through [F4] as having a natural left-to-right direction, and by applying each of them as appropriate, we can take any Applicative expression and obtain a normal form:

pure f <*> x_1 <*> x_2 <*> ... <*> x_n

Here, none of x_1 through x_n contains any Applicative building blocks (pure or <*>, or anything built from them, such as fmap). Category also has such a normal form: any expression in a category can be written either as a single identity or as a sequence of composed morphisms, with no other identities or compositions occurring in the pieces:

f_1 . f_2 . ... . f_n

What about mixtures of the two? We have no laws at all that allow us to manipulate expressions across the two types, so we’re stuck with the pieces as they are, simplifying within each piece. So to verify the axioms, we simply substitute the definitions of arr and first from the previous section, reduce the Applicative and Category pieces to their respective normal forms, and compare. Easy, right?

It’s left as an exercise for the reader to try this, but the result is that only the trivial axiom [A1] can be verified in this way. The remaining Arrow laws do not follow from anything we’ve seen so far. It’s not just that we aren’t clever enough to do it: because both Applicative and Category have normal forms, and no mixing of the two is possible, none of the other Arrow laws can be shown to hold in general from those laws alone! (This is in spite of the fact that numerous sources, such as here, explicitly claim otherwise!)
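One way to see concretely that more is needed is a small counterexample (a hypothetical construction, not taken from the post). The type W below pairs a function with an Int. Its Category instance adds the Ints and its Applicative instance multiplies them; each instance is lawful on its own, being the usual structure on functions paired with a lawful monoid on Int. With arrW and firstW defined exactly as in the Arrow instance above, [A1] holds but [A6] does not.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Hypothetical counterexample: a function paired with an Int "effect".
data W a b = W (a -> b) Int

-- Run the function and report the accumulated effect.
runW :: W a b -> a -> (b, Int)
runW (W f n) x = (f x, n)

-- Category: compose the functions and add the effects (monoid (Int, +, 0)).
instance Category W where
  id = W (\x -> x) 0
  W f m . W g n = W (\x -> f (g x)) (m + n)

instance Functor (W a) where
  fmap h (W f n) = W (\x -> h (f x)) n

-- Applicative: reader-style application, but the effects are multiplied
-- (monoid (Int, *, 1)).
instance Applicative (W a) where
  pure x = W (const x) 1
  W f m <*> W g n = W (\x -> f x (g x)) (m * n)

-- arr and first, defined exactly as in the post's Arrow instance.
arrW :: (b -> c) -> W b c
arrW f = pure f <*> id

firstW :: W b c -> W (b, d) (c, d)
firstW f = pure (,) <*> f . arrW fst <*> arrW snd

-- A sample morphism carrying effect 5.
f5 :: W Int Int
f5 = W (+ 1) 5
```

Here `runW (arrW fst . firstW f5) (10, 'c')` gives `(11, 0)` while `runW (f5 . arrW fst) (10, 'c')` gives `(11, 5)`: the two sides of [A6] compute the same value but disagree on the effect, so [A6] cannot follow from the Category and Applicative laws alone.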

Dodging a Dead End Sign: New Laws!

In a strict sense, this is the end of the line: our original goal has conclusively failed. That said, it’s not so surprising that it failed, and we can still learn a lot by redefining our goal. We now seek to determine precisely what the difference is.

That is, can we state a (hopefully minimal) set of additional laws we can apply to the Applicative/Category combination, such that we can get all of the Arrow axioms? We’re looking for things that obviously ought to hold for all types that are simultaneously instances of Category and Applicative. Of course, we’ll also have to go back and show that they follow from the Arrow axioms as well, or we risk losing the bidirectional correspondence. (It’s worth noting that Ross Paterson speculated about such axioms, and that while my axioms look a bit different, if you dig a bit they turn out to be very similar.)

Here is the set of axioms I’ll propose:

[G1] u = pure const <*> u <*> id
[G2] u <*> id <*> id = pure join <*> u <*> id
[G3] arr f . u = pure f <*> u
[G4] (u <*> v) . arr f = u . arr f <*> v . arr f

A couple quick notes. First of all, you might notice that the Arrow term, arr, appears in a couple of them. That’s okay, though: we don’t mean arr in the Arrow sense. We mean it in the sense of the definition earlier, as an fmap applied to the identity from the Category. It turns out that arr plays a unique role in the structure we get from combining Category and Applicative. It’s a sort of intermediate level of purity between “pure” elements and arbitrary elements. The elements in the image of arr are pure in the sense of acting like plain functions in the Category, but they depend on the domain type from the Category, making them not quite “pure” in the Applicative sense. It’s useful to make statements about elements with that half-purity property.

Second, you might notice a use of “join” in [G2], which is of course a Monad term! Never fear, though, we mean join specialized to the ((->) r) monad, which is just the function:

join f x = f x x

So no worries, we haven’t let monads creep in quite yet! The connection here is very strong though, and is strengthened by noting that “const” is similarly just “return” specialized to the ((->) r) monad, and together, return and join completely determine the monad structure on a type! So in a sense, laws [G1] and [G2] are relating the structure we’ve got to monad terms… but the monad is the standard function monad instead of one defined on this type.
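The specialized join and return are small enough to check directly. A sketch using Control.Monad's join on functions; joinFn and returnFn are hypothetical names for the written-out versions.

```haskell
import Control.Monad (join)

-- join for the function monad ((->) r), written out: pass the argument twice.
joinFn :: (r -> r -> a) -> r -> a
joinFn f x = f x x

-- return for ((->) r) ignores its argument, exactly like const.
returnFn :: a -> r -> a
returnFn x _ = x
```

For instance, `join (,) 'x'` is `('x', 'x')`, and `join (+) 5` is `10`.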

Having gained a little insight into [G1] and [G2], we now look at the laws [G3] and [G4]. These laws address the question of what happens when we compose an arr value, either on the left or right. Essentially, arr composed on the left can be rewritten in terms of application, while on the right it distributes.
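To see the four laws in an instance with real effects, here is a sketch using a hypothetical writer-style type Logged (not from the post), whose morphisms return a result together with a list of log lines. arr is defined as in the post, and each law is compared pointwise, on results and logs, for a handful of inputs; this is a test on samples, not a proof.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Monad (join)

-- Hypothetical example type: a function that also appends to a log.
newtype Logged a b = Logged { runLogged :: a -> ([String], b) }

instance Category Logged where
  id = Logged (\a -> ([], a))
  -- The right-hand morphism runs first, so its log entries come first.
  Logged g . Logged f =
    Logged (\a -> let { (w1, b) = f a; (w2, c) = g b } in (w1 ++ w2, c))

instance Functor (Logged a) where
  fmap h (Logged f) = Logged (\a -> let { (w, b) = f a } in (w, h b))

instance Applicative (Logged a) where
  pure x = Logged (\_ -> ([], x))
  -- Both operands see the same input; the left operand's log comes first.
  Logged f <*> Logged x =
    Logged (\a -> let { (w1, h) = f a; (w2, y) = x a } in (w1 ++ w2, h y))

-- arr as in the post: fmap over the identity morphism.
arr :: (b -> c) -> Logged b c
arr f = pure f <*> id

-- Hypothetical sample morphisms, each logging its input.
u1 :: Logged Int Int
u1 = Logged (\a -> (["u1 " ++ show a], a * 3))

u2 :: Logged Int (Int -> Int -> Int)
u2 = Logged (\a -> (["u2 " ++ show a], \x y -> a + 10 * x + 100 * y))

uf :: Logged Int (Int -> Int)
uf = Logged (\a -> (["uf " ++ show a], (+ a)))

-- Each law as (name, left-hand side, right-hand side).
gLaws :: [(String, Logged Int Int, Logged Int Int)]
gLaws =
  [ ("G1", u1, pure const <*> u1 <*> id)
  , ("G2", u2 <*> id <*> id, pure join <*> u2 <*> id)
  , ("G3", arr negate . u1, pure negate <*> u1)
  , ("G4", (uf <*> u1) . arr square, uf . arr square <*> u1 . arr square)
  ]
  where
    square x = x * x

-- Both sides must agree on the result and on the log for every sample input.
holds :: (String, Logged Int Int, Logged Int Int) -> Bool
holds (_, l, r) = all (\a -> runLogged l a == runLogged r a) [0 .. 5]
```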

Let’s Prove Some Arrow Laws

One of the Arrow laws is so obvious, there’s no point in putting off its proof any longer; it turns out to be exactly equivalent to the first Applicative axiom, and is the one Arrow law that didn’t require our new assumptions:

arr id = pure id <*> id = id

Now, before we set out to methodically prove that Arrow axioms hold, it pays to look around and take stock of the situation with the new laws one at a time.

We can start out by generalizing [G1] and [G2] a bit. As stated, these laws require identities on the right-hand side, but it turns out they actually only need semi-pure (that is, arr) values. To see this, we’ll first prove a little lemma, which actually only requires the Applicative laws:

u <*> arr f
  = u <*> (pure f <*> id)
  = pure (.) <*> u <*> pure f <*> id
  = pure ($ f) <*> (pure (.) <*> u) <*> id
  = pure (.) <*> pure ($ f) <*> pure (.) <*> u <*> id
  = pure (($ f) . (.)) <*> u <*> id
  = pure (. f) <*> u <*> id

With that in our toolkit, we generalize [G1] and [G2] in the obvious way, just by moving arr values from the right to the left, and then moving the parentheses left. It’s a tad cumbersome, but pretty obvious when you see what’s going on.

pure const <*> u <*> arr f
  = pure (. f) <*> (pure const <*> u) <*> id
  = pure (.) <*> pure (. f) <*> pure const <*> u <*> id
  = pure ((. f) . const) <*> u <*> id
  = pure const <*> u <*> id
  = u

u <*> arr f <*> arr f
  = pure (. f) <*> (u <*> arr f) <*> id
  = pure (. f) <*> (pure (. f) <*> u <*> id) <*> id
  = pure (.) <*> pure (. f) <*> (pure (. f) <*> u) <*> id <*> id
  = pure ((.) (. f)) <*> (pure (. f) <*> u) <*> id <*> id
  = pure (.) <*> pure ((.) (. f)) <*> pure (. f) <*> u <*> id <*> id
  = pure ((.) (. f) . (. f)) <*> u <*> id <*> id
  = pure join <*> (pure ((.) (. f) . (. f)) <*> u) <*> id
  = pure (.) <*> pure join <*> pure ((.) (. f) . (. f)) <*> u <*> id
  = pure (join . (.) (. f) . (. f)) <*> u <*> id
  = pure ((. f) . join) <*> u <*> id
  = pure (.) <*> pure (. f) <*> pure join <*> u <*> id
  = pure (. f) <*> (pure join <*> u) <*> id
  = pure join <*> u <*> arr f

These identities can be very useful: they avoid the need to explicitly “stash away” semi-pure values on the right when applying the first two laws. Note that the values on the right obviously need to be semi-pure, since otherwise we could be removing or duplicating effects (where “effects” here has the appropriate meaning for the particular instance you’re looking at).
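A sketch of why the right-hand values must be semi-pure, assuming a hypothetical writer-style type Logged whose morphisms also emit log lines; noisy and u are illustrative morphisms, not from the post. With arr values, both sides of the generalized [G2] agree; with an effectful morphism in the same position, the results still agree but the left side writes the log entry twice.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Monad (join)

-- Hypothetical example type: a function that also appends to a log.
newtype Logged a b = Logged { runLogged :: a -> ([String], b) }

instance Category Logged where
  id = Logged (\a -> ([], a))
  Logged g . Logged f =
    Logged (\a -> let { (w1, b) = f a; (w2, c) = g b } in (w1 ++ w2, c))

instance Functor (Logged a) where
  fmap h (Logged f) = Logged (\a -> let { (w, b) = f a } in (w, h b))

instance Applicative (Logged a) where
  pure x = Logged (\_ -> ([], x))
  Logged f <*> Logged x =
    Logged (\a -> let { (w1, h) = f a; (w2, y) = x a } in (w1 ++ w2, h y))

-- arr as in the post: semi-pure, so it never writes to the log.
arr :: (b -> c) -> Logged b c
arr f = pure f <*> id

-- Hypothetical effectful morphism: logs "noisy" every time it runs.
noisy :: Logged Int Int
noisy = Logged (\a -> (["noisy"], a + 1))

-- Hypothetical curried two-argument morphism with its own log entry.
u :: Logged Int (Int -> Int -> Int)
u = Logged (\a -> (["u"], \x y -> a + x * y))

-- Generalized [G2] with a semi-pure argument: both sides agree.
withArr :: (Logged Int Int, Logged Int Int)
withArr = (u <*> arr (+ 1) <*> arr (+ 1), pure join <*> u <*> arr (+ 1))

-- The same shape with an effectful argument: the left side runs noisy twice.
withNoisy :: (Logged Int Int, Logged Int Int)
withNoisy = (u <*> noisy <*> noisy, pure join <*> u <*> noisy)
```

At input 2, both sides of withArr produce `(["u"], 11)`. For withNoisy both values are still 11, but the left log is `["u","noisy","noisy"]` and the right one is `["u","noisy"]`.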

Another convenient thing to note is that anywhere we use “arr” to indicate a semi-pure value, of course a pure value works too! The following mini-identity makes this explicit:

pure x = pure const <*> pure x <*> id = pure (const x) <*> id = arr (const x)

We can now turn to the [G3] law, and see what insights it has to offer. The first is actually one of the Arrow laws, [A2], but it’s also quite useful in reasoning about other things as well:

arr (f . g)
  = pure (f . g) <*> id
  = pure (.) <*> pure f <*> pure g <*> id
  = pure f <*> (pure g <*> id)
  = pure f <*> arr g
  = arr f . arr g

As an immediate consequence, we can simplify compositions involving pure and semi-pure values.

pure x . arr f = arr (const x) . arr f = arr (const x . f) = arr (const x) = pure x

A more broadly applicable set of identities lets us work with applications between pure values and compositions (of any values at all):

pure f <*> u . v = arr f . (u . v) = (arr f . u) . v = (pure f <*> u) . v

u . v <*> pure f = pure ($ f) <*> u . v = (pure ($ f) <*> u) . v = (u <*> pure f) . v

So applying a pure value to a composition, or a composition to a pure value, is the same as doing the application only to the first of the two values that are composed.

We also have the tools now to easily establish another of the Arrow laws. Since we’ve done it so many times by now, I’ll start applying the composition law from Applicative and combining the resulting pure expressions on the left in one step.

arr fst . first f
  = pure fst <*> first f
  = pure fst <*> (pure (,) <*> f . arr fst <*> arr snd)
  = pure ((.) fst) <*> (pure (,) <*> f . arr fst) <*> arr snd
  = pure (((.) fst) . (,)) <*> f . arr fst <*> arr snd
  = pure const <*> f . arr fst <*> arr snd
  = f . arr fst

For the next step, it helps to consider what happens when you apply (with <*>) one semi-pure value to another. The answer is reassuringly logical: Applicative’s <*> combinator can be specialized to functions, and the result goes by several names… Monad’s “ap”, for example, and combinatory logic’s S combinator. Let’s use “ap” to describe the function we’re looking for:

(f `ap` g) x = f x (g x)

Then we have

arr f <*> arr g
  = arr f <*> (pure g <*> id)
  = pure (.) <*> arr f <*> pure g <*> id
  = arr (.) . arr f <*> pure g <*> id
  = pure ($ g) <*> arr (.) . arr f <*> id
  = arr ($ g) . arr (.) . arr f <*> id
  = arr (($ g) . (.) . f) <*> id
  = pure (($ g) . (.) . f) <*> id <*> id
  = pure join <*> pure (($ g) . (.) . f) <*> id
  = pure (f `ap` g) <*> id
  = arr (f `ap` g)
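A quick check of the S-combinator reading of `<*>` on functions, assuming Control.Monad's ap specialized to the function monad; s and viaApplicative are hypothetical names for the written-out versions.

```haskell
import Control.Monad (ap)

-- The S combinator: both f and g receive the same argument x.
s :: (r -> a -> b) -> (r -> a) -> r -> b
s f g x = f x (g x)

-- The same combinator via the Applicative instance for functions.
viaApplicative :: (r -> a -> b) -> (r -> a) -> r -> b
viaApplicative f g = f <*> g
```

For example, `ap (+) (* 2) 5` evaluates to `5 + (5 * 2)`, that is `15`, and so do `s (+) (* 2) 5` and `viaApplicative (+) (* 2) 5`.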

That turns out to be the hard work in establishing our fourth Arrow law, [A4].

first (arr f)
  = pure (,) <*> arr f . arr fst <*> arr snd
  = arr (,) . arr f . arr fst <*> arr snd
  = arr ((,) . f . fst) <*> arr snd
  = arr (((,) . f . fst) `ap` snd)
  = arr (f `cross` id)
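The post uses cross and assoc without defining them. A minimal sketch of plausible definitions, under the assumption that cross is the componentwise product of two functions (which coincides with Control.Arrow's `***` on plain functions), together with a pointwise check of the last step of the [A4] derivation; viaAp and viaCross are hypothetical names.

```haskell
import Control.Arrow ((***))
import Control.Monad (ap)

-- Assumed definition: apply f to the first component and g to the second.
cross :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
cross f g (a, b) = (f a, g b)

-- Assumed definition: reassociate a nested pair to the right.
assoc :: ((a, b), c) -> (a, (b, c))
assoc ((a, b), c) = (a, (b, c))

-- Both sides of the final step of the [A4] derivation, as plain functions.
viaAp :: (a -> c) -> (a, b) -> (c, b)
viaAp f = ((,) . f . fst) `ap` snd

viaCross :: (a -> c) -> (a, b) -> (c, b)
viaCross f = f `cross` id
```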

Not bad! Four Arrow laws down, three to go, and we haven’t even used our shiny new [G4] law yet. We can fix that, though, and prove a fifth Arrow law in the process:

first f . arr (id `cross` g)
  = (pure (,) <*> f . arr fst <*> arr snd) . arr (id `cross` g)
  = pure (,) . arr (id `cross` g) <*> f . arr fst . arr (id `cross` g) <*> arr snd . arr (id `cross` g)
  = pure (,) <*> f . arr (fst . (id `cross` g)) <*> arr (snd . (id `cross` g))
  = pure (,) <*> f . arr fst <*> arr (g . snd)
  = pure (,) <*> f . arr fst <*> arr g . arr snd
  = pure (,) <*> f . arr fst <*> (pure g <*> arr snd)
  = pure (.) <*> (pure (,) <*> f . arr fst) <*> pure g <*> arr snd
  = pure ((.) . (,)) <*> f . arr fst <*> pure g <*> arr snd
  = pure ($ g) <*> (pure ((.) . (,)) <*> f . arr fst) <*> arr snd
  = pure (($ g) . (.) . (,)) <*> f . arr fst <*> arr snd
  = pure ((.) (id `cross` g) . (,)) <*> f . arr fst <*> arr snd
  = pure ((.) (id `cross` g)) <*> (pure (,) <*> f . arr fst) <*> arr snd
  = pure (id `cross` g) <*> (pure (,) <*> f . arr fst <*> arr snd)
  = pure (id `cross` g) <*> first f
  = arr (id `cross` g) . first f

Of course, this series just wouldn’t be itself if we didn’t have a long ugly proof in there somewhere, so here it is: the proof of [A7].

arr assoc . first (first f)
  = pure assoc <*> first (first f)
  = pure assoc <*> (pure (,) <*> first f . arr fst <*> arr snd)
  = pure ((.) assoc) <*> (pure (,) <*> first f . arr fst) <*> arr snd
  = pure ((.) assoc . (,)) <*> first f . arr fst <*> arr snd
  = pure (. snd) <*> (pure ((.) assoc . (,)) <*> first f . arr fst) <*> id
  = pure ((. snd) . (.) assoc . (,)) <*> first f . arr fst <*> id
  = pure ((. snd) . (.) assoc . (,)) <*> (pure (,) <*> f . arr fst <*> arr snd) . arr fst <*> id
  = pure ((. snd) . (.) assoc . (,)) <*> (pure (,) . arr fst <*> f . arr fst . arr fst <*> arr snd . arr fst) <*> id
  = pure ((. snd) . (.) assoc . (,)) <*> (pure (,) . arr fst <*> arr fst . arr fst . first (first f) <*> arr snd . arr fst) <*> id
  = pure ((. snd) . (.) assoc . (,)) <*> (pure (,) <*> arr (fst . fst) . first (first f) <*> arr (snd . fst)) <*> id
  = pure ((.) ((. snd) . (.) assoc . (,)) . (,)) <*> arr (fst . fst) . first (first f) <*> arr (snd . fst) <*> id
  = pure (. (snd . fst)) <*> (pure ((.) ((. snd) . (.) assoc . (,)) . (,)) <*> arr (fst . fst) . first (first f)) <*> id <*> id
  = pure (.) <*> pure (. (snd . fst)) <*> pure ((.) ((. snd) . (.) assoc . (,)) . (,)) <*> arr (fst . fst) . first (first f) <*> id <*> id
  = pure ((. (snd . fst)) . (.) ((. snd) . (.) assoc . (,)) . (,)) <*> arr (fst . fst) . first (first f) <*> id <*> id
  = pure (\x y z -> (x, (snd (fst y), snd z))) <*> arr (fst . fst) . first (first f) <*> id <*> id
  = pure join <*> (pure (\x y z -> (x, (snd (fst y), snd z))) <*> arr (fst . fst) . first (first f)) <*> id
  = pure (\((a,b),c) ((d,e),g) -> (a, (e, g))) <*> first (first f) <*> id
  = pure (. (snd `cross` id)) <*> (pure (,) <*> (pure (fst . fst) <*> first (first f))) <*> id
  = pure (. (snd `cross` id)) <*> (pure (,) <*> arr (fst . fst) . first (first f)) <*> id
  = pure (. (snd `cross` id)) <*> (pure (,) <*> f . arr (fst . fst)) <*> id
  = pure (,) <*> f . arr (fst . fst) <*> arr (snd `cross` id)
  = pure (,) . arr assoc <*> f . arr fst . arr assoc <*> arr snd . arr assoc
  = (pure (,) <*> f . arr fst <*> arr snd) . arr assoc
  = first f . arr assoc

Not pretty, by any stretch of the imagination, but it’s done.
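As a sanity check on these derivations (not a replacement for them), here is a sketch that builds arr and first from the Applicative definitions for a hypothetical writer-style type Logged, which satisfies the Category and Applicative laws, and compares both sides of [A2], [A4], [A5], [A6] and [A7] on a few inputs, including the log each side writes. The helpers cross and assoc are assumed definitions, and step is an illustrative effectful morphism.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Hypothetical example type: a function that also appends to a log.
newtype Logged a b = Logged { runLogged :: a -> ([String], b) }

instance Category Logged where
  id = Logged (\a -> ([], a))
  Logged g . Logged f =
    Logged (\a -> let { (w1, b) = f a; (w2, c) = g b } in (w1 ++ w2, c))

instance Functor (Logged a) where
  fmap h (Logged f) = Logged (\a -> let { (w, b) = f a } in (w, h b))

instance Applicative (Logged a) where
  pure x = Logged (\_ -> ([], x))
  Logged f <*> Logged x =
    Logged (\a -> let { (w1, h) = f a; (w2, y) = x a } in (w1 ++ w2, h y))

-- arr and first, built from Category + Applicative as in the post.
arr :: (b -> c) -> Logged b c
arr f = pure f <*> id

first :: Logged b c -> Logged (b, d) (c, d)
first f = pure (,) <*> f . arr fst <*> arr snd

-- Assumed definitions of the post's helpers.
cross :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
cross f g (a, b) = (f a, g b)

assoc :: ((a, b), c) -> (a, (b, c))
assoc ((a, b), c) = (a, (b, c))

-- A hypothetical effectful morphism: logs its input, then triples it.
step :: Logged Int Int
step = Logged (\a -> (["step " ++ show a], a * 3))

-- Do both sides give the same result and the same log on every input?
agree :: Eq c => [a] -> Logged a c -> Logged a c -> Bool
agree xs l r = all (\x -> runLogged l x == runLogged r x) xs

lawChecks :: [(String, Bool)]
lawChecks =
  [ ("A2", agree ints (arr ((+ 1) . (* 2))) (arr (+ 1) . arr (* 2)))
  , ("A4", agree pairs (first (arr (+ 1))) (arr ((+ 1) `cross` id)))
  , ("A5", agree pairs (first step . arr (id `cross` show))
                       (arr (id `cross` show) . first step))
  , ("A6", agree pairs (arr fst . first step) (step . arr fst))
  , ("A7", agree nested (arr assoc . first (first step)) (first step . arr assoc))
  ]
  where
    ints = [0 .. 5] :: [Int]
    pairs = [(1, 2), (4, 5)] :: [(Int, Int)]
    nested = [((1, 'a'), True), ((2, 'b'), False)] :: [((Int, Char), Bool)]
```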

To be continued…

Quite a bit remains here: we still have one Arrow law, [A3], remaining; we need to show that our four new Category+Applicative laws follow from the Arrow axioms in the other direction; and we need to show that the maps we’ve defined between Applicative and Arrow are inverse to each other. With luck, this and more will come in the next exciting installment.