Typesafe Union-type folding in Scala

Escaping Union type contexts safely

Goal and ideal syntax

In a previous blog post I introduced a way to get unboxed arbitrary arity union types in scala. We can write

foo allows an Int or a String to be passed to it, and nothing else.

But escaping the union context, ie getting the Int or the String out again from our generic T , wasn’t safe. We had to do this:

The above is not checked by the compiler, either for exhaustivity (if we’d missed Int it wouldn’t tell us) nor precision (if we included an extra type we didn’t need it wouldn’t tell us).

But it is certainly the easiest way to naively extract our original value again.

The goal of this article is to be able to " match ” a union type in a typesafe manner. We’ll start from an ideal syntax, something equivalent to a normal pattern match, and work outwards from there. I warn you there are some immoral steps along the way — this is more of an experiment in scala than something to be used in production.

Requirements:

compile-time exhaustivity matching

order of types should not matter

Extra credit:

precision matching (making sure that if we supply an unneeded case the compiler tells us)

Syntax

At the moment our syntax is

This is our ideal look — pretty minimal. We’re going to try and build something minimal taking this as inspiration, and which is also safe.

Our general union type is arbitrarily wide, so any construct we write to generically escape a union context must also be arbitrarily wide.

This means each case must be collapsed iteratively into an object we are constructing as we go. You can’t have a pattern match of arbitrary length after all. It’s going to look a bit like the builder pattern.

This also means that we need to call an additional run method on this object to actually execute the instruction — we need a separation between constructing our instruction and executing it. This is similar to how one builds and executes free monads, or the builder pattern.

So the minimal syntax we can hope for looks a little bit like this:

It looks like a match turned into a builder. .match to initialise, .at to add a case, and .run to signal to the compiler that we’re done building it and return our value.

This is now looking a lot like a fold.

VirtualPolyFunction

This thing we’re building with the .at method is like a polymorphic function, a function from many types to one type. Shapeless has them, and uses them to fold over their HList s and Coproduct s. So we’re probably on the right track.

Ours is different and slightly more difficult to deal with (not that shapeless Poly is easy) because we are unboxed — we have to handle a completely free

type parameter T , constrained only by convention based on what type classes are in scope. I’m going to call it VirtualPolyFunction since it will act on a ‘virtual’ Union . This is my own nomenclature, apologies if ‘virtual’ means something else to you.

At each .at step, we add a new type to the VirtualPolyFunction we’re building which we have to keep track of. This means the structure of the class must either:

be parameterised by a Union or HList

or be arbitrarily nested, like a free monad

I chose to parameterise it by a Union , since that’s what we’re dealing with. Free might give an avenue for avoiding some of the immorality inside our eventual class, but that’s something I have not yet investigated.

Further requirements for the structure of VirtualPolyFunction are:

it must at some point reference our union value which we’re folding over, so must be parameterised by some completely free generic type T .

value which we’re folding over, so must be parameterised by some completely free generic type . It must be parameterised by the return type.

It must also record within it the actual function we’re iteratively building up, from our free domain T to our return type.

to our return type. It must have two methods: One to add a new case to the arbitrary-type-arity virtual polymorphic function, and one to execute the function when given a value.

The minimal possible structure so far is thus:

A quick run-down:

U is the Union our VirtualPolyFunction ( VPF ) is understood to act on — it needn’t yet be connected to T in any way. This will come later

is the our ( ) is understood to act on — it needn’t yet be connected to in any way. This will come later T is the type of the value our VPF will act on, the union value we want to escape

is the type of the value our VPF will act on, the union value we want to escape Return is the eventual return type

is the eventual return type withCase adds a new case to the VPF — note the new Union type in withCase s return type — we’ve added our new type to the head.

adds a new case to the VPF — note the new type in s return type — we’ve added our new type to the head. apply is trivial, it just runs the function on a value

is trivial, it just runs the function on a value func is some function from a generic T to Return . This is the thing we’re trying to make safe (ie, our original naive pattern match).

withCase is the only thing we need to actually implement.

withCase

withCase is unfortunately not nice to implement. Our current way of escaping union types

relies on run-time matching. The compiler’s interpretation in this context of the type of union: T is plain old java Object . It takes each of these matches, sees if union: Object satisfies any of the cases and if not throws a MatchError .

Even though the compiler has lost information about T due to erasure, it still knows that not all Object s are Int s, so it encodes instructions to fall through the case match to the subsequent cases.

Unfortunately inside withCase[X] we will be matching against a generic type X , the erased type of which is also Object . Thus when we write

the compiler reads

To the compiler this is an exhaustive match and we can not add more cases to this. At run-time, if T is not X , we get a ClassCastException — erasure has removed the information we needed.

Getting around this is the immoral bit. The implementation of withCase is

In words,

- Take a value of type T

- See if it is of type X via unsafe runtime casting

- If it is (ie no exception), apply our new function that withCase has just supplied

- If it is not (ie ClassCastException thrown) apply our previous function

In this way our final function right at the end recursively tries to type-cast our t to each type in U , catching an exception each time and falling back to the next cast if needed.

Short of using reflection, I think this is the closest analogy to an arbitrary-length pattern match on a generic there is.

UnionFolder

VirtualPolyFunction isn’t quite enough. The first problem is that it takes a value, where in our ideal syntax we chain methods on a value. The second is that, though each step of withCase is in theory perfectly safe given at least one previous step is safe, there’s no guarantee yet that we have at least one safe step.

In other words, we could construct a value with type VirtualPolyFunction[Int :|: UNil, String, Any] , which

obviously could not work. We need String to be in the union type but it isn’t. So we need a wrapper layer to make it safe.

This wrapper layer has the same two method requirements as VPF . Therefore also has the same structural requirements as VPF , and must carry the union value around inside it to let us chain methods together nicely. The minimal structure is therefore:

U , T and Return are still here

, and are still here value: T is the union value we’re escaping

is the union value we’re escaping at just calls withCase

just calls run is where the safety gets injected; before we can execute this UnionFolder , we need evidence that T belongs to U

This last point is key. We can build up any VPF / UnionFolder we like, but we can’t actually run it on a T until we have evidence that the union type upon which the VPF acts actually contains T .

So the following would be unable to be run:

but the following would be fine:

The implicit that run takes puts back into our construct what the compiler took with erasure: knowledge that our pattern match is safe.

foldUnion

The very last thing to do is construct the initial syntax. We have a way of building up and executing this object on a value, but we don’t have a way yet of elegantly attaching it to a value. I chose the name foldUnion instead of match or fold to reduce confusion and name collision.

Because our union types are completely free, and to the compiler they are just type T , we must define this method on RichT :

Note the signature of foldUnion : X and Y are set by the function we give it, but U is free, the compiler chooses what U is based on what union evidences we find in scope. The one requirement we have on this method is

tUnion : that the value we’re calling this on must belong to some union (this condition is trivially satisfied by any type, eg Int and Int :|: UNil ); but if the compiler finds one passed by a parent method, it prefers that thanks to implicit resolution order

Thus tUnion is trivial, if the compiler finds no union evidence it will construct a satisfactory one itself.

Final syntax

That’s it! The final product looks like this:

Looks pretty good to me.

Note:

If we missed a case (ie commented out the Char line) it would not compile

line) it would not compile The types we chain together need not be in the same order as the union type in the method

We can add in unnecessary cases (ie we could add a Boolean case in the above example) — we could make this throw a compilation error but it would take longer to compile.

We would do this by allowing a free union type parameterisation in our UnionFolder.run method, and insisting that whichever one the compiler picks from our T ’s implicit OneOf is the same as VPF’s U parameter — this approach would satisfy the precision matching ‘extra credit’ mentioned above.

As I mentioned above, this was more of a thought experiment to see how far I could push my union type idea. The exception catching in the middle of VPF obviously has overheads you may wish to consider before using this approach.

Quirks

First: These are total functions not partial functions. If we supplied a case match with a guard it would fail instead of gracefully falling to the subsequent case if the guard failed.

Second: It is available on any type at all thanks to RichTea . The following compiles.

It produces "7" . Technically, Int is isomorphic to Int :|: UNil so you could argue it’s doing the right thing by allowing this, if you wanted to. But the following would not compile: