In this blog post we consider the problem of defining Binary instances for GADTs such as

data Val :: * -> * where VI :: Int -> Val Int VD :: Double -> Val Double

If you want to play along the full source code for the examples in this blog post can be found on github.

Failed attempt

The “obvious” way in which you might attempt to serialize and deserialize Val could look something like

instance Binary ( Val a) where a) VI i) = putWord8 0 >> put i put (i)putWord8put i VD d) = putWord8 1 >> put d put (d)putWord8put d = do get <- getWord8 taggetWord8 case tag of tag 0 -> VI <$> get -- Couldn't match type ‘a’ with ‘Int’ get 1 -> VD <$> get -- Couldn't match type ‘a’ with ‘Double’ get _ -> error "invalid tag"

However, this does not work. The definition of put is type correct (but dubious), but the definition of get is not type correct. And actually this makes sense: we are claiming that we can define Binary (Val a) for any a ; but if the tag is 0, then that a can only be Int , and if the tag is 1, then that a can only be Double .

One option is to instead give a Binary (Some Val) instance with Some defined as

data Some :: ( * -> * ) -> * where Exists :: forall f x . f x -> Some f f xf x

That is often independently useful, but is a different goal: in such a case we are discovering type information when we deserialize. That’s not what we’re trying to achieve in this blog post; we want to write a Binary instance that can be used when we know from the context what the type must be.

Working, but inconvenient

The next thing we might try is to introduce Binary instances for the specific instantiations of that a type variable:

instance Binary ( Val Int ) where VI i) = put i put (i)put i = VI <$> get getget instance Binary ( Val Double ) where VD d) = put d put (d)put d = VD <$> get getget

Note that there is no need to worry about any tags in the encoded bytestring; we always know the type. Although this works, it’s not very convenient; for example, we cannot define

encodeVal :: Val a -> ByteString = encode encodeValencode

because we don’t have a polymorphic instance Binary (Val a) . Instead we’d have to define

encodeVal :: Binary ( Val a) => Val a -> ByteString a) = encode encodeValencode

but that’s annoying: we know that that a can only be Int or Double , and we have Binary instances for both of those cases. Can’t we do better?

Introducing RTTI

Although we know that a can only be Int or Double , we cannot take advantage of this information in the code. Haskell types are erased at compile time, and hence we cannot do any kind of pattern matching on them. The key to solving this problem then is to introduce some explicit runtime type information (RTTI).

We start by introducing a data family associating with each indexed datatype a corresponding datatype with RTTI:

data family RTTI ( f :: k -> * ) :: (k -> * ) (k

For the example Val this runtime type information tells us whether we’re dealing with Int or Double :

data instance RTTI Val a where RttiValInt :: RTTI Val Int RttiValDouble :: RTTI Val Double

For serialization we don’t need to make use of this:

putVal :: Val a -> Put VI i) = put i putVal (i)put i VD d) = put d putVal (d)put d

but for deserialization we can now pattern match on the RTTI to figure out what kind of value we’re expecting:

getVal :: RTTI Val a -> Get ( Val a) a) RttiValInt = VI <$> get getValget RttiValDouble = VD <$> get getValget

We’re now almost done: the last thing we need to express is that if we know at the type level that we have some RTTI available, then we can serialize. For this purpose we introduce a type class that returns the RTTI:

class HasRTTI f a where f a rtti :: RTTI f a f a

which we can use as follows:

instance HasRTTI Val a => Binary ( Val a) where a) = putVal putputVal = getVal rtti getgetVal rtti

This states precisely what we described in words above: as long as we have some RTTI available, we can serialize and deserialize any kind of Val value.

The last piece of the puzzle is to define some instances for HasRTTI ; right now, if we try to do encode (VI 1234) ghc will complain

No instance for (HasRTTI Val Int)

Fortunately, these instances are easily defined:

instance HasRTTI Val Int where rtti = RttiValInt rtti instance HasRTTI Val Double where rtti = RttiValDouble rtti

and the good news is that this means that whenever we construct specific Val s we never have to construct the RTTI by hand; ghc’s type class resolution takes care of it for us.

Taking stock

Instead of writing

encodeVal :: Binary ( Val a) => Val a -> ByteString a) = encode encodeValencode

we can now write

encodeVal :: HasRTTI Val a => Val a -> ByteString = encode encodeValencode

While it may seem we haven’t gained very much, HasRTTI is a much more fine-grained constraint than Binary ; from HasRTTI we can derive Binary constraints, like we have done here, but also other constraints that rely on RTTI. So while we do still have to carry these RTTI constraints around, those are – ideally – the only constraints that we still need to carry around. Moreover, as we shall see a little bit further down, RTTI also scales nicely to composite type-level structures such as type-level lists.

Another example: heterogeneous lists

As a second—slightly more involved—example, lets consider heterogeneous lists or n-ary products:

data NP ( f :: k -> * ) ( xs :: [k]) where ) ([k]) Nil :: NP f '[] f '[] (:*) :: f x -> NP f xs -> NP f (x ' : xs) f xf xsf (x 'xs)

An example of such a heterogeneous list is

VI 12 :* VD 34.56 :* Nil :: NP Val '[ Int , Double ] '[

The type here says that this is a list of two Val s, the first Val being indexed by Int and the second Val being indexed by Double . If that makes zero sense to you, you may wish to study Well-Typed’s Applying Type-Level and Generic Programming in Haskell lecture notes.

As was the case for Val , we always statically know how long such a list is, so there should be no need to include any kind of length information in the encoded bytestring. Again, for serialization we don’t need to do anything very special:

putNP :: All Binary f xs => NP f xs -> Put f xsf xs Nil = return () putNP() :* xs) = put x >> putNP xs putNP (xxs)put xputNP xs

The only minor complication here is that we need Binary instances for all the elements of the list; we guarantee this using the All type family (which is a minor generalization of the All type family explained in the same set of lecture notes linked above):

type family All p f xs :: Constraint where p f All p f '[] = () p f '[]() All p f (x ' : xs) = (p (f x), All p f xs) p f (x 'xs)(p (f x),p f xs)

Deserialization however needs to make use of RTTI again. This means we need to define what we mean by RTTI for these heterogenous lists:

data instance RTTI ( NP f) xs where f) xs RttiNpNil :: RTTI ( NP f) '[] f) '[] RttiNpCons :: ( HasRTTI f x, HasRTTI ( NP f) xs) f x,f) xs) => RTTI ( NP f) (x ' : xs) f) (x 'xs) instance HasRTTI ( NP f) '[] where f) '[] = RttiNpNil rtti instance ( HasRTTI f x, HasRTTI ( NP f) xs) f x,f) xs) => HasRTTI ( NP f) (x ' : xs) where f) (x 'xs) = RttiNpCons rtti

In this case the RTTI gives us the shape of the list. We can take advantage of this during deserialization:

getNP :: All Binary f xs => RTTI ( NP f) xs -> Get ( NP f xs) f xsf) xsf xs) RttiNpNil = return Nil getNP RttiNpCons = ( :* ) <$> get <*> getNP rtti getNPgetgetNP rtti

allowing us to give the Binary instance as follows:

instance ( All Binary f xs, HasRTTI ( NP f) xs) f xs,f) xs) => Binary ( NP f xs) where f xs) = putNP putputNP = getNP rtti getgetNP rtti

Serializing lists of Val s

If we use this Binary instance to serialize a list of Val s, we would end up with a type such as

decodeVals :: ( HasRTTI ( NP Val ) xs, All Binary Val xs) ) xs,xs) => ByteString -> NP Val xs xs = decode decodeValsdecode

This All Binary Val xs constraint however is unfortunate, because we know that all Val s can be deserialized! Fortunately, we can do better. The RTTI for the (:*) case ( RttiNpCons ) included RTTI for the elements of the list. We made no use of that above, but we can make use of that when giving a specialized instance for lists of Val s:

putNpVal :: NP Val xs -> Put xs Nil = return () putNpVal() :* xs) = putVal x >> putNpVal xs putNpVal (xxs)putVal xputNpVal xs getNpVal :: RTTI ( NP Val ) xs -> Get ( NP Val xs) ) xsxs) RttiNpNil = return Nil getNpVal RttiNpCons = ( :* ) <$> get <*> getNpVal rtti getNpValgetgetNpVal rtti instance {-# OVERLAPPING #-} HasRTTI ( NP Val ) xs ) xs => Binary ( NP Val xs) where xs) = putNpVal putputNpVal = getNpVal rtti getgetNpVal rtti

This allows us to define

decodeVals :: HasRTTI ( NP Val ) xs => ByteString -> NP Val xs ) xsxs = decode decodeValsdecode

Note that this use of overlapping type classes instances is perfectly safe: the overlapping instance is fully compatible with the overlapped instance, so it doesn’t make a difference which one gets picked. The overlapped instance just allows us to be more economical with our constraints.

Here we can appreciate the choice of RTTI being a data family indexed by f ; indeed the constraint HasRTTI f x in RttiNpCons is generic as possible. Concretely, decodeVals required only a single HasRTTI constraint, as promised above. It is this compositionality, along with the fact that we can derive many type classes from just having RTTI around, that gives this approach its strength.

Advanced example

To show how all this might work in a more advanced example, consider the following EDSL describing simple functions:

data Fn :: ( * , * ) -> * where Exp :: Fn '( Double , Double ) '( Sqrt :: Fn '( Double , Double ) '( Mod :: Int -> Fn '( Int , Int ) '( Round :: Fn '( Double , Int ) '( Comp :: ( HasRTTI Fn '(b,c), HasRTTI Fn '(a,b)) '(b,c),'(a,b)) => Fn '(b,c) -> Fn '(a,b) -> Fn '(a,c) '(b,c)'(a,b)'(a,c)

If you are new to EDSLs (embedded languages) in Haskell, you way wish to watch the Well-Typed talk Haskell for embedded domain-specific languages. However, hopefully the intent behind Fn is not too difficult to see: we have a datatype that describes functions: exponentiation, square root, integer modules, rounding, and function composition. The two type indices of Fn describe the function input and output types. A simple interpreter for Fn would be

eval :: Fn '(a,b) -> a -> b '(a,b) Exp = exp eval Sqrt = sqrt eval Mod m) = ( `mod` m) eval (m)m) Round = round eval `Comp` f) = eval g . eval f eval (gf)eval geval f

In the remainder of this blog post we will consider how we can define a Binary instance for Fn . Compared to the previous examples, Fn poses two new challenges:

The type index does not uniquely determine which constructor is used; if the type is (Double, Double) then it could be Exp , Sqrt or indeed the composition of some functions.

then it could be , or indeed the composition of some functions. Trickier still, Comp actually introduces an existential type: the type “in the middle” b . This means that when we serialize and deserialize we do need to include some type information in the encoded bytestring.

RTTI for Fn

To start with, let’s define the RTTI for Fn :

data instance RTTI Fn ab where ab RttiFnDD :: RTTI Fn '( Double , Double ) '( RttiFnII :: RTTI Fn '( Int , Int ) '( RttiFnDI :: RTTI Fn '( Double , Int ) '( instance HasRTTI Fn '( Double , Double ) where rtti = RttiFnDD '(rtti instance HasRTTI Fn '( Int , Int ) where rtti = RttiFnII '(rtti instance HasRTTI Fn '( Double , Int ) where rtti = RttiFnDI '(rtti

For our DSL of functions, we only have functions from Double to Double , from Int to Int , and from Double to Int (and this is closed under composition).

Serializing type information

The next question is: when we serialize a Comp constructor, how much information do we need to serialize about that existential type? To bring this into focus, let’s consider the type information we have when we are dealing with composition:

data RttiComp :: ( * , * ) -> * where RttiComp :: RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> RttiComp '(a,c) '(b,c)'(a,b)'(a,c)

Whenever we are deserializing a Fn , if that Fn happens to be the composition of two other functions we know RTTI about the composition; but since the “type in the middle” is unknown, we have no information about that at all. So what do we need to store? Let’s start with serialization:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put '(a,c)'(a,c)

The first argument here is the RTTI about the composition as a whole, and sets the context. We can look at that context to determine what we need to output:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put '(a,c)'(a,c) RttiComp rbc rab) = go rac rbc rab putRttiComp rac (rbc rab)go rac rbc rab where go :: RTTI Fn '(a,c) -> RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> Put '(a,c)'(b,c)'(a,b) RttiFnDD RttiFnDD RttiFnDD = return () go() RttiFnII RttiFnII RttiFnII = return () go() RttiFnII RttiFnDI rAB = case rAB of {} gorABrAB{} RttiFnDI RttiFnII RttiFnDI = putWord8 0 goputWord8 RttiFnDI RttiFnDI RttiFnDD = putWord8 1 goputWord8

Let’s take a look at what’s going on here. When we know from the context that the composition has type Double -> Double , then we know that the types of both functions in the composition must also be Double -> Double , and hence we don’t need to output any type information. The same goes when the composition has type Int -> Int , although we need to work a bit harder to convince ghc in this case. However, when the composition has type Double -> Int then the first function might be Double -> Int and the second might be Int -> Int , or the first function might be Double -> Double and the second might be Double -> Int . Thus, we need to distinguish between these two cases (in principle a single bit would suffice).

Having gone through this thought process, deserialization is now easy: remember that we know the context (the RTTI for the composition):

getRttiComp :: RTTI Fn '(a,c) -> Get ( RttiComp '(a,c)) '(a,c)'(a,c)) RttiFnDD = return $ RttiComp RttiFnDD RttiFnDD getRttiComp RttiFnII = return $ RttiComp RttiFnII RttiFnII getRttiComp RttiFnDI = do getRttiComp <- getWord8 taggetWord8 case tag of tag 0 -> return $ RttiComp RttiFnII RttiFnDI 1 -> return $ RttiComp RttiFnDI RttiFnDD _ -> fail "invalid tag"

Binary instance for Fn

The hard work is now mostly done. Although it is probably not essential, during serialization we can clarify the code by looking at the RTTI context to know which possibilities we need to consider at each type index. For example, if we are serializing a function of type Double -> Double , there are three possibilities ( Exp , Sqrt , Comp ). We did something similar in the previous section.

putAct :: RTTI Fn a -> Fn a -> Put = go putActgo where go :: RTTI Fn a -> Fn a -> Put @ RttiFnDD fn = go rfn case fn of fn Exp -> putWord8 0 putWord8 Sqrt -> putWord8 1 putWord8 Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f) g fputWord8goComp r (rtti, g) (rtti, f) @ RttiFnII fn = go rfn case fn of fn Mod m -> putWord8 0 >> put m putWord8put m Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f) g fputWord8goComp r (rtti, g) (rtti, f) @ RttiFnDI fn = go rfn case fn of fn Round -> putWord8 0 putWord8 Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f) g fputWord8goComp r (rtti, g) (rtti, f) goComp :: RTTI Fn '(a,c) '(a,c) -> ( RTTI Fn '(b,c), Fn '(b,c)) '(b,c),'(b,c)) -> ( RTTI Fn '(a,b), Fn '(a,b)) '(a,b),'(a,b)) -> Put = do goComp rAC (rBC, g) (rAB, f) RttiComp rBC rAB) putRttiComp rAC (rBC rAB) go rBC g go rAB f

Deserialization proceeds along very similar lines; the only difficulty is that when we deserialize RTTI using getRttiComp we somehow need to reflect that to the type level; for this purpose we can provide a function

reflectRTTI :: RTTI f a -> ( HasRTTI f a => b) -> b f af ab)

It’s definition is beyond the scope of this blog post; refer to the source code on github instead. With this function in hand however deserialization is no longer difficult:

getAct :: RTTI Fn a -> Get ( Fn a) a) = go getActgo where go :: RTTI Fn a -> Get ( Fn a) a) @ RttiFnDD = do go r <- getWord8 taggetWord8 case tag of tag 0 -> return Exp 1 -> return Sqrt 255 -> goComp r goComp r _ -> error "invalid tag" @ RttiFnII = do go r <- getWord8 taggetWord8 case tag of tag 0 -> Mod <$> get get 255 -> goComp r goComp r _ -> error "invalid tag" @ RttiFnDI = do go r <- getWord8 taggetWord8 case tag of tag 0 -> return Round 255 -> goComp r goComp r _ -> error "invalid tag" goComp :: RTTI Fn '(a,c) -> Get ( Fn '(a,c)) '(a,c)'(a,c)) = do goComp rAC RttiComp rBC rAB <- getRttiComp rAC rBC rABgetRttiComp rAC $ reflectRTTI rAB $ reflectRTTI rBCreflectRTTI rAB Comp <$> go rBC <*> go rAB go rBCgo rAB

We can define the corresponding Binary instance for Fn simply using

instance HasRTTI Fn a => Binary ( Fn a) where a) = putAct rtti putputAct rtti = getAct rtti getgetAct rtti

If desired, a specialized instance for HList Fn can be defined that relies only on RTTI, just like we did for Val (left as exercise for the reader).

Conclusion

Giving type class instances for GADTs, in particular for type classes that produce values of these GADTs (deserialization, translation from Java values, etc.) can be tricky. If not kept in check, this can result in a code base with a lot of unnecessarily complicated function signatures or frequent use of explicit computation of evidence of type class instances. By using run-time type information we can avoid this, keeping the code clean and allowing programmers to focus at the problems at hand rather than worry about type classes instances.

PS: Singletons

RTTI looks a lot like singletons, and indeed things can be set up in such a way that singletons would do the job. The key here is to define a new kind for the type indices; for example, instead of

data Val :: * -> * where VI :: Int -> Val Int VD :: Double -> Val Double

we’d write something like

data U = Int | Double data instance Sing ( u :: U ) where SI :: Sing 'Int SD :: Sing 'Double data Val :: U -> * where VI :: Int -> Val 'Int VD :: Double -> Val 'Double instance SingI u => Binary ( Val u) where u) VI i) = put i put (i)put i VD d) = put d put (d)put d = case sing :: Sing u of get SI -> VI <$> get get SD -> VD <$> get get