yo dawg, I heard you like strings in your types so I put a type in your string so you could type check your strings while you stringify your types - Proxy “XZibit”

The saying goes that one should encode as many invariants in the type system as possible. This way bad programs don’t type check, thereby affording you hours (previously spent writing unit tests) to meander in the warm embrace of nihilism. With that in mind I was faced with the following problem: at work we have a json-encoded message object that has generic payloads. However, each payload should contain the keys "type" and "data" where "type" contains a string-encoded value to indicate how to route or parse "data" . For example:

{ "from": "you" , "to" : "me" , "payload" : { "type" : "identity_validation_request" , "data" : { ... IdentityValidationData data ... } } }

As time withers and our corporeal bodies float aimlessly through this cold, meaningless universe, the possible values of "type" (in this example: "identity_validation_request" ) will increase and be littered throughout our codebase as we add various types of payload. These values are a great example of invariants that should be encoded somewhere in our type system. But how?

The goal of this post is to create a data type that encodes our "type" value as a type-level string, holds this type-level string in its type, and succesfully parses a json encoded string if the "type" value in some json matches the type-level string. It should behave something like this:

-- | `s` will hold the value of our type, `a` is the data type of the payload. data Payload s a -- | creating a function like this is the goal parse :: ByteString -> Maybe (Payload s a) -- | here is some an example json to parse jsonB :: ByteString jsonB = "{\"type\" : \"identity_validation_request\", \"data\": ... }" -- | itWorks should evaluate to the value -- 'Just (Payload "identity_validation_request" IdentityValidationData)' itWorks = decode jsonB :: Maybe (Payload "identity_validation_request" IdentityValidationData) -- | doesNotParse should evaluate to 'Nothing' as -- the type level string "xxx" does not match "identity_validation_request doesNotParse = decode jsonB :: Maybe (Payload "xxxx" IdentityValidationData)

Additionally, we will strive to maintain a global index of (type-level string, type) pairs using a type family, and also provide a simple, polymorphic container for clients to use.

To get there, we will:

serialize and de-serialize Proxy values of type Proxy (s :: Symbol) . serialize and de-serialize a Payload (s :: Symbol) (a :: *) datatype to associcate arbitrary payloads with type-level strings. introduce the TypeKey type family to maintain a global index of types and their assumed keys. serialize and de-serialize values of type Payload (TypeKey a) a . serialize and de-serialize values of type Message a , a polymorphic wrapper around Payload (TypeKey a) a , creating a nice interface for our clients. show that Message a satisfies all our requirements.

table of contents:

Sum Types: The Simplest Thing That Works (De)Serializing Data.Proxy A More General DeSerialization of Data.Proxy Stringly-Typed Programming for the Masses A Global Index of Your String Types Polymorphic Containers Conclusion

For the adventurous, the full solution is here.

1. Sum Types: The Simplest Thing That Works

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-01-simplest-thing-that-works-hs

Before we dive into type-level tomfoolery, let’s create the simplest thing that works. The goal is to dispatch based on the key in "type" . There are a few ways of doing this. We could forgoe any sense of type-safety and write FromJSON instances that inspect "type" willy-nilly. However, to bring some sanity to our codebase, let’s use a basic sum type called Payloads which contain all possible payloads. This will force us to put all "type" string assumptions in one place.

{-# LANGUAGE OverloadedStrings #-} module SimpleThings where import Control.Applicative ((<$>), (<*>)) import Control.Monad (mzero) import Data.Aeson import qualified Data.Aeson as A -- | sum type containing all possible payloads data Payloads = FooPayload Int String | InviteRequestPayload String | IdentityValidationRequestPayload Int Int String -- | dispatch on the value of `"type"` instance FromJSON Payloads where parseJSON (Object v) = v .: "type" >>= handlePayloadType where -- handle foo_payload key handlePayloadType (A.String "foo_payload") = v .: "data" >>= \sub -> FooPayload <$> sub .: "id" <*> sub .: "msg" -- handle invite_request key handlePayloadType (A.String "invite_request") = v .: "data" >>= \sub -> InviteRequestPayload <$> sub .: "msg" -- handle identity_validation_data key handlePayloadType (A.String "identity_validation_data") = v .: "data" >>= \sub -> IdentityValidationRequestPayload <$> sub .: "from" <*> sub .: "to" <*> sub .: "validation_msg" -- default handlePayloadType _ = mzero parseJSON _ = mzero

In this design we are expecting a json structure that looks like: {"type" : "...", "data": { ... }} . To parse this we inspect the value of the "type" key and dispatch accordingly. This approach is simple and forces us to keep the assumed json keys in one place. However, we loose lots of generality. This solution is not polymorphic (i.e. we aren’t working with an arbitrary, abstract container) and is prone to the expression problem (add a new payload, we now have to dispatch in a bunch of places).

To improve on this we want a data type that looks like:

data Payload a = Payload { payload :: a }

How and where do we encode the "type" string in this polymorphic container? Note that we can no longer dispatch over the "type" value anymore and return different types of a when writing our FromJSON instance (unless we wanted to forgoe something that looked like instance FromJSON a => FromJSON (Payload a) and instead write FromJSON (Payload SomeType) for each possible payload.

To resolve this issue we’re going to take a detour into Data.Proxy , a data type that will help us pass around values that encode type-level information.

2. Serializing Data.Proxy

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-02-data-proxy-hs

Data.Proxy is a great example of Haskell’s expressive type system. It’s an incredibly simple and essential data type in Haskell’s type-level swiss army knife. It’s definition:

data Proxy a = Proxy

Proxy is useful whenever you need value-level representations of information at the type level. Note that a can be of any kind, specifically, it can be of kind Symbol , which means that Proxy "foo" is a valid type (with the DataKinds extension enabled). Haskell also exposes some support for transitioning from the Symbol kind to a value of type String :

symbolVal :: KnownSymbol s => Proxy s -> String

$ ghci Prelude> import GHC.TypeLits Prelude GHC.TypeLits> import Data.Proxy Prelude GHC.TypeLits Data.Proxy> :set -XDataKinds Prelude GHC.TypeLits Data.Proxy> symbolVal (Proxy :: Proxy "foo") "foo"

Armed with Proxy and symbolVal we can now attempt to serialize and de-serialize JSON into Proxy "foo" . The ToJSON instance is pretty straight forward:

{-# LANGUAGE DataKinds #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE OverloadedStrings #-} module Proxy where import Control.Monad (mzero) import Data.Aeson import Data.Proxy (Proxy(Proxy)) import GHC.TypeLits (KnownSymbol, symbolVal) instance ToJSON (Proxy "foo") where toJSON p = object [ "type" .= symbolVal p ]

This will serialize Proxy :: Proxy "foo" into {"type": "foo"} .

Now, let’s write a FromJSON instance for Proxy "foo" :

instance FromJSON (Proxy "foo") where parseJSON (Object v) = v .: "type" >>= handleType where handleType (A.String "foo") = return (Proxy :: Proxy "foo") handleType _ = mzero parseJSON _ = mzero jsonString :: BL.ByteString jsonString = "{\"type\": \"foo\"}"

You can see this work in action by loading the code in ghci:

$ ghci Prelude> :load 02-data-proxy.hs [1 of 1] Compiling Proxy ( 02-data-proxy.hs, interpreted ) Ok, modules loaded: Proxy. *Proxy> :set -XDataKinds *Proxy> decode jsonString :: Maybe (Proxy "foo") Just Proxy *Proxy> decode jsonString :: Maybe (Proxy "bar") <interactive>:5:1: No instance for (FromJSON (Proxy "bar")) arising from a use of ‘decode’ In the expression: decode jsonString :: Maybe (Proxy "bar") In an equation for ‘it’: it = decode jsonString :: Maybe (Proxy "bar")

Awesome! This works for Proxy "foo" and we get a compiler error when trying to deserializing into Proxy "bar" .

3. A More General DeSerialization of Data.Proxy

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-03-more-general-proxy-hs

Obviously we don’t want to write a FromJSON instance for Proxy "bar" and every other type-level string that might appear. If you write the naive FromJSON instance you’ll hit a wall:

{-# LANGUAGE DataKinds #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE OverloadedStrings #-} module Proxy where import Control.Monad (mzero) import Data.Aeson import qualified Data.Aeson as A import qualified Data.ByteString.Lazy as BL import Data.Proxy (Proxy(Proxy)) import GHC.TypeLits (KnownSymbol, symbolVal) import Data.Text (pack) instance KnownSymbol s => ToJSON (Proxy s) where toJSON p = object [ "type" .= symbolVal p ] instance KnownSymbol s => FromJSON (Proxy s) where parseJSON (Object v) = v .: "type" >>= handleType where handleType (A.String s) | s == pack (symbolVal (Proxy :: Proxy s)) = return (Proxy :: Proxy s) handleType _ = mzero parseJSON _ = mzero jsonString :: BL.ByteString jsonString = "{\"type\": \"foo\"}"

This intance naively checks if the value corresponding to the "type" key matches the symbolVal of the Proxy and if they match, it returns a Proxy of the correct type.

Unfortunately, if you compile this you will get the following error:

03-more-general-proxy.hs:21:44: Couldn't match kind ‘*’ with ‘GHC.TypeLits.Symbol’ Expected type: Value -> aeson-0.8.0.2:Data.Aeson.Types.Internal.Parser (Proxy s) Actual type: Value -> aeson-0.8.0.2:Data.Aeson.Types.Internal.Parser (Proxy s0) In the second argument of ‘(>>=)’, namely ‘handleType’ In the expression: v .: "type" >>= handleType

The problem is that GHC differentiates between the variables that appear in a type signature from the variables that appear in a function’s definition. Therefore, the s in Proxy s in the type signature is different from the s in (Proxy :: Proxy s) appearing in the definition. To resolve this, we can enable the ScopedTypeVariables extension, which will extend the scope of a type variable throughout the function definition. This will allow GHC to infer that s satisfies the KnownSymbol constraint and compile. Adding {-# LANGUAGE ScopedTypeVariables #-} to our list of extensions and loading our code into ghci:

$ ghci Prelude> :load 03-more-general-proxy.hs [1 of 1] Compiling Proxy ( 03-more-general-proxy.hs, interpreted ) Ok, modules loaded: Proxy. *Proxy> :set -XDataKinds *Proxy> decode jsonString :: Maybe (Proxy "foo") Just Proxy *Proxy> decode jsonString :: Maybe (Proxy "bar") Nothing *Proxy> import Data.ByteString.Lazy *Proxy Data.ByteString.Lazy> :set -XOverloadedStrings *Proxy Data.ByteString.Lazy> let otherString = "{\"type\":\"bar\"}" :: ByteString *Proxy Data.ByteString.Lazy> decode otherString :: Maybe (Proxy "bar") Just Proxy

4. Stringly-Typed Programming for the Masses

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-04-stringly-typed-programming-hs

Let’s now try to accomplish our original goal: to create a data type Payload s a were s is a type-level string representing the value we expect in the json key "type" . This will reuire one additional extension (KindSignatures). We’ll also be updating our ToJSON and FromJSON instanes for Proxy to look for specific strings as opposed to a full json object (this will simplify the Payload serialization). Preamble:

{-# LANGUAGE DataKinds #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE ScopedTypeVariables #-} module Proxy where import Control.Applicative ((<$>)) import Control.Monad (mzero) import Data.Aeson import Data.Aeson.Types import qualified Data.Aeson as A import qualified Data.ByteString.Lazy as BL import Data.Monoid ((<>)) import Data.Proxy (Proxy(Proxy)) import GHC.TypeLits (KnownSymbol, Symbol, symbolVal) import Data.Text (pack) -- | Instances for serializing Proxy instance KnownSymbol s => ToJSON (Proxy s) where toJSON = A.String . pack . symbolVal instance KnownSymbol s => FromJSON (Proxy s) where parseJSON (A.String s) | s == pack (symbolVal (Proxy :: Proxy s)) = return (Proxy :: Proxy s) parseJSON _ = mzero

And now for our Payload s a data type with its ToJSON / FromJSON instances. Note that in the FromJSON instance we first attempt to deserialize into Proxy s and if successful we discard the result and parse the rest of the payload.

-- | our new data type. newtype Payload (s :: Symbol) a = Payload a -- | ToJSON instance instance (KnownSymbol s, ToJSON a) => ToJSON (Payload s a) where toJSON (Payload a) = object [ "type" .= (Proxy :: Proxy s) , "data" .= a ] -- | FromJSON instance instance (KnownSymbol s, FromJSON a) => FromJSON (Payload s a) where parseJSON (Object v) = (v .: "type" :: Parser (Proxy s)) >> Payload <$> v .: "data" parseJSON _ = mzero -- | Show intance for ghci instance (KnownSymbol s, Show a) => Show (Payload s a) where show (Payload a) = "Payload " <> symbolVal (Proxy :: Proxy s) <> " " <> show a jsonString :: BL.ByteString jsonString = "{\"type\": \"String\", \"data\": \"cool\"}"

Now, if we load this in ghci we should see the following:

$ ghci Prelude> :set -XDataKinds Prelude> :load 04-stringly-typed-programming.hs [1 of 1] Compiling Proxy ( 04-stringly-typed-programming.hs, interpreted ) Ok, modules loaded: Proxy. *Proxy> decode jsonString :: Maybe (Payload "String" String) Just Payload String "cool" *Proxy> decode jsonString :: Maybe (Payload "Int" String) Nothing *Proxy> decode jsonString :: Maybe (Payload "String" Int) Nothing *Proxy> let x = Payload 10 :: Payload "My Very Special Integer" Int *Proxy> encode x "{\"data\":10,\"type\":\"My Very Special Integer\"}"

This is exactly what we want! We’re able to specify in the Payload type exactly the value of the "type" key we expect. You might think this feature is somewhat pedantic, and I hope to dispell the myth in the next section, but consider how much more readable code using Payload becomes. If you have a REST endpoint deserializing this Payload (or any other type you encode parameters into the type) you will see, immediately, the assumptions being made simply by analyzing the type (e.g. a function returning Payload "invite_request" InviteRequest ).

5. A Global Index of Your String Types

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-05-indexing-your-keys-hs

All is good in the land of types and strings but we’d be remiss to wontonly throw strings in our types and hope for the best. What would be really nice is the following:

A global index of keys/type-level strings and their corresponding type. A compile-time error when you make a bad assumption about a key and its type.

These two can be accomplished with a closed type family that will serve as our index and a few simple modifications to Payload s a .

We begin with a simple, closed type family, requiring the TypeFamilies exension:

type family TypeKey (a :: *) :: Symbol where TypeKey Int = "int" TypeKey String = "string" -- other types you have

To incorporate this type family we need to update our Payload s a data type to use a Generalized Algebraic Data Type, requiring the GADTs extension:

data Payload (s :: Symbol) a :: * where Payload :: a -> Payload (TypeKey a) a

To write our ToJSON / FromJSON instances we will need to take advantage of equality constraints to work around the limitations of haskell’s type-level computations. Ideally we’d like to write instance ToJSON (ToJSON a, KnownSymbol (TypeKey a)) => ToJSON (Payload (TypeKey a) a) , stating that if there is a ToJSON instance for a and the TypeKey mapping on a results in a known symbol, then we can write a ToJSON instance for Payload . Unfortunately doing so will result in a compiler error that looks like:

05-indexing-your-keys.hs|50 col 28 error| Could not deduce (s ~ Proxy.TypeKey a) || from the context (GHC.TypeLits.KnownSymbol (Proxy.TypeKey a), || aeson-0.8.0.2:Data.Aeson.Types.Class.FromJSON a) || bound by the instance declaration || at /home/aterica/dev/tmp/blogpost/05-indexing-your-keys.hs:47:10-72 || ‘s’ is a rigid type variable bound by || the instance declaration || at /home/aterica/dev/tmp/blogpost/05-indexing-your-keys.hs:47:10 || Expected type: a -> Proxy.Payload s a || Actual type: a -> Proxy.Payload (Proxy.TypeKey a) a

We can work around this by using the equality constraint s ~ TypeKey a hinted to us by GHC.

-- | ToJSON instance instance (s ~ TypeKey a, KnownSymbol s, ToJSON a) => ToJSON (Payload s a) where toJSON (Payload a) = object [ "type" .= (Proxy :: Proxy s) , "data" .= a ] -- | FromJSON instance instance (s ~ TypeKey a, KnownSymbol s, FromJSON a) => FromJSON (Payload s a) where parseJSON (Object v) = (v .: "type" :: Parser (Proxy s)) >> Payload <$> v .: "data" parseJSON _ = mzero -- | Show intance for ghci instance (KnownSymbol s, Show a) => Show (Payload s a) where show (Payload a) = "Payload " <> symbolVal (Proxy :: Proxy s) <> " " <> show a jsonString :: BL.ByteString jsonString = "{\"type\": \"string\", \"data\": \"cool\"}" x :: Payload "int" Int x = Payload 10

Here, (s ~ TypeKey a, KnownSymbol s, ToJSON a) should read as: if s is constrained to be equal to TypeKey a (i.e. s is a type of kind Symbol ) and s is also a KnownSymbol then we can reate a ToJSON instance for Payload s a .

Loading up ghci, we should see that trying to compile Payload "string" String should pass, while Payload "int" String should fail (because TypeKey String was defined to be "string" not "int" ):

$ ghci Prelude> :set -XDataKinds Prelude> :load 05-indexing-your-keys.hs *Proxy> decode jsonString :: Maybe (Payload "string" String) Just Payload string "cool" *Proxy> decode jsonString :: Maybe (Payload "int" String) <interactive>:5:1: Couldn't match type ‘"string"’ with ‘"int"’ In the expression: decode jsonString :: Maybe (Payload "int" String) In an equation for ‘it’: it = decode jsonString :: Maybe (Payload "int" String)

As expected our TypeKey type family will ensure that we get a compile error if we assume the wrong key for a specific type!

Oh, but we are not yet done!

6. Polymorphic Containers

code: https://gist.github.com/aaronlevin/4aa22bd9c79997029167#file-06-polymorphic-containers-hs

Now, you might be thinking: ok, ok, ok, I can put my assumptions in the type, but really I don’t want to specify these keys everywhere, I just want to keep this global index for a reference. So, you want a simple, polymorphic container that hides the underlying type-level machinery? I claim that with the help of a new GADT and a scary extension (UndecidableInstances) we can do this.

Here is our polymorphic container:

data Message a where Message :: (s ~ TypeKey a, KnownSymbol s) => Payload s a -> Message a -- | ToJSON instance which serializes a message's payload instance ToJSON a => ToJSON (Message a) where toJSON (Message payload) = object [ "payload" .= payload ] -- | FromJSON instance instance (s ~ TypeKey a, KnownSymbol s, FromJSON a) => FromJSON (Message a) where parseJSON (Object v) = Message <$> v .: "payload" parseJSON _ = mzero instance Show a => Show (Message a) where show (Message p) = "Message ( " <> show p <> " )"

The Message a data type simply wraps the Payload s a for us, hiding the ugly deails from the client. Nevertheless, it behaves exactly as we’d expect. Consider the following ghci session:

$ ghci Prelude> :set -XOverloadedStrings Prelude> :load 06-polymorphic-containers.hs *Proxy> let message = "{ \"payload\": {\"type\": \"string\", \"data\": \"cool\"} }" :: Data.ByteString.Lazy.ByteString *Proxy> decode message :: Maybe (Message String) Just Message ( Payload string "cool" ) *Proxy> decode message :: Maybe (Message Int) Nothing *Proxy> Message (Payload 420) :: Message Int Message ( Payload int 420 ) *Proxy> Message (Payload "420") :: Message String Message ( Payload string "420" ) *Proxy> data Cool = Cool *Proxy> Message (Payload Cool) :: Message Cool <interactive>:15:1: No instance for (KnownSymbol (TypeKey Cool)) arising from a use of ‘Message’ In the expression: Message (Payload Cool) :: Message Cool In an equation for ‘it’: it = Message (Payload Cool) :: Message Cool

As you can see, Message a has he desired behaviour:

we can deserialize strings only if the "type" key has the right value.

key has the right value. the value of the "type" key, and thus the type-level string needed on our Payload s a type is not exposed to clients using Message .

key, and thus the type-level string needed on our type is not exposed to clients using . if we try to create a Message with a type not indexed in our closed type family TypeKey , we get an error (e.g. Message (Payload Cool) :: Message Cool did not compile).

While this last part required a scary extension, it’s somewhat safe to be used in this context.

Conclusion

To recap what we’ve accomplised so far, let’s recall what we set out to do. We encountered a situation where we wanted to deserialize some JSON that required us to dispatch on a specific value of a json key ( "type" ) and, based on that value, attempt to parse the JSON into a specific type. We discussed several attempts:

Ad-hoc Using a sum type Encoding the expeted value of "type" in a type-level string

We spent most of the time exploring the last option. We were able to:

Serialize and de-serialize Proxy values of type Proxy (s :: Symbol) . This allowed us to encode the "type" value as a type-level string in the proxy. using 1 we created a Payload (s :: Symbol) (a :: *) datatype to associcate arbitrary payloads with type-level strings. we showed you could serialize and de-serialize values of type Payload s a . we then created a global index of types and the assumed keys they would have by using the type family TypeKey . using 4 we were able to serialize and de-serialize values of type Payload (TypeKey a) a , encoding our json-key assumptions at compile time in a global, unique index. we then introduced a Message a datatype that wrapped our Payload (TypeKey a) a , creating a nice interface for our clients. finally, we avoid run-time errors based on bad assumptions by having the compiler throw an error if we try to deserialize an instnae of Message a where a has no entry in our TypeKey type family index.

There is a lot more you can do with these ideas. I hope this (lengthy) post inspires you to try enoding more invariants in the type system. For further inspiration, I recommend trying to grok the reflection library, which takes the idea of encoding information within types to the next level.

May all your strings be well-typed!