Applicative, bidirectional serialization combinators



Published on September 7, 2012 under the tag A neat GADT/Applicative trickPublished on September 7, 2012 under the tag haskell

Unrelated

If you are only interested in the neat Haskell trick, skip this section. It has been a while since I updated my personal blog here. There has been a lot going on in my life in the past few months, including exams, a breakup with my girlfriend and moving to Tokyo for an internship at Tsuru Capital.

It has been really great so far, the work is interesting, so are the people, and we have a nice view from the office.

View from the Tsuru Capital office

Prologue

This blogpost is written in literate Haskell (source here), so you can drop it in a file, load it in GHCi and play around with it, should you feel like doing this.

{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeSynonymInstances #-} import Control.Applicative import Control.Monad import Data.Aeson

A lot of serialization libraries use a technique employing two typeclasses (or one typeclass with two methods) in order to convert data from and to some format. An example is the excellent aeson library:

data Food = Food { foodName :: String , foodCost :: Int } deriving ( Show )

instance ToJSON Food where Food name cost) = object toJSON (name cost)object [ "name" .= name name , "cost" .= cost cost ]

instance FromJSON Food where Object obj) = Food parseJSON (obj) <$> obj .: "name" obj <*> obj .: "cost" obj = mzero parseJSON _mzero

Some other libraries using this technique are cassava, postgresql-simple, and binary.

While working on some new internal SQLite bindings and utilities at Tsuru, I discovered another technique which is more concise, but keeps the nice Applicative interface. The code in this blogpost disregards performance and only supports the most basic of features in order to be easier to understand.

Primitive types

We start out by defining a typeclass for primitive serializable types, such as Int and String . We disregard performance, as said before, and we are also going to ignore proper error checking: we just serialize these primitive types to String s.

class Field a where fieldType :: a -> String -- Should work with 'undefined' fieldRead :: String -> a fieldShow :: a -> String

instance Field String where = const "TEXT" fieldType = id -- We need proper escaping here fieldRead = id fieldShow

instance Field Int where = const "INTEGER" fieldType = read fieldRead = show fieldShow

These fields are sufficient to store some Food , so what else do we need? We need “instantiations” of these fields which actually represent a part of a record. This can be modeled as:

data FieldInfo t a = FieldInfo t a { fieldName :: String , fieldExtract :: t -> a }

Let’s give an example, we would map the cost of food as:

costFieldInfo :: FieldInfo Food Int = FieldInfo "cost" foodCost costFieldInfofoodCost

This costFieldInfo is not used in the code below, it is just an example.

GADTs

In the case of an SQLite database, such a FieldInfo actually represent a column in our table. We will be building our table using Applicative , without actually evaluating anything yet. This seems related to free monads – determining the exact category-theoretical relation is left as an excercise for the mathematical-minded reader.

We just have constructors for each method of the applicative interface, plus an SQLite primitive to add a column to our table.

This requires the great GADTs extension, it is not possible to do this with Haskell 98 datatypes – the wiki page has more information.

data Table t f where t f -- Applicative interface: -- <$>, pure and <*> Map :: (a -> b) -> Table t a -> Table t b (ab)t at b Pure :: a -> Table t a t a App :: Table t (a -> b) -> Table t a -> Table t b t (ab)t at b -- Primitives Column :: Field a => FieldInfo t a -> Table t a t at a

This makes the Functor and Applicative instances trivial to implement.

instance Functor ( Table t) where t) fmap = Map

instance Applicative ( Table t) where t) pure = Pure ( <*> ) = App

This GADT allows us to model actual tables:

foodTable :: Table Food Food = Food foodTable <$> Column ( FieldInfo "name" foodName) foodName) <*> Column ( FieldInfo "cost" foodCost) foodCost)

Let’s make some nicer syntax and write foodTable again:

column :: Field a => String -> (t -> a) -> Table t a (ta)t a = Column ( FieldInfo name extract) column name extractname extract)

foodTable' :: Table Food Food = Food foodTable' <$> column "name" foodName columnfoodName <*> column "cost" foodCost columnfoodCost

Lookin’ good!

The real work

While procrastrinating by creating GADT wrappers is fun, at one point we should write some actual implementation.

These implementations work by evaluating the trees we created with the different constructors for the GADT.

Our first method crawls the table tree and returns the name and type of each column:

metaRecord :: Table t t -> [( String , String )] t t[()] = go metaRecordgo where go :: forall t a . Table t a -> [( String , String )] t at a[()] Map _ t) = go t go (_ t)go t Pure _ ) = [] go (_ )[] App t1 t2) = go t1 ++ go t2 go (t1 t2)go t1go t2 Column fi) = [(fieldName fi, fieldType ( undefined :: a))] go (fi)[(fieldName fi, fieldType (a))]

The second method is only slightly more complicated: instead of returning the type of each column, it calls our custom extract function for that column to get its value. We can then call fieldShow on that to do the final serialization, and what we get is the serialized name and value of each column.

toRecord :: forall t . Table t t -> t -> [( String , String )] t t[()] = go tab toRecord tab xgo tab where go :: forall a . Table t a -> [( String , String )] t a[()] Map _ t) = go t go (_ t)go t Pure _ ) = [] go (_ )[] App t1 t2) = go t1 ++ go t2 go (t1 t2)go t1go t2 Column fi) = go (fi) $ fieldExtract fi x)] [(fieldName fi, fieldShowfieldExtract fi x)]

The deserialization method is the hardest. It works by actually evaluating the tree we built. Because of our GADT use, we can do this in a type-safe way.

fromRecord :: forall t . Table t t -> [( String , String )] -> t t t[()] = go tab fromRecord tab recordgo tab where go :: forall a . Table t a -> a t a Map f t) = f (go t) go (f t)f (go t) Pure x) = x go (x) App ft t) = (go ft) (go t) go (ft t)(go ft) (go t) Column fi) = case lookup (fieldName fi) record of go (fi)(fieldName fi) record Nothing -> error $ "Missing field: " ++ fieldName fi fieldName fi Just str -> fieldRead str strfieldRead str

Utilities

We add a small utility function to pretty-print our records:

printRecord :: [( String , String )] -> IO () [()]() = putStrLn . unlines . map (\(x, y) -> x ++ " = " ++ y) printRecord(\(x, y)y)

And a typeclass so we do not have to create an xxxTable function for each datatype:

class HasTable t where table :: Table t t t t

Demo

Now let’s implement food serialization for real: we only need to implement a single typeclass with a single method!

instance HasTable Food where = Food table <$> column "name" foodName columnfoodName <*> column "cost" foodCost columnfoodCost

Some concrete deliciousness:

ramen :: Food = Food "ラーメン" 800 ramen

And a trivial demo:

main :: IO () () = do main putStrLn "Meta record (used in CREATE TABLE... etc.):" $ metaRecord ( table :: Table Food Food ) printRecordmetaRecord ( putStrLn "Serialized ramen:" printRecord (toRecord table ramen) putStrLn "Deserialized sashimi:" print (fromRecord table [( "name" , "刺身" ), ( "cost" , "1200" )] :: Food ) (fromRecord table [(), ()]

Conclusion

I think this is a nice use case of GADTs, and although it has a performance impact, it looks like it is worth it (for now). If it turns out that we can nicely generalize this code, we will probably release it on Hackage. But feel free to steal the idea, and comments are also welcome, of course!