We use Protocol Buffers extensively, and from talking to some folks at BayHac'12 it may be time to revisit the state of protobuf in Haskell.

To be fair, the protocol-buffers package is great. It’s extremely full-featured and well tested, and I can’t complain about the performance. But when most parties involved are running Haskell, maintaining separate .proto files is more than just a chore. Properly integrating the hprotoc preprocessor into a build system has also proven to be a challenge, primarily due to the n:m mapping of source files to target modules.

After spending a little time this evening hacking around, I’ve come up with an alternate solution that looks promising and doesn’t require external files or additional build tools. Though it’s far from a production effort, the type-level version of the code is available on GitHub for all your forking needs.

Note: GHC 7.2 or up is required for Generic support.

So what does it look like?

By defining a set of types that allow tagging a record field with a field number…

```haskell
newtype Required (n :: Nat) t = Required t
newtype Optional (n :: Nat) t = Optional t
newtype Packed   (n :: Nat) t = Packed t
```

and a few more to override the default base-128 varint encoding…

```haskell
newtype Fixed t  = Fixed t
newtype Signed t = Signed t
```

… should give you enough rope to write regular Haskell records that are efficiently (de)serialized with very little fuss. Create an annotated record, derive a Generic instance and you’re done.

```haskell
{-# LANGUAGE DataKinds     #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Hex (unhex)
import Data.Int (Int64)
import Data.Monoid (Last)
import Data.Serialize (runGet)
import Data.Text (Text)
import GHC.Generics

data TestRec = TestRec
  { field1 :: Required 1 (Last Int64)
  , field2 :: Optional 2 (Last Text)
  , field3 :: Optional 3 (Last Int64)
  } deriving (Generic, Show)
```

```
*Pb> print $ runGet decodeMessage =<< unhex "089601120774657374696e67"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Just "testing"})
  , field3 = Optional (Last {getLast = Nothing})
  }

*Pb> print $ runGet decodeMessage =<< unhex "089601189701"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Nothing})
  , field3 = Optional (Last {getLast = Just 151})
  }

*Pb> print $ runGet decodeMessage =<< unhex "089601"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Nothing})
  , field3 = Optional (Last {getLast = Nothing})
  }
```
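To see what those hex strings contain, it may help to decode them by hand. The following is a minimal sketch using only `base`. It is not the code from the repository, and the names `Value`, `fromHex`, `varint` and `fields` are made up for illustration. It handles just the two wire types that appear in the examples above: varint (0) and length-delimited (2).

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Char (digitToInt)
import Data.Word (Word8)

-- | A decoded field payload; only the two wire types used above are handled.
data Value
  = Varint Integer -- wire type 0
  | Bytes [Word8]  -- wire type 2 (length-delimited)
  deriving (Eq, Show)

-- | Turn a hex string into bytes (assumes an even number of valid hex digits).
fromHex :: String -> [Word8]
fromHex (a : b : rest) =
  fromIntegral (16 * digitToInt a + digitToInt b) : fromHex rest
fromHex _ = []

-- | Read one base-128 varint: 7 payload bits per byte, least significant
-- group first; a set high bit means more bytes follow.
varint :: [Word8] -> Maybe (Integer, [Word8])
varint = go 0 0
  where
    go _ _ [] = Nothing
    go sh acc (b : bs)
      | testBit b 7 = go (sh + 7) acc' bs
      | otherwise = Just (acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` sh)

-- | Decode a message into (field number, value) pairs.
-- Each field starts with a varint key: (field number << 3) | wire type.
fields :: [Word8] -> Maybe [(Integer, Value)]
fields [] = Just []
fields bs = do
  (key, rest) <- varint bs
  let fieldNo = key `shiftR` 3
  (val, rest') <- case key .&. 7 of
    0 -> do
      (n, r) <- varint rest
      Just (Varint n, r)
    2 -> do
      (len, r) <- varint rest
      let n = fromIntegral len
      if length r < n then Nothing else Just (Bytes (take n r), drop n r)
    _ -> Nothing -- other wire types are out of scope for this sketch
  more <- fields rest'
  Just ((fieldNo, val) : more)
```

Running `fields (fromHex "089601120774657374696e67")` in GHCi yields field 1 as `Varint 150` and field 2 as the seven bytes of `"testing"`, which lines up with the first decoded record above.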

As you should expect in Haskell, changing a field to an unsupported type such as Int will reward you with a nice (if slightly misleading) build break:

```haskell
data TestRec = TestRec
  { field3 :: Optional 3 (Last Int)
  } deriving (Generic, Show)
```

```
Pb.hs:272:27:
    No instance for (Wire Int)
      arising from a use of `decodeMessage'
    Possible fix: add an instance declaration for (Wire Int)
    In the first argument of `runGet', namely `decodeMessage'
    In the first argument of `(=<<)', namely `runGet decodeMessage'
    In the expression: runGet decodeMessage =<< unhex "089601"
```
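The reason the compiler can reject `Int` is that every field type has to belong to a type class. Here is a rough sketch of that idea, together with how a type-level field number can be read back at runtime. Everything in it is an assumption made for illustration: the `Wire` class below is a hypothetical stand-in rather than the real one from the repository, `fieldKey` is an invented helper, the `Last` wrapper is left out for brevity, and `KnownNat`/`natVal` come from newer versions of `GHC.TypeLits` (GHC 7.8 and later) than the compiler used at the time of writing.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Bits (shiftL, (.|.))
import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Same shape as the tagging newtype shown earlier in the post.
newtype Optional (n :: Nat) t = Optional t

-- | Hypothetical stand-in for the Wire class: each supported type reports
-- its protobuf wire type (0 = varint, 2 = length-delimited).
class Wire a where
  wireType :: Proxy a -> Integer

instance Wire Int64 where
  wireType _ = 0

-- There is deliberately no `instance Wire Int`: Int has a platform-dependent
-- width, so any use of it as a field type fails to type check.

-- | Key that precedes a field on the wire: (field number << 3) | wire type.
-- The field number is recovered from the type-level literal via natVal.
fieldKey :: forall n t. (KnownNat n, Wire t) => Optional n t -> Integer
fieldKey _ =
  (natVal (Proxy :: Proxy n) `shiftL` 3) .|. wireType (Proxy :: Proxy t)
```

With these definitions, `fieldKey (Optional 151 :: Optional 3 Int64)` evaluates to 24, i.e. the `18` byte in the second hex string above, while swapping `Int64` for `Int` produces a "No instance for (Wire Int)" error much like the one shown.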

Update (2/8/2013):

Steve and I are working on completing this work; check out our progress on GitHub.