RecordWildCards and Binary Parsing


RecordWildCards is a GHC extension that makes working with Haskell records more convenient. The extension has been blogged about in a few places already, so this post intends to provide a different motivating example: binary parsing.

Kwang’s linked blogpost shows binary serialization using the extension. This post will show the improvements we get with RecordWildCards and binary deserialization, but first…

What does RecordWildCards do?

RecordWildCards provides local bindings for the fields in a record:

    #!/usr/bin/env stack
    -- stack --resolver lts-8.20 --install-ghc exec ghci --package text
    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE RecordWildCards #-}

    import Data.Monoid ((<>))
    import Data.Text (Text, intercalate)
    import Data.Text.IO (putStrLn)
    import Prelude hiding (putStrLn)

    data BlogPost = BlogPost
      { blogPostTitle :: Text
      , blogPostTags :: [Text]
      }

    samplePost :: BlogPost
    samplePost = BlogPost "Foo" ["Bar", "Baz, Quux"]

    -- Pattern matching is convenient but fiddly when new fields are
    -- added or existing fields are rearranged.
    printViaPatternMatching :: BlogPost -> IO ()
    printViaPatternMatching (BlogPost title tags) = do
      putStrLn $ "Title: " <> title
      putStrLn $ "Tags: " <> intercalate ", " tags

    -- Record accessors are not fiddly when new fields are added or existing fields
    -- are rearranged, but require more keystrokes and horizontal space.
    printViaRecordAccessors :: BlogPost -> IO ()
    printViaRecordAccessors blogPost = do
      putStrLn $ "Title: " <> blogPostTitle blogPost
      putStrLn $ "Tags: " <> intercalate ", " (blogPostTags blogPost)

    -- RecordWildCards offers the best of both worlds with the above two
    -- approaches. We use the field names directly as bindings to the
    -- record's values.
    printViaRecordWildCards :: BlogPost -> IO ()
    printViaRecordWildCards BlogPost { .. } = do
      putStrLn $ "Title: " <> blogPostTitle
      putStrLn $ "Tags: " <> intercalate ", " blogPostTags

You can execute the above script with stack and it will spin up GHCi.
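The script above only uses the wildcard in a pattern. The same { .. } syntax also works when constructing a record: any local bindings named after the fields are gathered into the new value. Here is a minimal sketch of both directions; the Resolution record and its helper functions are hypothetical and exist only for this illustration. It needs nothing beyond base.

```haskell
{-# LANGUAGE RecordWildCards #-}

-- Hypothetical record, used only for this illustration.
data Resolution = Resolution
  { resolutionWidth  :: Int
  , resolutionHeight :: Int
  } deriving (Show, Eq)

-- Construction: local bindings that share the field names are
-- collected by the wildcard.
mkResolution :: Int -> Int -> Resolution
mkResolution w h =
  let resolutionWidth  = w
      resolutionHeight = h
  in Resolution { .. }

-- Pattern: the wildcard binds resolutionWidth and resolutionHeight
-- locally, and we use them to build a rotated copy.
rotate :: Resolution -> Resolution
rotate Resolution { .. } =
  Resolution
    { resolutionWidth  = resolutionHeight
    , resolutionHeight = resolutionWidth
    }
```

The construction form is the one that matters for the parsing examples later in this post.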

How does it help with binary parsing?

Let’s use cereal as our binary parsing library. Here are the imports we’ll need:

    #!/usr/bin/env stack
    -- stack --resolver lts-8.20 --install-ghc exec ghci --package bytestring --package cereal
    {-# LANGUAGE RecordWildCards #-}

    import Data.ByteString (ByteString)
    import Data.Serialize.Get (Get, getWord8, getWord16le, runGet)
    import Data.Word (Word8, Word16)

We will keep the domain fun and pretend we have a simple video game configuration we need to parse out of a file. Why we are using a binary file format for this is above our paygrade apparently:

    data GameConfig = GameConfig
      { gameConfigScreenWidth :: Word16
      , gameConfigScreenHeight :: Word16
      , gameConfigVolume :: Word8
      } deriving (Show)

    decodeGameConfig :: Get GameConfig -> ByteString -> Either String GameConfig
    decodeGameConfig = runGet

Now we need to provide a Get GameConfig and we’ll be off to the races. There are multiple ways we can tackle this, and the docs indicate Get has instances for Applicative and Monad.

I typically reach for Applicative by default when I need to parse something simple:

    applicativeGetter :: Get GameConfig
    applicativeGetter = GameConfig <$> getWord16le <*> getWord16le <*> getWord8

This has a drawback though: the pieces being parsed are not named. If we look at the parser in isolation, all we know is that a GameConfig wraps two Word16s and one Word8, and that the fields are laid out in that order in the data declaration.
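To see the applicative getter run, here is a small self-contained sketch. The sample bytes are made up for this example: 1920 and 1080 encoded as little-endian Word16 values, followed by a volume of 75. The Eq instance is an addition so the results can be compared.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, getWord8, getWord16le, runGet)
import Data.Word (Word8, Word16)

data GameConfig = GameConfig
  { gameConfigScreenWidth  :: Word16
  , gameConfigScreenHeight :: Word16
  , gameConfigVolume       :: Word8
  } deriving (Show, Eq)

applicativeGetter :: Get GameConfig
applicativeGetter = GameConfig <$> getWord16le <*> getWord16le <*> getWord8

-- Hypothetical file contents: 0x0780 = 1920, 0x0438 = 1080, 0x4B = 75.
-- The two Word16 values are stored least significant byte first.
sampleBytes :: ByteString
sampleBytes = BS.pack [0x80, 0x07, 0x38, 0x04, 0x4B]

-- Parses all five bytes into a GameConfig.
decoded :: Either String GameConfig
decoded = runGet applicativeGetter sampleBytes

-- Only three bytes are available, so the second getWord16le runs out
-- of input and the parse fails with a Left.
truncated :: Either String GameConfig
truncated = runGet applicativeGetter (BS.take 3 sampleBytes)
```

With these bytes, decoded should be a Right holding a width of 1920, a height of 1080 and a volume of 75, while truncated should be a Left carrying an error message.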

Another option would be to monadically parse the GameConfig:

    monadicGetter :: Get GameConfig
    monadicGetter = do
      screenWidth <- getWord16le
      screenHeight <- getWord16le
      volume <- getWord8
      pure $ GameConfig screenWidth screenHeight volume

This makes the meaning of each parsed field clear at a glance. We know what a GameConfig represents without flipping over to its declaration. The disadvantage compared with the Applicative approach is that we now pass the values to the GameConfig constructor by hand, so we must make sure they are in the correct order. What if we got sleepy and wrote the last line like this?

    pure $ GameConfig screenHeight screenWidth volume
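Since the width and the height are both Word16, the compiler has no way to catch that swap. The sketch below shows the mistake going through silently. The name sleepyGetter and the sample bytes are inventions for this example, as is the Eq instance.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, getWord8, getWord16le, runGet)
import Data.Word (Word8, Word16)

data GameConfig = GameConfig
  { gameConfigScreenWidth  :: Word16
  , gameConfigScreenHeight :: Word16
  , gameConfigVolume       :: Word8
  } deriving (Show, Eq)

-- Type checks without complaint, even though the first two
-- constructor arguments are in the wrong order.
sleepyGetter :: Get GameConfig
sleepyGetter = do
  screenWidth  <- getWord16le
  screenHeight <- getWord16le
  volume       <- getWord8
  pure $ GameConfig screenHeight screenWidth volume

-- Hypothetical file contents: width 1920, height 1080, volume 75.
sampleBytes :: ByteString
sampleBytes = BS.pack [0x80, 0x07, 0x38, 0x04, 0x4B]

-- The parse succeeds, but width and height come out swapped.
sleepyResult :: Either String GameConfig
sleepyResult = runGet sleepyGetter sampleBytes
```

Here sleepyResult should be a Right whose gameConfigScreenWidth is 1080 and whose gameConfigScreenHeight is 1920: a successful parse with the wrong answer.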

A third approach would be to still parse monadically but use record syntax at the end:

    monadicGetterWithRecordSyntax :: Get GameConfig
    monadicGetterWithRecordSyntax = do
      screenWidth <- getWord16le
      screenHeight <- getWord16le
      volume <- getWord8
      pure $ GameConfig
        { gameConfigScreenWidth = screenWidth
        , gameConfigScreenHeight = screenHeight
        , gameConfigVolume = volume
        }

This goes some way toward solving the problem of the sleepy dev, but now the parser is almost twice as long and we could still incorrectly write the last bit like this:

    pure $ GameConfig
      { gameConfigScreenWidth = screenHeight
      , gameConfigScreenHeight = screenWidth
      , gameConfigVolume = volume
      }

Let’s see what we can do now that we have RecordWildCards in our toolbelt:

    monadicGetterWithRecordWildCards :: Get GameConfig
    monadicGetterWithRecordWildCards = do
      gameConfigScreenWidth <- getWord16le
      gameConfigScreenHeight <- getWord16le
      gameConfigVolume <- getWord8
      pure $ GameConfig { .. }

We have solved the problem of the sleepy dev! That aside, the big win here is that we can look at the parser in complete isolation, with no flipping to the data declaration. All we have to worry about is that we parse the fields in the correct order, which is the main problem we were solving anyway! We don’t have to worry about the fields in the data structure being laid out in the same order as the bytes in the file.
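That last point can be checked with a small sketch. The GameConfig below is a deliberately rearranged variant of the one in this post, with the volume declared first, while the bytes are still read as width, height, volume. The sample bytes and the Eq instance are inventions for this example.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, getWord8, getWord16le, runGet)
import Data.Word (Word8, Word16)

-- Rearranged on purpose: the declaration order no longer matches
-- the byte order in the file.
data GameConfig = GameConfig
  { gameConfigVolume       :: Word8
  , gameConfigScreenHeight :: Word16
  , gameConfigScreenWidth  :: Word16
  } deriving (Show, Eq)

-- The parser follows the byte layout (width, height, volume).
-- The wildcard matches bindings to fields by name, not by position.
monadicGetterWithRecordWildCards :: Get GameConfig
monadicGetterWithRecordWildCards = do
  gameConfigScreenWidth  <- getWord16le
  gameConfigScreenHeight <- getWord16le
  gameConfigVolume       <- getWord8
  pure $ GameConfig { .. }

-- Hypothetical file contents: width 1920, height 1080, volume 75.
sampleBytes :: ByteString
sampleBytes = BS.pack [0x80, 0x07, 0x38, 0x04, 0x4B]

reorderedResult :: Either String GameConfig
reorderedResult = runGet monadicGetterWithRecordWildCards sampleBytes
```

Even with the rearranged declaration, reorderedResult should be a Right with a width of 1920, a height of 1080 and a volume of 75. The parser itself did not have to change at all.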