Cooking Classes with Datatype Generic Programming

Haskell Generics are a somewhat misunderstood topic but are an extremely powerful technique for writing reusable and comparable interfaces across an enormous universe of types with very little effort. They are probably my favorite example of the advantages of an expressive type system endows us with.

Source for examples code is available here.

Generics are a form of datatype-generic programming, which although the namesake has some similarity to Java Generics they are different concepts entirely. GHC’s implementation of Generics fall out of the simple observation that all datatypes in Haskell can be written as a combination of a sum of products.

A sum type, is a data structure used to hold a value that could take on several different, but fixed, types. For example:

data Pastry = Turnover | Macaroon | Brownie | Cookie

A product type, is a data structure used to hold a fixed ordered set of several types. Selecting a single field is called projection.

data Person = Person { firstName :: String , lastName :: String , age :: Int , height :: Float , phoneNumber :: String , flavor :: String }

In Haskell all datatypes can be expressed as sums of products:

data Expr = Add { l :: Expr , r :: Expr } | Mul { l :: Expr , r :: Expr } | Sub { l :: Expr , r :: Expr } | Div { l :: Expr , r :: Expr } | Number { val :: Int }

During compilation most of the information about the structure of the datatypes is thrown out, by design Haskell erases all type information. Prior to type-checking a phase known as elaboration expands out all record selectors into toplevel functions which extract the named fields of a product.

data Point a b = Point { x :: a, y :: b }

x :: Point a b -> a x ( Point a _) = a y :: Point a b -> b y ( Point _ b) = b

The rest of the information about products largely gets thrown out after compilation, and the product just get expanded into pattern matching code. For sum types the only information that is kept around is the tag for each constructor of the sum type. For instance Add is assigned the tag 1, Mul is assigned 2 etc. In a case statement the only information that is available at runtime is which branch we’re scrutinizing.

So what if consider not tossing out all this information and instead exposed it to our program so that we could write generic logic that can introspect the “structure” of our datatypes.

Compiler Hooks

Since GHC 6.10 we’ve had type families which, among other things, allow us to associate data types with our typeclass. So the structure of our Generic class can have a associated Rep type which can carry information along with the typeclass.

class Generic a where type Rep a :: * -> * from :: a -> ( Rep a) x to :: ( Rep a) x -> a

To represent the structure of our datatype we need to set up several datatypes to encode, sums, products, empty branches and various metadata about the names of fields, constructors and their types. All of which have a free parameter p which is bound to the head of typeclass instance when used in the associated datatype Rep a .

data V1 p -- Empty data U1 p = U1 -- () data ( :+: ) f g p = L1 (f p) | R1 (g p) -- Sum data ( :*: ) f g p = (f p) :*: (g p) -- Product newtype K1 i c p = K1 { unK1 :: c } -- a container for a c newtype M1 i t f p = M1 { unM1 :: f p } -- metadata wrapper

Now we could write this instance by hand for all of our datatypes, and for a simple enumeration it would look like the following:

data Ingredient = Flour | Sugar instance Generic Ingredient where type Rep Ingredient = M1 D ( T_Ingredient (( M1 C ( C_Flour U1 )) :+: ( M1 C ( C_Sugar U1 )))) from Flour = M1 ( L1 ( M1 U1 )) from Sugar = M1 ( R1 ( M1 U1 )) to ( M1 ( L1 ( M1 U1 ))) = Flour to ( M1 ( R1 ( M1 U1 ))) = Sugar data T_Ingredient data C_Flour data C_Sugar

The instance here is purely mechanical and can be derived from GHC’s internal representation of it’s syntax tree, namely the types GHC.DataCon and GHC.TypeCon. Using the -XDeriveGeneric extension we can have GHC crank this typeclass out automatically:

{-# LANGUAGE DeriveGeneric #-} data Ingredient = Flour | Sugar deriving ( Generic )

Lest we not handwave away the work that GHC is doing, let’s actually recreate the introspection logic that GHC uses when instantiating a Generic class from a module’s data definitions. Let’s load a module dynamically, intercept the compilation and dump out the internal structure of the it’s datatypes to see how this would be mechanically translated into a typeclass instance.

import GHC import GHC.Paths as Paths import Name import TyCon import TypeRep import DataCon import HscTypes import Text.Show.Pretty main :: IO () main = do -- Inside the GHC Monad rep <- runGhc ( Just Paths.libdir) $ do -- Spin up a GHC compiler environment dflags <- getSessionDynFlags setSessionDynFlags dflags -- Make a dummy module to inject let mn = mkModuleName "Test" -- Make a dummy target addTarget Target { targetId = TargetModule mn , targetAllowObjCode = True , targetContents = Nothing } -- Run the GHC pipeline load LoadAllTargets modSum <- getModSummary mn p <- parseModule modSum t <- typecheckModule p -- Pluck out the module tycons after we're done type-checking DesugaredModule tcmod modguts <- desugarModule t let tycons = mg_tcs modguts -- Deconstruct all datatypes into their sums-of-products. return (deconstruct tycons) putStrLn (ppShow rep)

Now that we have access to GHC’s internal representation of the “module guts” we can write our deconstructor logic. The logic is a slim few hundred lines of mostly ADT munging.

deconstruct :: [ TyCon ] -> [ Data ] deconstruct = fmap go where go x | isProduct x = M1 $ D Datatype { dataTypeName = getOccString (tyConName x) , modName = modString x , isNewtype = isNewTyCon x , datatype = Product (mkProduct x) , recursive = isRecursiveTyCon x } | isVoid x = M1 $ D Datatype { dataTypeName = getOccString (tyConName x) , modName = modString x , isNewtype = isNewTyCon x , datatype = V1 , recursive = isRecursiveTyCon x } | otherwise = M1 $ D Datatype { dataTypeName = getOccString (tyConName x) , modName = modString x , isNewtype = isNewTyCon x , datatype = Sum (mkProduct x) , recursive = isRecursiveTyCon x } mkRecord :: TyCon -> [ Data ] mkRecord x = concatMap mkRProduct (tyConDataCons x) mkProduct :: TyCon -> [ Data ] mkProduct x = fmap go (tyConDataCons x) where go :: DataCon -> Data go x | isRecord x = Product (mkRProduct x) go x | isDProduct x = Product (mkDProduct x) go x = M1 ( C ( Constructor (conNames x))) mkDProduct :: DataCon -> [ Data ] mkDProduct xs = [ K1 (showType x) | x <- dataConOrigArgTys xs] mkRProduct :: DataCon -> [ Data ] mkRProduct x = [ M1 ( S ( Selector (getOccString fld)) ty) | (fld, ty) <- zip (fieldNames x) (mkDProduct x)]

Setting up the dummy module Test.hs to run our decompilation script:

data PlatonicSolid = Tetrahedron | Cube | Octahedron | Dodecahedron | Icosahedron data Person = Person { firstName :: String , lastName :: String , age :: Int , height :: Float , phoneNumber :: String , flavor :: String } deriving ( Show ) data T = T1 { a :: Int , b :: Float } | T2 { c :: Int , d :: Double }

For PlatonicSolid we get the representation:

M1 ( D Datatype { dataTypeName = "PlatonicSolid" , modName = "Test" , isNewtype = False , datatype = Sum [ M1 ( C Constructor { conName = "Tetrahedron" }) , M1 ( C Constructor { conName = "Cube" }) , M1 ( C Constructor { conName = "Octahedron" }) , M1 ( C Constructor { conName = "Dodecahedron" }) , M1 ( C Constructor { conName = "Icosahedron" }) ] , recursive = False })

For Person we get the representation:

M1 ( D Datatype { dataTypeName = "Person" , modName = "Test" , isNewtype = False , datatype = Product [ M1 ( S Selector { selName = "firstName" } ( K1 "String" )) , M1 ( S Selector { selName = "lastName" } ( K1 "String" )) , M1 ( S Selector { selName = "age" } ( K1 "Int" )) , M1 ( S Selector { selName = "height" } ( K1 "Float" )) , M1 ( S Selector { selName = "phoneNumber" } ( K1 "String" )) , M1 ( S Selector { selName = "flavor" } ( K1 "String" )) ] , recursive = False })

For the sum of products T we get the representation:

M1 ( D Datatype { dataTypeName = "T" , modName = "Test" , isNewtype = False , datatype = Sum [ Product [ M1 ( S Selector { selName = "a" } ( K1 "Int" )) , M1 ( S Selector { selName = "b" } ( K1 "Float" )) ] , Product [ M1 ( S Selector { selName = "c" } ( K1 "Int" )) , M1 ( S Selector { selName = "d" } ( K1 "Double" )) ] ] , recursive = False })

These data points are then used to generate the Rep instance in the derived Generic instances. So that’s a rough approximation of how -XDeriveGeneric works under the hood, nothing terribly complicated just book keeping.

GHC.Generics

From the internal representation we crank out several typeclass instances which store the metadata about the various constructors.

class Datatype d where datatypeName :: t d f a -> String moduleName :: t d f a -> String isNewtype :: t d f a -> Bool isNewtype _ = False class Selector s where selName :: t s f a -> String class Constructor c where conName :: t c f a -> String conFixity :: t c f a -> Fixity conFixity _ = Prefix conIsRecord :: t c f a -> Bool conIsRecord _ = False

For example, for Ingredient example from before, we’d have several constructor instances automatically generated by which we could query the names from the AST.

type Rep Ingredient = M1 D ( T_Ingredient (( M1 C ( C_Flour U1 )) :+: ( M1 C ( C_Sugar U1 )))) data T_Ingredient data C_Flour data C_Sugar instance Datatype T_Ingredient where datatypeName _ = "Ingredient" moduleName _ = "Main" instance Constructor C_Flour where conName _ = "Flour" instance Constructor C_Sugar where conName _ = "Sugar"

Unlike reflection in languages like Java, Generics are not pushing type information into the runtime. Apart from a dictionary lookup for they are a effectively free abstraction that has no overhead. We’re simply making more information from the compiler manifest in the types during the type-checking phase, all of which gets erased during compilation.

Example

I tried to come up a non-contrived example for illustrating the usefulness of generics, and there are plenty of examples (serializes for JSON, Protocol Buffers, SQL Generation, traversals, command line parsers, etc) that are well-documented elsewhere on the web. So let’s consider an example based on the silly pun in the title of this article, namely cooking typeclasses.

So we have a Pie type, naturally.

data Pie = Pie { filling :: Filling , topping :: Maybe Topping } deriving ( Show , Generic ) data Filling = Apple | Cherry | Pumpkin deriving ( Show , Generic ) data Topping = IceCream | WhipCream deriving ( Show , Generic )

Using generics we’d like to a generate a list of the types of pie that we can put on a menu from the structure of the Haskell types. Records will denote named variations (“filling” vs “topping”) of the menu item, while sum types denote the various options in the variations (“cherry filling” vs “apple filling”).

data Item = Item Text [ Item ] | Variant Text [ Item ] | Choice Text deriving ( Show , Generic )

We implement a typeclass with a default signature which gives us the option to manually specify how a type gets converted into a menu item, or fall back on using it’s generic representation to automatically generate it.

class Menu a where menu :: a -> [ Item ] default menu :: ( Generic a, GMenu ( Rep a)) => a -> [ Item ] menu _ = gmenu ( Proxy :: Proxy a) gmenu :: forall a . ( Generic a, GMenu ( Rep a)) => Proxy a -> [ Item ] gmenu _ = gopts ( Proxy :: Proxy ( Rep a))

Our generic menu operates over various GHC.Generics types to expand out the sums and products into the Item categories that correspond to the menu. The instance for GMenu (K1 R f) has a Menu constraint which allows manual override for specific datatypes. Since we’re passing around a proxy we’ll have to manually thread the dictionary around sometimes by passing an undefined cast to the type of the instance we need to resolve.

-- Generic Menu class GMenu a where gopts :: Proxy a -> [ Item ] -- Datatype instance GMenu f => GMenu ( M1 D x f) where gopts _ = gopts ( Proxy :: Proxy f) -- Constructor Metadata instance ( GMenu f, Constructor c) => GMenu ( M1 C c f) where gopts x | conIsRecord (undefined :: t c f a) = [ Item (pack (conName m)) (gopts ( Proxy :: Proxy f))] | otherwise = [ Choice (pack (conName m))] where m = (undefined :: t c f a) -- Selector Metadata instance ( GMenu f, Selector c) => GMenu ( M1 S c f) where gopts _ = [ Variant (pack (selName m)) (gopts ( Proxy :: Proxy f))] where m = (undefined :: t c f a) -- Constructor Paramater instance ( GMenu ( Rep f), Menu f) => GMenu ( K1 R f) where gopts _ = menu (undefined :: f) -- Sum branch instance ( GMenu a, GMenu b) => GMenu (a :+: b) where gopts _ = gopts ( Proxy :: Proxy a) ++ gopts ( Proxy :: Proxy b) -- Product branch instance ( GMenu a, GMenu b) => GMenu (a :*: b) where gopts _ = gopts ( Proxy :: Proxy a) ++ gopts ( Proxy :: Proxy b) -- Void branch instance GMenu U1 where gopts _ = []

Specifically we’ll override the Maybe type so that it simply expands out to a choice of either “AsIs” of the variant or just the list of choices endowed by the inner parameter.

instance Menu a => Menu ( Maybe a) where menu _ = [ Choice (pack "AsIs" )] ++ (menu (undefined :: a))

As an example Maybe Topping expands out into three choices.

menu ( a :: Maybe Topping ) ~ [ Choice "AsIs" , Choice "IceCream" , Choice "whipCream" ]

Using GHC 7.10’s new -XDeriveAnyClass extension we can actually go back and automatically derive Menu inside the deriving clause.

data Pie = Pie { filling :: Filling , topping :: Maybe Topping } deriving ( Show , Generic , Menu )

Now synthesizing a dictionary for Pie we can get a menu

menu (undefined :: Pie )

And voila:

[ Item "Pie" [ Variant "filling" [ Choice "Apple" , Choice "Cherry" , Choice "Pumpkin" ] , Variant "topping" [ Choice "AsIs" , Choice "IceCream" , Choice "WhipCream" ] ] ]

Since our logic is datatype generic, any Haskell we can write down can be automatically translated to a Menu just by deriving Menu . So now we can a new Crisp desert (my favorite!) and we get everything for free!

data Crisp = Crisp { contents :: Filling , temperature :: Temperature } deriving ( Show , Generic , Menu ) data Temperature = Warm | Cold deriving ( Show , Generic , Menu ) -- Add an instance for a pair of menu items. That expands into multiple items. instance ( Menu a, Menu b) => Menu (a,b) where menu _ = menu (undefined :: a) ++ menu (undefined :: b)

And we can generate the composite menu of both deserts:

menu (undefined :: ( Pie , Crisp ))

[ Item "Pie" [ Variant "filling" [ Choice "Apple" , Choice "Cherry" , Choice "Pumpkin" ] , Variant "topping" [ Choice "AsIs" , Choice "IceCream" , Choice "WhipCream" ] ] , Item "Crisp" [ Variant "contents" [ Choice "Apple" , Choice "Cherry" , Choice "Pumpkin" ] , Variant "temperature" [ Choice "Warm" , Choice "Cold" ] ] ]

So that’s generics. One of the best goto examples of how an expressive type system and a few clever compiler hooks can make programmers lives easier by cooking our boilerplate for us and giving tastier more correct code.