PatternSynonyms for expressive code

PatternSynonyms is a very handy GHC extension to abstract away some implementation details of your data types and present a cleaner interface to the rest of the code.

Patterns

Patterns are ubiquitous in Haskell code; it is one of the first things you learn as you get familiar with the language. Almost every piece of Haskell code will have a pattern match construct in a function definition or case expression. However, they have some limitations as well.

As an example, let us look at how to represent identifiers:

import qualified Data.Text as Text import Data.Text (Text) newtype Identifier = Identifier { unIdentifier :: Text } deriving Eq

We can construct values of type Identifier with the Identifier data constructor (e.g. Identifier "foo" ). You can deconstruct an identifier in two ways:

identifierLength1 :: Identifier -> Int identifierLength1 = Text.length . unIdentifier identifierLength2 :: Identifier -> Int identifierLength2 (Identifier s) = Text.length s

Having both a record field accessor and a pattern matching syntax is very useful. We can pick the best one depending on the usage.

A curious case

How would you implement case-insensitive identifiers? In such a case, Identifier "foo" and Identifier "Foo" are considered equal. We must retain the original case in the data type though. Many other use cases - such as displaying the identifier with the original input case - require that.

Well, it turns out not too complicated. We can do this:

import Data.Function(on) newtype Identifier = Identifier { unIdentifier :: Text } instance Eq Identifier where (==) = (==) `on` (Text.toCaseFold . unIdentifier)

This is as expressive as the previous implementation. We still have the same data constructor and pattern matching at the usage sites. It has one serious drawback though - performance. Text.toCaseFold is evaluated for each equality check which makes this a slow operation.

A performance fix

Here is one way to solve the performance problem:

data Identifier = Identifier { unIdentifier :: Text , folded :: Text } makeIdentifier :: Text -> Identifier makeIdentifier s = Identifier { unIdentifier = s , folded = Text.toCaseFold s } instance Eq Identifier where (==) = (==) `on` folded

This moves the expensive Text.toCaseFold to record construction so that we don’t pay a penalty on each == evaluation. This covers the common use case where the identifier is constructed once and then the equality check is performed many times on it. In fact, this is the implementation used by the case-insensitive package.

However, we lost the nice symmetric data constructor and pattern match we had in the previous solution. We want to hide the internal details of Identifier record from the usage sites. So we are forced to provide a smart constructor makeIdentifier instead and hide the Identifier data constructor. This also means we cannot pattern match on the Identifier data constructor anymore. The only way to deconstruct an identifier is via the unIdentifier function which is less expressive than the previous solution.

PatternSynonyms to the rescue

This is where pattern synonyms come in. We can use a pattern synonym to define this “missing” pattern. Here is the syntax:

{-# LANGUAGE PatternSynonyms #-} data Identifier = MkIdentifier { unIdentifier :: Text , folded :: Text } instance Eq Identifier where (==) = (==) `on` folded pattern Identifier :: Text -> Identifier pattern Identifier s <- MkIdentifier s _ where Identifier s = MkIdentifier s (Text.toCaseFold s)

There are two components to this pattern synonym. First, we define how to use Identifier in pattern matches. The part:

pattern Identifier s <- MkIdentifier s _

tells GHC to replace a pattern match on Identifier s with MkIdentifier s _ . For e.g., a case expression like this:

case ident of Identifier s -> ...

gets translated to:

case ident of MkIdentifier s _ -> ...

The second part:

Identifier s = MkIdentifier s (Text.toCaseFold s)

tells GHC to replace all occurrences of Identifier s in expressions with the corresponding MkIdentifier s (Text.toCaseFold s) expression. For e.g., a let expression like this:

let ident = Identifier s in ...

gets translated to:

let ident = MkIdentifier s (Text.toCaseFold s) in ...

Here is how the complete solution:

module Identifiers ( Identifier , pattern Identifier ) where import Data.Text (Text) import qualified Data.Text as Text import Data.Function (on) data Identifier = MkIdentifier { unIdentifier :: Text , folded :: Text } instance Eq Identifier where (==) = (==) `on` folded pattern Identifier :: Text -> Identifier pattern Identifier s <- MkIdentifier s _ where Identifier s = MkIdentifier s (Text.toCaseFold s)

The MkIdentifier data constructor and all its internal implementation details are hidden away but the usage sites still can use the nice Identifier s patterns for data construction and pattern matching. Clearly, this is a good improvement from the previous solution and retains all the performance benefits of it.

Note that there are other ways to declare pattern synonyms. See GHC documentation for more details. You might also be interested in the Data.Sequence module that makes good use of pattern synonyms along with the ViewPatterns extension.