Today we will explore how to build a small parser combinator library in Haskell from scratch. This blog post is the result of an experiment to see if I could actually implement this by only looking at the base and text documentation, explicitly without looking at other parser implementations or examples.

I think most other Haskell parser examples will work on String s, but since String come with a lot of downsides I will try to run our parser on Text from the text package and see where that gets me. Thus, our parser will take a Text and return the leftover unparsed Text with a parsed result a or a parse error:

type ParseError = T . Text newtype Parser a = Parser { runParser :: T . Text -> ( T . Text , Either ParseError a ) }

To go on, we should implement useful type classes such as Functor , Applicative , Alternative and Monad . This will give us lots of combinators for free.

Now let’s define a Functor instance for our Parser . The complete minimal definition is fmap :: (a -> b) -> f a -> f b (apply a function a -> b to the “contained” value of f ):

instance Functor Parser where fmap f ( Parser parse ) = Parser $ \ txt -> let ( rest , result ) = parse txt in ( rest , fmap f result )

Next up is the Applicative instance which requires the definitions of pure :: a -> f a to lift a value a into the f “universe” (in our case Parser ), and (<*>) :: f (a -> b) -> f a -> f b to sequentially apply two parsers and combine the result.

instance Applicative Parser where pure val = Parser $ \ txt -> ( txt , Right val ) ( Parser funParser ) <*> continue = Parser $ \ txt -> let ( rest , result ) = funParser txt in case result of Left err -> ( rest , Left err ) Right f -> runParser ( fmap f continue ) rest

The <*> implementation is a little bit more interesting, so here is what it does in words: first, we run the left hand parser to receive the function ( a -> b ) which we must then apply to the value of the second parser. Next, we either propagate failure, or we use our previously defined Functor instance to convert the second parser Parser a to a Parser b and run that on the left over of the left hand parser.

After defining Applicative we will also implement it’s close friend and very handy Alternative :

instance Alternative Parser where empty = Parser $ \ txt -> ( txt , Left "Parsing failed!" ) ( Parser pa ) <|> otherParser = Parser $ \ txt -> case pa txt of full @ ( _ , Right _ ) -> full _ -> runParser otherParser txt

empty is just a failing Parser that does nothing - this is because we can not invent an arbitrary a . <|> will first try to run the left hand side parser, and if that succeeds then it will return the result. Otherwise, the right hand parser is run.

Now it’s time for becoming a Monad !

instance Monad Parser where return = pure fail errMsg = Parser $ \ txt -> ( txt , Left $ T . pack errMsg ) ( Parser parse ) >>= next = Parser $ \ txt -> let ( leftOver , res ) = parse txt in case res of Left errMsg -> ( leftOver , Left errMsg ) Right val -> runParser ( next val ) leftOver

With all those abstract concepts implemented we are ready to write concrete parsers. Let’s start out by writing a parser that reads input until the predicate on each subsequent character fails:

satisfy :: ( Char -> Bool ) -> Parser T . Text satisfy f = Parser $ \ txt -> let ( matches , rest ) = T . span f txt in ( rest , Right matches )

This is very simple because the text library already provides a function span :: (Char -> Bool) -> Text -> (Text, Text) that essentially does the heavy lifting efficiently for us. We also want a satisfy1 function, that requires that we read at least one character:

satisfy1 :: ( Char -> Bool ) -> Parser T . Text satisfy1 f = satisfy f >>= \ res -> do when ( T . null res ) $ fail "satisfy1 didn't read anything!" pure res

This combination gives us skileWhile and skipWhile1 for free:

skipWhile :: ( Char -> Bool ) -> Parser () skipWhile = void . satisfy skipWhile1 :: ( Char -> Bool ) -> Parser () skipWhile1 = void . satisfy1

Now we’ll write a parser for a specific single character Char and a whole string T.Text .

char :: Char -> Parser Char char c = Parser $ \ txt -> case T . uncons txt of Just ( firstC , rest ) | firstC == c -> ( rest , Right c ) _ -> ( txt , Left $ T . pack $ "Expected a " ++ show c ) string :: T . Text -> Parser T . Text string t = Parser $ \ txt -> let tlen = T . length t in if T . take tlen txt == t then ( T . drop tlen txt , Right t ) else ( txt , Left $ T . pack $ "Expected " ++ show t )

To implement parsers for Int and Double we will cheat a little and use the read :: Read a => String -> a function from base. Usually I’d go for readMay :: Read a => String -> Maybe a from the safe package, but thanks to our already defined parser combinators we can be quite sure that our read will not crash at runtime:

numStarter :: Parser T . Text numStarter = do optNeg <- optional ( char '-' ) rest <- satisfy1 isDigit pure $ maybe rest (` T . cons ` rest ) optNeg int :: Parser Int int = fmap ( read . T . unpack ) numStarter double :: Parser Double double = do firstPart <- numStarter secondPart <- optional $ do ch <- char '.' rest <- satisfy1 isDigit pure ( ch ` T . cons ` rest ) pure $ ( read . T . unpack ) ( firstPart <> fromMaybe "" secondPart )

Now is probably a good time to define some unit tests for our parsers. We use the excellent HTF package for this:

test_char :: IO () test_char = do assertEqual ( "Fooo" , Right 'c' ) ( runParser ( char 'c' ) "cFooo" ) assertEqual ( "Fooo" , Left "Expected a 'c'" ) ( runParser ( char 'c' ) "Fooo" ) assertEqual ( "" , Left "Expected a 'c'" ) ( runParser ( char 'c' ) "" ) test_string :: IO () test_string = do assertEqual ( "Fooo" , Right "cc" ) ( runParser ( string "cc" ) "ccFooo" ) assertEqual ( "Fooo" , Left "Expected \" cc \" " ) ( runParser ( string "cc" ) "Fooo" ) assertEqual ( "" , Left "Expected \" cc \" " ) ( runParser ( string "cc" ) "" ) test_int :: IO () test_int = do assertEqual ( "bar" , Right 23 ) ( runParser int "23bar" ) assertEqual ( "bar" , Right ( - 23 )) ( runParser int "-23bar" ) assertEqual ( ".bar" , Right 23 ) ( runParser int "23.bar" ) assertEqual ( "a23.bar" , Left "skipWhile1 didn't ready anything!" ) ( runParser int "a23.bar" ) assertEqual ( "" , Left "skipWhile1 didn't ready anything!" ) ( runParser int "" ) test_double :: IO () test_double = do assertEqual ( "bar" , Right 23 ) ( runParser double "23bar" ) assertEqual ( "bar" , Right ( - 23 )) ( runParser double "-23bar" ) assertEqual ( "bar" , Right 23.2 ) ( runParser double "23.2bar" ) assertEqual ( ".bar" , Right 23 ) ( runParser double "23.bar" ) assertEqual ( "a23.bar" , Left "skipWhile1 didn't ready anything!" ) ( runParser double "a23.bar" ) assertEqual ( "" , Left "skipWhile1 didn't ready anything!" ) ( runParser double "" )

Great, our basic building blocks seem to be working! As you can see the error messages our parsers produce are not quite useful (yet?), but this might be material for a possible blog post in the near future.

Now let’s write a parser for this simple data file:

language: haskell; type: functional; language: purescript; type: functional; language: java; type: oop;

We would like to parse it into the following Haskell data structures:

data LanguageType = LanguageTypeFunctional | LanguageTypeOOP deriving ( Show , Eq ) data Language = Language { l_name :: T . Text , l_type :: LanguageType } deriving ( Show , Eq ) type LangList = [ Language ]

We start off by writing parsers for the building blocks, with tests:

langType :: Parser LanguageType langType = LanguageTypeFunctional <$ string "functional" <|> LanguageTypeOOP <$ string "oop" langName :: Parser T . Text langName = satisfy1 ( \ c -> not ( isSpace c ) && c /= ';' ) test_langType :: IO () test_langType = do assertEqual ( "" , Right LanguageTypeFunctional ) ( runParser langType "functional" ) assertEqual ( "" , Right LanguageTypeOOP ) ( runParser langType "oop" ) assertEqual ( "foobar" , Left "Expected \" oop \" " ) ( runParser langType "foobar" ) test_langName :: IO () test_langName = do assertEqual ( "" , Right "haskell" ) ( runParser langName "haskell" ) assertEqual ( "" , Right "java" ) ( runParser langName "java" ) assertEqual ( " bar baz" , Right "java" ) ( runParser langName "java bar baz" )

and combining them to parse a single row:

skipVertSpace :: Parser () skipVertSpace = skipWhile ( \ c -> c == ' ' || c == ' \t ' ) lang :: Parser Language lang = do void $ string "language:" *> skipVertSpace name <- langName skipVertSpace void $ char ';' skipVertSpace void $ string "type:" *> skipVertSpace ty <- langType skipVertSpace void $ char ';' skipVertSpace pure ( Language name ty ) test_lang :: IO () test_lang = do assertEqual ( "" , Right $ Language "haskell" LanguageTypeFunctional ) ( runParser lang "language: haskell; type: functional;" ) assertEqual ( "" , Right $ Language "java" LanguageTypeOOP ) ( runParser lang "language:java; type:oop; " ) assertEqual ( "language1:!java; type:oop; " , Left "Expected \" language: \" " ) ( runParser lang "language1:!java; type:oop; " )

To write a parser for the whole file, we need to introduce two new parser combinators. sepBy will be used to parse values separated by a separator:

sepBy :: Parser val -> Parser sep -> Parser [ val ] sepBy valP sepP = do listHead <- optional valP case listHead of Nothing -> pure [] Just x -> do rest <- many ( sepP *> valP ) pure ( x : rest )

and endOfInput is a combinator to check if we consumed all input:

endOfInput :: Parser () endOfInput = Parser $ \ txt -> if T . null txt then ( txt , Right () ) else ( txt , Left "Expecting endOfInput" )

Putting it all together, our file parser (with test of course!) will look like this:

langFile :: Parser LangList langFile = ( lang ` sepBy ` char '

' ) <* skipWhile isSpace <* endOfInput test_langFile :: IO () test_langFile = assertEqual ( "" , Right langList ) ( runParser langFile sampleFile ) where langList = [ Language "haskell" LanguageTypeFunctional , Language "purescript" LanguageTypeFunctional , Language "java" LanguageTypeOOP ] sampleFile = T . unlines [ "language: haskell; type: functional;" , "language: purescript; type: functional;" , "language: java; type: oop;" ]

Success! Now this is just the beginning of a parser combinator library, there are still many areas to be explored such as nicer error messages, backtracking, performance concerns and of course more combinators! You should probably use one of the awesome parser combinator libraries out there that address these issues:

attoparsec: Addresses backtracking and performance concerns

parsec: Addresses error messages and backtracking ( try operator)

operator) megaparsec: modern version of parsec

(and many more…)

That’s all for now - a working project can be found on GitHub: agrafix/parser-playground. To build and run the tests, clone the project an run stack test .

Feel free to join the discussion on reddit or read the original paper