Refining my first steps with Parsec June 22, 2007

To start gaining experience with Parsec and Haskell I completed the first working version of a parser for SmallTalk’s version of ‘Hello World’. To follow up I decided it would be worthwhile getting advice on those parts that just didn’t feel right. Off to Haskell Cafe I went.

I had only heard good things about the Haskell community, and after my experience I have nothing to say to the contrary. My post and the follow up responses can be found here. Thanks to all those that responded.

The Questionable Code

I was mainly concerned with the double do block present in primary. For reference, here is the applicable code:

primary = do { i <- identifier; return $ PrimaryIdentifier i; } <|> do { l <- stringLiteral; return $ PrimaryLiteral l; }

identifier = do i <- many1 letter return $ Identifier i

stringLiteral = do (char '\\'') s <- manyTill anyChar (char '\\'') return $ StringLiteral s

I had a feeling that Haskell was capable of producing more readable code than this.

Avoiding do

One disadvantage of do-notation, is that it can often obscure what is actually a simple monadic operation. It is said better over at Data.SyntaxFree:

Do-notation — and the fact that it’s introduced in the context of IO — obscures the fact that monads are implemented quite simply in Haskell itself as a type-class and correspond to a simple algebraic structure that models many useful computational phenomena.

Modifying the above code so it doesn’t require do notation results in the following (Thanks to Dave Tapley):

primary = (identifier >>= (return . PrimaryIdentifier)) <|> (stringLiteral >>= (return . PrimaryLiteral)) identifier = (many1 letter) >>= (return . Identifier) stringLiteral = (char '\\'') >> (manyTill anyChar (char '\\'')) >>= (return . StringLiteral)

Although I wouldn’t consider this more readable than when using do notation, I believe this gives a better understanding of what’s going on, and ideas on simplifying it. In particular, some common constructs start to appear. The most notable of these is of the form:

parser >>= return . Constructor

Extracting the Common Construct

Thanks to the suggestions of many, there are Haskell functions that can simplify this common construct: liftM and fmap. Which one to choose is a matter of style. As is using them as infix functions. In this case I’ve decided to use an infix liftM.

primary = (PrimaryIdentifier `liftM` identifier) <|> (PrimaryLiteral `liftM` stringLiteral) identifier = Identifier `liftM` (many1 letter) stringLiteral = StringLiteral `liftM` ((char '\\'') >> (manyTill anyChar (char '\\'')))

One last improvment

The final section of code this code that could do with improvement is in stringLiteral. I have chosen to use a suggestion from Tillmann Rendel.

That suggestion is to define a new monadic combinator:

a >>~ b = a >>= \\x -> b >> return x

This combinator worse as for >> , but returns the result of the calculation on the left, rather than the right. This is best demonstrated in use:

stringLiteral = StringLiteral `liftM` (quote >> nonQuote >>~ quote) quote = char '\\'' nonQuote = many $ noneOf ['\\'']

Summary

I had thought my first attempt at this parser resulted in a simple, readable parser. I still think it did. But now the expressive power of Parsec and Haskell have surpassed my expectations.