As a neophyte to functional programming and Haskell, I fully appreciate GHC's succinct type errors, but for a long time I have felt that something was amiss with the output of syntax errors. In this post I present a preliminary fix to GHC that improves matters in that area.

Like many other compilers that have not yet forsaken the annotated-BNF approach to describing a syntax and its parser, GHC, Haskell's most prominent compiler, uses Happy to generate efficient parsing tables for the syntax.

The problem

GHC has hand-coded hints for some of the syntax errors it detects while compiling a module, but other errors come with no extra information.

example 1 - source

    test i = case i of 2
    main = return ()

example 1 - output

    example1.hs:2:1: parse error (possibly incorrect indentation or mismatched brackets)

And another example:

example 2 - source

    test i = i
      where
        e
    main = return ()

example 2 - output

    example2.hs:4:1: parse error (possibly incorrect indentation or mismatched brackets)

As you can see, we have the same error on both occasions. Is it possible to generate an error message that differentiates between the two, automatically? Apparently yes.

Making it happier

Thanks to the parsing tables, the parser does know which tokens can follow at the point of the error. So I took the time to modify both Happy and GHC to produce the following results. The gist of these changes is to pass the list of possible next tokens to the user-provided error function. The results are not perfect, but they are interesting nonetheless.
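To make the idea concrete, here is a minimal sketch of how a table-driven parser could collect the acceptable tokens for its current state and hand them to an error function. It is not the actual patch to Happy or GHC: the types `State`, `Terminal`, `Action` and `ActionTable`, the functions `expectedIn` and `parseError`, and the contents of `demoTable` are all hypothetical names and data made up for illustration.

```haskell
module Main where

-- Hypothetical toy LR machinery; none of these names come from Happy or GHC.
type State = Int
type Terminal = String

data Action = Shift State | Reduce Int | Accept
  deriving (Eq, Show)

-- A sparse action table: a missing (state, terminal) entry means "error".
type ActionTable = [((State, Terminal), Action)]

-- Terminals that have a non-error action in the given state.
expectedIn :: ActionTable -> State -> [Terminal]
expectedIn table st = [ t | ((s, t), _) <- table, s == st ]

-- A user-provided error function that also receives the expected terminals.
parseError :: (Int, Int) -> [Terminal] -> String
parseError (line, col) expected =
  show line ++ ":" ++ show col
    ++ ": parse error, possible tokens: " ++ unwords expected

-- A made-up table fragment: state 7 accepts only '|' and '->'.
demoTable :: ActionTable
demoTable =
  [ ((7, "'|'"), Shift 9)
  , ((7, "'->'"), Shift 8)
  , ((8, "VARID"), Shift 10)
  ]

main :: IO ()
main = putStrLn (parseError (2, 1) (expectedIn demoTable 7))
```

In this toy version the list is simply every terminal with an entry in the current state's row of the action table, so its quality depends on which state the parser happens to be in when the error is detected.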

Let's repeat the previous examples and add some more.

example 1 - source

    test i = case i of 2
    main = return ()

example 1 - output

    example1.hs:2:1: parse error (possibly incorrect indentation or mismatched brackets), possible tokens: '|' '->'

And for the second example:

example 2 - source

    test i = i
      where
        e
    main = return ()

example 2 - output

    example2.hs:4:1: parse error (possibly incorrect indentation or mismatched brackets), possible tokens: '=' '|'

A third example:

example 3 - source

    data X = X
      { bla :: Int
        test :: Int
      }

example 3 - output

    example3.hs:3:10: parse error on input, possible tokens: '}' ‘::’

The avid reader may wonder at this point why '::' is mentioned in the list of possible tokens and ',' is not. Note that the error occurs after 'Int test', probably much deeper, in a different syntax production than the one that defines each field of a record. With more time I would examine the Action and Goto tables that Happy generates from GHC's extensive syntax definition.

example 4 - source

    data X = X
      { bla :: Int
      , test :: Int
      }
      deriving
    main = return ()

example 4 - output

    example4.hs:6:1: parse error (possibly incorrect indentation or mismatched brackets), possible tokens: '(' CONID QCONID PREFIXQCONSYM

Here it gets more interesting, because the parser now outputs token names that are not just punctuation or operators, but whole classes of tokens such as user-defined identifiers. These are the same names used for the tokens in the grammar definition.
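Since names like CONID are grammar-internal, one conceivable follow-up is to translate them before showing them to the user. The sketch below is only an illustration of that idea and is not part of the changes described in this post: `descriptions` and `describeToken` are hypothetical, and the wording of each description is my own guess at a friendly phrasing, not text taken from GHC.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- Hypothetical descriptions for a few grammar token names.
descriptions :: [(String, String)]
descriptions =
  [ ("CONID", "an unqualified capitalized identifier")
  , ("QCONID", "a qualified capitalized identifier")
  , ("VARID", "an unqualified lowercase identifier")
  ]

-- Quoted names such as '(' are literal tokens and are shown unchanged;
-- other names are looked up, falling back to the raw grammar name.
describeToken :: String -> String
describeToken name@('\'' : _) = name
describeToken name = fromMaybe name (lookup name descriptions)

main :: IO ()
main = mapM_ (putStrLn . describeToken) ["'('", "CONID", "QCONID", "PREFIXQCONSYM"]
```

Names without an entry, such as PREFIXQCONSYM here, fall through unchanged, so the table can be filled in gradually.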

example 5 - source

    main = = return ()

example 5 - output

    example5.hs:1:8: parse error on input, possible tokens: '_' 'as' 'case' 'do' 'hiding' 'if' 'let' 'qualified' 'forall' 'export' 'label' 'dynamic' 'safe' 'interruptible' 'unsafe' 'mdo' 'family' 'role' 'stdcall' 'ccall' 'capi' 'prim' 'javascript' 'proc' 'group' 'static' '{-# CORE' '{-# SCC' '{-# GENERATED' '\\' '~' '-' '[' '[:' '(' '(#' '(|' SIMPLEQUOTE VARID CONID QVARID QCONID PREFIXQVARSYM PREFIXQCONSYM IPDUPVARID CHAR STRING INTEGER RATIONAL PRIMCHAR PRIMSTRING PRIMINTEGER PRIMWORD PRIMFLOAT PRIMDOUBLE '[|' '[p|' '[t|' '[d|' '[||' TH_ID_SPLICE '$(' TH_ID_TY_SPLICE '$$(' TH_TY_QUOTE TH_QUASIQUOTE TH_QQUASIQUOTE ‘=’