Published on 2010-09-01

How To Debug a Perl 6 Grammar

When programmers start to learn their craft, they spend a lot of time making small, stupid mistakes that prevent their programs from running. With a bit of practice, they learn to make fewer errors and to write more code that runs on the first try.

With grammars, it's the same all over again. In the author's experience, even expert programmers start with silly mistakes when they begin to write grammars. It's just vastly different from writing ordinary code, and requires a similar learning experience.

Here are some instructions that help you to write and debug grammars.

Start with small steps

Start with small steps, and test along the way.

Start with a simple, single parsing rule, and test cases for it. Keep expanding the test cases and the grammar simultaneously. Only add more features when all tests that you expect to pass actually do.
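As an illustration of this workflow, here is a minimal sketch of a starting point. The grammar name KeyValue and its key token are hypothetical and not part of the examples in this article; the test functions come from the standard Test module.

```raku
use Test;

# Hypothetical starter grammar: one entry point, one token, nothing else yet.
grammar KeyValue {
    token TOP { ^ <key> $ }
    token key { \w+ }
}

# The test cases grow together with the grammar.
ok  ?KeyValue.parse('foo'),     'a plain identifier parses';
nok ?KeyValue.parse('foo bar'), 'whitespace is not supported yet';
nok ?KeyValue.parse(''),        'the empty string is rejected';

done-testing;
```

Once these pass, the next step would be to add one feature (say, a value after the key) together with the tests that describe it.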

Test rules individually

If you can't understand certain behavior, test rules individually. That way you can figure out if a rule is wrong, wrongly (or never) called, or interacts badly with other rules.

    grammar MyGrammar {
        token TOP {
            ^ [ <comment> | <chunk> ]* $
        }

        token comment {
            '#' \N* $$
        }
        token chunk {
            ^^(\S+) \= (\S+) $$
        }
    }

    say ?MyGrammar.parse("#a comment\nfoo = bar");
    say ?MyGrammar.parse("#a comment\n", :rule<comment>);
    say ?MyGrammar.parse("foo = bar", :rule<chunk>);

The example above shows a simple grammar that doesn't match a test string, due to a stupid thinko. The last two lines test the rules individually, identifying token chunk as the faulty one.

Debug with print or say

Just like ordinary code, you can sprinkle your grammar rules with calls to say(). You just need to embed them in curly braces, so that they get executed as ordinary code.

    grammar MyGrammar {
        token chunk {
            { say "chunk: called" }
            ^^
            { say "chunk: found start of line" }
            (\S+)
            { say "chunk: found first identifier: $0" }
            \=
            { say "chunk: found =" }
            (\S+)
            $$
        }
    }

    say ?MyGrammar.parse("foo = bar", :rule<chunk>);

You can see that the rule matched the start of the line and foo, but not the equals sign. What's between the two? A space, and the token contains nothing that matches it. Making chunk a rule instead of a token fixes this problem, because in a rule the whitespace in the pattern matches whitespace in the input.
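A minimal sketch of that fix: the pattern is the same as in the chunk token from the example, and only the declarator changes from token to rule.

```raku
grammar MyGrammar {
    # 'rule' instead of 'token': whitespace in the pattern now matches
    # whitespace in the input (rules imply :sigspace).
    rule chunk {
        ^^(\S+) \= (\S+) $$
    }
}

say ?MyGrammar.parse("foo = bar", :rule<chunk>);   # True
```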

Remember that backtracking can cause a single block to be executed multiple times, even if not part of a quantified construct.

    $ perl6 -e '"aabcd" ~~ /^ (.*) { say $0 } b /'
    aabcd
    aabc
    aab
    aa

Be careful with backtracking control

Programmers who are familiar with Perl 5 regexes or similar regex engines are used to backtracking: If the "most obvious" way to match a string does not work out, the regex engine tries all possible other ways.

This is what many expect for small regexes, but when writing a grammar that has several nesting levels, it can be deeply confusing.

Most day-to-day parsing problems can be formulated in a way that requires little or no backtracking, and it should be done that way, both for efficiency and programmer sanity.
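To illustrate what such a reformulation can look like, here is a small sketch with a made-up input string. Both regexes extract the text before the first comma, but only the first one depends on backtracking.

```raku
my $line = 'foo,bar,baz';

# Relies on backtracking: .*? is extended one character at a time
# until the following ',' finally matches.
say ~$0 if $line ~~ / ^ (.*?) ',' /;                # foo

# No backtracking needed: the negated character class says exactly
# what a field is and cannot run past a comma, so :ratchet is safe.
say ~$0 if $line ~~ / :ratchet ^ (<-[,]>*) ',' /;   # foo
```

The second form states what a field consists of instead of leaving the engine to search for where it ends, which is the style that scales to nested grammar rules.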

Some constructs are easier with backtracking, but if you use them, embed them in a non-backtracking rule (i.e. a token or rule, which have the :ratchet modifier implicitly set):

    rule verbatim {
        '[%' ~ '%]' verbatim
        :!ratchet
        .*?
        '[%' endverbatim '%]'
    }

This uses backtracking inside the regex, but once it has found a possible match, it will never try another, because here verbatim is a rule, which (like a token) suppresses backtracking into itself.
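The effect of the implicit :ratchet can be seen in isolation with a small sketch. The names may-backtrack and no-backtrack are made up for this illustration; the two declarations differ only in the declarator.

```raku
my regex may-backtrack { .+ b }   # plain regex: .+ may give characters back
my token no-backtrack  { .+ b }   # token: implicit :ratchet, .+ keeps everything

say so 'aab' ~~ /^ <may-backtrack> $/;   # True
say so 'aab' ~~ /^ <no-backtrack> $/;    # False
```

In the token, .+ consumes the whole string and never gives the final b back, so the match fails immediately instead of trying shorter alternatives.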

Grammar::Tracer for Rakudo Grammars

Jonathan Worthington's excellent Grammar::Tracer module in the Grammar::Debugger distribution is a very useful tool for debugging grammars. It is limited to Rakudo only.
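A minimal sketch of how the tracer is typically enabled, assuming the Grammar::Debugger distribution (which provides Grammar::Tracer) is installed. Loading the module before the grammar is declared should be all that is needed; the TOP rule here is a hypothetical wrapper around the chunk rule from the earlier examples.

```raku
use Grammar::Tracer;   # assumption: the Grammar::Debugger distribution is installed

# Grammars declared in the scope of the 'use' statement are traced.
grammar MyGrammar {
    rule TOP   { <chunk>+ }
    rule chunk { ^^(\S+) \= (\S+) $$ }
}

# Parsing now also prints a tree of the rules that were called
# and whether each of them matched or failed.
MyGrammar.parse("foo = bar");
```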