The Marpa parser

Marpa is a parsing algorithm. It is new, but very much based on earlier work by Jay Earley, Joop Leo, John Aycock and R. Nigel Horspool. Marpa is intended to replace, and to go well beyond, recursive descent and the yacc family of parsers.

Marpa is fast. It parses all of the following in linear time: all the grammar classes that recursive descent parses; the grammar class that the yacc family parses; in fact, any unambiguous grammar, with a couple of exceptions that are unlikely to be an issue in practice (see Quibbles); and all ambiguous grammars that are unions of a finite set of any of the above grammars.

Marpa is powerful. Marpa will parse anything that can be written in BNF. This includes any mixture of left, right and middle recursions.
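To make the claim about recursions concrete, here is a minimal sketch of a plain Earley recognizer in Python. It is not Marpa and omits the Leo and Aycock-Horspool improvements, so it makes no linear-time promise. The function name `earley_recognize`, the dict-of-tuples grammar encoding and the toy `GRAMMAR` are assumptions made for this illustration, and the sketch assumes the grammar has no empty rules.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` is a sentence of `grammar`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Assumption of this sketch: no right-hand side is empty.
    """
    # An Earley item is (lhs, rhs, dot position, origin set).
    sets = [set() for _ in range(len(tokens) + 1)]
    sets[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        work = list(sets[i])
        while work:
            lhs, rhs, dot, origin = work.pop()
            if dot == len(rhs):
                # Completion: advance every item that was waiting for `lhs`.
                new = [(l, r, d + 1, o) for (l, r, d, o) in sets[origin]
                       if d < len(r) and r[d] == lhs]
            elif rhs[dot] in grammar:
                # Prediction: start every rule of the expected nonterminal.
                new = [(rhs[dot], r, 0, i) for r in grammar[rhs[dot]]]
            else:
                # Scan: move over a matching terminal into the next set.
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
                new = []
            for item in new:
                if item not in sets[i]:
                    sets[i].add(item)
                    work.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in sets[len(tokens)])


# Hypothetical toy grammar mixing all three kinds of recursion.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],   # left recursion
    "T": [("F", "^", "T"), ("F",)],   # right recursion
    "F": [("(", "E", ")"), ("n",)],   # middle recursion
}

print(earley_recognize(GRAMMAR, "E", list("n+n^n^(n+n)")))
print(earley_recognize(GRAMMAR, "E", list("n+^n")))
```

The first call should accept its input and the second should reject it. The rules are used exactly as written, with no left-recursion removal or other grammar rewriting.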

Marpa is convenient. Unlike recursive descent, you do not have to write a parser -- Marpa generates one from BNF. Unlike PEG or yacc, parser generation is unrestricted and exact. Marpa converts any grammar which can be written as BNF into a parser which recognizes everything in the language described by that BNF, and which rejects everything that is not in that language. The programmer is not forced to make arbitrary choices while parsing. If a rule has several alternatives, all of the alternatives are considered for as long as they might yield a valid parse.

Marpa is flexible. Like recursive descent, Marpa allows you to stop and do your own custom processing. Unlike recursive descent, Marpa makes available to you detailed information about the parse so far -- which rules and symbols have been recognized, with their locations, and which rules and symbols are expected next.
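The kind of information described above falls naturally out of an Earley-style parse. The following Python sketch is an illustration only: it is a plain Earley recognizer, not Marpa's interface, and the names `earley_progress` and `GRAMMAR` and the shape of the report are assumptions invented for this example. It assumes a grammar without empty rules. For each input position it lists the symbols recognized up to that point, with their start and end locations, and the terminals that could come next.

```python
def earley_progress(grammar, start, tokens):
    """Report, per input position, what was recognized and what may follow.

    Returns a list with one (recognized, expected) pair per position, where
    `recognized` holds (symbol, start, end) triples and `expected` holds the
    terminals acceptable next. Assumes no empty right-hand sides.
    """
    sets = [set() for _ in range(len(tokens) + 1)]
    sets[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    report = []
    for i in range(len(tokens) + 1):
        work = list(sets[i])
        while work:
            lhs, rhs, dot, origin = work.pop()
            if dot == len(rhs):            # completion
                new = [(l, r, d + 1, o) for (l, r, d, o) in sets[origin]
                       if d < len(r) and r[d] == lhs]
            elif rhs[dot] in grammar:      # prediction
                new = [(rhs[dot], r, 0, i) for r in grammar[rhs[dot]]]
            else:                          # scan
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
                new = []
            for item in new:
                if item not in sets[i]:
                    sets[i].add(item)
                    work.append(item)
        # Completed items give the symbols recognized so far, with locations.
        recognized = sorted({(l, o, i) for (l, r, d, o) in sets[i]
                             if d == len(r)})
        # Items with the dot before a terminal give the expected terminals.
        expected = sorted({r[d] for (l, r, d, o) in sets[i]
                           if d < len(r) and r[d] not in grammar})
        report.append((recognized, expected))
    return report


# Hypothetical toy grammar: sums of the digits 1 and 2.
GRAMMAR = {
    "Sum": [("Sum", "+", "Num"), ("Num",)],
    "Num": [("1",), ("2",)],
}

for position, (recognized, expected) in enumerate(
        earley_progress(GRAMMAR, "Sum", ["1", "+"])):
    print(position, recognized, expected)
```

A caller could use a report like this to stop at any position, inspect what is expected next, and decide how to continue, which is the style of custom processing described above.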

Learning about Marpa

What you are looking at is the web site maintained by the author of Marpa (Jeffrey Kegler). It is NOT the best page for starting to learn about Marpa. Good places to do that are:

Marpa's official starting page, which is maintained by Ron Savage.

The documentation of Marpa::R2, Marpa's current stable release.

Other Marpa resources

Discussion of Marpa currently centers around the "marpa parser" Google Group and the IRC channel #marpa on irc.freenode.net.

Most of the posts on Ocean of Awareness, my blog, are about Marpa. To get oriented in my blog, start at its annotated list of the most interesting Marpa posts.

If you are interested in tutorials:

My blog contains several tutorials.

Peter Stuifzand has written another as part of the Marpa Guide.

And amon has written this one for Stack Overflow.

Supporting Marpa

Marpa is supported by donations:

Donate to Marpa via patreon.com

Donate to Marpa via paypal.me

This is the most convenient way to make a one-time donation.

The name of the paypal.me account is that of the next version of Marpa: Kollos.

("Marpa" and my own name were taken.)

Click through and you'll see the "Carmel, CA" address and a picture of me in a hat.

Theory

For those interested in the mathematics behind Marpa, there's a paper with pseudocode, and proofs of correctness and of my complexity claims.

Marpa internals

Libmarpa is a C library, and is the core of Marpa.

Marpa internals: These are resources of interest only to those working on the internals of Marpa itself -- "bleeding edge" documentation, etc.

Quibbles

I mentioned above that Marpa parses unambiguous grammars in linear time, with a couple of exceptions, and claimed that those were unlikely to be bothersome in practice. Here are the details.

For an unambiguous grammar to be parsed in linear time, it must

be free of unmarked middle recursions; and

be free of ambiguous right recursions.

Unmarked middle recursions

The marker of a middle recursion is anything that allows the parser to find the middle. It is possible to represent a halting Turing computation as a marker, so that the general problem of finding any possible marker is, in fact, undecidable. But that's not something you are likely to want to do in practice. For practical purposes, if you can spot the middle by eyeball, the middle recursion is "marked". If you can't, the middle recursion might be unmarked.
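One informal way to see the difference is to ask how many positions a left-to-right reader must keep open as possible middles. The Python sketch below uses two toy grammars chosen for this illustration, not taken from Marpa: a marked one, S ::= 'a' S 'a' | 'b', and an unmarked one, S ::= 'a' S 'a' | 'a'. The function names are hypothetical, and both functions assume their argument is a valid prefix of some sentence.

```python
def live_middles_marked(prefix):
    """Candidate middles for S ::= 'a' S 'a' | 'b'.

    The middle is the single 'b', so there is no candidate until it has
    been read and exactly one afterwards.
    """
    return [i for i, ch in enumerate(prefix) if ch == "b"][:1]


def live_middles_unmarked(prefix):
    """Candidate middles for S ::= 'a' S 'a' | 'a'.

    Sentences are odd-length runs of 'a' whose middle is at index n when
    the length is 2n + 1. After k characters the final length is still
    unknown, so every index from k // 2 to k - 1 may yet be the middle.
    """
    return list(range(len(prefix) // 2, len(prefix)))


for text in ("aaa", "aaab", "aaaba"):
    print("marked  ", text, live_middles_marked(text))
for text in ("aaaa", "aaaaaaaa"):
    print("unmarked", text, live_middles_unmarked(text))
```

In the marked grammar the number of open candidates never exceeds one. In the unmarked grammar it grows with the length of the input read so far, which gives a feel for why such grammars fall outside the linear-time guarantee.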

Ambiguous right recursions

Right recursive symches are very easy to avoid. You just rewrite the rules so that they recurse on different symbols. Preserving the semantics is no problem in this case -- you simply make sure both of the new symbols have the same semantics as the original one.