Earley Parsing Explained

Earley parsers are among the most general parsers out there. They can parse any context-free language without restriction, and can even be extended towards context sensitivity. They are reasonably fast on most practical grammars, and are easy to implement (the core algorithms take fewer than 200 lines of code).

On the other hand, most of the information I found about them is encoded in cryptic academese. Deciphering it is hard for non-experts (it was certainly hard for me).

This tutorial is mostly aimed at the curious and the implementer. It tries to convey an intuitive understanding of Earley parsing. Hardcore theorists seeking deep math should read the papers referenced in this tutorial. Here, I just want to help you write your own Earley parsing framework.

Prerequisites

Unfortunately, I can't assume zero knowledge. To understand this tutorial, you will need a good grasp of the vocabulary around formal grammars and the Chomsky hierarchy. Notions of automata theory can also be useful.

At the very least, you should be able to implement a recursive descent parser for a simple language, such as arithmetic expressions. If you have never written one, do so now.
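
For reference, here is a minimal sketch of such a recursive descent parser, written in Python. The grammar, the function names (`tokenize`, `evaluate`) and the `Parser` class are all assumptions made for this illustration, and the parser computes the value directly instead of building a syntax tree. It is one possible way to do the exercise, not the only one.

```python
import re

# Grammar assumed for this sketch (one function per rule):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

def tokenize(text):
    """Split the input into integer literals and single-character symbols."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    # Anything that is neither whitespace nor a known token is an error.
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the next token to read

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the next token (None at the end)."""
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling `evaluate("1 + 2 * 3")` walks the three grammar rules in order of precedence, which is why multiplication binds tighter than addition. Note that each rule has to be written by hand and the grammar must avoid left recursion; removing that kind of restriction is what a general parser such as Earley's is for.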

Table of contents