Visualization of Ruby's Grammar

Posted by Nick Sieger

As part of the momentum surrounding the Ruby implementer’s summit, I have decided to take on a pet project to understand Ruby’s grammar better, with the goal of contributing to an implementation-independent specification of the grammar. Matz mentioned during his keynote how parse.y was one of the uglier parts of Ruby, but just how ugly?

Well, judge for yourself. Below is a grammar dependency graph generated using ANTLRWorks and GraphViz. The steps I took are as follows. I took parse.y, stripped all C definitions, code and actions from it to give a bare YACC definition. Next, I did the equivalent of gsub(/[kt]([A-Z]+)/, '1') (since ANTLR’s convention is to have lexer tokens named starting with a capital letter). I then used the Bison-to-ANTLR converter to generate an ANTLR 2.x grammar, which I hand-modified to produce a v3 grammar. Opening the resulting grammar in ANTLRWorks allows you to generate a DOT file from which GraphViz can then generate a jpeg image. I’ve also included visualizations of the Java 1.5 and Javascript (ECMAScript) grammars for comparison.

I haven’t even begun to absorb all the meanings from this picture, but one stark difference between Ruby and the other two is the node in the middle of the picture with a high concentration of outgoing edges. That node is called primary in the grammar definition, and it is probably one of the reasons that Ruby syntax is so flexible and forgiving. A primary node’s direct children apparently represent a large portion of the syntax, and explain why in Ruby a single statement can either be a literal, a method invocation (or series of them), a standalone expression (such as a < b ), all the way up to larger syntactic groupings such as if ... else ... end and begin ... rescue ... end , among many others.

Ruby

Java 1.5

Generated from Java 1.5 grammar on antlr.org

Javascript

Generated from ECMAScript grammar on antlr.org