Angle Brackets and Curly Braces

Code Generation: The Real Lesson of Rails

by Bill Venners

March 16, 2006



Summary

In an article introducing Ruby on Rails' Active Record, Bruce Tate suggests that Java could enjoy some of the benefits of Rails by taking a wrapping rather than a mapping approach to persistence. I think this misses the point. What Rails really demonstrates is the benefit of code generation.


Bruce Tate has written an article, "Crossing borders: Exploring Active Record" published on IBM DeveloperWorks (see Resources), which introduces Active Record, the persistence layer of Ruby on Rails. In this article Tate compares Active Record's approach to Hibernate and JDO's approach by distinguishing "wrapping" from "mapping." He says:

In Hibernate, you'd usually begin development by working on your Java objects because Hibernate is a mapping framework. The object model becomes the center of your Hibernate universe. Active Record is a wrapping framework, so you start by creating a database table. The relational schema is the center of your Active Record universe.

Tate then provides a nice example introducing Active Record, and suggests:

The Java platform already boasts state-of-the-art mapping frameworks, but I now believe that it needs a groundbreaking wrapping framework. Active Record relies on language capabilities to extend Rails classes on the fly. A Java framework could possibly simulate some of what Active Record offers, but creating something like Active Record would be challenging, possibly breaking three existing Java conventions:

A persistence solution should work only on a Java POJO (plain old Java object). First and foremost, it would be difficult to create properties based on the contents of a database. A domain object might have a different API. Instead of calling person.get_name to set a property, you might use person.get(name) instead. At the cost of static type checking, you'd get a class built of metadata driven from a database.

A persistence solution should express configuration in XML or annotations. Rails bucks this trend through forcing naming conventions with meaningful defaults, saving the user an incredible amount of repetition. The cost is not great because you can override defaults as needed with additional configuration code. Java frameworks could easily adopt the Rails convention-over-configuration paradigm.

Schema migrations should be driven from the persistent domain model. Rails bucks this convention with migrations. The core benefit is the migration of both data and schema. Migrations also allow Rails to break the dependence on a relational database vendor. And the Rails strategy decouples the persistence strategy from the issue of schema migrations.

In each of these cases, Rails breaks long-standing conventions that Java framework designers have often held as sacred. Rails starts with a working schema and reflects on the schema to construct a model object. A Java wrapping framework might not take the same approach. Instead, to take advantage of Java's support for static typing (and the advantages of tools that recognize those types and provide features such as code completion), a Java framework would start with a working model and use Java's reflection and the excellent JDBC API to dynamically force that model out to the database.

The Real Lesson of Rails

My observation is that the main technique Rails uses to improve developer productivity is code generation, even though Ruby's dynamic nature makes the code generation less obvious. The technique is also called metaprogramming, which simply means writing programs that write programs. Nevertheless, code generation is an old technique, and one I've used several times throughout my career to improve productivity.

For example, back in the late 1980s I was working on a project in C that used a proprietary API to send SQL to an Informix database. There was one C file that we used as a layer between our application and the database. That C file defined C structures for each table, and functions that used SQL to store and retrieve those structures to and from the database. We updated our database schema every time we did a new release, which was around twice a year. So twice a year I found myself editing the SQL schema file that created the tables and also editing the C file that served as the layer. It dawned on me that all the information needed to generate the C file was contained in the SQL schema file, so I wrote a Yacc/Lex program that parsed the SQL schema file and generated the C file (i.e., it generated the database layer for our application). We used this generator again and again over subsequent years, and it was in retrospect a good investment. It saved us time, because it made the changes to the C file for us, and once I got the bugs out of the generator, it never made a mistake.

That old Yacc/Lex program demonstrated what Bruce Tate would refer to as a "wrapping" approach, because the C structures were based on the database schema. However, to me what is important is simply that I'm expressing my intention in one place, and using a tool to generate other pieces of my system that can be determined from that one specification. In the case of my Yacc/Lex tool, the specification was the SQL file that contained all the create table commands that defined our database schema.
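To make the idea concrete, here is a minimal sketch in Python of a schema-driven generator. It is not the Yacc/Lex program described above: it uses regular expressions instead of a real grammar, the type mapping in C_TYPES is an assumption made for illustration, and it only understands very simple CREATE TABLE statements.

```python
import re

# Hypothetical mapping from SQL column types to C field declarations.
# A real generator would cover far more types; these three are assumptions.
C_TYPES = {
    "INTEGER": "long {name};",
    "FLOAT": "double {name};",
    "CHAR": "char {name}[{size}];",
}

# Matches "CREATE TABLE name ( ... );" and captures the name and column list.
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;", re.IGNORECASE | re.DOTALL
)
# Matches "column_name TYPE" or "column_name TYPE(size)".
COLUMN_RE = re.compile(r"(\w+)\s+(\w+)(?:\((\d+)\))?")


def generate_structs(schema_sql):
    """Return C struct definitions, one per CREATE TABLE statement."""
    structs = []
    for table, body in TABLE_RE.findall(schema_sql):
        fields = []
        # Splitting on commas is a simplification: it would break on
        # types such as DECIMAL(10,2), which this sketch does not support.
        for column in body.split(","):
            match = COLUMN_RE.match(column.strip())
            if match is None:
                raise ValueError("cannot parse column: %r" % column)
            name, sql_type, size = match.groups()
            template = C_TYPES.get(sql_type.upper())
            if template is None:
                raise ValueError("unsupported SQL type: %s" % sql_type)
            # Reserve one extra byte for the terminating NUL in CHAR fields.
            fields.append(
                "    " + template.format(name=name, size=int(size or 0) + 1)
            )
        structs.append("struct %s {\n%s\n};" % (table, "\n".join(fields)))
    return "\n\n".join(structs)


schema = """
CREATE TABLE customer (
    id INTEGER,
    name CHAR(40)
);
"""
print(generate_structs(schema))
```

The schema file stays the single place where the table layout is written down; the C declarations are derived from it every time the generator runs.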

Another place to express intent is in a Domain Specific Language (DSL) or "little language." In our new architecture at Artima, for example, as one step of our build we generate Java code from little programs we write in DSLs we created using JavaCC. We generate major portions of our controllers and entity layer that way. It minimizes the amount of code we write, because we express our controllers and entities in a concise DSL, and once we get the generator working, it never makes a mistake when writing the Java code. In the case of entities, we use Hibernate to do O/R mapping. Our entity generator creates POJOs, manager classes that have CRUD and other persistence methods, Hibernate XML mapping files, database triggers, and some SQL that generates database sequences. We use Hibernate's SchemaUpdate tool to actually synchronize the database so it matches the schema we indirectly specify in our DSL scripts.
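As a rough illustration of generating Java from a little language, the following Python sketch turns a tiny entity description into POJO source text. The DSL syntax shown here ("entity Name" followed by "field: Type" lines) is invented for this example and is not Artima's actual DSL, and the parsing is done by hand rather than with JavaCC.

```python
def generate_pojo(spec):
    """Turn a tiny, hypothetical entity DSL into Java POJO source text."""
    lines = [line.strip() for line in spec.strip().splitlines() if line.strip()]
    keyword, class_name = lines[0].split()
    if keyword != "entity":
        raise ValueError("spec must start with 'entity <Name>'")

    # Each remaining line has the form "fieldName: JavaType".
    fields = []
    for line in lines[1:]:
        name, java_type = (part.strip() for part in line.split(":"))
        fields.append((name, java_type))

    out = ["public class %s {" % class_name]
    for name, java_type in fields:
        out.append("    private %s %s;" % (java_type, name))
    for name, java_type in fields:
        prop = name[0].upper() + name[1:]  # name -> Name, for getName/setName
        out.append("")
        out.append(
            "    public %s get%s() { return %s; }" % (java_type, prop, name)
        )
        out.append(
            "    public void set%s(%s %s) { this.%s = %s; }"
            % (prop, java_type, name, name, name)
        )
    out.append("}")
    return "\n".join(out)


spec = """
entity Person
  name: String
  age: int
"""
print(generate_pojo(spec))
```

Three lines of specification expand into a class with fields, getters, and setters. The same parsed field list could feed additional templates, for example for a mapping file, which is where the leverage comes from.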

Static versus Dynamic Code Generation

Ruby's dynamic features make Active Record look a bit different from what I've done in the past with code generation in C, C++, and Java. In Rails, code is effectively generated at runtime rather than before compilation. One thing we do in our controller generator, for example, is pull out the request parameters and put them in instance variables. This saves us time because we never have to write code to extract the parameters; they are already in instance variables that we can just use from our controllers. Rails does a similar thing by adding an instance variable to the controller dynamically for each request parameter, and initializing it with the parameter's value. Rails does this dynamically to each controller object it creates to handle a request, after the request comes in.
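The dynamic variant of this idea can be sketched in a few lines of Python. This illustrates the general technique of attaching instance variables at runtime, not Rails' actual implementation; the Controller and OrderController classes and the parameter names are hypothetical.

```python
class Controller:
    """Stand-in base class; it does not belong to any real framework."""

    def __init__(self, params):
        # Attach one instance variable per request parameter at runtime.
        for name, value in params.items():
            # Refuse names that are not plain identifiers or that would
            # shadow an existing method or attribute of the controller.
            if not name.isidentifier() or name.startswith("_"):
                raise ValueError("bad parameter name: %r" % name)
            if hasattr(type(self), name):
                raise ValueError("parameter would shadow %r" % name)
            setattr(self, name, value)


class OrderController(Controller):
    def summary(self):
        # 'customer' and 'count' are never declared in this class;
        # they exist only because the base class attached them.
        return "%s ordered %d items" % (self.customer, int(self.count))


controller = OrderController({"customer": "Ada", "count": "3"})
print(controller.summary())
```

Nothing is written to disk and there is no generation step to wait for, but there is also no generated source file to read when something goes wrong.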

One difference between static and dynamic code generation for the developer is that in the Rails case, you don't have to wait for static code generation before trying a change, and you don't have to look at the generated code. In my opinion, Rails applications feel like they require so much less code in great part because the generated code is hidden (but also because Ruby is concise in general). In our case at Artima, we see lots of generated .java files lying around in the midst of our project. On the other hand, we generate nice JavaDoc comments with our generated code, and so we get nice API documentation to look at if we want. If there's a problem or a question, we can go look at the generated code too. In Rails, it is just kind of magic. That makes it feel very lightweight, but if you ever encounter a problem with that magic, it might be more painful to solve it.

The other main difference for the developer is that static code generation is not part of the build cycle in Rails. This is a two-edged sword, though. When doing the example programs in Rails, at least, everything is quite fast: I make a change, go to the browser, and immediately try it. (Doing simple examples is as far as I've gotten with Rails.) But as the number of database tables grows, there may be some perceptible lag in each iteration while waiting for all the dynamic code generation to take place.
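To show what runtime generation from a schema can look like, here is a minimal Python sketch that builds a class by reflecting on a table, using the standard sqlite3 module. It only illustrates the wrapping idea and is not how Active Record is implemented; wrap_table and the people table are hypothetical, and the sketch assumes every table has an id column.

```python
import sqlite3


def wrap_table(conn, table):
    """Build a class at runtime whose attributes mirror a table's columns."""
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # PRAGMA table_info yields one row per column; index 1 is the name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]
    if not columns:
        raise ValueError("no such table: %s" % table)

    def __init__(self, **values):
        # One attribute per column, defaulting to None.
        for column in columns:
            setattr(self, column, values.get(column))

    @classmethod
    def find(cls, row_id):
        # Assumes the table has an 'id' column (an assumption of this sketch).
        query = "SELECT %s FROM %s WHERE id = ?" % (", ".join(columns), table)
        row = conn.execute(query, (row_id,)).fetchone()
        return None if row is None else cls(**dict(zip(columns, row)))

    # type() creates the class object; no source file is ever written.
    return type(table.capitalize(), (object,), {"__init__": __init__, "find": find})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO people (id, name, age) VALUES (1, 'Ada', 36)")

People = wrap_table(conn, "people")
person = People.find(1)
print(person.name, person.age)
```

Each wrapped table costs a schema query and a class construction while the program is running, which is the kind of work that a static generator would instead do once at build time.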

To me the important lesson for the Java community to take away from Rails is that you should consider using code generation where appropriate. I think doing code generation is something you have to be careful about, however. If you make a code generator, then you have to pay time up front building the generator, and you have to support the generator thereafter. It makes sense when you're going to get a good payback, which means you'll be using it regularly. It also helps if you can use an existing tool to generate code. For example, you could use Ruby on Rails. Or, you could use existing Hibernate tools to reverse engineer mapping files and POJOs from existing database schemas.

A database layer is often a good candidate for code generation, because these layers need to be updated whenever the database schema changes. If you only need to do something once, then writing a code generator is probably a bad idea. And even if you think you'll be doing this over and over, until you've done it several times by hand you probably don't know enough to automate. In our current architecture effort, we probably built about a dozen controllers by hand, and a dozen entities, before we felt we knew enough to automate.

I think programmers often don't like code generators that come from the outside, because like any framework they will often only take you 90% of the way you need to go. After that you start fighting the framework, which in a code generator often means you want to tweak the generated code. But that usually defeats the purpose of the code generator, which is to give you huge leverage. An exception is something like the scaffold generation in Rails, which is intended to just be a quick start that you edit and carry forward. Our in-house code generators are the kind where we aren't supposed to touch the generated code, and we solve the 90% problem because when we need to change the generated code, we change the code generator (since we wrote it, we can change it).

I think dynamic languages like Ruby and Python, because they make metaprogramming easy, push people in this direction of writing software that creates software, and that's a great thing. However, I think that with tools like JavaCC and ANTLR, it is relatively easy to define a code generator for Java. Authors of Java frameworks could use this technique, or you can use it yourself on individual projects. If you find yourself frustrated doing repetitive programming tasks in Java, the problem may not be with Java but with how you are using it. That frustration could be a signal that you should consider adding some automation via code generation.

Resources

Bruce Tate's IBM DeveloperWorks article is entitled, "Crossing borders: Exploring Active Record"

http://www-128.ibm.com/developerworks/java/library/j-cb03076/

Active Record documentation:

http://api.rubyonrails.com/classes/ActiveRecord/Base.html

Wikipedia on metaprogramming:

http://en.wikipedia.org/wiki/Metaprogramming_(programming)

Hibernate is a popular object/relational mapping framework for Java:

http://hibernate.org

JavaCC is a parser generator for Java:

https://javacc.dev.java.net/

ANTLR is a parser generator written by Terence Parr:

http://www.antlr.org/

SchemaUpdate is the tool we use to synchronize our database schema with our generated Hibernate mapping files in Artima's new architecture:

http://www.hibernate.org/hib_docs/v3/api/org/hibernate/tool/hbm2ddl/SchemaUpdate.html

Erich Gamma talked about the dangers of building and using frameworks, which he called "frameworkitis," in this interview:

http://www.artima.com/lejava/articles/reuse3.html

About the Blogger

Bill Venners is president of Artima, Inc., publisher of Artima Developer (www.artima.com). He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Active in the Jini Community since its inception, Bill led the Jini Community's ServiceUI project, whose ServiceUI API became the de facto standard way to associate user interfaces to Jini services. Bill is also the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and coauthor with Martin Odersky and Lex Spoon of the book, Programming in Scala.

This weblog entry is Copyright © 2006 Bill Venners. All rights reserved.