Do object-relational mappers (ORMs) really improve application development?

When I started developing web applications, I used perl. Not even all of perl, mostly just a bunch of “if” statements and an occasional loop that happened to be valid perl (aside: I remember being surprised that I was allowed to write a loop that would run on a shared server, because “what if it didn’t terminate?!”). I didn’t use databases; I used a mix of files, regexes to parse them, and flock to control concurrency (not because of foresight or good engineering, but because I ran into concurrency-related corruption).

I then made the quantum leap to databases. I didn’t see the benefits instantaneously[1], but it was clearly a major shift in the way I developed applications.

Why was it a quantum leap? Well, many reasons, which are outside of the scope of this particular post, but which I’ll discuss more in the future. For now, I’ll just cite the overwhelming success of SQL over a long period of time; and the pain experienced by anyone who has built and maintained a few NoSQL applications[2].

I don’t think ORMs are a leap forward; they are just an indirection[3] between the application and the database. Although it seems like you could apply the same kind of “success” argument, it’s not the same. First of all, ORM users are a subset of SQL users, and I think there are a lot of SQL users that are perfectly content without an ORM. Second, many ORM users feel the need to “drop down” to the SQL level frequently to get the job done, which means you’re not really in new territory.

And ORMs do have a cost. Any tool that uses a lot of behind-the-scenes magic will cause a certain amount of trouble — just think for a moment on the number of lines of code between the application and the database (there and back), and imagine the subtle semantic problems that might arise.

To be more concrete: one of the really nice things about using a SQL DBMS is that you can easily query the database as though you were the application. So, if you are debugging the application, you can quickly see what’s going wrong by seeing what the application sees right before the bug is hit. But you quickly lose that ability when you muddy the waters with thousands of lines of code between the application error and the database[4]. I believe the importance of this point is vastly under-appreciated; it’s one of the reasons that I think a SQL DBMS is a quantum leap forward, and it applies to novices as well as experts.

A less-tangible cost to ORMs is that developers are tempted to remain ignorant of the SQL DBMS and the tools that it has to offer. All these features in a system like PostgreSQL are there to solve problems in the easiest way possible; they aren’t just “bloat”. Working with multiple data sources is routine in any business environment, but if you don’t know about support for foreign tables in postgresql, you’re likely to waste a lot of time re-implementing similar functionality in the application. Cache invalidation (everything from memcache to statically-rendered HTML) is a common problem — do you know about LISTEN/NOTIFY? If your application involves scheduling, and you’re not using Temporal Keys, there is a good chance you are wasting development time and performance; and likely sacrificing correctness. The list goes on and on.

Of course there are reasons why so many people use ORMs, at least for some things. A part of it is that application developers may think that learning SQL is harder than learning an ORM, which I think is misguided. But a more valid reason is that ORMs do help eliminate boilerplate in some common situations.

But are there simpler ways to avoid boilerplate? It seems like we should be able to do so without something as invasive as an ORM. For the sake of brevity, I’ll be using hashes rather than objects, but the principle is the same. The following examples are in ruby using the ‘pg’ gem (thanks Michael Granger for maintaining that gem!).

First, to retrieve records as a hash, it’s already built into the ‘pg’ gem. Just index into the result object, and you get a hash. No boilerplate there.

Second, to do an insert, there is a little boilerplate. You have to build a string (yuck), put in the right table name, make the proper field list (unless you happen to know the column ordinal positions, yuck again), and then put in the values. And if you add or change fields, you probably need to modify it. Oh, and be sure to avoid SQL injection!

Fortunately, once we’ve identified the boilerplate, it’s pretty easy to solve:

# 'conn' is a PG::Connection object def sqlinsert(conn, table, rec) table = conn.quote_ident(table) rkeys = rec.keys.map{|k| conn.quote_ident(k.to_s)}.join(",") positions = (1..rec.keys.length).map{|i| "$" + i.to_s}.join(",") query = "INSERT INTO #{table}(#{rkeys}) VALUES(#{positions})" conn.exec(query, rec.values) end

The table and column names are properly quoted, and the values are passed in as parameters. And, if you add new columns to the table, the routine still works, you just end up with defaults for the unspecified columns.

I’m sure others can come up with other examples of boilerplate that would be nice to solve. But the goal is not perfection; we only need to do enough to make simple things simple. And I suspect that only requires a handful of such routines.

So, my proposal is this: take a step back from ORMs, and consider working more closely with SQL and a good database driver. Try to work with the database, and find out what it has to offer; don’t use layers of indirection to avoid knowing about the database. See what you like and don’t like about the process after an honest assessment, and whether ORMs are a real improvement or a distracting complication.

[1]: At the time, MySQL was under a commercial license, so I tried PostgreSQL shortly thereafter. I switched between the two for a while (after MySQL became GPL), and settled on PostgreSQL because it was much easier to use (particularly for date manipulation).

[2]: There may be valid reasons to use NoSQL, but I’m skeptical that “ease of use” is one of them.

[3]: Some people use the term “abstraction” to describe an ORM, but I think that’s misleading.

[4]: The ability to explore the data through an ORM from a REPL might resemble the experience of using SQL. But it’s not nearly as useful, and certainly not as easy: if you determine that the data is wrong in the database, you still need to figure out how it got that way, which again involves thousands of lines between the application code that requests a modification and the resulting database update.