For the last few days I’ve been struggling to bend Rails to my will regarding the proper way to assure data consistency. Today I made some progress. This builds upon some research I did a few months ago, and hopefully this is a more or less complete solution to the problem of making Rails work the way I want it to regarding test databases.



DHH has clearly stated that he does not like a smart database. This is common among application developers, particularly in the agile methods camp, in that they generally appear not to understand relational set theory, or if they do, they believe that it is inherently inferior to object oriented methods (which lack a theoretical basis, as Fabian Pascal will happily shout at anyone who will listen). I gather from DHH’s statements that he merely is trying to practice Don’t Repeat Yourself (a.k.a. DRY, one of the most important values of Rails). I gather from Rails itself that he either respects the need of some folks to disagree with him enough to provide hooks to bypass ActiveRecord, or that he at least agreed with someone else’s patch. By this I mean that there are ways around the ORM features of ActiveRecord, to do raw SQL and to execute raw DDL at database creation time, which implies that he isn’t trying to force his opinions on others, but rather to make it easier to do things his way than to do them a different way.

Fair enough. Rails is opinionated software, as DHH often says, and I have found several cases where letting go of my particular way of doing things has been fine, given that Rails has a different but equally valid way of doing things that is made super easy by the framework.

However, I disagree with his decision to keep the DB stupid, for two reasons.

First, I prefer to put logic where it belongs, rather than gathering it all in one place. AJAX, and in particular Google Maps, is a good example of presentation logic going where it belongs, making the whole application work better. SQL RDBMSs have features that can be abused, and in some cases these features are there because a wrong-thinking but wealthy client demanded them, but most of the advanced features that a “Real Database” has are there so that you can protect yourself against data loss or data corruption. The database is in a unique position to let you declare rules for things that must always be true, and then to trust that the database will never violate those rules. Older versions of MySQL were notably lacking in these features and their absence was justified by MySQL staff who basically said “you don’t need that, and if you want it, you’re confused.” Rails has inherited some of these damaged assumptions from MySQL, leaving basic relational features like foreign keys out of the framework(!). Fortunately Rails allows plugins, and there is a set of foreign key plugins that overturn this decision. But in general, if the database belongs to your application, that’s not an excuse to move database functionality into application code. By calling it your application’s database (as opposed to an Integration Database) you imply that it is part of your application, and therefore any rules or procedural code in it is necessarily also part of your application. You can’t monopolize the database and say that no one else has any business using it, while at the same time holding it at arm’s length and saying it’s not a valid part of the application. It is. No, business rules probably don’t belong in the database, but basic data consistency maintenance (in rule or procedural form) does.

I’m being charitable here, but my experience with individual practitioners of the Stupid Database Method invariably ends with me finding out that they don’t really understand databases at all (hence the desire to abstract the database away entirely with a driver plugin architecture topped by an ORM layer, lest they have to understand how a specific database product works), and would rather remain ignorant and reinvent the same functionality in the application layer or in the ORM layer. (It’s a case of “when all you have is a hammer, everything looks like a nail”, where the hammer is a general-purpose programming language, and you’re looking at a problem of high performance concurrent transactional programming.)

Not surprisingly, the database of choice for these folks is the least featureful, lowest cost, easiest to install one available. Because naturally it’s much more agile to write and debug new multithreaded transactional code in a high level dynamic language. than it is to get the same functionality for free in a thoroughly tested product that’s written in C. Right? Perhaps DHH is not one of these people. I assume he is not, again based on what he has said and coded. But nevertheless, the folks I’ve talked to personally who agree with his point of view are all coming from a point of view of willful ignorance.

Secondly, I prefer to employ defense in depth against data errors. Transient errors can have workarounds, but data errors are permanent, and that means that if your data is valuable, the damage done can be irreversible. Just because it’s possible for correct application code to avoid race conditions, improper escaping, etc. doesn’t mean that you should put all your eggs in that basket. When the price of data corruption is high (i.e. if you value the data in your database) then it’s worth the duplication of effort: test the application code, but also put a constraint in the database that will catch things the application code missed.

This is the same sort of thinking that leads to using automated unit tests, then functional tests, then integration tests, and then some manual QA, all overlapping. Duplication of effort? Yes. Worth it? Yes. Database bugs are arguably the worst kind of bugs to find in production, so they merit extra code that maybe isn’t absolutely necessary for the application to work, but is nice to have since you’d like to sleep at night.

So, I feel justified in wanting to put CHECK constraints and triggers in my database.

The implementation details are discussed in part 2.