Since the rise of the Web, SQL-based relational databases have been the dominant structured storage technology behind online applications. The past few years have seen the emergence of the cloud as a compelling environment for online application development, bringing true utility computing into the infrastructure pantheon. But the cloud and SQL do not mix well, and multiple efforts are now underway to offer viable alternatives to the venerable database. In this article, I'll review the forces that have led to this shift, and I'll argue that while relational databases are by no means doomed, they will soon be joined in the cloud, and possibly outshone, by new non-relational database technologies.

The trouble with SQL

For most developers, SQL-based relational databases work just fine. Support for SQL is extremely broad, setup has become reasonably straightforward, there are plenty of resources to help with management, and modern hardware allows a single machine to handle a lot of transactions quickly. For smaller projects, SQL databases can offer something close to a plug-and-play storage environment. But there are weaknesses, and for some teams these are big problems.

Foremost among the weaknesses of relational databases is their inability to scale horizontally. Some database packages allow teams with large budgets to scale vertically, to a point, using expensive "big iron" hardware, but others (most notably MySQL) run into architectural limitations long before the hardware is exhausted. (See page two of my prior article for a brief discussion.) Either way, there is a ceiling.

Despite steady improvement in the field of clustered databases, in the relational world these remain fairly limited both in feature set and in scalability. We can confidently say that these limitations will not go away any time soon: Brewer's Theorem (aka the CAP Theorem), formally proved in 2002 by Gilbert & Lynch, says in effect that a system cannot have high Consistency, Availability, and Partition Tolerance simultaneously. SQL offers a variety of strict consistency guarantees (both ACID transactional semantics and data-integrity tools such as foreign keys), and for online applications, high availability is a must. Given this, partition tolerance (in effect, the system's ability to withstand internal latency and failures) must be low, limiting the size of any reliable clustered database technology with SQL's semantics.

Additionally, managing relational databases in a production environment can become labor-intensive and error-prone. Each database package comes with its own world of configuration options, performance sensitivities, bugs, and tools. While these issues usually start small, they can become a drain on developers' time and resources as the product matures and its needs become more complex. This complexity of management arises from the complexity of the database packages themselves; it is their very breadth of capabilities that makes them difficult to manage.

Finally, SQL encourages (but does not require) developers to perform data processing in the database itself, in addition to data storage. Much of the time, the easiest way to map two tables together is to use a JOIN, and the easiest way to sort the results is with an ORDER BY, and so forth. Doing so adds load to the database's CPU, often a precious resource, while saving load on the application host—a bad trade-off that leads more quickly to the relational database's scaling wall.
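To make that trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users and orders tables, the sample rows, and the function names are all hypothetical, invented for illustration. The first function lets the database do the JOIN and ORDER BY; the second issues two plain reads and does the mapping and sorting on the application host. Both return the same rows, so the only difference is where the CPU time is spent.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 5.0), (12, 1, 12.5);
    """)
    return conn


def join_in_database(conn):
    """The database maps the tables together and sorts the result."""
    return conn.execute(
        "SELECT u.name, o.total "
        "FROM orders o JOIN users u ON u.id = o.user_id "
        "ORDER BY o.total DESC"
    ).fetchall()


def join_in_application(conn):
    """The database only serves simple reads; the application does the rest."""
    # Build an id -> name lookup table from the first read.
    users = dict(conn.execute("SELECT id, name FROM users"))
    orders = conn.execute("SELECT user_id, total FROM orders").fetchall()
    # Map each order to its user's name in application memory.
    rows = [(users[user_id], total) for user_id, total in orders]
    # Sort on the application host instead of with ORDER BY.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    conn = make_demo_db()
    print(join_in_database(conn))
    print(join_in_application(conn))
```

The second version is more code, and it moves more data over the wire, which is exactly why the first is usually the path of least resistance. The difference is that application hosts are typically far easier to add than database capacity.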

These issues alone have spurred the development of relational-database alternatives. But it is the cloud which will ultimately drive their success.

The promise of utility computing

The move to the cloud is arguably the most visible force in the world of online application development. Not everyone is moving, but as I argued in my last article, the cloud is going to be an increasingly common backbone for applications. From a developer's perspective, cloud computing platforms (particularly in the up-and-coming Platform as a Service, or PaaS, flavor) ideally offer infrastructure components as utility services rather than as discrete servers running software. This simplified approach not only saves development time but also enables application scalability by offering what amounts to inexhaustible resources.

In this landscape, the conventional relational database is something of an alien. SQL itself enforces a server-centric view of the world: clients persistently connect to individual servers, each with its own namespace and no awareness of the others. Database servers are long-running and have configurations fairly specific to the hardware on which they run. Unpredictable resource contention means that sharing server resources between customers is risky beyond very small workloads. Because of this, cloud platform providers are offering relational databases as dedicated servers running on virtual machines, e.g., Amazon's MySQL-based RDS, Heroku's PostgreSQL-based database units, etc. But this approach resembles managed hosting much more than cloud computing: it is not a utility service. To offer developers truly scalable structured storage services, providers must turn away from SQL.