Big News In Databases — Fall 2018

Don’t fall behind! Here’s the most important database news from the last few months.

The Revival of Great SQL Ideas

The introduction of SQL features into NoSQL systems is a recurring pattern these days. Apache Flink, for example, has just introduced the match_recognize clause. Couchbase also follows this path as it announced the introduction of SQL window function into the next version of their product. However, SQL is more than a query language with powerful features. Constraints, transactions and transaction isolation are also integral parts of SQL. In this issue, I’d like to focus on how NoSQL systems try to achieve the consistency guarantees that SQL databases provide.

To give you some background, I present you with two quotes from the article “Thank You for Your Help NoSQL, but We Got It from Here” that Rick Negrin of MemSQL published in July:

Eventually, the large scale cloud services (Google, Facebook, Yahoo, Microsoft, and others) all built their own custom systems to handle scale demand.

In this sentence, there is a very important message that is easy to miss: these companies build tailor-made systems for their very specific needs. If you have the resources to re-implement a database for your very specific needs, you will clearly build one that is better at meeting those needs than a generic off-the-shelf product…and of course you will leave out all the functionality you don’t need.

Then, those companies started to talk about their solutions.

Eventually open source systems started popping up using those same ideas, and the NoSQL movement was born.

You might know the rest of the story: the NoSQL vendors pushed their solutions as the future of databases. Tech people tend to love everything that is new, especially when it is backed by a convincing buzzword like NoSQL.

Unfortunately, many NoSQL users have missed the point: each of those systems was meant to solve one very specific problem, and only that problem. NoSQL systems are not generic databases that happen to outperform the competitors for one use case. Instead, they are “one-trick ponies” that can do only one thing. Even though they can do that one trick very well, they are not very helpful for other things.

Back to the year 2018. Today we can observe that the NoSQL vendors are teaching more tricks to their ponies—mostly tricks SQL databases have been able to do for ages. The introduction of multi-document transactions to MongoDB 4.0—even though it is rather limited—is probably the best example from the past few months. In general, NoSQL systems have had a hard time regarding consistency. The current trend is that even distributed databases are taking strict consistency more seriously again.

Google Spanner is such a system that values consistency over availability. Of course, Google started to talk about Spanner and eventually (or inevitably?) open source systems started popping up using those same ideas. Déjà vu? Well, it’s basically what I quoted above. Once again, history repeats itself.

So, what’s going wrong this time? This is the question that Daniel Abadi explores in his article “NewSQL database systems are failing to guarantee consistency, and I blame Spanner”. The article gives a brief introduction into why strong consistency is desirable, why NoSQL systems gave up on it in the first place, and how some more recent systems strive for strong consistency in a distributed environment. The bottom line is that systems that try to re-implement the ideas from Google Spanner cannot give the same guarantees because Spanner uses highly precise time sources such as atomic clocks to achieve consistency. If you use the Spanner ideas without such a time source, you cannot reach the same level of consistency and availability that Spanner does. According to Abadi, that is the mistake some systems make.

On My Own Behalf: New Training Dates 2019 A lot has happened since SQL-92. My week-long training is the update for developers. Check it out now.

Once more the question arises: which database to use, especially when you consider that there is an agreement that strong consistency is desirable? I’d say just do it the Amazon way, and use an old-fashioned SQL database.

Yes, Amazon still uses Oracle Databases. Even though their Dynamo paper was one of the reasons NoSQL became a thing, and after they introduced several other database services like Redshift, Aurora, Neptune, and just in November Timestream and Quantum Ledger.

Nevertheless Amazon uses Oracle Database to run their business. Even though they don’t want to. In August they announced their plans to get rid of Oracle Databases by 2020, which “will be hard” (Larry Ellison, a week later). On Prime Day the migration has already caused an outage in one of its biggest warehouses, but the migration makes progress nevertheless and was the confirmed target as of late November. It seems like it takes more than a decade to build enough one-trick ponies to replace a traditional SQL databases. In the end, they will still use SQL—just not the Oracle Database.

The lesson we can learn from this story is the following: start with a generic database. In retrospect, this is what most successful companies did. SQL database are a good choice because they can do many tricks. You will need those tricks, especially when your requirements are volatile.

When you eventually get to the point where the database reaches its limits, if that ever even happens, optimize first (seek professional help!), then identify the single issue that causes the biggest pain. By now, chances are good there is a ready-to-use one-trick pony that can handle that pain point better than the generic database. Offload this pain point to the one-trick pony (seek more professional help!). If needed, repeat. Don’t give the flexibility and consistency of SQL up until you absolutely have to.

The modern and successful architecture that is commonly used today is to have an SQL database that is sometimes surrounded by some one-trick ponies to take care of a few pain points. The contribution the NoSQL movement has made to this architecture is that you don’t need to build the one-trick ponies by yourself like the first movers had to.

Cloud Offering: Trade Your Hardware for Security Updates

Like in any other business, cloud providers have creative ways to convince potential customers to use their services. I found the one from Microsoft in July particularly funny. As the end-of-life for the releases 2008 and 2008R2 of Windows and SQL Server approaches, Microsoft offers three more years of free security patches if those systems are migrated to the Azure cloud. Instead of paying for new licenses, you can choose to pay for “rented hardware” in the Azure cloud. Isn’t that tempting?

Technology and Science

Just in Time (JIT) compilation is currently a hot topic for SQL databases. The first major database that introduced this technology was Microsoft SQL Server 2014 (aka Hekaton). But Oracle also has a big bet running on its Java JIT implementation Graal, and the latest PostgreSQL release introduced JIT compilation into a free open-source database.

Although JIT compilation can improve performance considerably, vendors have to put some additional effort into maintenance in order to offer it. That’s why two scientists from the Netherlands tried to automate some parts of this process. They used a part from the SQLite source code—the Virtual Database Engine—for automatic JIT compilation.

Unrelated to that, yet another interesting paper about compiling queries was published in 2015: Functional Pearl: A SQL to C Compiler in 500 Lines of Code.

Stay Updated Subscribe my blogs via e-mail at MailChimp.

Historic Document

For those interested in the early days of SQL, here is an interesting read for you: The 1995 SQL Reunion: People, Projects, and Politics. These are the notes from a meeting of people involved with the development of SQL and its products in the 1970s and 1980s.

It covers a wide variety of topics, such as the fact that a day-long discussion took place to decide which keyword to use for the null value. But my favorite part is this:

Don Chamberlin: Larry called up. Larry’s company in those days was not called Oracle. [...]. He had heard about the System R prototype and he wanted to make sure that his product was fully compatible with it, right down to the error code values. We went and asked Frank, “Can we give our error codes to this guy Ellison” and Frank said, “No - those are IBM Confidential.” Franco Putzolu: That was the only part that was confidential.

In the meanwhile, the SQL standard has come to define error codes—it is just that barely any vendors follow them.

New Major Releases

There have been some major releases in recent months:

Note that MySQL now follows an “agile” release cycle: minor release can also introduced new features. For example: Instant Add Column (8.0.12), function-based indexes (directly, in create index; 8.0.13).

Microsoft has also released a public preview of SQL Server 2019.

New Recordings and Slides

Some of my conference sessions were recorded and you can find them below. Also, have a look at my conference schedule to see when and where I’m speaking next.

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.