Big News In Databases — Summer 2018

Don’t fall behind! Here’s the most important database news from the last few months.

SQL’s Rise Continues

Google’s shift towards SQL was already mentioned in my previous newsletter. This time a new article by Google further forwards an argument that Google initiated years ago: in the original Spanner paper (2012) Google said that strong consistency—like that offered by SQL databases—causes fewer problems than the eventual consistency model that is often employed by NoSQL systems. In the F1 paper (2013), Google described eventual consistency as an “unacceptable burden to place on developers”. In their recent article, Google has even put that message into the tile: Why you should pick strong consistency, whenever possible.

Moreover Amazon is not afraid to point out a few advantages when it comes to using SQL and the relational model. They tweeted this from their latest AWS re:Invent conference: SQL on the relational schema is orders of magnitude faster than JSON processing.

The marketing term NoSQL, which was the hippest buzzword just a few years back, is slowly becoming a synonym for a defect. Without SQL, there is something missing. It is no surprise that more and more systems seek to distance themselves from this term by introducing an SQL interface. Recently, a particularly well-established system opted for this path: Elasticsearch.

The adoption of SQL doesn’t stop there. Even traditional SQL databases continue to embrace SQL by introducing more modern SQL features into their dialects. Triggered by the release of MySQL 8.0, I’ve written a comprehensive article about this development.

On My Own Behalf: My Next Training A lot has happened since SQL-92. My week-long training is the update for experienced SQL users. Check it out now.

Cloud First

Oracle has completed a version number jump from 12.2 to 18 and released their new database, initially to the Oracle Cloud and to Engineered Systems only. In stark contrast to semantic versioning, this release is only at the level of a new patch. In the previous version numbering scheme, the release number would have been 12.2.0.2, i.e. only the last digit would have been incremented. From a developer’s perspective, the release notes are thus not very exciting.

Still, not everyone is so in love with the cloud. SAP demonstrated a particularly noteworthy failure. Two years after introducing their cloud solution SAP Anywhere, they cancelled this service—apparently, they only ended up with 30 customers in those two years.

Technology and Science

Of course the technology news of the past few month has been dominated by the Spectre and Meltdown security vulnerabilities. Those vulnerabilities cast a shadow on cloud solutions as they often share hardware among several customers. The newly discovered type of security vulnerabilities can break the separation so that a malicious client could spy on other clients that use the same hardware. As these weaknesses exploit a hardware bug, they are not limited to databases.

The so-called “fsyncgate” problem only affects systems that use buffered IO to store data persistently—such as some databases. At its core, the problem is that errors can go unnoticed when they occur while writing changed data back from the operating system cache to the disk. Consequently, these errors cannot be handled by the database. The problem was originally reported by Craig Ringer of 2ndQuadrant—a PostgreSQL consulting firm. The actual root cause might be an omission in the POSIX specification of the fsync() function: it does not explicitly specify the subsequent behavior after an error has been reported. Application and operating system developers therefore did not have common expectations for this scenario. Matthew Wilcox, a Linux kernel hacker working for Microsoft (!), explained this problem from the perspective of the Linux kernel at the PostgreSQL developer conference PgCon… and he apologized for the trouble.

According to the “shoot the messenger” principle, PostgreSQL has been heavily criticized for fsyncgate. Indeed, PostgreSQL suffers from this problem more than other databases as it doesn’t offer direct IO. However, other databases—and more generally all software that uses buffered IO—can be affected by this issue. MySQL with InnoDB in the default configuration on many Unix-like systems is one example.

Stay Updated Subscribe my blogs via e-mail at MailChimp.

Otherwise recent news was pretty good for PostgreSQL: DB-Engines.com named PostgreSQL the database of the year 2017. PostgreSQL also ranked first—ex aequo with Firebird—in a Code Quality Comparison of Firebird, MySQL, and PostgreSQL. With four times as many demerit points, MySQL ranks second and thus last in this comparison.

On the academic side, there was a new paper from Google about indexing—still my passion since I wrote SQL Performance Explained. The authors of this paper have put a bold claim in the abstract: “by using neural nets we are able to outperform cache-optimized B-trees by up to 70% in speed while saving an order-of-magnitude in memory”. Of course, that is only true once the network is trained. The training itself takes an extraordinary amount of resources so this idea is only applicable to mostly static data sets. It might be an interesting concept for the Google search engine, but it’s not applicable to everyday indexing.

New Major Releases

There have been some impressive releases in recent months:

New and Updated Content on Websites

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.