Big News In Databases — Fall 2019

Twice a year, I review the news articles from the past six months and report on the most interesting ones…it’s that time again! Subscribe the newsletter to receive future issues.

The Successor to SQL?

In the previous issue I reported on how the SQL standard is currently being extended with new features to query property graph data, and that it might even get stream-processing capabilities. This time I’ll look at an attempt to replace SQL all together.

In August 2019 Amazon announced their new query language PartiQL (pronounced: particle) as the “one query language for all your data”. When taking a closer look, it turns out to be more of a rebranding of SQL++ than a new invention.

The story starts in 2014 when the paper about SQL++ was published. The original version of the paper was a comparison of the capabilities of various query languages such as SQL-on-Hadoop, NoSQL and NewSQL. As a means to an end, SQL++ was created as a hypothetical super-language that supports all of the features of the surveyed languages. The comparison itself was basically a discussion about how the analyzed languages support the SQL++ features.

It was only in later versions of the paper that SQL++ itself moved into the focus. One reason might have been that some vendors where showing interest in implementing that language. One of them was Couchbase—they considered aligning their own language N1QL with SQL++. In the meantime, Couchbase announced their SQL++ implementation N1QL for Analytics. This is not to be confused with N1QL for Query’s—formally N1QL—, which is not SQL++ compatible. Relevant XKCD comic.

At the same time, one of the authors of the SQL++ paper was working at Amazon to establish this super-language as a new norm, i.e. PartiQL. The changes compared to SQL++ are not so much in the language but rather in the ecosystem around it. One example is that there is an (incomplete) open-source reference implementation of PartiQL.

The question whether or not PartiQL can be established as the one query language for all data cannot be simply dismissed by noting that others have failed at this before. PartiQL is not as naive as other attempts in this field. One important aspect is that PartiQL is SQL compatible. Even though PartiQL covers only a small part of SQL-92, there are still many valid SQL queries that are at the same time valid PartiQL queries. That simplifies the adoption. Further, PartiQL cannot be as easily merged into the SQL standard as the examples mentioned above. The essential difference between PartiQL and SQL—dynamic vs. static typing—goes to the heart of SQL. The general adoption of dynamic typing for SQL is unimaginable to me. The only realistic integration of PartiQL into the SQL standard I can see is the introduction of “dynamic types” that are accessed like SQL arrays and SQL object, but don’t apply static typing. A scoped dynamic typing so to say. That could be the best of both worlds.

Cloud Wars: Punitive Tariff for the Switch to the Cloud

In a recent forecast, Gartner sees a 17% revenue growth for public cloud offerings in 2020. No wonder software vendors are generally simplifying the migration of existing on-premises installations into the cloud. It is therefore quite surprising that Microsoft recently did the opposite by introducing a new migration hurdle.

New Training Dates A lot has happened since SQL-92! Checkout my new modern SQL training dates.

The most important thing first: The new hurdle only affects a very specific scenario. It is about using dedicated hardware in the cloud environments of Alibaba, Amazon, Google and also Microsoft itself. In that case, on-premises licenses that were bought from October 2019 onwards are not automatically eligible to be used in a “bring your own license” setup. For that, you now need to additionally buy Microsoft Software Assurance, which increases the yearly cost by 25%.

Technology and Science

At long last, MySQL supports the hash join algorithm! This algorithm is often best for large amounts of data and has been used by other products for decades. However, today I don’t want to discuss hash join; instead, I would like to provide some background and cover of why it was only recently introduced to MySQL.

The story begins in the 1980s—long before the first MySQL release. Towards the end of this decade the so-called Volcano model for evaluating queries was presented (paper). The key idea was that all operations should have the same interface so that they can be arbitrarily combined into a greater whole, a little like Lego bricks. The system has established itself and is known to many database users as the execution plan.

Initially, MySQL—and thus MariaDB—didn’t implement the Volcano model. Instead, assumptions about how the various operations can be combined sneaked into the code and limited its flexibility. The introduction of new operations, for example hash join, was rather difficult. Work-List Item 11785 has further details. Eventually Oracle decided to implement the Volcano model for MySQL. The new implementation was released with MySQL 8.0.18 in October 2019.

Introducing new operations is comparably simple now. The hash join is just the one example. Another one is EXPLAIN ANALYZE, which uses the common interface between the execution plan operators for performance instrumentation. The syntax as well as the output format were copied from PostgreSQL.

Another, unrelated publication about the Volcano model came from CockroachDB in late October 2019: They extended the interface between the operations of the Volcano-Model to pass multiple rows in one go. Once they also changed the operations to take advantage of that, they observed a 4-fold speed increase for some queries. The same concept was also applied to SQL Server 2019: There it is called batch mode, which was originally introduced for the columnar storage system. Starting with Version 2019, SQL Server can also use batch mode when accessing tables stored in a row-by-row fashion.

Finally, there is one more paper I’d like to mention: APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. It aims to automatically detect performance regression between database versions. After running the algorithm for two months, it has identified a total of 11 bugs in SQLite and PostgreSQL. Two of them have already been fixed, and five more were confirmed by the respective vendor.

New Versions

A more detailed analysis of the new SQL functionality of these releases will be published on modern-sql.com (Twitter, Email, RSS).

New Articles, Slides and Recordings

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.