Big News In Databases — Summer 2019

Twice a year I review the news articles from the past six months and report on the most interesting ones…it’s that time again! Subscribe the newsletter to receive future issues.

SQL is Growing

In previous issues, I repeatedly addressed the growing adoption of SQL among traditional databases such as MySQL and non-relational systems like Elasticsearch. In this issue, I focus on the growth of the SQL standard itself by shining a spotlight on three vendors of proprietary query languages who want to integrate their language concepts into the SQL standard.

Rasdaman and Multidimensional Arrays (SQL/MDA)

The first representative of the group has already made the grade: A new part of the SQL standard was just published in June: ISO/IEC 9075-15:2019 Multi dimensional arrays (SQL/MDA).

This functionality allows storing, accessing and processing massive multidimensional arrays such as n-channel satellite images. That means that SQL can now decode images and directly access and process image areas via pixel coordinates. You’ll find all the details in this Technical Report.

The driving force behind this extension was Peter Baumann, who initiated and co-edited the adoption of concepts from the Rasdaman-Database for SQL.

Neo4j, Graph Query Language (GQL) and SQL/PGQ

The next push for standardization—significantly driven by Neo4j—has just reached a critical milestone: The ballot for the creation of a new standard, GQL, is ongoing. If it passes, the normative documents that define a new database language will be written over the next years.

So GQL is not an SQL extension, rather it is a new language specifically designed to process graph structures. There are some cross references between SQL and GQL nevertheless. In the one way, the GQL will “inherit” some properties of SQL by referring to the SQL standard. In another way, SQL will get a new part 16 (SQL/PGQ) to make some GQL functionality directly available in SQL—for that, rumors mention the year 2020.

Apache and Streaming Data

The topic of streaming data is currently at the very beginning of the standardization process. A driving force is Apache, which offers several streaming platforms for which SQL-like languages are available (KSQL, Spark, Flink, etc.). A proposal published end of May (summary) claims that SQL can be very effective for processing streaming data with the addition of a few minor non-invasive extensions. In this way, the authors are initiating the standardization process.

Cloud Wars: Strange Allies Working Against Your Own Hardware

Starting with release 12.2 of the Oracle Database, there is an interesting pattern: New add-on features (options and management packs) are only available in the Oracle cloud and on engineered systems—i.e. only on Oracle hardware. If you want to run the Oracle Database on your own hardware, you cannot use these new add-on features, which simplify the management of large installations. This limitation affects end users who want to use their own hardware as well as third-party cloud providers like Microsoft and Amazon.

A recently announced alliance between Microsoft and Oracle has to be looked at in this context. The two cloud providers established a direct network link between two of their data centers and also integrated the identity and access management between them. The use cases envisioned in the press release always deploy the Oracle Database in the Oracle cloud, but the application software on Microsoft Azure. Apparently, the only reason to use the Oracle cloud is the Oracle Database—and that maybe only due to the limited availability of advanced management features on non-Oracle hardware. When it comes to running the application software, Oracle surrenders.

Technology and Science

In addition to the above mentioned “gentle pressure” to get clients into the cloud, a cloud deployment of course has some intrinsic benefits too. Besides the usage-based cost, it really comes down to just how simple it is to manage. In a cloud environment, it only takes a few clicks to install a database with redundancy across data centers.

However, this simplicity in terms of management introduces a new risk: bugs in the software that runs the cloud. A recent paper (summary) analyzed critical incidents in the Azure cloud during a six month period in 2018. One finding: 40% of these incidents were caused by bugs in the cloud software itself. One example of such an incident made it into the news in January 2019: An automatic process has inadvertently dropped client databases.

Another paper worth reading (summary) provides a current overview of the CALM Theorem. This theorem is about avoiding the coordination effort in distributed systems in order to improve performance.

Finally, a paper about efficient sliding window implementations in streaming context: Optimal and General Out-of-Order Sliding-Window Aggregation.

New Releases

A detailed analysis of the new SQL functionality of these releases will be published soon on modern-sql.com (Twitter, Email, RSS).

New Articles, Slides and Recordings

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.