The world of databases has changed significantly in the last eight years or so. Do you remember the time when the word "database" was synonymous with "relational database"? Relational databases ruled this niche for more than forty years, and for good reason: they offer strong consistency, transactions, and expressive queries, and they work well as an integration tool.

But forty years is a long time, and a lot has changed, especially in the technology world. Today, relational databases cannot satisfy every need of the IT world. A fixed database schema, a static representation of data, and the object-relational impedance mismatch are just some of the obstacles that users of relational databases face. That, in turn, made room for a completely new branch of databases to develop: NoSQL databases.

NoSQL Databases and Different Data Models

The term "NoSQL" started being tossed around widely back in 2009. The community seems to agree nowadays that it stands for "Not Only SQL". The term also covers a wide range of databases. Why is that? Well, since the relational model was not a perfect fit for everyone's problems, people started to create databases with models that could better handle the obstacles they were facing. As a result, these databases are very different from each other.

That is how the data model became the main distinction between NoSQL databases and the way they function. Today, we can distinguish a few types of NoSQL databases based on their data model:

Key-value stores: Store data in an associative array, where each value is looked up by a single key.

Column stores: Store data in column families, laid out by column rather than by row.

Graph stores: Use graph structures with nodes, edges, and properties to represent, store, and query data.

Document stores: Store data in self-describing structures (documents) that are usually similar to each other but don’t have to be the same.
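To make the four models a bit more concrete, here is a rough sketch that writes the same small fact ("Ana follows Ben") in each of them. It uses plain Python structures instead of a real database, and all keys, field names, and the `names_followed_by` helper are made up for illustration.

```python
# Illustrative only: one "user follows user" fact in each of the four models.

# Key-value: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Ana", "follows": ["user:2"]}'}

# Column store: values grouped into column families and stored per column.
column_store = {
    "profile": {"name": {"user:1": "Ana", "user:2": "Ben"}},
    "social": {"follows": {"user:1": ["user:2"]}},
}

# Graph: nodes and edges, both of which can carry properties.
graph = {
    "nodes": {"user:1": {"name": "Ana"}, "user:2": {"name": "Ben"}},
    "edges": [{"from": "user:1", "to": "user:2", "type": "follows"}],
}

# Document: self-describing records that do not have to share a schema.
documents = [
    {"_id": "user:1", "name": "Ana", "follows": ["user:2"]},
    {"_id": "user:2", "name": "Ben", "email": "ben@example.com"},
]


def names_followed_by(user_id):
    """Answer 'whom does this user follow?' using the graph representation."""
    return [
        graph["nodes"][edge["to"]]["name"]
        for edge in graph["edges"]
        if edge["from"] == user_id and edge["type"] == "follows"
    ]
```

The same question could be answered from any of the four structures, but the amount of work differs: the graph form follows an edge directly, while the key-value form first has to parse an opaque string.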

Each of these models tackles a different kind of problem, which makes it a good fit for some solutions and a bad fit for others. For example, document databases are good at storing unstructured data, but not as good as graph databases at storing highly connected data. This distinction between models is also the secret behind the NoSQL boom. Users became more aware of their data and its nature, and the NoSQL world gave us the ability to choose the best data model for our solution.

Polyglot Persistence

However, what if you work on a large system that has a lot of different parts with numerous different problems to handle? A great many kinds of data are tossed around in that system, and more importantly, the nature of that data differs from one section of the system to another. Which database (or, to be more precise, which database model) should be used? That is how polyglot persistence emerged.

What this essentially means is that you use multiple databases behind a single backend. Each chunk of the system uses the database that best fits its needs. The part of the system that handles structured data would use the relational model, the part that works with unstructured, object-like data would use the document model, the part that deals with analytical data would use a column model, and so on.

This, of course, is easier said than done. Making sure that a project with many databases is fault-tolerant is challenging, to say the least. Apart from the increased code complexity, data consistency and data duplication become frequent issues. Deployment becomes more complicated and more frequent, too. In addition, synchronization of these databases is an issue that cannot be overlooked. For example, if you want to back up data at a certain moment in time, every database needs a different amount of time to complete its backup, so the snapshots may not describe the same moment.
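The consistency problem is easy to reproduce even in a toy setting. The sketch below is only an illustration: `InMemoryStore` and `place_order` are hypothetical stand-ins, not any real driver API. One logical order is written to three independent stores, and the third write fails.

```python
class InMemoryStore:
    """Hypothetical stand-in for one database in a polyglot setup."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.fail_on_write = fail_on_write
        self.records = {}

    def write(self, key, value):
        if self.fail_on_write:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records[key] = value


def place_order(order, relational, document, column):
    """Write one logical order to three independent stores, in sequence."""
    relational.write(order["id"], {"customer": order["customer"], "total": order["total"]})
    document.write(order["id"], order)         # full object-like payload
    column.write(order["id"], order["total"])  # value used for analytics


relational = InMemoryStore("relational")
document = InMemoryStore("document")
column = InMemoryStore("column", fail_on_write=True)  # simulate an outage

order = {"id": "o-1", "customer": "c-9", "total": 42.0, "items": ["book"]}
try:
    place_order(order, relational, document, column)
except ConnectionError:
    pass  # no shared transaction: the first two writes are not rolled back

# The order now exists in two stores but is missing from the third.
inconsistent = "o-1" in relational.records and "o-1" not in column.records
```

In a real system, the application has to detect and repair this state itself, which is a large part of the extra code complexity mentioned above.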

Hence, polyglot persistence was a nice idea that had to evolve. The problems we face with the polyglot persistence concept are exactly what multi-model databases try to address.

Multi-Model Databases

What is the idea behind multi-model databases? Multi-model databases try to incorporate different database models into a single integrated engine. This engine should offer a unified query language and expose a single API that works across all of its data models. Personally, I found this a tough pill to swallow at first. Let’s briefly explain how multi-model databases are able to map information from one data model to another.

The main concept is to keep all data in a single data model and then represent other models by mapping the higher-level models to a lower-level representation. For example, let’s say that we have three models in a multi-model database: document, key-value, and graph. A graph can be mapped onto the document model by creating one collection for vertices and a separate collection for edges, where each edge document references the vertices it connects.
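As a minimal sketch of that mapping, the snippet below keeps a graph in two plain "collections" of documents and runs a traversal over them. The collection layout, the `from`/`to` field names, and the helper functions are assumptions made for this example, not the format of any particular product.

```python
from collections import deque

# Vertex collection: one document per vertex, keyed by its identifier.
vertices = {
    "people/ana": {"_id": "people/ana", "name": "Ana"},
    "people/ben": {"_id": "people/ben", "name": "Ben"},
    "people/cy": {"_id": "people/cy", "name": "Cy"},
}

# Edge collection: one document per edge, referencing two vertex identifiers.
edges = {
    "knows/1": {"_id": "knows/1", "from": "people/ana", "to": "people/ben"},
    "knows/2": {"_id": "knows/2", "from": "people/ben", "to": "people/cy"},
}


def neighbors(vertex_id):
    """Return the vertex documents reachable over one outgoing edge."""
    return [vertices[e["to"]] for e in edges.values() if e["from"] == vertex_id]


def reachable(start_id):
    """Breadth-first traversal built only from document lookups."""
    seen = {start_id}
    queue = deque([start_id])
    order = []
    while queue:
        current = queue.popleft()
        for doc in neighbors(current):
            if doc["_id"] not in seen:
                seen.add(doc["_id"])
                order.append(doc["_id"])
                queue.append(doc["_id"])
    return order
```

Scanning every edge for each step is obviously slow. A real engine would index the edge collection by its endpoints, but the mapping itself stays the same.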

Documents in document databases usually have a unique identifier. This lets them be mapped onto a key-value store, where the key is the document’s unique identifier and the value is the whole document. Relational tables can be mapped to the key-value model in a similar way. Therefore, the lowest level of representation is the key-value structure, and all other models can be mapped to it. Once this is established, a query language can be built on top of it.
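Here is a small sketch of that layering, under the assumption that the key-value store only understands string keys and string values. `KeyValueStore`, `DocumentLayer`, and the `find` method are hypothetical names for this example. `find` stands in for a very primitive query language built on top of the key-value layer.

```python
import json


class KeyValueStore:
    """Lowest layer: opaque string values addressed by a single key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys(self):
        return list(self._data)


class DocumentLayer:
    """Document model mapped onto the key-value store."""

    def __init__(self, kv):
        self._kv = kv

    def insert(self, doc):
        # Key: the document's unique identifier. Value: the whole document.
        self._kv.put(doc["_id"], json.dumps(doc))

    def get(self, doc_id):
        raw = self._kv.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Return documents whose fields equal all the given criteria."""
        matches = []
        for key in self._kv.keys():
            doc = json.loads(self._kv.get(key))
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(doc)
        return matches


kv = KeyValueStore()
docs = DocumentLayer(kv)
docs.insert({"_id": "u1", "name": "Ana", "city": "Novi Sad"})
docs.insert({"_id": "u2", "name": "Ben", "city": "Oslo"})
docs.insert({"_id": "u3", "name": "Cy", "city": "Oslo", "age": 30})

oslo_names = [d["name"] for d in docs.find(city="Oslo")]
```

The key-value layer never looks inside the values. Everything that makes this a "document database" lives in the layer above it.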

These features, of course, need to sit on top of highly performant multi-key ACID transactions, in a way that retains the NoSQL advantages of scalability and fault tolerance. Did I just describe the perfect database? One that gives you the ability to change the data model without having to sacrifice performance or scalability? That is the dream, indeed.
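To see why multi-key transactions matter once everything is mapped to key-value pairs, consider a graph edge and its vertices stored under separate keys. The sketch below is a single-threaded toy that shows only the "all or nothing" part of ACID. `TransactionalKV`, its `transact` method, and the `check` hook are assumptions for illustration, not a real database API.

```python
class TransactionalKV:
    """Toy key-value store where a batch of writes is applied atomically."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def transact(self, writes, check=None):
        """Apply all writes or none of them.

        `check` is a hypothetical validation hook: it receives the staged
        state and may raise to abort the whole batch.
        """
        staged = dict(self._data)
        staged.update(writes)
        if check is not None:
            check(staged)  # raising here leaves self._data untouched
        self._data = staged


def require_edge_endpoints(state):
    """Reject any state in which an edge points at a missing vertex."""
    for key, value in state.items():
        if key.startswith("edge:"):
            for end in (value["from"], value["to"]):
                if end not in state:
                    raise ValueError(f"{key} points at missing {end}")


kv = TransactionalKV()
kv.transact(
    {
        "vertex:a": {"name": "Ana"},
        "vertex:b": {"name": "Ben"},
        "edge:1": {"from": "vertex:a", "to": "vertex:b"},
    },
    check=require_edge_endpoints,
)

try:
    kv.transact(
        {"vertex:c": {"name": "Cy"}, "edge:2": {"from": "vertex:c", "to": "vertex:missing"}},
        check=require_edge_endpoints,
    )
except ValueError:
    pass  # the whole batch is discarded, including the valid vertex:c write
```

Without this guarantee, a failure between two single-key writes could leave an edge pointing at a vertex that was never stored.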

There is already a wide range of multi-model databases on the market, such as OrientDB, DataStax, and Couchbase.

Conclusion

A multi-model database lets us keep the best features of the polyglot persistence concept while minimizing its limitations. We can now build complex systems that use multiple data models on a single engine. That way, the complexity of development, operations, and deployment is reduced.





Read more from the author on Rubik's Code.