The starting point for this exploration is a recent experience I made with my own attempt in the NoSQL arena, the Plasma project. Two weeks ago, it was "only" a distributed, replicating, and failure-resiliant filesystem PlasmaFS, with its own map/reduce implementation on top of it. Then I had an idea: is it possible to develop a key/value database on top of this filesystem? Which features, and relative advantages/disadvantages would it have? In other words, I was examining whether the existing platform makes it simpler to develop a database with a reasonable feature set.

When we talk about clusters, I have especially Internet applications in mind that are bombarded by the users with requests, but that have also to do a lot of background processing.

The key/value database needed less than 2000 lines of code

Now, PlasmaFS is not following the simple pattern of HDFS, but bases on a transactional core, and it even allows the users to manage the transactions. For example, it is possible to rename a bunch of files atomically by just wrapping the rename operations into a single transaction. The transactional support goes even further: When reading from a file one can activate a special snapshot mode, which just means that the reader's view of the file is isolated from any writes happening at the same time.

These are clearly advanced features, and the question was whether they helped for writing a key/value database library. And yes, it was extremely helpful - in less than 2000 lines of code this library provides data distribution and replication, a high degree of data safety, almost unlimited scalabilitiy for database reads, and reasonable performance for writes. Of course, most of these features are just "inherited" from PlasmaFS, and the library just had to implement the file format (i.e. a B tree, see this page for details). This is not cheating, but exactly the point: the platform makes it easy to provide features that would otherwise be extremely complicated to provide.

NoServer

This key/value database is just a library, and one can use it only on machines where PlasmaFS is deployed. Of course it is possible to access the same database file from several machines - PlasmaFS handles all the networking involved with it. The point is that during the implementation of the library this never had to be taken into account. There is no networking code in this library, and this is why it is the first example of the new NoServer paradigm - not only server.

The genuine advantage of this paradigm is that it enables developers to write code they never would be able to create without the help of the platform. This is a bit comparable to the current situation for SQL databases: Everybody can store data in them, even over the network, without needing to have any clue how this works in detail. In the NoServer paradigm, we just go one step further, because the provided services by the platform are a lot more low-level, and the developer has a lot more freedom. Instead with a query language the shared resources are accessed with normal file operations, extended by transactional directives. The hope is that this makes a lot of server programming superflous, especially the difficult parts of it (e.g. what to do when a machine crashes).

A simple key/value database is obviously not difficult to create with these programming means. The interesting question is what else can be done with it in a cluster environment. Obviously, having a common filesystem on all machines of the cluster makes a lot of file copying superflous that a normal cluster would do with rsync and/or ssh. PlasmaFS can even be directly mounted (although the transactional features are unavailable then), so even applications can access PlasmaFS files that have not specially been ported to it. An example would be a read-only Lucene search index residing in PlasmaFS. Replacing the index by an updated one would be done by simply moving the new index into the right directory, and signalling Lucene that it has to re-open the index.

So far Plasma is implemented, and works well (I just released the release 0.5, which is now beta quality). The vision goes of course beyond that.

What the platform also needs

There are a number of further datastructures that can obviously be well represented in files, such as hashtables or queues. Let's explore the latter a bit more in detail: How would a queue manager look like? There are a few data representation options. For example, every queue element could be a file in a directory, or a container format is established where the elements can be appended to. PlasmsFS also allows it to cut arbitrary holes into files, so it is even possible to physically remove elements from the beginning of the queue file by just removing the data blocks storing the elements from the file. As we don't want to run the queue manager as server, but just as library inside any program accessing the queue, the question is how event notifications are handled (which would be obvious in server context). Usually, one has to notify some followup processor when new elements have been added to the queue. Plasma currently does not include a method for doing this, so the platform needs to be extended by a notification framework (which should not be too difficult).

An important question is also how programs are activated running on different nodes. In my vision there would be a central task execution manager. Of course, this manager is normal client/server middleware. Again, the point here is that the application developer needs no special skills for triggering remote activation, he just uses libraries. I've no absolutely clear picture of this part yet, but it seems to be necessary to have the option of invoking programs in the inetd style as well as directly as if started via ssh. Also, a central directory would be maintained that includes important data such as which program can be run on which node.

We won't live totally without servers, only with fewer ones

My vision does not include that servers are completely banned. We will still need them for special features or data access patterns, and of course for interaction with other systems. For example, PlasmaFS is bad at coordinating concurrent write accesses to the same file. Also, PlasmaFS employs a central namenode with a limited capacity only. So, if you are doing OLTP processing, a normal SQL database will still do better. If you need extraordinary write performance, but can pay the price of weakened consistency guarantees, a system like Cassandra will work better.

Nevertheless, there is the big field of "average deployments" where the number of nodes is not too big and the performance requirements are not too special, but the ACID guarantees PlasmaFS gives are essential. For this field, the NoServer paradigm could be the ideal choice to reduce the development overhead dramatically.

Check Plasma out