Long story short: really good ;).

At juptr.io, we use memory-mapped files to back persistent, iterable HashMaps of ordinary (fast-serialized) Java objects, which can be write-through cached in memory. Iterable HashMaps support both column-oriented data organization and queries, and document-store, key/value-style access patterns.

In order to scale horizontally, these data HashMaps are sharded among the nodes that make up a cluster.
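A minimal sketch of what sharding a map across nodes can look like. The routing rule here (key hash modulo node count) is an assumption for illustration only, and `ShardRouterSketch`, `shardFor` and `partition` are hypothetical names, not part of the actual system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardRouterSketch {

    /** Assumed routing rule: hash of the key modulo the number of nodes. */
    public static int shardFor(Object key, int nodeCount) {
        // floorMod keeps the result in [0, nodeCount) even for negative hash codes
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Splits one logical map into one map per node, using shardFor for every key. */
    public static <K, V> List<Map<K, V>> partition(Map<K, V> records, int nodeCount) {
        List<Map<K, V>> shards = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            shards.add(new HashMap<>());
        }
        records.forEach((k, v) -> shards.get(shardFor(k, nodeCount)).put(k, v));
        return shards;
    }

    public static void main(String[] args) {
        Map<String, Integer> records = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            records.put("user-" + i, i);
        }
        List<Map<String, Integer>> shards = partition(records, 3);
        for (int node = 0; node < shards.size(); node++) {
            System.out.println("node " + node + " holds " + shards.get(node).keySet());
        }
    }
}
```

With a rule like this, a key/value lookup only has to contact the one node that owns the key, while a query over all records has to be sent to every node.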

Data modification and querying are achieved by sending (fast-)serialized plain Java lambda functions over the wire. The receiving node executes the lambda inside a kind of 'sandbox' (outer code that calls the lambda provided by the sender), either by passing in a reference to its local data map/table or by iterating and feeding the lambda record by record.
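To make this concrete, here is a rough sketch of the "pass in a reference to the local map" variant. It uses plain `java.io` serialization as a stand-in for fast-serialization, and a byte array as a stand-in for the network. `RemoteQuery`, `executeOnNode` and `countAbove` are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class RemoteLambdaSketch {

    /** Hypothetical query type: a function over a node's local map that is also Serializable. */
    public interface RemoteQuery<K, V, R> extends Serializable {
        R apply(Map<K, V> localData);
    }

    /** Sender side: turn the lambda (code reference plus captured parameters) into bytes. */
    public static byte[] serialize(Serializable query) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(query);
        }
        return bytes.toByteArray();
    }

    /** Receiver side: outer code that decodes the lambda and hands it the local data map. */
    @SuppressWarnings("unchecked")
    public static <K, V, R> R executeOnNode(byte[] wire, Map<K, V> localData)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            RemoteQuery<K, V, R> query = (RemoteQuery<K, V, R>) in.readObject();
            return query.apply(localData);
        }
    }

    /** Example query: count the values above a threshold that the lambda captures. */
    public static RemoteQuery<String, Integer, Long> countAbove(int threshold) {
        return data -> data.values().stream().filter(v -> v > threshold).count();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> nodeData = new HashMap<>();
        nodeData.put("alice", 3);
        nodeData.put("bob", 1);
        nodeData.put("carol", 7);

        byte[] wire = serialize(countAbove(2));          // what would travel to the node
        Long matches = executeOnNode(wire, nodeData);    // only this result travels back
        System.out.println("matches on this node: " + matches);
    }
}
```

Note that with standard Java serialization, the receiving JVM needs the class that defined the lambda on its classpath, since the serialized form refers to that class rather than carrying the code itself.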



We rarely need to transfer data over the network just to operate on it. Instead, we send a lambda which operates "nearby" the data and sends back the operation's results.

Move code not data.





The byte code of a remoted lambda enjoys full HotSpot close-to-the-metal compilation and optimization.

Because an ordinary lambda implementation naturally separates parameters from actual code, compilation of a lambda query happens once, not with each query.

We are not limited by some proprietary query language. A lambda query can perform any operation expressible in Java, just like an ordinary forEach iteration over a local collection.

When using anonymous classes instead of lambdas, it's possible to use arbitrary data structures to accumulate and capture intermediate results during query record iteration. An additional aggregation step on the sender side is then required, as N intermediate results from N data nodes are received in a scaled setup (map-reduce'ish).
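The anonymous-class variant with sender-side aggregation could look roughly like the sketch below. It is a simulation inside one JVM under several assumptions: `RecordVisitor`, `runOnNode` and `queryCluster` are hypothetical names, plain `java.io` serialization stands in for fast-serialization, and the "nodes" are just in-memory maps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScatterGatherSketch {

    /** Hypothetical visitor that is fed record by record and carries its own state. */
    public interface RecordVisitor<K, V, R> extends Serializable {
        void visit(K key, V value);
        R result();
    }

    /** Simulates the wire: a serialize/deserialize round trip, so each node gets its own copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyOverWire(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    /** Node side: iterate the local shard and feed every record to the received visitor. */
    public static <K, V, R> R runOnNode(RecordVisitor<K, V, R> visitor, Map<K, V> shard)
            throws IOException, ClassNotFoundException {
        RecordVisitor<K, V, R> local = copyOverWire(visitor);
        shard.forEach(local::visit);
        return local.result();
    }

    /** Example query as an anonymous class: count users per country in a HashMap. */
    public static RecordVisitor<String, String, Map<String, Integer>> countPerCountry() {
        return new RecordVisitor<String, String, Map<String, Integer>>() {
            private final HashMap<String, Integer> counts = new HashMap<>();

            @Override
            public void visit(String user, String country) {
                counts.merge(country, 1, Integer::sum);
            }

            @Override
            public Map<String, Integer> result() {
                return counts;
            }
        };
    }

    /** Sender side: N nodes return N partial results, which are merged into one. */
    public static Map<String, Integer> queryCluster(List<Map<String, String>> shards)
            throws IOException, ClassNotFoundException {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, String> shard : shards) {
            Map<String, Integer> partial = runOnNode(countPerCountry(), shard);
            partial.forEach((country, n) -> total.merge(country, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> node1 = new HashMap<>();
        node1.put("u1", "DE");
        node1.put("u2", "US");
        Map<String, String> node2 = new HashMap<>();
        node2.put("u3", "DE");
        System.out.println(queryCluster(Arrays.asList(node1, node2)));
    }
}
```

The merge loop in `queryCluster` is the extra aggregation step: each node only sees its own shard, so its `counts` map is a partial result until the sender combines all of them.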

Simplicity

As there is no impedance mismatch, working with big data feels as simple as looping over a collection on the heap.