Distributed Filesystem and MapReduce - by gerd, 2011-10-02

PlasmaFS is a distributed filesystem for large files, implemented in user space. Plasma Map/Reduce implements the well-known map/reduce scheme for mapping and rearranging large files. Plasma KV is a key/value database on top of PlasmaFS.

PlasmaFS is deployed on an arbitrary number of namenodes and datanodes. All data and metadata are replicated. ACID transactions provide data safety and clear query semantics. PlasmaFS focuses on large files and block sizes in the range of 64K to 1M. It is error-resilient and extensible.

PlasmaFS is accessible through a command-line client (plasma), through NFS v3, and through its own native network API.

Plasma MapReduce implements the Map/Reduce algorithm scheme. The processed files are stored in PlasmaFS filesystems.
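
As an illustration of the scheme itself, here is a minimal in-memory sketch in OCaml. It is not the Plasma API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and everything runs on plain lists instead of files stored in PlasmaFS. It only shows the three conceptual steps: map records to key/value pairs, rearrange the pairs so that equal keys end up together, and reduce each group.

```ocaml
(* In-memory sketch of the map/reduce scheme. All names are hypothetical;
   this is not the Plasma Map/Reduce API. *)

(* Map phase: turn every input record into a list of (key, value) pairs. *)
let map_phase (mapper : 'a -> ('k * 'v) list) (records : 'a list) : ('k * 'v) list =
  List.concat (List.map mapper records)

(* Shuffle phase: sort the pairs by key and collect the values of equal keys.
   The sort is stable, so values keep their original order within a group. *)
let shuffle (pairs : ('k * 'v) list) : ('k * 'v list) list =
  let sorted = List.stable_sort (fun (k1, _) (k2, _) -> compare k1 k2) pairs in
  let rec group acc = function
    | [] -> List.rev acc
    | (k, v) :: rest ->
      (match acc with
       | (k', vs) :: acc' when k' = k -> group ((k', v :: vs) :: acc') rest
       | _ -> group ((k, [v]) :: acc) rest)
  in
  List.map (fun (k, vs) -> (k, List.rev vs)) (group [] sorted)

(* Reduce phase: combine the values of each key into one result. *)
let reduce_phase (reducer : 'k -> 'v list -> 'r) (groups : ('k * 'v list) list)
  : ('k * 'r) list =
  List.map (fun (k, vs) -> (k, reducer k vs)) groups

(* Example job: count how often each word occurs in a list of lines. *)
let word_count (lines : string list) : (string * int) list =
  let mapper line =
    String.split_on_char ' ' line
    |> List.filter (fun w -> w <> "")
    |> List.map (fun w -> (w, 1))
  in
  let reducer _key counts = List.fold_left (+) 0 counts in
  lines |> map_phase mapper |> shuffle |> reduce_phase reducer
```

In a distributed setting the sorting and grouping step is the part that moves data between machines, which is why the "rearranging" half of the scheme tends to dominate the cost.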

Plasma is designed for high performance: for example, it uses shared memory where possible and minimizes network traffic. It is implemented in OCaml and compiles to machine code (no VM). It focuses on 64-bit machines. The design, however, also aims at clean semantics and data safety in order to minimize the risk of losing data.

Plasma KV is a key/value database where the data files are stored in PlasmaFS. It targets simple database applications that are dominated by reads and that need to be extremely scalable. Unlike other NoSQL implementations, Plasma KV provides high data safety by using the transactional interface of PlasmaFS. It also provides strong isolation between readers and writers: in particular, a writer does not lock readers out.
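
To make the last point concrete, here is a small OCaml sketch of one common way to let a writer proceed without locking readers out: readers work on an immutable snapshot, and a writer publishes a new version in a single step. This is an assumption chosen for illustration only, not a description of how Plasma KV is implemented. The store type and the functions create, snapshot, lookup, and commit are hypothetical names, and the sketch is single-process and in-memory.

```ocaml
(* Illustration of reader/writer isolation through immutable snapshots.
   Hypothetical names; not the Plasma KV API or its actual mechanism. *)

module StringMap = Map.Make (String)

(* The store is a mutable pointer to an immutable map version. *)
type store = { mutable current : string StringMap.t }

let create () : store = { current = StringMap.empty }

(* A reader captures whichever version is current at this moment.
   Later commits do not change the captured version. *)
let snapshot (s : store) : string StringMap.t = s.current

let lookup (snap : string StringMap.t) (key : string) : string option =
  StringMap.find_opt key snap

(* A writer derives a new version from the current one and publishes it
   with a single assignment. Existing snapshots stay valid and unchanged. *)
let commit (s : store) (updates : (string * string) list) : unit =
  let next =
    List.fold_left (fun m (k, v) -> StringMap.add k v m) s.current updates
  in
  s.current <- next
```

A reader that took its snapshot before a commit keeps seeing the old values, while a reader that takes a snapshot afterwards sees the new ones. Neither has to wait for the writer.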