The NoSQL movement has spawned a slew of alternative data stores, all of which attempt to fill voids left by traditional relational database implementations. But while it's easy to fit the various relational databases (MySQL, Oracle, DB2, and so on) under a single categorical umbrella, the NoSQL world is much more diverse, and the NoSQL label is too general. NoSQL data stores such as MongoDB and Cassandra are so vastly different from each other that apples-to-apples comparisons are practically impossible. Thus, within the world of NoSQL, there are subcategories such as key-value stores, graph databases, and document-oriented stores.

Document-oriented stores, or document stores for short, aren't new to the world of computing. Industry graybeards will quickly recognize Lotus Notes as one of the first successful NoSQL document stores from the late '80s. Document stores encapsulate data into loosely defined documents, rather than tables with columns and rows. Implementations of the underlying document vary by data store, with some representing a document as XML and others as JSON, for instance.

But in general, documents aren't rigidly defined, and in fact they offer a high degree of flexibility when it comes to defining data. This flexibility has costs. For example, these data stores do not support SQL, instead supporting custom query languages better suited to the underlying document structure (such as XPath-like query languages for XML data stores). But the lack of rigidity in data definition has many benefits as well. In many cases, compared with traditional relational databases, the more flexible document stores enable faster, iterative development when data requirements change more quickly than a fixed schema can be revised.

MongoDB: Flexible, scalable NoSQL

In recent years, a number of document stores have come out and garnered a high degree of developer mind share. One of the most popular of these is MongoDB, an open source, schema-free document store written in C++ that boasts support for a wide array of programming languages, an expressive query language built on JSON-style documents rather than SQL, and a number of intriguing features related to performance and scalability.

Out of the box, Mongo supports sharding, which permits horizontal scaling by divvying up a collection of documents across a cluster of nodes, thus spreading both the data and the read and write load. What's more, Mongo offers replication in two modes: master-slave and replica sets. In a replica set, there is no fixed master node; instead, the members elect a primary, and if that primary fails, the remaining members elect a replacement, so there is no single point of failure. Replica sets therefore bring more fault tolerance to larger environments supporting massive amounts of data. These features and more don't require an army of DBAs to implement, nor do they need massive hardware expenditures. Mongo can run on commodity hardware platforms, provided there is a healthy amount of memory.
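
To make the idea of divvying up a collection concrete, here is a minimal sketch in Python of hash-based partitioning. It is a toy illustration, not Mongo's actual implementation: the function names, the choice of SHA-256, and the "user_id" shard key are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def shard_for(key, num_shards):
    """Return a stable shard index for a shard-key value (illustrative only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(documents, shard_key, num_shards):
    """Group documents by the shard that their key hashes to."""
    shards = defaultdict(list)
    for doc in documents:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


# Hypothetical documents; "user_id" is an assumed shard key.
documents = [{"user_id": i, "name": f"user{i}"} for i in range(12)]
shards = distribute(documents, "user_id", 3)

# Show which user_ids landed on which (simulated) node.
for index in sorted(shards):
    print(index, [doc["user_id"] for doc in shards[index]])
```

Because the same key always hashes to the same index, a lookup by shard key needs to visit only one node, while the collection as a whole is spread across all of them.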

Mongo is schema-less -- it'll store any document you decide to put into it. There is no upfront document definition requirement. Ultimately, documents are grouped into collections, which are akin to tables in a relational database. Collections can be defined on the fly as well. Documents are stored in a binary JSON format, dubbed BSON, and encapsulate data represented as name-value pairs (a document is somewhat like a row, and each name somewhat like a column).
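
The following Python sketch mimics that behavior with plain dictionaries. It is an in-memory stand-in under stated assumptions, not Mongo's driver API: the `database` dictionary, the `insert` and `find` helpers, and the sample documents are hypothetical, and the standard `json` module stands in for BSON.

```python
import json

# Hypothetical in-memory stand-in: collection name -> list of documents.
database = {}


def insert(collection_name, document):
    """Create the collection on first use, then store the document as-is."""
    database.setdefault(collection_name, []).append(document)


def find(collection_name, **criteria):
    """Return documents whose fields match every name-value pair given."""
    return [
        doc
        for doc in database.get(collection_name, [])
        if all(doc.get(name) == value for name, value in criteria.items())
    ]


# Two documents in the same collection with different shapes; no schema needed.
insert("people", {"name": "alice", "email": "alice@example.com"})
insert("people", {"name": "bob", "languages": ["C++", "Python"],
                  "address": {"city": "Springfield"}})

# JSON text here stands in for the binary BSON encoding Mongo uses on disk.
serialized = json.dumps(database["people"][1])
print(find("people", name="alice"))
print(serialized)
```

Note that the "people" collection was never declared; it came into existence with the first insert, and the second document carries nested and list-valued fields that the first one lacks.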