MongoDB: leave your SQL at home

MongoDB is an open source document-oriented database system that is designed for speed and scalability in web site data operations, bridging the gap between simple "key/value" structured storage and the heavyweight requirements of relational database systems. Like other databases in the so-called "NoSQL" vein, MongoDB trades in full ACID compliance for the ability to solve a smaller set of problems easily and quickly.

MongoDB theory

MongoDB's data sets are called "collections" and are roughly analogous to the tables in a traditional relational database. Unlike relational database tables, however, they have no predefined structure (or schema, to use the canonical term) — each record in the collection is a "document" that can potentially have a different structure than every other document in the collection.

This is not to say that MongoDB documents are unstructured, of course; they use a key-value pair syntax modeled on the popular JavaScript Object Notation (JSON) format. MongoDB calls this syntax BSON (alternately expanded as "Binary JSON" and "Binary Serialized dOcument Notation"), and it is designed to be easily traversed, easily coded-to, and lightweight, enough so that it is also MongoDB's network transfer format. Document keys are strings, and values can be a variety of types including strings, arrays, and even other documents.

For example, a JSON object such as

{ "firstName": "Nathan", "lastName": "Willis", "Url": "http://www.freesoftwhere.org" }

would appear quite simply as the document:

{"firstName" : "Nathan" , "lastName" : "Willis" , "Url": "http://www.freesoftwhere.org" , "_id" : ObjectId(497cf6075172cf775cace8fb)}

in a MongoDB collection.

MongoDB's query language is also based on the BSON syntax, so data can be fetched with simple expressions such as db.users.find({'lastName': 'Willis'}) or sorted with db.users.find({}).sort({lastName: 1}). All of MongoDB's queries are dynamic, meaning that clients can query the database on any key, without first having to calculate a "view" that indexes the data based on a particular key. This is different from other document-oriented databases, such as CouchDB, which can perform only static queries.
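
To make the idea of a dynamic query concrete, here is a minimal sketch in plain JavaScript. It is an in-memory illustration rather than MongoDB's implementation: the find() and sortBy() helpers are hypothetical stand-ins for the shell's find() and sort(), they handle only exact matches on top-level keys, and every record except the first is invented for the example.

```javascript
// In-memory stand-in for a collection; not MongoDB's implementation.
const users = [
  { firstName: "Nathan", lastName: "Willis", Url: "http://www.freesoftwhere.org" },
  { firstName: "Ada", lastName: "Lovelace" },                       // hypothetical record
  { firstName: "Grace", lastName: "Hopper", rank: "Rear Admiral" }  // hypothetical record
];

// Return every document whose fields equal all key/value pairs in `query`.
// Any key can be used; nothing has to be declared or indexed in advance.
function find(collection, query) {
  return collection.filter(doc =>
    Object.keys(query).every(key => doc[key] === query[key]));
}

// Sort a copy of the results by one key; direction 1 is ascending, -1 descending.
function sortBy(docs, key, direction) {
  return [...docs].sort((a, b) => {
    if (a[key] === b[key]) return 0;
    return (a[key] < b[key] ? -1 : 1) * direction;
  });
}

const willis = find(users, { lastName: "Willis" });
const byLastName = sortBy(find(users, {}), "lastName", 1);
console.log(willis.length, byLastName.map(d => d.lastName));
```

Note that the documents do not share a structure (only one has a "rank" key), yet the same query function works on all of them.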

The conceptual differences between MongoDB's schema-free documents and a traditional relational database produce some limitations, but also enable some real-world speed optimizations. Developer Richard Kreuter described MongoDB in a talk at Texas Linux Fest on April 10. He said that because documents are schema-free, the database can be designed to store information that is commonly accessed serially within a single document: for example, a blog post's content and all of the reply comments. Ninety-nine percent of the time, he said, they will be retrieved in precisely that order. By not storing the post, user names, and comments in separate tables, access is substantially sped up. The only cost is the loss of a comparatively rarely needed ability: atomically updating the post and the comments simultaneously from different database clients.
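
As a rough illustration of that blog-post layout, the following sketch shows one possible document shape in plain JavaScript. The field names (title, body, comments, author, text) and the renderPost() helper are assumptions made for the example; they do not come from Kreuter's talk or from MongoDB itself.

```javascript
// Hypothetical blog-post document: the comments live inside the post itself.
const post = {
  _id: 1,
  title: "Example post",
  body: "Post content goes here.",
  comments: [
    { author: "alice", text: "First reply" },
    { author: "bob", text: "Second reply" }
  ]
};

// Rendering the page needs only this one document; there is nothing to join.
function renderPost(doc) {
  const lines = [doc.title, doc.body];
  for (const c of doc.comments) {
    lines.push(c.author + ": " + c.text);
  }
  return lines;
}

const page = renderPost(post);
console.log(page.join("\n"));
```

In a relational design, the same page would typically be assembled from separate post, user, and comment tables.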

The project lists web site content management, real-time analytics, caching, and logging as ideal use cases for MongoDB. Highly transactional systems, on the other hand, are a poor fit, as the MongoDB server can enforce transactionality only on operations that touch a single document.

In addition to its overall document-centric design, MongoDB also offers several interesting features that database application developers are likely to find convenient. One example is the "upsert" operation, which updates an object in a database document if the object already exists, and creates it if it does not exist. Another example is "capped collections," in which a collection is created with a fixed size, and the oldest entries are automatically removed. Capped collections automatically retain insertion order, and they free the developer from having to manually "age-out" the oldest objects by tracking their timestamps.
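
Both features are easy to picture with a small in-memory sketch in plain JavaScript. This is not MongoDB's code: upsert() and CappedCollection are hypothetical names, and the capped collection here is limited by document count as a simplification, whereas MongoDB sizes capped collections in bytes.

```javascript
// Upsert: update the first document matching `query`, or insert a new one
// built from the query plus the changes if nothing matches.
function upsert(collection, query, changes) {
  const match = collection.find(doc =>
    Object.keys(query).every(key => doc[key] === query[key]));
  if (match) {
    Object.assign(match, changes);
    return "updated";
  }
  collection.push(Object.assign({}, query, changes));
  return "inserted";
}

// Capped collection: holds at most `max` documents in insertion order;
// adding one more evicts the oldest. (Assumption: capped by count, not bytes.)
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift(); // drop the oldest entry
    }
  }
}

const counters = [];
const first = upsert(counters, { page: "/home" }, { hits: 1 });
const second = upsert(counters, { page: "/home" }, { hits: 2 });

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ n: i });
console.log(first, second, counters, log.docs);
```

The caller of upsert() never has to check whether the record exists first, and the code feeding the capped log never has to delete anything.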

MongoDB deployment and administration

MongoDB is developed primarily at 10gen, a company that offers commercial support contracts and training for MongoDB administration and development. The latest release is version 1.4, from March 22, 2010, and is licensed under the AGPL version 3. The project provides packages for 32-bit and 64-bit versions of x86 Linux, Solaris, Windows, and Mac OS X, as well as an Apt repository for Debian and Ubuntu.

The main MongoDB server runs as the mongod process. Packages include a shell interpreter interface called mongo, which uses JavaScript as its command language; most of the documentation and tutorials on the Mongo web site use this interface for their examples. Language drivers are available for C, C++, Python, Java, and Perl clients in the official packages, and C#, REST, ColdFusion, Ruby, PHP, JavaScript, and several others in community-supported add-ons.

Mongo supports several replication configurations, including the usual master-slave, as well as "replica sets" that automatically negotiate which database server functions as the master at a given point in time. Master-master replication is supported only in a limited fashion.

Mongo is designed to be highly horizontally scalable, supporting database cluster functionality like failover, map/reduce, and sharding. The current release supports auto-sharding, in which a routing process called mongos interacts with the client in order to abstract away the actual cluster of mongod servers.
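
The routing idea behind sharding can be sketched in a few lines of plain JavaScript. This is only an illustration of range-based routing on a shard key, not a description of how mongos is actually implemented; the chunk table, the shard names, and the routeFor() function are all hypothetical.

```javascript
// Hypothetical chunk table: each entry covers shard-key values below `upTo`.
// Entries are ordered; the last one (upTo: null) catches everything else.
const chunks = [
  { upTo: "h", shard: "shard-a" },
  { upTo: "q", shard: "shard-b" },
  { upTo: null, shard: "shard-c" }
];

// Pick the shard responsible for a given shard-key value.
function routeFor(keyValue) {
  for (const chunk of chunks) {
    if (chunk.upTo === null || keyValue < chunk.upTo) {
      return chunk.shard;
    }
  }
  throw new Error("chunk table has no catch-all entry");
}

console.log(routeFor("adams"), routeFor("hopper"), routeFor("willis"));
```

The client sees a single entry point, and the router decides which server holds the requested range of keys.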

Because Mongo does not support transactions in the sense that relational databases support them, it does not support transaction logs that enable database repair — the only real protections against data loss are backups and replication. One other feature worth noting is that the current releases of Mongo only support username-and-password authentication that grants read-write or read-only access to a particular database. Deployments that need stronger security or more fine-grained access control may not find Mongo a good fit.

Still, there are plenty of large-scale production MongoDB servers in the wild, most notably the web, project, and download pages on SourceForge.net, the GitHub service, and the Disqus blog discussion system. Those examples and the others listed on the Mongo "production deployments" page all seem to fit broadly into the problem space that Mongo is optimized for: "high-volume, low-value data" web sites, which have little need for the transactional guarantees that a relational database system like MySQL provides. If your site also fits the pattern, MongoDB deserves a close look.