What If A Key Value Store Mated With A Relational Database System?

Last night, the folks from the Grand Rapids Ruby group were kind enough to let me present on MongoDB. The talk went great. I’ve been excited about Mongo for a couple of weeks now, so it was cool to see that it wasn’t just me.

The funny thing is, at nearly the same time, Wynn Netherland presented on MongoDB to the Dallas Ruby group. We discovered that, without ever working together on it, he had effectively written part 1 of the presentation and I had written part 2, so we ended up showing each other’s slides as well.

I figured since I spent the time to throw some slides together, I might as well put an intro up here too. First, the slides (they probably won’t mean a lot as they were mostly outlines for me to speak from).

Intro to MongoDB

Ok, so what the crap is Mongo? I find the best way to describe Mongo is the best features of key/value stores, document databases and RDBMS in one. No way, you say. That sounds perfect. Well, Mongo is not perfect, but I think it brings something kind of new to the database table.

Mongo is built for speed. Anything that would slow it down (transactions, for example) has been left on the chopping block. Instead of REST, they chose sockets and have written drivers for several languages (including, of course, one for Ruby).

Collections

It is collection/document oriented. Collections are like tables in MySQL (they are even grouped in databases) and serve the purpose of breaking up the top level entities in your application (User, Article, Account, etc.) by type and thus into smaller query sets, to make queries faster.

Documents

Inside of each collection, you store documents. Documents are basically objects that have no schema. The lack of schema may be scary to some, but I look at it this way. You have to know your application schema at the app level, so why put the schema in the database and in your app? Why not just put the schema in your app and have the database store whatever you put in it? This way, your database schema is kind of versioned with your application code. I think that is pretty cool.

Documents are stored in BSON (blog post), which is binary encoded JSON that is built to be more efficient and also to include a few more data types than JSON. This means that if you send Mongo a document that has values of different types, such as String, Integer, Date, Array, Hash, etc., Mongo knows exactly how to deal with those types and actually stores them in the database as that type. This differs from traditional key/value stores, which just give you a key and a string value and leave you to handle serialization yourself.
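Here is a minimal sketch of why typed storage matters, in plain JavaScript with no Mongo involved. The field names (views, published_at) are made up for illustration, and JSON stands in for whatever hand-rolled serialization you would use with a string-only key/value store.

```javascript
// A document with several value types, as you might hand it to a driver.
const article = {
  title: 'Mongolicious',                          // String
  views: 42,                                      // Integer
  published_at: new Date(Date.UTC(2009, 5, 1)),   // Date
  tags: ['mongo', 'databases'],                   // Array
  author: { first_name: 'John' }                  // Hash (embedded document)
};

// What a string-only key/value store leaves you with: you serialize the
// whole thing to a string yourself and parse it again on the way out.
const stored = JSON.stringify(article);
const loaded = JSON.parse(stored);

// The Date did not survive the round trip; it came back as a String.
const dateSurvived = loaded.published_at instanceof Date;
const dateType = typeof loaded.published_at;
```

With a store that understands types, the date would come back as a date and you would not have to remember which keys need converting.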

Object Relationships

There are two ways to relate documents in Mongo. The first is to simply embed a document into another document. An example of this would be tags embedded in an article. Let’s take a look.

{ title: 'Mongolicious', body: 'I could teach you, but I would have to charge...', tags: ['mongo', 'databases', 'awesome'] }

As you can see, tags are just a key in the article document. The benefit of this is that you never have to do any joins when you show the article and its tags, as they are all stored in the same place. The other cool thing is that Mongo can index the tags and understands indexing keys that have multiple values (such as arrays and hashes). This means if you index tags, you can find all documents tagged with ‘foo’ and it will be performant. Embedded documents work great for some things, but other things wouldn’t make sense embedded.
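To give a feel for what indexing a key with multiple values means, here is a toy in-memory version. This is only a sketch of the idea, not how Mongo implements it, and buildTagIndex and idsTagged are names I made up.

```javascript
// Toy in-memory stand-in for an index on an array key.
const articles = [
  { _id: 1, title: 'Mongolicious', tags: ['mongo', 'databases', 'awesome'] },
  { _id: 2, title: 'SQL Forever', tags: ['databases', 'mysql'] },
  { _id: 3, title: 'Untagged', tags: [] }
];

// Build one index entry per array element, so a document with three tags
// shows up under three keys.
function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildTagIndex(articles);

// Finding everything with a given tag is now one key lookup, not a scan.
function idsTagged(tag) {
  return tagIndex.get(tag) || [];
}
```

Calling idsTagged('databases') returns the ids of both tagged articles without touching the untagged one.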

Let’s imagine that you have a client document and you want the client to have multiple contacts. If you embedded the contacts for the client with it in a document, it would be inefficient to have a page that listed all the contacts. To have a contact list, you would have to pull out every client and collect all the contacts and then sort them. Also, if a contact should be associated with multiple clients, you would have to duplicate their information for each client.

In SQL, you would have a clients table and a contacts table and then a join table between them, so that any contact would be in the system once and could be associated with one or more clients without duplication. So how would you do this in Mongo? The same way…kind of.

In Mongo, you’d have a client collection and a contact collection. To associate a contact with a client, you just create a db reference to the contact from the client.
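Roughly, a client with references might look like the sketch below. The $ref/$id pair follows Mongo’s DBRef convention as I understand it. The collection contents, the ids, and the dereference and contactsFor helpers are all made up, and the lookup is plain JavaScript rather than a driver call.

```javascript
// Two in-memory "collections"; a real app would ask Mongo for these.
const contacts = [
  { _id: 'c1', name: 'Ann' },
  { _id: 'c2', name: 'Bob' }
];
const clients = [
  {
    _id: 'k1',
    name: 'Acme',
    contacts: [{ $ref: 'contacts', $id: 'c1' }, { $ref: 'contacts', $id: 'c2' }]
  },
  {
    _id: 'k2',
    name: 'Globex',
    contacts: [{ $ref: 'contacts', $id: 'c2' }]
  }
];
const collections = { contacts, clients };

// Follow a reference: find the document with $id in the collection
// named by $ref.
function dereference(ref) {
  return collections[ref.$ref].find(doc => doc._id === ref.$id);
}

// Resolve every contact reference on a client.
function contactsFor(client) {
  return client.contacts.map(dereference);
}
```

Bob lives in the contact collection once, yet both clients point at him, and a contact list page only has to read one collection.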

Dynamic Queries

Yep, Mongo has dynamic queries. It actually has a kind of quirky, yet lovable syntax for defining criteria. Below are a few examples from my presentation which are mostly self-explanatory. These are examples of what you would run in Mongo’s JavaScript shell.

// finds all Johns
db.collection.find({'first_name': 'John'})

// finds all documents with first_name starting with J, using a regex
db.collection.find({'first_name': /^J/})

// finds the first document with an _id of 1
db.collection.findOne({'_id': 1})

// finds possible drinkers (age > 21)
db.collection.find({'age': {'$gt': 21}})

// searches in the embedded document author for
// an author that has a first name of John
db.collection.find({'author.first_name': 'John'})

// worst case scenario, or if you need "or" queries,
// you can drop down to JavaScript
db.collection.find({$where: 'this.age >= 6 && this.age <= 18'})
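If the criteria syntax looks like magic, here is a toy matcher that interprets a small slice of it against an in-memory array. It is a sketch to show the idea, not Mongo’s query engine: it only handles exact values, regular expressions, dotted keys, and the $gt and $lt operators. The valueAt, matches and find helpers and the sample documents are made up.

```javascript
// Resolve a dotted key such as 'author.first_name' against a document.
function valueAt(doc, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Check one document against a criteria object.
function matches(doc, criteria) {
  return Object.keys(criteria).every(path => {
    const actual = valueAt(doc, path);
    const expected = criteria[path];
    if (expected instanceof RegExp) {
      return typeof actual === 'string' && expected.test(actual);
    }
    if (expected !== null && typeof expected === 'object') {
      // An object value holds operators, e.g. {'$gt': 21}.
      return Object.keys(expected).every(op => {
        if (op === '$gt') return actual > expected[op];
        if (op === '$lt') return actual < expected[op];
        throw new Error('unsupported operator: ' + op);
      });
    }
    return actual === expected;
  });
}

const docs = [
  { _id: 1, first_name: 'John', age: 34, author: { first_name: 'John' } },
  { _id: 2, first_name: 'Jane', age: 19, author: { first_name: 'Mary' } },
  { _id: 3, first_name: 'Bob', age: 52, author: { first_name: 'John' } }
];

// Stand-in for db.collection.find(criteria) over the array above.
const find = criteria => docs.filter(doc => matches(doc, criteria));
```

The criteria object is just data, which is why it is so easy to build queries up dynamically in your app before handing them to the database.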

You can also sort by one or more keys, limit the number of results, offset a number of results (for pagination), and define which keys you want to select. The other thing that is slick is Mongo supports count and group. Count is the same idea as MySQL’s count. It returns the number of documents that match provided criteria. Group is the same concept as SQL’s GROUP BY, but is accomplished with map/reduce.
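To make sort, offset, limit, count and group concrete, here they are over a plain array. The shell lines in the comments are what I believe the equivalent calls look like, assuming a collection named people. The data is made up and nothing here talks to Mongo.

```javascript
const people = [
  { first_name: 'John', age: 34 },
  { first_name: 'Jane', age: 19 },
  { first_name: 'Jim', age: 52 },
  { first_name: 'Alice', age: 27 },
  { first_name: 'Joe', age: 41 }
];

// Shell equivalent (assumed):
//   db.people.find({age: {$gt: 21}}).sort({age: -1}).skip(1).limit(2)
const page = people
  .filter(p => p.age > 21)          // criteria
  .sort((a, b) => b.age - a.age)    // sort by age, descending
  .slice(1, 1 + 2)                  // skip 1, then limit 2
  .map(p => p.first_name);          // select only first_name

// Shell equivalent (assumed): db.people.count({age: {$gt: 21}})
const drinkers = people.filter(p => p.age > 21).length;

// Group: count people per first initial, written as a reduce over
// the documents.
const byInitial = people.reduce((counts, p) => {
  const key = p.first_name[0];
  counts[key] = (counts[key] || 0) + 1;
  return counts;
}, {});
```

The skip and limit pair is all you need for pagination: skip is (page number - 1) times the page size, and limit is the page size.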

To really get a feel for all that you can do with queries, check out Mongo’s advanced query documentation.

Random Awesomeness

Capped collections (blog post): Think memcache. You can limit a collection to a certain number of documents or a certain amount of space. When the number or size goes over the limit, the oldest document gets pushed out. For more info, see MongoDB and Caching.

Upserts: Think find or create in one call. You provide criteria and the document details, and Mongo determines if the document exists or not and either inserts or updates it. You can also do special things like incrementers with $inc. For more, read Using mongo for real time analytics.

Multikeys: for indexing arrays of keys. Think tagging.

GridFS and auto-sharding: Storing files in the database in a way that doesn’t suck. They have mentioned in IRC that they might even make Apache/Nginx modules that serve files straight from GridFS, so requests can go straight from web server to Mongo instead of traveling through your app server. For more, read You don’t need a file system.
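Two of the items above, capped collections and upserts with $inc, are easy to sketch in memory. This is only a toy to show the behavior, not how Mongo does it. CappedCollection and upsertInc are made-up names, and the shell call in the comment is what I believe the real equivalent looks like.

```javascript
// Toy capped collection: keeps at most `max` documents, oldest out first.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once over the limit, drop from the front (the oldest insert).
    while (this.docs.length > this.max) {
      this.docs.shift();
    }
    return doc;
  }

  find() {
    return this.docs.slice(); // a copy, in insertion order
  }
}

const recent = new CappedCollection(3);
['a', 'b', 'c', 'd', 'e'].forEach(key => recent.insert({ key }));
const keptKeys = recent.find().map(doc => doc.key);

// Toy upsert with $inc: bump counters on the first document matching
// `criteria`, or insert a new document built from the criteria first.
function upsertInc(collection, criteria, inc) {
  const isMatch = doc =>
    Object.keys(criteria).every(k => doc[k] === criteria[k]);
  let doc = collection.find(isMatch);
  if (!doc) {
    doc = Object.assign({}, criteria);
    collection.push(doc);
  }
  for (const key of Object.keys(inc)) {
    doc[key] = (doc[key] || 0) + inc[key];
  }
  return doc;
}

// Shell equivalent (assumed):
//   db.page_views.update({url: '/'}, {$inc: {hits: 1}}, true)
const pageViews = [];
upsertInc(pageViews, { url: '/' }, { hits: 1 });
upsertInc(pageViews, { url: '/' }, { hits: 1 });
upsertInc(pageViews, { url: '/about' }, { hits: 1 });
```

After those three calls there are two documents, one per url, and the first has been incremented twice. That is the whole trick behind using upserts for hit counters.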

How do I use it with Ruby?

If you have made it this far, you are probably intrigued and are wondering how you can use Mongo with Ruby. There is an official mongo-ruby-driver on GitHub for starters. It supports most of Mongo’s features, if not all, and gets the job done, but it is really low level. It would be like writing an application using the MySQL gem. You can do it, but it won’t be fun. I’ve even started giving back to the driver.

There are two “ORMs” for Mongo and both are on GitHub. The first is an ActiveRecord adapter and the second is MongoRecord. I took a look at both of these, and decided to write my own. Why?

Mongo is not a RDBMS (like MySQL) so why use RDBMS wrappers (like the AR adapter)?

I think the DSL for modeling your application should teach you Mongo.

Mongo is perfect for the website management system I’m building and I just didn’t like the other wrappers. Why would I want to build something with something that I didn’t like?

It sounded fun!

MongoMapper

I started the Friday of Memorial Day weekend and was able to crank out most of the functionality. Since then, I’ve been working on it whenever I get time and it is really close to being ready for a first release. That said, it is not public yet. Don’t worry, as soon as it is ready for prime time, I’ll be posting more here. So what features does MongoMapper have built in?

Typecasting

Callbacks (uses ActiveSupport callbacks)

Validations (uses my fork of validatable)

Connection and database can differ per document

Create, update, delete, delete_all, destroy, destroy_all that work just like ActiveRecord

Find with id, multiple ids, :all, :first, :last, etc. Also supports Mongo-specific find criteria like $gt, $lt, $in, $nin, etc.

Associations

Drop in Rails compatibility

So out of the features listed above, all are complete but the last two at the time of this post. I’m currently working through associations and then I’m going to start making a Rails app with MongoMapper to figure out what I need for “drop in and forget” Rails compatibility. I have a few other smart people helping me so my guess is that it will be out in the next two weeks.

Let me know with a comment below what you like and don’t like about Mongo. I’m very curious what other Rails developers think after reading this intro and the articles I’ve linked to. I’m stoked, but I’m sure it is not for everyone.

Links