Why I think Mongo is to Databases what Rails was to Frameworks

Strong statement, eh? The more I work with Mongo the more I am coming around to this way of thinking. I tell no lie when I say that I now approach Mongo with the same kind of excitement I first felt using Rails. For some, that may be enough, but for others, you probably require more than a feeling to check out a new technology.

Below are 7 Mongo and MongoMapper related features that I have found to be really awesome while working on switching Harmony, a new website management system by my company, Ordered List, to Mongo from MySQL.

1. Migrations are Dead

Remember the first time you created and ran a migration in Rails. Can you? Think back to the exuberance of the moment when you realized tempting fate on a production server was a thing of the past. Well I have news for you Walter Cronkite, migrations are so last year.

Yep, you don’t migrate when you want to add or remove columns with Mongo. Heck, you don’t even add or remove columns. Need a new piece of data? Throw a new key into any model and you can start adding data to it. No need to bring your app to a screeching halt, migrate and then head back to normal land. Just add a key and start collecting data.

2. Single Collection Inheritance Gone Wild

There are times when inheritance is sweet. Let’s take Harmony for example. Harmony is all about managing websites. Websites have content. Content does not equal pages. Most website management tools are called content management systems and all that means is that you get a title field and a content field. There, you can now manage content. Wrong!

Pages are made up of content. Each piece of content could be as tiny as a number or as large a massive PDF. Also, different types of pages behave differently. Technically a blog and a page are both pages, but a page has children that are most likely ordered intentionally, whereas a blog has children that are ordered by publish date.

So how did Mongo help us with this? Well, we created a base Item model. Sites have many items. Items have many custom pieces of data. So, we have an Item model that acts as the base for our Page, Blog, Link, BlogPost and such models. Then each of those defines specific keys and behaviors that they do not have in common the other items.

By using inheritance, they all share the same base keys, validations, callbacks and collection. Then for behaviors and keys that are shared by some, but not all, we are creating modules and including them. One such module is SortableItem. This gets included in Page, Blog and Link as those can all be sorted and have previous and next items. The SortableItem module defines a position key and keeps the position order in check when creating and destroying items that include it. Think of it as acts_as_list.

This has been so handy. Steve was building the doc site and said he wished he had a link type, something that shows up in the navigation, but cross links to another section or another site. I was like, so make it! Here it is in all its glory.

class Link < Item include SortableItem key :url, String, :required => true, :allow_blank => false def permalink Harmony.escape_url(title) end end

Yep, barely any code. We inherit from item, include the sortable attributes, define a new key named url (where the link should go to) and make sure the permalink is always set to the title. Nothing to it. This kind of flexibility is huge when you get new feature ideas.

All these completely different documents are stored in the same collection and follow a lot of the same rules but none of them has any more data stored with it than is absolutely needed. No creating a column for any key that could be in any row. Just define the keys that go with specific document types. Sweet!

3. Array Keys

Harmony has sites and users. Users are unique across all of Harmony. One username and password and you can access any specific site or all sites of a particular account. Normally this would require a join table, maybe even some polymorphism. What we decided to do is very simple. Mongo natively understands arrays. Our site model has an array key named authorizations and our Account model has one named memberships. These two array keys store arrays of user ids. We could de-normalize even more and just have a sites array key on user, but we decided not to.

class Site include MongoMapper::Document key :authorizations, Array, :index => true # def add_user, remove_user, authorized? end class Account include MongoMapper::Document key :memberships, Array, :index => true # def add_user, remove_user, member? end

What is cool about this is that it is still simple to get all the users for a given site.

class Site def users user_ids = [authorizations, memberships].flatten.compact.uniq User.all(:id => user_ids, :order => 'last_name, first_name') end end

The sweet thing about this is that not only does Mongo know how to store arrays in documents, but you can even index the values and perform efficient queries on arrays.

Eventually, I want to roll array key stuff like this into MongoMapper supported associations, but I just haven’t had a chance to abstract them yet. Look for that on the horizon.

4. Hash Keys

As if array keys were not enough, hash keys are just as awesome. Harmony has a really intelligent activity stream. Lets face it, most activity streams out there suck. Take Github’s for example. I will pick on them because I know the guys and they are awesome. They are so successful, they can take it. :)

It may be handy that I can see every single user who follows or forks MongoMapper, but personally I would find it way more helpful if their activity stream just put in one entry that was more like this.

“14 users started watching MongoMapper today and another 3 forked it. Oh, and you had 400 pageviews.”

Am I right? Maybe I have too many projects, but their feed is overwhelming for me at times. What we did to remedy this in Harmony is make the activity stream intelligent. When actions happen, it checks if the same action has happened recently and just increments a count. What you end up with are things in the activity stream like:

“Mongo is to Databases what Rails was to Frameworks was updated 24 times today by John Nunemaker.”

On top of that, we use a hash key named source to store all of the attributes from the original object right in the activity stream collection. This means we do 0, yes 0, extra queries to show each activity. Our activity model looks something like this (obviously this is really pared down):

class Activity include MongoMapper::Document key :source, Hash key :action, String key :count, Integer, :default => 1 end

Then, we define an API in that model to normalize the different attributes that could be there. For example, here is the title method:

class Activity def title source['title'] || source['name'] || source['filename'] end end

In order to determine if a new action is already in the stream and just needs to be incremented, we can then use Mongo’s ability to query inside hashes with something like this:

Activity.first({ 'source._id' => id, :action => 'updated', :created_at.gt => Time.zone.now.beginning_of_day.utc })

How fricken sweet is that? Major. Epic.

5. Embedding Custom Objects

What is that you say? Arrays and hashes just aren’t enough for you. Well go fly a kite…or just use an embedded object. When Harmony was powered by MySQL (back a few months ago), we had an Item model and an ItemAttributes (key, value, item_id) model.

Item was the base for all the different types of content and item attributes were how we allowed for custom data on items. This meant every time we queried for an item, we also had to get its attributes. This isn’t a big deal when you know all the items up front as you can eager load some stuff, but we don’t always know all the items that are going to be shown on a page until it happens.

That pretty much killed the effectiveness of eager loading. It also meant that we just always got all of an item’s attributes each time we got an item (and performed the queries to get the info), just in case one of those attributes was being used (title and path are on all items).

With Mongo, however, we just embed custom data right with the item. Anytime we get an item, all the custom data comes with it. This is great as there is never a time where we would get an attribute without the item it is related to. For example, here is part of an item document with some custom data in it:

{ "_id" =>..., "_type" =>"Page", "title" =>"Our Writing", "path" =>"/our-writing/", "data" =>[ {"_id" =>..., "file_upload"=>false, "value"=>"", "key"=>"content"}, {"_id" =>..., "file_upload"=>true, "value"=>"", "key"=>"pic"} ], }

Now anytime we get an item, we already have the data. No need to query for it. This alone will help performance so much in the future, that it alone had the weight to convince us to switch to Mongo, despite being almost 90% done in MySQL.

The great part is embedded objects are just arrays of hashes in Mongo, but MongoMapper automatically turns them into pure ruby objects.

class Item include MongoMapper::Document many :data do def [](key) detect { |d| d.key == key.to_s } end end end class Datum include MongoMapper::EmbeddedDocument key :key, String key :value end

Just like that, each piece of custom data gets embedded in the item on save and converted to a Datum object when fetched from the database. The association extension on data even allows for getting data by its key quite easily like so:

Item.first.data['foo'] # return datum instance if foo key present

6. Incrementing and Decrementing

A decision we made the moment we switched to Mongo was to take advantage of its awesome parts as much as we could. One way we do that is storing published post counts on year, month and day archive items and label items. Anytime a post is published, unpublished, etc. we use Mongo’s increment modifier to bump the count up or down. This means that there is no query at all needed to get the number of posts published in a given year, month or day or of a certain label if we already have that document.

We have several callbacks related to a post’s publish status that call methods that perform stuff like this under the hood:

# ids is array of item ids conditions = {:_id => {'$in' => ids}} # amount is either 1 or -1 for increment and decrement increments = {'$inc' => {:post_count => amount}} collection.update(conditions, increments, :multi => true)

For now, we drop down the ruby driver (collection.update), but I have tickets (inc, the rest) to abstract this out of Harmony and into MongoMapper. Modifiers like this are super handy for us and will be even more handy when we roll out statistics in Harmony as we’ll use increments to keep track of pageviews and such.

7. Files, aka GridFS

Man, with all the awesome I’ve mentioned above, some of you may be tired, but I need you to hang with me for one more topic. Mongo actually has a really cool GridFS specification that is implemented for all the drivers to allow storing files right in the database. I remember when storing files in the database was a horrible idea, but with Mongo this is really neat.

We currently store all theme files and assets right in Mongo. This was handy when in development for passing data around and was nice for building everything in stage before our move to production. When we were ready to move to production, we literally just dumped stage and restored it on production. Just like that all data and files were up and running.

No need for S3 or separate file backup processes. Just store the files in Mongo and serve them right out of there. We then heavily use etags and HTTP caching and intend on doing more in the future to make sure that serving these files stays performant, but that is for another day. :) As of now, it is plenty fast and sooooo convenient.

Conclusion

We have been amazed at how much code we cut out of Harmony with the switch from MySQL to Mongo. We’re also really excited about the features mentioned above and how they are going to help us grow our first product, Harmony. I can’t imagine building some of the flexibility we’ve built into Harmony or some of the ideas we have planned for the future with a relational database.

I am truly as excited about the future of Mongo as I once was (and still am) about the future of Rails.