MongoDB at Etsy, Part 2

Posted by John Allspaw on July 3, 2010

Note: In 2020 we updated this post to adopt more inclusive language. Going forward, we’ll use “primary/replica” in our Code as Craft entries.

Dan and Wil posted a while ago about how we’re using MongoDB at Etsy. We’ve been using MongoDB as the backing store for Treasury, and over the course of two and a half months, we’ve had over 50,000 excellent Treasuries created by the Etsy community, all stored and served in MongoDB. Not so bad.

I thought I’d post a bit about what I like about MongoDB, operationally.

The things that were most important to me on that front were:

Stability

Sane behavior under increasing load

Metrics availability

Familiar replication behavior

Of course, the first concern was stability. Who wants to implement something crashy? We got enough honest reports of stability from other folks running MongoDB in production, and our experience has been no different. Those reports came in large part from the activity on the mongodb-user list, as well as from talking to folks who have already put it into production.

Performance

One of the things that Dan touched on was that MongoDB behaves well when the working set of data exceeds available physical RAM. The folks over at 10gen obviously have put a good deal of thought into this, because otherwise you wouldn’t be able to call it “humongous”. 🙂 While the mechanics of how data is buffered in RAM and persisted to disk appear to be different, the behavior is along the same lines as InnoDB’s buffer pooling and persistence. That is to say, query performance is excellent when the database is small enough to keep entirely in RAM, but when the data blows past RAM, performance plateaus, limited only by your disk I/O subsystem. This is preferable and familiar: there’s no massive drop in performance (or crash), only a nice plateau of response at the bound-by-disk condition.

When we first evaluated MongoDB, we loaded all of the ‘favorites’ data (currently kept in Postgres in production), well over 40G of it, into a MongoDB instance. We then hammered it with 5000 random queries at a time, constantly. The machine has eight 15K SAS spindles in RAID10 and only 16G of RAM. We watched the query performance as the filesystem cache filled to 16G, and once it hit 16G, the performance leveled out, consistent with a purely disk-bound workload. We saw the same results when we loaded up 70G worth of Etsy’s listing data and performed the same test. This is what you’d normally expect from any sane datastore that is expected to scale with large (and growing) working sets of persisted data.
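To make the plateau concrete, here is a back-of-the-envelope sketch in Python. It is not how we measured anything. It assumes queries are spread uniformly over the data, so the share served from RAM is simply RAM size divided by data size, and the per-read latencies (ram_ms, disk_ms) are made-up illustrative numbers, not figures from our tests.

```python
def cache_hit_ratio(data_gb, ram_gb):
    """Fraction of uniformly random reads that can be served from RAM."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return min(1.0, ram_gb / data_gb)


def mean_latency_ms(data_gb, ram_gb, ram_ms=0.1, disk_ms=5.0):
    """Expected read latency; ram_ms and disk_ms are assumed values."""
    hit = cache_hit_ratio(data_gb, ram_gb)
    return hit * ram_ms + (1.0 - hit) * disk_ms


if __name__ == "__main__":
    # 16G of RAM, as on the test machine; data sizes are illustrative.
    for data_gb in (8, 16, 40, 70, 500):
        latency = mean_latency_ms(data_gb, ram_gb=16)
        print(f"{data_gb:>4}G of data: ~{latency:.2f} ms per read")
```

However large the data gets in this model, the expected latency never exceeds disk_ms. It creeps toward the disk-bound figure and stays there, which is the shape we saw.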

Why is this important? Because any datastore can be fast when its working set is in RAM, but we need to prepare for exceeding that, and not rely on vertically scaling our machines to handle increasing traffic.

Metrics

The other piece that I appreciate in MongoDB is the metrics it exposes about its operations, and how easy it is to get at them. There’s currently an HTTP interface at http://127.0.0.1:28017/_status that dumps a good deal of parseable information. We use these metrics not just for alerting thresholds (on current connections and replication lag on replicas) but also for gathering statistics on all of the Mongo operations being done. Wil has a Ganglia gmetric script that parses this data to put into Ganglia. He’s open-sourced the gmetric and it’s on GitHub. This way, we can correlate what CPU, memory, disk, etc. values look like with increasing usage across all of the various Mongo metrics. Context is everything. 🙂
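As a rough illustration of the parsing step (this is not Wil’s script), here is a minimal Python sketch. It assumes the status output is a JSON document of nested counters. The SAMPLE payload and the "mongodb" name prefix are invented for the example, so check what your own server actually returns before relying on any field names.

```python
import json


def flatten_metrics(doc, prefix="mongodb"):
    """Yield (dotted_name, value) for every numeric leaf in a nested dict."""
    for key, value in doc.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten_metrics(value, name)
        elif isinstance(value, bool):
            continue  # booleans are ints in Python; skip them explicitly
        elif isinstance(value, (int, float)):
            yield name, value
        # strings, lists, and nulls are not graphable, so they are dropped


# Hypothetical payload, shaped like a nested status document.
SAMPLE = """
{"connections": {"current": 12, "available": 807},
 "opcounters": {"insert": 5, "query": 40},
 "version": "x.y.z"}
"""

if __name__ == "__main__":
    for name, value in flatten_metrics(json.loads(SAMPLE)):
        print(name, value)
```

Each flattened pair could then be handed to whatever collector you use. Keeping the names dotted and prefixed makes it easy to graph them next to CPU, memory, and disk.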

Familiarity

Having come from a LAMP background, I’m used to having database replication be a given. I realized the other day that the last time I worked somewhere where replication of data wasn’t an absolute requirement was 2003. So when we first evaluated MongoDB, replication was one of the first things I was happy to see. Replication is super simple, and for anyone familiar with MySQL’s built-in replication, all of the fundamentals apply: MongoDB records its write operations in an oplog collection, which is then used as the replication stream between primaries and replicas.
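For anyone who hasn’t looked at log-based replication before, the idea can be sketched as a toy model in Python. This is not MongoDB’s implementation. The entries borrow the oplog’s field names (ts, op, ns, o, o2) and op codes ("i" insert, "u" update, "d" delete), but timestamps are plain integers here, only $set and whole-document updates are handled, and the demo.treasuries namespace is made up.

```python
def apply_oplog(entries, replica=None):
    """Apply oplog-style entries, in order, to an in-memory replica."""
    replica = {} if replica is None else replica
    for entry in entries:
        op = entry["op"]
        if op not in ("i", "u", "d"):
            continue  # no-ops and commands are ignored in this toy model
        collection = replica.setdefault(entry["ns"], {})
        if op == "i":
            doc = entry["o"]
            collection[doc["_id"]] = dict(doc)
        elif op == "u":
            doc_id = entry["o2"]["_id"]  # o2 identifies the target document
            update = entry["o"]
            if "$set" in update:
                target = collection.setdefault(doc_id, {"_id": doc_id})
                target.update(update["$set"])
            else:
                collection[doc_id] = {"_id": doc_id, **update}
        else:  # "d"
            collection.pop(entry["o"]["_id"], None)
    return replica


# Hypothetical stream of writes as a primary might log them.
OPLOG = [
    {"ts": 1, "op": "i", "ns": "demo.treasuries", "o": {"_id": 1, "title": "Summer"}},
    {"ts": 2, "op": "i", "ns": "demo.treasuries", "o": {"_id": 2, "title": "Autumn"}},
    {"ts": 3, "op": "u", "ns": "demo.treasuries", "o2": {"_id": 1},
     "o": {"$set": {"title": "Late summer"}}},
    {"ts": 4, "op": "d", "ns": "demo.treasuries", "o": {"_id": 2}},
]

if __name__ == "__main__":
    print(apply_oplog(OPLOG))
```

A replica that replays the same entries in the same order ends up with the same data as the primary, which is the same fundamental as MySQL’s binlog replication.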

The oplog is actually a collection in a database, unlike binlogs in MySQL-land. This means you can query it and inspect it in the same way you would any collection in Mongo. Dan’s got a project up on GitHub that allows you to inspect and manipulate the oplog easily, and he’ll be adding more abilities to it in the future. This will be familiar to those who have worked with mysqlbinlog in that it provides an easy way to juggle replication events in recovery or troubleshooting scenarios.
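To give a flavor of what inspecting the oplog means, here is a small Python sketch that works on a list of oplog-shaped dictionaries instead of a live server. It is not Dan’s tool. The helper names (summarize_oplog, entries_since), the integer timestamps, and the demo.* namespaces are all assumptions made for the example.

```python
from collections import Counter


def summarize_oplog(entries):
    """Count entries per (namespace, op code) pair."""
    return Counter((entry["ns"], entry["op"]) for entry in entries)


def entries_since(entries, ts):
    """Return the entries logged strictly after timestamp ts, in order."""
    return [entry for entry in entries if entry["ts"] > ts]


# Hypothetical oplog contents; real entries carry richer timestamps.
ENTRIES = [
    {"ts": 10, "op": "i", "ns": "demo.treasuries", "o": {"_id": 1}},
    {"ts": 11, "op": "i", "ns": "demo.treasuries", "o": {"_id": 2}},
    {"ts": 12, "op": "u", "ns": "demo.favorites", "o2": {"_id": 7},
     "o": {"$set": {"count": 3}}},
    {"ts": 13, "op": "d", "ns": "demo.treasuries", "o": {"_id": 2}},
]

if __name__ == "__main__":
    for (ns, op), count in sorted(summarize_oplog(ENTRIES).items()):
        print(ns, op, count)
    print(len(entries_since(ENTRIES, 11)), "entries after ts 11")
```

A summary like this is a quick way to see which collections are taking writes, and a "since" filter is the usual starting point for working out what a lagging or recovering replica still has to apply.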

Dan didn’t write that tool out of curiosity. It turns out that we were going to need something like that soon enough. Stay tuned for Part 3 of this MongoDB at Etsy story…