MongoDB at Etsy

Posted by Dan McKinley on May 19, 2010

Hi! Dan McKinley and Wil Stuckey from the Etsy Curation team here. We’ll be your hosts for a three-part series about the use of MongoDB here at Etsy.



The Curation Team. Well, half of it. (Photo credit: Elizabeth Weinberg.)

In this, the first entry, we’ll give some background on how and why we use MongoDB, and explain our initial impressions as developers. In the second post, John Allspaw will talk about how well MongoDB is working out operationally. And not to build this up excessively or give too much away, but the third post will be a harrowing look inside web operations gone horribly awry, chock full of fear, loathing, and remediation checklists.

The Application

The first project using MongoDB is the new Treasury. For the unfamiliar, the Treasury is a member-curated browsing tool, originally built as a Flash application. Our team has rewritten it as a modern Ajax application in order to solve the scaling issues inherent in the original FMS design.

Etsy homepages are most frequently chosen from the treasuries, and there are quite a few other existing or proposed features that are at least vaguely similar. So we wanted our backend to be flexible and to make something like a polymorphic, generalized “list” object as easy as possible. It was thinking along those lines that first made us consider using a schemaless database, although we would not currently consider that to be the primary benefit.

Why not use a relational database?

For us, building a read-heavy social application, this question should really be rephrased as “why not use MySQL?” This was actually a pretty difficult decision. MySQL is very well understood operationally (especially by the people on our team). Replication in MySQL is pretty easy, and we love replication.

There is absolutely no part of this project that is technically impossible with MySQL or, for that matter, any relational database. For us, this just came down to development speed. Ignoring everything else, the following two solutions are roughly equivalent in terms of performance.

Relational Solution:

Use a relational database, with a normalized or semi-normalized schema.

When rendering a response, run a handful of queries and then aggregate the data for the object.

Cache the resultant aggregate object either on a TTL or do invalidation.

Return the cached copy of the aggregate object.

Document Store Solution:

Use a document datastore, and embed sub-objects or child lists within their parents.

When rendering a response, retrieve the document by key and return it.
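To make the comparison concrete, here is a minimal sketch of the two approaches in Python. Everything in it is an assumption made for illustration: the table names, the field names, and the sample data are hypothetical and are not our actual schema. SQLite stands in for the relational database, and plain dicts stand in for both the external cache and the document collection, so no real MongoDB or memcached calls appear.

```python
import sqlite3


def build_relational_db():
    """Create a tiny normalized schema (hypothetical, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lists (id INTEGER PRIMARY KEY, title TEXT, curator TEXT);
        CREATE TABLE list_items (list_id INTEGER, position INTEGER, listing_id INTEGER);
        INSERT INTO lists VALUES (1, 'Autumn finds', 'some_member');
        INSERT INTO list_items VALUES (1, 0, 101);
        INSERT INTO list_items VALUES (1, 1, 102);
        INSERT INTO list_items VALUES (1, 2, 103);
    """)
    return conn


def load_list_relational(conn, list_id, cache):
    """Relational path: run several queries, aggregate, cache, return."""
    # A plain dict stands in for an external cache; it has no TTL and
    # no invalidation, both of which a real deployment would need.
    if list_id in cache:
        return cache[list_id]
    title, curator = conn.execute(
        "SELECT title, curator FROM lists WHERE id = ?", (list_id,)
    ).fetchone()
    items = [
        row[0]
        for row in conn.execute(
            "SELECT listing_id FROM list_items WHERE list_id = ? ORDER BY position",
            (list_id,),
        )
    ]
    aggregate = {"_id": list_id, "title": title, "curator": curator, "items": items}
    cache[list_id] = aggregate
    return aggregate


# Document path: the child list is embedded in its parent, so the stored
# document already has the shape of the response. A dict keyed by _id
# stands in for a document collection.
documents = {
    1: {
        "_id": 1,
        "title": "Autumn finds",
        "curator": "some_member",
        "items": [101, 102, 103],
    }
}


def load_list_document(collection, list_id):
    """Document path: retrieve the document by key and return it."""
    return collection[list_id]
```

Both functions return the same object for the same key. The difference is in how much code sits between the request and the response: the relational path carries the queries, the aggregation step, and the cache bookkeeping, while the document path is a single lookup.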

In our case, the development time saved using a document database is worth the risks. Caching at many levels is of course still a part of our application, but so far we’ve not found any reason to cache a single MongoDB document retrieved by primary key in an external cache like memcached, a practice that is currently common for us when we use relational databases.

Why MongoDB?

The number of databases that could be used for this kind of project has, um, proliferated somewhat recently. Why would we choose MongoDB over all of the others? Well, there were a few characteristics that we knew that we wanted:

The database should be safe to use as the system of record. In other words, it will not be storing data that is essentially replicated from other locations. We need the data on disk, backed up, and to have reasonable operational guarantees when there are hardware failures or the process is killed. This requirement also imposes certain constraints on the database’s maturity–we had to rule out CouchDB because of the possibility that the storage format would change before it came out of alpha.

The database performance should degrade gracefully when the data volume exceeds available RAM (this rules out some contenders, such as Tokyo Cabinet).

Our tests found MongoDB to be a sweet spot between reliability, speed, and maturity. But to be clear, this was the picture six months ago when we started prototyping. The world of document datastores has changed significantly even in that short time. Today, the choice would probably be more difficult.

And to be perfectly honest, the proximity of 10gen to Etsy’s Brooklyn headquarters as well as the responsiveness of Eliot and his team to questions was also a factor in our decision. (In the interest of full disclosure: 10gen shares investors with Etsy.)

Stay Tuned

In upcoming posts, we’ll dive deeper into our production experience with MongoDB. In short, things are going well, but we have learned some lessons the hard way. Hopefully, we can help you avoid making the same mistakes. Next time, John Allspaw will talk about MongoDB from the perspective of an operations professional. See you then!