Goodbye global lock – MongoDB 2.0 vs 2.2

By David Mytton,

CEO & Founder of Server Density.

Published on the 23rd May, 2012.

Perhaps the most oft-cited problem with MongoDB is the infamous global lock. In general terms, this means that the entire server is locked while a write operation is performed. This sounds bad, but its impact in real-world production use is often blown out of proportion. The lock has been improved with each version, and MongoDB 2.0 includes significant improvements to how the server yields during these kinds of operations.

Indeed, in our own setup we used to throttle inserts through Memcached before they went into MongoDB 1.8 but, having upgraded to 2.0, we were able to eliminate the throttling and insert directly into MongoDB. This improvement is illustrated well by some benchmarks from the end of last year comparing v1.8 to 2.0.

Nevertheless, a major focus for the upcoming 2.2 release has been removing the global lock and introducing database level locking as an initial step towards collection level locking and potentially even more granular concurrency in future releases.

There are 2 parts to the improvements in v2.2:

Elimination of the global reader/writer lock – database level locks as the first step.

PageFaultException architecture – yield lock on page fault.

The first is true database-level locking, but benefiting from it may require some architecture changes to your application: if you use one large collection, for example, it makes little difference because you’re still writing to a single database. The second, however, improves concurrency within a single collection, and it is this that will likely provide immediate benefits for users upgrading.
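To make the first point concrete, here is a toy model in Python (not MongoDB code). It assumes that a write holds its lock for a fixed time and that writes to different databases can otherwise run fully in parallel; the workloads and database names are made up for illustration.

```python
from collections import defaultdict


def makespan_global_lock(writes):
    """One server-wide lock: every write queues behind every other write."""
    return sum(duration for _db, duration in writes)


def makespan_db_locks(writes):
    """One lock per database: writes to different databases overlap,
    while writes to the same database still queue behind each other."""
    per_db = defaultdict(int)
    for db, duration in writes:
        per_db[db] += duration
    return max(per_db.values(), default=0)


# Hypothetical workloads: (database name, milliseconds the write lock is held)
single_db = [("app", 5)] * 4
multi_db = [("app", 5), ("logs", 5), ("metrics", 5), ("sessions", 5)]

print(makespan_global_lock(single_db), makespan_db_locks(single_db))  # 20 20
print(makespan_global_lock(multi_db), makespan_db_locks(multi_db))    # 20 5
```

With everything in one database the two schemes take the same time, which is why database-level locking alone does little for an application built around one large collection.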

Dwight Merriman, CEO of 10gen and one of the original authors of MongoDB, gave a good talk at MongoSF about the internals of these changes, so I recommend watching the video, which explains both of these points.

10gen do not provide official benchmarks because they tend to be irrelevant to real world usage. For your own purposes you should use something like benchRun to see how your queries will be affected by upgrades. That said, benchmarks can be useful in certain situations, such as demonstrating these kinds of differences between versions.

Rick Copeland did some excellent benchmarks to look at the improvements between v1.8 and 2.0 so I decided to run them against MongoDB 2.1(.1) as well as 2.0 and 1.8. Remember that v2.1 is the development version which will turn into the 2.2 stable release.

Comparing v1.8, v2.0 and v2.1

I used exactly the same code that Rick used in his original benchmarks: I launched his AMI on Amazon EC2 (m1.large), set up the same database and ran the benchmarks for faulting reads and writes. I didn’t run the non-faulting benchmarks because the major changes in 2.2 concern how page faults and locks are handled, and reading from and writing to memory isn’t going to show any interesting differences – it’ll always be fast! And when you get to the large data volumes MongoDB is supposed to be good at, you’re unlikely to have all your data + indexes in RAM; instead, you’ll be relying on keeping the working set in memory.

In the above graph I’m essentially reproducing Rick’s results then adding the MongoDB 2.1 tests. The difference is significant – there is no drop-off in performance regardless of the number of faulting writes against reads. The reason is that the global lock is completely gone, which is illustrated by the graph below.

Here, I took Rick’s experiment further by collecting statistics from mongostat to understand what was happening on the mongod. Throughout the entire run, the time spent in the global lock was 0%.
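For anyone wanting to track a similar figure themselves, a lock percentage can be derived from two samples of cumulative counters. The sketch below is Python and does not talk to a server. The field names totalTime and lockTime are modelled on the globalLock section of serverStatus in these versions, which is an assumption on my part, as is the idea that mongostat arrives at its number in this way. The sample values are made up and are not measurements from these tests.

```python
def lock_percent(earlier, later):
    """Percentage of the sampling interval during which the lock was held.

    Both arguments are dicts of cumulative microsecond counters with the
    (assumed) keys "totalTime" and "lockTime".
    """
    elapsed = later["totalTime"] - earlier["totalTime"]
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    held = later["lockTime"] - earlier["lockTime"]
    return 100.0 * held / elapsed


# Made-up counter values in microseconds, for illustration only
sample_a = {"totalTime": 1_000_000, "lockTime": 200_000}
sample_b = {"totalTime": 2_000_000, "lockTime": 450_000}

print(lock_percent(sample_a, sample_b))  # 25.0
```

A reading of 0% for a whole run, as in the test here, means the lockTime counter did not move between samples.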

Conclusions

The global lock is gone in MongoDB 2.2, which offers major improvements if you use many databases, but the real impact for anyone upgrading is how yielding works with the PageFaultException improvements. This is because MongoDB will detect the page fault and touch the page before the mutation occurs during the write.
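The yield-on-fault idea can be sketched as a toy simulation in Python. This is not MongoDB's implementation: the ToyStore class, the PageFault exception and the page names are all hypothetical, and a set of "resident" pages stands in for what is in RAM. The point is the shape of the retry loop: detect the fault before changing anything, give up the lock, touch the page, then try again.

```python
import threading


class PageFault(Exception):
    """Raised when a write needs a page that is not in memory (hypothetical)."""

    def __init__(self, page):
        super().__init__(page)
        self.page = page


class ToyStore:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for the write lock
        self.resident = set()          # pages currently "in RAM"
        self.data = {}
        self.faults_outside_lock = 0   # how often we paged in without the lock

    def _mutate(self, page, value):
        # Check residency before changing anything, so bailing out is safe
        if page not in self.resident:
            raise PageFault(page)
        self.data[page] = value

    def _touch(self, page):
        # Simulates the slow disk read; called only after the lock is released
        self.resident.add(page)
        self.faults_outside_lock += 1

    def write(self, page, value):
        while True:
            try:
                with self.lock:
                    self._mutate(page, value)
                return
            except PageFault as fault:
                # Leaving the "with" block has already released the lock
                self._touch(fault.page)


store = ToyStore()
store.write("page-7", "hello")
print(store.data, store.faults_outside_lock)  # {'page-7': 'hello'} 1
```

Because the slow step happens while no lock is held, other readers and writers are free to proceed during the disk access, which is the behaviour the faulting benchmarks exercise.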

The first graph shows that MongoDB is able to maintain consistent performance with no drop off during the tests.

Since Rick’s code only queries a single DB, these benchmarks show the improvements from the second change alone, the PageFaultException architecture, which is probably what most people upgrading will be interested in. It would also be interesting to benchmark activity across multiple databases on each version to see how that has improved.