MongoDB on Google Compute Engine – tips and benchmarks

By David Mytton,

CEO & Founder of Server Density.

Published on the 1st April, 2014.

Over the last 4 years running MongoDB in production at Server Density, I’ve been able to work on deployments on dedicated hardware, VMs and across multiple cloud providers.

The best environment has always been dedicated servers because of the problems with host contention, particularly around CPU and disk i/o. However, Google has been quite vocal about the consistency and performance of its Compute Engine product, particularly about how it has eliminated the noisy neighbour problem with intelligent throttling.

So I thought I’d try it out to see how MongoDB performs on Google Compute Engine.

Testing the Write Concern – performance vs durability

The MongoDB Write Concern is historically controversial because Mongo was originally designed for very high write throughput at the expense of durability, but this wasn’t well documented. The default was changed a while back so that writes are now acknowledged, but the setting remains flexible enough to let you tune for either speed or durability.

I am going to test a range of write concern options to allow us to see what kind of response times we can expect:

Unacknowledged: w = 0 AND j = 0

This is a fire-and-forget write: we don’t know whether the write succeeded, and even errors like network failures may go undetected.

Acknowledged: w = 1 AND j = 0 (the default)

This will give us an acknowledgement that the write was successfully received, but no indication that it was actually written. This catches most errors, e.g. parse errors and network errors.

Journaled: w = 1 AND j = 1

This will cause the write to wait until it has been both acknowledged and written to the journal of the primary replica set member. This gives you single server durability but doesn’t guarantee the data has been replicated to other members of your cluster. In theory you could have a data center failure and lose the write.

Replica acknowledged: w = 2 AND j = 0

This test will give us an idea of how long it takes for the write to be acknowledged by the replica set primary and by at least 1 other member of the replica set. This gives us some durability across 2 servers, but in theory the write could still fail on both because we are not checking that the write has hit the journal.

Replica acknowledged and Journaled: w = 2 AND j = 1

This ensures that the write has been written to the journal on the primary and has been acknowledged by at least 1 other member of the replica set.

Replica acknowledged with majority: w = majority AND j = 0

In a multi datacenter environment you want to know that your writes are safely replicated. Using the majority keyword will allow you to be sure that the write has been acknowledged on the majority of your replica set members. If you have the set deployed evenly across data centers then you know that your data is safely in multiple locations.

Replica acknowledged with majority and journaled: w = majority AND j = 1

In perhaps the most paranoid mode, we will know that the write was written to the primary’s journal and replicated to a majority of the replica set members.
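As a sketch, the modes above map to the write concern options a driver passes with each write. The table below expresses them as plain dictionaries for illustration; in pymongo these would become WriteConcern(w=..., j=...), with "majority" passed as a string value for w.

```python
# The write concern combinations tested in this article, as (w, j) options.
# In pymongo: WriteConcern(w=opts["w"], j=opts["j"]). The dict itself is
# illustrative, not a driver API.
WRITE_CONCERNS = {
    "unacknowledged":                   {"w": 0, "j": False},
    "acknowledged (default)":           {"w": 1, "j": False},
    "journaled":                        {"w": 1, "j": True},
    "replica acknowledged":             {"w": 2, "j": False},
    "replica acknowledged + journaled": {"w": 2, "j": True},
    "majority":                         {"w": "majority", "j": False},
    "majority + journaled":             {"w": "majority", "j": True},
}

for name, opts in WRITE_CONCERNS.items():
    print(f"{name}: w={opts['w']}, j={opts['j']}")
```

Note that w counts total acknowledging members including the primary, so w=2 means the primary plus 1 secondary.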

Environment configuration

Replica sets

Real-world applications use replica sets for redundancy and failover capabilities across multiple data centers. To test this accurately, the test environment will involve 4 data nodes across 2 zones: two in the us-central1-a zone and two in the us-central1-b zone.

In a real deployment you must have a majority of voting members available to maintain the set in the event of a failure, so a 5th node should be deployed as an arbiter in another data center. I’ve not done this here for simplicity.
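The layout above, including the recommended arbiter, could be expressed as a replica set configuration document like the following sketch (set name and host names are made up; the document would be passed to rs.initiate() in the mongo shell or to the replSetInitiate command):

```python
# Illustrative replica set config for 4 data nodes across 2 zones, plus an
# arbiter in a third location so a majority survives a zone failure.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo-a-1:27017"},  # us-central1-a
        {"_id": 1, "host": "mongo-a-2:27017"},  # us-central1-a
        {"_id": 2, "host": "mongo-b-1:27017"},  # us-central1-b
        {"_id": 3, "host": "mongo-b-2:27017"},  # us-central1-b
        {"_id": 4, "host": "mongo-arb:27017", "arbiterOnly": True},  # 3rd zone
    ],
}

# An odd number of voting members avoids tied elections.
voting_members = len(rs_config["members"])
assert voting_members % 2 == 1
```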

Google Compute Engine

I tested with the n1-standard-2 (2 vCPUs and 7.5GB RAM) and n1-highmem-8 (8 vCPUs and 52GB RAM) instance types – with the backports-debian-7-wheezy-v20140318 OS image.

Be aware that the number of CPU cores your instance has also affects its i/o performance. For maximum performance you need to use the 4- or 8-core instance types, even if you don’t need all the memory they provide.

There is also a bug in the GCE Debian images where the default locale isn’t set. This prevents MongoDB from starting properly from the Debian packages. The workaround is to set a default:

sudo locale-gen en_US.UTF-8

sudo dpkg-reconfigure locales

Google Persistent Disks

It’s really important to understand the performance characteristics of Google Persistent Disks and how IOPS scale linearly with volume size. Here are the key things to note:

At the very least you need to mount your MongoDB dbpath on a separate persistent disk. This is because the default root volume attached to every Compute Engine instance is very small and will therefore have poor performance. It does allow bursting for the OS, but this isn’t sufficient for MongoDB, which will typically have sustained usage requirements.

Use directoryperdb to give each of your databases its own persistent disk volume. This allows you to optimise both performance and cost, because you can resize the volumes as your data requirements grow and/or to gain the performance benefits of more IOPS.

Putting the journal on a separate volume is possible even without directoryperdb because it is always in its own directory. Even if you don’t put your databases on separate volumes, it is worth separating the journal onto its own persistent disk because the performance improvements are significant – up to 3x (see results below).

You may be used to needing only a small volume for the journal because it uses just a few GB of space. However, allocating a small persistent disk volume will mean you get poor performance because the available IOPS increase with volume size. Choose a volume of at least 200GB for the journal.
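As an illustration, a 2.4-era INI-style mongodb.conf enabling directoryperdb might look like the fragment below (paths are examples). The journal doesn’t need its own config option: because it always lives in its own directory under the dbpath, it can be moved to a separate volume by symlinking that directory (e.g. ln -s /mnt/journal-disk /var/lib/mongodb/journal) before starting mongod.

```
# /etc/mongodb.conf – INI-style options used by the 2.4-era Debian packages
dbpath = /var/lib/mongodb
journal = true
# One subdirectory (and thus, optionally, one mounted volume) per database
directoryperdb = true
```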

If you split all your databases (or even just the journal) onto different volumes then you will lose the ability to use snapshots for backups. This is because the snapshot across multiple volumes won’t necessarily happen at the same time and will therefore be inconsistent. Instead you will need to shut down the mongod (or fsync lock it) and then take the snapshot across all disks.
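The lock-then-snapshot sequence could be sketched as follows. The disk and zone names are examples, and the sketch shells out conceptually via the mongo shell’s db.fsyncLock()/db.fsyncUnlock() helpers and gcloud’s disk snapshot command; here it just builds the command list rather than executing anything:

```python
# Sketch: a consistent multi-volume backup requires flushing and locking
# writes, snapshotting every volume, then unlocking. Disk/zone names are
# hypothetical examples.
DISKS = ["mongo-dbpath", "mongo-journal"]
ZONE = "us-central1-a"

def snapshot_commands(disks, zone):
    """Build the shell commands to run, in order, for a consistent snapshot."""
    cmds = ['mongo admin --eval "db.fsyncLock()"']  # flush and block writes
    for disk in disks:
        cmds.append(f"gcloud compute disks snapshot {disk} --zone {zone}")
    cmds.append('mongo admin --eval "db.fsyncUnlock()"')  # resume writes
    return cmds

for cmd in snapshot_commands(DISKS, ZONE):
    print(cmd)
```

The same effect can be had by shutting mongod down entirely before snapshotting, at the cost of taking the node out of the set for longer.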

I’ve run the testing several times with different disk configurations so I can see the different performance characteristics:

1. With no extra disks, i.e. dbpath on the default 10GB system volume

2. With a dedicated 200GB persistent disk for the dbpath

3. With a dedicated 200GB persistent disk for the dbpath and another dedicated 200GB persistent disk for the journal

Test methodology

I wrote a short Python script to insert a static document into a collection. Each test executed the insert 1,000 times and was repeated 3 times. The tests used the Python timeit library, and the fastest of the 3 runs was taken, as the timeit docs indicate that the mean/standard deviation of repeated runs is not that useful.
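The harness can be sketched with the standard library alone. The stand-in function below replaces the real pymongo insert (which isn’t shown in the article), but the repeat/min structure matches the methodology described:

```python
import timeit

def insert_document():
    # Stand-in for the real test body: in the actual script this would be a
    # pymongo insert of a static document with the write concern under test.
    doc = {"name": "test", "value": 1}
    return doc

# 3 repeats of 1000 executions, keeping the fastest run. The timeit docs
# recommend the minimum because the mean/stddev of repeats mostly measure
# interference from other processes, not the code being timed.
times = timeit.repeat(insert_document, number=1000, repeat=3)
fastest = min(times)
print(f"fastest of 3 runs: {fastest:.6f}s for 1000 executions")
```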

Results – MongoDB performance on Google Compute Engine

n1-standard-2

n1-highmem-8

Conclusions

There’s not much difference in performance with a single persistent disk volume until you start increasing the durability options, because acknowledged and unacknowledged writes only go to memory. Once you increase the write concern options you become limited by disk performance, and it’s clear that splitting the journal onto a separate volume makes a significant difference.

The increased performance of the larger n1-highmem-8 instance type, with 8 cores vs the 2 cores of the n1-standard-2, is also apparent in the figures – although the difference is quite small, it would likely still help in a real-world environment.

The key takeaway is that performance decreases as durability increases – it’s a tradeoff. For maximum performance with durability, take advantage of Compute Engine’s higher-spec instance types and larger, separate persistent disk volumes per database/journal.