Notes from a production MongoDB deployment

By David Mytton,

CEO & Founder of Server Density.

Published on the 28th February, 2010.

Back in July last year I wrote about our migration from MySQL to MongoDB. We have been running MongoDB in production for our server monitoring service, Server Density, since then – 8 months – and have learnt quite a few things about it. These are the kind of things that you only experience once you reach a certain level of usage and it is surprising how few issues we’ve had. Where a problem or bug has cropped up, it has been quickly resolved/fixed by the MongoDB dev team; so we’re very happy.

A few stats

Taken from our MongoDB instances as of Fri 26th Feb:

Collections (tables): 17,810

Indexes: 43,175

Documents (rows): 664,158,090

We currently have a manual failover setup with a single master and slave. The master has 72GB RAM and the slave is in a different DC. Given disk space limits, we are in the final stages of migrating to using automated replica pairs with manual sharding across 4 database servers (x2 masters, x2 slaves in different DCs). There were some delays deploying our new servers but we’re expecting all data to finish synching this coming week so we can switch over. This is a classic move from initial hardware (vertical) scaling to adding more servers (horizontal) as we grow. Although manually sharded for now, we’re planning to switch to the auto-sharding coming in MongoDB soon.

Namespace limits

We split our customers across (currently) 3 MongoDB databases because there is a namespace limit of 24,000 per database. This is essentially the number of collections + number of indexes.

You can check the total number of namespaces in use for a particular DB by using the command db.system.namespaces.count() from the MongoDB console. You can also increase this limit by using the –nssize paramter but there are tradeoffs and limits for this. See the MongoDB docs for details.

Durability

There is no full single server durability in MongoDB. This has been highlighted by the MongoDB developers themselves but essentially means you must run multiple servers in replication. If you suffer a power loss and/or MongoDB exits uncleanly then you will need to repair your database. If it’s small this is a simple process from the console, but it involves MongoDB going through every single document and rebuilding it. When you have databases at the size of ours, this would take hours.

You can use master/slave replication (ideally with the slave server in a different data centre) but if there’s a failure on the master you need to failover manually. Instead, you can use replica pairs which handle deciding which is the master between them, and in the event of a switchover will become eventually consistent when they both come back online.

Initial sync/replication of large databases

Our databases are very large and it takes about 48-72 hours to fully sync all our current data onto a new slave in a different DC (via a site-to-site VPN for security). During this time you’re at risk because the slave is not up to date.

Further, you need to ensure that your op log is large enough to hold all actions since the sync begins. MongoDB takes a snapshot from the time you start the sync and then when it’s done, uses the op log to catch up with actions since.

We found we needed a 75GB op log, which can be specified with the –oplogSize command line parameter. However, these files are allocated before MongoDB starts accepting connections. You therefore have to wait whilst the OS creates ~37 2GB files for the logs. On our fast 15k SAS disks this takes between 2-7 seconds per file – 5 mins – but on some of our older test boxes it took up to 30 seconds (~20 mins). During this time your database will be unavailable.

This is particularly a problem if you’re restarting an existing single server in replication mode, or are setting up a replica pair – the restart will take time as the files are allocated.

To avoid this problem, you can create the files yourself and MongoDB will then not have to do it. Run the following from the console, as soon as you hit return after “done” it’ll start making the files:

[sourcecode language=”bash”]

for i in {0..40}

do

echo $i

head -c 2146435072 /dev/zero > local.$i

done[/sourcecode]

Stop MongoDB, make sure any existing local.* files are removed and then move these new ones into your data directory, and restart MongoDB.

This will work for –oplogSize=75000. Note that creating the files will hog the filesystem I/O and slow everything down, but it won’t take your database offline. If you prefer, make the files on another system then transfer them over the network.

Initial sync “slows” things

When doing a fresh sync from a master to a slave, we have observed a “slowdown” in our application response times. We’ve not looked into this in detail because it’s still within acceptable response time ranges but it seems to be caused by a slightly higher wait time when our application is connecting and querying MongoDB. This causes http processes to hang around a little longer for each request and so CPU load increases.

My theory is that because the slave is reading all data from every collection in every database, the cache is being invalidated often and so reads can’t take advantage of it as well. But I’m not sure – it’s not been reported to MongoDB as it’s not really a problem. It would, however, be a problem, if your server was unable to handle the load from the Apache processes. It’s quite an edge case though.

Running MongoDB as a daemon/forked process + logging

Back in the olden days we ran MongoDB in a screen session but you can now just specify the –fork parameter and MongoDB will start as a forked process. If you do this, be sure to specify –logpath so you can check for any errors. We also use the –quiet option otherwise too much gets logged and rapidly fills up the file. There is no inbuilt log rotation.

OS tweaks

We came across a problem with too many open files which was caused by the default OS file descriptor limit. This is set at 1024 which can be too low. MongoDB now has documentation about this but you can set your limit higher on Red Hat systems by editing the /etc/security/limits.conf file. You also need to set UsePAM to yes in /etc/ssh/sshd_config to have it take effect when you log in as your user.

We also disabled atime on our database servers so that the filesystem isn’t caught up updating timestamps every time MongoDB reads from all its files.

Index creation blocks

We create all our indexes at the very beginning so the initial index process is pretty much instantaneous. However, if you have an existing collection and create a new index on it then that process will block the database until the index is created.

This is resolved in the development version of MongoDB (1.3) which introduces background indexing as an option.

Efficiency of reclaiming diskspace

The nature of our server monitoring application means we collect a lot of data, but it is deleted after a set retention period. We have found that there is a massive discrepancy between a master and a freshly copied slave. Obviously the slave is copying the data and storing as optimally as possible (no gaps between sectors where stuff has been deleted) so the very first sync will use less disk space than the master. However, we have seen a master using almost 900GB where a fresh slave uses only 350GB.

We have an open issue with MongoDB commercial support about this.

Support is excellent

Even before 10gen (the company behind MongoDB) got their $1.5m venture funding, their support was excellent. And it still is. We use their gold commercial support service and it has proved useful when we encountered many of the issues above. Opening tickets is easy and we get extremely fast responses from developers. I can also call and speak to them 24/7 in emergencies.

And if you don’t want / need to pay then their open mailing list is still very good. We need the ability to get support very quickly 24/7 but I’ve always had very fast replies from the mailing list.

Conclusion – was it the right move?

Yes. MongoDB has been an excellent choice. Administration of massive databases is easy, it scales extremely well and support is top notch. The only thing I’m missing and waiting for now is auto sharding. We’re doing manual sharding but having it handled by MongoDB is going to be very cool!