Using MongoDB as a Time Series Database

By David Mytton,

CEO & Founder of Server Density.

Published on the 24th September, 2015.

We’ve used MongoDB as a time series database since 2009.

MongoDB helps us scale for the expanding volumes of data we collect in our server monitoring service. Over the years, we went from processing 40GB per month, to more than 250TB.

By the way, while it has served us well over the years, I’m not necessarily advocating MongoDB as the best possible database for time-series data. It’s what we’ve used so far (and we’re always evaluating alternatives).

On that basis, I’ve written a few times about how we use MongoDB (here is a recent look at the tech behind our time series graphs). As part of our upcoming Story of the Payload campaign (stay tuned!) I thought I’d revisit our cluster setup with a detailed look at its inner workings.

The hardware

Over the years, we’ve experimented with a range of infrastructure choices. Those include Google Compute Engine, AWS, SSDs versus spinning disks, VMWare, and our transition from managed cloud to Softlayer where we are today. We standardize on Ubuntu Linux and the cluster is configured as follows:

Ubuntu Linux 12.04 LTS

We run every server on an LTS release and upgrade to the next LTS on a fixed schedule. We can speed up specific upgrades if our team needs access to newer features or bundled libraries.

Bare metal servers

We experimented with VMs in the past and found host contention to be an issue, even with guaranteed disk I/O performance (products like AWS EBS or Compute Engine SSDs offer this option, unlike Softlayer that doesn’t).

Solid State Disks

We have multiple SSDs, and house each database—including the journal—on its own disk.

Everything is managed with Puppet

We used to write our own manifests. The official Forge MongoDB module has since become a better option so we are migrating to it.

As for the servers themselves, they have the following specs:

x2 2GHz Intel Xeon-SandyBridge (E5-2650-OctoCore) (16 cores total)

16x16GB Kingston 16GB DDR3

x1 100GB Micron RealSSD P300 (for the MongoDB journal)

x2 800GB Intel S3700 Series (one per database)

The MongoDB cluster

Our current environment has been in use for 18 months. During this time we scaled both vertically (adding more RAM) and horizontally (adding more shards). Here are some details and specs:

x3 data node replica sets plus 1 arbiter per shard.

x2 nodes in the primary data centre at Washington DC, and a failover node at San Jose, CA. The arbiter is housed in a third data centre in Dallas, TX.

x5 shards with distribution based on item ID (server, for example) and metric. This splits up a customer’s record across multiple shards for maximum availability. MongoDB deals with balancing using hash-based sharding.

The average workload is around 6000 writes/sec which equates to about 500,000,000 new documents per day.

We use the MongoDB Cloud Backup service which offers real-time offsite backups. It acts as a replica node for each replica set. It receives a (compacted and compressed) copy of every write operation. Current throughput sits at a sustained 42 Mbps.

We use the Google Compute Engine and MongoDB Cloud Backup service API to restore our backups and verify them against our production cluster, twice per day.

We keep a copy of the backup in Google’s Cloud Storage in Europe as a final disaster recovery option. We store copies twice per day, going back for 10 days.

The Data

The write workload of the cluster consists of inserts and updates, for the most part.

For the lowest granularity level of data, we use an append-only schema where new data is inserted and never updated. These writes take approx 2-3ms.

For the hourly average type of metrics (we keep those forever – check out our monitoring graphs) we allocate a document per day, per metric, per item. This document gets updated with the sum and count. From that, we calculate the mean average when we query the data. These writes typically complete within 500 ms.

We optimise for writes in place, use field modifiers and avoid growing documents by pre-allocation. Even so, there is a large overhead associated with updating documents.

When querying the data (drawing graphs for example) the average response time as experienced by the user is 189 ms. Median response is 39 ms, the 95th percentile is 532 ms, and 99th percentile is 1400 ms. The majority of that time is used by application code as it constructs a response for the API. The slowest of queries are the ones comprising multiple items and metrics over a wide time range. If we were to exclude our application code, the average MongoDB query time is 0.0067ms.

Summary

So that’s the MongoDB cluster setup we have here at Server Density. We would like to hear from you now. How do you use MongoDB? What does you cluster look like, and how has it evolved over time?