This is a post by Sandeep Parikh, Solutions Architect at MongoDB and Kelly Stirman, Director of Products at MongoDB.

Data as Ticker Tape

New York is famous for a lot of things, including ticker tape parades.

For decades the most popular way to track stock prices on Wall Street was ticker tape, one of the earliest digital communication media. Stocks and their prices were transmitted via telegraph to a small device called a “ticker” that printed them onto a thin roll of paper called “ticker tape.” Though it fell out of use over 50 years ago, the idea of the ticker lives on in the scrolling electronic tickers on brokerage walls and at the bottom of most news broadcasts, sometimes stacked two, three, or four levels deep.

Today there are many sources of data that, like ticker tape, represent observations ordered over time. For example:

Financial markets generate prices (we still call them “stock ticks”).

Sensors measure temperature, barometric pressure, humidity and other environmental variables.

Industrial fleets such as ships, aircraft and trucks produce location, velocity, and operational metrics.

Social networks generate streams of status updates.

Mobile devices generate calls, SMS messages and other signals.

Systems themselves write information to logs.

This data tends to be immutable, large in volume, ordered by time, and primarily accessed in aggregate. It represents a history of what happened, and there are a number of use cases that involve analyzing this history to better predict what may happen in the future or to establish operational thresholds for the system.

Time Series Data and MongoDB

Time series data is a great fit for MongoDB. There are many examples of organizations using MongoDB to store and analyze time series data. Here are just a few:

Silver Spring Networks, the leading provider of smart grid infrastructure, analyzes utility meter data in MongoDB.

EnerNOC analyzes billions of energy data points per month to help utilities and private companies optimize their systems, ensure availability and reduce costs.

Square maintains a MongoDB-based open source tool called Cube for collecting timestamped events and deriving metrics.

Server Density uses MongoDB to collect server monitoring statistics.

Appboy, the leading platform for mobile relationship management, uses MongoDB to track and analyze billions of data points on user behavior.

Skyline Innovations, a solar energy company, stores and organizes meteorological data from commercial scale solar projects in MongoDB.

One of the world’s largest industrial equipment manufacturers stores sensor data from fleet vehicles to optimize fleet performance and minimize downtime.

In this post, we will take a closer look at how to model time series data in MongoDB by exploring the schema of a tool that has become very popular in the community: MongoDB Management Service (MMS). MMS helps users manage their MongoDB systems by providing monitoring, visualization and alerts on over 100 database metrics. Today the system monitors over 25,000 MongoDB servers across thousands of deployments. Every minute, thousands of local MMS agents collect system metrics and ship the data back to MMS. The system processes over 5B events per day and sustains over 75,000 writes per second, all on fewer than 10 physical servers for the MongoDB tier.

Schema Design and Evolution

How do you store time series data in a database? In relational databases the answer is somewhat straightforward; you store each event as a row within a table. Let’s say you were monitoring the amount of system memory used per second. In that example you would have a table and rows that looked like the following:

```
timestamp                 memory_used
2013-10-10T23:06:37.000Z  1000000
2013-10-10T23:06:38.000Z  2000000
```
If we map that storage approach to MongoDB, we would end up with one document per event:

```
{
  timestamp: ISODate("2013-10-10T23:06:37.000Z"),
  type: "memory_used",
  value: 1000000
},
{
  timestamp: ISODate("2013-10-10T23:06:38.000Z"),
  type: "memory_used",
  value: 2000000
}
```

While this approach is valid in MongoDB, it doesn’t take advantage of the expressive nature of the document model. Let’s take a closer look at how we can refine the model to provide better performance for reads and to improve storage efficiency.

The Document-Oriented Design

A better approach looks like the following. It is not the exact schema MMS uses, but it illustrates the key concepts. Let’s call it the document-oriented design:

```
{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: "memory_used",
  values: {
    0: 999999,
    ...
    37: 1000000,
    38: 2000000,
    ...
    59: 2000000
  }
}
```

We store multiple readings in a single document: one document per minute. To further improve the efficiency of the schema, we can isolate repeating data structures. In the ```timestamp_minute``` field we capture the minute that identifies the document, and for each memory reading we store a new value in the ```values``` sub-document. Because we are storing one value per second, we can simply represent each second as a field named 0 through 59.
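To make the bucketing concrete, here is a small sketch (a hypothetical helper, not MMS code) that derives the minute document's identifying fields and the per-second field name from a reading's timestamp:

```javascript
// Hypothetical helper: bucket a per-second reading into its minute document.
// Returns the query that identifies the document and the field-level update.
function minuteBucket(timestamp, type, value) {
  const minute = new Date(timestamp);     // copy so we don't mutate the input
  const second = minute.getUTCSeconds();  // 0-59, becomes the field name
  minute.setUTCSeconds(0, 0);             // truncate to the start of the minute

  return {
    filter: { timestamp_minute: minute, type: type },
    update: { $set: { ["values." + second]: value } },
  };
}

const op = minuteBucket(new Date("2013-10-10T23:06:59.000Z"), "memory_used", 2000000);
// op.filter.timestamp_minute → ISODate("2013-10-10T23:06:00.000Z")
// op.update → { $set: { "values.59": 2000000 } }
```

The names `minuteBucket`, `filter` and `update` are illustrative; the point is that the client can compute the target document and field purely from the timestamp, with no read from the database.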

More Updates than Inserts

In any system there may be tradeoffs regarding the efficiency of different operations, such as inserts and updates. For example, in some systems updates are implemented as copies of the original record written out to a new location, which requires updating of indexes as well. One of MongoDB’s core capabilities is the in-place update mechanism: field-level updates are managed in place as long as the size of the document does not grow significantly. Because the entire document and its index entries do not need to be rewritten, far less disk I/O is performed. And because field-level updates are efficient, we can design for this advantage in our application: with the document-oriented design there are many more updates (one per second) than inserts (one per minute).

For example, if you wanted to maintain a count in your application, MongoDB provides a handy operator that increments or decrements a field. Instead of reading a value into your application, incrementing it, and then writing the value back to the database, you can simply increment the field in place using $inc:

```{ $inc: { pageviews: 1 } }```

This approach has a number of advantages: first, the increment operation is atomic, so multiple threads can safely increment a field concurrently using $inc. Furthermore, this approach is more efficient on disk, sends less data over the network, and requires fewer round trips by eliminating the need to read the value at all. Those are three big wins that result in a simpler, more efficient and more scalable system. The same advantages apply to the use of the $set operator.
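To illustrate the operator semantics only (an in-memory sketch, not how MongoDB itself is implemented): $inc applies a delta to a field and $set overwrites it, so the client never has to read the current value first:

```javascript
// Toy model of $inc/$set semantics on a flat document (top-level fields only).
function applyUpdate(doc, update) {
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + delta;  // adjust by a delta
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;                      // overwrite outright
  }
  return doc;
}

const page = { pageviews: 0 };
applyUpdate(page, { $inc: { pageviews: 1 } });
applyUpdate(page, { $inc: { pageviews: 1 } });
// page.pageviews is now 2: because each update is expressed as a delta,
// two writers' increments both take effect regardless of interleaving.
```

Because each client sends only the delta, no increment is lost to a read-modify-write race, which is what makes the server-side operator safe under concurrency.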

The document-oriented design has several benefits for writing and reading. As previously stated, writes can be much faster as field-level updates: instead of writing a full document, we send a much smaller delta update that can be modeled like so:

```
db.metrics.update(
  { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" },
  { $set: { "values.59": 2000000 } }
)
```

With the document-oriented design reads are also much faster. If you needed an hour’s worth of measurements using the first approach, you would need to read 3600 documents, whereas with this approach you would only need to read 60 documents. Reading fewer documents means fewer disk seeks, and in any system fewer disk seeks usually results in significantly better performance.
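The arithmetic is easy to sanity-check with a one-line helper (figures taken from the text above):

```javascript
// Documents that must be read to cover a time range, given how many
// per-second readings each document holds.
function docsForRange(rangeSeconds, readingsPerDocument) {
  return Math.ceil(rangeSeconds / readingsPerDocument);
}

const perEvent  = docsForRange(3600, 1);   // one document per event: 3600 reads
const perMinute = docsForRange(3600, 60);  // one document per minute: 60 reads
```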

A natural extension to this approach would be to have documents that span an entire hour, while still keeping the data resolution per second:

```
{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: 999999,
    1: 1000000,
    ...,
    3598: 1500000,
    3599: 2000000
  }
}
```

One benefit to this approach is that we can now access an hour’s worth of data using a single read. However, there is one significant downside: to update the last second of any given hour MongoDB would have to walk the entire length of the ```values``` object, taking 3600 steps to reach the end. We can further refine the model a bit to make this operation more efficient:

```
{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0:  { 0: 999999,  1: 999999,  ..., 59: 1000000 },
    1:  { 0: 2000000, 1: 2000000, ..., 59: 1000000 },
    ...,
    58: { 0: 1600000, 1: 1200000, ..., 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, ..., 59: 1500000 }
  }
}
```
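As with the minute-bucket design, a small hypothetical helper can derive the nested field path from a timestamp. With the ```values.<minute>.<second>``` layout, updating the last second of the hour traverses two 60-entry maps instead of walking a 3600-entry object:

```javascript
// Hypothetical helper: bucket a per-second reading into its hour document,
// addressing the value through the nested minute/second sub-documents.
function hourBucket(timestamp, type, value) {
  const hour = new Date(timestamp);      // copy so we don't mutate the input
  const minute = hour.getUTCMinutes();   // 0-59, outer field name
  const second = hour.getUTCSeconds();   // 0-59, inner field name
  hour.setUTCMinutes(0, 0, 0);           // truncate to the start of the hour

  return {
    filter: { timestamp_hour: hour, type: type },
    update: { $set: { ["values." + minute + "." + second]: value } },
  };
}

const op = hourBucket(new Date("2013-10-10T23:59:59.000Z"), "memory_used", 1500000);
// op.filter.timestamp_hour → ISODate("2013-10-10T23:00:00.000Z")
// op.update → { $set: { "values.59.59": 1500000 } }
```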