This short post will look at the performance of MongoDB queries against indexed and non-indexed collections, for various (virtual) server hardware configurations. We’ll look at the performance gains possible through indexing and talk briefly about their implications. This is not meant to be a definitive treatment on the topic of indexing or MongoDB. Your mileage may vary.

We will spend nearly all our time in the mongo shell. This post assumes that the reader has MongoDB installed locally, and has basic familiarity with MongoDB and mongo shell commands. The mongo shell is a JavaScript environment; we’ll use basic JavaScript in the examples below.

For a nice introduction to MongoDB, refer to An Overview of MongoDB & Mongoose by Jim Medlock.

Indexes

MongoDB indexes are defined at the collection level, and can be created on any field (or combination of fields) present in that collection’s documents. By default, MongoDB creates an index on the _id field. We can create additional indexes to support our application’s access patterns. This simply means that we should create indexes for fields that we commonly include in our queries. See here for more information on indexing strategies.

Setup

We need a database. In the mongo shell, the use command creates (or switches to) a specified database. We’ll call ours indexingdemo :

use indexingdemo

Now we need some documents. We’ll use JavaScript in the mongo shell to generate an array of 3 million documents. Then we can bulk-insert the array into the database. Our collection will consist of incrementing student IDs and random scores for quizzes, exams and homework.

I use a nested for loop to populate a docs array; 1 million outer iterations * 3 inner iterations = 3 million total documents:

var docs = [];

for (var i = 0; i < 1000000; i++) {

var types = ["quiz", "exam", "homework"];

for (var j = 0; j < 3; j++) {

docs.push({

student : i,

type : types[j],

score : Math.round(Math.random()*100)

});

}

}

Generating 3 million documents …

Then insert the array of documents into the scores collection (which will be created on the fly):

db.scores.insert(docs)

That could take a few minutes depending on the speed of your hardware. When finished, MongoDB will return a BulkWriteResult . The nInserted property should equal 3000000:

… and the insertion results.

We can query our database now. For example, we may want to see all of one student’s scores:

db.scores.find({ student: 585534 }).pretty()

Chain the .pretty() method for formatted results.

Our query returns 3 documents — one for each of the student’s quiz , exam , and homework scores.

Before creating our index, let’s get a performance baseline. We can chain the .explain() method to the end of our query, passing in "executionStats" to return detailed performance information:

db.scores.find({ student: 585534 }).explain("executionStats")

That produces a lot of output. Scroll up a bit to find the property executionTimeMillis . We’ll use this execution time when comparing test runs.

Execution time: approx 1 second

Now let’s create our index. Assume that queries typically include both student and type fields. We will create a compound index on student and type . This will support efficient queries for just student or student and type .

db.scores.createIndex({ student: 1, type: 1 })

Create a compound index

Note the "numIndexesBefore" : 1 — that prior index is the one MongoDB automatically created for ObjectId .

Now let’s re-run the test query:

db.scores.find({ student: 585534 }).explain("executionStats")

You should notice that the query feels faster — it should complete almost instantaneously. Because — haha! — it does:

Yeah — zero milliseconds

Now, executionTimeMillis of 0ms doesn’t really mean it took zero time. If I re-run the query multiple times I occasionally see some non-zero results. Either way, the indexed query is fast.

The above results were obtained in a VirtualBox VM. Windows 7 Pro x64 host OS + Ubuntu 16.04 guest OS using 2 CPU cores (Intel Core i5–3470) and 4GB of RAM. Not elite hardware by today’s standards.

Scaling

Since this is a virtual machine and I have control over the system resources, I can examine how performance scales across various (virtual) hardware configurations by adjusting the VM’s settings. In VirtualBox this is as easy as moving some sliders around. Let’s play with CPU core count and RAM.

What scaling?

The indexed results (in blue) for each configuration are to the left of each of the non-indexed results (in orange) — they’re not easy to see! Each bar in the chart represents the average of 5 runs for each of the following CPU/RAM combinations:

Indexed avg Non-Indexed avg

--------------------------------------------------------

2 cores, 4 GB RAM 0 ms 1100 ms

2 cores, 3 GB RAM 0 ms 1113 ms

2 cores, 2 GB RAM 0 ms 1062 ms

2 cores, 1 GB RAM 2 ms 1196 ms 1 cores, 4 GB RAM 1 ms 1102 ms

1 cores, 3 GB RAM 2 ms 1088 ms

1 cores, 2 GB RAM 1 ms 1061 ms

1 cores, 1 GB RAM 2 ms 1140 ms

What’s going on here? First, we can generalize that our sample query’s performance is not bound by the number of CPU cores or system RAM ≥ 1GB. In all configurations, indexed queries execute within 1–2 ms of zero and non-indexed queries complete in a little over 1 second.

Why?

Without indexing MongoDB has to perform a full collection scan, which means hitting the disk(s). My host system has a single Crucial MX100 SATA3 SSD. I suspect the non-indexed results are limited by my disk I/O. To test this, I created a collection of 1.5 million documents (half the original size) and re-ran the queries. I did not test VM configurations other than the min/max configurations used above. The results below are averages of 5 runs:

Indexed avg Non-Indexed avg

--------------------------------------------------------

2 cores, 4 GB RAM 0 ms 497 ms

1 cores, 1 GB RAM 1 ms 552 ms

Non-indexed queries against a collection of 1.5 million documents execute in about half a second. Compared to the 3 million document collection: half the documents = half the time. This supports my hypothesis re: full collection scans in a disk I/O limited environment.

But what about the lack of scaling for our indexed queries? Recall that our results are essentially 0ms regardless of hardware config. To answer this, let’s begin by looking at the size of our index:

db.scores.totalIndexSize()

Index size (in bytes)

Our index is 85,639,168 bytes (81.67 MB). For best performance, indexes should fit in available RAM. And even with just 1 GB of system RAM there is plenty of available RAM to hold our index.

Conclusions

The main takeaway is obvious: create indexes to support your queries — the benefits are incredible! Otherwise queries require full collection scans, which are limited by your server’s disk subsystem I/O. Server CPU and RAM resources were less of a factor than expected. This is encouraging for those who use inexpensive products from VPS hosting providers like Digital Ocean and OVH.