Introduction

In MongoDB, whenever we write a query, we should ask ourselves whether it is efficient. It is important to optimize these operations to make sure we properly use the resources we have.

When it comes to queries, the explain() method is a great tool: it is the best way to see what actually happens when a query runs.

The method will help you answer the following questions:

Is the query using the index we expect?

Does our index provide a sort? (Or is the query performing an in-memory sort?)

Is our query using the index to provide the projection?

How selective is our index?

Which part of the query plan is the most expensive?

Creating data and explainable objects

First, we will import some data. You can retrieve the JSON file at this URL.

I saved it as primer-dataset.json. Next, we need to import our data. To do this, we will use mongoimport. Make sure you have a mongod instance running, then type the following command in a new terminal window (adapt the path of the file to your setup):

mongoimport --db test --collection restaurants --drop --file ~/Desktop/primer-dataset.json

This will import our data into the test database, in the restaurants collection. It will first drop any existing collection named restaurants if one is present.

Let's check that everything is there. Launch a mongo shell and check the contents of test.restaurants:



> use test
switched to db test
> db.restaurants.find().count()
25359

Perfect! Now, let's create explainable objects. explain() can be used with three different arguments:

explain("queryPlanner") => The default; this is what you get if you don't specify any argument.

explain("executionStats") => Gives you detailed information about the winning plan.

explain("allPlansExecution") => The most verbose. Gives you detailed information about the winning plan and the rejected plans.

Note: The default explain() does NOT execute the query. "executionStats" and "allPlansExecution" will execute the query.
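To make the difference between the modes concrete, here is a small sketch (the helper name `expected_sections` is mine, not part of MongoDB) of which top-level sections you should find in the output for each verbosity mode, based on the outputs shown later in this article:

```python
# Hypothetical helper: which top-level sections an explain() result contains
# for each verbosity mode. "queryPlanner" is always present; the execution
# modes add "executionStats" (and "allPlansExecution" nests an
# "allPlansExecution" array inside "executionStats").
def expected_sections(verbosity):
    sections = {"queryPlanner"}
    if verbosity in ("executionStats", "allPlansExecution"):
        sections.add("executionStats")  # the query is actually executed
    return sections

print(expected_sections("queryPlanner"))    # {'queryPlanner'}
```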

To avoid repeating ourselves, let's create explainable objects like so:



> exp = db.restaurants.explain()
Explainable(test.restaurants)
> expExecStats = db.restaurants.explain("executionStats")
Explainable(test.restaurants)
> expVerbose = db.restaurants.explain("allPlansExecution")
Explainable(test.restaurants)

This will allow us to run queries like so:

exp.find({restaurant_id : "50013447"})

Great, now let's see what explain is all about.

Explain output

Let's start with the default output (or "queryPlanner"), with the query shown above:



> exp.find({restaurant_id : "50013447"})
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.restaurants",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "restaurant_id" : {
                "$eq" : "50013447"
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "restaurant_id" : {
                    "$eq" : "50013447"
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "pc4.home",
        "port" : 27017,
        "version" : "3.4.6",
        "gitVersion" : "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5"
    },
    "ok" : 1
}

Whenever we are trying to optimize a query, the most interesting thing is to look at the winning plan. This is the plan MongoDB chose to execute this query. In this case, we have no indexes on the collection, therefore, there is only one plan available.

If we had several plans available, MongoDB would run each candidate plan for a short trial period and see which one gave the best results. "Best" can mean the plan that completed first, or the one that returned the most documents in that period of time.

MongoDB will then cache this plan, so it does not have to compare the available plans every single time a similar query runs. But after certain events, such as a server restart or the creation of an index, MongoDB will compare the plans again to see if a better one can be used.

Here, we do a collection scan (COLLSCAN). Without any index, not much else can happen. Let's run the same query with "executionStats", which is probably the mode you will use the most:



> expExecStats.find({restaurant_id: "50013447"})
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.restaurants",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "restaurant_id" : {
                "$eq" : "50013447"
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "restaurant_id" : {
                    "$eq" : "50013447"
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 52,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 25359,
        "executionStages" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "restaurant_id" : {
                    "$eq" : "50013447"
                }
            },
            "nReturned" : 1,
            "executionTimeMillisEstimate" : 57,
            "works" : 25361,
            "advanced" : 1,
            "needTime" : 25359,
            "needYield" : 0,
            "saveState" : 200,
            "restoreState" : 200,
            "isEOF" : 1,
            "invalidates" : 0,
            "direction" : "forward",
            "docsExamined" : 25359
        }
    },
    "ok" : 1
}

We have a new key "executionStats" in our output. Let's break it down:

nReturned: the number of documents returned (1)

executionTimeMillis: the time it took to execute the query (52 milliseconds)

totalKeysExamined: if an index is used, how many index keys MongoDB had to look at to return the documents needed (no index here, so 0)

totalDocsExamined: how many documents MongoDB had to examine to return the ones we needed (25359)

We can see right away that this query is not efficient at all: we have to examine 25359 documents to return 1. It makes sense, because we are doing a collection scan, which means looking at every single document to see if it matches our filter. The ratio totalDocsExamined/nReturned should be as close to 1 as possible. This is where an index would be useful.
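This ratio is easy to compute from the "executionStats" section itself. A minimal sketch (the helper name `scan_ratio` is mine; the field names and numbers come from the output above):

```python
# Hypothetical helper: compute the docsExamined-to-returned ratio from the
# "executionStats" section of an explain() result. A ratio close to 1 means
# the query is selective; a large ratio suggests a missing index.
def scan_ratio(explain_result):
    stats = explain_result["executionStats"]
    return stats["totalDocsExamined"] / stats["nReturned"]

# Numbers taken from the collection-scan output above.
collscan = {"executionStats": {"nReturned": 1, "totalDocsExamined": 25359}}
print(scan_ratio(collscan))  # 25359.0
```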

Let's create an index on the restaurant_id field.



> db.restaurants.createIndex({restaurant_id: 1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

Great, now let's run our same query again:



> expExecStats.find({restaurant_id: "50013447"})
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.restaurants",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "restaurant_id" : {
                "$eq" : "50013447"
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "restaurant_id" : 1
                },
                "indexName" : "restaurant_id_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "restaurant_id" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "restaurant_id" : [
                        "[\"50013447\", \"50013447\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 9,
        "totalKeysExamined" : 1,
        "totalDocsExamined" : 1,
        "executionStages" : {
            "stage" : "FETCH",
            "nReturned" : 1,
            "executionTimeMillisEstimate" : 0,
            "works" : 2,
            "advanced" : 1,
            "needTime" : 0,
            "needYield" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "invalidates" : 0,
            "docsExamined" : 1,
            "alreadyHasObj" : 0,
            "inputStage" : {
                "stage" : "IXSCAN",
                "nReturned" : 1,
                "executionTimeMillisEstimate" : 0,
                "works" : 2,
                "advanced" : 1,
                "needTime" : 0,
                "needYield" : 0,
                "saveState" : 0,
                "restoreState" : 0,
                "isEOF" : 1,
                "invalidates" : 0,
                "keyPattern" : {
                    "restaurant_id" : 1
                },
                "indexName" : "restaurant_id_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "restaurant_id" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "restaurant_id" : [
                        "[\"50013447\", \"50013447\"]"
                    ]
                },
                "keysExamined" : 1,
                "seeks" : 1,
                "dupsTested" : 0,
                "dupsDropped" : 0,
                "seenInvalidated" : 0
            }
        }
    },
    "ok" : 1
}

This time, the "winningPlan" informs us that we use an index scan (IXSCAN) with the index we just created, and the "executionStats" gives us the new metrics for this query:

nReturned: 1

executionTimeMillis: 9

totalKeysExamined: 1

totalDocsExamined: 1

The ratio is now one to one and the execution time is much, much shorter.

Finally, let's see the "allPlansExecution" output. I will create two indexes:



> db.restaurants.createIndex({name: 1, cuisine: 1, borough: -1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 2,
    "numIndexesAfter" : 3,
    "ok" : 1
}
> db.restaurants.createIndex({name: 1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 3,
    "numIndexesAfter" : 4,
    "ok" : 1
}

I created a compound index on the ascending name and cuisine fields and the descending borough field, plus a single-field index on name. Let's run a query:



> expVerbose.find({name: "Subway"}).sort({cuisine: 1})

It's a long output, so I will just show the winningPlan first:



"winningPlan" : {
    "stage" : "FETCH",
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "name" : 1,
            "cuisine" : 1,
            "borough" : -1
        },
        "indexName" : "name_1_cuisine_1_borough_-1",
        "isMultiKey" : false,
        "multiKeyPaths" : {
            "name" : [ ],
            "cuisine" : [ ],
            "borough" : [ ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
            "name" : [
                "[\"Subway\", \"Subway\"]"
            ],
            "cuisine" : [
                "[MinKey, MaxKey]"
            ],
            "borough" : [
                "[MaxKey, MinKey]"
            ]
        }
    }
},

For this query, MongoDB used the compound index I just defined. We can see this with: "indexName" : "name_1_cuisine_1_borough_-1"
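Plans like this one are trees: each stage consumes the output of its nested "inputStage". A minimal sketch (the helper name `stages` is mine; it ignores stages with multiple children, which use an "inputStages" array instead) that flattens such a tree into a readable list:

```python
# Hypothetical helper: flatten a winningPlan tree into its list of stage
# names by following the nested "inputStage" keys, as seen in the
# explain() output above. Stages with several children ("inputStages")
# are not handled in this simplified sketch.
def stages(plan):
    names = []
    while plan is not None:
        names.append(plan["stage"])
        plan = plan.get("inputStage")
    return names

# Shape of the winning plan shown above, reduced to its stage names.
winning = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}
print(stages(winning))  # ['FETCH', 'IXSCAN']
```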

We have a new key "allPlansExecution" in our output now. Let's look at it along with the "rejectedPlans":



"rejectedPlans" : [
    {
        "stage" : "SORT",
        "sortPattern" : {
            "cuisine" : 1
        },
        "inputStage" : {
            "stage" : "SORT_KEY_GENERATOR",
            "inputStage" : {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "keyPattern" : {
                        "name" : 1
                    },
                    "indexName" : "name_1",
                    "isMultiKey" : false,
                    "multiKeyPaths" : {
                        "name" : [ ]
                    },
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 2,
                    "direction" : "forward",
                    "indexBounds" : {
                        "name" : [
                            "[\"Subway\", \"Subway\"]"
                        ]
                    }
                }
            }
        }
    }
]
},
"allPlansExecution" : [
    {
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 21,
        "totalKeysExamined" : 100,
        "totalDocsExamined" : 100,
        "executionStages" : {
            "stage" : "SORT",
            "nReturned" : 0,
            "executionTimeMillisEstimate" : 21,
            "works" : 101,
            "advanced" : 0,
            "needTime" : 101,
            "needYield" : 0,
            "saveState" : 2,
            "restoreState" : 2,
            "isEOF" : 0,
            "invalidates" : 0,
            "sortPattern" : {
                "cuisine" : 1
            },
            "memUsage" : 45365,
            "memLimit" : 33554432,
            "inputStage" : {
                "stage" : "SORT_KEY_GENERATOR",
                "nReturned" : 100,
                "executionTimeMillisEstimate" : 10,
                "works" : 101,
                "advanced" : 100,
                "needTime" : 1,
                "needYield" : 0,
                "saveState" : 2,
                "restoreState" : 2,
                "isEOF" : 0,
                "invalidates" : 0,
                "inputStage" : {
                    "stage" : "FETCH",
                    "nReturned" : 100,
                    "executionTimeMillisEstimate" : 0,
                    "works" : 100,
                    "advanced" : 100,
                    "needTime" : 0,
                    "needYield" : 0,
                    "saveState" : 2,
                    "restoreState" : 2,
                    "isEOF" : 0,
                    "invalidates" : 0,
                    "docsExamined" : 100,
                    "alreadyHasObj" : 0,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "nReturned" : 100,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 100,
                        "advanced" : 100,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 2,
                        "restoreState" : 2,
                        "isEOF" : 0,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "name" : 1
                        },
                        "indexName" : "name_1",
                        "isMultiKey" : false,
                        "multiKeyPaths" : {
                            "name" : [ ]
                        },
                        "isUnique" : false,
                        "isSparse" : false,
                        "isPartial" : false,
                        "indexVersion" : 2,
                        "direction" : "forward",
                        "indexBounds" : {
                            "name" : [
                                "[\"Subway\", \"Subway\"]"
                            ]
                        },
                        "keysExamined" : 100,
                        "seeks" : 1,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0
                    }
                }
            }
        }
    },
    {
        "nReturned" : 101,
        "executionTimeMillisEstimate" : 0,
        "totalKeysExamined" : 101,
        "totalDocsExamined" : 101,
        "executionStages" : {
            "stage" : "FETCH",
            "nReturned" : 101,
            "executionTimeMillisEstimate" : 0,
            "works" : 101,
            "advanced" : 101,
            "needTime" : 0,
            "needYield" : 0,
            "saveState" : 2,
            "restoreState" : 2,
            "isEOF" : 0,
            "invalidates" : 0,
            "docsExamined" : 101,
            "alreadyHasObj" : 0,
            "inputStage" : {
                "stage" : "IXSCAN",
                "nReturned" : 101,
                "executionTimeMillisEstimate" : 0,
                "works" : 101,
                "advanced" : 101,
                "needTime" : 0,
                "needYield" : 0,
                "saveState" : 2,
                "restoreState" : 2,
                "isEOF" : 0,
                "invalidates" : 0,
                "keyPattern" : {
                    "name" : 1,
                    "cuisine" : 1,
                    "borough" : -1
                },
                "indexName" : "name_1_cuisine_1_borough_-1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "name" : [ ],
                    "cuisine" : [ ],
                    "borough" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "name" : [
                        "[\"Subway\", \"Subway\"]"
                    ],
                    "cuisine" : [
                        "[MinKey, MaxKey]"
                    ],
                    "borough" : [
                        "[MaxKey, MinKey]"
                    ]
                },
                "keysExamined" : 101,
                "seeks" : 1,
                "dupsTested" : 0,
                "dupsDropped" : 0,
                "seenInvalidated" : 0
            }
        }
    }
]
},

I removed "executionStats" for our winning plan. We will just focus on "allPlansExecution".

First, we have information about the plan MongoDB rejected. We can see it was the plan using the index {name: 1}. Why did it reject it? Well, we can read that it did an index scan first, but then, to satisfy the sort, it had to go through a "SORT_KEY_GENERATOR" stage feeding a "SORT" stage. This means it performed an in-memory sort, and in-memory sorts are not efficient.

But let's look at the metrics in "allPlansExecution". This is where we can see why a plan was chosen. Let's compare both plans:

nReturned: 0 vs 101

executionTimeMillisEstimate: 21 vs 0

totalKeysExamined: 100 vs 101

totalDocsExamined: 100 vs 101

MongoDB ran both candidate plans, and these are the metrics each produced in the short trial period it had to prove its worth. One plan returned no documents after 21 milliseconds; the other returned 101 documents almost instantly. With these metrics, it's pretty easy to understand why MongoDB chose to run this query with the compound index {name: 1, cuisine: 1, borough: -1}. The {name: 1} index just could not compete.
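MongoDB's actual plan-ranking formula is more involved, but its main ingredient is productivity: how many documents a plan advanced per unit of work during the trial. A simplified sketch (the helper name `productivity` is mine; the numbers are the trial-period stats from the output above):

```python
# Simplified sketch of plan ranking: score each trialed plan by its
# productivity (documents returned per unit of work). This is an
# illustration, not MongoDB's exact scoring formula.
def productivity(stats):
    return stats["nReturned"] / stats["works"]

# Trial-period numbers from the "allPlansExecution" output above.
plan_name_1 = {"nReturned": 0, "works": 101}      # {name: 1} + in-memory sort
plan_compound = {"nReturned": 101, "works": 101}  # compound index
print(productivity(plan_compound) > productivity(plan_name_1))  # True
```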

I hope this introduction gave you a better idea of what explain() can do for you. You will be able to see which queries are not efficient and adapt the way you interact with your database accordingly.