I'm jumping straight to the topic rather than writing a boring introduction. I hope this post gives you insights into some of the best practices to follow in a production environment. This is certainly not an advanced guide, but it is a good place to start.

Usage of db.runCommand() is recommended

It provides a good abstraction between the mongo shell and the drivers, and hence it is the recommended way of querying the database. So, how is its usage relevant in real-world scenarios? Consider a scenario where two developers are querying the same database. One uses the mongo shell, and the other uses a third-party tool such as Robomongo because he wants a more user-friendly interface to execute queries and view results. At the time of writing, Robomongo still has issues with MongoDB version 3.6. The second developer executes the below update query in Robomongo:

db.collection.update( {}, { $set: { "services.$[element].attributes.attr.value": "val3" } }, { arrayFilters: [ { "element.instanceId": 1, "element.service": "service1" } ], multi: true } )

And it throws an error about the usage of arrayFilters. The first developer, however, was able to execute the same query in the mongo shell without any issues. This is because of Robomongo's compatibility issue with this MongoDB version. Since the second developer still had to update the database, he tried executing the same update using db.runCommand():

db.runCommand({ update: "collection", updates: [ { q: {}, u: { $set: { "services.$[element].attributes.attr.value": "val3" } }, arrayFilters: [ { "element.instanceId": 1, "element.service": "service1" } ], multi: true } ] })

Now the update executes successfully from Robomongo without complaining about the arrayFilters usage. You can see that we bypassed the Robomongo compatibility issue by running the same query through db.runCommand(). How was this possible? Because db.runCommand() provides an abstraction that is independent of the shell helpers. So, if you feel a query is right but it still fails, always try it in the mongo shell directly. People sometimes choose third-party tools because copy-pasting queries from a text editor into the mongo shell can be error-prone, especially if you add a tab character by mistake. Also, the command prompt is blocked in some companies, which makes third-party tools such as Robomongo the only option for developers to query MongoDB. In such cases db.runCommand() comes to the rescue. Having said that, the mongo shell is the official tool for executing Mongo queries, and it is highly recommended over any other available interface.

Counting Query Results

There are three ways of counting the number of results: count(), itcount(), and using an accumulator operation in a $group stage. count() and itcount() are similar; the difference is that count() asks the server for the number of results your find() operation matches, while itcount() counts the results from the cursor: it iterates over the cursor and retrieves the documents from the server. When you run an aggregation, itcount() is the only one of the two available (it counts the results from the cursor). However, neither of these methods is equivalent to counting in a $group stage when it comes to performance. The other two methods need the results to be sent to the client so that it can count them. Counting in a $group stage does the work on the server itself and doesn't send the documents to the client, reducing the overhead of client-side calculation.

db.collection.aggregate([ { $group: { _id: null, count: { $sum: 1 } } } ])
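For comparison, the cursor-based counts look like this (the status filter is just a hypothetical example; these need a live MongoDB connection to run):

```
// Server-side count of matching documents; no documents are shipped to the client
db.collection.find({ status: "active" }).count()

// Client-side count: iterates the cursor and counts the documents it retrieves
db.collection.find({ status: "active" }).itcount()
```

On large result sets the difference matters: itcount() pays the full cost of transferring every document.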

Optimize match conditions and avoid mix-ups

Consider the below collection:

{ "_id" : 1, "title" : "123 Department Report" , "tags" : [ "G" , "STLW" ], "access" : [ "SI" , "G" ] }, { "_id" : 2, "title" : "324 Department Report" , "tags" : [ "G1" , "STLW1" ], "access" : [ "HW" , "SI" ] }

To check the existence of both elements "SI" and "G" in the array "access", one might attempt this:

db.collection.find({ $or :[{access: "SI" },{access: "G" }]})

or this:

db.collection.find({ access :{$in:[ "SI" , "G" ]}})

But is that what we really wanted? Both queries return the documents with _id:1 and _id:2, while we actually wanted document _id:1 alone. The obvious option is to go with the $all operator. So, remember to test your query with sufficient data and focus on solving the problem statement rather than just getting output.

db.collection.find({ access :{$all:[ "SI" , "G" ]}})

What if we have arrays of embedded documents? Consider the below collection:

{ "_id" : 1 , "title" : "123 Department Report" , "tags" : [ "G" , "STLW" ], "year" : 2014 , "subsections" : [ { "subtitle" : "Section 1: Overview" , "access" : [ "SI" , "G" ], "content" : "Section 1: This is the content of section 1." }, { "subtitle" : "Section 2: Analysis" , "access" : [ "STLW" ], "content" : "Section 2: This is the content of section 2." } ] }

We can make use of the $elemMatch operator to match documents whose array contains at least one element that satisfies all the specified criteria.

db.collection.find( { subsections: { $elemMatch: { subtitle: { $regex: /Analysis$/ }, access: "STLW" } } } )

A similar query can be written without $elemMatch, but note that it is not strictly equivalent: here the two conditions may be satisfied by different elements of the array, whereas $elemMatch requires a single element to satisfy both.

db.collection.find( { $and: [ { "subsections.subtitle": { $regex: /Analysis$/ } }, { "subsections.access": "STLW" } ] } )

We know that fields of embedded documents can be accessed with a '.' in between (e.g., subsections.subtitle). Now, if you're performing an update operation on a nested array of documents, you should check how to use positional operators such as $ and $[].

db.magazines.update( { $and :[{ "Articles.Writers.Name" : "tom" },{ "Articles.Writers.Email" : "tom@example.com" }]}, { $pull : { "Articles.$.Writers" :{ "Name" : "tom" , "Email" : "tom@example.com" }}}, {multi: true } )

The above query intends to remove elements from Articles.Writers where Name=tom and Email=tom@example.com. Here we used the positional operator $, which targets only the first matching element of the Articles array. If you want the $pull to apply across all elements of Articles, use $[]:

db.magazines.update( { $and :[{ "Articles.Writers.Name" : "tom" },{ "Articles.Writers.Email" : "tom@example.com" }]}, { $pull : { "Articles.$[].Writers" :{ "Name" : "tom" , "Email" : "tom@example.com" }}}, {multi: true } )

Matching empty arrays and documents to remove them from collections

We can't simply have a match criteria that looks like array: []; use the $size operator to filter out empty arrays. In the below example, we match all empty languages arrays and remove the field with $unset.

db.collection.update( {languages:{ $size : 0}}, { $unset :{languages: "" }} )

If we need to delete empty embedded documents (objects) instead, we can use an update query that looks like this:

db.collection.update( { $and: [ { languages: {} }, { languages: { $not: { $type: "array" } } } ] }, { $unset: { languages: "" } } )

Since arrays can also be treated as objects, we need to filter them out using the type check above.

Differentiate between $map, $filter, $redact and $reduce operators

One challenge most MongoDB beginners face is differentiating among some of the powerful operators in the MongoDB aggregation framework. The official definitions are good enough to understand, but I will make it simple; you just need to remember this:

$map -> Transforms an array into a different form (changing the content). Has no impact on the array size: no shrinking or expansion.
$filter -> Returns a subset of an array based on the given criteria. The array can shrink here, but not expand. Can only be applied to an array.
$redact -> Restricts contents inside an array or document based on the given criteria. Similar to $filter, but applies to both arrays and documents, hence very useful if you have a nested document structure.
$reduce -> Produces a scalar value as output by iterating through the array elements. Note that the output is not an array here.

Here is a sample query for $map:

db.movies.aggregate([ { $project: { cast: 1, directors: 1, writers: { $map: { input: "$writers", as: "writer", in: { $arrayElemAt: [ { $split: [ "$$writer", " " ] }, 0 ] } } } } } ])

We're transforming the writers array by changing its content (extracting the first name from the full name) and then projecting it into the aggregation pipeline.
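For contrast, here is a $filter sketch in the same style; it reuses the movies collection and writers field from the $map example, and the non-empty-string condition is just an assumed criterion for illustration:

```
db.movies.aggregate([
  { $project: {
      // Keep only non-empty writer names; the array can shrink but never grow
      writers: { $filter: {
        input: "$writers",
        as: "writer",
        cond: { $ne: [ "$$writer", "" ] }
      } }
  } }
])
```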

Join two collections where the foreign key value is present inside an array

Consider a scenario where we want to join two collections named Coll1 & Coll2. Coll1 :

{ _id : "1" , airlines: "Air Mexico" , src_airport: "ABC" , dst_airport: "CDE" }

Coll2 :

{ _id : "1" , airlines: [ "Air Asia" , "Air Mexico" , "Malaysian Airlines" ] }, { _id : "2" , airlines: [ "Dubai Airways" , "Air India" , "Jet Airways" ] }

Here the localField is 'airlines', but we can't simply write the foreignField as 'airlines' since it's an array; we can't join these collections unless we normalize them. How do we normalize so that we can join on the criteria airlines: "Air Mexico"? That's where $unwind becomes meaningful. Use of $unwind may blow up your query execution time, but we have no other option in the above scenario. Once we $unwind, Coll2 is projected as below:

{ _id : "1" , airlines: "Air Asia" }, { _id : "1" , airlines: "Air Mexico" }, { _id : "1" , airlines: "Malaysian Airlines" }, { _id : "2" , airlines: "Dubai Airways" }, { _id : "2" , airlines: "Air India" }, { _id : "2" , airlines: "Jet Airways" }

Now if we join the collections using $lookup, it will work just the way we wanted.
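One way to sketch this end to end: materialize the unwound form of Coll2 once with $out, then join against it. The target collection name Coll2Normalized and the output field airline_docs are made-up names for illustration:

```
// Step 1 (run once): materialize the normalized form of Coll2
db.Coll2.aggregate([
  { $unwind: "$airlines" },
  { $out: "Coll2Normalized" }   // hypothetical target collection
])

// Step 2: join on the now-scalar airlines field
db.Coll1.aggregate([
  { $lookup: {
      from: "Coll2Normalized",
      localField: "airlines",
      foreignField: "airlines",
      as: "airline_docs"        // hypothetical output field
  } }
])
```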

Be sure to add existence checks or get ready to handle null values

Consider the below aggregation query:

db.movies.aggregate([ { $project: { sizeMatch1: { $size: { $ifNull: [ { $setIntersection: [ "$cast", "$writers" ] }, [] ] } }, sizeMatch2: { $size: { $ifNull: [ { $setIntersection: [ "$directors", "$writers" ] }, [] ] } }, sizeMatch3: { $size: { $ifNull: [ { $setIntersection: [ "$directors", "$cast" ] }, [] ] } } } } ])

We're taking the intersection of pairs of arrays, calculating the size of each, and projecting them into different fields. If you notice, $ifNull is added to check for null values and return an empty array in case the respective arrays are not present in the document. So, if an array is missing, $ifNull yields an empty array and the size will be zero. We have to add this null check wherever needed, especially if there are null or missing values in the collection. Is there a better approach? Yes, there is. It's always good to place a match stage before the projection stage, preventing lots of bad data from being passed to the upcoming stages; an obvious performance advantage can be expected as well. Here is the updated query:

db.movies.aggregate([ { $match: { cast: { $exists: true }, directors: { $exists: true }, writers: { $exists: true } } }, { $project: { sizeMatch1: { $size: { $setIntersection: [ "$cast", "$writers" ] } }, sizeMatch2: { $size: { $setIntersection: [ "$directors", "$writers" ] } }, sizeMatch3: { $size: { $setIntersection: [ "$directors", "$cast" ] } } } } ])

Pattern Matching

Let's say you want to get all movies that won Oscar awards, and there's an awards field in the collection which is an array.

{ _id : 1 , title: "Avatar" , awards:[ "Nominated for 8 Oscars" , "Won Golden Globe" , "Won 2 Oscars" ] }, { _id : 2 , title: "The Avenger" , awards:[ "Nominated for 2 Oscars" , "Nominated for One Golden Globe" ] }

We can't use regular expressions such as "Won Oscar", "Won", /^Oscar/ (prefix match on Oscar), or /Oscar$/ (suffix match on Oscar), since the strings vary as shown in the above documents. The solution is quite straightforward:

db.movies.find({ awards : {$regex: ".*Won.*Oscar.*" }})

Here '.' matches any single character and '*' means zero or more repetitions.
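The same regular expression semantics can be checked outside the shell with plain JavaScript; the sample strings below are taken from the awards arrays above:

```javascript
// '.' matches any single character and '*' repeats it zero or more times,
// so this matches any string containing "Won" followed later by "Oscar".
const wonOscar = /.*Won.*Oscar.*/;

console.log(wonOscar.test("Won 2 Oscars"));           // true
console.log(wonOscar.test("Nominated for 8 Oscars")); // false: no "Won"
console.log(wonOscar.test("Won Golden Globe"));       // false: no "Oscar" after "Won"
```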

You want to add or modify a field while keeping the current document structure as it is

The challenge with $project is that it requires you to explicitly list all the fields you want to keep for the upcoming stages. But there is another operator in the aggregation framework that lets us add or modify a field without explicitly projecting the other fields in the document: the $addFields operator, which does the job quite efficiently. The only pitfall with $addFields is that it can't remove any fields from the stage; that is possible only with $project.

db.collection.aggregate([ { $addFields: { totalCoins: { $reduce: { input: "$refferedUsers", initialValue: 0, in: { $add: [ "$$value", "$$this.coins" ] } } } } } ])

If a totalCoins field is already present in the document, this will overwrite it. Otherwise, a new field called totalCoins will be added to the document. This feature is available since MongoDB 3.4.
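$reduce behaves much like JavaScript's Array.prototype.reduce, which can make the expression easier to read. Here the same coin summation is mirrored in plain JavaScript; the sample data is made up for illustration:

```javascript
// Mirror of the $reduce expression above: $$value -> acc, $$this -> user
const refferedUsers = [ { coins: 5 }, { coins: 3 }, { coins: 2 } ];

const totalCoins = refferedUsers.reduce(
  (acc, user) => acc + user.coins,  // in: { $add: [ "$$value", "$$this.coins" ] }
  0                                 // initialValue: 0
);

console.log(totalCoins);  // 10
```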

Now there's another scenario where you want to group documents and at the same time keep track of the document structure. It's a tricky situation: while grouping documents, we can't project fields the way we did with $project or $addFields; we can only project a field through an accumulator operation. So how do we get around this? We can track and add the current root structure inside the group stage using MongoDB's system variables.

{ $group: { _id: "$authorName", books: { $push: "$$ROOT" } } }

Here, documents are grouped on the authorName field, and all such documents (the top-level documents, since we specified $$ROOT) are pushed into an array called books. If you want to do some further operation later in the pipeline, all you have to do is $unwind books and proceed with the other stages.

The powerful $objectToArray, $arrayToObject operators

If you want to transform an object into a totally different shape, such as changing the resulting field names or converting an object into an array of key-value pairs, you can make use of $objectToArray and $arrayToObject. In order to use $arrayToObject, you need the array elements in the form of key-value pairs (or two-element arrays), so you may need to convert with $objectToArray first.

$objectToArray in action: { "fieldname": "fieldValue" } -> [ { "k": "fieldname", "v": "fieldValue" } ]
Use $map to iterate and transform 'k', 'v', or both.
$arrayToObject in action: [ { "k": "changedfieldname", "v": "changedfieldValue" } ] -> { "changedfieldname": "changedfieldValue" }

I have seen many use cases where these operators are the only way to proceed. I can't list them all here, but be aware of these powerful operators in the aggregation framework. Note that they are available only since MongoDB 3.6.
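As a sketch of the round trip, here is a pipeline that upper-cases every key of a hypothetical attributes sub-document (the collection and field names are assumptions, not from a real schema):

```
db.collection.aggregate([
  { $addFields: {
      attributes: { $arrayToObject: {
        $map: {
          input: { $objectToArray: "$attributes" },
          as: "kv",
          // Rewrite each { k, v } pair; here only the key is changed
          in: { k: { $toUpper: "$$kv.k" }, v: "$$kv.v" }
        }
      } }
  } }
])
```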

Usage of $facet operator

MongoDB 3.4 introduced faceted search, where multiple aggregation pipelines can be processed within a single stage. Why do we need faceted search when we can just run multiple aggregation operations separately? Since a single stage can hold multiple aggregation pipelines, we hit the DB only once. Now you see the performance advantage of faceted search :)

Also note the following constraints with $facet :

1) There can't be a $facet stage inside a $facet stage.
2) The sub-pipelines (the arrays created) inside $facet are independent of each other; if you want to process their outputs further, you have to do it in another stage outside $facet.

You can read more about the $facet operator in the MongoDB documentation.
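A minimal sketch, assuming the movies collection used earlier also has year and rated fields:

```
db.movies.aggregate([
  { $facet: {
      // Each sub-pipeline runs over the same input documents
      countByYear:  [ { $group: { _id: "$year",  count: { $sum: 1 } } } ],
      countByRated: [ { $group: { _id: "$rated", count: { $sum: 1 } } } ]
  } }
])
```

The result is a single document with one array field per facet, produced in one round trip to the server.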

Is there a way to update documents using complex queries that the update command can't express?

Basically, we're talking about queries that generate reports, or complex queries that involve several stages of aggregation operations. We know that the aggregation framework is far more powerful than the normal database update command. So what do we do in such a scenario? The $out operator comes to the rescue.

db.collection.aggregate([ { /* $match stages */ }, { /* $group stages */ }, { /* $project stages */ }, { $out: "collection" } ])

Note that the $out operator simply replaces the entire collection with the result of the previous aggregation pipeline stage; this is equivalent to creating a fresh collection. If you want to guard against losing data or structure, test it on a backup collection first (by providing a different collection name as input to $out). Also note that $out has to be the final stage in the aggregation pipeline.

Updating an embedded document in a nested array

Consider the below Mongo collection:

{ "_id" : "12345678" , "Invoices" : [ { "_id" : "123456789" , "Currency" : "EUR" , "DueTotalAmountInvoice" : 768.3699999999999 , "InvoiceDate" : "2016-01-01 00:00:00.000" , "Items" : [ { "Item" : 10 , "ProductCode" : "ABC567" , "Quantity" : 1 }, { "Item" : 20 , "ProductCode" : "CDE987" , "Quantity" : 1 } ] }, { "_id" : "87654321" , "Currency" : "EUR" , "DueTotalAmountInvoice" : 768.3699999999999 , "InvoiceDate" : "2016-01-01 00:00:00.000" , "Items" : [ { "Item" : 30 , "ProductCode" : "PLO987" , "Quantity" : 1 , "Units" : "KM3" }, { "Item" : 40 , "ProductCode" : "PLS567" , "Quantity" : 1 , "DueTotalAmountInvoice" : 768.3699999999999 } ] } ] }

Now, if you want to change the Quantity of all items with Item:10 in invoice 123456789, what would you do? Check out the below query:

db.collection.update( { "Invoices.Items.Item": 10, "Invoices._id": "123456789" }, { $set: { "Invoices.Items.Quantity": 2 } }, { multi: true } )

Do you see any issue with the above query? The match conditions do ensure that only documents having Item:10 in invoice 123456789 are selected; remember, it is the whole document that matches. Then comes the update part, where we try to change the value of Quantity to 2. But without positional operators, the $set can't target the specific nested elements, so this does not do what we wanted. You need arrayFilters for this, and the query now looks like this:

db.collection.update(
  { "_id": "12345678" },  // initial match condition
  { $set: { "Invoices.$[element1].Items.$[element2].Quantity": 3 } },
  // arrayFilters set the match conditions on the nested array elements
  { multi: true, arrayFilters: [ { "element1._id": "123456789" }, { "element2.Item": { $eq: 10 } } ] }
)

More tips will be added here on a regular basis, so you can bookmark this page and come back later to check for additional notes. Don't hesitate to contact me if you have any questions. Happy learning :)