MongoDB 3.4 has 27 new aggregation concepts. Last week I covered the new $graphLookup stage, which recursively searches collections. This week I'll cover the new $facet stage and the related $bucket and $bucketAuto operators. These new features are tailor-made for building powerful product search pages, and will quickly become your favorite aggregation features if you're working on an ecommerce product.

What is Faceted Search?

Here's the view you see when searching for products on Amazon:

The "Refine By" menu on the left is a classic example of faceted search. When you only want to see products that are eligible for Amazon prime, the counts in the "Refine By" menu update to only reflect products that are eligible for Prime. For example, there are 26 products in the $10-$25 range, but only 7 of those are eligible for Prime.

Updating these counts in a sane and performant way was a huge pain before MongoDB 3.4. Your options were to either re-structure your data specifically for faceted search or execute a separate aggregation for every single individual "facet" you want to filter by.

Let's take a look at a more concrete example. Here's a collection of 4 books:

db.books.insertMany([ { title: 'Professional AngularJS' , author: 'Valeri Karpov' , price: 37 , year: 2015 , tags: [ 'JavaScript' , 'AngularJS' ] }, { title: 'The 80/20 Guide To ES2015 Generators' , author: 'Valeri Karpov' , price: 10 , year: 2016 , tags: [ 'JavaScript' , 'ES6' ] }, { title: 'Total Recall: My Unbelievably True Life Story' , author: 'Arnold Schwarzenegger' , price: 32 , year: 2013 , tags: [ 'Biography' ] }, { title: "Arnold's Bodybuilding for Men" , author: 'Arnold Schwarzenegger' , price: 14 , year: 1984 , tags: [ 'Fitness' ] } ]);

In order to enable faceted search by year, you need to run an aggregation query that buckets books by the year they were published.

db.books.aggregate([ { $group: { _id: '$year' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: -1 } } ]); { "_id" : 2016 , "count" : 1 } { "_id" : 2015 , "count" : 1 } { "_id" : 2013 , "count" : 1 } { "_id" : 1984 , "count" : 1 }

To drill down into just books by Arnold Schwarzenegger, you'd need to add a $match query.

db.books.aggregate([ { $match: { author: 'Arnold Schwarzenegger' } }, { $group: { _id: '$year' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: -1 } } ]); { "_id" : 2013 , "count" : 1 } { "_id" : 1984 , "count" : 1 }

However, you need to execute a separate aggregation for every other facet. For example, in MongoDB 3.2 there would be no good way to get both the author counts and the year counts in a single aggregation. You'd have to execute the below aggregation separately to get the author counts.

db.books.aggregate([ { $group: { _id: '$author' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: 1 } } ]); { "_id" : "Arnold Schwarzenegger" , "count" : 2 } { "_id" : "Valeri Karpov" , "count" : 2 }

Furthermore, there was no way to bucket books by price. The $group stage lets you count up the number of books that cost $10 exactly, but there's no way to group books by which ones are in the $10-$25 price range versus the $25-$50 range. At BookaLokal and SixPlus we gave up on using aggregation for this and just used a Node.js stream to count up the different facets. However, $facet and $bucket make it possible to count up facets in a single aggregation.

Introducing $facet and $bucket

$facet is fundamentally an aggregation stage that executes multiple aggregation pipelines in parallel and spits out the combined result when the pipelines are all done. You can think of it as async.parallel() or Promise.all() for aggregation pipelines.

Sounds complicated, so here's a concrete example. Remember the 2 separate aggregations for counting books by year and books by author? Here's the same 2 aggregations combined into a single aggregation with $facet . The $facet stage lets you execute both pipelines in parallel and store the output from both pipelines in a single document.

db.books.aggregate([ { $facet: { years: [ { $group: { _id: '$year' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: -1 } } ], authors: [ { $group: { _id: '$author' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: 1 } } ] } } ]); { "years" : [ { "_id" : 2016 , "count" : 1 }, { "_id" : 2015 , "count" : 1 }, { "_id" : 2013 , "count" : 1 }, { "_id" : 1984 , "count" : 1 } ], "authors" : [ { "_id" : "Arnold Schwarzenegger" , "count" : 2 }, { "_id" : "Valeri Karpov" , "count" : 2 } ] }

One big advantage of using $facet is that you can combine $match with $facet . If you execute a separate aggregation for every facet, you have to execute the same $match query for every facet, which can be a lot of wasted work. With $facet , you can execute $match once and use then $facet .

db.books.aggregate([ { $match: { author: 'Arnold Schwarzenegger' } }, { $facet: { } } ]);

What about grouping by price? That's what the $bucket operator does for you. Here's $bucket without $facet .

db.books.aggregate([ { $bucket: { groupBy: '$price' , boundaries: [ 0 , 10 , 25 , 50 ] } } ]); { "_id" : 10 , "count" : 2 } { "_id" : 25 , "count" : 2 }

If you don't care about setting up ranges manually, $bucketAuto will automatically break your documents up into a specified number of buckets.

db.books.aggregate([ { $bucketAuto: { groupBy: '$price' , buckets: 2 } } ]); { "_id" : { "min" : 10 , "max" : 32 }, "count" : 2 } { "_id" : { "min" : 32 , "max" : 37 }, "count" : 2 }

You can combine $bucketAuto with the established $facet query to get counts for every facet:

db.books.aggregate([ { $facet: { years: [ ], authors: [ ], price: [ { $bucketAuto: { groupBy: '$price' , buckets: 2 } } ] } } ]); { "years" : [ ], "authors" : [ ], "price" : [ { "_id" : { "min" : 10 , "max" : 32 }, "count" : 2 }, { "_id" : { "min" : 32 , "max" : 37 }, "count" : 2 } ] }

$facet and $bucket in Node.js

There are no version restrictions that I'm aware of on using $facet , $bucket , and $bucketAuto in mongodb and mongoose. The below scripts work at least as far back as mongodb@2.1.0 and mongoose@4.5.0 :

const mongodb = require ( 'mongodb' ); mongodb.MongoClient.connect( 'mongodb://localhost:27017/test' ). then(db => db.collection( 'books' ).aggregate([ { $facet: { years: [ { $group: { _id: '$year' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: -1 } } ], authors: [ { $group: { _id: '$author' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: 1 } } ], price: [ { $bucketAuto: { groupBy: '$price' , buckets: 2 } } ] } } ]).toArray()). then(res => console .log( JSON .stringify(res, null , ' ' ))). catch (error => console .error( 'error' , error));

const mongoose = require ( 'mongoose' ); mongoose.connect( 'mongodb://localhost:27017/test' ); var Book = mongoose.model( 'Book' , new mongoose.Schema()); Book.aggregate([ { $facet: { years: [ { $group: { _id: '$year' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: -1 } } ], authors: [ { $group: { _id: '$author' , count: { $sum: 1 } } }, { $sort: { count: -1 , _id: 1 } } ], price: [ { $bucketAuto: { groupBy: '$price' , buckets: 2 } } ] } } ]). then(res => console .log(res)). catch (error => console .error( 'error' , error));

Mongoose 4.8.0 will include a helper for $facet for its chainable aggregation pipeline builder:

MyModel.aggregate().facet({ field1: [ ], field2: [ ] }).exec();

Moving On