Paging in MongoDB – How to actually avoid poor performance?

What is the best way (performance-wise) to paginate results in MongoDB? Especially when you also want to get the total number of results?

Project running with .NET Core 2.0

Where to start?

To answer these questions, let’s start from the datasets defined in my earlier article, Part 1: How to search good places to travel (MongoDb LINQ & .NET Core). That article was a quick introduction to loading big chunks of data and then retrieving values using WebApi and LINQ. Here, I will start from that project and extend it with more details on paging the query results. You could also check Part 3 – MongoDb and LINQ: How to aggregate and join collections.

You can find the full solution, together with the data here: https://github.com/fpetru/WebApiQueryMongoDb

Topics covered

Paging query results with skip and limit

Paging query results using last position

MongoDb BSonId

Paging using MongoDb .NET Driver

To install

Here are all the things needed to be installed:

See the results

Here are a few steps to get the solution ready and see the results immediately:

Clone or download the project

Run the import.bat file from the Data folder – this will create the database (TravelDb) and fill in two datasets

Open the solution with Visual Studio 2017 and check the connection settings in appsettings.json

Run the solution

If you have any issues installing MongoDb, setting up the databases, or with the project structure, please review my earlier article.

Paging results using cursor.skip() and cursor.limit()

If you do a Google search, this is usually the first method presented for paginating query results in MongoDB. It is straightforward, but also expensive in terms of performance. It requires the server to walk from the beginning of the collection or index each time to reach the offset (skip) position, before it actually begins returning the results you need.

For example:

db.Cities.find().skip(5200).limit(10);

The server will need to walk through the first 5200 items in the collection, and only then return the next 10. This doesn’t scale well because of the skip() command: the deeper the page, the more documents are walked and thrown away.
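To make the cost concrete, here is a minimal JavaScript sketch that imitates this access pattern over an in-memory array. It is only a simulation and not MongoDB itself: the `docs` array, the `examined` counter, and the function name `pageWithSkip` are assumptions made for illustration.

```javascript
// Simulated collection: 6000 documents with increasing ids (illustrative only).
const docs = Array.from({ length: 6000 }, (_, i) => ({ id: i + 1 }));

// Imitates find().skip(n).limit(k): walks past every skipped document
// before collecting the page, and counts how many documents it touched.
function pageWithSkip(collection, skip, limit) {
  const page = [];
  let examined = 0;
  for (const doc of collection) {
    examined++;
    if (examined <= skip) continue; // still walking toward the offset
    page.push(doc);
    if (page.length === limit) break;
  }
  return { page, examined };
}

const result = pageWithSkip(docs, 5200, 10);
console.log(result.examined);   // 5210
console.log(result.page[0].id); // 5201
```

Ten documents are returned, but 5210 are touched to get them, and that number grows with every further page.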

Paging using the last position

To be faster, we should search and retrieve the details starting from the last retrieved item. As an example, let’s assume we need to find all the cities in France with a population greater than 15,000 inhabitants.

Following this method, the initial request to retrieve the first 200 records would be:

LINQ Format

We first retrieve the AsQueryable interface:

var _client = new MongoClient(settings.Value.ConnectionString);
var _database = _client.GetDatabase(settings.Value.Database);
var _context = _database.GetCollection<City>("Cities").AsQueryable<City>();

and then we run the actual query:

query = _context.CitiesLinq
    .Where(x => x.CountryCode == "FR" && x.Population >= 15000)
    .OrderByDescending(x => x.Id)
    .Take(200);

List<City> cityList = await query.ToListAsync();

Subsequent queries start from the last retrieved Id. Ordering by BsonId in descending order, we retrieve the most recent records created on the server before that Id.

query = _context.CitiesLinq
    .Where(x => x.CountryCode == "FR"
             && x.Population >= 15000
             && x.Id < ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"))
    .OrderByDescending(x => x.Id)
    .Take(200);

List<City> cityList = await query.ToListAsync();
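The flow of "first page, then keep feeding the last Id back in" can be sketched in plain JavaScript. This is a simulation over an in-memory array, with made-up data and hypothetical names (`cities`, `nextPage`). It filters in memory, so it only illustrates the paging logic, not the performance characteristics of an indexed query.

```javascript
// Illustrative data: ids 1..1000; documents with even ids belong to "FR".
const cities = Array.from({ length: 1000 }, (_, i) => ({
  id: i + 1,
  countryCode: (i + 1) % 2 === 0 ? "FR" : "DE",
}));

// Returns the next page of matching documents, highest id first.
// lastId is the id of the final item of the previous page, or null for the first page.
function nextPage(collection, countryCode, lastId, pageSize) {
  return collection
    .filter(c => c.countryCode === countryCode && (lastId === null || c.id < lastId))
    .sort((a, b) => b.id - a.id)
    .slice(0, pageSize);
}

// Walk through all pages by feeding the last id back into the next request.
let lastId = null;
let total = 0;
let pages = 0;
for (;;) {
  const page = nextPage(cities, "FR", lastId, 200);
  if (page.length === 0) break;
  total += page.length;
  pages++;
  lastId = page[page.length - 1].id;
}
console.log(pages, total); // 3 500
```

Because the comparison is strict (`<`), the last item of one page is never repeated at the top of the next one.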

Mongo’s ID

In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. It is immutable, and may be of any type other than an array. By default it is a MongoDb ObjectId; alternatively, you can use a natural unique identifier, if one is available, or just an auto-incrementing number.

Using the default ObjectId type,

[BsonId]
public ObjectId Id { get; set; }

brings further advantages, such as having the date and timestamp of when the record was added to the database readily available. Furthermore, sorting by ObjectId in descending order returns the most recently added entities in the MongoDb collection first.

cityList.Select(x => new
{
    BSonId = x.Id.ToString(), // unique hexadecimal number
    Timestamp = x.Id.Timestamp,
    ServerUpdatedOn = x.Id.CreationTime
    /* include other members */
});
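The creation time is available because an ObjectId begins with a 4-byte timestamp, stored as seconds since the Unix epoch. As a small sketch, the same value can be decoded from the hexadecimal string by hand. The helper name `objectIdToDate` is an assumption made for illustration and is not part of any driver.

```javascript
// Extracts the creation time embedded in an ObjectId hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hexadecimal string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("58fc8ae631a8a6f8d000f9c3");
console.log(created.toISOString()); // 2017-04-23T11:07:18.000Z
```

Since the timestamp has one-second resolution, ordering by ObjectId follows creation order only approximately for documents inserted within the same second.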

Returning fewer elements

While the class City has 20 members, it would be relevant to return just the properties we actually need. This would reduce the amount of data transferred from the server.

cityList.Select(x => new
{
    BSonId = x.Id.ToString(), // unique hexadecimal number
    x.Name,
    x.AlternateNames,
    x.Latitude,
    x.Longitude,
    x.Timezone,
    ServerUpdatedOn = x.Id.CreationTime
});

Indexes in MongoDB – a few details

We would rarely need to get data in the exact order of the MongoDB internal ids (_id), without any filters (just using find()). In most cases, we retrieve data using filters and then sort the results. For queries that include a sort operation without a supporting index, the server must load all the matching documents in memory to perform the sort before returning any results.

How do we add an index?

Using RoboMongo, we create the index directly on the server:

db.Cities.createIndex( { CountryCode: 1, Population: 1 } );

How do we check that our query is actually using the index?

Running the query with the explain command returns details on index usage:

db.Cities.find({ CountryCode: "FR", Population : { $gt: 15000 }}).explain();
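When reading the explain output, the part to look at is the winning plan: an IXSCAN stage means an index is being read, while a COLLSCAN stage means the whole collection is scanned. The sketch below checks this on two hand-written, heavily trimmed sample objects. The samples and the helper names `planStages` and `usesIndex` are assumptions for illustration; real output contains many more fields, its shape varies between server versions, and plans may branch (this sketch follows only a single `inputStage` chain).

```javascript
// Collects the stage names of a query plan, outermost first.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// True when the winning plan reads from an index rather than scanning the collection.
function usesIndex(explainOutput) {
  const stages = planStages(explainOutput.queryPlanner.winningPlan);
  return stages.includes("IXSCAN") && !stages.includes("COLLSCAN");
}

// Hand-written, trimmed samples (assumed shape, not real server output).
const indexed = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "CountryCode_1_Population_1" },
    },
  },
};
const unindexed = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesIndex(indexed));   // true
console.log(usesIndex(unindexed)); // false
```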

Is there a way to see the actual query behind the MongoDB LINQ statement?

The only way I could find was via the GetExecutionModel() method. It provides detailed information, but its inner elements are not easily accessible.

query.GetExecutionModel();

Using the debugger, we could see the elements as well as the full actual query sent to MongoDb.



Then, we could get the query and execute it against MongoDb using RoboMongo tool, and see the details of the execution plan.

Non LINQ way – Using MongoDb .NET Driver

LINQ is slightly slower than using the direct API, as it adds an abstraction layer on top of the query. This abstraction would allow you to swap MongoDB for another data source (MS SQL Server / Oracle / MySQL etc.) without many code changes, at the price of a slight performance hit.

Even so, newer versions of the MongoDB .NET Driver have greatly simplified the way we filter and run queries. The fluent interface (IFindFluent) closely resembles the LINQ way of writing code.

var filterBuilder = Builders<City>.Filter;

// Lt (not Lte) on Id, so the last item of the previous page is not returned again
var filter = filterBuilder.Eq(x => x.CountryCode, "FR")
           & filterBuilder.Gte(x => x.Population, 10000)
           & filterBuilder.Lt(x => x.Id, ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"));

return await _context.Cities.Find(filter)
    .SortByDescending(p => p.Id)
    .Limit(200)
    .ToListAsync();

where _context is defined as

var _context = _database.GetCollection<City>("Cities");

Implementation

Wrapping up, here is my proposal for the paginate function. OR predicates are supported by MongoDb, but it is usually hard for the query optimizer to predict the disjoint sets from the two sides of the OR. Avoiding them whenever possible is a known trick for query optimization.

// building the where clause
private Expression<Func<City, bool>> GetConditions(string countryCode, string lastBsonId, int minPopulation = 0)
{
    Expression<Func<City, bool>> conditions =
        (x => x.CountryCode == countryCode && x.Population >= minPopulation);

    // add the paging condition only when a valid last id was supplied
    ObjectId id;
    if (!string.IsNullOrEmpty(lastBsonId) && ObjectId.TryParse(lastBsonId, out id))
    {
        conditions = (x => x.CountryCode == countryCode
                        && x.Population >= minPopulation
                        && x.Id < id);
    }

    return conditions;
}

public async Task<object> GetCitiesLinq(string countryCode, string lastBsonId, int minPopulation = 0)
{
    try
    {
        var items = await _context.CitiesLinq
            .Where(GetConditions(countryCode, lastBsonId, minPopulation))
            .OrderByDescending(x => x.Id)
            .Take(200)
            .ToListAsync();

        // select just a few elements
        var returnItems = items.Select(x => new
        {
            BsonId = x.Id.ToString(),
            Timestamp = x.Id.Timestamp,
            ServerUpdatedOn = x.Id.CreationTime,
            x.Name,
            x.CountryCode,
            x.Population
        });

        // the total count ignores the paging condition
        int countItems = await _context.CitiesLinq
            .Where(GetConditions(countryCode, "", minPopulation))
            .CountAsync();

        return new { count = countItems, items = returnItems };
    }
    catch (Exception)
    {
        // log or manage the exception; rethrow without resetting the stack trace
        throw;
    }
}

and in the controller

[NoCache]
[HttpGet]
public async Task<object> Get(string countryCode, int? population, string lastId)
{
    return await _travelItemRepository
        .GetCitiesLinq(countryCode, lastId, population ?? 0);
}

The initial request (sample):

http://localhost:61612/api/city?countryCode=FR&population=10000

followed by other requests where we specify the last retrieved Id:

http://localhost:61612/api/city?countryCode=FR&population=10000&lastId=58fc8ae631a8a6f8d00101f9
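From the client side, paging means taking the Id of the last item in one response and sending it as lastId in the next request. The helpers below are a hedged sketch of that step. The names `buildPageUrl` and `lastIdOf` are hypothetical, and the `bsonId` property name assumes the API serializes property names in camelCase, which depends on the serializer settings.

```javascript
// Builds the URL for a page request; lastId is omitted for the first request.
function buildPageUrl(baseUrl, countryCode, population, lastId) {
  const url = new URL(baseUrl);
  url.searchParams.set("countryCode", countryCode);
  url.searchParams.set("population", String(population));
  if (lastId) url.searchParams.set("lastId", lastId);
  return url.toString();
}

// The id to send next is taken from the final item of the previous response.
// Assumes a response shaped like { count, items: [{ bsonId, ... }] }.
function lastIdOf(response) {
  const items = response.items;
  return items.length > 0 ? items[items.length - 1].bsonId : null;
}

const base = "http://localhost:61612/api/city";
const firstUrl = buildPageUrl(base, "FR", 10000, null);

// Illustrative response body, not real data from the service.
const sampleResponse = {
  count: 2,
  items: [{ bsonId: "58fc8ae631a8a6f8d0010200" }, { bsonId: "58fc8ae631a8a6f8d00101f9" }],
};
const secondUrl = buildPageUrl(base, "FR", 10000, lastIdOf(sampleResponse));

console.log(firstUrl);
console.log(secondUrl);
```

An empty items array (where `lastIdOf` returns null) is the signal to stop requesting further pages.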

Here is just a sample:



At the end

I hope this helps. Please let me know if you would like this article extended, or if you have any questions.