Implemented solution

In our articles service, we have two branches:

If from + size is less than or equal to 10,000, we perform a classic Elasticsearch query.

Otherwise, we use pre-calculated pages and perform a search_after query based on the last article of the previous page.
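The branch decision can be sketched as follows. This is a minimal Python sketch that assumes 1-indexed pages of a fixed `size`. The helper `plan_page_request` and its return shape are hypothetical. The 10,000 threshold matches Elasticsearch's default `index.max_result_window`, beyond which a from/size request is rejected.

```python
# Elasticsearch rejects from + size above index.max_result_window (10,000 by default).
MAX_RESULT_WINDOW = 10_000


def plan_page_request(page: int, size: int) -> dict:
    """Decide which branch serves a 1-indexed page of `size` articles.

    Hypothetical helper: the return shape is illustrative, not the articles service's API.
    """
    if page < 1 or size < 1 or size > MAX_RESULT_WINDOW:
        raise ValueError("page must be >= 1 and size must be in 1..MAX_RESULT_WINDOW")
    offset = (page - 1) * size  # the classic `from` parameter
    if offset + size <= MAX_RESULT_WINDOW:
        # Branch 1: fresh page computed on demand with from/size.
        return {"strategy": "classic", "from": offset, "size": size}
    # Branch 2: resume after the last article of the previous, pre-calculated page.
    return {"strategy": "search_after", "previous_page": page - 1, "size": size}
```

With 20 articles per page, page 500 (from = 9,980) is still served by the classic query. Page 501 and beyond, including page 2000, take the search_after branch.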

In other words, pages within the first 10,000 items are fresh because they are computed on demand with a classic Elasticsearch request. The other pages are static and pre-calculated. They are less fresh, but that is acceptable for SEO purposes.

The main challenge is to keep an “almost up-to-date” index that holds the last article of each page. For example, to display page 2000 of the sport section, the articles service needs the last article of sport page 1999. It then performs a search_after query based on that article.

First, we need an Elasticsearch index containing every query that returns more than 10,000 results. We created a service named paginator that manages these queries, which are needed to calculate and refresh pages.

Example of a document in the queries index

Each query has a predictable identifier: a hash of its request field (the MD5 digest of the stringified request).
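The following is a minimal sketch of the identifier and of a queries-index document. It assumes the request body is a JSON-serializable dict. The article does not say whether keys are sorted before hashing. Sorting them here is an assumption that makes the identifier independent of key order. The `id` and `request` field names, like both helper functions, are hypothetical.

```python
import hashlib
import json


def query_id(request: dict) -> str:
    """MD5 hex digest of the stringified request.

    Assumption: keys are sorted and separators are compact, so logically
    identical requests always hash to the same identifier.
    """
    stringified = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(stringified.encode("utf-8")).hexdigest()


def query_document(request: dict) -> dict:
    # Hypothetical shape of a document in the queries index.
    return {"id": query_id(request), "request": request}
```

Because the identifier is derived from the request itself, any service can recompute it without a lookup. This is what makes it predictable.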

Second, we need an index containing all calculated pages. We created another service named paginator-calc that receives a query and performs a scroll query to compute all of its pages.

Each page document stores the query identifier, the page number, and the information about the page's last article that the search_after query needs. This service is separate from paginator so that it can scale independently.
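The page computation can be sketched as a pure function over hits that are already in sort order, as a scroll with a `sort` clause returns them. With the Python client, `elasticsearch.helpers.scan(..., preserve_order=True)` is one possible source of such hits. The function `compute_pages` and the page document fields are hypothetical. The only information stored about the last article is its `sort` values, which is what search_after consumes.

```python
from typing import Iterable, Iterator


def compute_pages(query_id: str, hits: Iterable[dict], size: int) -> Iterator[dict]:
    """Yield one page document per `size` hits, in scroll order.

    Each hit is assumed to carry the `sort` values Elasticsearch returns when the
    request has a `sort` clause. Page document field names are hypothetical.
    """
    page, count, last_hit = 0, 0, None
    for hit in hits:
        count += 1
        last_hit = hit
        if count == size:  # the hit closes a full page
            page += 1
            yield {"query_id": query_id, "page": page, "search_after": last_hit["sort"]}
            count = 0
    # Record a trailing partial page too, although no later page depends on it.
    if count:
        page += 1
        yield {"query_id": query_id, "page": page, "search_after": last_hit["sort"]}
```

Keeping this function free of Elasticsearch calls separates the paging arithmetic from the scroll itself, which fits a worker that is scaled on its own.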

Example of a document in the pages index

Once the pages are calculated, the last step is to implement the search_after query in the articles service for the case where from + size is greater than 10,000.

Example of a search_after request
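The following sketch shows how the articles service could build the search_after request for a given page. It assumes the page size matches the one used by paginator-calc. It also assumes the request's sort ends with a unique tiebreaker field, because search_after needs deterministic sort values. The `pages` dict stands in for the pages index, keyed by (query identifier, page number). The function name and key shape are hypothetical.

```python
import copy


def build_search_after_request(request: dict, pages: dict, query_id: str, page: int, size: int) -> dict:
    """Build the body that fetches `page` by resuming after page - 1.

    `pages` is a hypothetical stand-in for the pages index: a mapping from
    (query_id, page number) to the page document computed by paginator-calc.
    """
    previous = pages.get((query_id, page - 1))
    if previous is None:
        raise LookupError(f"page {page - 1} of query {query_id} is not calculated yet")
    body = copy.deepcopy(request)  # same query and sort as the pre-computed pages
    body.pop("from", None)  # with search_after, from must be 0 or omitted
    body["size"] = size
    body["search_after"] = previous["search_after"]  # sort values of the previous page's last article
    return body
```

For sport page 2000, the service looks up page 1999 of the same query and sends the returned body. The original request is copied so that the stored query definition is never mutated.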