Sorting results is a core use case for search engines and database management systems. Whether you are searching for the cheapest flight tickets or combing through the best Black Friday deals, behind the scenes a database system is being summoned to sort the results by a specific search criterion.

In this post, we will talk about sorting with one such popular DB system, Elasticsearch. Having started out as a search engine, Elasticsearch defaults to a full-text search use case. But it has since grown to perform a wide range of queries, including geographic / location searches, exact DB matches, numeric range queries, nested joins, and specialized and scripted queries. These queries can also be combined to create compound queries.

Sort.by(relevance)

Elasticsearch comes with a good default out of the box. It sorts the results by relevance to the search query term, most relevant first.

Elasticsearch represents relevance as a floating-point number called _score, and orders results in descending order of their _score values.

GET /_search
{
  "query" : {
    "match" : {
      "tweet" : "grow up"
    }
  }
}

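Here, we apply a match query for “grow up” against the tweet field of our Elasticsearch dataset. A match query analyzes the search text into terms and, by default, returns documents that contain any of them.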

Showing match query results for the search term “grow up” in the “tweet” field

L2–8 show meta information, such as the 3 ms the query took to return the result and some information about the shards.

L9 onwards: We see the actual query results.

L10: There are two matching results to the query.

L11: We see the max relevance _score value as 1.979. This is followed by the two matching objects, the first with a _score value of 1.979 and the second with a _score value of 0.304. The drastic score difference is most likely because the second tweet does not contain both search terms; it only has the word “up”.
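To see why a tweet containing only “up” still shows up in the results, here is a toy model of the default match behaviour in Python. It is only a sketch: it splits on whitespace and lowercases, which is far simpler than Elasticsearch's real analyzers, and it does not attempt to compute a _score. The tweet texts and the helper name matched_terms are made up for illustration.

```python
def matched_terms(query: str, text: str) -> set:
    """Return the query terms that also occur in the text (toy analyzer)."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return query_terms & text_terms


# Hypothetical tweets, invented for this example.
tweets = [
    "When I grow up I want to be a search engine",
    "Wake up and index the data",
    "Nothing to see here",
]

for tweet in tweets:
    terms = matched_terms("grow up", tweet)
    if terms:  # sharing any one term is enough to count as a match
        print(sorted(terms), "->", tweet)
```

The first tweet shares both terms and the second only one, which is consistent with the gap between the two _score values above. The actual numbers come from Elasticsearch's relevance algorithm, not from a simple term count like this one.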

However, one might end up with a scenario where there is no meaningful way to represent relevance, like in the case of term filters. Take this query for instance, where we search for the term “London” in the group_city field of a different dataset.

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "group_city" : "London"
        }
      }
    }
  }
}

Showing term filter results for “London”

Elasticsearch returns 211 results (L10) for this query, each with a _score value of 0. There is no way to distinguish relevance in this case, as all the results fit the search criteria equally.

Sort.by(field1, field2, …)

This is where the sort parameter comes in handy, allowing us to sort results by one or more fields.

As an example, we can take our previous query and sort the results explicitly by the venue_name in ascending order.

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "group_city" : "London"
        }
      }
    }
  },
  "sort" : {
    "venue.venue_name" : { "order" : "asc" }
  }
}
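Since sort accepts one or more fields, the same request can be extended with a tie-breaker by passing a list of sort clauses. The Python sketch below builds such a request body as a plain dictionary and mimics the resulting order on a few in-memory records. The second sort field (group_name), the sample records, and the helper local_sort are assumptions for illustration; only group_city and venue.venue_name come from the query above.

```python
import json

# Request body: filter on group_city, then sort by two fields in order.
# "group_name" is a hypothetical second field, used only as a tie-breaker.
body = {
    "query": {"bool": {"filter": {"term": {"group_city": "London"}}}},
    "sort": [
        {"venue.venue_name": {"order": "asc"}},
        {"group_name": {"order": "desc"}},
    ],
}
print(json.dumps(body, indent=2))


def local_sort(records):
    """Mimic the sort above in memory: venue name asc, then group name desc."""
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key.
    by_group = sorted(records, key=lambda r: r["group_name"], reverse=True)
    return sorted(by_group, key=lambda r: r["venue_name"])


# Invented sample records.
records = [
    {"venue_name": "Venue B", "group_name": "Group 1"},
    {"venue_name": "Venue A", "group_name": "Group 3"},
    {"venue_name": "Venue B", "group_name": "Group 2"},
]
for r in local_sort(records):
    print(r["venue_name"], "|", r["group_name"])
```

The in-memory version compares plain Python strings. How Elasticsearch orders string values depends on the field's mapping, which this sketch ignores.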

It’s that simple. Or at least it should be in principle.

But if you felt like this after staring at the query syntax, you are not alone!