In previous articles, we saw how queries and filters operate in elasticsearch and how they differ. In this series, we will concentrate on the most commonly used queries in elasticsearch, which include match, match_phrase, prefix, term, multi_match, and bool. This post gives an overview of the setup and then takes a closer look at match and match_phrase queries, with examples.

Queries Overview

Index Creation

We create an index named “testindex” and apply some mappings to it:

curl -XPUT 'http://localhost:9200/testindex' -d '{ "mappings": { "employee": { "properties": { "employee-id": { "type": "string", "index": "not_analyzed" }, "name_not_analyzed": { "type": "string", "index": "not_analyzed" } } } } }'

Mappings are applied to two fields, employee-id and name_not_analyzed. For both, we have told elasticsearch to treat the field as a string and not to analyze it, so each value is indexed exactly as it is. All other fields, including name, are mapped dynamically and analyzed. We will see the purpose of this mapping in a later section of this post.
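To make the difference between analyzed and not_analyzed fields concrete, here is a small shell sketch of the terms each kind of field would contribute to the index. It is a toy model, not elasticsearch code: standard_terms and not_analyzed_terms are hypothetical helpers, and standard_terms only approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters.

```shell
#!/bin/sh
# Hypothetical, rough stand-in for the standard analyzer: lowercase the value
# and split it on every character that is not a letter or digit, printing one
# term per line. The real analyzer follows Unicode word-boundary rules.
standard_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# A not_analyzed field keeps the whole value as one single term.
not_analyzed_terms() {
  printf '%s\n' "$1"
}

echo "name (analyzed by default):"
standard_terms "Sean Turner"         # prints sean and turner, one per line
echo "name_not_analyzed:"
not_analyzed_terms "Sean Turner"     # prints Sean Turner unchanged
echo "employee-id, if it were analyzed:"
standard_terms "NY M-2389"           # prints ny, m and 2389, one per line
echo "employee-id (not_analyzed):"
not_analyzed_terms "NY M-2389"       # prints NY M-2389 unchanged
```

The employee-id case suggests why identifiers are often mapped as not_analyzed: if analyzed, the ID would be broken into separate, lowercased terms.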

Indexing Documents

For this demo, we index three documents, each holding the details of one employee of a firm.

Document 1

curl -XPOST 'http://localhost:9200/testindex/employee/1' -d '{ "name": "Sean Turner", "name_not_analyzed" : "Sean Turner", "status": "i am feeling to go for a ride in my porche", "employee-id": "NY M-2389", "dessert":"i love macaroon" , "favourite_car":"porche" }'

Document 2

curl -XPOST 'http://localhost:9200/testindex/employee/2' -d '{ "name": "July Adams", "name_not_analyzed" : "July Adams", "status": "just feeling sulky", "employee-id": "IL F-2213", "dessert": "i love marshmallows" , "favourite_car":"porche" }'

Document 3

curl -XPOST 'http://localhost:9200/testindex/employee/3' -d '{ "name": "Chris Turner", "name_not_analyzed" : "Chris Turner", "status": "hear advices and just listen to the feeling of heart", "employee-id": "NY M-3456", "dessert":"i love milkshakes" , "favourite_car":"audi" }'

Match Query

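The match query is one of the most basic and widely used queries in elasticsearch, and it can search both analyzed and not_analyzed fields. On a not_analyzed field, the query string is not analyzed either, so it must match the stored value exactly.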

When searching an analyzed field, the match query passes the query string through the same analysis process as the field it is applied to. Apart from name_not_analyzed and employee-id, which we mapped as not_analyzed, all fields in our documents go through elasticsearch's default analysis, which uses the standard analyzer. Consider the following query:

curl -XGET 'http://localhost:9200/testindex/employee/_search' -d '{ "query": { "match": { "name": "Turner" } } }'

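In this query, we are searching the name field for documents that contain Turner. Since name is an analyzed field, its inverted index holds the following terms (term: documents containing it):

sean: Document 1
turner: Document 1, Document 3
july: Document 2
adams: Document 2
chris: Document 3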

Note that the standard analyzer has split each value into separate words (here, at the whitespace) and lowercased them. The match query makes sure that the query string, Turner in this case, goes through the same analysis. The resulting term, turner, is present in Document 1 and Document 3, so the search returns both documents.
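The steps above can be imitated with a toy shell sketch. This is an illustration under simplifying assumptions, not how elasticsearch is implemented: analyze is a hypothetical, rough approximation of the standard analyzer, build_index builds a tiny "term doc-id" index for the three name values, and match_query analyzes the query string the same way before looking up each of its terms.

```shell
#!/bin/sh
# Hypothetical, rough stand-in for the standard analyzer: lowercase, then
# split on anything that is not a letter or digit, one term per line.
analyze() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# The name values of Documents 1, 2 and 3, one per line.
names='Sean Turner
July Adams
Chris Turner'

# Build a toy inverted index as "term doc-id" lines, sorted by term.
build_index() {
  id=0
  printf '%s\n' "$names" | while IFS= read -r name; do
    id=$((id + 1))
    analyze "$name" | sed "s/\$/ $id/"   # append the document id to each term
  done | sort
}

inverted_index=$(build_index)

# Toy match query: analyze the query string with the same analyzer, look up
# every resulting term and print the ids of the documents that contain it.
match_query() {
  analyze "$1" | while IFS= read -r term; do
    printf '%s\n' "$inverted_index" | awk -v t="$term" '$1 == t { print $2 }'
  done | sort -un
}

printf '%s\n' "$inverted_index"
match_query "Turner"    # prints 1 and 3, one per line
```

Because the query string is lowercased just like the indexed values, match_query "Turner" finds the term turner and returns documents 1 and 3.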

Phrase Query

The phrase query is used when we need to match an exact phrase against a field, that is, when the order of the terms in our query matters. To see why we need it, let us first run a simple match query against the status field with the value i am feeling:

curl -XGET 'http://localhost:9200/testindex/employee/_search' -d '{ "query": { "match": { "status": "i am feeling" } } }'

We might expect only Document 1 to match this query, but the results tell a different story: all three documents match. Why does this happen? As we saw earlier, the standard analyzer processes the status field, and the inverted index entries for the terms in our query look like this:

i: Document 1
am: Document 1
feeling: Document 1, Document 2, Document 3

By default, the match query combines the analyzed query terms with a boolean OR, so a document matches if it contains any one of them. The term feeling occurs in the status field of every indexed document, so all three documents are returned. How do we overcome this? The solution is the match_phrase query. We can rewrite the same query with match_phrase as follows:
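To see the OR behaviour in isolation, the following toy shell sketch (not elasticsearch code) applies the same rule to the three status values: a document matches if its analyzed status contains at least one analyzed query term. The analyze and match_or functions are hypothetical helpers, and analyze only roughly approximates the standard analyzer.

```shell
#!/bin/sh
# Hypothetical, rough stand-in for the standard analyzer: lowercase, then
# split on anything that is not a letter or digit, one term per line.
analyze() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# The status values of Documents 1, 2 and 3, one per line.
statuses='i am feeling to go for a ride in my porche
just feeling sulky
hear advices and just listen to the feeling of heart'

# Toy match query with OR semantics: a document matches when its status
# contains at least one of the analyzed query terms.
match_or() {
  query_terms=$(analyze "$1")
  id=0
  printf '%s\n' "$statuses" | while IFS= read -r status; do
    id=$((id + 1))
    # grep -F -x treats the newline-separated query terms as a list of fixed
    # strings and succeeds if any document term equals one of them.
    if analyze "$status" | grep -qFx "$query_terms"; then
      echo "Document $id"
    fi
  done
}

match_or "i am feeling"    # prints Document 1, Document 2 and Document 3
```

Term order plays no part in this check, which is why every document containing feeling is returned.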

curl -XGET 'http://localhost:9200/testindex/employee/_search' -d '{ "query": { "match_phrase": { "status": "i am feeling" } } }'

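This time elasticsearch also takes the positions of the terms into account: a document matches only if i, am and feeling appear next to each other and in that order. In Document 1 they occupy positions 0, 1 and 2 of the status field, while Documents 2 and 3 contain only feeling.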

Only Document 1 contains the query terms at consecutive positions in this order, so it is the only result, which is what we expected. Note that in a match_phrase query the query string is also analyzed with the same analyzer as the field, so a query that differs only in case, such as I Am Feeling, returns the same result.
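The positional check can be illustrated with a toy shell sketch. In this model (a simplification, not elasticsearch's implementation), each term's position is its index in the list of analyzed terms, and the hypothetical phrase_match function succeeds only if the query terms occupy consecutive positions in the same order. The analyze helper is likewise hypothetical and only roughly approximates the standard analyzer.

```shell
#!/bin/sh
# Hypothetical, rough stand-in for the standard analyzer: lowercase, then
# split on anything that is not a letter or digit, one term per line.
analyze() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# The status values of Documents 1, 2 and 3, one per line.
statuses='i am feeling to go for a ride in my porche
just feeling sulky
hear advices and just listen to the feeling of heart'

# phrase_match TEXT QUERY: exit status 0 only if the analyzed QUERY terms
# occur in the analyzed TEXT at consecutive positions, in the same order.
phrase_match() {
  { analyze "$2"; echo '|'; analyze "$1"; } | awk '
    $0 == "|" { in_doc = 1; next }     # separator between query and text terms
    !in_doc   { q[nq++] = $0; next }   # query terms, in order
              { d[nd++] = $0 }         # text terms; array index = position
    END {
      for (s = 0; s + nq <= nd; s++) {                   # try each start position
        for (i = 0; i < nq && d[s + i] == q[i]; i++) ;   # compare term by term
        if (i == nq) exit 0                              # all terms matched in order
      }
      exit 1
    }'
}

id=0
printf '%s\n' "$statuses" | while IFS= read -r status; do
  id=$((id + 1))
  if phrase_match "$status" "I am feeling"; then
    echo "Document $id"    # only Document 1 is printed
  fi
done
```

Since both the status text and the query pass through analyze, changing the case of the query does not change the result.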

Conclusion