Getting the Elasticsearch query right down to its syntax can be tough and confounding, even though search is the primary function of Elastic…umm…search. To help, this guide will take you through the ins and outs of common search queries for Elasticsearch and set you up for future querying success.

Lucene Query Syntax

Elasticsearch is part of the ELK Stack and is built on Lucene, the search library from Apache, and exposes Lucene’s query syntax. It’s such an integral part of Elasticsearch that when you query the root of an Elasticsearch cluster, it will tell you the Lucene version:

{
  "name": "node-1",
  "cluster_name": "my-cluster",
  "cluster_uuid": "8AqSmmKdQgmRVPsVxyxKrw",
  "version": {
    "number": "6.1.2",
    "build_hash": "5b1fea5",
    "build_date": "2018-01-10T02:35:59.208Z",
    "build_snapshot": false,
    "lucene_version": "7.1.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}

Knowing the Lucene syntax and operators will go a long way in helping you build queries. The full syntax is used by the query_string query and by URI searches, while the simple_query_string query uses a simplified, more forgiving version of it. Here are some of the basics:

1. Boolean Operators

As with most computer languages, Elasticsearch supports the AND, OR, and NOT operators. Write them in uppercase; in lowercase, and, or, and not are treated as ordinary search terms:

jack AND jill will return events that contain both jack and jill.

ahab NOT moby will return events that contain ahab but not moby.

tom OR jerry will return events that contain tom or jerry, or both.
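To make the semantics concrete, here is a toy model in Python (not how Lucene executes queries internally): each term maps to the set of IDs of the documents that contain it, and the three operators become set intersection, difference, and union. The postings data is made up purely for illustration.

```python
# Toy inverted index: term -> IDs of the documents containing it (made-up data).
postings = {
    "jack": {1, 2, 3},
    "jill": {2, 3, 4},
    "ahab": {5, 6},
    "moby": {6},
    "tom": {7},
    "jerry": {7, 8},
}


def docs(term):
    """Return the set of document IDs containing term (empty if none)."""
    return postings.get(term, set())


jack_and_jill = docs("jack") & docs("jill")  # AND: intersection
ahab_not_moby = docs("ahab") - docs("moby")  # NOT: difference
tom_or_jerry = docs("tom") | docs("jerry")   # OR: union

print(sorted(jack_and_jill), sorted(ahab_not_moby), sorted(tom_or_jerry))
# [2, 3] [5] [7, 8]
```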

2. Fields

You might be looking for events where a specific field contains certain terms. Type the field name, a colon, and then either the value on its own or a string in double quotation marks. Here are some Lucene field examples:

name:"Ned Stark"

status:404

Be careful with values that contain spaces, such as "Ned Stark". You'll need to enclose them in double quotes so the whole value is searched as a phrase; without the quotes, name:Ned Stark searches the name field for Ned and the default field(s) for Stark.
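If you build these clauses in code, a small helper can add the quotes for you. The Python sketch below (field_query is a hypothetical helper, not part of any Elasticsearch client) wraps values that contain whitespace in double quotes and escapes any backslashes or double quotes inside them, which are the only characters that need escaping within a quoted Lucene phrase. Unquoted values are passed through as-is, so reserved characters such as : or ( in them would still need escaping.

```python
def field_query(field, value):
    """Build a Lucene field:value clause, quoting values that contain whitespace."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        # Inside a quoted phrase, only backslashes and double quotes need escaping.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{text}"


print(field_query("name", "Ned Stark"))  # name:"Ned Stark"
print(field_query("status", 404))        # status:404
```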

3. Ranges

You can search for fields within a specific range, using square brackets for inclusive range searches and curly braces for exclusive range searches:

age:[3 TO 10] will return events with age between 3 and 10, including 3 and 10.

price:{100 TO 400} will return events with prices greater than 100 and less than 400 (101 to 399 if prices are whole numbers).

name:[Adam TO Ziggy] will return names between and including Adam and Ziggy.

As the last example shows, ranges are not limited to numbers: you can use them on string and date fields as well.
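The bracket rules reduce to one comparison per end of the range. The Python sketch below (parse_range and in_range are hypothetical helpers) parses the simple numeric forms shown above, including mixed brackets such as [30 TO 40}, which Lucene also accepts, and checks a value against them. It does not handle string or date bounds, or open-ended ranges such as [* TO 10].

```python
import re

RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")


def parse_range(expr):
    """Parse '[lo TO hi]', '{lo TO hi}' or mixed forms into bounds and flags."""
    m = RANGE.match(expr)
    if m is None:
        raise ValueError(f"not a range: {expr!r}")
    open_b, lo, hi, close_b = m.groups()
    return float(lo), float(hi), open_b == "[", close_b == "]"


def in_range(value, expr):
    lo, hi, lo_incl, hi_incl = parse_range(expr)
    above = value >= lo if lo_incl else value > lo  # '[' includes the lower bound
    below = value <= hi if hi_incl else value < hi  # ']' includes the upper bound
    return above and below


print(in_range(10, "[3 TO 10]"), in_range(100, "{100 TO 400}"), in_range(40, "[30 TO 40}"))
# True False False
```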

4. Wildcards

The search would not be a search without wildcards. You can use the * character to match zero or more characters and the ? character to match exactly one character:

Ma?s will match Mars, Mass, and Maps.

Ma*s will match Mars, Matches, and Massachusetts.
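One way to reason about wildcard patterns is to translate them into anchored regular expressions: * becomes .*, ? becomes a single-character ., and everything else is matched literally. The Python sketch below (wildcard_to_regex and wildcard_match are hypothetical helpers) does this and checks the examples above. Keep in mind that Elasticsearch matches patterns against indexed terms, which in analyzed text fields are usually lowercased, so a capitalized pattern like Ma?s may need to be lowercased to match a text field.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a Lucene-style wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def wildcard_match(pattern, term):
    # fullmatch: the pattern has to cover the entire term, not a substring
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print([t for t in ["Mars", "Mass", "Maps", "Mares"] if wildcard_match("Ma?s", t)])
# ['Mars', 'Mass', 'Maps']
print([t for t in ["Mars", "Matches", "Massachusetts", "Map"] if wildcard_match("Ma*s", t)])
# ['Mars', 'Matches', 'Massachusetts']
```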

5. Regex Queries

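Regexes give you even more power. Just place your regex between forward slashes (/). Lucene regular expressions are anchored, so the pattern has to match an entire term rather than part of one: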

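/p[ea]n/ will match both pen and pan.

/\<.+\>/ will match terms that resemble an HTML tag. The angle brackets are escaped because Lucene's regular expression syntax can use < and > for numeric intervals (such as <1-100>), and because the pattern must match a whole term, it only finds tags in fields that are not split into words, such as keyword fields.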
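Because Lucene's regular expressions are anchored, Python's re.fullmatch is a handy way to sanity-check a simple pattern before sending it, while re.search shows what an unanchored engine would do instead. This is only an approximation: Lucene's dialect is more limited than Python's (for instance, it has no lookarounds or backreferences, since patterns are compiled to finite automata).

```python
import re

terms = ["pen", "pan", "pun", "open", "penny"]

# Lucene regexes must match the whole term, which re.fullmatch mirrors.
anchored = [t for t in terms if re.fullmatch(r"p[ea]n", t)]
# re.search finds the pattern anywhere in the string, which Lucene does NOT do.
unanchored = [t for t in terms if re.search(r"p[ea]n", t)]

print(anchored)    # ['pen', 'pan']
print(unanchored)  # ['pen', 'pan', 'open', 'penny']
```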

6. Fuzzy Search Queries

Fuzzy searching uses the Damerau-Levenshtein Distance to match terms that are similar in spelling. This is great when your data set has misspelled words.

Use the tilde (~) to find similar terms:

blow~

This will return results like “blew,” “brow,” and “glow.”

Use the tilde (~) along with a number to specify the maximum edit distance between terms (Lucene supports distances of 0, 1, or 2):

john~2

This will match, among other things: “jean,” “johns,” “jhon,” and “horn”
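The distance behind ~ counts the single-character insertions, deletions, substitutions, and transpositions of adjacent characters needed to turn one term into another. The Python function below computes the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, which is enough to check the examples above; it illustrates the metric, not Lucene's automaton-based implementation.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


for word in ["blew", "brow", "glow"]:
    print("blow", word, osa_distance("blow", word))
for word in ["jean", "johns", "jhon", "horn"]:
    print("john", word, osa_distance("john", word))
```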

7. Free Text

It’s as simple as it sounds. Just type in the term or value you want to find, without a field name, and Elasticsearch will look for it in the index’s default fields.

8. Elasticsearch Term Query

The term query returns documents that contain an exact term in a given field. Take this example from an index of baseball statistics:

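POST /mlb_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "pitcher_last": { "value": "rivera", "boost": 1.0 } } },
        { "term": { "pitcher_first": { "value": "mariano" } } }
      ]
    }
  },
  "_source": ["date", "innings_pitched", "pitch_count", "cutters", "fastballs"]
}

Because a single term query checks only one field, the example wraps two term queries in a bool query (covered in more detail below), and the _source element limits the fields returned for each hit.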

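Make sure you are using the term query here, NOT the match query (which replaced the old text query). The term query looks for the exact term as it is stored in the index and does not analyze your search value; the match query first runs the value through the field's analyzer, which for text fields typically lowercases it and strips punctuation.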
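The difference comes down to analysis. The Python sketch below uses a deliberately simplified stand-in for an analyzer (toy_analyze lowercases the text and splits on anything that is not a letter or digit; the real standard analyzer follows Unicode word-break rules) to show why a term query for Rivera finds nothing in a text field while a match query does.

```python
import re


def toy_analyze(text):
    """Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Terms stored in the inverted index for a text field holding "Rivera, Mariano"
indexed_terms = set(toy_analyze("Rivera, Mariano"))


def term_query(value):
    return value in indexed_terms  # the search value is used exactly as given


def match_query(value):
    return any(t in indexed_terms for t in toy_analyze(value))  # analyzed first


print(sorted(indexed_terms))                        # ['mariano', 'rivera']
print(term_query("Rivera"), match_query("Rivera"))  # False True
print(term_query("rivera"))                         # True
```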

9. Elasticsearch Terms Set Query

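Similar to the term query, the terms_set query matches exact terms, but it takes a list of terms and returns documents that contain at least a minimum number of them. That minimum is read from a numeric field in each document (minimum_should_match_field) or computed by a script (minimum_should_match_script). To further the baseball example, start by creating an index whose fields are mapped as keyword fields: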

PUT /pitchers
{
  "mappings": {
    "properties": {
      "pitcher_last": { "type": "keyword" },
      "pitcher_first": { "type": "keyword" },
      "pitch_type": { "type": "keyword" }
    }
  }
}
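The mapping above only sets up the fields. A terms_set query also needs a per-document minimum, so the Python sketch below assumes a hypothetical integer field named required_matches has been added to that mapping and that pitch_type holds a list of pitch names. It builds a request body you could send to POST /pitchers/_search, plus a toy function (not Elasticsearch's implementation) that applies the same matching rule to a single document.

```python
import json

# Assumes a hypothetical integer field "required_matches" in each document.
query_body = {
    "query": {
        "terms_set": {
            "pitch_type": {
                "terms": ["cutter", "fastball", "slider"],
                "minimum_should_match_field": "required_matches",
            }
        }
    }
}


def toy_terms_set(doc, field, terms, min_field):
    """Count listed terms present in doc[field]; compare to doc[min_field]."""
    present = set(doc.get(field, []))
    hits = sum(1 for t in set(terms) if t in present)
    return hits >= doc[min_field]


rivera = {"pitch_type": ["cutter", "fastball"], "required_matches": 2}
print(json.dumps(query_body))
print(toy_terms_set(rivera, "pitch_type", ["cutter", "fastball", "slider"],
                    "required_matches"))  # True
```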

URI Search

The easiest way to search your Elasticsearch cluster is through URI search. You can pass a simple query to Elasticsearch using the q query parameter. The following query will search your whole cluster for documents whose name field contains the term "travis":

curl "localhost:9200/_search?q=name:travis"

With the Lucene syntax, you can build quite impressive searches. Usually you'll have to URL-encode characters such as spaces (we have left them unencoded in these examples for clarity):

curl "localhost:9200/_search?q=name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city"

A number of options are available that allow you to customize the URI search, specifically in terms of which analyzer to use (analyzer), whether the query should be fault-tolerant (lenient), and whether an explanation of the scoring should be provided (explain).
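Rather than encoding a query by hand, you can let a library do it. This Python sketch uses the standard library's urllib.parse to build the URI search shown above, with the lenient and explain options switched on. It only constructs and decodes the URL (it never contacts a cluster), and localhost:9200 is assumed to be where your cluster listens.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

params = {
    "q": "name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city",
    "lenient": "true",  # ignore format-based failures, e.g. text sent to a numeric field
    "explain": "true",  # include an explanation of how each hit was scored
}
# quote_via=quote encodes spaces as %20 instead of '+'
url = "http://localhost:9200/_search?" + urlencode(params, quote_via=quote)
print(url)

# Decoding the query string gives back the original Lucene query.
decoded = parse_qs(urlsplit(url).query)
assert decoded["q"][0] == params["q"]
```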

Although the URI search is a simple and efficient way to query your cluster, you’ll quickly find that it doesn’t support all of the features ES offers. The full power of Elasticsearch is available through Request Body Search, which lets you build a complex search request from various elements and query clauses that match, filter, and order documents, as well as manipulate them, based on multiple criteria.

The Request Body Search

Request Body Search uses a JSON document that contains various elements to create a search on your Elasticsearch cluster. Not only can you specify search criteria, you can also specify the range and number of documents that you expect back, the fields that you want, and various other options.

The first element of a search is the query element that uses Query DSL. Using Query DSL can sometimes be confusing because the DSL can be used to combine and build up query clauses into a query that can be nested deeply. Since most of the Elasticsearch documentation only refers to clauses in isolation, it’s easy to lose sight of where clauses should be placed.

To use the Query DSL, you need to include a “query” element in your search body and populate it with a query built using the DSL:

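{"query": { "match": { "_all": "meaning" } } }

In this case, the “query” element contains a “match” query clause that looks for the term “meaning” in all of the fields in all of the documents in your cluster. (The _all field cannot be enabled in indices created with Elasticsearch 6.0 or later and was removed entirely in 7.0; on those versions, a multi_match query listing the fields you care about does the same job.)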

The query element is used along with other elements in the search body:

{ "query": { "match": { "_all": "meaning" } }, "_source": ["name", "surname", "age"], "from": 100, "size": 20 }

Here, we’re using the “_source” element to restrict which fields are returned for each hit (older versions used a “fields” element, which was replaced by stored_fields in 5.0), and the “from” and “size” elements to tell Elasticsearch to skip the first 100 matching documents and return the next 20 (hits 100 to 119, counting from zero).
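In practice, from and size are usually derived from a page number. The Python helper below (page_window is a hypothetical name) does the arithmetic and refuses to go past the index.max_result_window setting, which limits from + size to 10,000 by default; deeper paging needs search_after or the scroll API instead.

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_window(page, per_page):
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": per_page}


print(page_window(6, 20))  # {'from': 100, 'size': 20}, i.e. hits 100 to 119
```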

The Query DSL

The Query DSL can be invoked using most of Elasticsearch’s search APIs. For simplicity, we’ll look only at the Search API that uses the _search endpoint. When calling the search API, you can specify the index and/or type on which you want to search (mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0, so the type-based examples below apply only to older clusters). You can even search on multiple indices and types by separating their names with commas or using wildcards to match multiple indices and types:

Search on all the Logstash indices:

curl localhost:9200/logstash-*/_search

Search in the current and legacy indices, in the documents type:

curl localhost:9200/current,legacy/documents/_search

Search in the clients indices, in the bigcorp and smallco types:

curl localhost:9200/clients/bigcorp,smallco/_search

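We’ll be using Request Body Searches, so searches should be invoked as follows (Elasticsearch 6.0 and later reject request bodies sent without a Content-Type header):

curl -H 'Content-Type: application/json' localhost:9200/_search -d '{"query":{"match":{"_all":"meaning"}}}'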
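The same request can be prepared with Python's standard library. The sketch below builds, but does not send, a POST to the _search endpoint with a JSON body and the Content-Type header that Elasticsearch 6.0 and later require. Passing the request to urllib.request.urlopen would send it to a cluster assumed to be listening on localhost:9200.

```python
import json
import urllib.request

body = {"query": {"match": {"_all": "meaning"}}}
req = urllib.request.Request(
    "http://localhost:9200/_search",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To actually send it: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/_search
print(req.get_header("Content-type"))  # application/json
```

Note that urllib normalizes header names with str.capitalize(), which is why the lookup uses "Content-type".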

Compound Queries

Although there are multiple query clause types, the type you’ll use most is the compound query, because compound queries combine multiple clauses to build up complex queries.

The bool query is probably used the most because it can combine the features of some of the other compound query clauses, such as the and, or, not, and filtered queries. It covers them so well that those four have since been deprecated and removed in favor of the bool query. Using it is best explained with an example:

curl -H 'Content-Type: application/json' localhost:9200/_search -d '{
  "query": {
    "bool": {
      "must": {
        "fuzzy": { "name": { "value": "john", "fuzziness": 2 } }
      },
      "must_not": {
        "match": { "_all": "city" }
      },
      "should": [
        { "range": { "age": { "gte": 30, "lte": 40 } } },
        { "wildcard": { "surname": "K*" } }
      ]
    }
  }
}'

Within the query element, we’ve added the bool clause that indicates that this will be a boolean query. There’s quite a lot going on in there, so let’s cover it clause by clause, starting at the top:

must

All queries within this clause must match a document in order for ES to return it. Think of this as your AND queries. The query that we used here is the fuzzy query, and it will match any documents that have a name field that matches “john” in a fuzzy way. The extra “fuzziness” parameter tells Elasticsearch to allow a maximum Damerau-Levenshtein distance of 2 when deciding what counts as a match.

must_not

Any documents that match the query within this clause are excluded from the result set. This is the NOT or minus (-) operator of the query DSL. In this case, we do a simple match query, looking for documents that contain the term “city.” Using _all as the field name indicates that the term can appear in any of the document’s fields.

should

Up until now, we have been dealing with absolutes: must and must_not. Should is not absolute and is roughly equivalent to the OR operator. The first query that we provided looks for documents where the age field is between 30 and 40. The second query does a wildcard search on the surname field, looking for values that start with “K.” When a bool query has no must or filter clause, a document has to match at least one should clause to be returned. This query does have a must clause, though, so in query context its should clauses are optional: a document that matches neither is still returned, but each should clause it does match raises its score. To require at least one of them, add "minimum_should_match": 1 to the bool query.

So Elasticsearch will return only documents that satisfy the must clause and do not match the must_not clause, ranking those that also match should clauses higher. These queries can be nested, so you can build up very complex queries by specifying a bool query as a must, must_not, should or filter query.
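The way these clauses combine can be modeled in a few lines of Python. In this toy sketch (not Elasticsearch's implementation), each clause is a plain predicate over a document dict, and minimum_should_match defaults to the query-context rule: zero when the bool query has a must or filter clause, otherwise one if there are any should clauses. The predicates and the sample document are hypothetical stand-ins for the example query.

```python
def bool_matches(doc, must=(), filters=(), must_not=(), should=(),
                 minimum_should_match=None):
    """Toy model of whether a bool query in query context matches doc."""
    if minimum_should_match is None:
        # Default: should clauses are optional once must/filter clauses exist.
        minimum_should_match = 0 if (must or filters) else (1 if should else 0)
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


def name_is_john(d):  # stand-in for the fuzzy query
    return d["name"] == "john"


def mentions_city(d):
    return "city" in d["text"]


def age_30_to_40(d):
    return 30 <= d["age"] <= 40


def surname_k(d):
    return d["surname"].startswith("K")


doc = {"name": "john", "text": "", "age": 25, "surname": "Smith"}
clauses = dict(must=[name_is_john], must_not=[mentions_city],
               should=[age_30_to_40, surname_k])
print(bool_matches(doc, **clauses))                          # True
print(bool_matches(doc, **clauses, minimum_should_match=1))  # False
```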

filter

One clause type we haven’t discussed for a compound query is the filter clause. Here is an example where we use one:

curl -H 'Content-Type: application/json' localhost:9200/_search -d '{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": { "term": { "email": "joe@bloggs.com" } }
    }
  }
}'

The match_all query in the must clause tells Elasticsearch that it should return all of the documents. This might not seem to be a very useful search, but it comes in handy when you use it in conjunction with a filter as we have done here. The filter we have specified is a term query, asking for all documents that contain an email field with the value “joe@bloggs.com.” Because filters do not contribute to the score, every matching document gets the same score of 1, which comes from the match_all query.

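One thing to note is that this query will not work as expected if the email field is analyzed, which is the default for string fields in Elasticsearch (since 5.0, dynamically mapped strings become text fields with a keyword sub-field). The reason behind this is a topic best discussed in another blog post, but it comes down to the fact that Elasticsearch analyzes text fields when documents are indexed (and analyzes the input of full-text queries such as match, but not of term queries). The standard analyzer splits joe@bloggs.com into the terms joe and bloggs.com, so the exact term joe@bloggs.com never reaches the index, and a term query looking for it finds nothing. To filter on the whole address, map the field as keyword or query its keyword sub-field (email.keyword under the default dynamic mapping).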

Filters Versus Queries

People who have used Elasticsearch before version 2 will be familiar with filters and queries. You used to build up a query body using both filters and queries. The difference between the two was that filters were generally faster because they check only if a document matches at all and not whether it matches well. In other words, filters give a boolean answer whereas queries return a calculated score of how well a document matches a query. Various performance enhancements were associated with filters due to their simplified nature.

Since version 2 of Elasticsearch, filters and queries have merged and any query clause can serve as either a filter or a query (depending on the context). As in version 1, frequently used filters can be cached, so use filter context whenever scoring does not matter.

Scoring

We have mentioned the fact that Elasticsearch returns a score along with all of the matching documents from a query:

> curl "localhost:9200/_search?q=application"
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 2.3,
    "hits": [
      {
        "_index": "logstash-2016.04.04",
        "_type": "logs",
        "_id": "1",
        "_score": 2.3,
        "_source": { "message": "Log message from my application" }
      }
    ]
  }
}

This score is calculated against the documents in Elasticsearch based on the provided queries. Factors such as the length of a field, how often the specified term appears in the field, how rare the term is across the index, and (in the case of fuzzy searches) how closely the term matches the specified value all influence the score; wildcard and regex queries, by contrast, give every match the same score by default. The calculated score is then used to order documents, usually from the highest score to lowest, and the highest scoring documents are then returned to the client. There are various ways to influence the scores of different queries, such as the boost parameter. This is especially useful if you want certain queries in a complex query to carry more weight than others and you are looking for the most significant documents.
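Since Elasticsearch 5.0 the default similarity is BM25. The Python function below is a simplified sketch of a per-term BM25 score: the textbook formula with Lucene's IDF and the default parameters k1 = 1.2 and b = 0.75. It ignores boosts and Lucene's lossy encoding of field lengths, so its numbers will not match real _score values, but it shows how term frequency, field length, and term rarity each push the score. The input figures are made up.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Simplified BM25 score of one term in one field of one document."""
    # Rarer terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average fields dampen the term-frequency contribution.
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + norm)


base = bm25_term_score(tf=1, field_len=5, avg_field_len=10, doc_count=1000, doc_freq=10)
more_tf = bm25_term_score(tf=3, field_len=5, avg_field_len=10, doc_count=1000, doc_freq=10)
longer = bm25_term_score(tf=1, field_len=50, avg_field_len=10, doc_count=1000, doc_freq=10)
common = bm25_term_score(tf=1, field_len=5, avg_field_len=10, doc_count=1000, doc_freq=500)
print(more_tf > base, longer < base, common < base)  # True True True
```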

When using a query in a filter context (as explained earlier), no score is calculated. This provides the enhanced performance usually associated with using filters but does not provide the ordering and significance features that come with scoring.

Conclusion

The hardest thing about Elasticsearch is the depth and breadth of the available features. We have tried to cover the essential elements in as much detail as possible without drowning you in information. Ask any questions you might have in the comments, and look out for more in-depth posts covering some of the features we have mentioned. You can also read my prior Elasticsearch tutorial to learn more.