Elasticsearch (ES) has supported scripting since its early versions, but the scripting language has evolved across releases: MVEL prior to version 1.4, Groovy from version 1.4, and now "Painless," introduced in ES 5.0. A key driver of this evolution is the need for a faster, safer, and simpler scripting language.

Common use cases when working with Elasticsearch include creating dynamic fields, performing calculations on fields on the fly, and modifying scoring based on custom logic. To support these operations, Elasticsearch provides scripting.

Prior to the release of "Painless" in ES 5.0, the majority of the security vulnerabilities reported in ES were related to scripting. Painless addresses these issues and is both more secure and faster than its predecessors. Starting with ES 5.0, "Painless" is the default scripting language, and its syntax is similar to Groovy's.

"Painless" is a dynamic scripting language built specifically for ES; unlike Groovy, it cannot be used as a general-purpose language. We covered "Painless" in our earlier blog post Painless Scripting in Elasticsearch. This post takes it a level further and explains several usages of scripting that come in very handy.

Before we start using scripts, let's understand the syntax of scripts in ES.

"script": {
  "lang": "...",
  "inline" | "stored": "...",
  "params": { ... }
}

As shown above, the script syntax consists of three parts:

lang: The language the script is written in. The default is painless, so this field is optional when Painless scripts are used. Valid values include "painless", "expression", "mustache", "java", etc.

inline|stored: Two types of scripts are supported: inline scripts, whose source is specified inline in the request, and stored scripts, which are referenced under the key stored. We explain stored scripts in more detail later in this post.

params: If the script uses any parameters, they are passed in this section.
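Putting the three parts together, a complete script object looks like the fragment below (an illustrative example; the counter field and increment parameter are placeholders, not part of the sample data used later):

```json
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.counter += params.increment",
    "params": { "increment": 1 }
  }
}
```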

Sample Docs

Let’s index some sample docs and see scripting in action:

curl -XPUT 'localhost:9200/stocks/mystocks/1' -H 'Content-Type: application/json' -d '
{"company":"Apple","symbol":"aapl","shares":100,"price":150,"purchase_date":"2017/07/25","notes":"I brought this stock at the best price","risk":"high"}'

curl -XPUT 'localhost:9200/stocks/mystocks/2' -H 'Content-Type: application/json' -d '
{"company":"Google","symbol":"googl","shares":50,"price":950,"purchase_date":"2017/06/25","notes":"Its is a risky bet","risk":"low"}'

Let's update document "1" and modify its risk factor from "high" to "low". The command for that is as follows:

curl -XPOST 'localhost:9200/stocks/mystocks/1/_update' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "inline": "ctx._source.risk = \"low\""
  }
}'

Validation

Let’s validate if the document really got updated:

curl -XGET 'localhost:9200/stocks/mystocks/1?pretty=true'

{
  "_index" : "stocks",
  "_type" : "mystocks",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "company" : "Apple",
    "symbol" : "aapl",
    "shares" : 100,
    "price" : 150,
    "purchase_date" : "2017/07/25",
    "notes" : "I brought this stock at the best price",
    "risk" : "low"
  }
}

As shown in the above example, since the default language in ES 5.x is "painless," there was no need to specify the "lang" parameter. Also, since we didn't pass any parameters to the script, "params" was omitted, too. So what is the advantage of "params"? Why use "params" rather than hardcoding values in the script?

Params Field

The first time Elasticsearch sees a new script, it compiles it and stores the compiled version in a cache. Compilation can be a heavy process. Whenever the script text changes (say, in the above example, when we change the value of the risk field to high, moderate, or something else), the script has to be recompiled. If we foresee that a script will use dynamic values, it's best to pass them using the "params" field: a script that uses "params" is compiled only once, no matter how the parameter values change.

Let's see how to modify the above example to pass a variable to the script:

curl -XPOST 'localhost:9200/stocks/mystocks/1/_update' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "inline": "ctx._source.risk = params.level",
    "params" : {
      "level" : "moderate"
    }
  }
}'

Validation

curl -XGET 'localhost:9200/stocks/mystocks/1?pretty=true'

{
  "_index" : "stocks",
  "_type" : "mystocks",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "company" : "Apple",
    "symbol" : "aapl",
    "shares" : 100,
    "price" : 150,
    "purchase_date" : "2017/07/25",
    "notes" : "I brought this stock at the best price",
    "risk" : "moderate"
  }
}

If you compile too many unique scripts within a small amount of time, Elasticsearch will reject the new dynamic scripts with a circuit_breaking_exception error. By default, up to 15 inline scripts per minute will be compiled. Hence it’s always good to use scripts that use parameters.
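If you legitimately need to compile more unique scripts, the ceiling can be raised: in ES 5.x the relevant dynamic cluster setting is script.max_compilations_per_minute. Below is an illustrative request body for PUT _cluster/settings (the value 30 is only an example, not a recommendation):

```json
{
  "transient": {
    "script.max_compilations_per_minute": 30
  }
}
```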

Stored Scripts

Apart from inline scripts, ES provides a way to create and store scripts that can later be referenced by their script name. These scripts are stored in the cluster state and can be retrieved using the endpoint _scripts/{script-name}.


Let's recreate the inline script from the previous example as a stored script.

curl -XPOST 'localhost:9200/_scripts/modify_risk' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "lang": "painless",
    "code": "ctx._source.risk = params.level"
  }
}'

We can retrieve the stored script as below:

curl -XGET 'localhost:9200/_scripts/modify_risk?pretty=true'

Response:

{
  "_id" : "modify_risk",
  "found" : true,
  "script" : {
    "lang" : "painless",
    "code" : "ctx._source.risk = params.level"
  }
}

Let’s use the stored script instead of inline script for the previous example:

curl -XPOST 'localhost:9200/stocks/mystocks/1/_update' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "stored": "modify_risk",
    "params" : {
      "level" : "normal"
    }
  }
}'

Validation

Let’s validate if the document actually got updated:

curl -XGET 'localhost:9200/stocks/mystocks/1?pretty=true'

{
  "_index" : "stocks",
  "_type" : "mystocks",
  "_id" : "1",
  "_version" : 5,
  "found" : true,
  "_source" : {
    "company" : "Apple",
    "symbol" : "aapl",
    "shares" : 100,
    "price" : 150,
    "purchase_date" : "2017/07/25",
    "notes" : "I brought this stock at the best price",
    "risk" : "normal"
  }
}

All scripts are cached by default so that they only need to be recompiled when updates occur. File scripts keep a static cache and will always reside in memory. Both inline and stored scripts are stored in a cache that can evict residing scripts. By default, scripts do not have a time-based expiration, but you can change this behavior by using the script.cache.expire setting. You can configure the size of this cache by using the script.cache.max_size setting. By default, the cache size is 100.
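These cache settings are static node-level settings, so they go in elasticsearch.yml rather than through the cluster settings API. An illustrative snippet (the values are examples, not recommendations):

```yaml
# config/elasticsearch.yml
script.cache.max_size: 200   # cache up to 200 compiled scripts (default 100)
script.cache.expire: 10m     # evict scripts unused for 10 minutes (no expiry by default)
```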

Within a script, you can access document fields in two ways: via the syntax doc['field_name'], which uses doc values ('not analyzed' fields have doc values enabled by default), or via the syntax ctx._source.field_name, which reads the "_source" fields. As a rule of thumb, use doc values for search and aggregation operations, and "_source" fields for updates.
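As another illustration of doc-values access in a search context, a script can also drive sorting. The request body below is an illustrative sketch (not one of the original examples); it sorts hits by the computed holding value:

```json
{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "type": "number",
      "script": { "inline": "doc['price'].value * doc['shares'].value" },
      "order": "desc"
    }
  }
}
```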

Often during search, you need to create dynamic fields on the fly. Using script fields ("script_fields"), you can create dynamic fields in the query response. Let's create a scripted field named totalcost, computed by multiplying price and shares, and see script fields in action. Also note that since the price and shares fields are not_analyzed and have doc values enabled by default, I am accessing those fields using doc values rather than _source, which would be much slower.

curl -XPOST 'localhost:9200/stocks/mystocks/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} },
  "script_fields": {
    "totalcost" : {
      "script" : {
        "inline": "doc['\''price'\''].value * doc['\''shares'\''].value"
      }
    }
  }
}'
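The '\'' sequences look odd, but they are standard shell escaping: each one closes the single-quoted string, emits a literal single quote, and reopens the quoting. A quick way to sanity-check what the shell actually sends:

```shell
# Same escaping as in the curl command above: '\'' yields a literal single
# quote inside a single-quoted string.
BODY='{ "script": { "inline": "doc['\''price'\''].value * doc['\''shares'\''].value" } }'
echo "$BODY"
# prints: { "script": { "inline": "doc['price'].value * doc['shares'].value" } }
```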

Response:

{
  "took": 135,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "stocks",
        "_type": "mystocks",
        "_id": "2",
        "_score": 1,
        "fields": { "totalcost": [ 47500 ] }
      },
      {
        "_index": "stocks",
        "_type": "mystocks",
        "_id": "1",
        "_score": 1,
        "fields": { "totalcost": [ 15000 ] }
      }
    ]
  }
}

Now we see the newly created dynamic field “totalcost” in the query response. We can create as many dynamic fields as we want using script fields. Did you notice something different in the returned response? Yes, “_source” is missing. When we add “script_fields” section, “_source” is not returned in the response. In order to get “_source”, add it explicitly in the query as shown below.

curl -XPOST 'localhost:9200/stocks/mystocks/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} },
  "_source" : [],
  "script_fields": {
    "totalcost" : {
      "script" : {
        "inline": "doc['\''price'\''].value * doc['\''shares'\''].value"
      }
    }
  }
}'

Response:

{
  "took": 18,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "stocks",
        "_type": "mystocks",
        "_id": "2",
        "_score": 1,
        "_source": {
          "company": "Google",
          "symbol": "googl",
          "shares": 50,
          "price": 950,
          "purchase_date": "2017/06/25",
          "notes": "Its is a risky bet",
          "risk": "low"
        },
        "fields": { "totalcost": [ 47500 ] }
      },
      {
        "_index": "stocks",
        "_type": "mystocks",
        "_id": "1",
        "_score": 1,
        "_source": {
          "company": "Apple",
          "symbol": "aapl",
          "shares": 100,
          "price": 150,
          "purchase_date": "2017/07/25",
          "notes": "I brought this stock at the best price",
          "risk": "normal"
        },
        "fields": { "totalcost": [ 15000 ] }
      }
    ]
  }
}

Conclusion

In this article, we took an in-depth look at "Painless": its syntax, its usage, and some good practices, such as why to use params, when to use doc values versus _source when accessing document fields, and how to create fields on the fly. In the next article we'll explore further uses of "Painless" scripting, covering topics such as scripting in a query context and a filter context, using conditionals in scripts, accessing nested objects, accessing items in a list, and using scripting in scoring. Watch for the next article.

Give it a Whirl!

It’s easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we’ll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.