In this blog, we walk through creating an index in detail. We start with static index creation for simple indices and move on to dynamic template creation for creating multiple indices.

Components of an Index

While creating an index, there are three important settings:

1. Shard settings

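The data we index in Elasticsearch is stored in shards. An index is the logical name given to a group of shards. Every document is stored in exactly one primary shard, and replica shards are copies of the primary shards.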

There are two types of shards:

a. Primary shards

b. Replica shards

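In the shard settings, we define how many primary shards the index has and how many replica copies each primary shard gets. The number of primary shards cannot be changed on an existing index, while the number of replicas can be updated at any time.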
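As a quick worked example of what these two numbers mean, the sketch below counts the shard copies an index ends up with. The function name is our own illustration, not part of any Elasticsearch API.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Count every shard copy of one index: the primaries plus their replicas."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one primary shard and zero or more replicas")
    # Each primary shard gets `number_of_replicas` copies of its own.
    return number_of_shards * (1 + number_of_replicas)


print(total_shard_copies(1, 1))  # 2: one primary plus one replica
print(total_shard_copies(5, 1))  # 10: five primaries, each with one replica
```

Elasticsearch never places a replica on the same node as its primary, so on a single-node cluster the replica copies stay unassigned and the index reports yellow health.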

2. Analyzer Settings

The index analysis module acts as a configurable registry of analyzers. Analyzers are used both to break analyzed fields into terms when a document is indexed and to process query strings at search time.

There are two types of analyzers:

a. Built-in analyzers

b. Custom analyzers defined by the user
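An analyzer is essentially a tokenizer followed by zero or more token filters. The toy model below (plain Python, not Elasticsearch code) shows that composition: the whitespace analyzer stands in for a built-in one, and the keyword tokenizer plus lowercase filter is the custom analyzer used later in this post. Real Elasticsearch analyzers can also apply character filters before the tokenizer, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def keyword_tokenizer(text: str) -> List[str]:
    # Emit the whole input as one token, like the "keyword" tokenizer.
    return [text]


def whitespace_tokenizer(text: str) -> List[str]:
    # Split on runs of whitespace, like the "whitespace" tokenizer.
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Lowercase every token, like the "lowercase" token filter.
    return [token.lower() for token in tokens]


@dataclass
class Analyzer:
    tokenizer: Tokenizer
    filters: List[TokenFilter] = field(default_factory=list)

    def analyze(self, text: str) -> List[str]:
        tokens = self.tokenizer(text)
        for token_filter in self.filters:  # filters run in the order listed
            tokens = token_filter(tokens)
        return tokens


whitespace = Analyzer(whitespace_tokenizer)  # stand-in for a built-in analyzer
custom = Analyzer(keyword_tokenizer, [lowercase_filter])  # tokenizer "keyword", filter "lowercase"

print(whitespace.analyze("George Harrison"))  # ['George', 'Harrison']
print(custom.analyze("George Harrison"))      # ['george harrison']
```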

3. Type and Mapping Settings

An index can contain several types, and each type can define different fields with varying data types. Take a look at the core types (for example string, long, double, boolean and date), which you will likely recognize.

We can either rely on Elasticsearch’s default (dynamic) mapping to work out the mappings for types and fields, or specify them manually.
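To get a feel for what the default mapping does, here is a rough approximation in plain Python of how JSON values are mapped to core types in the older Elasticsearch versions this post targets. It is a simplified illustration, not Elasticsearch's actual rules: real dynamic mapping also handles arrays and date detection, and newer versions map strings to text with a keyword sub-field instead of string.

```python
from typing import Any, Dict


def guess_core_type(value: Any) -> str:
    # Check bool before int: in Python, True and False are instances of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        # Real dynamic mapping may map date-like strings to "date" instead.
        return "string"
    raise TypeError(f"this sketch has no rule for {type(value).__name__}")


def guess_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build a {"properties": ...} mapping from one JSON-like document."""
    properties: Dict[str, Any] = {}
    for name, value in doc.items():
        if isinstance(value, dict):
            # Inner JSON objects become "object" fields with their own properties.
            properties[name] = {"type": "object", **guess_mapping(value)}
        else:
            properties[name] = {"type": guess_core_type(value)}
    return {"properties": properties}


employee = {"name": "George Harrison", "age": 32, "experienceInYears": 20}
print(guess_mapping(employee))
```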

Static Index Creation

In the previous section we looked at the basic settings of an index, so let us apply all of them and create an index. Say we are going to index the following employee data into an index named “company”, under the type “employeeinfo”:

{ "name" : "George Harrison", "age" : 32, "experienceInYears" : 20 }

Putting it all together, we create the index as follows:

curl -X PUT "http://localhost:9200/company" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": { "number_of_shards": 1, "number_of_replicas": 1 },
    "analysis": { "analyzer": { "analyzer-name": { "type": "custom", "tokenizer": "keyword", "filter": ["lowercase"] } } }
  },
  "mappings": {
    "employeeinfo": { "properties": {
      "age": { "type": "long" },
      "experienceInYears": { "type": "long" },
      "name": { "type": "string", "analyzer": "analyzer-name" }
    } }
  }
}'

The request body has two top-level sections: settings, which holds the index and analysis settings, and mappings, which sits beside settings rather than inside it. Together they cover three divisions:

“Index” settings

“Analysis” settings

“Mappings” settings

Let us go through each one of them:

Index settings

As mentioned earlier, this section defines the number of primary and replica shards for our index. Here we have set both values to 1, that is, one primary shard with one replica copy.

Analysis settings

In this section we have defined a custom analyzer named “analyzer-name”. It uses the keyword tokenizer, which emits the entire field value as a single token, followed by the lowercase token filter, which lowercases that token.

We built this analyzer with the name field of our document in mind: the full name is indexed as one unit, without being split into words, and is matched regardless of case. A search for “george harrison” therefore matches “George Harrison”, while a search for just “george” does not. Later, in the mappings section, you can see that we applied this custom analyzer to the name field.
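To see why the name is matched as a whole, here is a toy inverted index in plain Python (not how Elasticsearch stores data internally). It analyzes names with the keyword-plus-lowercase combination at index time and runs the query string through the same analysis at search time, which is what a match query does by default. The helper names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set


def keyword_lowercase(text: str) -> List[str]:
    # Keyword tokenizer keeps the whole value as one token; lowercase filter folds case.
    return [text.lower()]


def build_index(names: Dict[int, str]) -> Dict[str, Set[int]]:
    # Index time: analyze each name and record which documents contain each term.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, name in names.items():
        for term in keyword_lowercase(name):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # Query time: analyze the query with the same analyzer, then look up its terms.
    hits: Set[int] = set()
    for term in keyword_lowercase(query):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "George Harrison", 2: "John Lennon"})
print(search(index, "GEORGE HARRISON"))  # {1}
print(search(index, "george"))           # set()
```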

Mappings settings

Here we map the fields of our document: age and experienceInYears as long, and name as a string that uses our custom analyzer. We also specify which type in the index we are mapping, in this case employeeinfo. Since we predefine the mapping of each field at index creation time, this form of mapping is called “static field mapping”. For our current purpose static mapping works fine, but as the data becomes more dynamic in nature we need other methods of mapping, which will be discussed in our next blog.
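The same static index creation can also be scripted. The sketch below uses only Python's standard library to build the request body and send it; ES_URL, company_index_body and create_index are our own hypothetical names. The body keeps this post's legacy syntax (mapping types and string fields), so it needs an Elasticsearch version that still accepts them.

```python
import json
import urllib.request
from typing import Any, Dict

ES_URL = "http://localhost:9200"  # assumption: a local cluster without authentication


def company_index_body() -> Dict[str, Any]:
    """Body for PUT /company: "settings" and "mappings" are top-level siblings."""
    return {
        "settings": {
            "index": {"number_of_shards": 1, "number_of_replicas": 1},
            "analysis": {
                "analyzer": {
                    "analyzer-name": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            },
        },
        "mappings": {
            "employeeinfo": {  # mapping type, as used in this post
                "properties": {
                    "age": {"type": "long"},
                    "experienceInYears": {"type": "long"},
                    "name": {"type": "string", "analyzer": "analyzer-name"},
                }
            }
        },
    }


def create_index(name: str, body: Dict[str, Any], base_url: str = ES_URL) -> Dict[str, Any]:
    """PUT the body to /<name> and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a suitable running cluster, `create_index("company", company_index_body())` sends the same request as the curl command earlier in this section.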

Dynamic Template Creation

Index templating is one of the most useful features of Elasticsearch. It comes in handy when we need to create indices with similar names and common index settings.

Consider a case in which we need to create weekly indices, namely company-01, company-02, etc., with the same settings for each of them. Creating these indices one by one and defining the mapping for each of them would be very time consuming. In such cases, we can make use of the index template feature provided by Elasticsearch.
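Generating the weekly names is easy to script. The sketch below (hypothetical helper names) builds zero-padded names and checks them against the company-* pattern with Python's fnmatchcase, which we use only as an approximation of Elasticsearch's wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import List


def weekly_index_names(prefix: str, weeks: int) -> List[str]:
    # Zero-padded week numbers: company-01, company-02, ...
    return [f"{prefix}-{week:02d}" for week in range(1, weeks + 1)]


def matches_pattern(index_name: str, pattern: str) -> bool:
    # Approximation: fnmatchcase treats '*' as "any characters", which is all the
    # company-* pattern needs; it also gives '?' and '[...]' special meanings.
    return fnmatchcase(index_name, pattern)


names = weekly_index_names("company", 3)
print(names)  # ['company-01', 'company-02', 'company-03']
print([matches_pattern(n, "company-*") for n in names + ["employees-01"]])  # [True, True, True, False]
```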

The index names here follow the pattern company-*. We can define default settings for these indices as shown below:

curl -XPUT 'localhost:9200/_template/testindextemplate' -H 'Content-Type: application/json' -d '{
  "template": "company-*",
  "order": 0,
  "settings": {
    "index": { "number_of_shards": 1, "number_of_replicas": 1 },
    "analysis": { "analyzer": { "analyzer-name": { "type": "custom", "tokenizer": "keyword", "filter": ["lowercase"] } } }
  },
  "mappings": {
    "employeeinfo": { "properties": {
      "age": { "type": "long" },
      "experienceInYears": { "type": "long" },
      "name": { "type": "string", "analyzer": "analyzer-name" }
    } }
  }
}'

Here we have created a template named testindextemplate and, in the template field, specified the index name pattern its settings apply to. In this example, the settings apply to any newly created index whose name starts with company-. We have also set the order field to 0. When more than one template matches a new index name, Elasticsearch merges them in ascending order of this value, so a template with a higher order overrides the settings and mappings of one with a lower order.
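The merge rule for order can be sketched in plain Python. This is a simplified model, not Elasticsearch's implementation: matching templates are applied lowest order first, and later values win, key by key. The second template, with its company-2* pattern and three shards, is a hypothetical example.

```python
import copy
from typing import Any, Dict, List


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(templates: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply matching templates lowest order first, so higher orders override."""
    result: Dict[str, Any] = {}
    # Ties keep input order here (stable sort); do not assume Elasticsearch does the same.
    for template in sorted(templates, key=lambda t: t.get("order", 0)):
        body = {k: v for k, v in template.items() if k not in ("template", "order")}
        result = deep_merge(result, body)
    return result


base = {"template": "company-*", "order": 0,
        "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}}}
special = {"template": "company-2*", "order": 1,
           "settings": {"index": {"number_of_shards": 3}}}
print(effective_config([special, base]))
# {'settings': {'index': {'number_of_shards': 3, 'number_of_replicas': 1}}}
```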

We can verify the template mappings by using the GET API as below:

curl -XGET localhost:9200/_template/testindextemplate

This returns the stored template, including its name pattern, order, settings and mappings.
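For completeness, here is a standard-library Python sketch that creates the template and reads it back, mirroring the two curl commands above. ES_URL and the helper names are assumptions of ours, and the body uses the legacy "template" key from this post, which Elasticsearch 6.0 and later replaced with "index_patterns".

```python
import json
import urllib.request
from typing import Any, Dict, Optional

ES_URL = "http://localhost:9200"  # assumption: a local cluster without authentication


def company_template_body() -> Dict[str, Any]:
    """Body for PUT /_template/testindextemplate, in the legacy template format."""
    return {
        "template": "company-*",  # legacy key; Elasticsearch 6.0+ expects "index_patterns"
        "order": 0,
        "settings": {
            "index": {"number_of_shards": 1, "number_of_replicas": 1},
            "analysis": {"analyzer": {"analyzer-name": {
                "type": "custom", "tokenizer": "keyword", "filter": ["lowercase"]}}},
        },
        "mappings": {"employeeinfo": {"properties": {
            "age": {"type": "long"},
            "experienceInYears": {"type": "long"},
            "name": {"type": "string", "analyzer": "analyzer-name"},
        }}},
    }


def es_request(method: str, path: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send one JSON request to the cluster and return the parsed JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def put_and_fetch_template(name: str) -> Dict[str, Any]:
    """Create (or overwrite) the template, then read it back."""
    es_request("PUT", f"/_template/{name}", company_template_body())
    # The GET response is keyed by template name.
    return es_request("GET", f"/_template/{name}")[name]
```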

Templating proves extremely useful for indexing time-based data. When we index time-based data, it helps to store it in indices named in chronological order with a definite pattern, such as company-YYYY-MM-DD. Creating a template for that pattern then applies the default settings and mapping information to every index that falls under it.
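Daily names of this form are straightforward to generate. In the sketch below (hypothetical helper names and sample dates), zero-padded year-month-day names sort in chronological order and all match the company-* pattern, so a template for that pattern covers each of them.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from typing import List


def daily_index_name(prefix: str, day: date) -> str:
    # Year-month-day with zero padding, so names sort in chronological order.
    return f"{prefix}-{day:%Y-%m-%d}"


def daily_index_names(prefix: str, start: date, days: int) -> List[str]:
    return [daily_index_name(prefix, start + timedelta(days=offset)) for offset in range(days)]


names = daily_index_names("company", date(2017, 2, 27), 3)
print(names)  # ['company-2017-02-27', 'company-2017-02-28', 'company-2017-03-01']
print(names == sorted(names))  # True
print(all(fnmatchcase(name, "company-*") for name in names))  # True
```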

Conclusion

We hope you enjoyed this tutorial on creating an index in detail, using both static and dynamic approaches with common settings. We’d love your feedback and questions as you apply it to your own work.

In the next blog, we will go deeper into the various types of mappings that can be applied to an index, such as dynamic type mapping and dynamic templates for mappings.