One of the drawbacks of Elasticsearch is that the mapping of an existing field cannot be modified. Once a field's mapping has been created, the only way to change it is to reindex the data. A naive reindex deletes the existing index and creates a new index with the new mapping, which causes some downtime in the process. That downtime can be critical for a business. However, Elasticsearch has a solution to the problem: index aliases. An alias is like a symbolic link that can point to one or more indices. It gives us the flexibility to create a new index in the background and make the change in a way that is almost unnoticeable to the application.

There are a few options for using index aliases when reindexing:

Option 1: A single alias which points to one index. When reindexing, a new index is created and all data is pushed into the new index. After finishing the reindexing, point the alias to the new index and delete the old index.

Option 2: Two aliases, one for reads and one for writes. When you need to reindex your data, a new index is created, the write alias is pointed to the new index, and the read alias keeps pointing to the old index. After all data has been written into the new index, the read alias is also pointed to the new index and, finally, the old index is deleted.

Option 3: Similar to option 2, this approach also uses two aliases. The main difference is that when the new index is created, the read alias points to both the old and the new indices. The drawback of this approach is that your application has to deal with duplicate documents in the search results.
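To make the duplicate problem in option 3 concrete, here is a minimal Python sketch of how an application might collapse duplicates. It assumes that index names end in a Unix timestamp (as in the examples later in this post) and that hits carry the usual `_index`, `_id` and `_source` fields. The helper names and sample hits are made up for illustration.

```python
def index_timestamp(index_name):
    """Return the numeric suffix of a name like 'world_1544094181'."""
    return int(index_name.rsplit("_", 1)[1])


def dedupe_hits(hits):
    """Keep one hit per document ID, preferring the hit from the newest index."""
    newest = {}
    for hit in hits:
        current = newest.get(hit["_id"])
        if current is None or index_timestamp(hit["_index"]) > index_timestamp(current["_index"]):
            newest[hit["_id"]] = hit
    return list(newest.values())


# Illustrative hits, as a read alias spanning both indices might return them.
hits = [
    {"_index": "world_1544094130", "_id": "1", "_source": {"name": "Kabul"}},
    {"_index": "world_1544094181", "_id": "1", "_source": {"name": "Kabul"}},
    {"_index": "world_1544094130", "_id": "2", "_source": {"name": "Qandahar"}},
]
unique = dedupe_hits(hits)
```

Note that this only cleans up a single page of results; total counts and pagination would still be affected by the duplicates.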

Zero-downtime reindexing using aliases

For this blog post, I have chosen option 2. It covers most aspects of zero-downtime reindexing. One downside of this strategy is that documents created, updated, or deleted during the reindexing process are written only to the new index. As a result, searches can return stale data during this time, and new data does not show up until reindexing is complete. If your business cannot afford this, option 3 might be the right solution.
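The stale-read window can be illustrated with a small in-memory model. This is only a sketch: plain Python dictionaries stand in for the indices and aliases, no Elasticsearch client is involved, and the documents are made up.

```python
# In-memory stand-ins for two indices and the two aliases (not a real client).
indices = {
    "world_1544094130": {"1": {"name": "Kabul"}},  # old index
    "world_1544094181": {},                        # new index, still empty
}
aliases = {"world_read": "world_1544094130", "world_write": "world_1544094130"}


def write(doc_id, doc):
    """Store a document in whichever index the write alias targets."""
    indices[aliases["world_write"]][doc_id] = doc


def read(doc_id):
    """Look a document up in whichever index the read alias targets."""
    return indices[aliases["world_read"]].get(doc_id)


# Reindexing starts: only the write alias moves to the new index.
aliases["world_write"] = "world_1544094181"
write("2", {"name": "Qandahar"})
during_reindex = read("2")  # the read alias still targets the old index

# Reindexing finishes (existing documents are rewritten), then the read alias moves.
write("1", {"name": "Kabul"})
aliases["world_read"] = "world_1544094181"
after_switch = read("2")
```

Here `during_reindex` is `None` because the new document is invisible to readers until the read alias is switched, which is exactly the trade-off described above.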

An example Laravel application has been set up to walk through the whole process of zero-downtime reindexing. The example app is for demonstration purposes only; production code needs to handle many edge cases, such as errors. For data, we are going to use the MySQL world database, which has three tables: country, city and countrylanguage.

There are six steps to reindexing:

Step 1: Create an index with a timestamp appended to its name, such as world_1544094130. After that, create two aliases, world_write and world_read, both pointing at the index world_1544094130.
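As a rough sketch of Step 1, the Python snippet below builds the timestamped index name and the body of a POST /_aliases request that attaches both aliases. The helper names are hypothetical and nothing is sent to a cluster; the index itself would first be created with its mapping, for example via PUT /world_1544094130.

```python
import time


def timestamped_index(prefix, now=None):
    """Build a name such as 'world_1544094130' from a Unix timestamp."""
    ts = int(time.time()) if now is None else int(now)
    return f"{prefix}_{ts}"


def initial_alias_actions(index, aliases=("world_write", "world_read")):
    """Body for POST /_aliases that points every alias at `index`."""
    return {"actions": [{"add": {"index": index, "alias": a}} for a in aliases]}


# A fixed timestamp is passed here only to reproduce the name used in this post.
index = timestamped_index("world", now=1544094130)
body = initial_alias_actions(index)
```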

Step 2: Start the reindexing process. This creates a new index (e.g. world_1544094181) named with the current timestamp.

Step 3: Update the write alias (world_write) to point to the new index (world_1544094181). This operation is atomic, so there is no need to worry about a short period of time in which the alias does not point to any index.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "world_write" } },
    { "add": { "index": "new_index", "alias": "world_write" } }
  ]
}
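The same alias update can be built programmatically. The following is a minimal sketch using only the Python standard library; the helper names and the base URL http://localhost:9200 are assumptions, and the request is prepared but not sent.

```python
import json
import urllib.request


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def build_request(body, base_url="http://localhost:9200"):
    """Prepare (but do not send) the HTTP request for the alias update."""
    return urllib.request.Request(
        f"{base_url}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = swap_alias_actions("world_write", "world_1544094130", "world_1544094181")
request = build_request(body)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Because both actions travel in one request, the same helper can be reused for the read alias later by passing "world_read" instead.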

Step 4: Query the database and index all documents through the write alias (world_write). The write alias now sends the data to the newly created index.
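One way to index many rows through the write alias is the bulk API. The sketch below only builds the newline-delimited body for POST /_bulk; it assumes a typeless Elasticsearch version (7.x or later; older versions also expect a `_type`), and the helper name and sample rows are illustrative rather than taken from the example app.

```python
import json


def bulk_body(rows, alias="world_write"):
    """Build an NDJSON body for POST /_bulk that indexes rows via `alias`."""
    lines = []
    for row in rows:
        # Action line: index the document under the row's primary key.
        lines.append(json.dumps({"index": {"_index": alias, "_id": str(row["ID"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Illustrative rows shaped like the city table of the world database.
rows = [
    {"ID": 1, "Name": "Kabul", "CountryCode": "AFG", "Population": 1780000},
    {"ID": 2, "Name": "Qandahar", "CountryCode": "AFG", "Population": 237500},
]
payload = bulk_body(rows)
```

Since the action lines name the alias rather than a concrete index, the same code keeps working for every future reindex.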

There is another way to reindex data: Elasticsearch provides a reindex API that copies documents from one index to another. For example:

POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}

Step 5: After reindexing all data, change the read alias (world_read) to point to the newly created index (world_1544094181).

POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "world_read" } },
    { "add": { "index": "new_index", "alias": "world_read" } }
  ]
}

Step 6: Remove the old index.
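Before deleting anything, it can be worth checking that no alias still points at the old index. The sketch below assumes a response shaped like the output of GET /_alias (index names mapped to their aliases); the helper name and the sample response are hypothetical. Each index it returns could then be removed with a request such as DELETE /world_1544094130.

```python
def indices_safe_to_delete(alias_map, prefix="world_"):
    """Return indices with the given prefix that no alias points to."""
    return sorted(
        name
        for name, info in alias_map.items()
        if name.startswith(prefix) and not info.get("aliases")
    )


# Illustrative shape of a GET /_alias response after Step 5.
alias_map = {
    "world_1544094130": {"aliases": {}},
    "world_1544094181": {"aliases": {"world_read": {}, "world_write": {}}},
}
stale = indices_safe_to_delete(alias_map)
```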

As you can see, apart from that one drawback, this approach reindexes data without any downtime. In my opinion, the drawback is tolerable in most cases, since it is barely noticeable to users and the approach is very efficient.