Using Elasticsearch for fun and profit

Now that you know everything you need about our infrastructure, let's talk about playing with our Elasticsearch cluster the smart way, for fun and, indeed, profit.

Elasticsearch and our index naming scheme allow us to be lazy, which leaves more time to watch cute kitten videos on Youtube. To create an index with the right mapping and settings, we use Elasticsearch templates and automatic index creation patterns.

Every node in the cluster has the following configuration:

action:
  auto_create_index: +<mapping id 1>_*,+<mapping id 2>_*,-*

And we create a template in Elasticsearch for every mapping we need.

PUT /_template/template_<mapping id>
{
  "template": "<mapping id>_*",
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "add some json": "here"
  }
}

Every time the indexer tries to write into an index that doesn't exist yet, Elasticsearch creates it with the right settings and mapping. That's the magic.
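To make that concrete, here's what it looks like from the indexer's point of view (the index suffix, document type and body are made up for the example):

# Write into a not-yet-existing index matching <mapping id>_*;
# Elasticsearch creates it on the fly using the matching template.
curl -XPOST http://esnode01:9200/<mapping id>_12345/document/1 -d '{
  "some": "content"
}'

# The index now exists, with the template's settings and mappings.
curl -XGET http://esnode01:9200/<mapping id>_12345/_settings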

Except this time, we don't want to create empty indexes with a single shard as we're going to copy existing data.

After playing with Elasticsearch for years, we've noticed that the sweet spot is about 10GB per shard. Shards that size reallocate and recover faster, at the cost of more Lucene segments during heavy writing and more frequent optimizations.

On Blink, 1,000,000 documents weigh about 2GB, so we create indexes with one shard for every 5 million documents, plus one extra shard when the dashboard already holds more than 5 million documents.
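Here's the arithmetic on a made-up document count:

# A dashboard with 12 million documents weighs about 24GB
counter=12000000
shards=$(( $counter / 5000000 + 1 ))  # integer division: 2 + 1 = 3
echo $shards  # 3 shards of roughly 8GB each, under the 10GB sweet spot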

Before reindexing a client, we run a small script to create the new indexes with the right number of shards. Here's a simplified version, without error handling, for your eyes only.

curl -XPUT http://esnode01:9200/<index> -d '{"settings.index.number_of_shards" : '$(( $(curl -XGET http://esnode01:9200/<index>/_count | cut -f 2 -d : | cut -f 1 -d ",") / 5000000 + 1 ))'}'

Now we're able to reindex, except we still haven't solved the CPU issue. That's where the fun starts.

What we're going to do is leverage Elasticsearch zone awareness to dedicate a few data nodes to the writing process. You can also add new nodes if you can't afford to remove some from your existing cluster; it works exactly the same way.

First, let's kick out all the indexes from those nodes.

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "<data node 1>,<data node 2>,<data node x>"
  }
}

Elasticsearch then moves all the data from these nodes to the remaining ones. You could also shut those nodes down and wait for the indexes to recover, but you might lose data.
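While the shards move, one way to keep an eye on the progress is the _cat API:

# Shards currently moving off the excluded nodes
curl -XGET http://esmaster01:9200/_cat/shards | grep RELOCATING

# Overall cluster state; relocating_shards should drop back to 0
curl -XGET http://esmaster01:9200/_cluster/health?pretty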

Then, for each node, we edit the Elasticsearch configuration to assign it to a new zone called envrack (f$#!ed up in French). We put all these machines in the secondary data center so we can use the spare http query nodes for the indexing process.

node:
  zone: 'envrack'

Then restart Elasticsearch so it runs with the new configuration.
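A quick sanity check after the restart, to make sure the nodes advertise the new zone (the exact JSON layout depends on your Elasticsearch version):

curl -XGET http://esmaster01:9200/_nodes | grep -o '"zone":"[^"]*"'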

We don't want Elasticsearch to allocate the existing indexes to the new zone when we bring these nodes back online, so we update those index settings accordingly.

curl -XPUT http://esmaster01:9200/<old mapping id>_*/_settings -d '{
  "routing.allocation.exclude.zone" : "envrack"
}'
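A quick way to double check that the exclusion landed on every old index:

curl -XGET http://esmaster01:9200/<old mapping id>_*/_settings?pretty | grep exclude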

Similarly, we don't want the new indexes to be allocated to the production zones, so we update the creation script.

#!/bin/bash

shards=1
counter=$(curl -XGET http://esnode01:9200/<index>/_count | cut -f 2 -d : | cut -f 1 -d ",")

if [ $counter -gt 5000000 ]; then
  shards=$(( $counter / 5000000 + 1 ))
fi

curl -XPUT http://esnode01:9200/<index> -d '{
  "settings" : {
    "index.number_of_shards" : '$shards',
    "index.number_of_replicas" : 0,
    "routing.allocation.exclude.zone" : "barack,chirack"
  }
}'

More readable than a one-liner, isn't it?

We don't add a replica for two reasons:

- The cluster is zone aware and we only have one zone for the reindexing.
- Indexing with a replica means indexing twice, so using twice as much CPU. Adding a replica after indexing is just transferring the data from one host to another.

Of course, this means losing a data node means losing data. If you can't afford to reindex an index multiple times in case of a crash, don't do this: add another zone, or allow your new indexes to use the data from the existing zone in the backup data center.
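And once an index is done reindexing, adding the replica back is a single settings update; the copy then happens at recovery speed instead of costing a second indexing pass (the index name is a placeholder):

curl -XPUT http://esnode01:9200/<index>/_settings -d '{
  "index.number_of_replicas" : 1
}'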

There's one more thing we want to do before we start indexing.

Since we've put the new zone in the secondary data center, we update the http query nodes' configuration to make them zone aware, so they query local shards first. We do the same with the active nodes so they read their own zone first. That way, we can send reads to the passive http query nodes during the reindexing process with little impact on what the clients access.

In the main data center:

node:
  zone: 'barack'

And in the secondary:

node:
  zone: 'chirack'
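One detail worth mentioning: for the zone attribute to influence shard allocation and reads at all, the cluster has to be told which attribute to be aware of. If it's not already in your elasticsearch.yml, it's a one-liner:

cluster.routing.allocation.awareness.attributes: zone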

Here's what our infrastructure looks like now.