Elasticsearch, A distributed, RESTful search and analytics engine

Today we will Setup a 2 Node Elasticsearch Cluster on CentOS 7 and go through some API examples on creating indexes, ingesting documents, searches etc.

But before we get to that, let's cover some basics.

Some Basics:

Elasticsearch Cluster is made up of a number of nodes

Each Node contains Indexes, where as an Index is a Collection of Documents

Indexes is splitted into Multiple Shards

Shards exists of Primary Shards and Replica Shards

A Replica Shard is a Copy of a Primary Shard which is used for HA/Redundancy

Shards gets placed on random nodes throughout the cluster

A Replica Shard will NEVER be on the same node as the Primary Shard's associated shard-id

What are we building Today:

This cluster will consist of 2 datanodes, so with this scenario a Master Node will be elected. In a future post I will be going through how to setup a Elasticsearch Cluster with Dedicated Masters.

Our Environment:

Node Name: es01 (172.18.32.146)

Node Name: es02 (172.18.24.120)

I am using LXD for my Hypervisor as I am running Elasticsearch on LXC containers so I had to set my vm.max_map_count to 262144 when getting errors like the following:

[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

That can be done by setting the following, In my case on the Hypervisor:

$ sudo sysctl -w vm.max_map_count=262144 $ sudo sysctl -p

Pre-Requisites:

We will need to update our index repositories and update our current packages, then install Java 1.8

$ sudo yum update -y $ sudo yum install net-tools wget -y $ sudo yum install java-1.8.0-openjdk -y

Once our dependencies is installed, check for the latest Elasticsearch version, at the time of writing, I will be using Elasticsearch 5.4.1.

Verify that Java is Installed:

$ java -version openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-b12) OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)

Setup the Repository for Elasticsearch:

cat > /etc/yum.repos.d/es.repo << EOF [elasticsearch-5.x] name=Elasticsearch repository for 5.x packages baseurl=https://artifacts.elastic.co/packages/5.x/yum gpgcheck=1 gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch enabled=1 autorefresh=1 type=rpm-md EOF

Once this is done, we need to update our repository index:

$ sudo yum update -y

Install Elasticsearch:

$ sudo yum install elasticsearch -y

Once Elasticsearch is installed, repeat these steps on the second node. Once that is done, move on to the configuration section.

Configure Elasticsearch:

We will now configure elasticsearch and make the configuration aware that we will have another node that will be joining this cluster.

For a basic cluster, open the configuration: /etc/elasticsearch/elasticsearch.yml and apply this basic config to both nodes:

cluster.name: myescluster node.name: ${HOSTNAME} network.host: 0.0.0.0 discovery.zen.ping.unicast.hosts: ["172.18.32.146", "172.18.24.120"]

Start Elasticsearch:

First enable Elasticsearch to start on boot:

$ sudo systemctl enable elasticsearch

Then restart Elasticsearch on both nodes:

$ sudo systemctl restart elasticsearch

Verify that the processes has started and that we can see the ports listening:

$ sudo netstat -tulpn Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp6 0 0 :::9200 :::* LISTEN 278/java tcp6 0 0 :::9300 :::* LISTEN 278/java

The following section will go on a couple of basic API examples to get some information from the cluster and on how to send data to the cluster.

Send a GET Request to one of your Nodes:

Use a http client like curl to test if Elasticsearch is working by sending a get request to one of your nodes on port 9200.

$ curl -XGET http://172.18.32.146:9200

{ "name" : "es01", "cluster_name" : "myescluster", "cluster_uuid" : "SBIUIQwlTRCaXxmYeG2BpA", "version" : { "number" : "5.4.1", "build_hash" : "2cfe0df", "build_date" : "2017-05-29T16:05:51.443Z", "build_snapshot" : false, "lucene_version" : "6.5.1" }, "tagline" : "You Know, for Search" }

View the Number of Nodes in your Cluster:

From the output below we can see that node es02 was elected as master.

$ curl -XGET http://172.18.32.146:9200/_cat/nodes?v ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name 172.18.32.146 10 59 2 1.55 1.61 1.34 mdi - es01 172.18.24.120 10 26 2 1.55 1.61 1.34 mdi * es02

At this point in time, we do not have any indexes in our cluster, an index in Elasticsearch is like a database, if you would think from a Relational Database world:

Database => Table => Columns or Rows

then in Elasticsearch we have:

Index => Type => Document

So now, let's create our first index.

Note: that when you create an index, the Default Primary Shard Count will be 5, and the Default Replica Shard Count will be 1.

You can change the replica shard count after a index has been created, but not the primary shard count as that you will need to set on index creation.

Example API: Create Elasticsearch Index:

$ curl -XPUT http://172.18.32.146:9200/myfirstindex

{"acknowledged":true,"shards_acknowledged":true}

Example API: View Elasticsearch Indices:

From the below output you will find that we have 5 primary shards and 1 replica shard, wih 0 documents in our index and that our Cluster is in a Green state, which means our cluster is healthy.

Note that a replica shard will NEVER reside on the same node as the primary shard for HA and Redundancy.

$ curl -XGET http://172.18.32.146:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open myfirstindex 3lz10_G1QSmz-YhaxAqnQg 5 1 0 0 1.2kb 650b

If you have 2 nodes and you set your replica shards to the number of 2, you will see your green status will change from green to yellow, as the replica shard will be in an unassigned stated. Let's see how that will look like:

Example API: Set Replica shards to 2:

$ curl -XPUT http://172.18.32.146:9200/myfirstindex/_settings -d '{"number_of_replicas": 2}'

Now we have 2 replica shards, but because we only have 2 nodes, we will have a yellow state cluster:

$ curl -XGET http://172.18.32.146:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open myfirstindex 3lz10_G1QSmz-YhaxAqnQg 5 2 0 0 1.2kb 650b

To get some more insights into this, lets have a look at the shards assignments:

Example API: View Shards Assignments:

So here we can see that the replica shards that were created is in an unassigned state, as soon as we add another node, the shard will be assigned to a node, then our cluster will turn to a green state again.

$ curl -XGET http://172.18.24.120:9200/_cat/shards?v index shard prirep state docs store ip node myfirstindex 2 p STARTED 0 130b 172.18.32.146 es01 myfirstindex 2 r STARTED 0 130b 172.18.24.120 es02 myfirstindex 2 r UNASSIGNED myfirstindex 1 r STARTED 0 130b 172.18.32.146 es01 myfirstindex 1 p STARTED 0 130b 172.18.24.120 es02 myfirstindex 1 r UNASSIGNED myfirstindex 3 r STARTED 0 130b 172.18.32.146 es01 myfirstindex 3 p STARTED 0 130b 172.18.24.120 es02 myfirstindex 3 r UNASSIGNED myfirstindex 4 p STARTED 0 130b 172.18.32.146 es01 myfirstindex 4 r STARTED 0 130b 172.18.24.120 es02 myfirstindex 4 r UNASSIGNED myfirstindex 0 p STARTED 0 130b 172.18.32.146 es01 myfirstindex 0 r STARTED 0 130b 172.18.24.120 es02 myfirstindex 0 r UNASSIGNED

We will change the replication count back to 1:

$ curl -XPUT http://172.18.32.146:9200/myfirstindex/_settings -d '{"number_of_replicas": 1}'

Example API: Ingest Some Data:

We will ingest 3 documents into our index, this will be a simple document consisting of a name, country and gender, example:

{ "name":"james", "country":"south africa", "gender": "male" }

Example API: PUT Record to Elasticsearch:

If we use the PUT API, we need to specify an ID of the document that we are ingesting into Elasticsearch:

Let's ingest our first document:

$ curl -XPUT http://172.18.32.146:9200/myfirstindex/people/1 -d '{"name":"james", "country":"south africa", "gender": "male"}'

Now you will find we have one index in our index:

$ curl -XGET http://172.18.24.120:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open myfirstindex 3lz10_G1QSmz-YhaxAqnQg 5 1 1 0 9.6kb 4.8kb

Example API: GET Request to our Cluster on the Document ID:

$ curl -XGET http://172.18.24.120:9200/myfirstindex/people/1?pretty

{ "_index" : "myfirstindex", "_type" : "people", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "name" : "james", "country" : "south africa", "gender" : "male" } }

Example API: POST Request to Elasticsearch to ingest a Document:

Without having to specify the ID of our document, we can use the POST API to our Elasticsearch cluster which will generate a Document ID automatically:

$ curl -XPOST http://172.18.32.146:9200/myfirstindex/people/ -d '{"name": "kevin", "country": "new zealand", "gender": "male"}'

$ curl -XPOST http://172.18.32.146:9200/myfirstindex/people/ -d '{"name": "sarah", "country": "ireland", "gender": "female"}'

So then again, looking at our document count:

$ curl -XGET http://172.18.24.120:9200/_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open myfirstindex 3lz10_G1QSmz-YhaxAqnQg 5 1 3 0 35.3kb 17.6kb

Example API: GET Request on the Search API:

curl -XGET http://172.18.24.120:9200/myfirstindex/_search?pretty

{ "took" : 572, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "myfirstindex", "_type" : "people", "_id" : "AVx1E-i_KQucUVwniOdM", "_score" : 1.0, "_source" : { "name" : "sarah", "country" : "ireland", "gender" : "female" } }, { "_index" : "myfirstindex", "_type" : "people", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "james", "country" : "south africa", "gender" : "male" } }, { "_index" : "myfirstindex", "_type" : "people", "_id" : "AVx1E5dyKQucUVwniOdL", "_score" : 1.0, "_source" : { "name" : "kevin", "country" : "new zealand", "gender" : "male" } } ] } }

Lets try searching for any documents containing the value of Sarah:

$ curl -XGET http://172.18.24.120:9200/myfirstindex/_search?q=sarah | python -m json.tool

{ "_shards": { "failed": 0, "successful": 5, "total": 5 }, "hits": { "hits": [ { "_id": "AVx1E-i_KQucUVwniOdM", "_index": "myfirstindex", "_score": 0.25316024, "_source": { "country": "ireland", "gender": "female", "name": "sarah" }, "_type": "people" } ], "max_score": 0.25316024, "total": 1 }, "timed_out": false, "took": 8 }

More info on the Search API can be found here

Example API: POST Request on the Search API for Queries:

Here we will use a query to match the country = new zealand

$ curl -XPOST http://172.18.24.120:9200/myfirstindex/_search/ -d ' { "query": { "match": { "country": "new zealand" } } } ' | python -m json.tool

{ "_shards": { "failed": 0, "successful": 5, "total": 5 }, "hits": { "hits": [ { "_id": "AVx1E5dyKQucUVwniOdL", "_index": "myfirstindex", "_score": 0.51623213, "_source": { "country": "new zealand", "gender": "male", "name": "kevin" }, "_type": "people" } ], "max_score": 0.51623213, "total": 1 }, "timed_out": false, "took": 16 }

More info on Queries can be found here

Example API: Deleting your Index:

$ curl -XDELETE http://172.18.24.120:9200/myfirstindex

If you are looking to ingest a lot of data into elasticsearch, you can have a look at my Python Data Generator for Elasticsearch Script

Here is a script that can bootstrap the setup process: setup-elasticsearch-clusternode-centos.sh

I hope this was useful, I will post more Elasticsearch Tutorials tagged under the Elasticsearch Tag .