In this article, we introduce some basic Elasticsearch concepts and show examples of using its REST API.



Elasticsearch uses an inverted index to index the input data and then answers search queries against that index. This article walks through some basic, useful commands for interacting with Elasticsearch.
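To make the inverted index idea concrete, here is a minimal Python sketch. It is a toy illustration, not how Elasticsearch or Lucene implement it: the whitespace tokenizer, the function names, and the second sample document are assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample documents keyed by id.
docs = {
    10: "My name is Heisenberg",
    11: "Say my name",
}
index = build_inverted_index(docs)
print(search(index, "Heisenberg"))  # [10]
print(search(index, "name"))        # [10, 11]
```

A lookup goes straight from a term to the documents that contain it, without scanning every document, which is what makes this structure a good fit for full-text search.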

In your Unix terminal, run the following command from the Elasticsearch directory:

$ bin/elasticsearch -f

When a single node is running, it is elected as the master node, and any node with the same cluster name (the default is ‘elasticsearch’) can join the cluster automatically. By default, HTTP is bound to port 9200: http://localhost:9200.

Optionally, you can open a new terminal and run the same command again with a different node name:

$ bin/elasticsearch -f -Des.node.name=Node2

After the second node starts, the Node2 logs show the detected master node, and the master node's logs show Node2. In other words, the two nodes discovered each other. The second node's HTTP address is bound to port 9201 by default: http://localhost:9201. You can interact with the cluster through either port 9200 or port 9201 via REST calls.

To make HTTP calls to the Elasticsearch REST API from the terminal, I use the command-line tool curl.

Elasticsearch takes its input as JSON documents. The classic example is Twitter: assume the Elasticsearch cluster is responsible for indexing tweets and then answering users' search queries. A tweet, for instance, can look like this:

{
  "user" : "Walter",
  "post_date" : "2009-11-15T14:12:13",
  "message" : "My name is Heisenberg"
}

So let’s index this tweet with a simple HTTP PUT request:

$ curl -X PUT http://127.0.0.1:9200/twitter/tweet/10 -d '{
  "user" : "Walter",
  "post_date" : "2009-11-15T14:12:13",
  "message" : "My name is Heisenberg"
}'

Elasticsearch then responds with:

{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "10",
  "_version" : 1,
  "created" : true
}
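For readers who prefer a script to curl, the same PUT request can be prepared with Python's standard library. This is a sketch under assumptions: the node address matches the curl example, the helper names `build_index_request` and `send` are made up for illustration, and the `Content-Type` header is added because newer Elasticsearch versions check it.

```python
import json
import urllib.request

# Assumption: a node is listening on 127.0.0.1:9200, as in the curl example.
BASE_URL = "http://127.0.0.1:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: harmless on old versions, expected by newer ones.
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


tweet = {
    "user": "Walter",
    "post_date": "2009-11-15T14:12:13",
    "message": "My name is Heisenberg",
}
request = build_index_request("twitter", "tweet", 10, tweet)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a running node should return a reply shaped like the response shown above. Building the request separately from sending it makes the URL and body easy to inspect first.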

A type is like a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.

An index is like a database in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

Elasticsearch also acts as a document-oriented data store. You can fetch your document with the following GET request:

$ curl -X GET http://127.0.0.1:9200/twitter/tweet/10

A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.

Let’s look at the mapping of the ‘tweet’ type in our ‘twitter’ index with the following command:

$ curl -X GET http://127.0.0.1:9200/twitter/tweet/_mapping

The response is:

{
  "twitter" : {
    "mappings" : {
      "tweet" : {
        "properties" : {
          "message" : {
            "type" : "string"
          },
          "post_date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "user" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
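Since the mapping was generated automatically, it can be handy to reduce it to a field-to-type table. The sketch below does that for the response above. The helper name `field_types` is hypothetical, and the code assumes the response has exactly the nesting shown here.

```python
import json

# The mapping response from above, pasted as a string.
mapping_response = json.loads("""
{
  "twitter": {
    "mappings": {
      "tweet": {
        "properties": {
          "message": {"type": "string"},
          "post_date": {"type": "date", "format": "dateOptionalTime"},
          "user": {"type": "string"}
        }
      }
    }
  }
}
""")


def field_types(mapping, index, doc_type):
    """Return {field name: field type} for one type of one index."""
    properties = mapping[index]["mappings"][doc_type]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


print(field_types(mapping_response, "twitter", "tweet"))
# {'message': 'string', 'post_date': 'date', 'user': 'string'}
```

Note that Elasticsearch recognized `post_date` as a date from its value alone, while the other two fields were mapped as strings.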

Search queries can be performed in two ways: through the URI or with the Query DSL. In this post, I only introduce some basic URI queries and leave the Query DSL for a dedicated post because of its importance.

The most basic search query is a query-string search. Let’s search for the tweets that contain “Heisenberg” using the _search endpoint:

$ curl -X GET 'http://127.0.0.1:9200/twitter/tweet/_search?q=Heisenberg'

The response is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.3125,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "10",
      "_score" : 0.3125,
      "_source":{ "user" : "Walter", "post_date" : "2009-11-15T14:12:13", "message" : "My name is Heisenberg"}
    } ]
  }
}

The took field shows how long Elasticsearch spent executing the search, in milliseconds, excluding JSON (de)serialization and network overhead. The _shards field shows how many shards were searched. Each index is split into shards, and every shard is a single Apache Lucene instance. The score measures how relevant a result is to the query. For example, if the hits array held more results, we could sort them by score (i.e. relevance).
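The fields described above can be pulled out of the response with a few lines of Python. This is a sketch that assumes the response shape shown in this article (in particular, `hits.total` as a plain number). The `summarize` helper is a name invented for this example.

```python
import json

# The search response from above, pasted as a string.
search_response = json.loads("""
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.3125,
    "hits": [
      {
        "_index": "twitter",
        "_type": "tweet",
        "_id": "10",
        "_score": 0.3125,
        "_source": {
          "user": "Walter",
          "post_date": "2009-11-15T14:12:13",
          "message": "My name is Heisenberg"
        }
      }
    ]
  }
}
""")


def summarize(response):
    """Collect timing, shard, and relevance information from a response."""
    hits = response["hits"]["hits"]
    # Most relevant first: sort by score in descending order.
    ranked = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
    return {
        "took_ms": response["took"],
        "shards_searched": response["_shards"]["total"],
        "total_hits": response["hits"]["total"],
        "ranked": [(hit["_id"], hit["_score"]) for hit in ranked],
    }


print(summarize(search_response))
```

With a single hit the sort changes nothing, but the same code would order a longer hits array from most to least relevant.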

We can also search for tweets whose ‘user’ field starts with ‘Wal’:

$ curl -X GET 'http://127.0.0.1:9200/twitter/tweet/_search?q=user:Wal*'
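Characters such as `:` and `*` are easy to get wrong when a search URL is assembled by hand or typed into a shell. The sketch below builds the same URI search with Python's standard library, which percent-encodes the query string. The helper name `build_search_url` and the base address are assumptions for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a node is listening on 127.0.0.1:9200, as in the curl examples.
BASE_URL = "http://127.0.0.1:9200"


def build_search_url(index, doc_type, query):
    """Build a URI search URL with the query string safely encoded."""
    return f"{BASE_URL}/{index}/{doc_type}/_search?{urlencode({'q': query})}"


url = build_search_url("twitter", "tweet", "user:Wal*")
print(url)
# http://127.0.0.1:9200/twitter/tweet/_search?q=user%3AWal%2A

# Decoding the query string gives back the original expression.
print(parse_qs(urlsplit(url).query))
# {'q': ['user:Wal*']}
```

The server decodes `%3A` and `%2A` back to `:` and `*`, so the query it receives is the same `user:Wal*` expression used in the curl command.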

In a future post, I will explain how to write Query DSL requests and how to use more advanced search features with filters and queries.
