Elasticsearch is the living heart of what is today’s the most popular log analytics platform — the ELK Stack (Elasticsearch, Logstash and Kibana). Elasticsearch’s role is so central that it has become synonymous with the name of the stack itself. Primarily for search and log analysis, Elasticsearch is today one of the most popular database systems available today. This Elasticsearch tutorial provides new users with the prerequisite knowledge and tools to start using Elasticsearch. It includes installation instructions, and initial indexing and data handling instructions.

What is Elasticsearch?

Initially released in 2010, Elasticsearch (sometimes dubbed ES) is a modern search and analytics engine which is based on Apache Lucene. Completely open source and built with Java, Elasticsearch is a NoSQL database. That means it stores data in an unstructured way and that you cannot use SQL to query it.

This Elasticsearch tutorial could also be considered a NoSQL tutorial. However, unlike most NoSQL databases, Elasticsearch has a strong focus on search capabilities and features — so much so, in fact, that the easiest way to get data from ES is to search for it using the extensive Elasticsearch API.

In the context of data analysis, Elasticsearch is used together with the other components in the ELK Stack, Logstash and Kibana, and plays the role of data indexing and storage.

Installing Elasticsearch

The requirements for Elasticsearch are simple: Java 8 (specific version recommended: Oracle JDK version 1.8.0_131). Take a look at this Logstash tutorial to ensure that you are set. Also, you will want to make sure your operating system is on the Elastic support matrix, otherwise you might run up against strange and unpredictable issues. Once that is done, you can start by installing Elasticsearch.

You can download Elasticsearch as a standalone distribution or install it using the apt and yum repositories. We will install Elasticsearch on an Ubuntu 16.04 machine running on AWS EC2 using apt .

First, you need to add Elastic’s signing key so you can verify the downloaded package (skip this step if you’ve already installed packages from Elastic):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

For Debian, we need to then install the apt-transport-https package:

sudo apt-get install apt-transport-https

The next step is to add the repository definition to your system:

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

All that’s left to do is to update your repositories and install Elasticsearch:

sudo apt-get update sudo apt-get install elasticsearch

Configuring Elasticsearch

Elasticsearch configurations are done using a configuration file whose location depends on your operating system. In this file, you can configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.

For development and testing purposes, the default settings will suffice yet it is recommended you do some research into what settings you should manually define before going into production.

For example, and especially if installing Elasticsearch on the cloud, it is a good best practice to bind Elasticsearch to either a private IP or localhost:

sudo vim /etc/elasticsearch/elasticsearch.yml network.host: "localhost" http.port:9200

Running Elasticsearch

Elasticsearch will not run automatically after installation and you will need to manually start it. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems you can use this command:

sudo service elasticsearch start

And that’s it! To confirm that everything is working fine, simply point curl or your browser to http://localhost:9200, and you should see something like the following output:

{ "name" : "33QdmXw", "cluster_name" : "elasticsearch", "cluster_uuid" : "mTkBe_AlSZGbX-vDIe_vZQ", "version" : { "number" : "6.1.2", "build_hash" : "5b1fea5", "build_date" : "2018-01-10T02:35:59.208Z", "build_snapshot" : false, "lucene_version" : "7.1.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }

To debug the process of running Elasticsearch, use the Elasticsearch log files located (on Deb) in /var/log/elasticsearch/.

Creating an Elasticsearch Index

Indexing is the process of adding data to Elasticsearch. This is because when you feed data into Elasticsearch, the data is placed into Apache Lucene indexes. This makes sense because Elasticsearch uses the Lucene indexes to store and retrieve its data. Although you do not need to know a lot about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.

Elasticsearch behaves like a REST API, so you can use either the POST or the PUT method to add data to it. You use PUT when you know the or want to specify the ID of the data item, or POST if you want Elasticsearch to generate an ID for the data item:

curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d' { "timestamp": "2018-01-24 12:34:56", "message": "User logged in", "user_id": 4, "admin": false } ' curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d ' { "id": 4, "username": "john", "last_login": "2018-01-25 12:34:56" } '

And the response:

{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1} {"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

The data for the document is sent as a JSON object. You might be wondering how we can index data without defining the structure of the data. Well, with Elasticsearch, like with any other NoSQL database, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define Elasticsearch mappings according to data types. More on this later.

If you are using any of the Beats shippers (e.g. Filebeat or Metricbeat), or Logstash, those parts of the ELK Stack will automatically create the indices.

To see a list of your Elasticsearch indices, use:

curl -XGET 'localhost:9200/_cat/indices?v&pretty' health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open logstash-2018.01.23 y_-PguqyQ02qOqKiO6mkfA 5 1 17279 0 9.9mb 9.9mb yellow open app GhzBirb-TKSUFLCZTCy-xg 5 1 1 0 5.2kb 5.2kb yellow open .kibana Vne6TTWgTVeAHCSgSboa7Q 1 1 2 0 8.8kb 8.8kb yellow open logs T9E6EdbMSxa8S_B7SDabTA 5 1 1 0 5.7kb 5.7kb

The list in this case includes the indices we created above, a Kibana index and an index created by a Logstash pipeline.

Elasticsearch Querying

Once you index your data into Elasticsearch, you can start searching and analyzing it. The simplest query you can do is to fetch a single item. Read our article focused exclusively on Elasticsearch queries.

Once again, via the Elasticsearch REST API, we use GET:

curl -XGET 'localhost:9200/app/users/4?pretty'

And the response:

{ "_index" : "app", "_type" : "users", "_id" : "4", "_version" : 1, "found" : true, "_source" : { "id" : 4, "username" : "john", "last_login" : "2018-01-25 12:34:56" } }

The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.

We also use GET to do searches by calling the _search endpoint:

curl -XGET 'localhost:9200/_search?q=logged' {"took":173,"timed_out":false,"_shards":{"total":16,"successful":16,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_score":0.2876821,"_source": { "timestamp": "2018-01-24 12:34:56", "message": "User logged in", "user_id": 4, "admin": false } }]}}

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

took : The time in milliseconds the search took

: The time in milliseconds the search took timed_out : If the search timed out

: If the search timed out shards : The number of Lucene shards searched, and their success and failure rates

: The number of Lucene shards searched, and their success and failure rates hits: The actual results, along with meta information for the results

The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all the documents for that word. You can build more specific searches by using Lucene queries:

username:johnb – Looks for documents where the username field is equal to “johnb”

– Looks for documents where the username field is equal to “johnb” john* – Looks for documents that contain terms that start with john and is followed by zero or more characters such as “john,” “johnb,” and “johnson”

– Looks for documents that contain terms that start with john and is followed by zero or more characters such as “john,” “johnb,” and “johnson” john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

There are many other ways to search including the use of boolean logic, the boosting of terms, the use of fuzzy and proximity searches, and the use of regular expressions.

Elasticsearch Query DSL

URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require.

It contains two kinds of clauses: 1) leaf query clauses that look for a value in a specific field, and 2) compound query clauses (which might contain one or several leaf query clauses).

Elasticsearch Query Types

There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require. Query types include:

Geo queries, “More like this” queries Scripted queries Full text queries Shape queries Span queries Term-level queries Specialized queries

As of Elasticsearch 6.8, the ELK Stack has merged Elasticsearch queries and Elasticsearch filters, but ES still differentiates them by context. The DSL distinguishes between a filter context and a query context for query clauses. Clauses in a filter context test documents in a boolean fashion: Does the document match the filter, “yes” or “no?” Filters are also generally faster than queries, but queries can also calculate a relevance score according to how closely a document matches the query. Filters do not use a relevance score. This determines the ordering and inclusion of documents:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "match_phrase": { "message": "User logged in" } } } '

And the result:

{ "took" : 28, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.8630463, "hits" : [ { "_index" : "logs", "_type" : "my_app", "_id" : "ZsWdJ2EBir6MIbMWSMyF", "_score" : 0.8630463, "_source" : { "timestamp" : "2018-01-24 12:34:56", "message" : "User logged in", "user_id" : 4, "admin" : false } } ] } } '

Removing Elasticsearch Data

Deleting items from Elasticsearch is just as easy as entering data into Elasticsearch. The HTTP method to use this time is—surprise, surprise—DELETE:

$ curl -XDELETE 'localhost:9200/app/users/4?pretty' { "_index" : "app", "_type" : "users", "_id" : "4", "_version" : 2, "result" : "deleted", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1 }

To delete an index, use:

$ curl -XDELETE 'localhost:9200/logs?pretty'

To delete all indices (use with extreme caution) use:

$ curl -XDELETE 'localhost:9200/_all?pretty'$

The response in both cases should be:

{ "acknowledged" : true } To delete a single document: $ curl -XDELETE 'localhost:9200/index/type/document'

What’s Next?

This tutorial helps beginners with Elasticsearch and as such provides just the basic steps of CRUD operations in Elasticsearch. Elasticsearch is a search engine, and as such features an immense depth to its search features.

Recent versions have introduced some incredible new Elasticsearch features and also some significant changes to the underlying data structure.