What is Elasticsearch ?

First, we need to know what is Elasticsearch. It is a very highly-scalable document storage engine that specializes in text base search. It’s built on top of Lucene (full-featured text search engine library fully written in Java), so many of their core features are inherited from it. It has distributed multi tenant-capable and support high latency applications. It’s schema-free so you can be a modified structure as per your requirement without impacting to other data. It has JSON-based nature along with simple REST API.

How to configure

First, you need download elasticsearch setup. Now Extract zip file and setup your cluster name and node in elasticsearch.yml file. Run bet file for Windows user (open bin directory inside you find elasticsearch.bet) Almost done now need to check server is started or not. hit the URL in the browser: http://localhost:9200/

{ "name" : "I am Intelligence ", "cluster_name" : "bookStore", "version" : { "number" : "2.3.1", "build_hash" : "bd980929010aef504e7ab0843e61d0664569dkc39", "build_timestamp" : "2016-04-04T12:25:05Z", "build_snapshot" : false, "lucene_version" : "5.5.0" }, "tagline" : "You Know, for Search." }

Note : I change cluster name elasticsearch to bookStore in elasticsearch.yml so it’s show above.

Directory Information

Config : elasticsearch.yml : advanced configuration like setup cluster, nodes, index path and memory related configurations. (default is well set if your application is high latency then need to change some advance options but be carefully change default setting after reading documentation). logging.yml : here all logging related setting like which log print on screen and logged on file like log4j.properties (when you java programmer you know this Loggers based liberery that preserve logs).

Data : when you create an index here all index data will be a store.

Lib : Here all required JAR available for Elastic Search.

Logs : In this directory all the logs related index are perserver here we also see this on own a CMD window when we start elasticsearch.bet file or perform any operation on index.

How to access ElasticSearch index

In Elastic Search everything is on REST so you can send your request using CRUL or Chrome extension Sense or Marvel plugin to the cluster. (If you want manually add or update or delete index also query fire in that index).

Sense makes it easy for us to compose and send GETs, POSTs, PUTs, and DELETEs to tell the server make it index data or retrieve results for us as per our query string.

Using Marvel Kibana app to monitor your cluster there you are tracking and visualizing cluster related data. It’s provided rich data about our cluster.

Basic Terms

Cluster

Cluster is contain number of nodes which share the same cluster name.

Document

A document is a primary record that can you store in index. It’s similar to a row in a traditional RDBMS database. Documents are structured is JSON objects and must belong to a specific type like number, String, Date etc.

Type

A type is a sets of documents with common fields. It being similar to a table in a traditional RDBMS database, but the definition is somewhat less strict because of it’s schema free.

Index

An index is a collection of related type of documents. It is somewhat similar in function to a database or schema in the traditional database world.

Shard

A shard is a single Lucene instance on multiple shard one primary shard. In our example index was really small, but many indexes can get quite large (e.g. e-commerce website) and it isn’t realistic at all to have Elasticsearch index with multiple TBs(terabytes) of data into them. Shard helps quite you scale up this data beyond one machine by breaking your index up into multiple parts and storing it on multiple nodes (generally one node per server machine but it’s also possible multiple node on single system). that why it’s allows for more storage, and one more thing shards also allow for better performance, because data in the same index can be searched by multiple nodes at the same time get quick result.

Replica

In simple word, the replica is a copy of a shard. Each primary shard has zero or more replicas. Shard useful when hardware failures time for example when all of your shards are replicated, then the fail one of the node in your cluster will not harm availability. Replicas also improve throughput and latency by making your index’s data available to more nodes in the cluster. By default, each primary shard has one replica, but you can make the number of models can be changed dynamically on an existing index as per your requirements.

In over mind one question always arise how does it work? Where is Elastic search on a picture? First of all, the user performs any action on website in our example online shopping site which user search for AC4 Human and give control user interface to web service (REST, SOAP, etc.). The elastic cluster has multiple nodes as we discuss above. Node returns JSON result to Data Access layer respected to query. When a data update or delete that automatically index is updated, so changes directly reflected on index result. The number of database roundtrip is decrease and performance automatically increase.

Reference