Elasticsearch Overview

When building an application that manages a large inventory, many service listings or a large client base, slow database queries and information retrieval can become a problem. Users who are accustomed to search results arriving in under a second will experience slow results as a poor user experience.

Traditional relational database management systems (RDBMS) are often not well suited to searching large volumes of data quickly. NoSQL databases were developed in response, to store and manage large volumes of data. Elasticsearch is one form of distributed NoSQL database.

Elasticsearch (ES) is a schema-less, document-oriented database, in the sense that no schema has to be defined up front. Built on the Apache Lucene search library (whose standard analyzer is the default for text fields), Elasticsearch is designed to store, retrieve and manage semi-structured, document-oriented data that is stored as JSON documents.

Unlike MongoDB, which is a general-purpose NoSQL database, Elasticsearch is a distributed text search engine used for working with large datasets in near real time.
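To make the "text search engine" idea concrete, here is a toy sketch of an inverted index, the kind of data structure Lucene relies on for full-text lookups. It is a deliberately simplified illustration, not how Elasticsearch is implemented: the tokenizer, the function names and the sample documents are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data, for illustration only.
docs = {
    1: "Red running shoes",
    2: "Blue running jacket",
    3: "Red rain jacket",
}
index = build_inverted_index(docs)
print(search(index, "red jacket"))  # prints {3}
```

Because the lookup goes from token to documents rather than scanning every document, the cost of a query depends mostly on the number of matching entries and not on the total size of the dataset.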

A cluster is a collection of nodes (servers) that together store the indexed data. Elasticsearch provides a basic level of abstraction over the cluster itself, but applications primarily interact with it through an HTTP REST API.
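As a rough sketch of what talking to that REST API looks like, the snippet below builds (but does not send) two requests using only the Python standard library: one that stores a JSON document and one that runs a simple full-text query. The host and port, the index name "products" and the document fields are assumptions for illustration; the paths follow the `/<index>/_doc/<id>` and `/<index>/_search` endpoints of recent Elasticsearch versions.

```python
import json
import urllib.request

# Assumption: a local development node; adjust host, port and scheme for a real cluster.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text match query against one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index name and document, for illustration only.
put_req = index_request("products", 1, {"name": "Red running shoes", "price": 59.9})
find_req = search_request("products", "name", "running shoes")
print(put_req.get_method(), put_req.full_url)
print(find_req.get_method(), find_req.full_url)
# To actually send a request against a running node: urllib.request.urlopen(put_req)
```

In a real project you would more likely use an official client library, but the plain HTTP form shows that nothing more than JSON over REST is involved.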

If your mobile app requires a lot of filter and search operations, ES is a strong choice. In practice, ES is often paired with other NoSQL or SQL databases.

3 Top Elasticsearch Use Cases

1) Voluminous Text Search

Full-text search is one of the core capabilities of ES. It is not only suited to handling search queries, but is also powerful and flexible enough to support large volumes of data in near real time by distributing the work across shards.

ES has its own query DSL, with features such as auto-complete, “Did you mean?” suggestions, geo-location queries and more.
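Query DSL requests are plain JSON, so a search body can be sketched as a Python dictionary. The example below combines a typo-tolerant `match` clause with a `geo_distance` filter inside a `bool` query. The field names `name` and `location` are hypothetical, and the sketch assumes `location` has been mapped as a `geo_point` field.

```python
import json


def product_search_body(text, lat, lon, distance="5km"):
    """Build a search body: fuzzy full-text match, restricted to a radius around a point."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause; "AUTO" fuzziness tolerates small typos.
                "must": [
                    {"match": {"name": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Non-scoring filter; assumes "location" is mapped as a geo_point.
                "filter": [
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ],
            }
        }
    }


# Hypothetical query: a misspelled search term near an example coordinate.
body = product_search_body("runing shoes", 52.52, 13.40)
print(json.dumps(body, indent=2))
```

Putting the geo restriction under `filter` rather than `must` means it narrows the result set without affecting relevance scores.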

To handle large volumes, ES uses sharding.

Each shard is a single Lucene instance managed by ES. A document is indexed on its primary shard first and then on the replica shards. The number of primary shards is specified when the index is created.

In ES versions before 7.0, each index had 5 primary shards by default; from 7.0 onward the default is 1. Overall, sharding helps maintain search performance across large volumes of data.
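The sketch below shows two things under stated assumptions: the settings body you might send when creating an index with an explicit shard layout (`number_of_shards` and `number_of_replicas` are real index settings), and a simplified model of how a document could be routed to a primary shard. The routing function is only an illustration: Elasticsearch uses its own hash function internally, and CRC32 is swapped in here as a stand-in.

```python
import json
import zlib


def create_index_body(primary_shards, replicas):
    """Settings body for creating an index with an explicit shard layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def pick_shard(routing_key, primary_shards):
    """Simplified routing: hash the key, then take it modulo the primary shard count.

    CRC32 is a stand-in; it is not the hash function Elasticsearch actually uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards


body = create_index_body(primary_shards=3, replicas=1)
print(json.dumps(body))

# Hypothetical document ids, for illustration only.
for doc_id in ["order-1", "order-2", "order-3", "order-4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

A modulo-style scheme like this also suggests why the primary shard count is chosen up front: changing it would map existing documents to different shards.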

2) Scraping & Logging Data

If you’ve experimented with Elasticsearch before, you will have noticed how effectively its ecosystem can collect data from various sources. The ES ecosystem has been designed to make it easy to implement scalable logging solutions.

Developers take advantage of this by adding Elasticsearch-based logging alongside the main use case of their application, or by using it purely as a tool for logging data.

Tools like Logstash, Beats and ingest nodes are used alongside Elasticsearch to provide a range of data collection and logging options for many kinds of data sources. Even data from social channels can be pulled and indexed using Logstash and Elasticsearch.
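Shippers such as Logstash typically send log events to Elasticsearch in batches through the `_bulk` endpoint, which expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the body. The helper below is a minimal sketch of that format; the function name, the index name "app-logs" and the sample events are assumptions for illustration.

```python
import json


def to_bulk_ndjson(index, events):
    """Render events as a _bulk body: one action line plus one source line per event."""
    if not events:
        return ""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical log events, for illustration only.
events = [
    {"@timestamp": "2024-01-01T12:00:00Z", "level": "INFO", "message": "user logged in"},
    {"@timestamp": "2024-01-01T12:00:05Z", "level": "ERROR", "message": "payment failed"},
]
payload = to_bulk_ndjson("app-logs", events)
print(payload, end="")
```

The resulting string would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.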

3) Visualization & Analytics

Since data in Elasticsearch is stored in structured form, analytics can easily be applied to the data logged in it. You can perform analytic operations such as time-series analysis based on timestamps, pattern identification and anomaly tracking via machine learning, security and authentication checks, and so on.
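As an illustration of timestamp-based analysis, the sketch below builds a search body that counts matching log events per hour with a `date_histogram` aggregation. It assumes documents carry an `@timestamp` date field and a `level` field mapped as `keyword`; both field names are hypothetical. The `fixed_interval` parameter applies to recent Elasticsearch versions (older releases used `interval`).

```python
import json


def events_per_hour_body(level="ERROR"):
    """Build a search body that buckets matching events into one-hour intervals."""
    return {
        # size 0: return only the aggregation buckets, not the matching documents.
        "size": 0,
        # Exact-value match; assumes "level" is mapped as a keyword field.
        "query": {"term": {"level": level}},
        "aggs": {
            "per_hour": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "1h",
                }
            }
        },
    }


body = events_per_hour_body()
print(json.dumps(body, indent=2))
```

The response to such a request contains one bucket per hour with a document count, which is the kind of data a dashboard chart is drawn from.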

Tools like Kibana let developers visualize the stored data and analyses with rich dashboards. Because the data is stored in a structured form, it is easier for developers to build reports with visualizations. Kibana provides many charting options, time-series analysis using Timelion and a tile service for geo data.

In practice, Logstash is used for collecting and parsing logs and sending them to ES for storage. Kibana is used to search and view the indexed logs. The combination of all three is known as the ELK stack.

ELK stacks are extremely versatile. They can be used as a standalone application or integrated with existing applications to run extensive data analytics on large datasets.

Because it lacks a strict schema, Elasticsearch has the added benefit of being able to store data from multiple sources and still keep it all manageable and searchable. Using the Twitter plugin, you can even define a set of hashtags, pull in all the tweets with those hashtags and visualize them using Kibana.

Elasticsearch – Not great as a primary database

ES offers the ability to handle large data sets, flexibility when storing individual objects and fast search queries. The cost is paid in latency (newly written documents are not searchable immediately), transactions and joins.

ES is not an ACID-compliant database system like most SQL systems. It does not support multi-document transactions, which matter a great deal for financial actions (such as buying an item or updating a cart) where multiple tables need to be updated together.

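Older ES releases did not ship with free built-in security features such as authentication or authorization. Basic security has been part of the free distribution since versions 6.8 and 7.1 and is enabled by default from 8.0, but it still needs to be configured with care.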

Basically, ES is great as long as it is used for its intended purpose, i.e. distributed full-text search and aggregation. In practice, you would use Elasticsearch as a secondary database to store text-based information that needs to be quickly indexed and searched.

Summary – Benefits of Using Elasticsearch

1) Manages huge amounts of data: Search queries that can take many seconds on a traditional SQL database management system can often be answered by Elasticsearch in milliseconds, depending on the data and the query.

2) Direct, easy and fast access: Documents are stored in close proximity to the corresponding metadata in the index. This reduces the number of data reads and, as a result, improves search response time.

3) Scalability of the search engine: Because Elasticsearch has a distributed architecture, it can scale out to thousands of servers and accommodate petabytes of data. Users do not need to manage the complexity of the distributed design themselves, as much of it is handled automatically. It is usually paired with container technology and microservices during development to create a well-rounded, scalable application.

Resources & Image Credits

https://medium.com/oneclicklabs-io/streaming-spring-boot-application-logs-to-elk-stack-part-1-a68bd7cccaeb

https://dzone.com/articles/why-elasticsearch-suitable-0

https://stackoverflow.com/questions/12723239/elasticsearch-v-s-mongodb-for-filtering-application

https://medium.com/@ranjeetvimal/elasticsearch-vs-mongodb-631f410cd317