The trendy term "big data" comes from the three Vs: volume, variety, and velocity. Volume refers to the size of the data, variety refers to its diverse types, and velocity refers to the speed at which it is processed. To persist big data, there are NoSQL databases that write and read data faster. But with such variety in a vast volume, finding information without significant computing power can take too much time, so a search engine is required. A search engine is a software system designed to search for information; this mechanism makes it more straightforward for users to get the information they want.

This article will cover Elasticsearch, a NoSQL database of the document type that is also a search engine.

Elasticsearch is a NoSQL document database and a search engine based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. It is a near-real-time search platform, which means there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.

Steps in a Search Engine

In Elasticsearch, the search process is based on the analyzer, which is a package containing three lower-level building blocks: character filters, tokenizers, and token filters. According to the Elasticsearch documentation, the definitions are:

A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals into their Arabic-Latin equivalents or to strip HTML elements from the stream.

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks the text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].

A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream.
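The three building blocks above can be sketched in plain Java. This is a minimal simulation, not the Lucene implementation: the class and method names (AnalyzerSketch, charFilter, tokenize, tokenFilters) are hypothetical, and the stages mimic an HTML-stripping character filter, a whitespace tokenizer, and lowercase and stop-word token filters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AnalyzerSketch {

    // character filter: strip HTML elements from the raw character stream
    static String charFilter(String input) {
        return input.replaceAll("<[^>]*>", "");
    }

    // tokenizer: break the text into tokens whenever it sees whitespace
    static List<String> tokenize(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    // token filters: lowercase every token, then drop the stop word "the"
    static List<String> tokenFilters(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .filter(t -> !t.equals("the"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "<b>The Quick</b> brown fox!";
        List<String> result = tokenFilters(tokenize(charFilter(text)));
        System.out.println(result); // [quick, brown, fox!]
    }
}
```

Running the sketch on "<b>The Quick</b> brown fox!" strips the tags, splits on whitespace, lowercases, and removes "the", leaving the terms [quick, brown, fox!].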

How to Install Elasticsearch in Docker

The first step to using Elasticsearch is to install it. You can install it either manually or through Docker. The easiest way is with Docker, using the command below:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.2.3

Elasticsearch and Java EE Working Together

Eclipse JNoSQL is the bridge between these platforms (Java EE and the search engine). An important point to remember is that Elasticsearch is also a NoSQL document database, so a developer may model the application as such. To use both the standard document behavior and the Elasticsearch API, a programmer needs to use the Elasticsearch extension.

<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>elasticsearch-extension</artifactId>
    <version>0.0.5</version>
</dependency>

For this demo, we'll create a contacts agenda for developers, each of whom has a name, an address, and, of course, the languages that they know. The address has its own fields and becomes a subdocument, that is, a document inside a document.

@Entity("developer")
public class Developer {

    @Id
    private Long id;

    @Column
    private String name;

    @Column
    private List<String> phones;

    @Column
    private List<String> languages;

    @Column
    private Address address;
}

@Embeddable
public class Address {

    @Column
    private String street;

    @Column
    private String city;

    @Column
    private Integer number;
}

With the model defined, let's set the mapping. Mapping is the process of defining how a document and the fields it contains are stored and indexed. In this example, most fields are of the type keyword, which is searchable only by its exact value. There is also the languages field, which we define as text with a custom analyzer. This custom analyzer, whitespace_analyzer, has one tokenizer, whitespace, and three filters (standard, lowercase, and asciifolding).

{
    "settings": {
        "analysis": {
            "filter": {},
            "analyzer": {
                "whitespace_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["standard", "lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "developer": {
            "properties": {
                "name": {"type": "keyword"},
                "languages": {"type": "text", "analyzer": "whitespace_analyzer"},
                "phones": {"type": "keyword"},
                "address": {
                    "properties": {
                        "street": {"type": "text"},
                        "city": {"type": "text"},
                        "number": {"type": "integer"}
                    }
                }
            }
        }
    }
}
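The practical difference that this mapping encodes between keyword and text fields can be sketched in plain Java. This is a hypothetical illustration (the class FieldMatchSketch and its methods are not part of any API): a keyword field matches only its exact stored value, while a text field analyzed by the whitespace_analyzer matches on lowercased whitespace tokens.

```java
import java.util.Locale;

public class FieldMatchSketch {

    // keyword field: searchable only by its exact stored value
    static boolean keywordMatch(String stored, String query) {
        return stored.equals(query);
    }

    // text field with the whitespace_analyzer: split on whitespace, lowercase,
    // then match any single token against the query term
    static boolean textMatch(String stored, String query) {
        for (String token : stored.split("\\s+")) {
            if (token.toLowerCase(Locale.ROOT).equals(query)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(keywordMatch("Poliana Lovelace", "Poliana")); // false: not the exact value
        System.out.println(textMatch("Java SE", "java"));                // true: matches the lowercased token
    }
}
```

This is why, later in the article, a term query for "java" finds a developer whose languages include "Java SE", while a name can only be found by its full, exact value.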

With the API, the developer can do the basic operations of a document NoSQL database (at least a CRUD). However, in Elasticsearch, the behavior of the search engine matters and is useful; that's why it has an extension.

public class App {

    public static void main(String[] args) {
        Random random = new Random();
        Long id = random.nextLong();
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            Address address = Address.builder()
                    .withCity("Salvador")
                    .withStreet("Rua Engenheiro Jose")
                    .withNumber(10)
                    .build();
            Developer developer = Developer.builder()
                    .withPhones(Arrays.asList("85 85 343435684", "55 11 123448684"))
                    .withName("Poliana Lovelace")
                    .withId(id)
                    .withAddress(address)
                    .build();
            DocumentTemplate documentTemplate = container.select(DocumentTemplate.class).get();
            Developer saved = documentTemplate.insert(developer);
            System.out.println("Developer saved: " + saved);

            DocumentQuery query = select().from("developer")
                    .where("_id").eq(id).build();
            Optional<Developer> personOptional = documentTemplate.singleResult(query);
            System.out.println("Entity found: " + personOptional);
        }
    }

    private App() {
    }
}

From the Elasticsearch extension, the user can use QueryBuilders, a utility class for creating search queries against the database.

public class App3 {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            Random random = new Random();
            long id = random.nextLong();
            Address address = Address.builder()
                    .withCity("São Paulo")
                    .withStreet("Av. nove de Julho 1854")
                    .withNumber(10)
                    .build();
            Developer developer = Developer.builder()
                    .withPhones(Arrays.asList("85 85 343435684", "55 11 123448684"))
                    .withName("Maria Lovelace")
                    .withId(id)
                    .withAddress(address)
                    .withLanguage("Java SE")
                    .withLanguage("Java EE")
                    .build();
            ElasticsearchTemplate template = container.select(ElasticsearchTemplate.class).get();
            Developer saved = template.insert(developer);
            System.out.println("Developer saved: " + saved);

            // wait for the near-real-time index to refresh before searching
            TimeUnit.SECONDS.sleep(2L);
            TermQueryBuilder query = QueryBuilders.termQuery("phones", "85 85 343435684");
            List<Developer> people = template.search(query);
            System.out.println("Entity found from phone: " + people);
            people = template.search(QueryBuilders.termQuery("languages", "java"));
            System.out.println("Entity found from languages: " + people);
        }
    }

    private App3() {
    }
}

Conclusion

An intuitive way to find data is vital in an enterprise application, mainly when the software handles a massive volume with several kinds of data. Elasticsearch can help the Java EE world as both a NoSQL document database and a search engine. This post covered how to join the best of these two worlds using Eclipse JNoSQL.