
There is a reason why we compared Elasticsearch with Apache Hadoop. Here is how to install and configure Elasticsearch with Apache Hadoop, Flume and Kibana, with links to the official configuration documentation. Before running the commands, we suggest reading the text under the next sub-header.

README To Install, Configure Elasticsearch with Apache Hadoop

Previously we published a few important guides. Even if you have been using Apache Hadoop for a few months, it will not harm to read them; they contain more textual explanation than mere installation commands.

How to Install, Configure Elasticsearch with Apache Hadoop

The minimum requirement is HDP 2.0, or the HDP Sandbox for HDP 2.0, on CentOS. In other words, follow our previous guide to install Hadoop (number one in the list of guides above). First, we need to set up Java and PATH:


export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
java -version

Next, we need to install Flume:

yum install flume-agent flume

Next, we need to install Elasticsearch (number four in the list of guides above). Then open and modify the file elasticsearch.yml:

nano /etc/elasticsearch/elasticsearch.yml

These are the settings you need to modify:

cluster.name: "logsearch"
node.name: "node1"
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 1
path.data: /data1,/data2,/data3,/data4
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

Logs are written under /var/log/elasticsearch; you can cd there and inspect them later. Elasticsearch is controlled with the usual init commands:

/etc/init.d/elasticsearch start
/etc/init.d/elasticsearch status

Next, we need to install Kibana (also covered in the list of guides above). Update the logstash index pattern in Kibana to the pattern the Flume Elasticsearch sink writes: in app/dashboards/logstash.json, the entries [logstash-]YYYY.MM.DD need their date parts separated by dashes, i.e. [logstash-]YYYY-MM-DD. Now we need to work with Flume:
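That one-line change can also be scripted with sed. The sketch below demonstrates the substitution on a scratch copy; point the path at app/dashboards/logstash.json inside your Kibana install instead of the temporary file:

```shell
# Demonstrate the index-pattern substitution on a scratch file;
# replace "$f" with app/dashboards/logstash.json in your Kibana directory.
f=$(mktemp)
echo '"index": "[logstash-]YYYY.MM.DD",' > "$f"
sed -i 's/\[logstash-\]YYYY\.MM\.DD/[logstash-]YYYY-MM-DD/' "$f"
cat "$f"   # prints "index": "[logstash-]YYYY-MM-DD",
rm -f "$f"
```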

mkdir /usr/lib/flume/plugins.d
cp $elasticsearch_home/lib/elasticsearch-0.90*jar /usr/lib/flume/plugins.d
cp $elasticsearch_home/lib/lucene-core-*jar /usr/lib/flume/plugins.d

We can update the Flume configuration to tail a local file and index it into Elasticsearch in logstash format. In real deployments, the Flume Log4j Appender, Syslog TCP Source, Flume Client SDK or Spool Directory Source are used instead. The configuration below reads from a local file and is for testing only, so make sure you understand what you are doing:

agent.sources = tail
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory

agent.sources.tail.channels = memoryChannel
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /tmp/es_log.log
agent.sources.tail.interceptors = i1 i2 i3
agent.sources.tail.interceptors.i1.type = regex_extractor
agent.sources.tail.interceptors.i1.regex = (\w.*):(\w.*):(\w.*)\s
agent.sources.tail.interceptors.i1.serializers = s1 s2 s3
agent.sources.tail.interceptors.i1.serializers.s1.name = source
agent.sources.tail.interceptors.i1.serializers.s2.name = type
agent.sources.tail.interceptors.i1.serializers.s3.name = src_path
agent.sources.tail.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
agent.sources.tail.interceptors.i3.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent.sources.tail.interceptors.i3.hostHeader = host

agent.sinks = elasticsearch
agent.sinks.elasticsearch.channel = memoryChannel
agent.sinks.elasticsearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticsearch.batchSize = 100
agent.sinks.elasticsearch.hostNames = your.IP.here:9300
agent.sinks.elasticsearch.indexName = logstash
agent.sinks.elasticsearch.clusterName = logsearch
agent.sinks.elasticsearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
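If events arrive faster than the sink drains them, the memory channel can overflow. Its size can be bounded explicitly with the standard memory-channel properties; the values below are illustrative, not tuned:

```
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 100
```

transactionCapacity should be at least as large as the sink's batchSize so a full batch fits in one channel transaction.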

Now create the log file:

touch /tmp/es_log.log

Open it:

nano /tmp/es_log.log

and populate it with lines like these:

website:weblog:login_page weblog data1
website:weblog:profile_page weblog data2
website:weblog:transaction_page weblog data3
website:weblog:docs_page weblog data4
syslog:syslog:sysloggroup syslog data1
syslog:syslog:sysloggroup syslog data2
syslog:syslog:sysloggroup syslog data3
syslog:syslog:sysloggroup syslog data4
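To preview what the i1 interceptor pulls out of the first sample line, here is a rough GNU sed equivalent of its regex (Flume itself applies Java regex, and \w and \s are GNU extensions in sed, so this is only an approximation):

```shell
# Approximate the interceptor regex (\w.*):(\w.*):(\w.*)\s ;
# capture groups 1-3 become the source, type and src_path event headers.
echo 'website:weblog:login_page weblog data1' |
  sed -E 's/^(\w.*):(\w.*):(\w.*)\s.*$/source=\1 type=\2 src_path=\3/'
```

Note that the greedy groups make src_path absorb text up to the last whitespace it can; the sample data tolerates this because only the first two fields matter for routing.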

Restart Flume:

/etc/init.d/flume-agent restart
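Once events flow, the sink writes one index per day, named indexName followed by the event date; this is why the Kibana pattern earlier needs dashes. Assuming the sink's default yyyy-MM-dd suffix, today's target index can be computed as a quick sanity check:

```shell
# The ElasticSearchSink indexes events into <indexName>-<UTC date>,
# e.g. logstash-2014-05-30 (assumes the sink's default yyyy-MM-dd suffix).
echo "logstash-$(date -u +%Y-%m-%d)"
```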

Frankly, this is only a typical setup. You need to read:

https://github.com/elastic/elasticsearch-hadoop
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/reference.html

for case-specific setup and usage.


About Abhishek Ghosh: Abhishek Ghosh is a businessman, orthopaedic surgeon, author and blogger. You can keep in touch with him on Twitter at @AbhishekCTRL.