Introduction

Any production incident quickly turns scary when a flood of logs overwhelms your logging infrastructure and you, on the other side, are unable to extract any meaningful data from it. If you are shipping logs to Logstash via Beats, it is better to use a buffering mechanism. Kafka acting as a buffer in front of Logstash, to ensure resiliency, is the best way to deploy the ELK Stack and reduce log overload.

Apache Kafka is the most common buffer solution deployed together with the ELK Stack. Kafka sits between the log shippers and the indexing units, acting as a segregation point for the data being collected.

In this story, we'll see how to deploy all the components required to set up a resilient log pipeline with Apache Kafka and the ELK Stack:

Beats — Collects logs and forwards them to a Kafka topic.

Kafka — Buffers the data flow and queues it.

Logstash — Aggregates the data from the Kafka topic, processes it, and sends it to Elasticsearch.

Elasticsearch — Indexes and maps the data.

Kibana — Visualizes the mapped data for the end user.

Prerequisites:

For setting up the environment I am using Microsoft Azure VMs, since I have unused credits there; you can do the same on AWS EC2. I am running an Ubuntu 18.04 VM. Just make sure to put it in a public subnet within a proper VNet in Azure, or in a public subnet of a VPC in AWS. Add inbound security rules for port 22 (SSH) and port 5601 (TCP) to allow SSH and Kibana connections.

I am using Apache access logs for the pipeline; you can use VPC Flow Logs, ALB access logs, etc.

We will start with installing the main component in the stack — Elasticsearch.

Log in to your Ubuntu system with a user that has sudo privileges. For a remote Ubuntu server, use SSH to access it. Windows users can use PuTTY or PowerShell to log in to the Ubuntu system.

Elasticsearch requires Java to run on any system. Install OpenJDK 11 with the following command:

sudo apt install openjdk-11-jdk-headless

Check whether the installation was successful with the command below, which shows the current Java version:

~$ java --version

openjdk 11.0.3 2019-04-16

OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)

OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)

Step 1: Installing Elasticsearch on Ubuntu

The official Elasticsearch team provides an apt repository for installing Elasticsearch on Ubuntu Linux systems. Before installing the package, we need to import the GPG key for the Elasticsearch packages.

Download and install the public signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Now you may need to install the apt-transport-https package on Debian-based systems before proceeding:

sudo apt-get install apt-transport-https

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

Save the repository definition to /etc/apt/sources.list.d/elastic-7.x.list :

echo “deb https://artifacts.elastic.co/packages/7.x/apt stable main” | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

You can install the Elasticsearch Debian package with:

sudo apt-get update && sudo apt-get install elasticsearch

Before we bootstrap Elasticsearch, we need to apply some basic configurations using the Elasticsearch configuration file at: /etc/elasticsearch/elasticsearch.yml:

sudo su

nano /etc/elasticsearch/elasticsearch.yml

Since we are installing Elasticsearch on Azure/AWS, we will bind Elasticsearch to the instance's private IP. We also need to define that private IP as a master-eligible node:

network.host: "<InstancePrivateIP>"

http.port: 9200

cluster.initial_master_nodes: ["<InstancePrivateIP>"]

Save the file and run Elasticsearch with:

sudo service elasticsearch start
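To have Elasticsearch start automatically on boot, you can also enable the service:

sudo systemctl enable elasticsearch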

To confirm that everything is working as expected, point curl to http://<InstancePrivateIP>:9200, and you should see something like the following output (give Elasticsearch a minute or two before you start to worry about not seeing any response):
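For example (a sketch; the node name shown here is a placeholder, and values such as cluster_uuid and the exact version numbers will differ on your instance):

curl http://<InstancePrivateIP>:9200

{
  "name" : "elk-node",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "7.3.0",
    "build_flavor" : "default",
    ...
  },
  "tagline" : "You Know, for Search"
}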

Step 2: Installing Logstash

Next up, the “L” in ELK — Logstash. Since we already added the Elastic repository above, installing it is easy. Just type the following command:

sudo apt-get install logstash -y

Next, we will configure a Logstash pipeline that pulls our logs from a Kafka topic, processes these logs, and ships them on to Elasticsearch for indexing.

Let’s create a new config file:

sudo nano /etc/logstash/conf.d/apache.conf

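Here is a minimal sketch of what apache.conf can look like; the broker address localhost:9092 and the topic name apache are assumptions for this sketch, so adjust them to match your Kafka deployment:

input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed Kafka broker address
    topics => ["apache"]                    # assumed topic name
  }
}

filter {
  grok {
    # parse raw Apache access log lines into structured fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # use the request timestamp from the log as the event timestamp
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

output {
  elasticsearch {
    hosts => ["http://<InstancePrivateIP>:9200"]
  }
}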

As you can see — we’re using the Logstash Kafka input plugin to define the Kafka host and the topic we want Logstash to pull from. We’re applying some filtering to the logs and we’re shipping the data to our local Elasticsearch instance.

Step 3: Installing Kibana

Let’s move on to the next component in the ELK Stack — Kibana. As before, we will use a simple apt command to install Kibana:

sudo apt-get install kibana

We will then open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure we have the correct configurations defined:

server.port: 5601

server.host: "<INSTANCE_PRIVATE_IP>"

elasticsearch.hosts: ["http://<INSTANCE_PRIVATE_IP>:9200"]

Then enable and start the Kibana service:

sudo systemctl enable kibana

sudo systemctl start kibana

We also need to install Filebeat. Use:

sudo apt install filebeat
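Before starting Filebeat, it needs to be pointed at the Apache access logs and configured to publish to the Kafka topic. Here is a minimal sketch of /etc/filebeat/filebeat.yml, again assuming a broker at localhost:9092 and a topic named apache (adjust the log path to your Apache setup):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/access.log   # default Apache access log path on Ubuntu

output.kafka:
  hosts: ["localhost:9092"]   # assumed Kafka broker address
  topic: apache               # assumed topic name, matching the Logstash input

Since Filebeat allows only one output, make sure the default output.elasticsearch section is commented out, then start the service with sudo systemctl start filebeat.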

Open up Kibana in your browser with: http://<PUBLIC_IP>:5601. You will be presented with the Kibana home page.

I will continue to edit this post, or add a new part, to cover the pipeline addition and other materials.