Whenever there is a problem with any component in your architecture, the first thing a system administrator does is check the logs related to that application, because logs give you an insight into the problem the application is facing.

If you are responsible for only a handful of servers and the applications running on them, you can simply go and grep the logs to collect information. But imagine a situation where you have hundreds of servers running very different types of applications. In such cases you can't go and run tail, then grep, or even use awk to format the output the way you want.

Although learning the art of formatting output with awk is a good thing, it just doesn't serve the purpose of troubleshooting, and doing the same on hundreds of servers will take hours and hours.

The problem can be partly solved by having a central logging server. But even with a central logging server you have only aggregated all the logs in one place; sorting through them and finding the problem during a disaster is still a nightmare.

It would be better to have a central logging server where all application and system logs are collected, along with a good interface that can filter and sort the different fields in the logs and present all of that in a web interface.

Below are the things we need to accomplish to have simple but efficient and scalable log management.

Integrated collection of logs at one place

Parsing of logs (so that you don't need to run tail, grep or awk to sort the output)

Storage and easy search.

Fortunately there is an open source tool available that can solve our problem. The tool is called logstash.

Logstash is free and open source, released under the Apache 2 license, and is a very efficient log management solution for Linux. The main added advantage is that logstash can collect log input from the following places.

Windows Event Logs

Syslog

TCP/UDP

Files (which means any custom log file that does not come under syslog, where your application sends log data)

STDIN

The places shown above cover almost all the log locations your infrastructure can have.

Logstash is written in JRuby, and it requires Java on your machine to run.

There are 5 important components inside our logstash log management tool. Let's see what they are.

A remote shipper component, which will be on all the agents (the clients/servers that will send their logs to our central logstash server)

Broker (this will receive log event data from the different agents, i.e. the remote servers)

Indexer (this will receive the log data from the remote agents and index it, so that we can search it later)

Searching and storage for the data

Web interface (where you can search and view your log details for a specified time range)

The logstash architecture diagram will look something like the one below. The diagram includes all the components we mentioned above.

How to install and configure Logstash?

Now let's go ahead and install our central logging server with logstash. For this example configuration I will be using an Ubuntu 12.04 server. I am using this exact configuration in one of my architectures, which runs a wide variety of applications; some of them are proprietary, while some are open source.

I am running this configuration in Amazon AWS, and my logstash server uses Amazon S3 as its storage. For this example tutorial, we will be using the local system for storage.

Logstash is an open source tool that is very easy to scale. You can separate each of its components so that you can scale them independently. As far as this tutorial is concerned, we will keep all of the components on one central server.

The main prerequisite for running the logstash central server is to have Java installed, because logstash runs inside a Java Virtual Machine. So let's see how to install Java on our Ubuntu system. Please remember that although I am showing this example configuration on an Ubuntu machine, you can use the same method to configure logstash on a CentOS/Red Hat system as well (of course you need to replace Ubuntu specific commands like apt-get with yum when configuring it in a Red Hat/CentOS environment).

So let's first install Java JDK on our central logstash server.
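
On Ubuntu, one straightforward choice is the OpenJDK package from the standard repositories (the package name below assumes OpenJDK 7; any recent JDK will do).

sudo apt-get update
sudo apt-get install openjdk-7-jdk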

Once you have installed the JDK, you can confirm it with the below command.
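
That is simply:

java -version

It should print the version details of the JDK we just installed.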

Now let's create a directory where we will place our logstash jar file. It is always better to create a new directory for your application to avoid confusion, and it also keeps things neat.

We will have three directories. One will hold the logstash application, another will hold the configuration files, and the third will hold the logs related to logstash itself.
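
Assuming the layout just described (you are free to pick different paths), the three directories can be created like this:

sudo mkdir -p /opt/logstash /etc/logstash /var/log/logstash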

Please note that it is not at all necessary to create the logstash folder in exactly the same location. You can create the logstash folder anywhere, or if you wish you can do without any folder at all. The only purpose is to organize the different files in a structured manner.

Now let's get inside our logstash application directory and download the logstash jar file from its official website using the wget command. This can be done as shown below.
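
A sketch of the download step. The version number and URL below are only placeholders, so grab the link of the current monolithic (flatjar) release from the official logstash download page:

cd /opt/logstash/
sudo wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.2-flatjar.jar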

The jar file we just downloaded contains both the agent and the server.

Now please keep one thing in mind: there is no difference in the package (jar file) as far as the logstash server and agents are concerned. The difference is only in the way you configure it. The logstash server will run this same jar file with server specific configuration, and the logstash agents will run this same jar file with shipper specific configuration.

We now need to install a broker. If you remember the architecture diagram, the log data sent by the different servers is received by a broker on the logstash server.

So the idea of having a broker is to hold the log data sent by the agents until logstash indexes it. Having a broker will enhance the performance of the logstash server. You can use any of the brokers mentioned below.

Redis server

AMQP (Advanced Message Queuing Protocol)

ZeroMQ

We will be using the first one as our broker, as it is easy to configure. So our broker will be a redis server.

Let's go ahead and install redis server with the below command.
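
On Ubuntu the package is called redis-server:

sudo apt-get install redis-server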

If you are using a CentOS/Red Hat distribution, then first install the EPEL yum repository, in order to get the latest version of redis.

Redis acts like a buffer for the log data until logstash indexes and stores it. As the data is in RAM, it is very fast.

Now we need to modify the redis configuration file so that it listens on all interfaces, not only on localhost 127.0.0.1. This can be done by commenting out the below line in the redis configuration file /etc/redis/redis.conf.
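
The line in question is the bind directive. Putting a # in front of it stops redis from binding only to localhost:

# bind 127.0.0.1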

Now restart the redis server with the below command.
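
On Ubuntu the service is named redis-server:

sudo service redis-server restart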

Now we have our redis server listening on all network interfaces, ready to receive log data from remote servers. To test the redis configuration, there is a tool that comes along with the redis package called redis-cli. Redis listens on port 6379 by default, and redis-cli will try to connect to that port by default.

Testing the redis server can be done by connecting to our redis instance with the redis-cli command and then issuing the command "PING". If all is well, you should get an output of "PONG".
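
For example, from any host that can reach the broker (192.168.0.106 is our central server in this tutorial):

redis-cli -h 192.168.0.106 -p 6379 ping
PONG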

You can alternatively issue the same PING command by simply telnetting to port 6379 from any remote server.
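
Something like the following; type PING once the connection is established, and redis answers with +PONG over the raw protocol:

telnet 192.168.0.106 6379
PING
+PONG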

Now our redis broker is ready. But one main thing is still pending on our logstash server: the searching and indexing tool. The main objective of a central log server is to collect all the logs in one place and to provide some meaningful data for analysis; for example, you should be able to search all log data for a particular application during a specified time period.

Hence there must be good searching and indexing capability on our logstash server. To achieve this, we will install another open source tool called elasticsearch.

Elasticsearch builds an index and then searches that index to make queries faster. It is a kind of search engine for text data. Elasticsearch works in a clustered model by default. Even if you have only one elasticsearch node, it will still be part of a cluster that you define in its configuration file (in our case we will have only one elasticsearch node, which will be part of a cluster). This is done to achieve scalability easily: if we need more elasticsearch nodes in the future, we can simply add another Linux host and specify the same cluster name in its elasticsearch configuration, along with the node addresses.

Tools like elasticsearch and redis are big topics in themselves, and I cannot discuss them in complete detail in this tutorial (because the primary objective of this tutorial is to get a central logging server with logstash up and running). I am sorry for that. However, I will surely include tutorials related to redis and elasticsearch in upcoming posts.

Now let's go ahead and install elasticsearch on our logstash server. The main prerequisite for installing elasticsearch is Java, which we have already installed in the previous section.

Elasticsearch is available for download from the below location.

Download Elasticsearch

We will be downloading the Debian package of elasticsearch, as we are setting up our logstash central server on an Ubuntu machine. If you visit the elasticsearch download page, you can get RPM and tar.gz packages as well. Let's download the Debian package on our logstash server with the wget command as shown below.
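
Again, the version and URL below are placeholders; copy the link of the current .deb package from the elasticsearch download page:

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.deb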

The next step is to simply go ahead and install elasticsearch with dpkg command as shown below.
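
Using the file downloaded above (adjust the file name to the version you actually fetched):

sudo dpkg -i elasticsearch-0.90.7.deb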



Once you have installed elasticsearch, the service should get started automatically. If the elasticsearch service is not started, you can always start it with the below command.
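
The deb package installs an init script, so the service can be started with:

sudo service elasticsearch start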

If you remember, I previously mentioned that elasticsearch always works as a cluster, even if it is only a one node cluster. Installing elasticsearch and starting it will by default make it part of some cluster. You can find the current cluster name in the elasticsearch configuration file /etc/elasticsearch/elasticsearch.yml.

You will find the below two lines inside the elasticsearch yml configuration file.
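
The relevant directives are cluster.name and node.name. Commented out, they look roughly like this (the example values shipped in the file vary between versions):

#cluster.name: elasticsearch
#node.name: "some example name"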

Both of the above lines are commented out by default. You need to uncomment them and rename the cluster with your desired name. For this example, we will name our elasticsearch cluster "logstash". Also rename the node with your desired name. Any future node with the same cluster name will become part of this cluster.
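
After uncommenting and editing, the two lines could read, for example (the node name here is just my own example):

cluster.name: logstash
node.name: "logstash-central"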

Once done with the renaming, restart the elasticsearch service for the changes to take effect.
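
The restart command is:

sudo service elasticsearch restart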

Elasticsearch runs on port 9200 by default. You can double check this by browsing to <your IP address>:9200. You should get some status and version details at that URL.
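
You can also check it from the command line:

curl http://localhost:9200

The response is a small JSON document containing the status, the node name and the version details.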

Now we have the three main components of our central logging server ready: our broker, which is the redis server; the logstash jar file; and the indexing and searching tool called elasticsearch. But we have not yet told our logstash server (which is basically the jar file we downloaded previously and kept at /opt/logstash/) about our broker and elasticsearch.

We first need to create a configuration file for logstash in the /etc/logstash directory we created, describing these two components (elasticsearch & redis). Once we have the configuration file describing these components ready, we can start our logstash server using the jar file inside /opt/logstash/. While starting our logstash server, we will pass the configuration file inside /etc/logstash as an argument. We also need to provide a logging location where logstash server events will be logged (this is the regular /var/log/logstash folder).

Please do not confuse the /var/log/logstash directory with the central logging location our logstash server will be using. Logs from all the different servers will be stored inside the elasticsearch storage location. We will see that in a little while.

We can name our logstash configuration file anything we like; whatever name you provide does not matter, you simply need to pass that file as an argument while starting the logstash server. Our central logstash configuration file will look something like the below. We will call our logstash server configuration file server.conf. Create a file named server.conf inside /etc/logstash/, and copy paste the below contents into it.
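
Here is a minimal sketch of what server.conf can contain, matching the description that follows. The option names are from the logstash 1.x configuration syntax, so double check them against the documentation of the version you downloaded:

input {
  redis {
    # our broker: the redis instance running on this central server
    host => "192.168.0.106"
    data_type => "list"
    # the redis key (list) that the agents will push their log events to
    key => "logstash"
    type => "redis-input"
  }
}

output {
  elasticsearch {
    # name of the elasticsearch cluster we configured earlier
    cluster => "logstash"
  }
}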

Logstash has this concept of different blocks inside its configuration. It has an input block, which tells it where to get input from, and an output block, which tells it where to send the output for storage and indexing.

Our input and output blocks are very simple to understand. They say: take input from the redis instance on 192.168.0.106 (which is our host itself) and read whatever the agents have pushed into redis under the key "logstash".

Once the data from the redis input is processed, it is handed to the elasticsearch cluster named "logstash" for storage and search.

You might be wondering how logstash understands the language we used while creating the server.conf file: how will it know what elasticsearch is, how will it know what redis is, and so on. Don't worry about that; the logstash jar file (the single component you need for logstash to run) knows about these different types of brokers, inputs and outputs.

Now our main logstash central server configuration file is ready. Let's go ahead and start our logstash service using the jar file in the /opt/logstash/ directory. You can start logstash with the below command.
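
Assuming the placeholder jar name used earlier, the startup command looks roughly like this; -f points to our configuration file and -l to the file where logstash writes its own events (check the jar's --help output for your version):

java -jar /opt/logstash/logstash-1.2.2-flatjar.jar agent -f /etc/logstash/server.conf -l /var/log/logstash/logstash.log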

Although you can start the logstash server with the above command, a better way to start/stop/restart the logstash service is to make an init script for it. There is already a nice init script for logstash at the github location below. Modify the script according to your configuration file name and it should work.



Logstash Init Script to start and stop the service

Please let me know through the comments if you are unable to get it working with the above init script.

Now we have two tasks pending. One is the kibana web interface for logstash, and the second is configuring our first agent (client server) to send logs to our central logstash server.

How to configure kibana web interface for logstash?

Logstash comes pre-built with a kibana console. This default kibana console can be started with the below command. And remember the fact I mentioned previously: all the components required for logstash to run come pre-built inside the single jar file we downloaded, hence the web component with kibana can also be started using the logstash jar file as shown below.
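
With the same placeholder jar name, the built-in kibana console can be started roughly like this (in the 1.x releases it listens on port 9292 by default):

java -jar /opt/logstash/logstash-1.2.2-flatjar.jar web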

But a better method of running a kibana console is to have an nginx web server serving a document root that contains the kibana console. Hence to run kibana with nginx, we first need to install nginx. As we are doing all of this on an Ubuntu 12.04 server, installing nginx is a single command away.
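
That single command is:

sudo apt-get install nginx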

If you are using a Red Hat or CentOS based distribution, you can install nginx by following the below post.

Read: How to install nginx web server in Red Hat and CentOS

Now that we have our nginx web server installed, we need to download the kibana package that can be served using nginx. This can be done using the wget command with the URL shown below. Please download the below file into the nginx doc root (by default the Ubuntu nginx doc root is located at /usr/share/nginx/).
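
The URL below is my assumption for the kibana 3 source archive on github; it unpacks into a kibana-master directory, which is the "kibana master folder" referred to next. Check the kibana site for the current download link:

cd /usr/share/nginx/
sudo wget https://github.com/elasticsearch/kibana/archive/master.zip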

Unzip the downloaded file and then change the default document root for nginx so that it points to the newly unzipped kibana master directory (simply modify the file /etc/nginx/sites-available/default and make the root directive point to the kibana master folder we just uncompressed).
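
A sketch of those two steps; the original document root shown below may differ on your install:

sudo unzip master.zip

# In /etc/nginx/sites-available/default, change the root directive, e.g. from
#     root /usr/share/nginx/www;
# to
#     root /usr/share/nginx/kibana-master;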

By default the kibana console package we downloaded for nginx will try connecting to elasticsearch on localhost. Hence there is no modification required.

Restart nginx and point your browser to your server IP, and it will take you to the default kibana logstash console.
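
The restart is the usual:

sudo service nginx restart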

How to configure logstash agents to send logs to our logstash central server?

Now let's look at the second remaining task. We need to configure our first agent to send logs to our newly installed logstash server. It is pretty simple. I have outlined the steps below to configure the first logstash agent.

Step 1

Create the exact directory structure we used for the logstash central server, as shown below, because the agents also need a configuration file in /etc/logstash, log files inside /var/log/logstash, and the logstash jar file inside /opt/logstash/.
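
That is, the same layout as on the central server:

sudo mkdir -p /opt/logstash /etc/logstash /var/log/logstash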

Step 2

Download the logstash jar file (the same jar file we used for the logstash server) and place it inside the directory /opt/logstash (as I mentioned at the beginning of this article, the directory structure is just to keep things organized; you can place the configuration file and jar file at any location you like).

Step 3

Create the configuration file for our agent. This part is slightly different. Here the inputs will be files (the log files that you want to send to our central log server), and the output will be the redis broker installed on the central logstash server.

The input block will contain a list of files (log files of any application, or even syslog) that need to be sent to our central logstash server. Let's see what the agent configuration file will look like.

Create a file in /etc/logstash/ called shipper.conf (or agent.conf, or anything you like). The contents should look like the below.
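
A minimal sketch of such a shipper.conf, matching the description below. Again, the option names follow the logstash 1.x syntax and should be checked against your version:

input {
  file {
    # the log files on this agent that should be shipped to the central server
    path => [ "/var/log/auth.log", "/var/log/syslog" ]
    type => "syslog"
  }
}

output {
  redis {
    # IP address of the central logstash server running the redis broker
    host => "192.168.0.106"
    data_type => "list"
    # must match the key that the central server's redis input reads from
    key => "logstash"
  }
}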

As you can see, I am sending my /var/log/auth.log and /var/log/syslog to our central logstash server.

Also notice the output block: we are sending this data to the redis server on 192.168.0.106 (which is the central logstash server we installed; replace this with your central logstash server IP address). Remember that the input block on the central logstash server was redis. Hence whatever the agents send to redis, the logstash central server will collect it from redis and hand the filtered data over to elasticsearch (because the output block on the central server was elasticsearch).

As we have our logstash agent ready with its configuration, it is better to start and stop this agent with an init script similar to the one for our logstash central server. The init script for the logstash agent can be found at the same github location; refer to the shipper section at the below URL. Please modify the configuration file location appropriately to suit your config files.

Logstash Agent Init Script

And that's it. Our first agent is now sending log data to our central log server.

You should now be able to see these messages appear on the kibana console we configured on the central logstash server. Please navigate to the default logstash dashboard.

Although we now have a central logging server for our architecture that receives logs from different agents, the data we are collecting is not yet very meaningful. Because we are simply forwarding the logs from the agents to the central server, even though kibana shows us the data in the web interface, it is not meaningful enough for analysis.

If you remember, our main objective behind having a central logging server like logstash was not just to store all logs in one place. The objective was to get rid of running tail, grep and awk against these log messages.

So we need to filter the data that we are sending to our central server, so that kibana can show us some meaningful data. Let's take the example of a DNS query log. A query log entry looks something like the below.
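
For example, a BIND style query log entry looks roughly like this (all values are made up for illustration):

17-Feb-2014 10:12:40.801 client 192.168.0.24#41060: query: www.example.com IN A + (192.168.0.10)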

We need to filter the above log before sending it to our logstash server. By filtering I mean we need to tell logstash what each field in that log message is: what is the first field, what is the second field, and so on. That way, whenever I want to troubleshoot, I can find out, for example, what the DNS queries looked like between 10 AM and 11 AM this morning.

The output I should get for analysis in the kibana console must be something like the timestamp, the source IP, the DNS query type, the queried domain name, and so on.

I will be writing a dedicated post for each of the topics mentioned below. Till then stay tuned.

How to filter logs sent to logstash.

How to send windows event logs to logstash.

How to manage storage with elasticsearch in logstash.

Hope this article was helpful in getting your logstash central log server and agents ready and running.