Log aggregation, visualization, analysis, and monitoring of Dockerized microservices using the ELK Stack (Elasticsearch, Logstash, and Kibana) and Logspout

Introduction

In the last series of posts, we learned how to use Jenkins CI, Maven, Docker, Docker Compose, and Docker Machine to take a set of Java-based microservices from source control on GitHub, to a fully tested set of integrated Docker containers running within an Oracle VirtualBox VM. We performed integration tests, using a scripted set of synthetic transactions, to make sure the microservices were functioning as expected, within their containers.

In this post, we will round out our Virtual-Vehicles microservices REST API project by adding log aggregation, visualization, analysis, and monitoring, using the ELK Stack (Elasticsearch, Logstash, and Kibana) and Logspout.

All code for this post is available on GitHub, release version v3.1.0 on the ‘master’ branch (after running ‘git clone …’, run a ‘git checkout tags/v3.1.0’ command).

Logging

If you’re using Docker, then you’re familiar with the command, ‘ docker logs container-name command ‘. This command streams the log output of running services within a container, commonly used to debugging and troubleshooting. It sure beats ‘ docker exec -it container-name cat /var/logs/foo/foo.log ‘ and so on, for each log we need to inspect within a container.

With Docker Compose, we gain the command, ‘ docker-compose logs ‘. This command stream the log output of running services, of all containers defined in our ‘ docker-compose.yml ‘ file. Although moderately more useful for debugging, I’ve also found it fairly buggy when used with Docker Machine and Docker Swarm.

As helpful as these type of Docker commands are, when you start scaling from one container, to ten containers, to hundreds of containers, individually inspecting container logs from the command line is time-consuming and of little value. Correlating log events between containers is impossible. That’s where solutions such as the ELK Stack and Logspout really shine for containerized environments.

ELK Stack

Although not specifically designed for the purpose, the ELK Stack (Elasticsearch, Logstash, and Kibana) is an ideal tool-chain for log aggregation, visualization, analysis, and monitoring. Individually setting up Elasticsearch, Logstash, and Kibana, and configuring them to communicate with each other is not a small task. Luckily, there are several ready-made Docker images on Docker Hub, whose authors have already done much of the hard work for us. After trying several ELK containers on Docker Hub, I chose on the willdurand/elk image. This image is easy to get started with, and is easily used to build containers using Docker Compose.

Logspout

Using the ELK Stack, we have a way to collect (Logstash), store and search (Elasticsearch), and visualize and analyze (Kibana) our container’s log events. Although Logstash is capable of collecting our log events, to integrate more easily with Docker, we will add another component, Glider Lab’s Logspout, to our tool-chain. Logspout advertises itself as “a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.”

Since Logspout is extensible through third-party modules, we will use one last component, Loop Lab’s Logspout/Logstash Adapter. Written in the go programming language, the adapter is described as “a minimalistic adapter for Glider Lab’s Logspout to write to Logstash UDP”. This adapter will allow us to collect Docker’s log events with Logspout and send them to Logstash using User Datagram Protocol (UDP).

In order to use the Logspout/Logstash adapter, we need to build a Logspout container from the gliderlabs/logspout Docker image, which contains a customized version of Logspout’s modules.go configuration file. This is explained in the Custom Logspout Builds section of Logspout’s README.md. Below is the modified configuration module with the addition of the adapter (see last import statement).

package main import ( _ "github.com/gliderlabs/logspout/adapters/raw" _ "github.com/gliderlabs/logspout/adapters/syslog" _ "github.com/gliderlabs/logspout/httpstream" _ "github.com/gliderlabs/logspout/routesapi" _ "github.com/gliderlabs/logspout/httpstream" _ "github.com/gliderlabs/logspout/transports/udp" _ "github.com/looplab/logspout-logstash" )

One note with Logspout, according to their website, for now it Logspout only captures stdout and stderr, but a module to collect container syslog is planned. Although syslog is common centralized log collection method, the Docker logs we will collect are sent to stdout and stderr, the lack of syslog support is not a limitation for us, in this demonstration.

We will configure Logstash to accept log events from Logspout, using UDP on port 5000. Below is an abridged version of the logstash-logspout-log4j2.conf configuration file. The except from the configuration file, below, instructs Logstash to listen for Logspout’s messages over UDP on port 5000, and passes them to Elasticsearch.

input { udp { port => 5000 codec => json type => "dockerlogs" } # filtering section not shown... output { elasticsearch { protocol => "http" } stdout { codec => rubydebug } }

We could spend several posts on the configuration of Logstash. There are an infinite number of input, filter, and output combinations, to collect, transform, and push log events to various programs, including Logstash. The filtering section alone takes some time to learn exactly how to filter and transform log events, based upon the requirements for visualization and analysis.

Apache Log4j Logs

What about our Virtual-Vehicle microservice’s Log4j 2 logs? In the previous posts, you’ll recall we were sending our log events to physical log files within each container, using Log4j’s Rolling File appender.

<Appenders> <RollingFile name="RollingFile" fileName="${log-path}/virtual-authentication.log" filePattern="${log-path}/virtual-authentication-%d{yyyy-MM-dd}-%i.log" > <PatternLayout> <pattern>%d{dd/MMM/yyyy HH:mm:ss,SSS}- %c{1}: %m%n</pattern> </PatternLayout> <Policies> <SizeBasedTriggeringPolicy size="1024 KB" /> </Policies> <DefaultRolloverStrategy max="4"/> </RollingFile> </Appenders>

Given the variety of appenders available with Log4j 2, we have a few options to leverage the ELK Stack with these logs events. The least disruptive change would be to send the Log4j log events to Logspout by redirecting Log4j output from the physical log file to stdout. We could do this by running a Linux link command in each microservice’s Dockerfile, as in the following example with Authentication microservice.

RUN touch /var/log/virtual-authentication.log && \ ln -sf /dev/stdout /var/log/virtual-authentication.log

This method would not require us to change the log4j2.xml configuration files, and rebuild the services. However, the alternative we will use in this post is switching to Log4j’s Syslog appender. According to Log4j documentation, the Syslog a ppender is a Socket appender that writes its output to a remote destination specified by a host and port in a format that conforms with either the BSD Syslog format or the RFC 5424 format. The data can be sent over either TCP or UDP.

To use the Syslog appender option, we do need to change each log4j2.xml configuration file, and then rebuild each of the microservices. Instead of using UDP over port 5000, which is the port Logspout is currently using to communicate with Logstash, we will use UDP over port 5001. Below is a sample of the log4j2.xml configuration files for the Authentication microservice.

<Appenders> <Syslog name="RFC5424" format="RFC5424" host="elk" port="5001" protocol="UDP" appName="virtual-authentication" includeMDC="true" facility="SYSLOG" enterpriseNumber="18060" newLine="true" messageId="log4j2" mdcId="mdc" id="App" connectTimeoutMillis="1000" reconnectionDelayMillis="5000"> <LoggerFields> <KeyValuePair key="thread" value="%t"/> <KeyValuePair key="priority" value="%p"/> <KeyValuePair key="category" value="%c"/> <KeyValuePair key="exception" value="%ex"/> <KeyValuePair key="message" value="%m"/> </LoggerFields> </Syslog> </Appenders>

To communicate with Logstash over port 5001 with the Syslog appender, we also need to modify the logstash-logspout-log4j2.conf configuration file, again. Below is the unabridged version of the configuration file, with both the Logspout (UDP port 5000) and Log4j (UDP port 5001) configurations.

input { udp { port => 5000 codec => json type => "dockerlogs" } udp { type => "log4j2" port => 5001 } } filter { if [type] == "log4j2" { mutate { gsub => ['message', "

", " "] gsub => ['message', "\t", " "] } } if [type] == "dockerlogs" { if ([message] =~ "^\tat ") { drop {} } grok { break_on_match => false match => [ "message", " responded with %{NUMBER:status_code:int}" ] tag_on_failure => [] } grok { break_on_match => false match => [ "message", " in %{NUMBER:response_time:int}ms" ] tag_on_failure => [] } } } output { elasticsearch { protocol => "http" } stdout { codec => rubydebug } }

You will note some basic filtering in the configuration. I will touch upon this in the next section. Below is a diagram showing the complete flow of log events from both Log4j and from the Docker containers to Logspout and the ELK Stack.

Troubleshooting and Debugging

Trying to troubleshoot why log events may not be showing up in Kibana can be frustrating, without methods to debug the flow of log events along with way. Were the stdout Docker log events successfully received by Logspout? Did Logspout successfully forward the log events to Logstash? Did Log4j successfully push the microservice’s log events to Logstash? Probably the most frustrating of all issues, did you properly configure the Logstash configuration file(s) to receive, filter, transform, and push the log events to Elasticsearch. I spent countless hours debugging filtering, alone. Luckily, there are several ways to ensure log events are flowing. The below diagram shows some of the debug points along the way.

First, we can check that the log events are making to Logspout from Docker by cURLing or browsing port 8000. Executing ‘ curl -X GET --url http://api.virtual-vehicles.com:8000/logs ‘ will tail incoming log events received to Logspout. You should see log events flowing into Logspout as you call the microservices through NGINX, by running the project’s integration tests, as shown in the example, below.

Second, we can cURL or browse port 9200. This port will display information about Elasticsearch. There are several useful endpoints exposed by Elasticsearch’s REST API interface. Executing ‘ curl -X GET --url http://api.virtual-vehicles.com:9200/_status?pretty ‘ will display statistics about Elasticsearch, including the number of log events, referred to as ‘documents’ to Elasticsearch’s structured JSON document-based NoSQL datastore. Note the line, ‘ "num_docs": 469 ‘, indicating 469 log events were captured by Elasticsearch as documents.

{ "_shards": { "total": 32, "successful": 16, "failed": 0 }, "indices": { "logstash-2015.08.01": { "index": { "primary_size_in_bytes": 525997, "size_in_bytes": 525997 }, "translog": { "operations": 492 }, "docs": { "num_docs": 469, "max_doc": 469, "deleted_docs": 0 } } } }

If you find log events are not flowing into Logstash, a quick way to start debugging issues is to check Logstash’s log:

docker exec -it jenkins_elk_1 cat /var/log/logstash/stdout.log

If you find log events are flowing into Logstash, but not being captured by Elasticsearch, it’s probably your Logstash configuration file. Either the input, filter, and/or output sections are wrong. A quick way to debug these types of issues is to check Elasticsearch’s log. I’ve found this log often contains useful and specific error messages, which can help fix Logstash configuration issues.

docker exec -it jenkins_elk_1 cat /var/log/elasticsearch/logstash.log

Without log event documents in Elasticsearch, there is no sense moving on to Kibana. Kibana will have no data available to display.

Kibana

If you recall from our last post, the project already has Graphite and StatsD configured and running, as shown below. On its own, Graphite provides important monitoring and performance information about our microservices. In fact, we could choose to also send all our Docker log events, through Logstash, to Graphite. This would require some additional filtering and output configuration.

However, our main interest in this post is the ELK Stack. The way we visualize and analyze the log events we have captured is through Kibana. Kibana resembles other popular log aggregators and log search and analysis products, like Splunk, Graylog, and Sumo Logic. I suggest you familiarize yourself with Kibana before diving into the this part of the demonstration. Kibana can be confusing at first, if you are not familiar with it’s indexing, discovery, and search features.

We can access Kibana from our browser, at port 8200, ‘ http://api.virtual-vehicles.com:8200 ‘. The first interactions with Kibana will be through the Discover view, as seen in the screen grab shown below. Kibana displays the typical vertical bar chart event timeline, based on log event timestamps. The details of each log event are displayed below the timeline. You can filter and search within this view. Searches can be saved and used later.

Heck, just the ability to view and search all our log events in one place is a huge improvement over the command line. If you look a little closer at the actual log events, as shown below, you will notice two types, ‘ dockerlogs ‘ and ‘ log4j2 ‘. Looking at the Logstash configuration file again, shown previously, you see we applied the ‘ type ‘ tag to the log events as they were being processed by Logstash.

In the Logstash configuration file, shown previously, you will also note the use of a few basic filters. I created a ‘ status_code ‘ and ‘ response_time ‘ filter, specifically for the Docker log events. Each Docker log event is passed through the filters. The two fields, ‘ status_code ‘ and ‘ response_time ‘, are extracted from the main log event text and added as separate, indexable, and searchable fields. Below is an example of one such Docker log event, an HTTP DELETE call to the Valet microservice, shown as JSON. Note the two fields, showing a response time of 13ms and a http status code of 204.

{ "_index": "logstash-2015.08.01", "_type": "dockerlogs", "_id": "AU7rcyxTA4OY8JukKyIv", "_score": null, "_source": { "message": "DELETE http://api.virtual-vehicles.com/valets/55bd30c2e4b0818a113883a6 responded with 204 No Content in 13ms", "docker.name": "/jenkins_valet_1", "docker.id": "7ef368f9fdca2d338786ecd8fe612011aebbfc9ad9b677c21578332f7c46cf2b", "docker.image": "jenkins_valet", "docker.hostname": "7ef368f9fdca", "@version": "1", "@timestamp": "2015-08-01T22:47:49.649Z", "type": "dockerlogs", "host": "172.17.0.7", "status_code": 204, "response_time": 13 }, "fields": { "@timestamp": [ 1438469269649 ] }, "sort": [ 1438469269649 ] }

For comparison, here is a sample Log4j 2 log event, generated by a JsonParseException . Note the different field structure. With more time spent modifying the Log4j event format, and configuring Logstash’s filtering and transforms, we could certainly improve the usability of Log4j log events.

{ "_index": "logstash-2015.08.02", "_type": "log4j2", "_id": "AU7wJt8zA4OY8JukKyrt", "_score": null, "_source": { "message": "<43>1 2015-08-02T20:42:35.067Z bc45ce804859 virtual-authentication - log4j2 [mdc@18060 category=\"com.example.authentication.objectid.JwtController\" exception=\"\" message=\"validateJwt() failed: JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value at [Source: java.io.StringReader@12a24457; line: 1, column: 27\\]\" priority=\"ERROR\" thread=\"nioEventLoopGroup-3-9\"] validateJwt() failed: JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value at [Source: java.io.StringReader@12a24457; line: 1, column: 27] ", "@version": "1", "@timestamp": "2015-08-02T20:42:35.188Z", "type": "log4j2", "host": "172.17.0.9" }, "fields": { "@timestamp": [ 1438548155188 ] }, "sort": [ 1438548155188 ] }

Kibana Dashboard

To demonstrate the visualization capabilities of Kibana, we will create a Dashboard. Our Dashboard will be composed of a series of Kibana Visualizations. Visualizations are charts, graphs, tables, and metrics, based on the log events we see in the Discovery view. Below, I have created a rather basic Dashboard, containing some simple data visualization, based on our Docker and Log4j log events, collected over a 1-hour period. This one small screen-grab does not begin to do justice to the real power of Kibana.

In the dashboard above, you see a few basic metrics, such as request response times, response http status code, a chart of which containers are logging events, a graph that shows log events captured per minute, and so forth. Along with Searches, Visualizations and Dashboards can also be saved in Kibana. Note this demonstration’s Docker Compose YAML file does not configure volume mapping between the containers and host. If you destroy the containers, you destroy anything you saved in Kibana.

A key feature of Kibana’s Dashboards is their interactive capabilities. Rolling over any piece of a Visualization brings up an informative pop-up with additional details. For example, as shown below, rolling over the http status code ‘500’ pie chart slice, pops up the number of status code 500 responses. In this case, 15 log events, or 1.92% of the total 2,994 log events captured, had a ‘status_code’ field of ‘500’, within the 24-hour period the Dashboard analyzed.

Conveniently, Kibana also allows you to switch from a visual mode to a data table mode, for any Visualization on the Dashboard, as shown below, for a 24-hour period.

Conclusion

The ELK Stack is just one of a number of enterprise-class tools available to monitor and analyze the overall health of your applications running within a Dockerized environment. Having well planned logging, monitoring, and analytics strategies is key to this type of project. They should be implemented from the beginning of the project, to increase development and testing velocity, as well as provide quick troubleshooting, key business metrics, and proactive monitoring, once the application is in production.

Notes on Running the GitHub Project

If you download and run this project from GitHub, there is two key steps you should note. First, you need add an entry to your local /etc/hosts file. The IP address will be that of the Docker Machine VM, ‘test’. The hostname is ‘api.virtual-vehicles.com’. which matches the one I used throughout the demo. You should run the following bash command before building your containers from the docker-compose.yml file, but after you have built your VM using Docker Machine. The ‘test’ VM must already exist.

echo "$(docker-machine ip test) api.virtual-vehicles.com" \ | sudo tee --append /etc/hosts

If you want to override this domain name with your own, you will need to modify and re-build the microservices project, first. Then, copy those build artifacts into this project, replacing the ones you pulled from GitHub.

Second, in order to achieve HATEOAS in my REST API responses, I have included some variables in my docker-compose.yml file. Wait, docker-compose.yml doesn’t support variables? Well, it can if you use a template file ( docker-compose-template.yml ) and run a script ( compose_replace.sh ) to provide variable expansion. My gist explains the technique a little better.

You should also run this command before building your containers from the docker-compose.yml file, but after you have built your VM using Docker Machine. Again, the ‘test’ VM must already exist.

sh compose_replace.sh

Lastly, remember, we can run our integration tests to generate log events, using the following command.

sh tests_color.sh api.virtual-vehicles.com