Log monitoring and management is one of the most important functions in DevOps, and the open-source software Logstash is one of the most common platforms that are used for this purpose.

Often used as part of the ELK Stack, Logstash version 2.1.0 now has shutdown improvements and the ability to install plugins offline. Here are just a few of the reasons why Logstash is so popular:

Logstash is able to do complex parsing with a processing pipeline that consists of three stages: inputs, filters, and outputs

Each stage in the pipeline has a pluggable architecture that uses a configuration file that can specify what plugins should be used at each stage, in which order, and with what settings

Users can reference event fields in a configuration and use conditionals to process events when they meet certain, desired criteria

Since it is open source, you can change it, build it, and run it in your own environment
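To make the three stages concrete, here is a minimal sketch of a pipeline configuration. The tag name and the choice of stdin and stdout are only illustrative assumptions, not a recommended setup:

```conf
# Hypothetical minimal pipeline: read lines from stdin, tag them, print them.
input {
  stdin { }
}

filter {
  mutate {
    add_tag => [ "example" ]   # illustrative tag name
  }
}

output {
  stdout { codec => rubydebug }
}
```

Assuming the file is saved as pipeline.conf (a hypothetical name), it could be started with bin/logstash -f pipeline.conf.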

For more information on using Logstash, see this Logstash tutorial, this comparison of Fluentd vs. Logstash, and this blog post that goes through some of the mistakes that we have made in our own environment (and then shows how to avoid them). However, these issues are minimal — Logstash is something that we recommend and use in our environment.

In fact, many Logstash problems can be solved or even prevented with the use of plugins that are available as self-contained packages called gems and hosted on RubyGems. Here are several that you might want to try in your environment.

Logstash Input Plugins

Input plugins get events into Logstash and share common configuration options such as:

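type: adds a type field to every event handled by the input, which is mainly used to select the filters that are applied further down the pipeline

tags: adds any number of arbitrary tags to your event

codec: the name of the Logstash codec used to represent the data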

1. File

This plugin streams events from a file by tracking changes to the monitored files and pulling the new content as it’s appended, and it keeps track of the current position in each file by recording it in a separate sincedb file. The input also detects and handles file rotation.

You can configure numerous items, including the file path, codec, read start position, and line delimiter. Usually, the more plugins you use, the more resources Logstash may consume.
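As a rough sketch, a file input that sets a few of these options might look like the following. The paths, type, and tag values are hypothetical placeholders:

```conf
input {
  file {
    path => [ "/var/log/myapp/*.log" ]                  # hypothetical path to monitor
    start_position => "beginning"                       # read existing content of files not seen before
    sincedb_path => "/var/lib/logstash/sincedb-myapp"   # hypothetical location of the position records
    type => "myapp"                                     # hypothetical type value
    tags => [ "production" ]                            # hypothetical tag
  }
}
```

Note that start_position only affects files that Logstash has not tracked yet. For files already recorded in the sincedb, reading resumes from the saved position.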

2. Lumberjack

This plugin receives events using the Lumberjack protocol, which is secure and reliable while keeping latency and resource usage low. Its data source is a logstash-forwarder client, which is very fast and much lighter than Logstash. All events are encrypted because the input plugin and the forwarder client use an SSL certificate that needs to be defined in the plugin.

Here is the required configuration:

lumberjack { port => ... ssl_certificate => ... ssl_key => ... }

3. Beats

The Beats input plugin receives events from shippers such as Filebeat. Filebeat is a lightweight, resource-friendly tool that is written in Go; it collects logs from files on servers and forwards them to other machines for processing. The tool uses the Beats protocol to communicate with a centralized Logstash instance. You can also use an optional SSL certificate to send events to Logstash securely.

The required configuration:

beats { port => ... }
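For the optional SSL setup, a configuration along these lines is one possibility. The certificate paths are hypothetical, the port is simply the one conventionally used for Beats, and the option names are the ones we know from the plugin releases that shipped around Logstash 2.x, so check them against your installed version:

```conf
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"   # hypothetical path
    ssl_key => "/etc/pki/tls/private/logstash.key"         # hypothetical path
  }
}
```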

4. TCP

This plugin reads events over a TCP socket. Each event is assumed to be one line of text, so the default codec is “line.” An optional SSL certificate is also available.

The required configuration:

tcp { port => ... }

Logstash Filter Plugins

Filtering is an optional stage in the pipeline during which you can use filter plugins to modify and manipulate events. Within filter (and output) plugins, you can use:

Field references: The syntax to access a field is [fieldname]. To refer to a nested field, use [top-level field][nested field].

Sprintf format: This format lets you insert the value of a field into a string. The syntax is “%{[fieldname]}”.

Conditional statements are also available:

if EXPRESSION { ... } else if EXPRESSION { ... } else { ... }
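Here is a small sketch that combines field references, the sprintf format, and a conditional. All field names and values in it (loglevel, http, method, host, and the tags) are assumptions made up for the example:

```conf
filter {
  # Field reference to a top-level field
  if [loglevel] == "ERROR" {
    mutate { add_tag => [ "error" ] }
  # Field reference to a nested field
  } else if [http][method] == "GET" {
    mutate {
      # sprintf format: the value of [host] is inserted into the new field
      add_field => { "summary" => "GET request seen on %{[host]}" }
    }
  } else {
    mutate { add_tag => [ "unclassified" ] }
  }
}
```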

6. Grok

This plugin is the “bread and butter” of Logstash filters and is used ubiquitously to derive structure out of unstructured data. It helps you to define a search and extract parts of your log line into structured fields. Roughly 120 integrated patterns are available.

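Grok works by combining text patterns into something that matches your logs. The plugin sits on top of regular expressions, so any regular expression is valid in grok. The syntax for a grok pattern is %{SYNTAX:SEMANTIC}, where SYNTAX is the name of the pattern that will match your text and SEMANTIC is the identifier (field name) that you give to the matched text: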

grok { match => { "message" => "%{SYNTAX:SEMANTIC}" } }
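As a worked illustration, consider a log line such as 55.3.244.1 GET /index.html 15824 0.043. The field names used below (client, method, request, bytes, duration) are just labels chosen for this example, while the pattern names come from the built-in pattern set:

```conf
filter {
  grok {
    # Each %{PATTERN:field} pair matches one part of the line and stores it in a field
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
```

After the filter runs, the event should carry client, method, request, bytes, and duration fields in addition to the original message.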

The mutate filter, a separate plugin, allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events:

mutate { }
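A filled-in mutate block might look like the sketch below. Every field name and value here is hypothetical and only shows the shape of each option:

```conf
filter {
  mutate {
    rename => { "hostname" => "host_name" }        # rename a field
    replace => { "environment" => "production" }   # overwrite a field's value
    convert => { "bytes" => "integer" }            # change a field's data type
    remove_field => [ "temporary_field" ]          # drop a field entirely
  }
}
```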

7. GEOIP

This plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to logs.

The configuration options:

Source: The field containing the IP address. This is a required setting.

Target: By defining a target in the geoip configuration, you can specify the field into which Logstash should store the geoip data.

If you save the data to a target field other than geoip and want to use the geo_point related functions in Elasticsearch, you need to alter the template provided with the Elasticsearch output and configure the output to use the new template. A basic configuration looks like this:

geoip { source => "clientip" }
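If you do want the location data in a different field, a sketch with an explicit target could look like this. The target name client_geo is a made-up example, and as noted above, a custom target means the Elasticsearch template has to be adjusted for geo_point mapping:

```conf
filter {
  geoip {
    source => "clientip"
    target => "client_geo"   # hypothetical field name; without this option the data goes to "geoip"
  }
}
```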

8. Multiline

This plugin collapses multiline messages from a single source into one Logstash event.

The configuration options:

Pattern: This required setting is a regular expression. A line that matches it is treated as part of an event that spans multiple lines of log data.

What: This takes one of two values (previous or next) and specifies whether a matching line belongs to the previous event or the next one.

For example:

multiline { ... pattern => "^\s" what => "previous" ... }

This means that any line starting with whitespace belongs to the previous line.

Important note: This filter will not work with multiple worker threads.
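Another common approach, sketched here under the assumption that every new log entry starts with an ISO8601 timestamp, is to invert the match with the negate option so that any line that does not start with a timestamp is appended to the previous event:

```conf
filter {
  multiline {
    pattern => "^%{TIMESTAMP_ISO8601} "   # a new event starts with a timestamp
    negate => true                        # act on lines that do NOT match the pattern
    what => "previous"                    # attach those lines to the previous event
  }
}
```

This kind of rule is often used for stack traces, where only the first line of each entry carries a timestamp.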

9. KV

This plugin helps to parse messages automatically and break them down into key-value pairs. By default, it parses the message field and looks for an ‘=’ delimiter between each key and its value. You can configure arbitrary strings as delimiters and parse any event field.

The configuration options:

kv { ... source => "message" value_split => "=" ... }

This powerful parsing mechanism should not be used without a limit because the production of an unlimited number of fields can hurt your efforts to index your data in Elasticsearch later.
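One way to apply such a limit is to whitelist the keys you expect. The sketch below assumes a message such as user=alice&action=login&status=ok&debug=1, and the key names are hypothetical:

```conf
filter {
  kv {
    source => "message"
    field_split => "&"                               # separator between pairs
    value_split => "="                               # separator between key and value
    include_keys => [ "user", "action", "status" ]   # only these keys become fields
  }
}
```

With this configuration, the debug key in the sample message would be ignored rather than turned into a new field.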

10. Date

The date plugin parses dates from fields and then uses that date as the Logstash @timestamp for the event. It is one of the most important filters that you can use, especially if you use Elasticsearch to store and Kibana to visualize your logs, because Elasticsearch will automatically detect that field and map it as a date.

This plugin ensures that your log events will carry the correct timestamp and not a timestamp based on the first time Logstash sees an event.

The configuration options:

Match: You can specify an array that consists of a field name followed by one or more date-format patterns. Listing several patterns helps to support fields that have multiple time formats. The allowed date formats are defined by the Java library Joda-Time.

One example:

date { ... match => [ "mydate", "MMM dd YYY HH:mm:ss", "MMM d YYY HH:mm:ss", "ISO8601" ] ... }
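As a further illustration, here is a sketch for access-log style timestamps such as 10/Oct/2000:13:55:36 -0700. It assumes that an earlier filter (for example, grok) has already extracted that value into a field named timestamp:

```conf
filter {
  date {
    # Field name first, then the Joda-Time pattern that describes its format
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```

When the pattern matches, the parsed value replaces the event's @timestamp.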

Logstash Codecs

Codecs can be used in both inputs and outputs. Input codecs provide a convenient way to decode your data before it enters the input. Output codecs provide a convenient way to encode your data before it leaves the output. Some common codecs:

The default “plain” codec is for plain text with no delimitation between events

The “json” codec decodes JSON events in inputs and encodes JSON messages in outputs (note that it will revert to plain text if the received payloads are not in a valid JSON format)

The “json_lines” codec lets you either decode JSON events delimited by \n in inputs or encode JSON messages delimited by \n in outputs

The “rubydebug” codec, which is very useful in debugging, allows you to output Logstash events as Ruby data objects
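To show where codecs sit in a configuration, here is a short sketch that decodes newline-delimited JSON on a TCP input and prints the resulting events for inspection. The port number is an arbitrary assumption:

```conf
input {
  tcp {
    port => 5000          # hypothetical port
    codec => json_lines   # each line received is decoded as one JSON event
  }
}

output {
  stdout { codec => rubydebug }   # pretty-print every event while debugging
}
```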

Logstash Output Plugins

An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline.

1. Redis

The Redis plugin outputs events to Redis using RPUSH. Redis is a key-value data store that can serve as a buffer layer in your data pipeline. Usually, you will use Redis as a message queue for Logstash shipping instances that handle data ingestion and store the data in the queue.

The configuration options:

    redis {
      ...
      port => 6379
      host => ["1.1.1.1"]
      db => 0        # the db number
      workers => 1   # the number of workers to use for this output
      ...
    }
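For a fuller picture, the following sketch pushes events onto a Redis list. The host address and the key name are assumptions for the example:

```conf
output {
  redis {
    host => [ "127.0.0.1" ]   # hypothetical Redis host
    port => 6379
    db => 0
    data_type => "list"       # events are appended to a list with RPUSH
    key => "logstash"         # hypothetical name of the list
  }
}
```

An indexing Logstash instance could then read from the same key with the Redis input plugin.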

2. Kafka

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. We at Logz.io use Kafka as a message queue for all of our incoming message inputs, including those from Logstash.

Usually, you will use Kafka as a message queue for your Logstash shipping instances that handle data ingestion and store the data in the queue. The Kafka plugin writes events to a Kafka topic and uses the Kafka Producer API to write messages.

The only required configuration is the topic name:

    kafka {
      ...
      topic_id => "topic name"
      workers => 1   # the number of workers to use for this output
      ...
    }
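In practice you will usually also point the output at your brokers. The sketch below is an assumption-laden example: the topic name and broker address are made up, and the name of the broker option has changed between plugin releases (older releases used broker_list), so verify it against the version you have installed:

```conf
output {
  kafka {
    topic_id => "app-logs"                  # hypothetical topic name
    bootstrap_servers => "localhost:9092"   # assumed option name and address; check your plugin version
  }
}
```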

3. Stdout

This is a simple output that prints to the stdout of the shell running Logstash. This output can be quite convenient when debugging plugin configurations.

A Final Note

Which Logstash plugins do you like to use when you monitor and manage your log data in your own environments? I invite your additions and thoughts in the comments below.

