Making patterns

In order to make patterns, I usually grab a few hundred lines of logs from a certain device and look at what they have in common. You will see that the left-hand side of most of the logs is similar.

This is a typical log message from a Cisco Meraki router:

Oct 11 12:06:08 172.29.250.65 1476180389.257544558 MX64_RT_01 ids-alerts signature=1:26355:11 priority=1 timestamp=1476180388.532645 dhost=AA:BB:CC:00:11:22 direction=ingress protocol=tcp/ip src=192.2.0.2:80 dst=10.20.30.2:55130 message: BROWSER-PLUGINS Microsoft Windows RDP ActiveX component mstscax use after free attempt

Meraki logs seem to have this part in common:

Oct 11 12:06:08 172.29.250.65 1476180389.257544558 MX64_RT_01 <type>

All of these logs contain the transmit time, the Meraki's IP address, a timestamp of the event and a hostname in the left part. What's more, they indicate the type of the log: ids-alerts. To recognise these messages as Cisco Meraki logs, we can use the hostname, which contains MX64 or MX84. Note that whether naming is consistent depends on the people who configured the devices; there are other ways to organise this.

Using this knowledge, you can make a pattern that first parses the left half of every message, stores the type of log received in a new field <log_type>, and saves the rest of the message in a field <contents> for further parsing based on its type. Finally, we can store the type of this device in a new field.

The Grok Debugger is a great tool to help you design patterns. You can put a log message in the input, define a pattern and see results updating immediately.

On the same website you can find patterns (go to grok-patterns) described by regular expressions and given common names. You can match them to text using the %{PATTERNNAME:field_name} syntax, in which you can optionally specify the name of a new field. %{GREEDYDATA} matches anything, so we'll use that first and name it contents:
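This first pattern is a single match that captures the whole message (contents is just a field name we choose here):

```
%{GREEDYDATA:contents}
```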

The trick is to match more and more of the content on the left side of the %{GREEDYDATA}, which matches anything. The new fields will then show up in the output window.

On the left we have a time and an IP address. It turns out you can use the existing patterns %{SYSLOGTIMESTAMP} and %{IPORHOST} to match them.
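Building on the previous step, the left-hand part can be matched like this (transmit_time and meraki_ip are field names chosen for this example):

```
%{SYSLOGTIMESTAMP:transmit_time} %{IPORHOST:meraki_ip}%{GREEDYDATA:contents}
```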

Note that the contents field now starts with a space, so include that space in your next pattern as well. Using the Grok Debugger is enormously helpful, but you need to use it a few times to get the hang of it.

Normally, it is better not to try to match the entire message. You parse the parts that messages have in common, and store the rest in contents. Then you can make new matches on that field based on what you know, such as the device type and the log type (e.g. ids-alerts).

Apart from the predefined patterns, you can use regular expressions to match strings for which no Grok pattern exists. In this example ids-alerts (or another log type) is matched by (?<log_type>[a-zA-Z0-9\-]+).
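Putting the pieces together, a pattern for the common left-hand part could look like this (the field names are illustrative; %{NOTSPACE} is used for the hostname because names such as MX64_RT_01 contain underscores, which the stock HOSTNAME pattern does not allow):

```
%{SYSLOGTIMESTAMP:transmit_time} %{IPORHOST:meraki_ip} %{NUMBER:event_time} %{NOTSPACE:meraki_hostname} (?<log_type>[a-zA-Z0-9\-]+)%{GREEDYDATA:contents}
```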

Once you have a pattern matching everything up to and including ids-alerts, a minimal filter configuration that uses this knowledge will look like the following:
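A minimal sketch of such a filter (field names other than log_type and contents are illustrative choices, not fixed names):

```
filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:transmit_time} %{IPORHOST:meraki_ip} %{NUMBER:event_time} %{NOTSPACE:meraki_hostname} (?<log_type>[a-zA-Z0-9\-]+)%{GREEDYDATA:contents}"
    }
    # remember which kind of device this log came from
    add_field => { "device_type" => "cisco-meraki" }
  }
}
```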

The field contents will now contain a specific message that you can parse based on the new field [log_type]. But what types are there?

If you have a look at Cisco Meraki logs, you'll notice that there are only a few options in this field:

ids-alerts (detected anomalies or misbehaving systems)

flows (network traffic: allowed or denied)

events (generic)

The latter has a lot of variation:

DHCP offers / releases

routing table updates

VPN connections

failover events

As you might guess, you will make additional patterns for each of these types and match them.
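As an illustration only (the exact layout of flow lines depends on your MX firmware version, so verify against your own logs), a pattern for an allow/deny flow entry might look like:

```
(?<flow_decision>allow|deny) src=%{IP:src_ip} dst=%{IP:dst_ip} protocol=%{WORD:protocol} sport=%{INT:src_port} dport=%{INT:dst_port}
```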

When you have patterns for these types of logs, you can do more specific matching with a few if-statements on the fields [log_type] and [contents]:
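A sketch of such conditional matching (the patterns and tag names are illustrative):

```
filter {
  if [log_type] == "flows" {
    grok {
      # contents starts with a space, hence the leading space in the pattern
      match => { "contents" => "^ (?<flow_decision>allow|deny)%{GREEDYDATA:flow_details}" }
    }
  } else if [log_type] == "ids-alerts" and [contents] =~ /priority=1/ {
    mutate { add_tag => [ "high_priority_alert" ] }
  }
}
```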

Now that you understand how to match more specific logs using some generic fields in a message, it is time for a more complete example.

In the following example configuration you can see how specific patterns are matched on [contents] and how the mutate filter plugin is used to enrich event data with things like priorities and to fix inconsistencies that we know about.
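Here is a sketch of what such a configuration could look like for the ids-alerts message shown earlier, assuming the device type was stored in a field [device_type] as described before (the field names are illustrative, and the mutate steps only demonstrate the idea):

```
filter {
  if [device_type] == "cisco-meraki" and [log_type] == "ids-alerts" {
    grok {
      match => {
        "contents" => "signature=%{NOTSPACE:signature} priority=%{INT:priority} timestamp=%{NUMBER:ids_timestamp} dhost=%{MAC:dhost} direction=%{WORD:direction} protocol=%{NOTSPACE:protocol} src=%{NOTSPACE:src} dst=%{NOTSPACE:dst} message: %{GREEDYDATA:ids_message}"
      }
    }
    mutate {
      # store the priority as a number instead of a string
      convert => { "priority" => "integer" }
      # fix a known inconsistency: normalise "tcp/ip" to "tcp"
      gsub => [ "protocol", "/ip$", "" ]
    }
  }
}
```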

Not many people are familiar with this subject, and it takes some time, but it is very feasible to parse your own logs once you take the time to look at what they have in common. I hope some readers now feel a little more confident about making a practical Logstash configuration.