Grok is the usual way to give structure to data via pattern matching.

Last week, I wrote about some hints for the configuration. Unfortunately, the hard part is writing the matching pattern itself, and those hints don’t help there. While it might be possible to write a perfect Grok pattern on the first draft, the above log is complicated enough that it’s far from a certainty, and chances are high you’ll stumble upon a message like this when starting Logstash with an unfit Grok filter:

"tags" => [ [0] "_grokparsefailure" ]

However, there’s "an app for that" (sound familiar?): the Grok Debugger. It offers three fields:

1. The first field accepts one (or more) log line(s)
2. The second, the Grok pattern
3. The third is the result of filtering the first by the second

The process is to match fields one by one, from left to right. The first data field, e.g. 2016-11-25 19:05:53.221, is obviously a timestamp. Among the common Grok patterns, TIMESTAMP_ISO8601 looks like the best fit.
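For reference, Grok patterns are composed from smaller ones. In the stock grok-patterns file shipped with Logstash, the definition of TIMESTAMP_ISO8601 reads roughly as follows:

```
# From the default grok-patterns file (paraphrased from memory):
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
```

This is why it matches both the date and the time portion of the log line, including the fractional seconds.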

Enter %{TIMESTAMP_ISO8601:timestamp} into the Pattern field. The result is:

{
  "timestamp": [
    [
      "2016-11-25 17:05:53.221"
    ]
  ]
}

The next field to handle looks like the log level. Among the patterns, there’s LOGLEVEL. The pattern now becomes %{TIMESTAMP_ISO8601:timestamp} *%{LOGLEVEL:level} and the result:

{
  "timestamp": [
    [
      "2016-11-25 17:05:53.221"
    ]
  ],
  "level": [
    [
      "INFO"
    ]
  ]
}

Rinse and repeat until all fields have been structured. Given the initial log line, the final pattern should look something along these lines:

%{TIMESTAMP_ISO8601:timestamp} *%{LOGLEVEL:level} \[%{DATA:application},%{DATA:traceId},%{DATA:spanId},%{DATA:zipkin}]

%{DATA:pid} --- *\[%{DATA:thread}] %{JAVACLASS:class} *: %{GREEDYDATA:log}

(The pattern is broken over two lines here for readability.)
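Plugged back into Logstash, the complete filter would look something like this sketch; in the actual configuration the pattern sits on a single line:

```
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} *%{LOGLEVEL:level} \[%{DATA:application},%{DATA:traceId},%{DATA:spanId},%{DATA:zipkin}] %{DATA:pid} --- *\[%{DATA:thread}] %{JAVACLASS:class} *: %{GREEDYDATA:log}" }
  }
}
```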

And the associated result: