TLDR: I used *nix utilities to build a Kafka/Elasticsearch connector in 1 line of code. This shows how the *nix philosophy conquers all.

Python generating logs for Kafka to store in Elastic via Elasticsearch Connect for a happy user!

I recently ran into trouble with Kafka Connect while working on a Kafka logging system that used the Elasticsearch connector to index logs in Elasticsearch. The connector was dying for various reasons that were hard to determine because the logs were empty and the Kafka cloud provider’s support was asleep. A deadline loomed, so I was forced to create a cute, quick hack that shows the power of bash utilities like jq and the *nix philosophy.

The short, tweetable form goes like this:

bin/kafka_console_consumer.sh log_json | while read line; do curl -k -XPOST -H 'Content-Type: application/json' "$ES_URI/log/log" -d "$line"; done;

Tada! The jazzed-up form is a little longer:

bin/kafka_console_consumer.sh log_json | while read line; do echo $line | jq ''; echo; curl -k -XPOST -H 'Content-Type: application/json' "$ES_URI/log/log" -d "$line" | jq ''; done;

Broken out and formatted, it reads like so:

bin/kafka_console_consumer.sh log_json | \
while read line
do
    echo $line | jq ''
    echo
    curl -k -XPOST -H 'Content-Type: application/json' "$ES_URI/log/log" -d "$line" | jq ''
done

The jazzed-up form prints the original record as well as the Elasticsearch POST response and sends both through jq for formatting, so they are readable.
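The same loop can be sketched as a small reusable function. This is a hardened variant, not the command I ran: read -r keeps backslashes in the JSON intact, blank lines are skipped, and a failed request is reported on stderr without ending the loop. The names forward_lines and post_record are made up for illustration; the curl flags and the $ES_URI/log/log endpoint are the ones from the one-liner above.

```shell
#!/usr/bin/env bash
# Sketch only: forward_lines and post_record are hypothetical names.

# Call a handler command once per stdin line, passing the line as its last argument.
forward_lines() {
  local line
  # IFS= keeps surrounding whitespace; -r keeps backslashes (\" or \n inside JSON) as-is.
  # The "|| [ -n ... ]" part also handles a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines instead of posting empty bodies.
    if [ -z "$line" ]; then
      continue
    fi
    # </dev/null stops the handler from consuming the loop's input.
    # A failing handler is reported but does not stop the loop.
    "$@" "$line" </dev/null || echo "handler failed for: $line" >&2
  done
}

# Handler standing in for the curl call in the one-liner.
post_record() {
  curl -k -XPOST -H 'Content-Type: application/json' "$ES_URI/log/log" -d "$1"
}

# Usage, assuming $ES_URI is set:
#   bin/kafka_console_consumer.sh log_json | forward_lines post_record
```

Passing the handler as arguments means the loop can be tried out with something harmless, such as printf '%s\n', before pointing it at Elasticsearch.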

Scripting Kafka

To be fair, the command is short because a wrapper script hides the Kafka console consumer's arguments in this line of code. However, doing the (complex) work of setting the arguments for the console consumer inline would lengthen the line of code, not expand it to multiple lines. The Kafka console consumer and producer are powerful tools with lengthy command line options. Scripting them is essential to productivity when working with Kafka.

The Kafka console consumer utility looks like this:

#!/usr/bin/env bash

# Parse topic argument
OPTIND=1
if [ -z "$1" ]
then
    echo "Usage: bin/kafka_console_consumer.sh <topic>!"
    exit
fi
topic=$1

if [ -z "$APP_ENV" ]
then
    echo "Fail: no \$APP_ENV set!"
    exit
fi

# Pull in configuration to get Kafka broker argument
config=`cat src/app/config.json`
kafka_brokers=`echo $config|jq .environments.$APP_ENV.kafka.host|tr -d '"'`

echo kafka_bin/bin/kafka-console-consumer.sh \
    --bootstrap-server $kafka_brokers \
    --topic $topic \
    --consumer.config kafka_bin/config/consumer.properties

kafka_bin/bin/kafka-console-consumer.sh \
    --bootstrap-server $kafka_brokers \
    --topic $topic \
    --consumer.config kafka_bin/config/consumer.properties
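One thing to watch in a script like this: the echo of the command line goes to stdout, so in a pipeline it arrives downstream as if it were a Kafka record. A possible tweak, sketched here with two hypothetical helpers named die and run_logged, sends diagnostics to stderr and exits non-zero on bad input, so that stdout carries only records.

```shell
#!/usr/bin/env bash
# Sketch only: die and run_logged are hypothetical helpers, not part of the script above.

# Print a message to stderr and stop with a non-zero exit status.
die() {
  echo "$*" >&2
  exit 1
}

# Echo a command line to stderr, then run it; stdout holds only the command's own output.
run_logged() {
  echo "+ $*" >&2
  "$@"
}

# In the consumer script these could replace the plain echo/exit pairs:
#   [ -n "$1" ] || die "Usage: bin/kafka_console_consumer.sh <topic>"
#   [ -n "$APP_ENV" ] || die "Fail: no \$APP_ENV set!"
#   run_logged kafka_bin/bin/kafka-console-consumer.sh \
#     --bootstrap-server "$kafka_brokers" \
#     --topic "$topic" \
#     --consumer.config kafka_bin/config/consumer.properties
```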

jq and bash JSON Configuration

This script uses a JSON configuration file and the jq utility to fetch the configuration and use it to run the console consumer. jq is incredibly powerful, and you can find a recipe for almost any JSON parsing task using it. By creating small tools that fit the *nix philosophy, we can stack and arrange tools in pipelines to process data and perform administration operations: DevOps in practice.
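As an illustration of the jq lookup, here is a minimal sketch. It assumes the config file is laid out as the jq path in the script implies (environments, then the environment name, then kafka.host); the function name kafka_host_for and the sample values in the comment are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch only: kafka_host_for is a hypothetical helper.
# Assumed config layout, with made-up values:
#   {"environments": {"dev": {"kafka": {"host": "localhost:9092"}}}}

# Usage: kafka_host_for <config-file> <environment>
kafka_host_for() {
  # -r prints the raw string without quotes, so no tr -d '"' step is needed.
  # --arg passes the environment name as a jq variable instead of splicing it into the filter.
  # "// empty" plus -e yields no output and a non-zero exit status when the key is missing.
  jq -er --arg env "$2" '.environments[$env].kafka.host // empty' "$1"
}

# Usage inside the consumer script:
#   kafka_brokers=$(kafka_host_for src/app/config.json "$APP_ENV") || exit 1
```

Checking the exit status catches a mistyped $APP_ENV before the console consumer is launched with an empty broker list.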

1 Liner in Practice

I run the command and generate some log output to Kafka. We use Python with the logstash_formatter to create JSON logs. The first input is a text message sent with curl, which triggers a parsing exception. However, this doesn’t kill things. The command is resilient, and can run for many days without crashing.

The log output looks like so:

What is going on? The command first prints the incoming record, followed by the result of the POST query to Elasticsearch. This way you can see which records parse and are successfully indexed and which do not. This is a great way to debug issues.

In working with this tool, one quickly discovers that when log formats vary, Elasticsearch dynamic mapping becomes a problem. If a field is sometimes an object and sometimes a string, Elasticsearch rejects the records that don’t match the type it mapped first. You must choose one or the other and be consistent. This toolset lets you figure out these issues efficiently, something you don’t get with Elasticsearch Connect. Of course, with Elasticsearch Connect you get more reliability and scalability. This is a fun hack, not a system I would recommend for production for long periods.
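One way to be consistent is to normalize records in the pipeline before they reach Elasticsearch. The sketch below is only an illustration: the field name data and the wrapper key text are hypothetical, and the real fix depends on which shape your mapping should have.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_data_field, "data" and "text" are hypothetical names.

# Read one JSON record per line on stdin and make sure .data is never a bare string.
normalize_data_field() {
  # -c prints each record on a single line, so the output stays line-oriented.
  # A string value is wrapped in an object; objects and missing fields pass through unchanged.
  jq -c 'if (.data | type) == "string" then .data = {text: .data} else . end'
}

# Usage inside the loop from the one-liner, before the curl call:
#   line=$(printf '%s\n' "$line" | normalize_data_field)
```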

bash: the Highest Level Language

The reason bash is so powerful is that it is probably the highest level language with widespread adoption in existence. If you think about it, it is code that manipulates entire programs, making it higher level than code in general purpose languages that consist of programs using libraries of code. Of course this is an imperfect definition of high level, given that the distinction between code and program is arbitrary… but the assertion still holds. Using higher level languages makes you more productive, enabling bash to (imperfectly) replace a Java tool with 9,071 LOC in 1 LOC. That is 0.011 percent of the effort of using Java.
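The percentage is plain division of line counts, which can be checked from the shell. The helper name loc_percent is made up for this sketch; the 9,071 figure is the one quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: loc_percent is a hypothetical helper.

# Print <lines> as a percentage of <baseline>, to three decimal places.
loc_percent() {
  # LC_ALL=C forces a dot as the decimal separator regardless of locale.
  LC_ALL=C awk -v lines="$1" -v base="$2" 'BEGIN { printf "%.3f\n", 100 * lines / base }'
}

loc_percent 1 9071   # prints 0.011
```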

Of course, Kafka Connect is much more robust than my bash hack, but the hack could be made more resilient using multiple Kafka log consumers and an orchestration script to work across multiple machines. This would take dozens of lines of code, raising the percentage effort from 0.011 percent to around 0.7 percent. bashreduce is a good example of such a system.
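The single-machine half of that idea might look like the sketch below: start several copies of a consumer pipeline in the background and wait for them all. The function name run_workers and the script consume_and_index.sh (a file holding the one-liner) are hypothetical. It also assumes the consumers share a group.id in consumer.properties so that Kafka divides the topic's partitions among them; without that, each copy would index every record.

```shell
#!/usr/bin/env bash
# Sketch only: run_workers is a hypothetical helper.

# Usage: run_workers <count> <command> [args...]
# Starts <count> copies of the command in parallel and waits for all of them.
run_workers() {
  n=$1
  shift
  pids=""
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" &                 # launch one worker in the background
    pids="$pids $!"        # remember its process id
    i=$((i + 1))
  done
  rc=0
  for pid in $pids; do
    # Return 1 overall if any worker exits with a non-zero status.
    wait "$pid" || rc=1
  done
  return "$rc"
}

# Usage, with consume_and_index.sh standing in for the one-liner:
#   run_workers 3 ./consume_and_index.sh log_json
```

Spreading the workers across machines would still need something like ssh or a scheduler on top, which is where the remaining lines would go.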

Using bash to do this is inefficient, but in the short term that does not matter. Hours of even large instances are cheap. This hack bought me time to get a real solution in place. bash tools are super glue and duct tape, not mortar and block.

Conclusion

Getting this working was quick and a lot of fun! Solving problems with *nix tools turns every problem into a puzzle. I find it totally amazing what you can achieve in one line of code by stacking and automating small utilities that each do one thing well. I hope this example helps you to do the same.

Take it easy!

Shameless Sales Pitch

Need help with problems like these? Data Syndrome is available to help! We specialize in building data products and training teams to do the same. Check out our website and other blog posts.

Russell Jurney

Principal Consultant, Data Syndrome

rjurney@datasyndrome.com