I usually use Kafka Connect to move data into and out of Kafka.

But I recently found that Logstash has both an input plugin and an output plugin for Kafka, so the two can be connected directly. And since Logstash has a lot of filter plugins, this can be useful.

It means that when you need more modularity or more filtering, you can use Logstash instead of Kafka Connect.

For example, suppose you have an app that writes a log file (here, an Apache access log in combined format) that you want to parse and send to Kafka as JSON:

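input { stdin { } }

filter {
  # Parse an Apache combined-format access log line into named fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Use the parsed request time as the event's @timestamp
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  kafka {
    # host:port of a broker; 9092 is Kafka's default port, adjust to your setup
    bootstrap_servers => "kafka:9092"
    codec => json{}
    topic_id => "my-topic"
  }
}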

In this example I use stdin because it is easier to test, but you can use a file or any other input that Logstash's plugins support.
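As a rough sketch of the file variant, the stdin input could be swapped for the file input plugin along these lines. The log path here is hypothetical, and `start_position => "beginning"` is only an assumption that you want lines already in the file to be read on the first run; the filter and output sections would stay unchanged.

```
input {
  file {
    # Hypothetical path: point this at the log file your app writes
    path => "/var/log/apache2/access.log"
    # Assumption: on first run, read existing lines instead of only new ones
    start_position => "beginning"
  }
}
```

Logstash remembers how far it has read each file, so restarting the pipeline should not resend lines it has already processed.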

If the input is:

127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"

then the message written to Kafka will be this JSON object:

{
  "message" : "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
  "@timestamp" : "2013-12-11T08:01:45.000Z",
  "@version" : "1",
  "host" : "cadenza",
  "clientip" : "127.0.0.1",
  "ident" : "-",
  "auth" : "-",
  "timestamp" : "11/Dec/2013:00:01:45 -0800",
  "verb" : "GET",
  "request" : "/xampp/status.php",
  "httpversion" : "1.1",
  "response" : "200",
  "bytes" : "3891",
  "referrer" : "\"http://cadenza/xampp/navi.php\"",
  "agent" : "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
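The other direction, reading from Kafka with the input plugin, could look roughly like the sketch below. It is only an illustration: the broker address `kafka:9092` (9092 being Kafka's default port), the consumer group name `logstash-demo`, and the choice to start from the earliest offset are all assumptions to adapt to your own setup.

```
input {
  kafka {
    # Assumption: a broker reachable as "kafka" on the default port 9092
    bootstrap_servers => "kafka:9092"
    # Same topic the output example writes to
    topics => ["my-topic"]
    # Hypothetical consumer group name
    group_id => "logstash-demo"
    # Decode the JSON written by the kafka output back into event fields
    codec => json
    # Assumption: a new consumer group should start from the oldest message
    auto_offset_reset => "earliest"
  }
}

output {
  # Print each decoded event to the console for inspection
  stdout { codec => rubydebug }
}
```

With the json codec on the input, fields such as "clientip" and "verb" should come back as separate event fields rather than as one JSON string.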

Links :