UPDATE Feb 5, 2015 – Logstash Is Not Eating My Logs

For the past few days, I’ve been testing different scenarios for sending and receiving messages between computers using Logstash. Setup was straightforward, and using Sysloggen I’ve been able to push a large number of messages through. Only one concern: I am losing messages.

I decided to set up a test to understand this issue. I tried sending messages with both RSyslog and Logstash. The difference in messages received between the two was insignificant; both senders had the same outcome. The only difference was that RSyslog is much faster than Logstash as a sender, which makes sense.

In the tests that follow, the number of messages received is always slightly off. The results shown are averages over multiple runs. All the runs in this post are performed with Sysloggen as follows:

sysloggen -d 192.168.1.43 -f sample.syslog -n 1000 -r 30

The syntax above sends 1000 random messages from the sample.syslog file at a rate of 30 messages per second.

In reality, it just picks messages from top to bottom 😉
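Conceptually, a run like that boils down to something simple: read lines from a file top to bottom, wrap around if needed, and fire each one as a UDP datagram at a fixed rate. A minimal Python sketch of that idea (this is my own approximation, not Sysloggen’s actual code; plain UDP syslog is assumed):

```python
import socket
import time

def send_syslog_file(host, port, path, count, rate):
    """Send `count` lines from `path` as UDP syslog messages at `rate` msg/sec."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    delay = 1.0 / rate
    for i in range(count):
        msg = lines[i % len(lines)]  # wraps top to bottom, like Sysloggen
        sock.sendto(msg.encode(), (host, port))
        time.sleep(delay)
    sock.close()
```

A call equivalent to the command line above would be `send_syslog_file("192.168.1.43", 514, "sample.syslog", 1000, 30)`.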

Sending 1K Messages

Msg/sec   From RSyslog   From Logstash
20        1003           1005
30        1004           1004
40        998            998
50        993            998
60        1017           996
70        999            996
100       992            1000
200       1001           993
300       1011           1005
400       1002           998

Sending 10K Messages

Msg/sec   From RSyslog   From Logstash
500       9997           9359
600       10003          10008
700       9998           9156
800       10004          10000
900       9999           10011
1000      10002          9998

Network Adapters

I also did some testing with different network adapters. I changed the Ethernet adapters on all VMs from E1000 (1 Gb/sec) to VMXNET3 (10 Gb/sec), hoping this would also let me send more messages at a time. It neither improved nor hampered message transfer speed or load, and the number of messages lost was similar to the previous tests.

Still Stuck

I was getting nowhere. Even though most messages made it to their destination, I was almost always missing some at the recipient. Looking at the Logstash log in debug mode did not help either.

I decided to manually send an additional message at the end of each run and retest. First, I used Sysloggen to send messages in bulk, then used the logger command to send one last message. This way, I could see whether the last message made it to its destination (or not). After Sysloggen finished, I waited a few minutes before sending the last message.
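That extra "+1" message doesn’t need anything special; the logger step amounts to firing a single UDP datagram at the collector. A sketch of the equivalent (hypothetical host, standard syslog port assumed):

```python
import socket

def send_final_message(host, port=514, text="run complete"):
    """Send one syslog-style UDP datagram, used as the '+1' probe after a bulk run."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # <13> = facility user, severity notice: the priority tag logger typically prepends
    sock.sendto(("<13>" + text).encode(), (host, port))
    sock.close()
```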

These are the results. I noted the messages per second sent and the messages received after Sysloggen had finished, and saved the timestamps of the first and last messages. After sending the extra message, I did a count again and noted its timestamp.

I tested sending the last message both from the instance where Sysloggen ran and from a different instance. The results were the same: regardless of origin, the last message ‘pushed’ the few remaining messages through to their destination.

10K Messages +1

Msg/sec   Received   Time 1st-last   After +1   Time
500       10000      6:01 - 6:21     10001      9:14
600       9995       11:40 - 11:57   10001      13:25
700       9994       14:35 - 14:50   10001      15:51
800       9999       16:59 - 17:11   10001      18:16
900       9993       19:22 - 19:33   10001      20:20
1000      9989       21:11 - 17:21   10001      22:26

Interesting. Are the last few messages sent getting ‘stuck’ at Logstash? The scenario tested does not have Logstash as the message sender, so there aren’t any other moving pieces here. The log was still empty of errors. Odd: no message loss, as long as I send an ‘I’m alive’ message at the end.

So Logstash is not eating my logs but…

From these tests, it seems that the last few messages get stuck in Logstash instead of being sent to their final destination (written to file). In a scenario where messages are produced continuously this would not be an issue; however, that may not always be the case.
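If the file output is where they get stuck, one knob worth checking (an assumption on my part, not something I have confirmed as the cause) is the file output plugin’s flush_interval, which controls how often buffered events are actually written to disk:

```
output {
  file {
    path => "/var/log/collected.log"   # hypothetical path
    flush_interval => 0                # flush on every message instead of batching
  }
}
```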

A keep-alive message wouldn’t be too hard to add to any implementation, and it could serve many other purposes as well. I can work with that. As I get a better understanding of how Logstash works, I hope these issues become clear.
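For example, a background heartbeat that periodically sends an ‘I’m alive’ datagram would keep nudging any buffered messages out to their destination. A minimal sketch (host, port, and message text are placeholders):

```python
import socket
import threading

def start_keepalive(host, port, interval, stop_event):
    """Start a background thread that sends a heartbeat datagram every
    `interval` seconds until `stop_event` is set."""
    def loop():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while not stop_event.is_set():
            # <14> = facility user, severity info
            sock.sendto(b"<14>keepalive: i'm alive", (host, port))
            stop_event.wait(interval)
        sock.close()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Stopping it is just `stop_event.set()`; the daemon thread also dies with the process.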

A Question Remains

Why are the last messages getting stuck?