The unsung heroes of log analysis are the log collectors. They are the hard-working daemons that run on servers to pull server metrics, parse logs, and transport them to systems like Elasticsearch or PostgreSQL. While visualization tools like Kibana or re:dash bask in the glory, the log collectors’ routing makes it all possible. Here, we will pit two of the most popular data collectors in the open source world against each other: Fluentd vs Logstash.

Logstash is best known for being part of the ELK Stack, while Fluentd has become increasingly popular among users of software such as Docker, GCP, and Elasticsearch.

The goal here is a no-frills comparison of Elastic’s Logstash and Fluentd, which is owned by Treasure Data. We collect the facts about these excellent software platforms in one place so that readers can make informed decisions for their next projects.

Logz.io supports both Logstash and Fluentd, and we see a growing number of customers using Fluentd to ship logs to us, so it was important for us to make this comparison. The following chart summarizes the differences between Logstash and Fluentd, and we go into more detail below.



Fluentd vs Logstash: Platform Comparison

One of Logstash’s original advantages was that it is written in JRuby and hence runs on Windows.

Fluentd, on the other hand, did not support Windows until recently because it depended on a *NIX-centric event library. Not anymore: as of this pull request, Fluentd supports Windows. You can also use the in_windows_eventlog input plugin to track Windows event logs.

Logstash: Linux and Windows

Fluentd: Linux and Windows

Event Routing Comparison

One of the key features of log collectors is event routing. Both log collectors support routing, but their approaches are different.

Logstash Event Routing

Logstash routes all data into a single stream and then uses algorithmic if-then statements to send them to the right destination. Here is an example that sends error events in production to PagerDuty:

output {
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
      ...
    }
  }
}
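As a rough illustration of the procedural model (not Logstash’s actual implementation), the conditional above can be read as an ordinary if statement applied to every event in the single stream. The Python sketch below assumes events are plain dictionaries. The function name `route_event` and the output label `"pagerduty"` are hypothetical.

```python
def route_event(event):
    """Return the list of outputs an event should be sent to."""
    outputs = []
    # Mirrors: if [loglevel] == "ERROR" and [deployment] == "production"
    if event.get("loglevel") == "ERROR" and event.get("deployment") == "production":
        outputs.append("pagerduty")
    return outputs


if __name__ == "__main__":
    stream = [
        {"loglevel": "ERROR", "deployment": "production", "message": "db down"},
        {"loglevel": "ERROR", "deployment": "staging", "message": "db down"},
        {"loglevel": "INFO", "deployment": "production", "message": "ok"},
    ]
    for event in stream:
        print(event["message"], route_event(event))
```

Every new destination adds another branch to the same function, which is what makes this style feel familiar to procedural programmers.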

Fluentd Event Routing

Fluentd relies on tags to route events. Each Fluentd event has a tag that tells Fluentd where it wants to be routed. For example, if you are sending error events in production to PagerDuty, the configuration would look something like this:

<match production.error>
  @type pagerduty
  …
</match>

A more complete example looks like:

<source>
  @type forward
</source>

<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match app.**>
  @type file
  # ...
</match>

To accept more than one kind of input, add another <source> block with its own @type and a specific tag:

<source>
  @type tail
  tag system.logs
  # ...
</source>

Then reference that tag in the <match> directive:

<match {app.**,system.logs}>
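The match patterns used in the configurations above can be modeled in a few lines of Python. This is a simplified sketch, not Fluentd’s implementation. It assumes the wildcard rules described in Fluentd’s configuration documentation: `*` matches exactly one tag part, `**` matches zero or more tag parts, and `{a,b}` matches any of the listed alternatives. It also assumes that the first matching pattern wins. The function names are hypothetical, and nested braces are not handled.

```python
import re


def expand_braces(pattern):
    """Expand {a,b} alternatives into a list of plain patterns (no nesting)."""
    found = re.search(r"\{([^{}]*)\}", pattern)
    if not found:
        return [pattern]
    head, tail = pattern[:found.start()], pattern[found.end():]
    expanded = []
    for alternative in found.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def parts_match(pattern_parts, tag_parts):
    """Match dot-separated pattern parts against dot-separated tag parts."""
    if not pattern_parts:
        return not tag_parts
    first, rest = pattern_parts[0], pattern_parts[1:]
    if first == "**":
        # "**" may consume zero or more tag parts.
        return any(parts_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if first == "*" or first == tag_parts[0]:
        return parts_match(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    """True if the tag satisfies the (simplified) match pattern."""
    tag_parts = tag.split(".")
    return any(parts_match(p.split("."), tag_parts) for p in expand_braces(pattern))


def first_match(tag, patterns):
    """Return the first pattern that matches the tag, or None."""
    for pattern in patterns:
        if tag_matches(pattern, tag):
            return pattern
    return None


if __name__ == "__main__":
    patterns = ["production.error", "{app.**,system.logs}"]
    for tag in ("production.error", "app.web.access", "system.logs", "other"):
        print(tag, "->", first_match(tag, patterns))
```

Because routing depends only on the tag, adding a new destination means adding a new pattern rather than editing existing conditions.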

You can re-route Fluentd events in three ways: 1) by tag with the fluent-plugin-route plugin, 2) by label with the out_relabel plugin, or 3) by record content with the fluent-plugin-rewrite-tag-filter plugin.
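To illustrate the third option, re-routing by record content, here is a minimal Python sketch of the idea: inspect a field of the record and assign a new tag so that a different match block picks the event up. It is a conceptual model only and does not reproduce the plugin’s configuration syntax or behavior. The rule list, the `level` field, and the function name `rewrite_tag` are all hypothetical, and keeping the original tag when no rule matches is a simplification made for this sketch.

```python
import re

# Hypothetical rules: (record key, regular expression, new tag)
RULES = [
    ("level", r"^ERROR$", "production.error"),
    ("level", r"^(WARN|INFO)$", "production.info"),
]


def rewrite_tag(tag, record, rules=RULES):
    """Return a new tag based on record content; first matching rule wins."""
    for key, pattern, new_tag in rules:
        value = record.get(key)
        if value is not None and re.search(pattern, str(value)):
            return new_tag
    # Simplification for this sketch: unmatched events keep their tag.
    return tag


if __name__ == "__main__":
    print(rewrite_tag("app.web", {"level": "ERROR", "message": "db down"}))
    print(rewrite_tag("app.web", {"level": "INFO", "message": "ok"}))
    print(rewrite_tag("app.web", {"message": "no level field"}))
```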

Fluentd’s approach is more declarative whereas Logstash’s method is procedural. Therefore, programmers trained in procedural programming might see Logstash’s configuration as easier for getting started. On the other hand, Fluentd’s tag-based routing allows complex routing to be expressed clearly. For example, the following configuration applies different logic to all production and development events based on tag prefixes.

<match production.**>
  # production pre-processing
</match>

<match development.**>
  # development pre-processing
</match>

Logstash: Uses algorithmic statements to route events and is good for procedural programmers

Fluentd: Uses tags to route events and is better at complex routing

Plugin Ecosystem Comparison

Both Logstash and Fluentd have rich plugin ecosystems covering many input systems (file and TCP/UDP), filters (mutating data and filtering by fields), and output destinations (Elasticsearch, AWS, GCP, and Treasure Data).

Logstash Plugins

One key difference is how plugins are managed. Logstash manages all of its plugins under a single GitHub organization. While users may write and use their own, there seems to be a concerted effort to collect them in one place. As of this writing, there are 199 plugins under the logstash-plugins GitHub organization.

Fluentd Plugins

Fluentd, on the other hand, adopts a more decentralized approach. There are 8 types of plugins in Fluentd: Input, Parser, Filter, Output, Formatter, Storage, Service Discovery, and Buffer. Although there are 516 plugins, the official repository hosts only 10 of them. In fact, among the top 5 most popular plugins (fluent-plugin-record-transformer, fluent-plugin-forest, fluent-plugin-secure-forward, fluent-plugin-elasticsearch, and fluent-plugin-s3), only one is in the official repository! (The Logz.io plugin is among the top 30 most downloaded by Fluentd users, with over 260,000 downloads.)

Logstash: Centralized plugin repository

Fluentd: Decentralized plugin repository

Transport Comparison

Logstash lacks a persistent internal message queue. Currently, Logstash has an in-memory queue that holds 20 events (fixed size) and relies on an external queue like Redis for persistence across restarts. This is a known issue for Logstash, and it is being actively worked on in this issue, where the aim is to persist the queue on disk.

Fluentd, on the other hand, has a highly configurable buffering system. It can be either in-memory or on-disk, with more parameters than you would ever care to know.

The upside of Logstash’s approach is simplicity: the mental model for its fixed-size queue is very simple. However, you must deploy Redis alongside Logstash for improved reliability in production. Fluentd has built-in reliability, but its configuration parameters take some getting used to.
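To make the trade-off concrete, here is a minimal Python sketch contrasting a fixed-size in-memory queue with an on-disk buffer that survives a restart. The 20-event limit comes from the text above. Everything else, including the `FileBuffer` class and the JSON-lines file format, is a hypothetical illustration and is not how Logstash, Redis, or Fluentd are implemented.

```python
import json
import os
import queue
import tempfile

QUEUE_SIZE = 20  # fixed in-memory queue size quoted in the text


def fill_memory_queue(events):
    """Push events into a bounded in-memory queue; return (queue, rejected count)."""
    q = queue.Queue(maxsize=QUEUE_SIZE)
    rejected = 0
    for event in events:
        try:
            q.put_nowait(event)
        except queue.Full:
            rejected += 1  # a real pipeline would apply back-pressure instead
    return q, rejected


class FileBuffer:
    """Hypothetical append-only buffer: one JSON event per line on disk."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self):
        """Read back every buffered event, e.g. after a process restart."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    events = [{"id": i} for i in range(25)]
    q, rejected = fill_memory_queue(events)
    print("held in memory:", q.qsize(), "rejected:", rejected)

    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "buffer.jsonl")
        writer = FileBuffer(path)
        for event in events:
            writer.append(event)
        # A fresh object stands in for the process after a restart.
        print("replayed from disk:", len(FileBuffer(path).replay()))
```

The in-memory queue is easy to reason about, but its contents vanish with the process. The on-disk variant keeps events across restarts at the cost of more moving parts, such as file paths, flush behavior, and size limits.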

Logstash: Needs to be deployed with Redis to ensure reliability

Fluentd: Built-in reliability, but its configuration is more complicated

Performance Comparison

This is a nebulous topic. As discussed in this talk at OpenStack Summit 2015, both perform well in most use cases and consistently grok through 10,000+ events per second.

That said, Logstash is known to consume more memory, at around 120MB compared to Fluentd’s 40MB. On modern machines, this difference matters little for aggregators. For leaf machines, it’s a different story: spread across 1,000 servers, this can mean 80GB of additional memory use, which is significant. (This hypothetical number comes from the 80MB difference between Logstash and Fluentd on a single machine multiplied by 1,000 machines.)
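The arithmetic behind that estimate can be written out as a short Python sketch. It uses the approximate per-host figures quoted above and decimal units (1GB = 1,000MB). The function name `extra_memory_gb` is hypothetical, and real memory use will vary with configuration and load.

```python
LOGSTASH_MB = 120  # approximate per-host memory quoted above
FLUENTD_MB = 40    # approximate per-host memory quoted above


def extra_memory_gb(servers, heavier_mb=LOGSTASH_MB, lighter_mb=FLUENTD_MB):
    """Additional memory (decimal GB) used fleet-wide by the heavier collector."""
    return servers * (heavier_mb - lighter_mb) / 1000


if __name__ == "__main__":
    for servers in (1, 100, 1000):
        print(servers, "servers:", extra_memory_gb(servers), "GB extra")
```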

Don’t worry, Logstash has a solution. Instead of running the fully featured Logstash on leaf nodes, Elastic recommends that you run Elastic Beats, which are resource-efficient, purpose-built log shippers. Each Beat focuses on one data source only and does that well. On Fluentd’s end, there is Fluent Bit, an embeddable low-footprint version of Fluentd written in C, as well as Fluentd Forwarder, a stripped-down version of Fluentd written in Go.

Logstash: Slightly more memory use. Use Elastic Beats for leaf machines.

Fluentd: Slightly less memory use. Use Fluent Bit and Fluentd Forwarder for leaf machines.

So Much Information! What’s Next?

While there are several differences, the similarities between Logstash and Fluentd are greater than their differences. Users of either Logstash or Fluentd are miles ahead of the curve when it comes to log management.