Hacking on Heka, part I

Recently I watched a video of two of the Go programming language's authors hacking on tip.golang.org. Partly inspired by this, I'm going to document fixing a bug in Heka, Mozilla's data collection and processing tool for infrastructure. This is the first part in a series of posts; how many there will be probably depends on how useful readers find them, and how much time I have.

There are a couple of reasons I'm writing specifically about Heka:

It's a great project to contribute to if you want to expand your knowledge of networking, concurrency, Go, or Lua.

Heka is relatively mature and in use at Mozilla. But I'd like to see it improve further through usage and community contributions.

You should already have a working knowledge of Heka; if you don't, take a look at the excellent getting started docs.

The only other prerequisites you'll need (apart from being interested in contributing to Heka) are a working Go installation, and the ability to build Heka using source build.sh (instructions). I'll be using Linux, but most of this will work on OS X too.

Getting started

Today we're going to tackle an issue raised on GitHub: add support for abstract UNIX domain sockets to Heka's UDP input.

Heka is a plugin-based system. Plugins for inputs, processing and outputs are configured by the user to their needs:

Input plugins acquire data from the outside world and inject it into the Heka pipeline. They can do this by reading files from a file system, actively making network connections to acquire data from remote servers, listening on a network socket for external actors to push data in, launching processes on the local system to gather arbitrary data, or any other mechanism. Source

When configured, Heka's UDP input plugin will listen for UDP datagrams at the specified address, which is stated as an "IP address:port or Unix datagram socket file path".

So what's an abstract socket? Unlike sockets that are named with a file path, an abstract UNIX domain socket doesn't create an entry on the file system:

Traditionally, UNIX domain sockets can be either unnamed, or bound to a filesystem pathname (marked as being of type socket). Linux also supports an abstract namespace which is independent of the filesystem. Source

Among other things, this means less clutter in /tmp, and the program creating it doesn't even need write access to the file system! Nifty, if non-portable.
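To make that concrete, here's a minimal sketch (not taken from Heka) that binds a datagram socket in the abstract namespace and then checks whether anything appeared on the file system. It assumes Linux and relies on Go's convention of a leading @ marking an abstract name; the socket name and helper functions are made up for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenAbstract binds a Unix datagram socket to the given name.
// A leading "@" asks Go to use the abstract namespace (Linux only).
func listenAbstract(name string) (*net.UnixConn, error) {
	addr := &net.UnixAddr{Name: name, Net: "unixgram"}
	return net.ListenUnixgram("unixgram", addr)
}

// existsOnDisk reports whether a file system entry exists at path.
func existsOnDisk(path string) bool {
	_, err := os.Lstat(path)
	return err == nil
}

func main() {
	const name = "@heka-example-sock" // hypothetical socket name

	conn, err := listenAbstract(name)
	if err != nil {
		fmt.Println("listen failed (abstract sockets need Linux):", err)
		return
	}
	defer conn.Close()

	fmt.Println("listening on", conn.LocalAddr())
	fmt.Println("file system entry created:", existsOnDisk(name))
}
```

On a platform without abstract sockets the listen call may fail or create an ordinary file instead, which is the "non-portable" part.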

Replicating the issue

First, let's see if we can replicate the bug. The following config file will attempt to listen on the socket @hekasock and print any received datagrams to the console:

Running Heka with this config gives the following output:

OK. We replicated the problem; that was easy!

During startup, plugins are initialized in their own goroutine. The Init function is called with the appropriate parameters parsed from the config file.

Here's a listing of UDPInput's Init (found in plugins/udp/udp_input.go):

As we can see, if the Net config option is unixgram (indicating a Unix datagram socket), Heka will try to chmod the file path. An abstract socket has no file on disk, so the chmod fails.
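The failure is easy to reproduce outside Heka. The sketch below is my own illustration rather than Heka's code: chmodSocket and isNotExist are hypothetical helpers, and the socket name is arbitrary. It simply treats an abstract socket name as a path and tries to change its permissions.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// chmodSocket treats address as a file path and changes its permissions,
// which is only meaningful for sockets that exist on the file system.
func chmodSocket(address string, perm os.FileMode) error {
	return os.Chmod(address, perm)
}

// isNotExist reports whether err means "no such file or directory".
func isNotExist(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := chmodSocket("@hekasock-example", 0666)
	if isNotExist(err) {
		fmt.Println("chmod failed because there is no file:", err)
	}
}
```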

Digging for abstract sockets

This leads to a further question: does Go even support abstract sockets? To answer that, we'll need the documentation for net.ListenUnixgram:

Hmm, no mention there (or under UnixAddr). Digging through the source, I found some tests and a platform-specific test helper that looks promising:

Tracing through the relevant code, we find a system call wrapper that checks for a leading @ :
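For anyone who hasn't met the convention before: on Linux, an abstract socket address is one whose path begins with a NUL byte, and the leading @ is just a printable stand-in for it. The following is a simplified illustration of that translation, not Go's actual source; rawSocketName is a name I've invented.

```go
package main

import "fmt"

// rawSocketName returns the bytes that would be handed to the kernel for a
// socket name, replacing a leading '@' with the NUL byte that marks the
// abstract namespace on Linux.
func rawSocketName(name string) []byte {
	b := []byte(name)
	if len(b) > 0 && b[0] == '@' {
		b[0] = 0
	}
	return b
}

func main() {
	fmt.Printf("%q\n", rawSocketName("@hekasock"))
	fmt.Printf("%q\n", rawSocketName("/tmp/heka.sock"))
}
```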

So it does look like Go will create abstract sockets on Linux. If the convention is that @ prefixed to the socket name indicates the abstract namespace, then let's modify UDPInput to check for this, and behave accordingly:
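In isolation, the idea looks roughly like this. Treat it as a sketch of the approach rather than the code that went into Heka: isAbstract and prepareSocketPath are hypothetical helpers, and the real Init does considerably more.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isAbstract reports whether a unixgram address names an abstract socket,
// using the convention that such names start with "@".
func isAbstract(address string) bool {
	return strings.HasPrefix(address, "@")
}

// prepareSocketPath applies file permissions only when the socket is
// backed by a file; abstract sockets are left alone.
func prepareSocketPath(address string, perm os.FileMode) error {
	if isAbstract(address) {
		return nil
	}
	return os.Chmod(address, perm)
}

func main() {
	for _, address := range []string{"@hekasock", "/tmp/heka.sock"} {
		fmt.Printf("%q abstract: %v\n", address, isAbstract(address))
	}
}
```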

Testing a fix

We'll want to add a test for this, but before we go any further, let's see if it actually works how we expect. We'll start Heka up and send a datagram using socat.
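If you'd rather not install socat, the same kind of smoke test can be sketched in Go. This stand-alone example doesn't talk to Heka at all: it opens its own abstract socket, sends itself one datagram and reads it back. The roundTrip helper and socket name are assumptions for illustration, and it needs Linux.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// roundTrip listens on a unixgram socket, sends one datagram to it from an
// unnamed client socket, and returns the payload the listener received.
func roundTrip(name, msg string) (string, error) {
	addr := &net.UnixAddr{Name: name, Net: "unixgram"}

	server, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return "", err
	}
	defer server.Close()

	client, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return "", err
	}
	defer client.Close()

	if _, err := client.Write([]byte(msg)); err != nil {
		return "", err
	}

	// Don't block forever if the datagram never arrives.
	if err := server.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 1024)
	n, _, err := server.ReadFromUnix(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := roundTrip("@heka-roundtrip-example", "hello heka")
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("received:", got)
}
```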

Looks good so far. So here's a test, adapted from the existing tests in udp_input_test.go:

But wait... there's a problem. We didn't check shutdown properly, and the test fails.

Because Heka assumes the socket has an entry on the file system, it will (reasonably) try to remove the file on shutdown. This will fail if there is no file in the first place. Let's fix the shutdown mechanism (towards the end of the Run function):
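Stripped of the surrounding plugin machinery, the shape of the fix is something like the following. Again this is a sketch under my own naming (cleanupSocket is hypothetical), not the code from Run itself.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cleanupSocket removes the socket file on shutdown. Abstract sockets
// (names starting with "@") have no file, so there is nothing to remove.
func cleanupSocket(address string) error {
	if strings.HasPrefix(address, "@") {
		return nil
	}
	return os.Remove(address)
}

func main() {
	// Abstract name: nothing to remove, so no error.
	fmt.Println("abstract cleanup error:", cleanupSocket("@hekasock"))

	// File-backed path: create a stand-in file, then clean it up.
	f, err := os.CreateTemp("", "heka-example-*.sock")
	if err != nil {
		fmt.Println("could not create temp file:", err)
		return
	}
	f.Close()
	fmt.Println("file cleanup error:", cleanupSocket(f.Name()))
}
```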

Now our tests pass. Great!

Conclusion

The issue seems resolved — let's submit our changes for review. Have a look at the resulting pull request.

One thing worth noting is that I didn't comment on the issue before starting work. This is mostly OK for small bugs like this, but for anything more substantial you should probably start a conversation first. Among other benefits, you usually get useful feedback on your approach before any code is written.

Hopefully this has served as a helpful introduction to working on Heka. Next time, we'll look at something a little more involved. In the meantime, I'd appreciate your feedback on Twitter or by email.