Possibly the most common statement in all Go code is the following:

```go
if err != nil {
	return err
}
```

We check whether an error has occurred. Then we pass that error on to someone else to handle.

I call this “passing the buck”. Something went wrong. We don’t know what to do, so we pass on responsibility for the error to something else.

It’s not uncommon to find functions containing 10 or more occurrences of the above statement.

Unlike some people, I am not unhappy with the verbosity of this. Every place that we see this check, we know that we are thinking about the possible error that’s happening and dealing with it in the best way that we can think of. The statement above expresses this quite reasonably, in my opinion, but there is more to consider.

The error cannot be returned forever. The buck stops somewhere. At some point, assuming we do our best to handle all errors, we’re going to do something like this:

```go
if err != nil {
	log.Fatalf("it all went horribly wrong: %v", err)
}
```

or this:

```go
if err != nil {
	log.Printf("continuing after error: %v", err)
}
```

or perhaps this:

```go
if os.IsNotExist(err) {
	// If the file was not there, ignore
	// the fact that we cannot remove it.
	return nil
}
```

The amount of information that we will see in the first two cases above is entirely dependent on the chain of return statements that produced the error. If each function returned exactly the error that it encountered, then we will see the original error that started the error cascade.

Seeing the original error is not always very helpful though. On one memorable occasion, a user of a network API was seeing the simple error “EOF” in response to an API call. It turned out that that error had propagated up through thirteen levels before being reported. The original error was almost entirely uninformative – were we reading from the network when we encountered the EOF? A disk file? Some context would have been very helpful.

A common alternative to just passing the buck is to add some context to the buck that we pass:

```go
if err != nil {
	return fmt.Errorf("cannot sprang the doodah: %v", err)
}
```

If every place did this, we would have very long (but beautifully annotated) error messages. Part of the reason these error messages would be long is that they are likely to contain redundant information.

For instance:

```go
f, err := os.Open(path)
if err != nil {
	return fmt.Errorf("cannot open %q: %v", path, err)
}
```

would print:

```
cannot open "/fdsfds/dsfvgfdsv": open /fdsfds/dsfvgfdsv: no such file or directory
```

We have just annotated our error message with exactly the same information that was already in the error. Aware of this redundancy, most people writing code like the above will just pass the buck instead.

That’s where it starts to become a bad habit. We get used to just passing the buck (the error already contains the information needed to identify the problem, right?), and eventually it’s very easy to end up in a situation where we have no idea what the error signifies (“why the $#!% was it trying to read that file when I was doing that?”).

I’d like to suggest some rules that, if followed, should make it easier to diagnose problems when something goes unexpectedly wrong.

Rule 1 for good Go errors: annotate the error whenever it will add useful information

We don’t want hugely redundant error messages, but we do want errors that describe what went wrong in sufficient detail that we can have a rough idea of what the fix might be without the need to reproduce the problem under controlled circumstances.

There is one significant problem that I’ve been ignoring here. By annotating the error with fmt.Errorf, we lose any error type. That means that the caller, on receiving the returned and annotated error, can know only that an error occurred. Without actually inspecting the error string itself (not a great idea as error messages can change), it cannot make a decision based on the kind of error.

Before suggesting a possible solution, I’d like to take a moment to think about error inspection. Consider the following code:

```go
if err := somepkg.SomeFunction(); err != nil && !os.IsNotExist(err) {
	return err
}
```

We just deliberately ignored any errors that were produced by the os package when a file was not found. But we didn’t call a function in the os package directly. How did we know that SomeFunction could return this kind of error?

In an ideal world, the answer would be “because the documentation for SomeFunction states that in these circumstances, an error is returned that satisfies os.IsNotExist”. Unfortunately, we do not live in an ideal world. The answer is commonly “because SomeFunction returned an os.IsNotExist error when I called it”.

There is a problem with the latter approach: it makes for more brittle and less maintainable software, because every error type that the function happens to return is now implicitly part of its contract. If SomeFunction changes so that it happens not to return an IsNotExist error, it may break its clients. In fact, they could be broken by a change to any of the functions SomeFunction happens to call.

This is another reason to dislike the “pass the buck” idiom – not only does it ignore the opportunity to add useful information to the returned error, but it actively makes the software more likely to break when refactored.

This leads me to suggest a couple more rules:

Rule 2 for good Go errors: document error return types

If an error return type is documented, we know that it’s part of the contract of the function and it’s OK to rely on. There should be a test for it too.

Rule 3 for good Go errors: hide undocumented types

By hiding types that are not a documented part of a function’s interface, it should be obvious to a caller when they are relying on undocumented behaviour. They will either avoid doing that, or change the called function to document the returned type, or (if it’s in an external package) exert pressure on the upstream maintainer to change it.

errgo: making it easy to observe these rules

I’ve implemented a package, errgo, to try to make it easy to build Go software that observes the above rules without imposing undue overhead.

The standard “pass the buck” idiom becomes:

```go
if err != nil {
	return errgo.Mask(err)
}
```

The name “Mask” means “mask the type of the error”: the error type is hidden in the returned error. Even though we didn’t add an explicit error string, errgo also adds context automatically: the returned error records the source code location of the call to errgo.Mask, although the error string is unchanged.

This means that if we have a cascading chain of “pass the buck” errors, we can actually find out the location of each place where the buck was passed.

```go
if err != nil {
	log.Fatalf("it all went horribly wrong: %v; details: %s", err, errgo.Details(err))
}
```

(In fact we can use %#v instead of errgo.Details here if we like). This will print something like this (it’s all on one line to make grep easier):

```
it all went horribly wrong: cannot sprang the doodah: EOF; details: [{/usr/ubuntu/src/github.com/me/sprang.go:213: cannot sprang the doodah} {/usr/ubuntu/src/github.com/me/doodah.go:213: } {/usr/ubuntu/src/github.com/me/doodah.go:345: } {EOF}]
```

We can explicitly add more context to the error if desired:

```go
if err != nil {
	return errgo.Notef(err, "cannot sprang the doodah")
}
```

In accordance with Rule 3, an error created by errgo hides the type of the wrapped error. It is possible to let error types pass through the wrapping process, though, by passing an error predicate to errgo.Mask. An error predicate is just a function that reports whether a given error should be passed through.

Any function with the type func(error) bool can be used as an error predicate. For instance, os.IsNotExist works as an error predicate:

```go
// LockIt tries to acquire the lock. It returns an error
// whose errgo.Cause satisfies os.IsNotExist if the lock
// file has not been created.
func LockIt() error {
	f, err := os.Open(lockPath)
	if err != nil {
		return errgo.Mask(err, os.IsNotExist)
	}
	// [ etc ... ]
}
```

To accommodate the common idiom of using specific error values rather than error types, errgo provides a trivial helper function, Is:

```go
if err != nil {
	return errgo.Mask(err, errgo.Is(rsa.ErrDecryption))
}
```

Since Mask wraps the error to add the source code caller metadata, the resulting error will never satisfy os.IsNotExist directly. Instead, errgo.Cause can be used to recover the cause of the error. Cause is safe to call on any error, not just errors created with errgo – if the error is not created by errgo, it will just return the error itself.

```go
if os.IsNotExist(errgo.Cause(err)) {
	// If the file was not there, ignore
	// the fact that we cannot remove it.
	return nil
}
```

Sometimes we wish to create an error that preserves information about the error cascade, but also adds specific information. The errgo.Err type implements all the functionality of the errgo error stack in a form suitable for embedding into a custom error type. I’ll leave describing that to another blog post.

Converting existing code to use errgo

I converted the Juju code base (~200Kloc) to use errgo, by applying the following transformation rules:

This:

```go
if someStatement; err != nil {
	return err
}
```

becomes:

```go
if someStatement; err != nil {
	return errgo.Mask(err)
}
```

This:

```go
fmt.Errorf("some text: %v", err)
```

becomes

```go
errgo.Notef(err, "some text")
```

This:

```go
errors.New("some text")
```

becomes:

```go
errgo.New("some text")
```

This:

```go
if someStatement; err == somethingButNotNil
```

becomes

```go
if someStatement; errgo.Cause(err) == somethingButNotNil
```

This:

```go
err.(T)
```

becomes:

```go
errgo.Cause(err).(T)
```

As we use gocheck for testing, I also added some rules for conversion of gocheck test cases.

I forked gofix to add the above transformation rules.

The rules above are by no means complete (we rely heavily on the fact that we almost always name error variables “err”, for example), but were sufficient to transform the vast majority of the code base. There was still a fair amount of work to do. In particular, we need to change each Mask call to add the error types that are expected to be returned. This proved not to be too hard, because in the vast majority of cases we really do want to mask the error. An alternative approach is to let all errors through initially (using errgo.Any) and gradually move towards stricter errors.

Although the changes have not yet landed, some initial experiments have shown that the errors we get are vastly more informative. My belief is that the new scheme will leave us with a code base that’s more easily understood and refactored too.
