I recently got the chance to redo the error handling in two different crates I help maintain. For liquid, I decided to write the error types by hand rather than use something like error-chain. In the case of assert_cli, I decided to finally give failure a try.

From failure's announcement:

Failure is a Rust library intended to make it easier to manage your error types. This library has been heavily influenced by learnings we gained from previous iterations in our error management story, especially the Error trait and the error-chain crate.

I think failure does a great job iterating on Rust's error management story, fixing a lot of fundamental problems in the Error trait and making it easier for new developers and prototypers to be successful. While a replacement for the Error trait is a disruptive change to the ecosystem, I feel the reasons are well justified and the crate looks like it does a great job bridging the gap.

On the other hand, I do feel that parts of failure are a bit immature, particularly Context. This is concerning because the implementation policies related to Context are coupled to the general Fail trait mechanism (see Separation of mechanism and policy). We cannot experiment and iterate on Context without breaking compatibility with Fail, and there is a reasonable hesitance to break compatibility.

I'd recommend failure for anyone writing an application. I'd recommend library authors exercise caution, particularly if you have richer requirements for your errors. To the crate's author, I'd suggest either that it is not yet ready for "1.0" or that the project be more open to breaking changes than "the distant future". Below I'll go into more specific areas where I think failure can be improved.

I know it is less than ideal to receive this type of feedback "late" in the process (1.0 is expected to release next week). I was strapped for time and only recently had solid use cases to push failure's limits. I at least tried to provide theoretical feedback early on, based on reading the code and docs, out of an eagerness for failure to succeed. The ability to create a thorough analysis from just reading the docs and code is limited, and the weight of such feedback is reasonably lower.

My Recent Failures

So let's step through my recent case studies in error management.

liquid

liquid is a Rust implementation of Shopify's liquid template language.

As mentioned above, I recently did a major revamp of liquid's errors. I ended up skipping failure. It was easier for me to write my error types from scratch than to take the time to figure out how to map my needs to failure. I was also concerned about making an unstable crate such a fundamental part of my API.

My goals were:

Low friction for detailed error reports.

Give the user context to the errors with template back traces.

An example of a template backtrace:

    Error: liquid: Expected whole number, found `fractional number`
      with:
        end=10.5
    from: {% for i in (1..max) reversed %}
    from: {% if "blue skies" == var %}
      with:
        var=10

Here is a rough sketch of liquid's errors:

    #[derive(Fail, Debug)]
    pub struct Error {
        inner: Box<InnerError>,
    }

    #[derive(Fail, Debug)]
    struct InnerError {
        msg: borrow::Cow<'static, str>,
        user_backtrace: Vec<Trace>,
        #[cause]
        cause: Option<ErrorCause>,
    }

    #[derive(Clone, PartialEq, Eq, Debug, Default)]
    pub struct Trace {
        trace: Option<String>,
        context: Vec<(borrow::Cow<'static, str>, String)>,
    }

I've created various "ResultExt" traits to make these somewhat ergonomic to create:

    let value = value
        .as_scalar()
        .and_then(Scalar::to_integer)
        .ok_or_else(|| {
            Error::with_msg(format!(
                "Expected whole number, found `{}`",
                value.type_name()
            ))
        })
        .context_with(|| (arg_name.to_string().into(), value.to_string()))?;

    ...

    let mut range = self
        .range
        .evaluate(context)
        .trace_with(|| self.trace().into())?;

While the context API needs some work, the overall approach worked great and provides very helpful error messages to the user.
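To make the shape of that approach more concrete, here is a minimal, std-only sketch of an error that carries its own template backtrace, plus a "ResultExt"-style trait. The names `with_msg`, `context_with`, and `trace_with` follow the snippets above, but every body here (and the `demo` function) is an assumption for illustration, not liquid's actual implementation.

```rust
use std::fmt;

/// One frame of the user-facing template backtrace.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(String, String)>,
}

#[derive(Debug)]
pub struct Error {
    msg: String,
    user_backtrace: Vec<Trace>,
}

impl Error {
    pub fn with_msg<S: Into<String>>(msg: S) -> Self {
        // The first frame has no `trace`; it holds context for the message itself.
        Error { msg: msg.into(), user_backtrace: vec![Trace::default()] }
    }

    /// Attach a key/value pair to the most recent frame.
    pub fn context(mut self, key: String, value: String) -> Self {
        let frame = self.user_backtrace.last_mut().expect("at least one frame");
        frame.context.push((key, value));
        self
    }

    /// Push a new frame naming the enclosing template construct.
    pub fn trace(mut self, trace: String) -> Self {
        self.user_backtrace.push(Trace { trace: Some(trace), context: Vec::new() });
        self
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "liquid: {}", self.msg)?;
        for frame in &self.user_backtrace {
            if let Some(trace) = &frame.trace {
                writeln!(f, "from: {}", trace)?;
            }
            if !frame.context.is_empty() {
                writeln!(f, "  with:")?;
                for (key, value) in &frame.context {
                    writeln!(f, "    {}={}", key, value)?;
                }
            }
        }
        Ok(())
    }
}

impl std::error::Error for Error {}

/// Lazily adds context or a trace frame, only on the error path.
pub trait ResultExt<T> {
    fn context_with<F: FnOnce() -> (String, String)>(self, f: F) -> Result<T, Error>;
    fn trace_with<F: FnOnce() -> String>(self, f: F) -> Result<T, Error>;
}

impl<T> ResultExt<T> for Result<T, Error> {
    fn context_with<F: FnOnce() -> (String, String)>(self, f: F) -> Result<T, Error> {
        self.map_err(|err| {
            let (key, value) = f();
            err.context(key, value)
        })
    }

    fn trace_with<F: FnOnce() -> String>(self, f: F) -> Result<T, Error> {
        self.map_err(|err| err.trace(f()))
    }
}

/// Hypothetical call site that rebuilds the example backtrace shown earlier.
pub fn demo() -> Result<(), Error> {
    let failed: Result<(), Error> =
        Err(Error::with_msg("Expected whole number, found `fractional number`"));
    failed
        .context_with(|| ("end".to_owned(), "10.5".to_owned()))
        .trace_with(|| "{% for i in (1..max) reversed %}".to_owned())
        .trace_with(|| "{% if \"blue skies\" == var %}".to_owned())
        .context_with(|| ("var".to_owned(), "10".to_owned()))
}
```

Because the frames live inside the error, the message always renders first and the frames follow in the order they were added, no matter how many layers of code touched the `Result` on the way up.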

assert_cli

assert_cli provides assertions for program behavior to help with testing.

This week, I started on a refactoring of assert_cli in an effort to move it to "1.0" for the CLI working group. I needed some new errors and rather than continuing to extend the existing brittle system based on error-chain, I thought I'd give failure a try. I figured breaking changes in failure would have minimal impact on my users because 99% of assert_cli users are just unwrapping the error in their tests rather than passing it along.

I structured assert_cli's error-chain errors to leverage chaining. The chaining hierarchy is something like:

Spawn Failed

Assertion Failed
    Status (success / failure) Failed
    Exit Code Failed
    Output
        Strings matched when shouldn't
        Strings matched when should
        Bytes matched when shouldn't
        Bytes matched when should
        Sub-strings matched when shouldn't
        Sub-strings matched when should
        Byte subset matched when shouldn't
        Byte subset matched when should
        Predicate failed

Most of these errors really just exist for the sake of adding context. This is greatly simplified by leveraging failure::Context and the ErrorKind idiom.

This is what it looks like, in Rust pseudo-code:

    #[derive(Copy, Clone, Eq, PartialEq, Debug, Fail)]
    pub enum AssertionKind {
        #[fail(display = "Spawn failed.")]
        Spawn,
        #[fail(display = "Status mismatch.")]
        StatusMismatch,
        #[fail(display = "Exit code mismatch.")]
        ExitCodeMismatch,
        #[fail(display = "Output mismatch.")]
        OutputMismatch,
    }

    #[derive(Debug)]
    pub struct AssertionError {
        inner: failure::Context<AssertionKind>,
    }

Most of the rest of this post goes into my initial experience in converting over to this.

Feedback on failure::Context

error-chain and friends have been around for a while and have given us different takes on how to write and chain errors, but there hasn't been much experimentation with adding context to errors. In failure's case, it is using roughly the same approach as when it was first announced.

Context serves two purposes in failure:

Wrap a failure::Error, adding a single impl Display of context (see documentation).

Quick and dirty way to create or chain an error (see documentation).

Decouple Roles

Coupling these roles is initially convenient. A user can quickly create an error from any piece of data they have, and there is no need for another struct that looks and acts similarly to Context.

The problem is in the details.

Say my errors are wrapped like this:

Context -> Context -> AssertionError -> io::Error

(Remember: Context is a Fail.)

Which Fail's Termination trait should be respected for ? in main (when that becomes a thing)?

How does a user of my library identify my documented error type for programmatically handling the error?

If an application only wants to show the leaf error and not the causes, how does it identify what is a leaf error?

How should adding Context in no_std work? With the current approach, the Context is passed back and the original error is dropped.

Suggestions:

Separate the roles by providing an alternative "easy error".

Separate the roles so a clearer no_std policy can exist.

Remove Context from the error chain by moving the Context from an error decorator to an error member, like backtrace and cause.

Experiment with ways to give no_std users more control over the Context policy, like moving the Context from an error decorator to an error member, like backtrace and cause.

Errors are Displayed in Inverted Order

Because Context generically wraps a failure::Error, the ordering is inverted when rendering an error for a user.

If I were to naively switch liquid to failure, my errors would change from

    Error: liquid: Expected whole number, found `fractional number`
      with:
        end=10.5
    from: {% for i in (1..max) reversed %}
    from: {% if "blue skies" == var %}
      with:
        var=10

to

    end=10.5
    cause: Error: liquid: Expected whole number, found `fractional number`
    cause: {% for i in (1..max) reversed %}
    cause: var=10
    cause: {% if "blue skies" == var %}

Suggestion:

Associate contexts with an error by, again, moving the Context from an error decorator to an error member, like backtrace and cause.
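The inversion is easy to reproduce without failure at all. Below is a small std-only sketch in which a hypothetical `Wrapped` decorator stands in for failure::Context and std::error::Error stands in for Fail. It is an illustration of the decorator pattern in general, not of failure's internals.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical decorator: a piece of context wrapped around an inner error.
#[derive(Debug)]
struct Wrapped {
    context: String,
    inner: Box<dyn Error + 'static>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.inner)
    }
}

/// The error that actually describes what went wrong.
#[derive(Debug)]
struct Leaf(&'static str);

impl fmt::Display for Leaf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for Leaf {}

/// Decorate `inner` with one more layer of context.
fn wrap(inner: impl Error + 'static, context: &str) -> Wrapped {
    Wrapped { context: context.to_owned(), inner: Box::new(inner) }
}

/// Render the chain the way a generic reporter would: outermost error first.
fn render(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut lines = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        lines.push(format!("cause: {}", cause));
        current = cause.source();
    }
    lines
}
```

With `wrap(wrap(Leaf("Expected whole number"), "end=10.5"), "from: for loop")`, `render` yields the last-added context first and the real message last, which is the reverse of what a user wants to read.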

Better support type contracts

failure::Error is a fancy boxed version of failure::Fail. This is a great solution for prototyping and applications like Cobalt.

Libraries can be a different beast. In cases like assert_cli and liquid, I want to ensure:

User has a backwards compatible, defined, finite set of errors to programmatically deal with.

The errors are useful.

Having a type-erased failure::Fail makes this impossible. Any ? could be turning an error from a dependency into a failure::Fail and passing it right back up to my user. The only way for me to know is to exercise every error path and inspect the output.

I prefer to return Result<T, AssertionError> rather than Result<T, failure::Error>.

That's fine, failure::Error is just a convenience that failure provides, like with Box<Error>, except:

If I want to add Context to my errors, the only reasonable ergonomic way to do so is to return Result<T, failure::Error>.

Context only supports wrapping a failure::Error, making it so any error can escape through a Context.

The only alternative is to reimplement the Context machinery that is built into the failure::Fail and failure::ResultExt traits.

Suggestion:

Allow using context without failure::Error by, yet again, moving the Context from an error decorator to an error member, like backtrace and cause.

Support a ContextPair

When converting assert_cli to failure, I found failure::Context works great when you want to dump strings but has limitations for my more common cases:

    // Good: failure::Context works well in this case
    return Err(AssertionError::new(AssertionKind::OutputMismatch))
        .context("expected to contain")

    // Bad: failure::Context loses the context in this case
    return Err(AssertionError::new(AssertionKind::OutputMismatch))
        .context(needle)
        .context(haystack)

    // Alternative: Works but is a bit verbose, especially considering this is
    // going to be 90% of my Contexts
    return Err(AssertionError::new(AssertionKind::OutputMismatch))
        .context_with(|| format!("needle: {}", needle))
        .context_with(|| format!("haystack: {}", haystack))

So I added this:

    #[derive(Debug)]
    pub struct ContextPair<D: Display>(
        &'static str, // key
        D,            // context
    );

which can be used like:

    return Err(AssertionError::new(AssertionKind::OutputMismatch))
        .context("expected to contain")
        .context(ContextPair("needle", needle.to_owned()))
        .context(ContextPair("haystack", haystack.to_owned()));

Suggestion:

Provide something like ContextPair so failure can help developers fall into the pit of success for giving helpful errors to users.
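For reference, the core of such a type is tiny. Here is a std-only sketch of a key/value pair that renders as "key: value", which is all a Display-based context needs. The tuple-struct shape matches the usage above; the `describe_mismatch` helper is hypothetical and only exists to show the output.

```rust
use std::fmt;

/// Hypothetical key/value context; any `Display` value can be attached.
#[derive(Debug)]
pub struct ContextPair<D: fmt::Display>(pub &'static str, pub D);

impl<D: fmt::Display> fmt::Display for ContextPair<D> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render as "key: value" so each context line is self-describing.
        write!(f, "{}: {}", self.0, self.1)
    }
}

/// Hypothetical helper: the context lines for an output mismatch.
pub fn describe_mismatch(needle: &str, haystack: &str) -> Vec<String> {
    vec![
        ContextPair("needle", needle.to_owned()).to_string(),
        ContextPair("haystack", haystack.to_owned()).to_string(),
    ]
}
```

Since the value is generic, the same type covers non-string data too, for example `ContextPair("exit code", 3)`.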

Feedback on failure::Error

failure::Error is a fancy boxed version of Fail . Their APIs generally deviate in ways that make sense for their different feature sets.

A quick summary of their behavior:

                Fail                Error
    cause       child item          inner
    causes      starts with self    starts with inner
    root_cause  includes self       includes self
    downcast    acts on self        acts on inner

failure::Error is not a Fail

While failure::Error acts as a boxed Fail, it isn't a Fail itself. That isn't possible without specialization, because the blanket impl<F: Fail> From<F> for Error would conflict with the standard library's impl<T> From<T> for T.
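The same coherence problem shows up with any type-erased error, so it can be sketched with the standard library alone. In the sketch below, the hypothetical `AnyError` stands in for failure::Error and std::error::Error stands in for Fail; the conflicting impl is left commented out so the rest compiles.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical type-erased error, standing in for `failure::Error`.
#[derive(Debug)]
pub struct AnyError {
    inner: Box<dyn StdError + Send + Sync + 'static>,
}

// Blanket conversion: this is what lets `?` turn any error into `AnyError`.
impl<E> From<E> for AnyError
where
    E: StdError + Send + Sync + 'static,
{
    fn from(err: E) -> Self {
        AnyError { inner: Box::new(err) }
    }
}

impl fmt::Display for AnyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.inner)
    }
}

// Uncommenting this impl makes `AnyError` satisfy the bound on `E` above, so
// the blanket impl would also provide `From<AnyError> for AnyError` and
// collide with the standard library's `impl<T> From<T> for T` (error E0119).
// impl StdError for AnyError {}

/// `?` converts the `ParseIntError` into `AnyError` via the blanket impl.
pub fn parse_port(text: &str) -> Result<u16, AnyError> {
    let port: u16 = text.parse()?;
    Ok(port)
}
```

So a library has to choose: either the ergonomic blanket `From`, or the boxed type implementing the error trait, but not both on stable Rust.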

failure::Error::cause is a foot-gun

As noted above, Error and Fail have similar APIs, and failure::Error mostly behaves as a proxy for the inner Fail, with cause being the exception. Even though the two cause methods have different signatures (Fail::cause returns an Option<&Fail> while Error::cause returns a &Fail), I think this is going to trip up a lot of people.

Suggestion:

Error::cause should behave exactly like Fail::cause to avoid surprising developers.

Feedback on failure::Fail

causes and root_cause are foot-guns

If cause is the child error, then causes would start with that and iterate beneath it, right? Unfortunately, no. The current Fail is the first cause.

Similarly, if your Fail has no cause , then it will be the root_cause .

I can understand the need for functions that behave like these, but not with names that imply they won't include your current Fail. Granted, I personally would find causes not returning the top Fail more convenient.

Suggestion:

Either find new names or change the behavior of the functions to avoid surprising developers.
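To spell out the two possible readings, here is a std-only sketch using std::error::Error::source in place of Fail::cause. The `Layer` type and all three function names are hypothetical; the first function mirrors the "includes self" behavior described above, and the second is what the name causes suggested to me.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error with an optional child error beneath it.
#[derive(Debug)]
struct Layer {
    name: &'static str,
    cause: Option<Box<Layer>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref().map(|c| c as &(dyn Error + 'static))
    }
}

/// The behavior described above: the walk starts with `err` itself.
fn chain_including_self(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut names = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        names.push(e.to_string());
        current = e.source();
    }
    names
}

/// What the name `causes` suggests: only the errors beneath `err`.
fn chain_below(err: &(dyn Error + 'static)) -> Vec<String> {
    chain_including_self(err).into_iter().skip(1).collect()
}

/// "Includes self" root cause: an error with no cause is its own root.
fn root_including_self(err: &(dyn Error + 'static)) -> String {
    let mut names = chain_including_self(err);
    names.pop().expect("chain always contains `err`")
}
```

For an "assertion" error caused by an "io" error, the first function returns both names while the second returns only "io"; a lone "io" error is reported as its own root.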

Addendum

Additional error management case studies

Cobalt

Cobalt is a static site generator that I maintain. As an end-user application, it is a perfect fit for failure. I haven't converted it over due to lack of a pressing need compared to the desired features.

Day Job

This is from a 15+ year old, multi-million LoC product that is 80% user-facing library and 20% user-facing application targeted at non-programmers.

Putting a flexible product into the hands of non-programmers means you need to provide a lot of guidance to help people get out of situations you couldn't predict. We consider good errors essential for our users.

The user-form of our errors looks like (translated to Rust pseudo-code):

    #[derive(Fail, Debug, Copy, Clone, ...)]
    #[repr(C)]
    pub enum ErrorKind {
        ...
    }

    #[derive(Fail, Debug)]
    pub struct PublicError {
        kind: ErrorKind,
        context: String,
    }

and the internal-form of our errors looks like (translated to Rust pseudo-code):

    #[derive(Fail, Debug, Copy, Clone, ...)]
    #[repr(C)]
    enum ContextKind {
        ...
    }

    struct Context {
        value: Vec<(ContextKind, Box<Display>)>,
        backtrace: failure::Backtrace,
    }

    struct InternalError {
        kind: ErrorKind,
        context: Option<Box<Context>>,
    }

Things of note: