This blog post is about exception and error handling. It seems that many projects don’t have a clear idea of how to organize their code to deal with the various kinds of potential problems. Without a clear approach in mind, error handling becomes muddled and complicates matters more than it should. Read this and learn.

What is an error and what is not an error?

From a user perspective an error is something that is unexpected but still potentially part of normal program behaviour. Examples are problems such as “file not found”, “network connection went down” or failure to reserve some other (shared) operating system resource such as a mutex, socket, file handle or even memory. All these problems have one thing in common: they typically occur during resource acquisition and are associated with a specific error code coming from the underlying operating system. However, from the application/program perspective none of these are really errors, just an “unfortunate turn of events”. In this text we refer to this category of errors as “soft errors”.

The second category consists of errors that are not part of normal application behaviour, but are in fact mistakes made by the programmer. These include memory corruption, out of bounds access, other undefined behaviour and many other kinds of errors, including mistakes in the application logic itself. This category is wide and varied. Some tools and languages are better than others at reducing some of these problems, but none of those in use today are without their own pitfalls. In this text we refer to this category of errors as “hard errors” or “programmer errors”.

How to deal with soft errors?

Say your program is responding to some user input such as a button press event coming through the UI and the requested action is to open some file, perhaps read some data and display it to the user. What if the file is not found?

As the event is dispatched into our UI code we typically travel further and further down the software stack to lower levels in the system, where we finally end up executing some piece of code that calls an operating system function to open the resource identified by the file name. The operation fails and we are left with an invalid file handle and an error code that says (for example) “insufficient permission”. What is the best course of action? Remember this is not a hard error. The program itself has not faulted. The only sensible thing to do is to inform the user that the operation they requested could not be carried out because they had insufficient privileges to read the file. Now, having decided that informing the user is the right thing to do, how might we go about doing it?

1. Display the error right when it happens (print to std::cerr, MessageBox etc.).
2. Call some error callback.
3. Return an error code.
4. Throw an exception.

Let’s dissect these going from the least attractive to the most attractive. Option number 1 is arguably the least favourable of these options for many reasons. It bundles code that clearly belongs to the UI layer with application code. It is difficult to localize and difficult to reuse in other contexts. What if the calling code was a console application instead of a GUI application? What if there’s no GUI at all but just a headless service? We clearly don’t have enough context to start displaying error messages right in that function. Instead, displaying the error should be delegated to the UI layer that is responsible for interacting with the user.

Option number 2 allows the error reporting code to be plugged into the application code. This allows for slightly better code reusability and localization. However, in practice maintaining a separate error callback is cumbersome and often error prone by itself. It also doesn’t really separate the error reporting/displaying from the place where the error occurs. Callbacks have additional problems too, such as not knowing what the callback implementation does (recursion anyone??). Anyone who has ever held a mutex while invoking a callback into unknown code knows what I’m talking about. *cough* deadlock *cough* Do it enough times and you *will* shoot yourself in the foot eventually. The only thing an error callback has going for it is easy debuggability. Put a breakpoint in the callback and you can immediately break in it and walk back up the callstack to see what the problem was. This, however, is not worth the price.

Option number 3 is almost a working solution. It has the benefit of completely separating the error reporting from the location where the error occurs. However, returning error codes makes error propagation difficult. When the error needs to be propagated up the stack through several layers, every layer needs to make sure that it does so correctly. There are a few alternative ways to do this, such as having a global error type (for example an enum, or std::error_code) that allows each layer to return the same error object up the stack to its caller. This works but is somewhat ugly, since ideally most of the code along the way should be error agnostic: it should not need to know the details of the possible errors but should still be able to propagate them up to the caller without a hitch. Type agnostic error codes are easily achieved by falling back on looser typing such as an integer (think errno), but this has the unfortunate side effect that it’s easy to let mistakes slip through and silently return wrong codes/values. Essentially, programming on error codes works best when the maximum propagation is just 1 level (such as from the kernel to the application). Error code based programming suffers from having to clutter normal application logic with details about error propagation. It does have the benefit of making the code flow very clear, however. So in some limited circumstances, when rigorous attention to detail is given, it can provide a working solution.
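To make the propagation cost concrete, here is a minimal sketch of option 3 using std::error_code across three layers. The function names (read_file, count_lines, describe_result) are made up for illustration, and the sketch assumes that a failed std::fopen sets errno, which POSIX guarantees but the C standard does not.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <string>
#include <system_error>

// Lowest layer (hypothetical): talks to the OS and reports failure as a code.
std::error_code read_file(const std::string& path, std::string& out) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) {
        // Assumption: errno was set by fopen; fall back to EIO if it was not.
        const int err = errno;
        return std::error_code(err != 0 ? err : EIO, std::generic_category());
    }
    char buf[4096];
    std::size_t n = 0;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        out.append(buf, n);
    }
    const bool failed = std::ferror(f) != 0;
    std::fclose(f);
    if (failed) {
        return std::make_error_code(std::errc::io_error);
    }
    return {};  // a default-constructed error_code means "no error"
}

// Middle layer (hypothetical): does not care about the error,
// but still has to check it and pass it along by hand.
std::error_code count_lines(const std::string& path, std::size_t& lines) {
    std::string text;
    if (std::error_code ec = read_file(path, text)) {
        return ec;
    }
    lines = 0;
    for (char c : text) {
        if (c == '\n') ++lines;
    }
    return {};
}

// Top layer (hypothetical): the only place that turns the code into a message.
std::string describe_result(const std::string& path) {
    std::size_t lines = 0;
    if (std::error_code ec = count_lines(path, lines)) {
        return "Could not read " + path + ": " + ec.message();
    }
    return path + " has " + std::to_string(lines) + " lines";
}
```

Note how count_lines has nothing to say about the error yet must still contain the check-and-return boilerplate. Every additional layer repeats it, and forgetting it once silently drops the error.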

Finally we arrive at option number 4: throw an exception. Throwing an exception has multiple pros. It allows the code between the place that throws the exception and the place that handles it to be completely agnostic, i.e. the code in the middle doesn’t need to know or care about any exceptions flying through. Propagation is automatic and type safe. Multiple types of exceptions can easily be defined, and the amount of data they can carry up the stack is not limited in any way. The biggest and pretty much only drawback is that it might make it slightly harder to reason about the code flow. In some cases the intermediate code might also need to be aware of exceptions, and even catch them and correct its own state appropriately before propagating the exception. However, with proper design and resource management such cases should be few and far between.
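As a rough sketch of option 4, the same kind of three-layer setup might look like this with exceptions. FileOpenError, read_file, count_lines and on_button_pressed are hypothetical names invented for this example, not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries the offending path up the stack.
class FileOpenError : public std::runtime_error {
public:
    explicit FileOpenError(const std::string& path)
        : std::runtime_error("cannot open file: " + path), path_(path) {}
    const std::string& path() const noexcept { return path_; }
private:
    std::string path_;
};

// Lowest layer: detects the soft error and throws.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw FileOpenError(path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Middle layer: contains no error handling at all.
// An exception from read_file simply passes through.
std::size_t count_lines(const std::string& path) {
    const std::string text = read_file(path);
    std::size_t lines = 0;
    for (char c : text) {
        if (c == '\n') ++lines;
    }
    return lines;
}

// UI layer: the one place that decides how to tell the user.
std::string on_button_pressed(const std::string& path) {
    try {
        return "Lines: " + std::to_string(count_lines(path));
    } catch (const FileOpenError& e) {
        return "Sorry, could not open '" + e.path() + "'.";
    }
}
```

The std::ifstream in read_file closes itself when the function exits, whether normally or by exception, which is the kind of resource management that keeps the middle layers free of try/catch blocks.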

Next up programmer failures…

How to deal with hard errors?

Ok then, the nasty stuff: the bugs that the programmers write. Out of bounds access. Violation of a precondition. Mistakes in program logic. I write them and you write them too, so just read up and heed this advice. Let’s inspect some potential mechanisms (and non-mechanisms) for dealing with this crap.

throw an exception, use some other soft error method ?

assert ?

ASSERT ?

So let’s assume that your program is in the middle of executing some procedure. While it’s executing it tries to access an object that doesn’t exist in some data container, for example accessing a vector one index past the end. What could we possibly do about it? Clearly this is an error in the program and the programmer is responsible for it. This is not part of our well behaved program and should not be part of normal execution. Our program is simply broken. End of story. No matter how much we might be tempted to “deal with” the situation by returning a null pointer or some “empty” object, throwing an exception or returning an error code, we cannot actually solve the problem. The only thing we’d be doing is a disservice, by masking and hiding the real problem. We should be totally honest and, immediately upon stumbling on such an error, simply abort the program. Blow up with a massive bang and leave behind a big smoke trail (a core dump) that leads directly to the offending piece of code. To the inexperienced developer this might seem like the wrong thing to do. After all, the program will die, the user will lose their data and generally everyone will be unhappy. Trust me kids, this will in fact make your program *better* (once you fix the bug ofc). Essentially, trying to weasel your way out of a problem you made yourself is the wrong thing to do and won’t fix the problem. Trying to do so in the code will only add confusion to the code base (hey, look ma, I’m trying to write logic to deal with programmer failures) and clearly demonstrates a lack of guts on the programmer’s part. The bottom line: DO NOT WRITE CODE TO DEAL WITH PROGRAMMER ERRORS. Simply blow up.

Now that that is out of the way, how might we go about blowing up with a bang? The standard assert() macro is a nifty and simple utility. However, its value is limited since it’s usually only compiled into debug builds (defining NDEBUG turns it into a no-op). Still, we can use the standard assert to trap errors that are likely to be encountered quickly during development. Example cases are using asserts to verify pre and post conditions inside a class (is the file handle really open, did that expression really evaluate to ‘true’, was that pointer definitely non-null, etc.).
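For illustration, a small sketch of pre and post condition checks with the standard assert might look like this. IntStack is a made-up class for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container used only to show where the checks go.
class IntStack {
public:
    void push(int value) {
        items_.push_back(value);
        // Postcondition: the stack cannot be empty right after a push.
        assert(!items_.empty());
    }

    int pop() {
        // Precondition: calling pop() on an empty stack is a programmer error.
        assert(!items_.empty() && "pop() called on an empty stack");
        const int value = items_.back();
        items_.pop_back();
        return value;
    }

    bool empty() const { return items_.empty(); }

private:
    std::vector<int> items_;
};
```

Keep in mind that with NDEBUG defined both checks vanish, and pop() on an empty stack becomes undefined behaviour instead of a clean abort.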

However when using the standard assert we must be sure that there’s no way for an error condition to fall through the cracks and possibly lead to incorrect behaviour at runtime.

ASSERT is the big cousin of the standard assert, except that it will always be built in, and it will always terminate the process with a big bang and a core dump when it is violated. This is not a standard utility, but many projects already provide such a tool. It’s also not terribly difficult to craft one yourself. I recommend you do so now if you haven’t already. The chief idea is that it’s able to give us a stack trace and a memory dump for post mortem debugging.
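A bare-bones version of such a macro could be sketched as follows. This is only one possible shape: it relies on std::abort() raising SIGABRT, which on typical Unix-like systems produces a core dump when core dumps are enabled, and it leaves out stack trace printing because that is platform specific. element_at is a hypothetical helper showing the intended use.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Always-on assertion: not affected by NDEBUG.
// On failure, print where and what, then abort the process.
#define ASSERT(expr)                                              \
    do {                                                          \
        if (!(expr)) {                                            \
            std::fprintf(stderr, "%s:%d: ASSERT failed: %s\n",    \
                         __FILE__, __LINE__, #expr);              \
            std::abort();                                         \
        }                                                         \
    } while (0)

// Hypothetical helper: an out of bounds index is a programmer error,
// so we blow up instead of returning some made-up value.
int element_at(const std::vector<int>& values, std::size_t index) {
    ASSERT(index < values.size());
    return values[index];
}
```

The do/while(0) wrapper makes the macro behave like a single statement, so it is safe inside an if/else without braces.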

And the recommended guideline is to always use ASSERT when:

a) we’re not 100% sure that we have covered all the loopholes in development with the normal assert.

b) violation of the condition would lead to undefined behaviour. (out of bounds access, null pointer, etc.)

Finally a note about resource cleanup and RAII. Resource cleanup should always use ASSERT to check the return values of the cleanup calls (such as close(), CloseHandle() etc.). Why? Because cleanup is likely to happen in a destructor, and if it fails the only plausible cause is corrupted application state. Examples are double free or double delete, trying to close a handle which has become corrupted (incorrect handle value, object slicing etc.), double closing etc. Again, better just to dump core asap, investigate and then fix the application.

Bottom line: always make sure that your application state is 100% correct and well formed, and if it isn’t, catch this as soon as possible and blow up with a bang.