In a dreamy programming utopia, functions behave correctly and nothing unexpected ever happens. In technical reality, however, even the most mundane of actions can suffer failures and errors. It’s a misguided notion that errors can be eliminated or that some things cannot fail. The only sane way to program is by assuming things will go wrong.

A polymorphic world

Operating systems have become very complex and flexible creatures. Between the lowest levels of the computer and our high-level application code are layers upon layers of abstractions. Let’s consider a very common operation: opening a file.

var q = open("url_or_device_path?")

What is a file? This can be basically anything nowadays. Some languages let us open URLs, or remote resources, directly as files. The OS lets us open named sockets, or even create anonymous sockets while opening a file. Linux exposes a whole host of kernel properties via /proc files, or communication with hardware via /dev files. Protocols like WebDAV and SFTP also ensure that even things that look like normal files may not be.

As a file can be so many different things it follows that there is a virtually unlimited number of ways that opening it can fail. Even if we do successfully open it, the first read or write, or the second or third, could easily go wrong. It’s not possible, in a general way, to know how file operations might fail on us, only that they can at any time.
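As a minimal sketch of what assuming failure looks like in practice, the Python snippet below treats both the open and the read as operations that can go wrong. The helper name read_first_line and the temporary file names are made up for illustration; only open and the standard exception types come from Python itself.

```python
import os
import tempfile


def read_first_line(path):
    """Return (line, error); exactly one of the two is None."""
    try:
        # open() itself can fail: missing file, permissions, a directory, ...
        with open(path, "r", encoding="utf-8") as handle:
            # A successful open guarantees nothing about later reads.
            return handle.readline(), None
    except (OSError, UnicodeDecodeError) as exc:
        # OSError covers FileNotFoundError, PermissionError and
        # I/O errors raised while reading.
        return None, exc


# Usage: a path that works, one that does not exist, and a directory.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "data.txt")
    with open(good, "w", encoding="utf-8") as f:
        f.write("hello\n")
    missing = os.path.join(tmp, "no_such_file.txt")

    line, err = read_first_line(good)
    missing_line, missing_err = read_first_line(missing)
    dir_line, dir_err = read_first_line(tmp)
```

The caller gets the error back as a value and has to decide what to do with it. Catching the broad OSError is deliberate here: since we can’t enumerate every way a “file” can fail, the handler doesn’t try to.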

There’s another discussion leading from this about the classification of errors. This type of abstraction is what causes checked exceptions, or controlled error codes, to ultimately fail. They tend to create an unsolvable mapping problem.

All libraries fail

The purpose of libraries is to abstract a lower concept, to create a friendlier way to use it. We never really know what is happening in the library; we don’t know how many options it offers or concepts it combines. If it calls even a single function that can fail, then the high-level call can also fail. The chances of this are quite high: consider that most functions in the standard C library can fail, as can a great portion of OS calls.

Some libraries go out of their way to front-load failures in the form of parameter checking. This helps prevent “unexpected” errors happening during operations. But reported parameter errors are still errors.

It may be tempting to distinguish between genuine failures and programmer error, but unless we become infallible at work what difference does it really make? The function fails in either case.

“But wait!” we say, “I’ve seen functions that don’t report errors.” They’re usually just lying, hiding the fact that errors happen and making it difficult to debug or diagnose the problem. Several bad APIs decide to just log the error and continue, making it impossible for us to actually know an error happened. Either a library reports errors or it’s lying.

Don’t mistake libraries that offer side channels for error reporting for libraries that don’t report errors. OpenGL records error codes internally and returns them through glGetError, which you’re expected to call after other functions.
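To make the side-channel pattern concrete, here is a small sketch loosely modeled on that style of reporting. The Canvas class, its methods and its error codes are entirely hypothetical; this is not OpenGL’s API, only the same idea of recording an error now and asking for it later.

```python
class Canvas:
    """Hypothetical library object that records errors instead of raising."""

    NO_ERROR = 0
    INVALID_VALUE = 1

    def __init__(self):
        self._error = self.NO_ERROR
        self.width = 0

    def set_width(self, width):
        if width <= 0:
            # Record the first error only and leave the state untouched.
            if self._error == self.NO_ERROR:
                self._error = self.INVALID_VALUE
            return
        self.width = width

    def get_error(self):
        """Return the recorded error code and reset it."""
        code, self._error = self._error, self.NO_ERROR
        return code


canvas = Canvas()
canvas.set_width(-5)          # fails, but nothing at the call site says so
first = canvas.get_error()    # the failure is visible only here
second = canvas.get_error()   # reading the code cleared it
```

The error is reported, just not where it happened. If we never call get_error, the failed set_width looks exactly like a successful one.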

Even the basics

We can’t assume even the fundamentals of programming are error-free. Take even the simplest of operations, the addition of two integers. In the world of math this is a well-behaved operation that can’t fail. On a computer, however, we don’t have real integers; we have fixed-length integers. This means any addition can result in either an overflow or an underflow.

var q = a + b
if (a > 0 && b > 0)
    assert(q > a && q > b) // this can fail due to overflow

What’s very painful about integer underflow and overflow is that most languages fail to report them in any fashion whatsoever. Integer arithmetic is also the hidden fabric of so many more operations, from array indexing to stream seeking. High-level errors resulting from low-level overflow are quite common, from long-running programs with counters that get too high to users that feed massive documents into programs. It’s really not hard to reach the limits of integers.
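Python’s own integers are unbounded, so the sketch below simulates a 32-bit signed integer by hand to show the wraparound, and then shows a checked variant that reports the problem. Both function names are invented for this example, and the 32-bit width is an assumption chosen for illustration.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def wrapping_add_i32(a, b):
    """Add like a 32-bit two's-complement register: keep the low 32 bits."""
    low_bits = (a + b) & 0xFFFFFFFF
    # Reinterpret the top bit as the sign bit.
    return low_bits - 2 ** 32 if low_bits > INT32_MAX else low_bits


def checked_add_i32(a, b):
    """Add, but raise instead of silently wrapping."""
    total = a + b  # exact, because Python integers are unbounded
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return total


a, b = INT32_MAX, 1
q = wrapping_add_i32(a, b)   # both operands are positive, yet q is negative
```

With these inputs the condition q > a && q > b from the snippet above is false. The checked version turns the same silent wraparound into an error the caller can see.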

Division has a problem too: we can accidentally divide by zero. For integers this may result in a low-level trap on some systems, while floating point may produce infinity or NaN. Floating point also has its own assortment of range and precision problems. The chance of us writing error-free floating point code is unfortunately kind of low.
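A few of these problems can be reproduced in a handful of lines. Note that Python is one of the languages that raises an exception for division by zero, for floats as well as integers, so the infinity and NaN below are produced by other routes. The helper safe_divide is a made-up name for illustration.

```python
import math


def safe_divide(a, b):
    """Return a / b, or None when Python refuses to divide by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


inf = float("inf")
overflowed = 1e308 * 10        # exceeds the double range, becomes infinity
not_a_number = inf - inf       # no meaningful result, becomes NaN
rounding = 0.1 + 0.2           # neither operand is exactly representable
lost = 1e16 + 1.0              # the 1.0 is below the precision at this size
```

None of the last four lines raises anything. The values simply come out wrong or meaningless: rounding is not equal to 0.3, lost is still 1e16, and not_a_number does not even compare equal to itself. A comparison such as math.isclose(rounding, 0.3) is needed where an exact one would fail.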

Given that the underpinnings of basic programming are prone to failures and errors, it’s hard to imagine anything built on top of that can ever be error-free.

It will fail

From the lowest levels through to the highest levels, everything can fail. Crashes, corruption, and glitches usually happen because somebody assumed something wouldn’t fail. Even theoretically perfect algorithms can fail due to programmer error.

Assuming that something can fail is the only realistic option. If a library, or language, pretends otherwise, then chances are it is broken.
