Range-Checks and Recklessness

Here's an odd technical debate from the 1980s: Should compiler-generated checks for "array index out of range" errors be left in production code?

Before C took over completely, with its loose access to memory as an offset from any pointer, there was a string of systems-level languages with deeper support for arrays, including the ALGOL family, PL/I, Pascal, Modula-2, and Ada. Because array bounds were known, every indexing operation, such as:

frequency[i] := 0

could be checked at runtime to see if it fell within the extents of the array, exiting the program with an error message otherwise.
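As a rough model of what a range-checking compiler wraps around the frequency[i] assignment, here is a small Python sketch. The names RangeError and checked_store and the 26-element array are hypothetical; a real compiler emitted a compare-and-branch in machine code rather than a function call.

```python
class RangeError(Exception):
    """Hypothetical stand-in for the runtime error a checked program reports."""

def checked_store(array, i, value):
    # The test a compiler would insert before every indexed store:
    # the index must lie within the array's extents (0..len-1 here).
    if i < 0 or i >= len(array):
        raise RangeError(f"index {i} out of range 0..{len(array) - 1}")
    array[i] = value

frequency = [7] * 26              # hypothetical 26-element array
checked_store(frequency, 3, 0)    # in range: performs the store
try:
    checked_store(frequency, 26, 0)   # one past the end: rejected
except RangeError as err:
    print(err)                    # index 26 out of range 0..25
```

The explicit i < 0 test matters even in Python, where a negative index would otherwise silently wrap around to the end of the list.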

This was such a common operation that hardware support appeared in the 80186 and 80286 processors in the form of the BOUND instruction. It combined the two comparisons needed to verify that an index lay between an array's lower and upper bounds, raising an exception if it didn't. Wait, wasn't the lower bound always zero? Often not. In Pascal, you could have declarations like this:

type Nineties = array[1990..1999] of integer;
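A minimal Python sketch of how such an array can be modeled, assuming zero-based storage underneath: every access performs both comparisons (the pair BOUND handled in one instruction) and then subtracts the lower bound to get a storage offset. BoundedArray, RangeError, and the zero initialization are illustrative assumptions, not how any particular Pascal compiler laid things out.

```python
class RangeError(Exception):
    """Raised when an index falls outside the declared bounds."""

class BoundedArray:
    """Hypothetical model of a Pascal array[low..high] of integer."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        # Zero-based storage; zero-filled here for simplicity (an assumption).
        self._data = [0] * (high - low + 1)

    def _offset(self, i):
        # Both checks: low <= i and i <= high.
        if i < self.low or i > self.high:
            raise RangeError(f"index {i} out of range {self.low}..{self.high}")
        return i - self.low  # translate the declared index to a storage offset

    def __getitem__(self, i):
        return self._data[self._offset(i)]

    def __setitem__(self, i, value):
        self._data[self._offset(i)] = value

yearly = BoundedArray(1990, 1999)  # like a variable of type Nineties
yearly[1996] = 42
print(yearly[1996])                # 42
```

An index of 0 is rejected here just like 2000 is, which is the point of having the lower bound participate in the check.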

Now back to the original question of whether the range checks should live on in shipping software. No one disputed that range checking was valuable during development, but opinions split after that. One side considered it wasteful to keep all that byte- and cycle-eating code around once you knew it wasn't needed. The other side argued that you could never guarantee an absence of bugs, so wouldn't it be better to get some kind of error message than to silently corrupt the state of the application?

There's also a third option, one that wasn't applicable to simpler compilers like Turbo Pascal: have the compiler determine when an index is guaranteed to be valid and skip generating range-checking code in those cases.

This starts out easy. If Snowfall is a variable of type Nineties, then the constant index in Snowfall[1996] is clearly valid. Replace "1996" with a variable, and it's going to take more work. If it's the iteration variable in a for loop, and we can ensure that the bounds of the loop are between 1990 and 1999 inclusive, then the range checks in the loop body can be omitted.
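The for-loop case can be sketched in Python, with the decision a compiler would make at compile time modeled as an ordinary function call. loop_needs_checks and zero_fill are hypothetical names; the sketch assumes Pascal's for i := first to last semantics, where the body never runs if first > last.

```python
def loop_needs_checks(first, last, low, high):
    """Does 'for i := first to last' need range checks on an array low..high?"""
    if first > last:
        return False                  # the loop body never runs
    return first < low or last > high

def zero_fill(data, low, high, first, last):
    """Model of 'for i := first to last do a[i] := 0' on storage for low..high."""
    if loop_needs_checks(first, last, low, high):
        for i in range(first, last + 1):
            if i < low or i > high:   # per-iteration check kept
                raise IndexError(f"index {i} out of range {low}..{high}")
            data[i - low] = 0
    else:
        for i in range(first, last + 1):
            data[i - low] = 0         # proven in range: no check emitted

print(loop_needs_checks(1990, 1999, 1990, 1999))  # False
print(loop_needs_checks(1990, 2000, 1990, 1999))  # True
```

With constant loop bounds, the whole decision collapses to two comparisons of known values, which is why this case is easy; the questions that follow are about what happens when those values aren't known.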

Hmmm...what if the for loop bounds aren't constants? What if they're computed by a function in another module? What if there's math done on the indices? What if it's a less structured while loop? Is this another case of needing a sufficiently smart compiler? At what point do diminishing returns kick in, and the complexity of implementation makes it hard to have faith that the solution is working correctly?

I set out to write this not for the technical details and trivia, but to show how my thinking has changed. When I first ran across the range-check compiler option, I was fresh out of the school of assembly language programming, and my obsessive, instruction-counting brain was much happier with this setting turned off. These days I can't see that as anything but reckless. Not only would I happily leave it enabled, but were I writing the compiler myself I'd remove the checks only in the most obvious and trivial of cases. It's not a problem worth solving.

March 22, 2014
