In this post, I want to share my thoughts about the notion of a precondition. In the “Design by Contract” philosophy, preconditions are always mentioned along with postconditions and invariants, and in the context of OO design. In this post I focus only on preconditions, which are not necessarily related to any class. For instance, the following function specifies a precondition on its argument:

double sqrt(double x); // precondition: x >= 0

Note that the function specifies the precondition even though there is no language feature for this purpose (at least in C++). A precondition is a “concept” or an “idea” rather than a language feature. This is the kind of preconditions that this post is about.

A motivating example

Consider the following piece of code for authenticating users. Its responsibility is to have the end-user enter his user name and check if this name is already recorded in the internal database.

bool checkIfUserExists(std::string userName)
{
  std::string query = "select count(*) from USERS where NAME = \'" + userName + "\';";
  return DB::run_sql<int>(query) > 0;
}

bool authenticate()
{
  std::string userName = UI::readUserInput();
  return checkIfUserExists(userName);
}

This code may look correct at first, especially because in most cases it works as expected. If the end-user is nice and enters names like “tom”, our program comes back with the right response. However, if one of the users is malicious, he may enter a “user name” like the following:

JOHN'; delete from USERS where 'a' = 'a

In this case the query in function checkIfUserExists becomes this:

select count(*) from USERS where NAME = 'JOHN'; delete from USERS where 'a' = 'a';

This is a simple example of a serious security issue, but for our purposes it is sufficient to call this situation a bug. Assuming that functions checkIfUserExists and authenticate were written by two different people, which of them is responsible for the bug? The author of authenticate can say “I expected that checkIfUserExists only queries one table and does not issue arbitrary commands to the DB.” The author of checkIfUserExists can say “I expected a user name in the function argument, not arbitrary SQL commands.” Neither of them would be wrong. They worked under different expectations, and their expectations were not explicitly stated. Without clearly stated expectations (a contract), it is impossible to tell whose fault it is: it is a failure of communication between two programmers (or even between a programmer and himself, some time later).

We have two problems then: (1) the program has a bug, (2) it is not clear whose responsibility it is. The latter problem would have been avoided if function checkIfUserExists made its assumptions (or the lack thereof) about its arguments explicit. Suppose we have a function that can tell valid user names from the invalid ones:

bool isValidName( std::string const& text )
{
  const static std::regex NAME{"\\w+"};
  std::smatch match;
  return std::regex_match(text, match, NAME);
}

Now, when we divide the code into smaller pieces (in our case: into functions), right after we decide on a function’s interface (bool checkIfUserExists(std::string)) and before we write its implementation, we should also decide on the allowed range of the argument values. We can allow any value (and then apply filtering ourselves) or require only valid user names. Either choice is fine, but we have to make it explicit. In the header file that contains our function’s declaration, we decorate the declaration with our assumption:

bool checkIfUserExists(std::string userName); // precondition: isValidName(userName)

Or in case we accept any string:

bool checkIfUserExists(std::string userName); // precondition: true

Whichever of the two we decide upon, the responsibility for checking the value of the string becomes clear. We said that the precondition needs to be put along with the function’s declaration because it is part of the function’s interface, like the function’s name, return type, etc. The notable difference in the case of a precondition, though, is that (at least in C++) there is no language feature that helps us express it. We have to use comments. Comments? Does that sound wrong? In the environment I work in, I have observed a tendency to avoid writing comments. True, there are good reasons to avoid comments in places where we have a better alternative, but this should not make comment avoidance a general rule. One could say that preconditions in comments may be misleading because comments are not syntax- or type-checked. However, today’s IDEs can recognize certain patterns in comments and use them to generate documentation or tool-tip hints. Also, with a bit of inventive declarations and macros, you can make the compiler check the type-safety of the asserted predicates, at the expense of slightly polluting the function declaration syntax. For instance, consider this solution (it requires C++11):

#include <type_traits>

template <typename T>
struct RETURN
{
  template <typename U>
  struct precondition_expression_type
  {
    static_assert( std::is_constructible<bool, U>::value,
                   "expression in precondition is not convertible to bool" );
    using type = T;
  };

  template <typename U>
  using PRECOND = typename precondition_expression_type<U>::type;
};

#define PRECONDITION(...) ::PRECOND<decltype(__VA_ARGS__)>

Using this template and the macro, we can declare our function as:

auto checkIfUserExists(std::string userName) -> RETURN<bool> PRECONDITION( isValidName(userName) );

The compiler will reject the declaration if the expression is invalid or if it is not contextually convertible to bool.
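To see the trick working end to end, here is a self-contained sketch. It repeats the RETURN template and the PRECONDITION macro from above and adds a definition that repeats the same trailing return type. The body replaces the DB query with a hypothetical in-memory comparison, and the assert is our own addition, not part of the technique.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

template <typename T>
struct RETURN
{
  template <typename U>
  struct precondition_expression_type
  {
    static_assert(std::is_constructible<bool, U>::value,
                  "expression in precondition is not convertible to bool");
    using type = T;
  };

  template <typename U>
  using PRECOND = typename precondition_expression_type<U>::type;
};

#define PRECONDITION(...) ::PRECOND<decltype(__VA_ARGS__)>

bool isValidName(std::string const& text)
{
  static const std::regex NAME{"\\w+"};
  return std::regex_match(text, NAME);  // the whole string must match
}

// Declaration: the predicate is type-checked, and the return type is still bool.
auto checkIfUserExists(std::string userName)
  -> RETURN<bool> PRECONDITION(isValidName(userName));

// Definition: repeats the same trailing return type.
// Hypothetical body: "tom" stands in for the USERS table.
auto checkIfUserExists(std::string userName)
  -> RETURN<bool> PRECONDITION(isValidName(userName))
{
  assert(isValidName(userName));  // debug-only check of the documented precondition
  return userName == "tom";
}
```

Writing PRECONDITION(userName) instead would trip the static_assert, because a std::string is not convertible to bool.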

What does a precondition mean?

A precondition makes certain assumptions explicit. When a function specifies a precondition, it is clear that the caller is supposed to guarantee that the precondition holds. This is the contract: one should not call the function if he cannot guarantee that the function’s precondition is satisfied. This does not necessarily mean that the caller needs to do any checks himself; there are other means of fulfilling the guarantee. To illustrate this, consider the precondition of sqrt:

double sqrt(double x); // precondition: x >= 0

The following three functions guarantee that sqrt’s precondition will hold, even though they do not check the value of the argument:

double fun1()
{
  const double x = 255.0;    // using a literal
  return sqrt(x);
}

double fun2(double x, double y) // precondition: x >= 0 && y >= 0
{
  return sqrt(x) + sqrt(y);  // relying on another precondition
}

double fun3(double x)        // precondition: true
{
  return sqrt(std::abs(x));  // std::abs(x) is never negative
}                            // (relying on abs's postcondition)
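A small runnable sketch makes this concrete. The hypothetical assertingSqrt below stands in for the post's sqrt: it documents its precondition and, in debug builds, asserts it before delegating to std::sqrt. The three callers never trip the assert, even though none of them checks its argument.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for sqrt: the assert fires only if a caller breaks the contract.
double assertingSqrt(double x)  // precondition: x >= 0
{
  assert(x >= 0);
  return std::sqrt(x);
}

double fun1()
{
  const double x = 255.0;             // a non-negative literal
  return assertingSqrt(x);
}

double fun2(double x, double y)       // precondition: x >= 0 && y >= 0
{
  return assertingSqrt(x) + assertingSqrt(y);  // our precondition implies sqrt's
}

double fun3(double x)                 // precondition: true
{
  return assertingSqrt(std::abs(x));  // std::abs's result is non-negative
}
```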

Preconditions (along with other contract-programming concepts) bring order into the program. For instance, in our authentication example, we may wonder whether wiping out a DB table is a bug in the program or simply an unfortunate user input over which the program has no control. If function checkIfUserExists requires only valid user names in its precondition, it sets a certain ‘border’: the user may enter any string, and this string may even enter the program, which is fine; but an invalid string must not cross the border; if it does, then (and only then) this is a bug. In other words, preconditions help us distinguish bugs from other unusual situations in the program.
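The border can be sketched as follows, assuming two simplifications that are not in the post: the USERS table is replaced by a hypothetical in-memory set, and authenticate takes the user's input as a parameter instead of calling UI::readUserInput(). Invalid input is an expected situation handled at the border; only a violated precondition inside checkIfUserExists would be a bug.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

bool isValidName(std::string const& text)
{
  static const std::regex NAME{"\\w+"};
  return std::regex_match(text, NAME);
}

// Hypothetical in-memory stand-in for the USERS table.
const std::set<std::string> USERS{"tom", "JOHN"};

bool checkIfUserExists(std::string const& userName)  // precondition: isValidName(userName)
{
  assert(isValidName(userName));  // firing here would mean a bug in the caller
  return USERS.count(userName) > 0;
}

// The caller owns the border: any string may arrive here,
// but only valid names are passed on.
bool authenticate(std::string const& userInput)
{
  if (!isValidName(userInput))
    return false;  // unusual input, but not a bug
  return checkIfUserExists(userInput);
}
```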

Note also that we could have addressed the problem in a different way:

// NOT RECOMMENDED!
bool checkIfUserExists(std::string userName)
{
  if (!isValidName(userName)) SIGNAL();
  std::string query = "select count(*) from USERS where NAME = \'" + userName + "\';";
  return DB::run_sql<int>(query) > 0;
}

bool authenticate()
{
  std::string userName = UI::readUserInput();
  if (!isValidName(userName)) SIGNAL();
  return checkIfUserExists(userName);
}

Ignore for the moment what SIGNAL() is (although that is a very difficult and important question). In this solution nobody trusts anybody, and everyone just checks for the dangerous condition wherever they can. The author of checkIfUserExists cannot be sure if his caller will validate the input, so he does it himself. Similarly, the author of authenticate cannot be sure if checkIfUserExists will be prepared for any input, and has to validate it himself. This solution has certain drawbacks, though. First, performance: we now repeatedly check for the same condition (although in this particular case the cost is negligible compared to the DB access). Second, the code becomes messy, and the programmers lose control over it. If the author of authenticate at some point sees the implementation of checkIfUserExists, he may discover that since the latter already does the check, he can skip his own for clarity or performance reasons. The author of checkIfUserExists may observe (and implement) the opposite. Moreover, if function checkIfUserExists throws an exception when passed an invalid name, someone may try to use it for validating strings:

bool checkIfNameIsValid(std::string text)
{
  try {
    checkIfUserExists(text);
    return true;
  }
  catch (BadUserName const&) {
    return false;
  }
}

The next problem is what SIGNAL() should do inside checkIfUserExists. Throw an exception? But is the author of authenticate prepared to handle it? Return an error code? Use errno? Again, can we trust the author of authenticate to check it, given that we do not trust him to validate the input? Whatever we choose, the program will grow in complexity, and complexity (especially a messy one like this) is likely to cause bugs.

What if a precondition is violated?

Violating a precondition is conceptually similar to dereferencing a null pointer. A function simply works under the assumption that certain conditions are met. If they are not, the function is likely to do something other than what its author and its specification expected. This is called undefined behavior. As an example, let’s consider an implementation of function sqrt that uses Newton’s iteration.

double sqrt(double x) // precondition: x >= 0
{
  double y = 1.0;
  double prev_y;
  do {
    prev_y = y;
    y = (y + x / y) * 0.5;
  } while (!closeEnough(y, prev_y));
  return y;
}

We keep finding better and better approximations (variable y) until we find one that lies within an acceptable tolerance. Function closeEnough checks whether the current approximation differs only insignificantly from the previous one. It is not obvious, though, that this loop is guaranteed to ever stop. Our expectation that the algorithm will stop is based on the observation that the difference between y and prev_y decreases in each iteration, and that the final result (call it final_y) satisfies the condition:

closeEnough(final_y, (final_y + x / final_y) * 0.5)

Both conditions do hold, and y and prev_y in fact converge quickly, but only provided that x is non-negative. Once we pass a negative x, variables y and prev_y never converge, and our loop never stops.

In consequence, violating a precondition may cause a program to hang. If the end-user runs a simulation that he knows will take a week, only after that week will he learn that his simulation was not running at all and the program was simply hanging.
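The hang can be observed safely by instrumenting the loop. In the sketch below, closeEnough is a hypothetical definition (the post does not give one): a relative tolerance for large values and an absolute one near zero. The iteration cap and counter are our additions, so that non-termination shows up as hitting the cap instead of looping forever. With this tolerance, a negative x never satisfies closeEnough, because each step moves y by at least sqrt(-x).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical tolerance test: relative for |a| >= 1, absolute below that.
bool closeEnough(double a, double b)
{
  return std::fabs(a - b) <= 1e-12 * std::max(1.0, std::fabs(a));
}

// The post's Newton loop with an iteration cap added for observation.
// Returns the number of iterations needed, or maxIter if it never converged.
int newtonIterations(double x, int maxIter)
{
  double y = 1.0;
  for (int i = 1; i <= maxIter; ++i) {
    double prev_y = y;
    y = (y + x / y) * 0.5;
    if (closeEnough(y, prev_y)) return i;
  }
  return maxIter;
}
```

Without the cap, the call with a negative argument would spin forever, exactly as described above.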

Validating the precondition manually

Given that violating the precondition can have severe consequences (just consider providing an out-of-range index when accessing an element in an array), why not just validate it as the first thing inside the function?

As we already said, calling a function whose precondition is not satisfied results in undefined behavior: our function can legally do anything. An additional measure that validates the precondition does fit into “anything.” However, a couple of things should be kept in mind.

1. Checking for a precondition inside the function should not relieve us from specifying the precondition. Even if the check is performed, our clients should not take it for granted: they should not be led to expect that we have committed to performing the check, and they must still comply with the contract. Our check is part of the function’s implementation: it is subject to change. The contract, on the other hand, is part of the interface: it is supposed to be stable.

To illustrate this, consider the two ways of accessing elements in std::vector:

void test(std::vector<int> const& vec, size_t n)
{
  vec[n];    // (1)
  vec.at(n); // (2)
}

The first one specifies the following contract:

precondition: n < vec.size();

returns: *(vec.begin() + n).

The second specifies a different contract:

precondition: true (no precondition);

effects: if n < vec.size() then returns *(vec.begin() + n); otherwise throws out_of_range.

This means that the following way of breaking the loop is perfectly valid:

void forEach(std::vector<int> const& v, std::function<void(int)> f)
// precondition: f != nullptr
{
  size_t i = 0;
  try {
    for (;;) {
      f(v.at(i++));
    }
  }
  catch (std::out_of_range const&) {
    // finished loop
  }
}

Using operator[] in place of at above would instead cause UB (even if this UB happens to result in throwing out_of_range).
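The same distinction can be written into our own functions. In this sketch, both function names are hypothetical: elementAt follows the contract of operator[] and checks its precondition with assert as a replaceable implementation detail (it disappears under NDEBUG), while elementAtOrThrow follows the contract of at, where throwing std::out_of_range is a documented effect that callers may rely on.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Contract like operator[]: precondition n < v.size().
// The assert is part of the implementation, not of the contract.
int elementAt(std::vector<int> const& v, std::size_t n)  // precondition: n < v.size()
{
  assert(n < v.size());
  return v[n];
}

// Contract like at(): no precondition; out-of-range access is a specified effect.
int elementAtOrThrow(std::vector<int> const& v, std::size_t n)  // precondition: true
{
  return v.at(n);  // throws std::out_of_range when n >= v.size()
}
```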

2. While it may be possible to form an expression that tells valid arguments from invalid ones, executing the expression may be fatal to the program. For instance consider the following function:

template <typename IIT> // requires: InputIterator<IIT>
void displayFirstSecondNext(IIT beg, IIT end)
// precondition: std::distance(beg, end) >= 2
{
  std::cout << "first: " << *beg++ << std::endl;
  std::cout << "second: " << *beg++ << std::endl;
  std::cout << "next: ";
  while (beg != end) {
    std::cout << *beg++ << " ";
  }
}

This function requires that the input range contain two or more elements. We can easily check it using function std::distance, but this would require incrementing the iterator. If we are given an InputIterator (e.g., the iterator interface for IO streams) and the iterator is incremented while checking the precondition, we can never get back to the value it was referring to before, and we would not be able to display the first element in the range inside the function body.
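The following sketch shows the effect with std::istream_iterator, a standard single-pass iterator. The helper checkThenProbe is hypothetical: it "validates" the precondition with std::distance and then asks whether the underlying stream still has anything to offer. Counting the elements consumes the stream, so nothing is left for the function body to display.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>

// Returns {distance computed by the precondition check,
//          whether the stream still has data afterwards}.
std::pair<std::ptrdiff_t, bool> checkThenProbe(std::string const& input)
{
  std::istringstream in(input);
  std::istream_iterator<int> beg(in), end;
  std::ptrdiff_t n = std::distance(beg, end);  // reads the whole stream
  // A fresh iterator over the same stream now starts at end-of-stream.
  bool dataRemains = std::istream_iterator<int>(in) != end;
  return {n, dataRemains};
}
```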

Similarly, validating the precondition may change the complexity guarantees of the algorithm. Consider:

bool containsSorted(std::vector<int> const& v, int i)
// precondition: std::is_sorted(begin(v), end(v))
{
  return std::binary_search(begin(v), end(v), i);
}

This function expects a sorted vector and can therefore offer logarithmic complexity. However, in order to check if the vector is sorted, we have to use an algorithm with linear complexity. In the end, if it validates the precondition, our function containsSorted both requires a sorted vector and offers only linear complexity. This can make our program run much slower, which may be unacceptable even in debug builds.
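For reference, this is what such a validating version looks like if the check is written with assert, a common compromise that is not proposed in the post. The check disappears when NDEBUG is defined, but in debug builds every call pays O(n) for the assert on top of the O(log n) search, which is exactly the complexity change described above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

bool containsSorted(std::vector<int> const& v, int i)
// precondition: std::is_sorted(begin(v), end(v))
{
  assert(std::is_sorted(v.begin(), v.end()));        // O(n), debug builds only
  return std::binary_search(v.begin(), v.end(), i);  // O(log n)
}
```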

Also, in performance-critical applications, checking the same precondition multiple times slows the program down unnecessarily. Consider:

void forEach(std::vector<int> const& v, std::function<void(int)> f)
// precondition: f != nullptr
{
  for (size_t i = 0; i < v.size(); ++i) { // (1) precondition check
    f(v[i]);
  }
}

The loop’s terminating condition (1) already checks the precondition of operator[]; if operator[] checked it again, it would only make the program run slower.

3. Evaluating the precondition is the easy part; the tough part is how to report a broken precondition. What should function sqrt do if it detects a precondition violation? Return a special value like NaN? But if the caller doesn’t bother to check the precondition, will he bother to check the special return value? See also this post on why this can cause more bugs. Putting it differently, since violating the precondition is the caller’s failure (a bug), giving control back to him in order to fix the problem is unlikely to work. The caller’s job is to satisfy the precondition (which is entirely in his control), not to handle the consequences of his own fault.
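A small sketch shows how a special value can hide the bug instead of reporting it. Both functions are hypothetical: nanSqrt reports a violated precondition by returning NaN, and isAboveThreshold is a careless caller that never checks for it. Because every ordered comparison with NaN is false, the caller silently gets a plausible-looking false.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical: signals a violated precondition with a special value.
double nanSqrt(double x)
{
  if (x < 0) return std::numeric_limits<double>::quiet_NaN();
  return std::sqrt(x);
}

// A caller that does not check the special value.
bool isAboveThreshold(double x, double threshold)
{
  return nanSqrt(x) > threshold;  // NaN > anything is false: the bug goes unnoticed
}
```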

The same issue applies to other ways of signaling function failures. You could think of returning a combined value:

optional<double> sqrt(double x); // precondition: x >= 0

This is problematic too. How do you specify the effects of the function? I.e., what does it do? Let’s try: it returns an ‘uninitialized’ optional in case x < 0. But this runs counter to the concept of a precondition. If you know what the function should do for negative arguments, it does not have a precondition: it is well defined for any value of type double, and therefore it can be legally used in ways that we may not like:

optional<double> sqrt(double x); // precondition: true

bool isNegative(double x)        // precondition: true
{
  return !sqrt(x);
}

A compound value may make sense in cases like converting a string to a number, where not being able to convert a string is a frequent and expected situation:

optional<int> toInt(std::string s); // precondition: true

int getNumber()
{
  std::string s;
  for (;;) {
    std::cout << "enter a number: ";
    std::cin >> s;
    if (auto ans = toInt(s)) {
      return *ans;
    }
    else {
      std::cout << "this was not a number, ";
    }
  }
}
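The post only declares toInt. One possible implementation, assuming C++17's std::optional in place of the post's optional and accepting only strings that are entirely a base-10 int, could look like this. Here an empty optional is an ordinary, documented outcome, not a reaction to a caller's bug.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical implementation: no precondition, every string gets a defined answer.
std::optional<int> toInt(std::string s)  // precondition: true
{
  if (s.empty()) return std::nullopt;
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(s.c_str(), &end, 10);
  if (end != s.c_str() + s.size()) return std::nullopt;  // no digits, or trailing garbage
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return std::nullopt;
  return static_cast<int>(value);
}
```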

But in the case of sqrt this is different. The only case where we would be forced to return the special value to the caller is when we know there is a bug in the caller. We would probably need to specify the contract in the following way: “requires a non-negative double, never returns an uninitialized optional.” But this means that we never need to check for the uninitialized optional. So why return a compound value? A better choice would be to alter the type of the function’s argument rather than its return type, but we leave that option for Part II.

If we decide to throw an exception on precondition violation, at least we do not have to involve the return value. However, one problem remains: we pass control back to a caller that we know has a bug. Exceptions are used to signal a failure inside the function (a failure to do what it is expected to do). Here we would be abusing the exception mechanism a bit to signal a bug outside the function. Also, it will not work for functions that want to provide a no-fail guarantee:

double sqrt(double x) noexcept; // precondition: x >= 0

This problem (of non-throwing functions with a precondition) has also been discussed in depth in N3248.

Another option is to stop the program at this point by calling std::terminate. This appears very harsh. On the other hand, we are killing a program that is about to enter UB and already has a bug; and std::terminate may give us a chance to collect information about the program state (create a memory dump or some such) and restart the program. This is in fact the default behavior in the proposal to add Contract Programming to C++: see N1962.
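A minimal sketch of this option follows. The CHECK_PRECONDITION macro is hypothetical (it is not the N1962 syntax or any library's API): on violation it prints the failed condition to stderr and calls std::terminate. Because nothing is thrown, it can be used inside a noexcept function such as the sqrt declared above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <exception>

// Hypothetical macro: report the broken precondition, then stop the program.
#define CHECK_PRECONDITION(cond)                                           \
  ((cond) ? static_cast<void>(0)                                           \
          : (std::fprintf(stderr, "precondition failed: %s (%s:%d)\n",     \
                          #cond, __FILE__, __LINE__),                      \
             std::terminate()))

double terminatingSqrt(double x) noexcept  // precondition: x >= 0
{
  CHECK_PRECONDITION(x >= 0);
  return std::sqrt(x);
}
```

Calling terminatingSqrt(-1.0) would print the message and terminate the program instead of returning to the buggy caller.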

I have to stop here: I do not want the post to be too long. In the next post we will explore how and when to best specify preconditions, what alternatives to preconditions we have, and how compilers and other tools can assist with enforcing them.