Programming is a deeply humbling activity.

What else could you say of an activity and profession where it's common knowledge that you'll never write bug-free code? That regardless of how hard one tries, it's not enough. And it's not because we just haven't yet found the right abstraction or designed the right programming language. Software is surprisingly complex.

"Software entities are more complex for their size than perhaps any other human construct." (Frederick P. Brooks, Jr.)

In his essay "No Silver Bullet," Frederick P. Brooks, Jr. (author of The Mythical Man-Month) argues that software is inherently complex. It is so complex that it cannot be made simpler: its complexity is essential. To deal with this irreducible complexity, we build abstractions in layers.

Yet each time we build an abstraction layer atop another, we hide details. In many scenarios these details are unimportant. Sometimes, we hide bugs. And sometimes bugs are emergent properties of complex components interacting in unforeseen ways.

What can we do to reduce software complexity when so much of our software depends on open source code we didn't write? Code we don't fully understand. Code we've bet the business on.

To answer that, we first need to understand why software is intrinsically complicated.

Why is programming complicated?

I recently read an interesting paper, "Out of the Tar Pit," which discusses why programming is complicated. When you try to understand a large codebase, there are two main ways to go about it:

1. Model as much of it as you can in your head.
2. Use tests to demonstrate the correctness of the pieces that don't fit in your mental model.

So making your codebase less complicated has a twofold goal: write good tests, and make the code fit more easily in the average programmer's head. You can accomplish the latter in a number of ways: following the principle of least astonishment, using standard patterns that are familiar to other programmers, naming your variables sensibly, and, as the paper argues, limiting the amount of state you need to keep track of. State effectively increases the arity of every class or method that can touch it, because each piece of state a method reads acts as a hidden extra input. That's why it's essential to keep global state to a minimum.
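To make the arity point concrete, here is a minimal Java sketch; GlobalConfig, Prices and formatPrice are hypothetical names invented for illustration. The one-argument formatPrice reads a mutable global, so its output depends on something its signature never mentions, while the two-argument overload makes that input explicit.

```java
import java.util.Locale;

// Hypothetical example: GlobalConfig and Prices are illustrative names only.
final class GlobalConfig {
    // Mutable global state: any code anywhere can reassign it.
    static String currencySymbol = "$";
}

final class Prices {
    // Looks like a one-argument method, but its result also depends on
    // GlobalConfig.currencySymbol, which acts as a hidden second input.
    static String formatPrice(long cents) {
        return GlobalConfig.currencySymbol + formatAmount(cents);
    }

    // Same behavior with the dependency made explicit: the real arity is in
    // the signature, and the result depends only on the arguments.
    static String formatPrice(long cents, String currencySymbol) {
        return currencySymbol + formatAmount(cents);
    }

    // Formats a non-negative amount of cents as "units.cents".
    private static String formatAmount(long cents) {
        if (cents < 0) {
            throw new IllegalArgumentException("cents must be non-negative");
        }
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }
}
```

Two calls to formatPrice(1999) can return different strings if any other code reassigns GlobalConfig.currencySymbol in between; the explicit overload can't surprise you that way.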

What this paper leaves out, in my view, is how modern software is actually built: you depend on dozens of third-party libraries and abstractions to accomplish every task. How do you fit them into your mental model? Does that API throw an exception when the network is down, or does it block? Can you configure that? These libraries simplify some aspects of your software while complicating others. We need a way to understand this complexity in order to measure and reduce it.

The best way to deal with this complexity is via code contracts. Each time you call a class or method, what does it require? What does it produce? Most importantly, what are its error conditions? How does it fail? How should you handle this failure?

Code contracts are written in three ways: the type and arity of the method or class's inputs and outputs, the documentation, and implicit, hidden details to the contract that you can only find out if you read the code.
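Here is a small sketch of those three layers in a single Java method, assuming a hypothetical UserDirectory class: the signature carries the types, the Javadoc carries the documented rules, and one behavior is left implicit, discoverable only by reading the body.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical class used only to illustrate the three layers of a contract.
final class UserDirectory {
    private final Map<String, String> emailsById;

    UserDirectory(Map<String, String> emailsById) {
        this.emailsById = Map.copyOf(emailsById);
    }

    /**
     * Looks up the email address for a user.
     *
     * @param userId the user's id; must not be null
     * @return the address, or an empty Optional if the user is unknown
     * @throws NullPointerException if userId is null
     */
    Optional<String> emailFor(String userId) {
        // 1. Types: the signature says "takes a String, may or may not produce a String".
        // 2. Documentation: the Javadoc states the null rule and the empty case.
        Objects.requireNonNull(userId, "userId");
        // 3. Implicit detail: ids are matched case-sensitively and are not trimmed.
        //    Nothing above says so; a caller only learns it by reading this line.
        return Optional.ofNullable(emailsById.get(userId));
    }
}
```

A caller who passes "Alice" instead of "alice" gets an empty result, and nothing but the implementation tells them why: that's the third, hidden layer of the contract.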

The better you understand the contracts of the code you're using, the better you can fit it into your mental model. If you understand its code contract, you will properly handle its inputs, outputs, and error conditions. You will have fewer bugs, and your software will be more robust.

Software without a well-specified code contract is complicated. Nobody wants to use complicated software where a simpler substitute would do.

Code Contracts

So, what's a code contract?

It's the interface to your code. It specifies what your code expects as input, what it will output under which circumstances, and how and why it will fail. Let's look at an example.

Elasticsearch is a distributed searchable database. You put your data in, and then you can search it in various ways. Let's see how you would use the Java client to search the database.

```java
public SearchResponse search(Client esClient, String query) {
    // Build and send the search request against the configured index.
    final ListenableActionFuture<SearchResponse> future = esClient
            .prepareSearch(this.indexName)
            .setQuery(query)
            .execute();
    // Block until the response arrives.
    return future.actionGet();
}
```

This takes an Elasticsearch client, esClient, and a JSON query. It searches the configured index (an index is much like a database in a traditional database system). The builder returns a future, but we immediately block on it with actionGet() because we want our method to return the SearchResponse itself.

How does this fail? What if esClient can't reach the Elasticsearch cluster? What if the query takes hours? What if the JSON is malformed? Our search method is undocumented and difficult to use properly. It doesn't even say that query is supposed to be JSON.
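The "what if it takes hours?" question applies to any future. As a JDK-only analogy (it doesn't use the Elasticsearch client, and BoundedWait is a hypothetical helper), here is how java.util.concurrent lets you replace an unbounded wait with a bounded one that fails in a documented way:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// JDK-only analogy for blocking on a search future; not Elasticsearch code.
final class BoundedWait {
    /**
     * Waits at most timeoutMillis for the future to complete.
     *
     * @throws TimeoutException     if no result arrives in time
     * @throws ExecutionException   if the underlying task failed
     * @throws InterruptedException if the waiting thread is interrupted
     */
    static <T> T getWithin(CompletableFuture<T> future, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        // future.get() with no arguments would block forever on a stuck task;
        // the timed overload turns a hang into a failure the caller can handle.
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

The point isn't the helper itself but the contract it makes visible: every way the wait can end is named in the signature and the Javadoc.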

Let's make it better.

```java
/**
 * Searches the Elasticsearch cluster pointed at by 'esClient'. The JSON
 * query specified by 'query' is run and the SearchResponse is returned. If
 * the query takes longer than the configured timeout, it is aborted.
 */
public SearchResponse search(Client esClient, String query) {
    final ListenableActionFuture<SearchResponse> future = esClient
            .prepareSearch(this.indexName)
            .setQuery(query)
            // Server-side timeout; this.timeout is in milliseconds (ten seconds by default).
            // setTimeout takes a TimeValue (or a String), not a raw long.
            .setTimeout(TimeValue.timeValueMillis(this.timeout))
            .execute();
    // Wait on the client side a little longer than the server-side timeout.
    return future.actionGet(this.timeout + 100);
}
```

This is a little better, but we need to know what Elasticsearch will do when it gets a bad query or times out. We need to consult the Elasticsearch documentation!

Well, it looks like the docs don't mention it. That's frustrating. From a quick glance at the code, it looks like an ElasticsearchException is thrown, but that's probably thrown for every problem, and not just bad queries. I guess we'll need to go read the code.

The execute method is in a class called AbstractRequestBuilder. It doesn't mention timeouts at all. But this is an abstract class, so we should probably look at SearchRequestBuilder, since we're making a search query. That points us to Requests.searchRequest, which has basically no information either.

This is an example of a poorly specified code contract. Because it's hard to find out how it fails (other than that it throws some generic exception), you have to run the code to find out. Worse, since that's an implicit code contract rather than an explicit one, it could change at any time.

My final version of the search method, with a good code contract, is in this gist. It has good documentation and obvious failure conditions, and it specifies JSON via a type instead of a String. It's a toy example and could certainly be improved, but its code contract is better than Elasticsearch's by a long shot. It also lists the RuntimeExceptions it can throw in its throws clause: Java doesn't require that for unchecked exceptions, but it makes the contract clearer.
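Here is a rough, self-contained sketch of those two ideas: a JSON type instead of a String, and unchecked exceptions listed in the throws clause. It is not the code from the gist and it doesn't call Elasticsearch; JsonQuery, QuerySearcher, MalformedQueryException and SearchTimeoutException are hypothetical names, and the JSON check is deliberately shallow where real code would use a proper JSON parser.

```java
// Hypothetical sketch; none of these types come from Elasticsearch.

/** Thrown when a query string is not a JSON object. */
class MalformedQueryException extends IllegalArgumentException {
    MalformedQueryException(String message) { super(message); }
}

/** Thrown when the backend does not answer within its timeout. */
class SearchTimeoutException extends RuntimeException {
    SearchTimeoutException(String message) { super(message); }
}

/**
 * A query that has passed a shallow "looks like a JSON object" check.
 * Taking this type instead of a String moves part of the contract into
 * the signature: callers can't pass arbitrary text by accident.
 */
final class JsonQuery {
    private final String json;

    private JsonQuery(String json) { this.json = json; }

    /**
     * @throws MalformedQueryException if text is not wrapped in braces
     *         (a stand-in for real JSON validation)
     */
    static JsonQuery of(String text) throws MalformedQueryException {
        String trimmed = text == null ? "" : text.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new MalformedQueryException("query must be a JSON object: " + text);
        }
        return new JsonQuery(trimmed);
    }

    String json() { return json; }
}

interface QuerySearcher<R> {
    // The unchecked exception is listed in the throws clause on purpose:
    // the compiler doesn't require it, but it puts the failure mode in the signature.
    /**
     * Runs the query and returns the response.
     *
     * @throws SearchTimeoutException if no response arrives within the timeout
     */
    R search(JsonQuery query) throws SearchTimeoutException;
}
```

With this shape, a malformed query fails at construction time with a specific exception, long before any network call, and a timeout surfaces as its own named exception rather than a generic one.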

Magic

With abstractions you can reduce the "accidental" complexity of a software system. But the urge to simplify everything, even essential complexity, is dangerous. Abstractions that seek to simplify but actually complicate the system are called magic.

"The complexity of software is an essential property, not an accidental one. Hence, descriptions of a software entity that abstract away its complexity often abstract away its essence." (Frederick P. Brooks, Jr.)

I define magic as software which does something impressive, but has a weak code contract. Maybe it's poorly specified. Or maybe the code contract is just complicated or violates some commonly held practice or standard. Maybe it tries to "figure out the right thing to do" with whatever arguments it gets. Once you dive into the details of how it works, the curtain is pulled back and the magic is revealed.
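As a hypothetical illustration (Timeouts, magicTimeout and timeoutSeconds are invented names), here is a method that tries to "figure out the right thing to do" with whatever argument it gets, next to an explicit alternative:

```java
import java.time.Duration;

// Hypothetical example of "magic": one method that guesses what its argument means.
final class Timeouts {
    // Accepts anything and guesses. The effective contract is whatever this
    // body happens to do today, and none of it is visible in the signature.
    static Duration magicTimeout(Object value) {
        if (value instanceof Duration) {
            return (Duration) value;
        }
        if (value instanceof Number) {
            long n = ((Number) value).longValue();
            // Guess: small numbers are seconds, large ones are milliseconds.
            return n < 1000 ? Duration.ofSeconds(n) : Duration.ofMillis(n);
        }
        if (value instanceof String) {
            // ISO-8601 durations such as "PT5S".
            return Duration.parse((String) value);
        }
        // Anything else silently falls back to a default.
        return Duration.ofSeconds(10);
    }

    // The explicit alternative: the unit is in the name, the type is fixed,
    // and invalid input fails loudly instead of being reinterpreted.
    static Duration timeoutSeconds(long seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return Duration.ofSeconds(seconds);
    }
}
```

Passing 999 yields roughly seventeen minutes while 1000 yields one second, and an unrecognized argument quietly becomes ten seconds. That's impressive flexibility with a weak contract, which is exactly the kind of magic the curtain eventually gets pulled back on.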

Magical abstractions are not useful because they add complexity rather than removing it. Avoid creating magical software and libraries by specifying code contracts, and avoid using them if you can.

Write Contracts

The static versus dynamic typing argument in programming languages is really about code contracts. Statically typed programming languages allow you to specify much of your contract in code. Dynamically typed programming languages require you to specify your contract in documentation. Middle-of-the-road languages like C++, Java, and C# require both: the worst of both worlds. With a dependently typed programming language, you might be able to put your entire code contract in the code and need no documentation at all.
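To illustrate moving more of a contract out of documentation and into code, here is a sketch for Java 17 or later (PortResult and Ports are hypothetical names). Instead of a String-in, int-out method whose failure cases live only in Javadoc, the possible outcomes become types, and the sealed permits clause lists every one of them.

```java
// Hypothetical sketch (Java 17+): the outcomes of parsing a port number are
// part of the return type instead of being described only in documentation.
sealed interface PortResult permits PortResult.Ok, PortResult.NotANumber, PortResult.OutOfRange {
    record Ok(int port) implements PortResult {}
    record NotANumber(String input) implements PortResult {}
    record OutOfRange(int value) implements PortResult {}
}

final class Ports {
    /** Parses a TCP port; text must not be null. */
    static PortResult parse(String text) {
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return new PortResult.NotANumber(text);
        }
        if (value < 1 || value > 65_535) {
            return new PortResult.OutOfRange(value);
        }
        return new PortResult.Ok(value);
    }

    // Callers branch on the variants; the permits clause tells them
    // (and the compiler) that there are exactly three.
    static String describe(PortResult result) {
        if (result instanceof PortResult.Ok ok) return "port " + ok.port();
        if (result instanceof PortResult.NotANumber n) return "not a number: " + n.input();
        if (result instanceof PortResult.OutOfRange r) return "out of range: " + r.value();
        throw new AssertionError("unreachable: PortResult is sealed");
    }
}
```

The type checker now knows about the failure cases, not just the happy path. On Java 21, a switch over PortResult can go further and be checked for exhaustiveness, so forgetting a case becomes a compile error.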

The only way to write complex software systems today is to produce strict code contracts and to understand the code contracts of the software you depend on. The more difficult your code contract is to understand, the more difficult your software will be to use.

The goal is simplicity, and the way to achieve it is a well-specified, easy-to-digest contract. Prefer libraries with little to no magic, ones that are easy for both you and your colleagues to understand.

Code contracts aren't a silver bullet, but they're a solid step towards one.