Solving the Dichotomy between Solution and Test

Today I would like to propose a software design pattern. This design pattern will be a functional programming design pattern aimed at solving the dichotomy between code and test. Traditionally, programmers write logic that specifies work steps executed by a computer on how to solve a problem and then separately — to assure themselves and their customers of the correctness of their solutions — they write tests. Invariably, solution and test create two tightly coupled models of the same problem. Tests ultimately embody the specification of the problem — what must hold true for the solution to be valid.

Junior engineers are often surprised to learn that the test effort involves as much work, or indeed more, than coding the solution itself. Senior engineers & decision makers are often reluctant to write comprehensive tests. This violation of best practice has its roots in economics. Imagine an industry where in order to build a thing, an effort equivalent or greater than building another variant of the same thing must be expended – continually. No other industry puts forward such a proposition. Build an airport: to make sure planes can land, build another. Build an army: to make sure its soldiers can fight, build another army.

Other industries, too, undertake testing. But there is a difference. Most industries test prototypes in the R&D phase of development. Then they enter production where things are made on an economy of scale, thus reducing the per unit R&D cost share. Individual units of manufacture are likely merely spot-tested. There is an assumption, that once a thing matures, testing is less relevant. It reveals fewer problems.

Here is how software is fundamentally different. Making additional units of the prototype as it relates to software is essentially cost-free – it is reduced to the copying of electrons. This part is essentially not costed in terms of time scales in the software life cycle. The software life cycle leaves the “R” of R&D (embodied in analysis and design) but essentially never leaves the “D.” Even maintenance is still “D” – or development. The test model that was created is now tightly coupled to the solution model. So every time one is modified, the other must be modified too. To draw on the airport analogy again, every time we modify the airport, we have to modify the test airport that we build beside it. As fewer problems are found, someone who looks at yield for effort and says “you know, skip that.” Logic is copied. Tests are left behind.

This is the onset of trouble..

What if software could be written in such a way as to unify the solution model and the test model? What if neither model could be left behind?

What is proposed is Specification Driven Programming by way of a Quantifier Enginer. It is suggested that Specification Driven Programming follows a design pattern based on a reasoning system, or a specification solver that I term the quantifier engine. The pattern in shown below.

Suggested Pre-Reading

The suggested pre-reading discusses a number of examples of the design pattern.

The Anatomy of a Puzzle

The Design Pattern in Detail

What is suggested is that non-trivial algorithmic & imperative problems can be factored into three distinct abstractions:

Declarative specifications based on existential or universal quantification (the input to the quantifier engine). Declarative domain generator functions Imperative but purely functional domain functions.

Let us examine each by way of example. We will be using Common Lisp as our prototyping language because Lisp is amenable to be re-programmed into anything of one’s choosing. Our example will draw on Project Euler Problem 2.

Project Euler Problem 2

Project Euler Problem 2 is described as:

“By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.”

We begin by noting this is declarative. We are to find the sum of the even valued terms of the Fibonacci sequence where terms must be less than or equal to 4 million. If we were to phrase this leaning on universal quantification, we might say something like For all A’s such that A is a Fibonacci number below 4,000,000 where A is even -> sum all A’s.

Our corresponding Lisp expression is written thusly.

The answer is the solution to Euler problem 2. Explaining the syntax…

For C++ programmers, “Reduce #’+” is equivalent to std::accumulate() or std::reduce(). It’s a functional programming concept that reduces a sequence using a binary function, here +. This is our “sum all A’s” from the English specification. The rest should read just as our universal quantification in the plain English version of the problem specification. We are solving for all A’s where A is a-fibonacci-number-below 4,000,000 and where A is even (evenp stands for “even predicate“).

Apologetics

Before we delve into the moving parts of this, let’s do some housekeeping.

1) I’m a C++ programmer. How efficient is this?

On a Macbook Air, the total runtime was 362 microseconds. That’s interpreted, not compiled, no optimizations. Running in the REPL. Zero bytes of memory management overhead were incurred (consed.)

2) I’m a Python programmer. I don’t like your Lisp parentheses.

I feel for you. Let’s help you out.

The {3 + 4} is essentially C++ / Algol syntax utilising infix notation. But since you said Python, you might wish to see our expression based on indentation and without parentheses or indeed without C++ style curly braces. We write thusly.

Both the Algol example and the “Pythonesque” example are still Lisp. The nice thing about Lisp is that it can re-compute itself into what you want it to be. Back to our specification…

The Moving Parts of our Specification

1) The “All Values” Solver

All-values is a higher order function that is the core of our specification. This function is in a Lisp reasoning library called Screamer which implements the “quantifier engine.” Enclosed within (all-values … ) is valid code. Thanks to the homoiconic nature (code = data), all-values treats everything contained within its scope as data and parses it into a decision tree. Here each node of the tree is simple, a mere assertion that A be an even number. Any complex expression might appear here. In C++, this meta programming construct is known as an expression template. Like C++, the transformation takes place at compile time and crucially not at runtime. Except here we don’t need specialised template syntax. We can write any code that is syntactically valid. Important is that this is entirely a declarative construct. There are no variables with mutable state. Nothing is said about how Fibonacci numbers are arrived at. We merely state our universal quantification that all values A must be Fibonacci numbers below 4,000,000 and they must be even.

2) The Domain Generator

The domain generator is an abstraction that generates valid members of the input domain: “a-fibonacci-number-below max.”

The code for the generator is as follows:

All the generator says that a Fibonacci number below a maximum value must be a-member-of the Fibonacci sequence below that maximum. Again, this is declarative. There is no mutable state. But why is it meaningful? Not much appears to be said, except the bare minimum. Consider the following formulation of our specification:

It looks nearly identical but it is profoundly different. Now we are saying that A must be an integer between 1 and 4,000,000. We assert that it must be even and that it must be a Fibonacci number. We have not defined the Fibonacci test predicate (fibonacci). Regardless of how that predicate might be implemented (through memoization perhaps to avoid repeated calculation of the sequence), our analytical complexity has increased dramatically. Our input domain is anything between 1 and 4,000,000. To arrive at a total solution all must be tried. Our assertion is now more complex also: there are two tests with a boolean AND in each branch of the decision tree. Compare that with the number of inputs to our first specification.

There are but 32 valid domain inputs in our first specification. Logically both specifications are identical. The search space of the second, naive solution, however, is 125,000 times larger, which is sure to exact a significant performance penalty on the solver.

The purpose of a domain generator then is to reduce the analytical complexity of the specification while delineating the input domain – or edge cases.

There is one more important consideration: The generator is non-deterministic. An integer between 0 and 1 might be zero or it might be 1. An enumeration might be 0,1 or it might be 1,0. Nothing about sequence is warranted and that is purposeful. a-member-of is a set abstraction. Sets are not presumed to have any ordering.

3) The Domain Function

The domain function for our solution is the Fibonacci sequence itself.

Or if you prefer the “Pythonesque” indentation style without parentheses…

Just pretend the defun was defn and you’re almost there. It worked for Guido.

What the function says is: let a be an initial zero, let b be an initial 1 and let c be a limit value n. While the sum of a + b is less than n, collect a sequence of c as the sum of a + b. At each iteration, a is set to (setq) b. b is set to (setq) c. That precisely is the specification of the Fibonacci sequence.

The first thing we note that this is imperative. It is also functional in that it is side-effect free. We have been naughty in our use of a loop in lieu of recursion. But we are side-effect free with respect to our environment. Being side-effect free is important in that our client, the domain generator, is non-deterministic. Output must always be the same for a given input regardless of state and regardless of how we have been called in the past and with what arguments.

Summary

What this design pattern has achieved is to fully a completely extricate the solution to the problem from the domain logic. We can reason about each separately and each can evolve independently. The actual solution to the problem is a declarative transcription of the English (mathematical ) problem statement. Test and solution are the same. Neither can be left behind.

If you consider how cumbersome model based verification of code can be, we have attained a significant capability towards enabling code correctness using our design pattern.

Post-Reading

The example given here is small. How might this design pattern scale to more substantial use-cases? In the article below we apply this design pattern to the complex cryptanalysis of the GCHQ Christmas puzzle of 2015.

Cryptanalysis with Reasoning Systems

I hope you have enjoyed this small exposé on what I term Specification Driven Programming.