Last week, the programmer Hillel posted a challenge (the link points to a partial postmortem of the provided solutions) on Twitter for someone to prove a correct implementation of three small programming problems: Leftpad, Unique, and Fulcrum. This was a good opportunity to compare the SPARK language and its expressiveness and proof power to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice. I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge. Leftpad Hillel mentioned that the inclusion of Leftpad into the challenge was kind of a joke. A retracted JavaScript package that implemented Leftpad famously broke thousands of projects back in 2016. Part of the irony was that Leftpad is so simple one shouldn’t depend on a package for this functionality. The specification of Leftpad, according to Hillel, is as follows:

Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing. It is always helpful to start with translating the specification to a SPARK contract. To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):



function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String with Contract_Cases => (Len > S'Length => Left_Pad'Result'Length = Len and then (for all I in Left_Pad'Result'Range => Left_Pad'Result (I) = (if I <= Len - S'Length then Pad_Char else S (I - (Len - S'Length + 1) + S'First))), others => Left_Pad'Result = S);

In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string. The implementation in SPARK is of course very simple; we can even use an expression function to do it:

function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String is ((1 .. Len - S'Length => Pad_Char) & S);

Unique The problem description, as defined by Hillel: Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values. An explanation in this blog wouldn’t add anything to the commented code, so I suggest you check out the code for my solution directly. Fulcrum The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved. The problem description, as defined by Hillel: Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory. (Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.)

I’m pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has: It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions;

It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue. Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this:

type Seq is array (Integer range <>) of Integer; function Sum (S : Seq) return Integer is (if S’Length = 0 then 0 else S (S’First) + Sum (S (S’First + 1 .. S’Last)));

So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout):

abs (Sum (S (S’First .. I)) - Sum (S (I + 1 .. S’Last)))

However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can’t use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present . The idea is to not compute a single sum value, but instead an array of partial sums, where we later can select the partial sum we want. Because such an array can’t be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:

function Sum_Acc (S : Seq) return Seq with Post => (Sum_Acc'Result'Length = S'Length and then Sum_Acc'Result'First = S'First and then Sum_Acc'Result (S'First) = S (S'First) and then (for all I in S'First + 1 .. S'Last => Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));

The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array, plus the input cell at the same index. The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sum starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository. For our array S and index I, the expression to compute the difference between left and right sum now becomes:

abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I))

The second problem is that we are using Integer both as the type of the array cells and the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function will overflow! All other Fulcrum solutions referred to in Hillel’s summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include proof of absence of overflow. To solve the overflow problem, we need to bound the size of the values to be summed and how many we can sum them. We also need to take into account negative values, so that we also don’t exceed the minimum value. Luckily, range types in SPARK make this simple:

subtype Int is Integer range -1000 .. 1000; subtype Nat is Integer range 1 .. 1000; type Seq is array (Nat range <>) of Int; type Partial_Sums is array (Nat range <>) of Integer;

We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which even fits easily into 32 bits. Of course, we could also choose different, and much larger, bounds here. We now can formulate the specification of Find_Fulcrum as follows:

function Find_Fulcrum (S : Seq) return Nat with Pre => S'Length > 0, Post => (Find_Fulcrum'Result in S'Range and then (for all I in S'Range => abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >= abs (Sum_Acc (S) (Find_Fulcrum'Result) - Sum_Acc_Rev (S) (Find_Fulcrum'Result))));

The implementation of Fulcrum The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows: call Sum_Acc and Sum_Acc_Rev and store the result; compute the index where the difference between the two arrays is the smallest. The problem with this solution is that step (1) takes O(n) space but we promised to deliver a constant space solution! So we need to do something else. In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev will be already O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don’t call these functions accidentally, slowing down our code. Marking a function ghost can be done just by adding with Ghost to its declaration. The constant space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (J - 1) and Right_Sum = Sum_Acc_Rev (J - 1), we can compute the values for the next index J + 1 by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation:

for I in S'First + 1 .. S'Last loop Left_Sum := Left_Sum + S (I); Right_Sum := Right_Sum - S (I); if abs (Left_Sum - Right_Sum) < Min then Min := abs (Left_Sum - Right_Sum); Index := I; end if; end loop; return Index;

To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred. In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop):

pragma Loop_Invariant (Left_Sum = Sum_Acc (S) (I - 1) and then Right_Sum = Sum_Acc_Rev (S) (I - 1) and then Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)) and then (for all K in S'First .. I - 1 => abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >= abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))));

The only part that’s missing is the initial setup, so that the above conditions hold for I = S’First. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum which is O(n) time and O(1) space. So we end up with this initialization code for Find_Fulcrum:

Index : Nat := S'First; Left_Sum : Integer := S (S'First); Right_Sum : Integer := Sum (S); Min : Integer := abs (Left_Sum - Right_Sum);