Using SPARK to prove absence of run time errors on a hobby project.

The Danish Technical University has a yearly RoboCup where autonomous vehicles solve a number of challenges. Each solved challenge gives points and the team with most points wins. The track is available for testing two weeks before the qualification round.

The idea behind and creation of RoadRunner for DTU RoboCup 2019.

RoadRunner is a 3D printed robot with wheel suspension, based on the BeagleBone Blue ARM-based board and the Pixy 1 camera with custom firmware enabling real-time line detection. Code is written in Ada and formally proved correct with SPARK at Silver level. SPARK Silver level proves that the code will execute without run-time errors like overflows, race-conditions and deadlocks. During development SPARK prevented numerous hard-to-debug errors to reach the robot. The time-boxed testing period of DTU RoboCup made early error detection even more valuable. In our opinion the estimated time saved on debugging more than outweighs the time we spent on SPARK. The robot may still fail, but it will not be due to run-time errors in our code...

Ada was the obvious choice. We both have years of experience with Ada. The BeagleBone Blue runs Debian Linux and we use the default GNAT FSF compiler version from Debian Testing. It is easy to cross compile from a Debian laptop and it is not too old compared to GNAT Community Edition. The BeagleBone Blue runs Debian Stable.

We decided initially not to use SPARK. According to some internet articles, SPARK had problems proving properties about floating points, and the Ada subset for tasking seemed to be too restricted. Support for floating points has improved since then, and tasking was extended to the Jorvik (extended Ravenscar) profile.

A race condition in the code changed the SPARK decision. A lot of valuable time was spent chasing it. The GNAT runtime on Debian armhf has no traceback info, so it was difficult to find what caused a Storage_Error. SPARK flow analysis detects race conditions, so that would prevent that kind of issues in the future.

It took a lot of effort before SPARK would be able to do flow analysis without errors. SPARK rejects code with exception handlers, fixed-point to floating-point conversions, the standard Ada.Containers and task entries. Hopefully, future versions of SPARK will be able to analyze the above or at least give a warning and continue analysis, even if it contains unsupported Ada features. When the code base is in SPARK, it is quite easy to add new code in SPARK.

Experience with SPARK.

I had to start from scratch, with no previous experience with SPARK, formal methods or safety critical software. I read the online documentation to find examples on how to prove different types of properties.

SPARK is a tough code reviewer. It detects bad design decisions and forces a rewrite. Bad design is impossible to test and prove and SPARK does not miss any of it. Very valuable for small projects without any code review.



SPARK code is easier to read. Defensive code is moved from exception handlers and if statements to preconditions. The resulting code has less branching and is easier to understand and test.



SPARK code is also more difficult to read. Loop invariants and assertions to prove loops can outnumber the Ada statements in the loop.



We used saturation as a shortcut to prove absence of overflows, in locations where some values could not be proved to stay in bounds. But this is not an ideal solution: if it does saturate, then the result is not correct.

In our experience, proving absence of run time errors also detects real bugs, not only division by zero and overflows.

Two examples.

SPARK could not prove a resulting value in range. A scaling factor had been divided with instead of multiplied with.



SPARK could not prove array index in range. X and Y on camera image calculation had been swapped.

Stability.

If code is in SPARK, then all changes must be analyzed by SPARK. Almost all coding error gives a run time exception. That happened several times on the track during testing. But when tested with SPARK, it always found the exact line with the bug. After learning the lesson we found that code analyzed with SPARK always worked the first time.



Having a stable build environment with SPARK and automated tests, made it possible to make successful last-minute changes to the software just before the final run in the video link.





Here is the presentation video for our robot:

