gfverif: fast and easy verification of finite-field arithmetic

Warning

The gfverif code is currently in alpha test and should not be relied upon. We prioritized rapid development over careful software engineering; we are releasing the code as a preliminary explanation of the underlying verification techniques. The code certainly has bugs, and needs a complete rewrite for several obvious reasons. The interface is also going to change.

Introduction

Cryptographic software needs to be correct: even a tiny bug can have disastrous consequences for security, as illustrated by Brumley, Barbosa, Page, and Vercauteren exploiting an ECDH carry bug in OpenSSL 0.9.8g. Standard software-testing techniques catch many bugs but did not prevent, e.g., further OpenSSL carry bugs announced in January 2015 (initial analysis: too expensive to exploit) and December 2015 (initial analysis: exploitable by well-equipped attackers).

The goal of gfverif is to integrate highly automated proofs of correctness into the ECC software-development process. These proofs can be compromised by bugs in gfverif, but eliminating those bugs is a relatively small one-time task. Correct proofs will eliminate all ECC software bugs, including the low-probability carry bugs that slip past testing. Of course, bug-free software can be compromised by hardware bugs and compiler bugs, but separate projects are underway to eliminate those bugs.

Contributors (alphabetic order)

Daniel J. Bernstein (University of Illinois at Chicago and Eindhoven University of Technology)

Peter Schwabe (Radboud University)

An initial experiment

We have proven the correctness of a reference implementation of X25519 elliptic-curve scalar multiplication. The implementation is, except for a few minor tweaks, the preexisting "ref10" implementation in C. Correctness means that, under plausible assumptions regarding the behavior of the C compiler, the implementation computes exactly the function specified by a concise high-level description of X25519.

We considered several criteria in the design of gfverif. Each criterion is stated here as (1) a one-line question; (2) a list of possible answers, from most desirable to least desirable; and (3) what this X25519 experiment says about gfverif's answer.

Volume of annotations to turn existing code into verifiable code?

zero; fully automated verifier

small

tolerable for development

painful

so painful it has never been done

gfverif produced most of the proof automatically, but it also relied on some "annotations" that the programmer added to the implementation. These annotations occupy only a small fraction of the original code volume, and we're working on ways to further reduce the volume of annotations.

For comparison, the paper "Verifying Curve25519 software" (ACM-CCS 2014) required much more annotation effort.

Difficulty of writing the annotations?

trivial

easy

medium

hard

The annotations are straightforward. They could easily have been written by a programmer without expertise in formal verification.

Prerequisites for how existing code is written?

nothing

only things that we need anyway, such as constant-time

more restrictions

give up, write verifiable code from scratch

A few small parts of "ref10" had to be tweaked for current limitations of gfverif. We don't expect these limitations to be hard to remove. This doesn't mean that gfverif will be able to handle everything: we're targeting only constant-time code, for example.

Volume of verifiable code written from scratch?

even smaller than writing today's code was

same ballpark as writing today's code

not much bigger than writing today's code

significantly more painful than writing today's code

so painful it has never been done

In the short term gfverif is targeting existing code, but in the long term we expect the ECC software developer to use gfverif from the beginning of the software-development process. The results won't be much bigger than today's code, and we can see ways that they can be even smaller than today's code.

Human effort for verification?

easy

medium

hard

Almost all of the proof was verified by gfverif. The human verifier checks the concise high-level description of X25519, and a few lines of input-output specification; gfverif verifies the rest.

CPU time for verification?

seconds

minutes

hours

days

maybe it's polynomial time in some cases

gfverif used a few minutes of CPU time to verify the full X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" used many hours of CPU time for only the core operation (a "Montgomery ladder step") of X25519 scalar multiplication.

Types of errors that general-purpose verifier is trying to catch?

all, even maliciously hidden errors

anything normal, but not necessarily malice

various important categories of bugs not caught by testing

well, um, something

gfverif is targeting full functional verification, catching all bugs. However, this version isn't protected against malice: for example, it assumes (and does not enforce) memory-safety constraints.

Fraction of code verified?

complete crypto computation

large fraction of code

gfverif verified a complete X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" covered only the main loop of scalar multiplication.

Volume of code in general-purpose verifier?

1000 lines

10000 lines

100000 lines

1000000 lines

The current version of gfverif is about 2000 lines of code. However, it also relies on g++ and the Sage computer-algebra system, which have much more code. One of the reasons for a complete gfverif rewrite is to remove g++ and Sage.

Risk that general-purpose verifier will fail to catch errors?

minimal; code proved

code audited but not proved

too much code to audit

Auditing the current gfverif code per se would be feasible, but auditing g++ and Sage is not feasible.

Types of computations supported?

much broader than crypto

crypto

core crypto: e.g., all of NaCl

significant chunk of core crypto

gfverif focuses on finite-field computations. This is slightly broader than ECC (for example, it can handle HECC and can probably handle NTRU) but it isn't all of core crypto. On the other hand, it's a significant chunk of core crypto.

Input languages supported?

pretty much everything people want

one popular language

have to use something new BUT IT'S A BETTER LANGUAGE YOU FOOLS

The current version of gfverif supports C. The next target is asm, which we'll initially handle by compiling asm to C. For the moment C and asm are the primary languages for the development of high-security ECC software. However, we see advantages in new domain-specific ECC languages and will be happy to extend gfverif accordingly.