Project Everest is a large, collaborative research effort that aims to verify and deploy a new, secure HTTPS stack. All of our code is verified using the F* programming language. Using KReMLin, a dedicated compiler, the verified F* code is compiled to readable C, meaning existing systems projects can readily integrate our verified code. Going to C is what allows people to use our code without having to buy into exotic, strange languages with lambdas.

Three years into the project, we have successfully integrated code into mainstream software, notably Firefox and Windows. Along the way, we learned what it takes to deliver quality C code that can be taken seriously and integrated into an actual source tree. We made many rookie mistakes and discovered the discrepancy between what we, researchers, thought was ok, and what was expected of us by actual software developers. None of these lessons will ever make it into a paper, which makes them a perfect fit for a blog post. I have shared many of these anecdotes in informal conversations or talks; apologies if you’ve heard some of this already.

Our first big “client” of Project Everest turned out to be Firefox. This initial success was in large part due to the well-known strategy of infiltrating PhD students at large companies or non-profits (e.g. Mozilla), hoping that after setting up camp as interns, they would be in a position to push for verified code from the inside.

Software engineers review generated code!

We certainly expected that generating idiomatic C code would be important, and this informed many of the early design choices in our toolchain. We were surprised, however, by how closely Mozilla reviewed and manually inspected the generated C code. This review revealed a number of issues that we had never considered actual blockers.

Many temporaries were inserted by F*, to enforce the evaluation order of arguments, something which is guaranteed by neither C nor OCaml, the two main extraction backends of F*. Alas, this extra precision was not appreciated by reviewers, who very explicitly requested that variables named uu__123456 be eliminated whenever possible.

Computationally-irrelevant function arguments were erased as unit arguments, a correct but suboptimal compilation scheme that gives unnatural function signatures in C headers. Also a no-go.

I wrote the pretty-printer for our compiler, KReMLin, with the reference table of C operator precedence at hand, inserting the minimal number of parentheses. I happily thought I had done the optimal thing, until it turned out that no one can remember the relative precedence of + and <<, or | and -. I have myself since forgotten, and I’ve heard that it even differs across languages…

These shortcomings were fixed as follows.

The auto-generated names were removed using a custom pass in the KReMLin compiler that performs a def-use analysis based on a syntactic criterion and is aware of the C evaluation semantics, so that a temporary is only eliminated when doing so cannot change the meaning of the program.

The extra unit arguments were removed, first by KReMLin, then in F* itself, which now ensures that the OCaml extraction of F* also generates prettier code.

KReMLin became reasonable and added an option to generate extra parentheses.

A myriad of other details were fixed, such as always adding curly braces around conditional blocks, indentation, unused variable elimination, etc.

In short, even if it doesn’t matter for a C compiler, the expectation was truly that the resulting C code would be as crisp, clean and readable as what a C programmer would have written. In hindsight, that’s fair.

Don’t go functional on a C compiler

In early experiments, as programmers of functional training and descent, we relied heavily on recursion: reasoning is easier with recursive calls (it’s easy to call some lemma after the recursive call returns); it also makes implementing our functional specifications easier, something important for students who need a progressive ramp-up with our toolchain. For all those scenarios, we relied on GCC 7’s excellent tail-call optimization, as a stop-gap measure in support of rapid prototyping.

However, once one leaves the cocoon of modern compilers and toolchains, and discovers the vast range of compiler versions and architectures under Mozilla’s CI, it becomes evident that relying on any sort of tail-call optimization would be unreasonable: very few compilers reliably perform it.

We were confronted with that reality when a (temporary) piece of code blew the stack using a non-GCC compiler. It was converting between two abstractions for arrays by performing a very idiomatic, functional, byte-by-byte recursive copy. It was later rewritten with a call to memcpy, and will soon disappear as we unify the two abstractions.

Another gripe that we had was our heavy reliance on passing structures by value. KReMLin supports, as an extension, data types passed as values, using a tagged union compilation scheme. It is awfully convenient, since structures as values have no lifetime, and are hence manipulated as pure values within F*, incurring no extra verification cost.

Passing structures by value is allowed by the C standard, but this is costly. Most ABIs mandate that such structures, if larger than a few words, be passed by pushing their contents onto the stack. This showed up in our performance profiles; our new guideline is to essentially only use those when their size does not exceed four words on a 64-bit architecture.

Furthermore, even small structures caused issues. Due to multiple layers of type abstraction, and owing to the tagged union compilation scheme of KReMLin, we ended up with series of field accesses over 10 fields long, which hit a hardcoded limit in one of our supported compilers. This was later solved by extending KReMLin with three new compilation schemes for data types, optimizing special cases (inductive has one branch; no branch has more than one argument; etc.) to reduce the depth of field accesses.

Forget about abstraction and modularity

Working with F*, we (unsurprisingly) split our code into separate modules, to leverage modularity, maintainability, and parallel verification. Within modules, larger functions were split into small, individual helpers, each adorned with their respective pre- and post-conditions, for robustness and readability.

Going to C, a priority for the KReMLin compiler was to get rid of this pesky abstraction and modularity.

First, no one wants to land in their repository an algorithm spread out across 20 different source files, no matter how beautiful and well-designed the original source code was.

Naïvely translating each F* module to an individual C file had a more pernicious effect: this generated a lot of calls across translation units (C files), meaning that a large chunk of the call graph was forced to abide by the rules of external linkage and the calling convention dictated by the compiler’s ABI. Specifically, this prevented the compiler from optimizing function calls by, say, passing more arguments in registers. This had a disastrous effect on performance.

Even splitting functions into smaller helpers prevented a lot of intra-procedural analyses from kicking in.

This was alleviated with two mechanisms:

KReMLin got equipped with a bundling mechanism driven by a micro-DSL, which allows the programmer to specify how to recombine multiple source F* modules into a single C translation unit (file), indicating which functions are the API entry points, leaving the rest of the module marked as static inline, which has a big effect on performance.

Initially implemented in KReMLin and now in F* itself, the noextract inline_for_extraction qualifiers let the programmer mark a helper to be inlined at its call sites rather than extracted as a separate function.

The bundling mechanism works in conjunction with a reachability analysis that removes anything not reachable from the API entry points, hence ensuring that no unused helper remains in the generated C code, something that also made our consumers queasy.

Reproducible builds

Our toolchain relies on OCaml, custom compilers, and a variety of tools that turned out to be very hard to set up. We have now switched entirely to a Docker-based CI infrastructure, which makes distributing our code trivial. Mozilla adopted this and runs a Docker build command in their own CI, which errors out if someone tries to modify the generated code instead of fixing the F* source file.

Audit all the non-verified code

This blog post wouldn’t be complete without a few words about the bugs we found. Apart from a compiler bug, two amusing bugs occurred in hand-written code and are worth noting.

We had a somewhat puzzling stack corruption. The reason turned out to be an external header improperly defining a macro. Now for the details…

C99 provides fixed-width integer types (uint32_t, etc.) via #include <inttypes.h>, along with corresponding macros for printing and scanning fixed-width integers via printf and scanf. We rely heavily on <inttypes.h>, but compiling for Windows, the header didn’t seem to be available by default. We did find, however, a copy of this header somewhere, and used it to write a routine to convert a hex-encoded string into the corresponding byte array, using the SCNx8 macro for scanning a single hex-encoded byte into a temporary on the stack. Unbeknownst to us, the underlying C library had no runtime support for scanning a single byte, but this did not deter the author of the inttypes.h header we found from defining the macro nonetheless, scanning two bytes into the destination address and causing a buffer overrun. It looks like other people (e.g. Android) were careful not to define SCNx8 on Windows.

We regularly audit our hand-written code using Clang’s sanitizers. This caught a few undefined behaviors. The most blatant one was in code reading 4 little-endian bytes and returning the corresponding uint32_t. The author of the initial implementation wrote return *((uint32_t *)src);, not realizing that in the case that src is not aligned, this is undefined.

We now have a fairly solid process for distributing code, gathering the artifacts of Project Everest in self-contained directories that any client can copy and integrate into their project. It took us a long time to get there, but we are now ready to scale up to multiple consumers of our code, which is excellent news as we gear up towards more software releases, such as EverCrypt, which I shall cover in a later blog post.