image © Tkgd2007

Defining programs of a reasonable size

First, it is important to define what I mean by programs of reasonable size. If we look at the current state of affairs, yes, proving complete functional correctness of thousands of lines of code (LoC) is possible. We already have a verified OS kernel (seL4) and a verified C compiler (CompCert). However, the effort that goes into verifying any non-trivial program is still enormous, often measured in person-years. For most applications it is impractical to devote such effort, which reserves formal verification for only the most critical systems.

Often, the applications we do want to secure consist of hundreds of thousands of LoC, for which complete functional verification is impractical. Many have resorted to techniques based on static analysis; while static analysis can scale, it is typically limited to proving simpler properties like memory safety. Infer is an example of such an analyzer that scales well, leveraging separation logic to enable local reasoning.

In general, we can prove large theorems about small pieces of code, and we can prove small theorems about large pieces of code. But what about everything in between? Is it possible to prove interesting theorems about a reasonably sized code base?

Yes. One way to do this is bottom-up: scale existing methods that work for smaller code bases up to larger programs and systems. This means more automation in formal verification, better tooling, more expressive decision procedures, and so on. Alternatively, another approach is top-down: get better at (and more diligent about) patching and updating your software and its components. Here at SourceClear, we are obsessive about tracking and staying up to date on the latest safe versions of all third-party components used in our applications.

However, these methodologies are still nascent. Not much work has been done on building secure systems at scale from the middle out. In the rest of this article, I outline a scalable method to secure reasonably sized applications across the entire supply chain.

Building security from the middle out

In order to understand how we can build security from the middle out, we need to first lay out two points about software development:

A large application is rarely developed all at once; instead, it evolves over a period of time through agile methodologies and small changes.

The notion of correctness for a program is often not fixed or known a priori; instead, while building the application we observe some useful program behaviors, and those are then deemed to be correct. (Imagine you are given a legacy code base with no documentation. The first thing you do is run it and see what it does; from that you form a mental picture of what "correctness" means for that program.)

If you agree with the above points, then it is clear that security needs to be thought of not just as a property of a single version of a program, but as an emergent behavior of software evolution. As we build new versions of software, we want them to be secure and "correct" without hindering the speed of development.

“Security needs to be thought of not just as a property of a single version of a program, but as an emergent behavior of software evolution.”

Thus, building tools and techniques that apply to change-sets, patches, or diffs is of utmost importance, as it can help secure many useful programs. However, very limited work in this area takes these two points seriously. Simply applying existing top-down or bottom-up approaches actually makes the problem harder. What we need are approaches that analyze the changes between versions of a program.

Relational verification is one such technique: it verifies one version of a program with respect to another, applying formal methods to prove properties about two runs or two different versions of the same program. As it turns out, it is actually harder to prove properties about two programs than about one. This has to do with the difficulty of composing the two versions so that they can be analyzed the way we analyze a single program. Even regression verification, i.e. proving the equivalence of two program versions, is very difficult to automate in practice and undecidable in general.

Analyzing software changes for ‘correctness’

Given that the notion of “correctness” is not fixed and the expected behavior of a program is a moving target, here is a scheme you can use to analyze software changes:

(1) Assume that the old version of the program is "correct" for some notion of "correctness".

(2) Analyze only the new patches, diffs, or changes to ensure they maintain that "correctness".

(3) If the new changes do not preserve "correctness", then either we have a regression in the software (i.e., we introduced a bug), or we need to update our notion of "correctness", since the updates introduce new behaviors.

Now, two things are needed for this to work. First, we need to be able to capture the notion of "correctness"; second, it should be possible to check that "correctness" is preserved by analyzing only the updates or diffs. This is certainly possible for diffs/patches generated by typical version control systems, with properties expressed as regular expressions.

Using linear diffs and regular expressions

A patch or a diff as produced by a typical version control system like git for text files can be expressed with just two operations: Add(Line) and Remove(Line). Changes or updates to an existing line can be represented as Remove(OldLine) followed by Add(NewLine). Interesting properties about the program can then be expressed as a set of regular expressions (call it R).
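As a concrete illustration, the two operations can be recovered from a unified diff in a few lines of Python. This is a minimal sketch: real diffs also carry hunk headers and context lines, which we simply skip here, and the file names in the sample patch are made up.

```python
def split_patch(unified_diff):
    """Split a unified diff (as produced by `git diff`) into the two
    operations described above: Add(Line) and Remove(Line)."""
    added, removed = [], []
    for line in unified_diff.splitlines():
        # Skip the file headers; hunk headers start with '@@' and
        # context lines with a space, so neither branch below matches them.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed

# An update to a line shows up as Remove(OldLine) followed by Add(NewLine):
patch = """\
--- a/auth.py
+++ b/auth.py
@@ -1 +1 @@
-store(pw)
+store(hash_password(pw))
"""
added, removed = split_patch(patch)
print(added, removed)  # ['store(hash_password(pw))'] ['store(pw)']
```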

Now, the patch or diff preserves “correctness” if and only if

(1) Every line in an Add(Line) operation in the patch matches all of the regular expressions in R.

(2) No line in a Remove(Line) operation in the patch matches any of the regular expressions in R.

Checking (1) and (2) can be done easily with a tool like grep, and we now have a way of distinguishing changes that preserve existing behavior from those that do not. Of course, regular expressions are not expressive enough to capture full program correctness, but the approach here is to apply a set of security-related rules to commits and analyze them incrementally. We used this approach to build Commit Watcher, a project that finds interesting and potentially hazardous commits in git projects.
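Conditions (1) and (2) can be sketched in a few lines of Python. The rule set and the example lines below are hypothetical, purely for illustration; Commit Watcher's actual rules are richer.

```python
import re

def preserves_correctness(added, removed, rules):
    """Conditions (1) and (2) from the text: every added line must match
    every regex in R, and no removed line may match any regex in R."""
    cond1 = all(r.search(line) for line in added for r in rules)
    cond2 = not any(r.search(line) for line in removed for r in rules)
    return cond1 and cond2

# Hypothetical rule set R: any line storing a password must go through
# a hashing helper (illustrative only, not a Commit Watcher rule).
R = [re.compile(r"hash_password\(")]

added   = ["store(hash_password(pw))"]   # new line introduced by the patch
removed = ["store(pw)"]                  # old line the patch replaces
print(preserves_correctness(added, removed, R))  # True
```

Note that the reverse patch, which would remove the hashed call and re-introduce the plain `store(pw)`, fails both conditions and would be flagged as a regression.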

In order to capture more properties that are closer to program correctness, we need to extend the approach in two ways. First, we need to compute diffs that carry more structural information and are not just linear. Second, we need a specification language more expressive than regular expressions.

The next evolution: using tree diffs and computation tree logic (CTL)

Static analyzers usually operate on the abstract syntax tree (AST) to validate structural properties. Instead of using linear diffs, we can compute the diff/patch information as it applies to ASTs. In particular, from a given patch we can compute a tree edit script, i.e. the insert, remove, and update operations on the underlying trees representing the program's AST, as done when computing tree edit distance (TED). Once we can generate tree diffs, the next question is how to express "correctness".
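The idea can be sketched with Python's own `ast` module. This is a crude approximation: it only compares multisets of labelled nodes, whereas a real tool would run a tree-edit-distance algorithm (e.g. Zhang-Shasha) to obtain minimal insert/remove/update operations together with their positions in the tree.

```python
import ast
from collections import Counter

def node_signatures(src):
    """Multiset of (node type, salient name) pairs for a piece of source,
    a crude stand-in for the labelled nodes a tree-diff algorithm tracks."""
    sigs = Counter()
    for node in ast.walk(ast.parse(src)):
        label = getattr(node, "name", getattr(node, "id", None))
        sigs[(type(node).__name__, label)] += 1
    return sigs

def tree_diff(old_src, new_src):
    """Approximate a tree edit script as the node signatures the patch
    inserts and removes (Counter subtraction drops non-positive counts)."""
    old, new = node_signatures(old_src), node_signatures(new_src)
    return {"insert": new - old, "remove": old - new}

diff = tree_diff("def f(x):\n    return x",
                 "def f(x):\n    return x + 1")
print(sorted(diff["insert"]))  # the patch inserts BinOp/Add/Constant nodes
```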

Computation tree logic (CTL) is a logic for expressing properties of tree-like structures. It is commonly used in model checking; however, we have applied it to express arbitrary structural properties over ASTs. It is a shame that it is not used more for specifying the rules of static analyzers. Only recently, Facebook introduced AL, a declarative language that uses CTL to define rules for the Infer static analyzer.

CTL Formula F1 holds until F2
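To make this concrete, here is a toy interpretation of two CTL operators over a Python AST, where states are AST nodes and transitions go to child nodes. The `calls_eval` predicate is a hypothetical security rule, and this sketch is far simpler than what a language like AL provides.

```python
import ast

def children(node):
    return list(ast.iter_child_nodes(node))

def EF(node, pred):
    """CTL 'exists finally': pred holds at this node or at some descendant."""
    return pred(node) or any(EF(c, pred) for c in children(node))

def AG(node, pred):
    """CTL 'always globally': pred holds at this node and every descendant."""
    return pred(node) and all(AG(c, pred) for c in children(node))

# Hypothetical security predicate: the node is a call to eval().
def calls_eval(n):
    return isinstance(n, ast.Call) and getattr(n.func, "id", "") == "eval"

tree = ast.parse("def handler(s):\n    return eval(s)")
print(EF(tree, calls_eval))                    # True: an eval() call is reachable
print(AG(tree, lambda n: not calls_eval(n)))   # False: the no-eval rule is violated
```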

With tree diffs capturing the patch/update information and a more expressive logic like CTL, it is no longer straightforward to check whether changes made to the program preserve "correctness". Instead of trying to prove it here, I give the following conjecture for preservation of patch "correctness", which I believe to be true at least in certain cases.

Conjecture: There exists an algorithm to check for preservation of CTL properties by checking them on the insert, remove and update operations of the tree diff between the ASTs.

Assuming the conjecture holds, we have a way of checking whether a patch or diff preserves correctness (as expressed in CTL). CTL is expressive enough to capture interesting and realistic safety and liveness properties; checking them continuously and incrementally will increase confidence in the security of the application under development.
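For one easy fragment of the conjecture, a sketch is possible: if the property is AG of a purely local predicate, removals cannot break it (deleting nodes only shrinks the set of states that must satisfy the predicate), so only inserted subtrees need re-checking. Everything below is a hypothetical illustration of that special case, not the conjectured general algorithm, and `no_eval` is a made-up rule.

```python
import ast

def AG(node, pred):
    """CTL 'always globally' over an AST subtree."""
    return pred(node) and all(AG(c, pred) for c in ast.iter_child_nodes(node))

def no_eval(n):
    # Hypothetical local property: the node is not a call to eval().
    return not (isinstance(n, ast.Call) and getattr(n.func, "id", "") == "eval")

def inserts_preserve(inserted_subtrees, pred):
    """Assuming AG(pred) held for the old AST and pred is purely local,
    it suffices to re-check only the subtrees the patch inserts; remove
    operations cannot violate AG, and updates decompose into both."""
    return all(AG(t, pred) for t in inserted_subtrees)

new_subtree = ast.parse("x = eval(user_input)")  # subtree a patch inserts
print(inserts_preserve([new_subtree], no_eval))  # False: patch breaks AG(no_eval)
```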

As we have seen, going from linear to tree-like structures increases the expressiveness of the analysis. It remains to be seen whether such an approach can also be extended to graph-like structures and data-flow properties, which are more expressive still. Such an extension would allow us to check complex inter-procedural or inter-file properties of programs incrementally. At this point, I still consider it an open problem.

To conclude, the approach I have outlined in this article differs in character and spirit from existing top-down and bottom-up methods of securing programs. The key idea is to operate on an abstraction based on updates, patches, or changes, and to formulate properties over them. Doing so improves scalability and lets us check these properties incrementally, fitting nicely into modern software development processes based on CI/CD pipelines. Continuous verification thus has the potential to revolutionize the application of formal methods to software development in the same way continuous integration did for testing.