[This is the first part of a two-post series. The second part is here.]

In previous posts on this blog, I have talked about how research on programming languages and tools can contribute to K-16 education. In this post, I will share with you some of my experiences while trying to use a PL tool — specifically, a semi-automated program verifier — in an undergraduate algorithms class. The experience was a success in many ways. A program verifier gives students a hands-on understanding of how algorithms are mathematical objects, and verification tools are now mature enough for productive classroom use. I think every CS program would benefit students by introducing them to these tools at some point. At the same time, the experience exposed me to some serious limitations of current-day program verification, and opened up a number of directions for new research in this area.

Teaching proofs with Dafny

The undergrad course that I teach at Rice is called “Reasoning about algorithms”. In many ways, the class is a standard-issue algorithms course: it teaches algorithm design techniques like dynamic programming, greedy optimization and randomization, and elementary complexity theory. But a distinctive feature of the class is its heavy emphasis on proof techniques. I want my students to develop an intuition for rigorous thinking, and learn different ways to formally reason about the correctness and efficiency of algorithms.

One thing that I have wanted to do, ever since I took on this course, is to teach proofs through appropriate PL tools. After all, reasoning about programs is one of the main preoccupations of PL research. The tools that best fit my class are systems for semi-automated program verification. These systems allow programmers to write propositions and proofs just like they would write programs. Proofs are checked for correctness just like programs would be checked for type safety.

Specifically, my class from the past spring used the Dafny program verifier (here’s a tutorial). At one level, Dafny is a programming language with imperative and first-order functional constructs. However, it differs from mainstream languages in allowing logical annotations such as preconditions and postconditions, ranking functions, and invariants for loops and recursive calls. In effect, through these annotations, the programmer embeds a purported proof of correctness within their code. Dafny verifies that these annotations form a real proof.

For example, a Dafny function to compute the n-th Fibonacci number looks like this:

function Fibonacci(n: int): int requires n >= 0; ensures Fibonacci(n) >= 0 decreases n; { if n == 0 then 0 else if n == 1 then 1 else Fibonacci(n - 2) + Fibonacci(n - 1) }

Here, the “requires” clause defines a precondition on the inputs of the function. The “ensures” clause is a postcondition that is asserted when the function terminates. Dafny checks automatically that in any given level of recursion, the precondition implies the postcondition. Inductively, the annotations imply that the postconditions hold on all terminating executions.

To prove that the function always terminates, Dafny uses the “decreases” clause in the program. This annotation defines a ranking function whose value is supposed to be bounded from below and also to decrease at every recursive call. If the ranking function satisfies these conditions (Dafny checks automatically that it does), then the program must terminate eventually on all inputs.

You can also write an imperative definition of Fibonacci in Dafny, along with annotations that prove that it is equivalent to the functional version above:

method Compute_Fib(n: int) returns (x: int) requires n >= 0; ensures x == Fibonacci(n); { var i := 0; x := 0; var y := 1; while (i < n) invariant n >= i && i >= 0; invariant y == Fibonacci(i + 1); invariant x == Fibonacci(i); { x, y := y, x + y; i := i + 1; } }

Here, the “ensures” clause states that the return value x of the procedure equals Fibonacci(n). The while-loop in the program is annotated with a set of loop invariants that form an inductive proof of this claim. These invariants hold at the point when the loop is entered (the base case of the induction), are preserved by every loop iteration (the inductive case), and imply the postcondition upon exit from the loop.

Why program verifiers belong to undergrad CS

I am a big fan of Dafny, and I think every CS program should consider exposing their students to Dafny or a system like it at some point. Students who are new to proofs often struggle to understand what exactly it is that makes a proof “formal”. A program verifier offers an immediate answer to this question. Every computer science student has written code since at least her first semester in college, and is familiar with the precision that coding demands. By using a program verifier, the student gets to extend this sense of precision to proofs.

Program verifiers illustrate one of the most basic ideas in theoretical CS: that an algorithm is a formal, mathematical artifact. In particular, systems like Dafny are excellent tools for illuminating the usefulness of induction in computing. Take those problems in discrete math where you use induction to prove that a series sums up to a certain expression. A system like Dafny allows a student to code up the series as a simple program and to experiment with various induction hypotheses within the same programming environment. This experience obliterates the distinction between mathematical objects (for example a series) and programs, and that between proofs in discrete math and proofs of algorithms. Now the student can go on to understand, once again in an experiential way, that the same principles of induction lie behind proofs of more complex algorithms that manipulate recursive data structures or have complex loops or recursive calls.

There are reasons why Dafny fits my needs better than other verification tools or proof assistants. First and foremost, Dafny checks proofs fully automatically. Following the laws of Hoare logic, Dafny compiles a proof task into a verification condition, or a logical formula that is valid if and only if the user-written annotations constitute a proof. It checks the validity of this formula using an SMT solver. This means that the student can avoid tedious, low-level formula manipulation while doing proofs.

Second, Dafny’s programming language closely matches languages with which all CS students are already familiar. This significantly lowers the barrier to entry.

Third, Dafny is widely available. In particular, Dafny is available through a web interface that undergrad students are likely to find very useful. Go check it out if you haven’t already!

Limitations

As much as I am a Dafny enthusiast, I only use it in a small part of my course that concerns induction and formal proofs of program correctness. I would like to do much more. I would like to use Dafny or its cousins for teaching a wide variety of proof techniques, from complexity analysis to exchange arguments in correctness proofs of greedy algorithms to analysis of randomized algorithms using Chernoff bounds. Unfortunately, the state of the art of program verification doesn’t permit this. For instance, something as simple and mechanical as big-Oh complexity analysis of divide-and-conquer algorithms is beyond the scope of all publicly available verification tools. Also, automated program verifiers do not do well with programs that manipulate unbounded, mutable data structures (for example graphs or even arrays). In fact, in my class, I only use Dafny for programs that operate on numbers and functional data structures.

Some of these limitations can be easily fixed: for example, Rustan Leino and I have talked about extending Dafny with some forms of reasoning about complexity. A more fundamental issue is the giant abstraction gap between the level at which humans think about proofs, and their formal, machine-checkable versions. For example, a correctness proof for Dijkstra’s shortest-paths algorithm that deserves full credit in an undergrad class would fit in half a page. A machine-checked proof of Dijkstra’s algorithm is a research paper.

This is not a cause for despair; if anything, these limitations are opportunities for exciting new research. In my next post, I will explore this topic in more detail, so stay tuned.