Login: Password: Remember Me Register Blogs >> RolfRolles 's Blog

Created: Friday, December 4 2009 01:41.32 CST Modified: Friday, December 4 2009 01:57.35 CST Printer Friendly ... VinE's OCaml Programming Tricks: Explicit Continuation-Passing Style Author: RolfRolles # Views: 43113

There are many books devoted to languages like C++ or Python, and technical blogs are teeming with further information on these subjects. Languages like OCaml have much smaller communities, with very few dedicated books and technical blogs. As far as I've put together, we OCaml programmers mainly learn from reading one anothers' code. Functional programmers must stick together.



VinE is the static analysis component of the BitBlaze binary analysis project, it is written in OCaml, and it contains some tricks that I haven't seen elsewhere. For the sake of contributing to their dissemination throughout the community, I would like to detail some of them here. These might be obvious to everyone already, or maybe no one cares. Feedback is welcome.



The following discussion concerns /ocaml/wp.ml, the weakest precondition calculator. A weakest precondition (hereinafter WP) is the most general condition under which execution reaches some given point with some desired post-condition (i.e. an execution trace plus some arbitrary condition that must hold at the end of the trace, e.g. "this array write at this location goes out of bounds"). WPs compare favorably to forward symbolic execution in terms of formula size, and comprise an attractive choice for enacting the primitive "cause execution to reach a particular line in a particular state", if loop unrolling is palatable. This entry concerns an OCaml programming technique found in VinE's WP calculator, and does not seek to discuss WPs themselves. For further reading, read a book on formal semantics (the section on axiomatic semantics) or Dijkstra's original work in the area from the early 1970s.



One issue with functional programming is that, if recursive functions are not written in a tail-recursive style, they have the potential to overflow the stack, thereby causing the program to abort. One learns to be judicious about writing tail-recursive functions, but alas, it is impossible to naively convert some functions into such a form; care is needed. For example, if the recursive function needs to call itself to produce a value, and then perform some operation on the result before returning it, then the recursive invocation is not in the tail position and cannot be optimized into a jump by the compiler. Therefore, the function will consume stack space for each recursive invocation, leading to an upper bound on the size of the data that can be processed before a stack overflow exception is thrown.



Consider the following snippet, which is the aforementioned weakest precondition calculator written as a straightforward recursive pattern match:



let rec wp q = function

| Skip -> q

| Assume e -> exp_implies e q

| Choice(s1, s2) -> let r1 = wp q s2 in

let r2 = wp q s1 in

exp_and r1 r2

| Seq(s1, s2) -> let r1 = wp q s2 in

wp r1 s1

| Assign(t, e) -> Let(t, e, q)

| Assert e -> exp_and e q



In the case of the Choice statement, the weakest precondition for both choices (variables r1 and r2) must be calculated before they can be conjoined, and therefore these calls can not be made tail-recursive. Seq is similar. How do we write this function so it does not crash in the presence of large data sets?



One answer is "explicit continuation-passing style", as seen in the following snippet:



let calculate_wp (simp:Gcl.exp->Gcl.exp) (post:Gcl.exp) (stmt:Gcl.t) =

let rec wp q s k =

match s with

| Skip -> k q

| Assume e -> k (simp(exp_implies e q))

| Choice(s1, s2) -> wp q s1 (fun x -> wp q s2 (fun y -> k(simp(exp_and x y ))))

| Seq(s1, s2) -> wp q s2 (fun x -> wp x s1 k)

| Assign(t, e) -> k(simp(Let(t, e, q)))

| Assert e -> k (simp(exp_and e q))

in

wp post stmt (fun x->x)



We have introduced a new argument to wp called "k", the "continuation", which means "the rest of the computation". We can see that the calls to wp in the Choice and Seq cases are now in the tail position, and hence can be optimized directly into jumps, resulting in constant stack depth and no more crashes. The cases other than Choice and Seq have been modified to apply the continuation to the result of their portion of the weakest precondition calculation.



But what exactly is going on here? The compact representation yields little insight into how the WP is actually calculated step-by-step. Thus, let's take a look at an example GCL statement and the computation of its WP:



wp q Seq(Skip,Seq(Assume expr1,Assert expr2)) (fun x -> x) (* Initial call to wp with arbitrary post-condition q *)



wp q (Seq(Assume expr1,Assert expr2)) (fun x -> wp x Skip (fun x->x)) (* Grow the continuation because of the Seq *)



wp q (Assert expr2) (fun x -> wp x (Assume expr1) (fun x -> wp x Skip (fun x->x))) (* Grow the continuation because of the Seq *)



(fun x -> wp x (Assume expr1) (fun x -> wp x Skip (fun x->x))) (exp_and expr2 q) (* Shrink the continuation to process the assert *)



wp (exp_and expr2 q) (Assume expr1) (fun x -> wp x Skip (fun x->x)) (* Process the assume *)



(fun x -> wp x Skip (fun x->x)) (exp_implies expr1 (exp_and expr2 q)) (* Shrink the continuation to because of the assume *)



wp (exp_implies expr1 (exp_and expr2 q)) Skip (fun x->x) (* Process the skip *)



(fun x -> x) (exp_implies expr1 (exp_and expr2 q)) (* Shrink the continuation because of the skip *)



(exp_implies expr1 (exp_and expr2 q)) (* Final WP *)



The first three steps show the expansion of the continuation due to the Seq statements in the input; the rest of the steps show the gradual shrinkining of the continuation by processing the statements inside of the Seq statements. Essentially, we have traded stack space for the activation records of wp for heap space for the continuation, and in the process have produced a tail-recursive function that does not consume stack space.



From here, we pass the propositional formula (a VinE expression) off to a theorem prover and receive an input that causes execution to reach the specified state with the specified condition. However, this is once again beyond the scope of this entry.



Blog Comments mayhem Posted: Saturday, December 5 2009 23:08.33 CST The topic of software verification is very interesting and complex, even if slightly hors subject for openrce :)



Some notes could have helped the most novice audience to comprehend this. For example, one sentence to explain that the first code snip is actually a matching (or "type switch") on the anonymous function waited in second parameter. Also, the undefined "simpl" function is (I assume) just a helper function performing pre-defined formula simplication during the computation of the WP, explaining it more in depth would have been even more HS however.



Beside this, great post! Thanks for sharing!



Btw, ocaml newbies can check out ocaml.inria.fr for the complete reference manual, or www.ocaml-tutorial.org for interesting examples to start up with.



-mayhem

RolfRolles Posted: Monday, December 7 2009 03:06.57 CST Hey mayhem,



Thanks for the comments; I'm glad somebody understood :-) You are correct on the second point; simp just simplifies the expression as it is being built (in Binary Analysis Platform, this defaults to id, e.g. fun x -> x). On the first point, actually it is impossible to pattern-match on a closure, which is opaque. The "= function" syntax



let rec wp q = function



is equivalent to



let rec wp q s = match s with .



It's just terser and more explicit, so I like it better. The second argument is of type Gcl.t, like the corresponding parameter for the second version. Sorry for my sloppy editing having caused confusion.

mayhem Posted: Monday, December 7 2009 04:02.17 CST Thanks for the precision. I couldnt find the meaning of the = fucntion | syntax in the manual (but finally found it in wikipedia). Actually I got confused with the "fun ->" syntax which is to make use of an anonymous function, while "function |" seems indeed only to be a syntactic sugar for a function defined as a pattern matching, with parameter names and types left implicit.



-m

AdultSwim Posted: Wednesday, March 3 2010 07:22.20 CST Good post. I totally agree with the first paragraph. There aren't many good sources out there about how to be a good functional programmer (you can't just go out and buy "Functional Design Patterns for Dummies").

mrrusof Posted: Monday, April 12 2010 03:46.17 CDT Disregarding what simp does, for the cps version to yield the same exp_and term as the non-cps, should the match for choice be:



| Choice(s1, s2) -> wp q s2 (fun x -> wp q s1 (fun y -> k(simp(exp_and x y ))))



?



-mrrusof

RolfRolles Posted: Wednesday, May 5 2010 14:26.52 CDT Sorry for the slow response; I'm on a mental vacation. If there is a mistake in the above code, it's in the non-CPS version, which is attributable to me. The CPS version comes straight from the VINE release and is more likely to be correct than my rendition.





