A Study of Clight and its Semantics

I had been interested in CompCert for quite some times, and ultimately challenged myself to study Clight and its semantics. This write-up is the result of this challenge, written as I was progressing. Installing CompCert Problem Statement Proof Walkthrough Conclusion Revisions This revisions table has been automatically generated from the git history of this website repository, and the change descriptions may not always be as useful as they should. You can consult the source of this file in its current version here. Prepare the introduction of a RSS feed c62a61d Add missing revisions tables ca7a8d6 Add a new post about Clight and its semantics 7073116 Installing CompCert CompCert has been added to opam , and as a consequence can be very easily used as a library for other Coq developments. A typical use case is for a project to produce Clight (the high-level AST of CompCert), and to benefit from CompCert proofs after that. Installing CompCert is as easy as opam install compcert Once opam terminates, the compcert namespace becomes available. In addition, several binaries are now available if you have correctly set your PATH environment variable. For instance, clightgen takes a C file as an argument, and generates a Coq file which contains the Clight generated by CompCert. Problem Statement Our goal for this first write-up is to prove that the C function int add (int x, int y) { return x + y; } returns the expected result, that is x + y . The clightgen tool generates (among other things) the following AST (note: I have modified it in order to improve its readability). CompCert is a certified C compiler which comes with a proof of semantics preservation. What this means is the following: the semantics of the C code you write is preserved by CompCert compilation passes up to the generated machine code.I had been interested in CompCert for quite some times, and ultimately challenged myself to study Clight and its semantics. This write-up is the result of this challenge, written as I was progressing.CompCert has been added to, and as a consequence can be very easily used as a library for other Coq developments. A typical use case is for a project to produce Clight (the high-level AST of CompCert), and to benefit from CompCert proofs after that.Installing CompCert is as easy asOnceterminates, thenamespace becomes available. In addition, several binaries are now available if you have correctly set yourenvironment variable. For instance,takes a C file as an argument, and generates a Coq file which contains the Clight generated by CompCert.Our goal for this first write-up is to prove that the C functionreturns the expected result, that is. Thetool generates (among other things) the following AST (note: I have modified it in order to improve its readability).



From compcert Require Import Clight Ctypes Clightdefs AST

Coqlib Cop .



Definition _x : ident := 1% positive .

Definition _y : ident := 2% positive .



Definition f_add : function :=

{| fn_return := tint

; fn_callconv := cc_default

; fn_params := tint ) ; ( _y , tint ) ]

; fn_vars :=

; fn_temps :=

; fn_body := Sreturn

( Some ( Ebinop Oadd

( Etempvar _x tint )

( Etempvar _y tint )

tint ))

|}.



:= 1%:= 2%:={|:=:=:= [ ( _x , := [] := [] :=))|}.

function type are pretty self-explanatory (as it is often the case in CompCert’s ASTs as far as I can tell for now). Identifiers in Clight are ( positive ) indices. The fn_body field is of type statement , with the particular constructor Sreturn whose argument is of type option expr , and statement and expr look like the two main types to study. The predicates step1 and step2 allow for reasoning about the execution of a function , step by step (hence the name). It appears that clightgen generates Clight terms using the function call convention encoded by step2 . To reason about a complete execution, it appears that we can use star (from the Smallstep module) which is basically a trace of step . These semantics are defined as predicates (that is, they live in Prop ). They allow for reasoning about state-transformation, where a state is either A function call, with a given list of arguments and a continuation

A function return, with a result and a continuation

A statement execution within a function We import several CompCert modules to manipulate values (in our case, bounded integers). The fields of thetype are pretty self-explanatory (as it is often the case in CompCert’s ASTs as far as I can tell for now).Identifiers in Clight are () indices. Thefield is of type, with the particular constructorwhose argument is of type, andandlook like the two main types to study. The predicatesandallow for reasoning about the execution of a, step by step (hence the name). It appears thatgenerates Clight terms using the function call convention encoded by. To reason about a complete execution, it appears that we can use(from themodule) which is basically a trace of. These semantics are defined as predicates (that is, they live in). They allow for reasoning about state-transformation, where a state is eitherWe import several CompCert modules to manipulate(in our case, bounded integers).



From compcert Require Import Values Integers .

Import Int .





Putting everything together, the lemma we want to prove about f_add is the following.



Lemma f_add_spec ( env : genv )

( t : Events.trace )

( m m' : Memory.Mem.mem )

( v : val ) ( y z : int )

( trace : Smallstep.star step2 env

( Callstate ( Ctypes.Internal f_add )

[ Vint x ; Vint y ]

Kstop

m )

t

( Returnstate ( Vint z ) Kstop

: z = add x y .



) ( x m' ))

Proof Walkthrough We introduce a custom inversion tactic which does some clean-up in addition to just perform the inversion. We introduce a customtactic which does some clean-up in addition to just perform the inversion.



Ltac smart_inv H :=

inversion H ; subst ; cbn in *; clear H .





We can now try to prove our lemma.



Proof .





We first destruct trace , and we rename the generated hypothesis in order to improve the readability of these notes.



smart_inv trace .

rename H into Hstep .

rename H0 into Hstar .





Hstep : step1 env (Callstate (Ctypes.Internal f_add) [Vint x; Vint y] Kstop m) t1 s2 Hstar : Smallstep.star step2 env s2 t2 (Returnstate (Vint z) Kstop m') In other words, to “go” from a Callstate of f_add to a Returnstate , there is a first step from a Callstate to a state s2 , then a succession of steps to go from s2 to a Returnstate . We consider the single step , in order to determine the actual value of s2 (among other things). To do that, we use smart_inv on Hstep , and again perform some renaming. This generates two hypotheses.In other words, to “go” from aofto a, there is a first step from ato a state, then a succession of steps to go fromto aWe consider the single, in order to determine the actual value of(among other things). To do that, we useon, and again perform some renaming.



smart_inv Hstep .

rename le into tmp_env .

rename e into c_env .

rename H5 into f_entry .





f_entry : function_entry1 env f_add [Vint x; Vint y] m c_env tmp_env m1 Then, the Hstar hypothesis has been updated, because we now have a more precise value of s2 . More precisely, s2 has become State f_add (Sreturn (Some (Ebinop Oadd (Etempvar _x tint) (Etempvar _y tint) tint))) Kstop c_env tmp_env m1 Using the same approach as before, we can uncover the next step. This produces two effects. First, a new hypothesis is added to the context.Then, thehypothesis has been updated, because we now have a more precise value of. More precisely,has becomeUsing the same approach as before, we can uncover the next step.



smart_inv Hstar .

rename H into Hstep .

rename H0 into Hstar .





Hstep : step2 env (State f_add (Sreturn (Some (Ebinop Oadd (Etempvar _x tint) (Etempvar _y tint) tint))) Kstop c_env tmp_env m1) t1 s2 Hstar : Smallstep.star step2 env s2 t0 (Returnstate (Vint z) Kstop m') An inversion of Hstep can be used to learn more about its resulting state… So let’s do just that. The resulting hypotheses areAn inversion ofcan be used to learn more about its resulting state… So let’s do just that.



smart_inv Hstep .

rename H7 into ev .

rename v0 into res .

rename H8 into res_equ .

rename H9 into mem_equ .





res : val ev : eval_expr env c_env tmp_env m1 (Ebinop Oadd (Etempvar _x tint) (Etempvar _y tint) tint) res res_equ : sem_cast res tint tint m1 = Some v' mem_equ : Memory.Mem.free_list m1 (blocks_of_env env c_env) = Some m'0 Our understanding of these hypotheses is the following The expression _x + _y is evaluated using the c_env environment (and we know thanks to binding the value of _x and _y ), and its result is stored in res

is evaluated using the environment (and we know thanks to the value of and ), and its result is stored in res is cast into a tint value, and acts as the result of f_add The Hstar hypothesis is now interesting Hstar : Smallstep.star step2 env (Returnstate v' Kstop m'0) t0 (Returnstate (Vint z) Kstop m') It is clear that we are at the end of the execution of f_add (even if Coq generates two subgoals, the second one is not relevant and easy to discard). The generated hypotheses have becomeOur understanding of these hypotheses is the followingThehypothesis is now interestingIt is clear that we are at the end of the execution of(even if Coq generates two subgoals, the second one is not relevant and easy to discard).



smart_inv Hstar ; [| smart_inv H ].





We are making good progress here, and we can focus our attention on the ev hypothesis, which concerns the evaluation of the _x + _y expression. We can simplify it a bit further.



smart_inv ev ; [| smart_inv H ].

rename H4 into fetch_x .

rename H5 into fetch_y .

rename H6 into add_op .





fetch_x and fetch_y are the most important. fetch_x : eval_expr env c_env tmp_env m1 (Etempvar _x tint) v1 fetch_y : eval_expr env c_env tmp_env m1 (Etempvar _y tint) v2 The current challenge we face is to prove that we know their value. At this point, we can have a look at f_entry . This is starting to look familiar: smart_inv , then renaming, etc. In a short-term, the hypothesesandare the most important.The current challenge we face is to prove that we know their value. At this point, we can have a look at. This is starting to look familiar:, then renaming, etc.



smart_inv f_entry .

clear H .

clear H0 .

clear H1 .

smart_inv H3 ; subst .

rename H2 into allocs .





We are almost done. Let’s simplify as much as possible fetch_x and fetch_y . Each time, the smart_inv tactic generates two suboals, but only the first one is relevant. The second one is not, and can be discarded.



smart_inv fetch_x ; [| inversion H ].

smart_inv H2 .

smart_inv fetch_y ; [| inversion H ].

smart_inv H2 .





add . The two relevant hypotheses that we need to consider next are add_op and res_equ . They are easy to read. add_op : sem_binarith (fun (_ : signedness) (n1 n2 : Integers.int) => Some (Vint (add n1 n2))) (fun (_ : signedness) (n1 n2 : int64) => Some (Vlong (Int64.add n1 n2))) (fun n1 n2 : Floats.float => Some (Vfloat (Floats.Float.add n1 n2))) (fun n1 n2 : Floats.float32 => Some (Vsingle (Floats.Float32.add n1 n2))) v1 tint v2 tint m1 = Some res add_op is the addition of Vint x and Vint y , and its result is res . res_equ : sem_cast res tint tint m1 = Some (Vint z) res_equ is the equation which says that the result of f_add is res , after it has been cast as a tint value Surprisingly, neither add_op nor res_equ can be reduced easily… After some digging, I have found that this is because both rely on Archi.ptr64 , which is basically opaque as far as reduction strategies are concerned. Now, there are probably more elegant (idiomatic) way to deal with this issue, but a working approach is to unfold until Archi.ptr64 appears, then use the Archi.splitlong_ptr32 lemma to replace it with its value ( false ). We now know the values of the operands of. The two relevant hypotheses that we need to consider next areand. They are easy to read.Surprisingly, neithernorcan be reduced easily… After some digging, I have found that this is because both rely on, which is basically opaque as far as reduction strategies are concerned.Now, there are probably more elegant (idiomatic) way to deal with this issue, but a working approach is to unfold untilappears, then use thelemma to replace it with its value ().



unfold sem_binarith in *.

cbn in *.

unfold sem_cast in *.

cbn in *.

rewrite Archi.splitlong_ptr32 in *; auto .





We can now simplify add_op and res_equ , and this allows us to conclude.



smart_inv add_op .

smart_inv res_equ .

reflexivity .

Qed .



