I wanted to prove and deploy smart contracts that never break. So I implemented the Ethereum Virtual Machine (EVM) for interactive theorem provers and tested it against the standard VM Tests (my previous post). Now I can move forward building bytecode pieces and composing them in an interactive theorem prover (my roadmap).

Somebody asked me about the coverage of the VM Tests. Yes, I know how to measure code coverage in OCaml, so I measured it. An executable specification is useful for finding corner cases that the tests never reach.

Many of the uncovered cases come from unused functions that I left in the model; others are cases that the tests in the VMTests directory could cover but do not. Some of the code is “for-proofs-only for now” (though I should separate it).

Some of these corner cases are covered by other test suites in the same repository, for instance State Tests, which also validate EVM implementations. Maybe I should extend the Lem model to the whole EVM (GitHub issue) so that it can run more tests. Currently, it just models one invocation of one contract.

Here are my findings. Below, “the VM tests” means “the 40,669 VM tests that do not run multiple contracts.” This excludes 24 tests, whose coverage I could measure only up to the first call.

The VM tests did not try some instructions with too few elements on the stack (GitHub issues 132, 133, 135, 138).
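To make the kind of missed branch concrete, here is a minimal sketch in Python (not the Lem model itself). The table `STACK_ARGS` and the function `check_stack` are hypothetical names for illustration, and only a handful of instructions are listed.

```python
# Hypothetical arity table: how many stack elements each instruction pops.
# Only a few instructions are listed; a real model covers all of them.
STACK_ARGS = {"POP": 1, "ADD": 2, "MSTORE": 2, "SSTORE": 2, "JUMPI": 2, "CALL": 7}


def check_stack(instruction, stack):
    """Return "stack_underflow" if the stack is too short for the instruction, else "ok"."""
    needed = STACK_ARGS[instruction]
    if len(stack) < needed:
        # A coverage report marks this branch as unvisited for an instruction
        # when no test runs that instruction on a stack that is too short.
        return "stack_underflow"
    return "ok"
```

For example, `check_stack("ADD", [1])` takes the underflow branch, while `check_stack("ADD", [1, 2])` does not. A test suite that only ever runs the second kind of input leaves the first branch uncovered.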

No VM test tries CALL with insufficient funds in the calling contract (GitHub issue).
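The untested case can be sketched roughly as follows. This is a simplified illustration in Python, not the Lem model: the function `call_flag` and its parameters are hypothetical, and it ignores gas and the call depth limit.

```python
def call_flag(caller_balance, value, callee_succeeds):
    """Return the flag that CALL leaves on the stack (1 = success, 0 = failure).

    Simplified: gas accounting and the call depth limit are ignored.
    """
    if value > caller_balance:
        # Insufficient funds: the callee is not run and the call reports failure.
        # This is the branch that no VM test reaches.
        return 0
    return 1 if callee_succeeds else 0
```

A test that hits the missing branch would send more value than the calling contract holds, for example `call_flag(caller_balance=5, value=10, callee_succeeds=True)`.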

EXTCODE, BALANCE, and SLOAD are never tried at block numbers after the Spurious Dragon changes, so only the older gas calculation is tested (GitHub issue).

No VM test uses DELEGATECALL (GitHub issue), although there are many State Tests that use DELEGATECALL. I guess the State Test suite is the favorite nowadays.

The coverage suggests that no VM test uses CALLCODE successfully, but the test case `callcodeToReturn1` seems to do so (GitHub issue).

There are some unused, leftover functions in the Lem model (GitHub issue).

The opcodes for STOP, RETURN, DELEGATECALL, and SUICIDE instructions were never looked up. No VM tests performed CODECOPY on these instructions (GitHub issue).
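One way to read this finding, assuming a model that stores the program as decoded instructions and turns them back into bytes only when CODECOPY needs them, is the Python sketch below. That representation is my assumption for illustration, and `OPCODE` and `code_copy` are hypothetical names; the opcode bytes themselves are the standard ones.

```python
# Instruction-to-opcode lookup for the instructions mentioned above, plus CODECOPY.
OPCODE = {
    "STOP": 0x00,
    "CODECOPY": 0x39,
    "RETURN": 0xF3,
    "DELEGATECALL": 0xF4,
    "SUICIDE": 0xFF,
}


def code_copy(program, code_offset, size):
    """Return the `size` bytes that CODECOPY would copy, starting at `code_offset`.

    `program` is a list of instruction names (an assumed representation).
    Bytes past the end of the code are read as zero.
    """
    code = bytes(OPCODE[inst] for inst in program)  # each instruction is looked up here
    chunk = code[code_offset:code_offset + size]
    return chunk + bytes(size - len(chunk))  # zero-pad past the end of the code
```

Under this reading, the lookup for an instruction runs only when a copied program contains it. A test such as `code_copy(["CODECOPY", "STOP", "RETURN"], 1, 2)` would exercise the entries for STOP and RETURN.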

The coverage measurement should be working on the master branch. I will do something similar before we move to any new virtual machine on the main net.

I thank Sidney Amani for asking me about the coverage of the VM test suite.