Functional Programming Unit Testing - Part 4 Sunday, December 21, 2008

In our previous installment, we talked about bringing traditional xUnit tests and QuickCheck property-based tests together in a single cohesive step. For this installment, let's talk about test coverage.

But before we continue, you may want to catch up on the previous installments in this series.

Code Coverage

Code coverage is an important metric used as part of our design process to describe the degree to which our source code has been tested. Code coverage tools inspect the code directly, making them a form of white-box testing. I believe a high code coverage percentage is important, although the hard-line stance of requiring 100% path coverage is usually unnecessary and counterproductive. That said, for some applications, such as safety-critical systems, some form of 100% coverage should be considered.

What do we consider as part of the criteria when we're calculating code coverage?

Function coverage

Has every function in the program been called?

Statement coverage

Has every line in the program been executed?

Branch coverage

Has every control structure, such as if/then/else, evaluated to both true and false?

Condition coverage

Has every boolean sub-expression evaluated to both true and false?

Path coverage

Has every possible route through the program been taken?

Entry/exit coverage

Has every possible call and return of each function been executed?


Of course, some of these criteria are related:

Branch (decision) coverage implies statement coverage, since exercising every branch means executing every statement.

Path coverage implies branch coverage.
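
To make the distinction between branch coverage and condition coverage concrete, here is a small illustrative Haskell function (a hypothetical example; the names are mine, not from this series):

```haskell
-- Hypothetical example showing why condition coverage is stricter than
-- branch coverage. The inputs (1,1) and (0,0) take both branches of the
-- 'if', so branch coverage is complete. But (&&) short-circuits: with
-- x = 0 the sub-expression 'y > 0' is never evaluated, so condition
-- coverage also needs a case like (1,0).
bothPositive :: Int -> Int -> String
bothPositive x y =
  if x > 0 && y > 0
    then "both positive"
    else "not both positive"

main :: IO ()
main = mapM_ (putStrLn . uncurry bothPositive) [(1, 1), (0, 0), (1, 0)]
```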

Where should we focus? Using statement, decision, and/or condition coverage, around 80-90% coverage generally suffices. Getting to 100% test coverage is often unrealistic, doesn't by itself ensure quality, and the energy required to reach it is usually better spent elsewhere. The number we're looking for is somewhere above 80%.

We can use the above metrics to determine how well we're testing our applications. For many algorithms, it's important to ensure that we have our edge cases covered, especially in safety-critical systems. Let's walk through an example of code coverage in Haskell.

Code Coverage with Haskell Program Coverage (HPC)

The Haskell Program Coverage (HPC) toolkit ships with the Haskell compiler and records and displays which parts of the code were executed during a run of your program. Using the criteria given above, it can record which functions, branches, and expressions, among other things, were evaluated.

The HPC tool is designed to give you the following metrics:

Expressions used (function coverage)

Boolean coverage: guards, 'if' conditions, and qualifiers

Alternatives used

Local declarations used

Top-level declarations used

Let's walk through an example of how to use this tool to your advantage. In the previous post, I showed some QuickCheck code that doesn't give 100% code coverage, so that here I could show you how to do better. Let's look at the example again.

First, let's look at the implementation of the ROT13 algorithm again:

--file Encryption.hs
module Encryption (rot13) where

import Data.Char

rot13 :: String -> String
rot13 = map mapRot
  where mapRot :: Char -> Char
        mapRot c | c >= 'A' && c <= 'Z' = rot 'A' c
                 | c >= 'a' && c <= 'z' = rot 'a' c
                 | otherwise            = c
        rot :: Char -> Char -> Char
        rot b c = chr $ (ord c - ord b + 13) `mod` 26 + ord b
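
As a quick sanity check (a sketch, assuming the Encryption module above compiles alongside it), one application of rot13 scrambles the letters and a second application restores them:

```haskell
import Encryption (rot13)

main :: IO ()
main = do
  putStrLn (rot13 "Hello, World!")          -- prints "Uryyb, Jbeyq!"
  putStrLn (rot13 (rot13 "Hello, World!"))  -- round-trips to "Hello, World!"
```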

Now, let's look at the QuickCheck property-based tests we run to check the correctness of our algorithm.

-- file EncryptionTests.hs
import Data.Char
import Data.List
import Encryption
import Test.Framework
import Test.Framework.Providers.QuickCheck
import Test.QuickCheck

instance Arbitrary Char where
  arbitrary = elements (['A' .. 'Z'] ++ ['a' .. 'z'])

-- Equal
prop_rot13_equals s =
  rot13 s == rot13 s

-- A single application is not equal to the original
prop_rot13_single_notEquals s =
  rot13 s /= s

-- A double application is equal to the original
prop_rot13_double_equals s =
  (rot13 . rot13) s == s

-- Distribution shapes should be equal
prop_rot13_group_equals s =
  getDistro s == getDistro (rot13 s)
  where getDistro = sort . map length . group . sort

tests = [
  testGroup "ROT13 Tests" [
    testProperty "prop_rot13_equals" prop_rot13_equals,
    testProperty "prop_rot13_single_notEquals" prop_rot13_single_notEquals,
    testProperty "prop_rot13_double_equals" prop_rot13_double_equals,
    testProperty "prop_rot13_group_equals" prop_rot13_group_equals]
  ]

main = defaultMain tests

In order to capture test coverage data with HPC, we add the -fhpc flag when compiling our tests:

>ghc -fhpc EncryptionTests.hs --make

After instrumenting the code, we run it to capture the results. You may have noticed that compilation created a .hpc folder containing .mix files. When we run the tests, we get the usual output:

>EncryptionTests

ROT13 Tests:

prop_rot13_equals: [OK, passed 100 tests]

prop_rot13_single_notEquals: [OK, passed 100 tests]

prop_rot13_double_equals: [OK, passed 100 tests]

prop_rot13_group_equals: [OK, passed 100 tests]



        Properties  Total
Passed  4           4
Failed  0           0
Total   4           4



You will also note that it created a .tix file which captures the actual code coverage metrics. Let's now analyze the results of our run:

>hpc report encryptiontests

 97% expressions used (95/97)
 33% boolean coverage (1/3)
      33% guards (1/3), 1 always True, 1 unevaluated
     100% 'if' conditions (0/0)
     100% qualifiers (0/0)
 66% alternatives used (2/3)
100% local declarations used (3/3)
100% top-level declarations used (8/8)

Analyzing the results, we realize we've made a mistake. If you look back at our Arbitrary Char instance, we're only generating alphabetic characters. As a result, we never exercise the branch of our rot13 function that handles non-alphabetic input. But when we change this, we have to be mindful that our tests will have to change as well. Why? Because the inequality check will fail when the input contains no letters. Let's make some changes and then check the results again.

instance Arbitrary Char where
  arbitrary = elements (['A' .. 'Z'] ++ ['a' .. 'z'] ++ "!@#$%^&*()")

-- A single application is not equal to the original
prop_rot13_single_notEquals s =
  any isAlpha s ==> rot13 s /= s

Now we recompile the code as we did above and run it once more.

>hpc report encryptiontests

100% expressions used (99/99)
 66% boolean coverage (2/3)
      66% guards (2/3), 1 always True
     100% 'if' conditions (0/0)
     100% qualifiers (0/0)
100% alternatives used (3/3)
100% local declarations used (3/3)
100% top-level declarations used (8/8)

Much better! Now we have 100% of expressions covered in our ROT13 implementation. We can also dig deeper into the analysis through the markup command, which generates web pages containing drill-down information about our coverage. Below is a sample screenshot of the final results from my last run.
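
For reference, generating those pages is one more command against the same .tix file we produced earlier; hpc writes an hpc_index.html summary plus a drill-down page per module:

```
>hpc markup EncryptionTests
```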

This tool gives us the analysis we need to ensure we're writing the right kinds of tests for our specifications and implementations. Now, let's turn our attention to the F# world. What options do we have?

Code Coverage with TestDriven.NET and NCover

Once again, the TestDriven.NET add-in saves us when it comes to code coverage. With its NCover integration, we can perform rich analysis on our code, much as we did above with HPC. Let's take the code from the previous post and look at the relevant parts.

#light

namespace CodeBetter.Samples

module EncryptionTests =
    open System
    open FsCheck
    open FsCheck.Generator
    open Xunit

    open Encryption
    open ListExtensions
    open FsCheckExtensions

    type CharGenerator =
        static member Chars =
            elements (['A' .. 'Z'] @ ['a' .. 'z'])

    overwriteGenerators (typeof<CharGenerator>)

    let prop_rot13_equals s =
        propl (rot13 s = rot13 s)

    [<Fact>]
    let test_prop_rot13_equals () =
        check config prop_rot13_equals

    let prop_rot13_double_equals s =
        propl ((rot13 >> rot13) s = s)

    [<Fact>]
    let test_prop_rot13_double_equals () =
        check config prop_rot13_double_equals

    let prop_rot13_single_notEquals s =
        propl (rot13 s <> s)

    [<Fact>]
    let test_prop_rot13_single_notEquals () =
        check config prop_rot13_single_notEquals

    let prop_rot13_group_equals s =
        let getDistro = ListExtensions.defaultSort >>
                        ListExtensions.group >>
                        List.map List.length >>
                        ListExtensions.defaultSort
        propl (getDistro s = getDistro (rot13 s))

    [<Fact>]
    let test_prop_rot13_group_equals () =
        check config prop_rot13_group_equals

In order to get the code metrics we need, simply right-click on the project and choose Test With => Coverage. This brings up NCover Explorer, where we can browse our results and once again see our mistake.

Now that we realize our mistake of not including non-alphabetic characters, let's make two changes. First, let's remove the custom char generator, because the default should suffice: unlike the Haskell version, FsCheck already ships with an arbitrary char instance. Second, let's ensure the success of the prop_rot13_single_notEquals function by requiring that the input contains at least one letter:

let prop_rot13_single_notEquals s =
    List.exists Char.IsLetter s ==>
        propl (rot13 s <> s)

This precondition guarantees that the input contains at least one letter, so the ROT13 transformation must produce a string different from the original. We can confirm our success by running the Test With => Coverage option once more and checking the results below.

Conclusion

Tools such as NCover and the Haskell Program Coverage tool keep us honest when it comes to tests, and give us a glaring reminder when we fall short. Combined with traditional xUnit and property-based tests and their generated test data, they make for a satisfying experience. We've now covered the creation and combination of traditional xUnit tests with property-based tests, and how to leverage code coverage as a tool for refining them. There is still more to come in this series, including refactoring.