Modularity from Lazy Evaluation - Performance Testing

This is the second example of how higher-order functions and lazy evaluation can reduce complexity and lead to more modular software.

To performance test code, it has to be run for a number of iterations so that a stable average can be compared. The number of iterations is usually an input that is chosen arbitrarily.

This post covers how statistical tests can be used to remove the need for this input. This simplifies performance testing and makes the results more robust and useful.

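Lazy evaluation can be used to produce an online sample statistics sequence: a lazy sequence that yields the running sample size, mean, and variance after each new observation.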
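One common single-pass way to maintain such running statistics is the update usually attributed to Welford, which appears to be what the code in this post implements. The symbols \(m_n\) (running mean) and \(S_n\) (running sum of squared deviations) are notation introduced here for illustration, with \(m_0 = 0\) and \(S_0 = 0\):

\[m_n = m_{n-1} + \frac{x_n - m_{n-1}}{n}\]

\[S_n = S_{n-1} + \left(x_n - m_{n-1}\right)\left(x_n - m_n\right)\]

\[s_n^2 = \frac{S_n}{n-1}\]

Each new observation \(x_n\) updates the state in constant time, so the statistics can be produced one element at a time without storing the samples.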

The standard error of a sample mean is given by

\[SE_{\bar{x}} = \frac{s}{\sqrt{n}}\]

The statistics sequence can be iterated until the standard error of the mean falls below a given level of accuracy.
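A minimal sketch of this idea, assuming a deterministic stand-in for real measurements. The names `Stats`, `step`, `runningStats`, and `meanStandardError` are hypothetical and only illustrate the technique; they are not the library code of this post.

```fsharp
// Hypothetical names throughout; a simplified sketch under stated assumptions.
type Stats = { N : int; Mean : float; Variance : float }

/// Fold one observation into (count, mean, sum of squared deviations).
let step (n : int, m : float, s : float) (x : float) =
    let m' = m + (x - m) / float (n + 1)
    n + 1, m', s + (x - m) * (x - m')

/// Lazily map a sample sequence to its running statistics.
let runningStats (samples : seq<float>) : seq<Stats> =
    samples
    |> Seq.scan step (0, 0.0, 0.0)
    |> Seq.skip 3 // drop the seed state and the first two observations
    |> Seq.map (fun (n, m, s) -> { N = n; Mean = m; Variance = s / float (n - 1) })

let meanStandardError (st : Stats) = sqrt (st.Variance / float st.N)

// Assumed input: an infinite sequence alternating between 9.0 and 11.0.
let samples = Seq.initInfinite (fun i -> if i % 2 = 0 then 9.0 else 11.0)

// Only as many samples as are needed to reach 1% relative accuracy are generated.
let result =
    runningStats samples
    |> Seq.find (fun st -> meanStandardError st <= 0.01 * st.Mean)
```

Because the sequence is lazy, `Seq.find` decides how many samples are drawn; no iteration count has to be supplied up front.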

Welch's t-test for comparing the mean of two samples is given by

\[t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{f_1+f_2}}\]

\[df = \frac{\left(f_1+f_2\right)^2}{\frac{f_1^2}{n_1-1}+\frac{f_2^2}{n_2-1}}\]

\[f_i = \frac{s_i^2}{n_i}\]

where \(n\) is the sample size, \(\bar{x}\) is the sample mean, \(s^2\) is the sample variance, \(t\) is Welch's t statistic, and \(df\) is the degrees of freedom of the statistic.

The absolute value of Welch's t statistic can be compared to the critical value from the inverse Student's t-distribution, at \(df\) degrees of freedom and a given confidence level, to test if the sample means are different.
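As a worked illustration with made-up numbers (not measurements from this post), take \(n_1 = n_2 = 10\), \(\bar{x}_1 = 10.0\), \(\bar{x}_2 = 12.0\), \(s_1^2 = 1.0\) and \(s_2^2 = 4.0\):

\[f_1 = \frac{1.0}{10} = 0.1 \qquad f_2 = \frac{4.0}{10} = 0.4\]

\[t = \frac{10.0-12.0}{\sqrt{0.1+0.4}} \approx -2.83\]

\[df = \frac{\left(0.1+0.4\right)^2}{\frac{0.1^2}{9}+\frac{0.4^2}{9}} \approx 13.2\]

Truncating to 13 degrees of freedom, the 0.1% critical value is about 4.221. Since \(|t| \approx 2.83 < 4.221\), the means cannot yet be declared different and more samples would be needed.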

Two sample statistics sequences can be iterated together and compared to decide if one mean is larger than the other.

    type SampleStatistics =
        { N : int; Mean : float; Variance : float }
        member s.StandardDeviation = sqrt s.Variance
        member s.MeanStandardError = sqrt (s.Variance / float s.N)

    type WelchStatistic = { T : float; DF : int }

    module Statistics =

        /// Square of a value.
        let inline private sqr x = x * x

        /// Online statistics sequence for a given sample sequence.
        let inline sampleStatistics s =
            // State is (count, running mean, running sum of squared deviations).
            let calc (n, m, s) x =
                let m' = m + (x - m) / float (n + 1)
                n + 1, m', s + (x - m) * (x - m')
            Seq.map float s
            |> Seq.scan calc (0, 0.0, 0.0)
            |> Seq.skip 3
            |> Seq.map (fun (n, m, s) -> { N = n; Mean = m; Variance = s / float (n - 1) })

        /// Scale the statistics for a given underlying random variable change of scale.
        let scale f s = { s with Mean = s.Mean * f; Variance = s.Variance * sqr f }

        /// Single iteration statistics for a given iteration count and total statistics.
        let singleIteration ic s =
            { N = s.N * ic; Mean = s.Mean / float ic; Variance = s.Variance / float ic }

        /// Student's t-distribution inverse for 0.1% confidence level by
        /// degrees of freedom.
        let private tInv =
            [| 636.6; 31.60; 12.92; 8.610; 6.869; 5.959; 5.408; 5.041; 4.781; 4.587
               4.437; 4.318; 4.221; 4.140; 4.073; 4.015; 3.965; 3.922; 3.883; 3.850
               3.819; 3.792; 3.768; 3.745; 3.725; 3.707; 3.690; 3.674; 3.659; 3.646
               3.633; 3.622; 3.611; 3.601; 3.591; 3.582; 3.574; 3.566; 3.558; 3.551
               3.544; 3.538; 3.532; 3.526; 3.520; 3.515; 3.510; 3.505; 3.500; 3.496
               3.492; 3.488; 3.484; 3.480; 3.476; 3.473; 3.470; 3.466; 3.463; 3.460
               3.457; 3.454; 3.452; 3.449; 3.447; 3.444; 3.442; 3.439; 3.437; 3.435
               3.433; 3.431; 3.429; 3.427; 3.425; 3.423; 3.421; 3.420; 3.418; 3.416
               3.415; 3.413; 3.412; 3.410; 3.409; 3.407; 3.406; 3.405; 3.403; 3.402
               3.401; 3.399; 3.398; 3.397; 3.396; 3.395; 3.394; 3.393; 3.392; 3.390 |]

        /// Welch's t-test statistic for two given sample statistics.
        let welchStatistic s1 s2 =
            let f1 = s1.Variance / float s1.N
            let f2 = s2.Variance / float s2.N
            { T = (s1.Mean - s2.Mean) / sqrt (f1 + f2)
              DF =
                if f1 = 0.0 && f2 = 0.0 then 1
                else sqr (f1 + f2) / (sqr f1 / float (s1.N - 1) + sqr f2 / float (s2.N - 1)) |> int }

        /// Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
        /// Returns 0 if the means cannot be distinguished, otherwise the sign of T.
        let welchTest w =
            if abs w.T < Array.get tInv (min w.DF (Array.length tInv) - 1) then 0
            else sign w.T

Three performance metrics will be created: time, memory allocated, and garbage collections.

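Each measurement repeats the function enough times to reach at least the metric target (1 millisecond of elapsed time, 1 KB of allocation, or 10 garbage collections in the code below). This ensures it is reliably measuring the metric and not any framework overhead.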
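The search for that iteration count can be sketched as follows. This is a simplified illustration: `fakeMetric` is a hypothetical stand-in that pretends every iteration costs 3 units, and the concrete `int64` types are an assumption made to keep the example self-contained.

```fsharp
/// Hypothetical metric: pretends each iteration of the function costs 3 units.
let fakeMetric (iterations : int) (_ : unit -> unit) : int64 =
    int64 iterations * 3L

/// Grow the iteration count tenfold until the measurement is within a factor
/// of 8 of the target, then scale linearly to land just above the target.
let targetIterationCount (metric : int -> (unit -> unit) -> int64) (target : int64) f =
    let rec find n =
        let measured = metric n f
        if (measured <<< 3) >= target then
            int (int64 n * target / measured) + 1
        else
            find (n * 10)
    find 1

// With a cost of 3 units per iteration and a target of 1000 units,
// the search tries 1, 10 and 100 iterations before scaling.
let ic = targetIterationCount fakeMetric 1000L ignore
```

Growing by a factor of ten keeps the search short, and the final linear scaling avoids overshooting the target by much.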

Statistics functions will be created for each of the metrics to measure a function until the standard error of the mean is within 1% of the mean.

Compare functions will be created for each of the metrics to compare two functions to a confidence level of 0.1%. The functions will be considered equal after 10,000 degrees of freedom have passed without a positive test result.
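The two stopping rules can be sketched with deterministic sequences standing in for measurements. All names here are hypothetical. As a simplifying assumption, a single critical value of 3.29 (the large-sample normal approximation for a two-sided 0.1% level) replaces the per-degree-of-freedom table, and the cut-off is lowered to 1,000 degrees of freedom to keep the example quick.

```fsharp
type Stats = { N : int; Mean : float; Variance : float }

let step (n : int, m : float, s : float) (x : float) =
    let m' = m + (x - m) / float (n + 1)
    n + 1, m', s + (x - m) * (x - m')

let runningStats (samples : seq<float>) : seq<Stats> =
    samples
    |> Seq.scan step (0, 0.0, 0.0)
    |> Seq.skip 3
    |> Seq.map (fun (n, m, s) -> { N = n; Mean = m; Variance = s / float (n - 1) })

/// Welch's t statistic and truncated degrees of freedom.
let welch (a : Stats) (b : Stats) =
    let f1 = a.Variance / float a.N
    let f2 = b.Variance / float b.N
    let t = (a.Mean - b.Mean) / sqrt (f1 + f2)
    let df = (f1 + f2) * (f1 + f2) / (f1 * f1 / float (a.N - 1) + f2 * f2 / float (b.N - 1))
    t, int df

/// -1 or 1 as soon as the means are distinguishable, 0 once maxDF is exceeded.
let compareMeans (critical : float) (maxDF : int) (xs : seq<float>) (ys : seq<float>) =
    Seq.map2 welch (runningStats xs) (runningStats ys)
    |> Seq.pick (fun (t, df) ->
        if df > maxDF then Some 0
        elif abs t >= critical then Some (sign t)
        else None)

/// Assumed input: an infinite sequence alternating between two values.
let alternating (first : float) (second : float) =
    Seq.initInfinite (fun i -> if i % 2 = 0 then first else second)

let lower = compareMeans 3.29 1000 (alternating 9.0 11.0) (alternating 10.0 12.0)
let higher = compareMeans 3.29 1000 (alternating 10.0 12.0) (alternating 9.0 11.0)
let undecided = compareMeans 3.29 1000 (alternating 9.0 11.0) (alternating 11.0 9.0)
```

The first two comparisons stop after a few dozen samples because the means differ clearly. The third never produces a positive test, so it runs until the degrees of freedom limit is reached and reports the means as equal.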

    module Performance =

        open System
        open System.Diagnostics
        open Statistics

        /// Find the iteration count to get to at least the metric target.
        let inline private targetIterationCount metric metricTarget f =
            let rec find n =
                let item = metric n f
                if (item <<< 3) >= metricTarget then
                    n * int metricTarget / int item + 1
                else
                    find (n * 10)
            find 1

        /// Create and iterate a statistics sequence for a metric until the given accuracy.
        let inline private measureStatistics (metric, metricTarget) relativeError f =
            let ic = targetIterationCount metric metricTarget f
            Seq.initInfinite (fun _ -> metric ic f)
            |> sampleStatistics
            |> Seq.map (singleIteration ic)
            |> Seq.find (fun s -> s.MeanStandardError <= relativeError * s.Mean)

        /// Create and iterate two statistics sequences until the metric means can be
        /// compared.
        let inline private measureCompare (metric, metricTarget) f1 f2 =
            if f1 id <> f2 id then failwith "function results are not the same"
            let ic = targetIterationCount metric metricTarget f1
            let stats f =
                Seq.initInfinite (fun _ -> metric ic f)
                |> sampleStatistics
                |> Seq.map (singleIteration ic)
            Seq.map2 welchStatistic (stats f1) (stats f2)
            |> Seq.pick (fun w ->
                let maxDF = 10000
                if w.DF > maxDF then Some 0
                else
                    match welchTest w with
                    | 0 -> None
                    | c -> Some c)

        /// Measure the given function for the iteration count using the start and end
        /// metric.
        let inline private measureMetric startMetric endMetric ic f =
            GC.Collect()
            GC.WaitForPendingFinalizers()
            GC.Collect()
            let mutable total = LanguagePrimitives.GenericZero
            // Wraps the subfunction to be measured and accumulates its metric.
            let measurer toMeasure =
                fun args ->
                    let s = startMetric ()
                    let ret = toMeasure args
                    let m = endMetric s
                    total <- total + m
                    ret
            let rec loop i =
                if i > 0 then
                    f measurer |> ignore
                    loop (i - 1)
            loop ic
            total

        /// Measure the time metric for the given function and iteration count.
        let private timeMetric ic f =
            measureMetric Stopwatch.GetTimestamp (fun t -> Stopwatch.GetTimestamp() - t) ic f

        /// Measure the memory metric for the given function and iteration count.
        let private memoryMetric ic f =
            let inline startMetric () =
                if GC.TryStartNoGCRegion(1L <<< 22) |> not then failwith "TryStartNoGCRegion"
                GC.GetTotalMemory false
            let inline endMetric s =
                let t = GC.GetTotalMemory false - s
                GC.EndNoGCRegion()
                t
            measureMetric startMetric endMetric ic f

        /// Measure the garbage collection metric for the given function and iteration
        /// count.
        let private garbageMetric ic f =
            let count () = GC.CollectionCount 0 + GC.CollectionCount 1 + GC.CollectionCount 2
            measureMetric count (fun s -> count () - s) ic f

        /// Measure definitions which are a metric together with a metric target.
        let private oneMillisecond = Stopwatch.Frequency / 1000L
        let private timeMeasure = timeMetric, oneMillisecond
        let private memoryMeasure = memoryMetric, 1024L //1KB
        let private garbageMeasure = garbageMetric, 10

        /// Time statistics for a given function accurate to a mean standard error of 1%.
        let timeStatistics f =
            measureStatistics timeMeasure 0.01 f |> scale (1.0 / float Stopwatch.Frequency)

        /// Memory statistics for a given function accurate to a mean standard error of 1%.
        let memoryStatistics f = measureStatistics memoryMeasure 0.01 f

        /// GC count statistics for a given function accurate to a mean standard error of
        /// 1%.
        let gcCountStatistics f = measureStatistics garbageMeasure 0.01 f

        /// Time comparison for two given functions to a confidence level of 0.1%.
        let timeCompare f1 f2 = measureCompare timeMeasure f1 f2

        /// Memory comparison for two given functions to a confidence level of 0.1%.
        let memoryCompare f1 f2 = measureCompare memoryMeasure f1 f2

        /// GC count comparison for two given functions to a confidence level of 0.1%.
        let gcCountCompare f1 f2 = measureCompare garbageMeasure f1 f2

The performance testing functions have a very simple signature. The statistics functions just take the function to be measured. The compare functions just take the two functions to be compared.

The statistics functions give an overview of a function's performance. These can easily be combined to produce a useful performance report.

The compare functions can be used in unit tests since they are a relative test and hence should be independent of the machine. They are also fast since they stop as soon as the given confidence level is achieved. The compare functions could also be extended to test if a function is a given percentage better than another.
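One way the percentage extension might look is to rescale one set of statistics before applying Welch's statistic. This is only a sketch of the idea, not something the post implements: `betterByT`, the record name `Stats`, and the sample numbers are all hypothetical.

```fsharp
// Hypothetical sketch: names and numbers are illustrative assumptions.
type Stats = { N : int; Mean : float; Variance : float }

/// Rescale statistics as if every observation had been multiplied by f.
let scale (f : float) (s : Stats) =
    { s with Mean = s.Mean * f; Variance = s.Variance * f * f }

/// Welch's t statistic for two sets of sample statistics.
let welchT (a : Stats) (b : Stats) =
    (a.Mean - b.Mean) / sqrt (a.Variance / float a.N + b.Variance / float b.N)

/// t statistic for the claim "the mean of a is at least `fraction` lower than the mean of b".
let betterByT (fraction : float) (a : Stats) (b : Stats) =
    welchT a (scale (1.0 - fraction) b)

// Made-up statistics: a averages 8.0 and b averages 10.0 over 50 samples each.
let a = { N = 50; Mean = 8.0; Variance = 1.0 }
let b = { N = 50; Mean = 10.0; Variance = 1.0 }

// Tests a against 90% of b, that is against a mean of 9.0.
let t = betterByT 0.1 a b
```

A strongly negative `t` (here roughly -5.26) would then be compared against the critical value in the same way as the plain comparison.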

Modularity from higher-order functions and lazy evaluation, together with a little maths, has produced a simple yet powerful performance testing library.

UPDATED (2016-10-21): The functions have been extended to be able to measure a subfunction of the passed-in functions. This enables setup to be done without affecting the measurement. The subfunction can be called multiple times and the metric will be aggregated. An example of its use can be found in the Outliers and MAD post.

UPDATED (2017-01-03): Performance testing functionality has been added to the F# testing library Expecto using the code in this post.
