Tue, 24 Jan 2012

Benchmarking and QuickChecking readInt.

I'm currently working on converting my http-proxy library from using the Data.Enumerator package to Data.Conduit (explanation of why in my last blog post).

During this conversion, I have been studying the sources of the Warp web server because my http-proxy was originally derived from the Enumerator version of Warp. While digging through the Warp code I found the following code (and comment) which is used to parse the number provided in the Content-Length field of a HTTP header:

-- Note: This function produces garbage on invalid input. But serving an -- invalid content-length is a bad idea, mkay? readInt :: S.ByteString -> Integer readInt = S.foldl' (\x w -> x * 10 + fromIntegral w - 48) 0

The comment clearly states that that this function can produce garbage, specifically if the string contains anything other than ASCII digits. The comment is also correct that an invalid Content-Length is a bad idea. However, on seeing the above code, and remembering something I had seen recently in the standard library, I naively sent the Yesod project a patch replacing the above code with a version that uses the readDec function from the Numeric module:

import Data.ByteString (ByteString) import qualified Data.ByteString.Char8 as B import qualified Numeric as N readInt :: ByteString -> Integer readInt s = case N.readDec (B.unpack s) of [] -> 0 (x, _):_ -> x

About 3-4 hours after I submitted the patch I got an email from Michael Snoyman saying that parsing the Content-Length field is a hot spot for the performance of Warp and that I should benchmark it against the code I'm replacing to make sure there is no unacceptable performance penalty.

That's when I decided it was time to check out Bryan O'Sullivan's Criterion bench-marking library. A quick read of the docs and bit of messing around and I was able to prove to myself that using readDec was indeed much slower than the code I wanted to replace.

The initial disappointment of finding that a more correct implementation was significantly slower than the less correct version quickly turned to joy as I experimented with a couple of other implementations and eventually settled on this:

import Data.ByteString (ByteString) import qualified Data.ByteString.Char8 as B import qualified Data.Char as C readIntTC :: Integral a => ByteString -> a readIntTC bs = fromIntegral $ B.foldl' (\i c -> i * 10 + C.digitToInt c) 0 $ B.takeWhile C.isDigit bs

By using the Integral type class, this function converts the given ByteString to any integer type (ie any type belonging to the Integral type class). When used, this function will be specialized by the Haskell compiler at the call site to to produce code to read string values into Int s, Int64 s or anything else that is a member of the Integral type class.

For a final sanity check I decided to use QuickCheck to make sure that the various versions of the generic function were correct for values of the type they returned. To do that I wrote a very simple QuickCheck property as follows:

prop_read_show_idempotent :: Integral a => (ByteString -> a) -> a -> Bool prop_read_show_idempotent freader x = let posx = abs x in posx == freader (B.pack $ show posx)

This QuickCheck property takes the function under test freader and QuickCheck will then provide it values of the correct type. Since the function under test is designed to read Content-Length values which are always positive, we only test using the absolute value of the value randomly generated by QuickCheck.

The complete test program can be found on Github in this Gist and can be compiled and run as:

ghc -Wall -O3 --make readInt.hs -o readInt && ./readInt

When run, the output of the program looks like this:

Quickcheck tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. +++ OK, passed 100 tests. Criterion tests. warming up estimating clock resolution... mean is 3.109095 us (320001 iterations) found 27331 outliers among 319999 samples (8.5%) 4477 (1.4%) low severe 22854 (7.1%) high severe estimating cost of a clock call... mean is 719.4627 ns (22 iterations) benchmarking readIntOrig mean: 4.653041 us, lb 4.645949 us, ub 4.663823 us, ci 0.950 std dev: 43.94805 ns, lb 31.52653 ns, ub 73.82125 ns, ci 0.950 benchmarking readDec mean: 13.12692 us, lb 13.10881 us, ub 13.14411 us, ci 0.950 std dev: 90.63362 ns, lb 77.52619 ns, ub 112.4304 ns, ci 0.950 benchmarking readRaw mean: 591.8697 ns, lb 590.9466 ns, ub 594.1634 ns, ci 0.950 std dev: 6.995869 ns, lb 3.557109 ns, ub 14.54708 ns, ci 0.950 benchmarking readInt mean: 388.3835 ns, lb 387.9500 ns, ub 388.8342 ns, ci 0.950 std dev: 2.261711 ns, lb 2.003214 ns, ub 2.585137 ns, ci 0.950 benchmarking readInt64 mean: 389.4380 ns, lb 388.9864 ns, ub 389.9312 ns, ci 0.950 std dev: 2.399116 ns, lb 2.090363 ns, ub 2.865227 ns, ci 0.950 benchmarking readInteger mean: 389.3450 ns, lb 388.8463 ns, ub 389.8626 ns, ci 0.950 std dev: 2.599062 ns, lb 2.302428 ns, ub 2.963600 ns, ci 0.950

At the top of the output is proof that all three specializations of the generic function readIntTC satisfy the QuickCheck property. From the Criterion output its pretty obvious that the Numeric.readDec version is about 3 times slower that the original function. More importantly, all three version of this generic function are an order of magnitude faster than the original.

That's a win! I will be submitting my new function for inclusion in Warp.

Update : 14:13

At around the same time I submitted my latest version for readInt Vincent Hanquez posted a comment on the Github issue suggesting I look at the GHC MagicHash extension and pointed me to an example.

Sure enough, using the MagicHash technique resulted in something significantly faster again.

Update #2 : 2012-01-29 19:46

In version 0.3.0 and later of the bytestring-lexing package there is a function readDecimal that is even faster than the MagiHash version.

Posted at: 11:52 | Category: CodeHacking/Haskell | Permalink