We are developing a program which receives and forwards "messages", while keeping a temporary history of those messages, so that it can tell you the message history if requested. Messages are identified numerically, are typically around 1 kilobyte in size, and we need to keep hundreds of thousands of these messages.

We wish to optimize this program for latency: the time between sending and receiving a message must be below 10 milliseconds.

The program is written in Haskell and compiled with GHC. However, we have found that garbage collection pauses are far too long for our latency requirements: over 100 milliseconds in our real-world program.

The following program is a simplified version of our application. It uses a Data.Map.Strict to store messages. Messages are ByteString s identified by an Int . 1,000,000 messages are inserted in increasing numeric order, and the oldest messages are continually removed to keep the history at a maximum of 200,000 messages.

module Main (main) where import qualified Control.Exception as Exception import qualified Control.Monad as Monad import qualified Data.ByteString as ByteString import qualified Data.Map.Strict as Map data Msg = Msg !Int !ByteString.ByteString type Chan = Map.Map Int ByteString.ByteString message :: Int -> Msg message n = Msg n (ByteString.replicate 1024 (fromIntegral n)) pushMsg :: Chan -> Msg -> IO Chan pushMsg chan (Msg msgId msgContent) = Exception.evaluate $ let inserted = Map.insert msgId msgContent chan in if 200000 < Map.size inserted then Map.deleteMin inserted else inserted main :: IO () main = Monad.foldM_ pushMsg Map.empty (map message [1..1000000])

We compiled and ran this program using:

$ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.10.3 $ ghc -O2 -optc-O3 Main.hs $ ./Main +RTS -s 3,116,460,096 bytes allocated in the heap 385,101,600 bytes copied during GC 235,234,800 bytes maximum residency (14 sample(s)) 124,137,808 bytes maximum slop 600 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 6558 colls, 0 par 0.238s 0.280s 0.0000s 0.0012s Gen 1 14 colls, 0 par 0.179s 0.250s 0.0179s 0.0515s INIT time 0.000s ( 0.000s elapsed) MUT time 0.652s ( 0.745s elapsed) GC time 0.417s ( 0.530s elapsed) EXIT time 0.010s ( 0.052s elapsed) Total time 1.079s ( 1.326s elapsed) %GC time 38.6% (40.0% elapsed) Alloc rate 4,780,213,353 bytes per MUT second Productivity 61.4% of total user, 49.9% of total elapsed

The important metric here is the "max pause" of 0.0515s, or 51 milliseconds. We wish to reduce this by at least an order of magnitude.

Experimentation shows that the length of a GC pause is determined by the number of messages in the history. The relationship is roughly linear, or perhaps super-linear. The following table shows this relationship. (You can see our benchmarking tests here, and some charts here.)

msgs history length max GC pause (ms) =================== ================= 12500 3 25000 6 50000 13 100000 30 200000 56 400000 104 800000 199 1600000 487 3200000 1957 6400000 5378

We have experimented with several other variables to find whether they can reduce this latency, none of which make a big difference. Among these unimportant variables are: optimization ( -O , -O2 ); RTS GC options ( -G , -H , -A , -c ), number of cores ( -N ), different data structures ( Data.Sequence ), the size of messages, and the amount of generated short-lived garbage. The overwhelming determining factor is the number of messages in the history.

Our working theory is that the pauses are linear in the number of messages because each GC cycle has to walk over all the working accessible memory and copy it, which are clearly linear operations.

Questions: