myset = Set.delete dead $ Set.fromList [alive, dead]

myset

alive

dead

dead

dead

alive

alive

myset

alive

dead

myset

Thunk (Set.delete dead) (Value (Set.fromList [alive,dead]))

Value (Set.fromList [alive])

dead

evaluate myset

evaluate

myset

myset

evaluate

dead

seq

-rtsopts -prof -auto-all -caf-all

+RTS -h

hp2ps -c

.hp

END_SAMPLE

hp2ps

$ shake benchmark breadth=1000 depth=1000 --deterministic -j1 --clean --no-report

--flags=testprog

+RTS -h

+RTS -s

-xt -hy

forkIO

The stacks might be becoming too large. I added -K1K to limit all threads to 1Kb of stack. The small test case ran to completion, without any threads exceeding their stack, so the stacks were not growing too large.

to limit all threads to 1Kb of stack. The small test case ran to completion, without any threads exceeding their stack, so the stacks were not growing too large. There might be too many threads active in the program. I added logging of every call to forkIO, and every time a thread finished. I found I was churning through 1000's of threads, but at most 60 were alive at any one time.

The threads might be keeping their stacks alive after they had finished for too long. Having eliminated the other possibilities, this seemed likely.

ThreadId

ThreadId

ThreadId

ThreadId

ThreadId

ThreadId

data S = S {threads :: Set.HashSet ThreadId, ...}

data S = S {threads :: !(Set.HashSet ThreadId), ...}

One of the downsides of a lazy language is the possibility of space leaks. A space leak is when your program uses more memory than it should, typically because lazy evaluation is holding on to things that if evaluated would disappear. As an example, let's build up a set:Consider the value. We first create a set withand, then delete. Assuming the valuesandare used nowhere else in the program, which values does the garbage collector consider to be alive?In a strict language onlywould be considered alive. But in a lazy language, as we have not yet evaluated, bothandare kept alive. You can think ofbeing represented in memory as, instead of just. You are storing the list of operations that will be applied to the set, rather than the result of those operations. These operations reference, so we cannot garbage collect it. Once we have identified the problem, we can apply strictness operations to force the value, for example:Thefunction forces evaluation of the first part of. If our set data type is strict, that will force evaluation of the entirevalue. Afterthere will be no references to, and it will be garbage collected. In other situations bang patterns, strict fields orcan be used for eliminating space leaks.The space leak in Shake was found over a period of a few weeks, ultimately with me waking up at 4am on New Years Day with knowledge of where an exclamation mark should be placed. However, I will present the account somewhat more methodically...The first step to fixing a space leak is the realisation that a space leak exists. My suspicion is that many (most?) large Haskell programs have space leaks, but they often go unnoticed. My first report of "memory issues" with Shake was a few months before I tracked it down. Shake is a library that makes use of user supplied rules, and in this example, the user supplied rules were far from trivial. Complicating the problem is the fact that Shake stores a list of all rules it has run in memory, which naturally grows over time. What finally convinced me that there was a problem was when several clean builds failed by exhausting memory, taking greater than 2Gb on a 32bit system.After suspecting a space leak, the first thing to do is measure memory usage, which I did using Process Explorer. The graph to the left shows a clear increase in memory. The memory doesn't grow all at once, but in steps - typical of a garbage collected language. The time between memory growth is not evenly spaced, which is reasonable for a program that runs different user rules all of which take different amounts of time. I reran the program several times, and while memory always increased, the shape of the graph varied quite considerably - as expected when running a non-deterministic program such as Shake. While I still didn't have any real information on what caused the memory leak, I could at least observe the memory leak myself.Before thinking too hard, it is worth applying whatever tools are to hand. In the case of space leaks, the GHC manual describes how to use the heap profiling tools. My standard pattern is to compile with, run withand view with. Since the test case was huge (over an hour), I always terminated it early with Ctrl-C, which produces an invalidfile. Fortunately, you can delete everything after the lastto recover a profile thatcan understand.The results on the right show steadily increasing memory allocated to PINNED values, of type ARR_WORDS, which I suspect are bytestrings. However, the total memory usage as reported by GHC and Process Explorer far exceeded that shown in the profile. It seemed like there was something missing from the profile. I upgraded to the latest GHC version but memory still seemed to be missing. The tools were not giving me any insight - perhaps due to the high degree of parallelism or other complexities in the build system.The next step was to produce a simpler example - one I could run to completion in a feasible amount of time. My standard approach is to take the full test case and remove things, checking the problem has not gone away, until nothing more can be removed - leaving a minimal test case. There are two reasons why that was infeasible in this instance: 1) With a program which naturally consumes memory over time, it is not clear if the space leak has disappeared or merely become smaller as things are removed. 2) The first reduction steps would have each taken over an hour.After deciding reduction was infeasible, I decided to try and produce a test case from scratch (which is usually a terrible idea). The original example used many different types of build rule, in many different ways - it was entirely possible that only one particular variant led to the memory leak. I decided to start with a small example, then (if necessary) try adding features until the space leak returned. Since I had been meaning to produce a benchmark for Shake at some point, I figured I could write a benchmark test which would hopefully show the bug, and even if not, be useful elsewhere.For benchmarking, generating a random build tree is not so useful, so I attempted to define a regular shaped but interesting dependency pattern. The pattern I settled on was parameterised by breadth and depth, where every node a depthdepended on every node at depth, to a limit of the given depth, with a given number of nodes at each level given by breadth. To accurately benchmark I used the file rules, as these are by far the most common.I found that with breadth=1000 depth=1000 I was able to exhaust the memory. In order to try and simplify the test case I tried passing flags to the test case to turn off certain features and try and make the problem easier to investigate. I was able to turn off multithreading (-j1), profile reports (--no-report), always build from scratch (--clean) and to build rules in a deterministic order (--deterministic). The final command was:This command works using a released version of Shake if you install the test program (install with).With the knowledge that 1000x1000 exceeded available memory I start reducing the numbers so the benchmark would complete in a reasonable timeframe (< 15s), but use an unreasonable amount of memory (> 500Mb). I found that 100x100 gave a reasonable balance.With a simple example, I once again turned to the heap profiling tools available in GHC. UsingI still saw memory unaccounted for, as before. The two obvious guesses are that GHC knows about the memory, but has decided to not show it in the heap profile, or that GHC does not know about the memory (for example, allocated on a different heap). UsingI saw that GHC was aware of the additional memory, suggesting GHC had at least some knowledge of the memory. With nothing else to try, I ran through the list of heap profiling flags trying each in turn.The magic flags turned out to be, producing the graph on the left. The profile shows STACK takes up the majority of the space, and starts to give memory usage in about the right ballpark. I concluded that the space leak must include stack values.A stack in GHC is created by a thread, and a thread is typically created by. There are three reasons stacks could take too much memory - too large, too many, too long:Looking for places threads were referenced, and thus potentially kept alive, I found a set I was building up containings. The set is used so that if one Shake thread throws an exception all other threads are killed immediately. When a thread started it was added to the set. When a thread finished it was removed. Under most circumstances the set was never evaluated, but simply thrown away when the build finished. This situation correponds to the example at the start of this post, and was causing a space leak ofI guessed that if theis kept alive then the stack is also. As a result, leaking a small amount of memory for a thread identifier was amplified by also leaking the stack for that thread. GHC HQ could certainly modify the runtime system so that thedid not keep the stack alive, but holding on to completeds is rarely a good idea, so I imagine it is not worth it. For more details, including a profiling showing that leakings does leak stacks, see this blog post on the GHC scheduler Having identified the likely culprit, I simply needed to force evaluation of the set at each step. The set was held within a larger data structure:To always force the set, I added an exclamation mark, turning it into a strict record:Since I had worked for days to find somewhere to insert a single character, I also added a comment, just in case I ever thought about removing the exclamation mark.To check the problem had gone away I reran my reduced test case, producing the graph on the right, and showing far less memory usage in Process Explorer. I then reran the full build system and saw a typical reduction in memory of around 1Gb - a most welcome improvement!Shake-0.4 and later contain that important exclamation mark, so an upgrade is strongly advised.Does your Haskell program need an additional exclamation mark? Here are some you can use: !!!!!!!!!!!!!!