Here's what I see (after inserting a print before the last performGC, to help tag when things happen):

    524288 524296 32381000 0.00 0.00 1.15 1.95 0 0 (Gen: 0)
    524288 524296 31856824 0.00 0.00 1.16 1.96 0 0 (Gen: 0)
    368248 808 1032992 0.00 0.02 1.16 1.99 0 0 (Gen: 1)
    0 808 1032992 0.00 0.00 1.16 1.99 0 0 (Gen: 1)
    "performed!"
    39464 2200 1058952 0.00 0.00 1.16 1.99 0 0 (Gen: 1)
    22264 1560 1075992 0.00 0.00 1.16 2.00 0 0 (Gen: 0)
    0 0.00 0.00

So after GCs there is still 1M on the heap (without -G1). With -G1 I see:

    34340656 20520040 20524800 0.10 0.12 0.76 0.85 0 0 (Gen: 0)
    41697072 24917800 24922560 0.12 0.14 0.91 1.01 0 0 (Gen: 0)
    70790776 800 2081568 0.00 0.02 1.04 1.20 0 0 (Gen: 0)
    0 800 2081568 0.00 0.00 1.04 1.20 0 0 (Gen: 0)
    "performed!"
    39464 2184 1058952 0.00 0.00 1.05 1.21 0 0 (Gen: 0)
    22264 2856 43784 0.00 0.00 1.05 1.21 0 0 (Gen: 0)
    0 0.00 0.00

So about 2M. This is on x86_64/Linux.
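In these `-S` rows the third column is the live-bytes figure, which is the number being compared above. As a rough sketch (the `liveBytes` helper is a hypothetical name introduced here, not part of GHC), the column can be pulled out of a row like this:

```haskell
module Main where

import Text.Read (readMaybe)

-- | Hypothetical helper: extract the live-bytes figure (third column)
-- from one row of `+RTS -S` output. Rows that do not start with three
-- integer columns (the program's own "performed!" line, or the final
-- summary row) yield Nothing.
liveBytes :: String -> Maybe Integer
liveBytes row = case words row of
  (alloc : copied : live : _)
    | Just _ <- readInt alloc
    , Just _ <- readInt copied -> readInt live
  _ -> Nothing
  where
    readInt :: String -> Maybe Integer
    readInt = readMaybe

main :: IO ()
main = do
  -- Sample rows copied from the output above.
  let rows =
        [ "368248 808 1032992 0.00 0.02 1.16 1.99 0 0 (Gen: 1)"
        , "\"performed!\""
        , "0 0.00 0.00"
        ]
  mapM_ (print . liveBytes) rows
```

For the first sample row this gives 1032992 bytes, i.e. the roughly 1M that remains live after the major collection.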

Let's think about the STG machine storage model to see if there's something else on the heap.

Things that could be in that 1M of space:

CAFs for things like [], string constants, and the small Int and Char pool, plus things in libraries, the stdin MVar?

Thread State Objects (TSOs) for the main thread.

Any allocated signal handlers.

The IO manager Haskell code.

Sparks in the spark pool.

From experience, this figure of slightly less than 1M seems to be the default "footprint" of a GHC binary. That's about what I've seen in other programs as well (e.g. the smallest footprints among the shootout programs are never less than 900K).
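One way to check this baseline from inside a program, rather than reading it off `-S` output, is to ask the runtime directly. The sketch below assumes a GHC recent enough to ship `GHC.Stats.getRTSStats` (base 4.10 or later, so newer than the toolchain used for the numbers above) and a binary built with `-rtsopts` and run with `+RTS -T`; the `toKiB` helper is a hypothetical name used only for display.

```haskell
module Main where

import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- | Hypothetical display helper: bytes to whole KiB, rounding down.
toKiB :: Word64 -> Word64
toKiB n = n `div` 1024

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; rerun with +RTS -T"
    else do
      -- Force a major collection so the figure reflects only reachable data.
      performGC
      stats <- getRTSStats
      let live = gcdetails_live_bytes (gc stats)
      putStrLn ("live after major GC: " ++ show (toKiB live) ++ " KiB")
```

Run on an otherwise empty `main`, this should report whatever the runtime itself keeps alive on that GHC version, which may well differ from the figures quoted here.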

Perhaps the profiler can say something. Here's the -hT profile (no profiling libs needed), after I insert a minimal busy loop at the end to string out the tail:

$ ./A +RTS -K10M -S -hT -i0.001

Results in this graph:

Victory! Look at that ~1M thread stack object sitting there!

I don't know of a way to make TSOs smaller.

The code that produced the above graph: