In the process of doing some simple benchmarking, I came across something that surprised me. Take this snippet from Network.Socket.Splice:

    hSplice :: Int -> Handle -> Handle -> IO ()
    hSplice len s t = do
      a <- mallocBytes len :: IO (Ptr Word8)
      finally
        (forever $! do
           bytes <- hGetBufSome s a len
           if bytes > 0
             then hPutBuf t a bytes
             else throwRecv0)
        (free a)

One would expect that hGetBufSome and hPutBuf here would not need to allocate memory, as they write into and read from a pre-allocated buffer. The docs seem to back this intuition up... But alas:

                                 individual     inherited
    COST CENTRE                  %time %alloc   %time %alloc       bytes
    hSplice                        0.5    0.0    38.1   61.1        3792
    hPutBuf                        0.4    1.0    19.8   29.9    12800000
    hPutBuf'                       0.4    0.4    19.4   28.9     4800000
    wantWritableHandle             0.1    0.1    19.0   28.5     1600000
    wantWritableHandle'            0.0    0.0    18.9   28.4           0
    withHandle_'                   0.0    0.1    18.9   28.4     1600000
    withHandle'                    1.0    3.8    18.8   28.3    48800000
    do_operation                   1.1    3.4    17.8   24.5    44000000
    withHandle_'.\                 0.3    1.1    16.7   21.0    14400000
    checkWritableHandle            0.1    0.2    16.4   19.9     3200000
    hPutBuf'.\                     1.1    3.3    16.3   19.7    42400000
    flushWriteBuffer               0.7    1.4    12.1    6.2    17600000
    flushByteWriteBuffer          11.3    4.8    11.3    4.8    61600000
    bufWrite                       1.7    6.9     3.0    9.9    88000000
    copyToRawBuffer                0.1    0.2     1.2    2.8     3200000
    withRawBuffer                  0.3    0.8     1.2    2.6    10400000
    copyToRawBuffer.\              0.9    1.7     0.9    1.7    22400000
    debugIO                        0.1    0.2     0.1    0.2     3200000
    debugIO                        0.1    0.2     0.1    0.2     3200016
    hGetBufSome                    0.0    0.0    17.7   31.2          80
    wantReadableHandle_            0.0    0.0    17.7   31.2          32
    wantReadableHandle'            0.0    0.0    17.7   31.2           0
    withHandle_'                   0.0    0.0    17.7   31.2          32
    withHandle'                    1.6    2.4    17.7   31.2    30400976
    do_operation                   0.4    2.4    16.1   28.8    30400880
    withHandle_'.\                 0.5    1.1    15.8   26.4    14400288
    checkReadableHandle            0.1    0.4    15.3   25.3     4800096
    hGetBufSome.\                  8.7   14.8    15.2   24.9   190153648
    bufReadNBNonEmpty              2.6    4.4     6.1    8.0    56800000
    bufReadNBNonEmpty.buf'         0.0    0.4     0.0    0.4     5600000
    bufReadNBNonEmpty.so_far'      0.2    0.1     0.2    0.1     1600000
    bufReadNBNonEmpty.remaining    0.2    0.1     0.2    0.1     1600000
    copyFromRawBuffer              0.1    0.2     2.9    2.8     3200000
    withRawBuffer                  1.0    0.8     2.8    2.6    10400000
    copyFromRawBuffer.\            1.8    1.7     1.8    1.7    22400000
    bufReadNBNonEmpty.avail        0.2    0.1     0.2    0.1     1600000
    flushCharReadBuffer            0.3    2.1     0.3    2.1    26400528

I have to assume this is on purpose... but I have no idea what that purpose might be. Even worse: I'm just barely clever enough to get this profile, but not quite clever enough to figure out exactly what's being allocated.

Any help along those lines would be appreciated.
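
One rough way to cross-check numbers like these without a profiling build might be to read the calling thread's allocation counter around the calls in question. This is only a minimal sketch, assuming base 4.8 or later (for `System.Mem.getAllocationCounter`); `allocatedBy` and `perIteration` are hypothetical helper names, not part of any library. As far as I understand the docs, the counter counts down as the thread allocates and is only accurate to about 4 KB, so the sketch averages over many iterations rather than timing a single call.

```haskell
import Control.Monad (replicateM_)
import Data.Int (Int64)
import System.Mem (getAllocationCounter)

-- | Hypothetical helper: approximate number of bytes allocated by the
-- calling thread while running the action. The counter counts down,
-- so the allocation is (before - after).
allocatedBy :: IO a -> IO Int64
allocatedBy act = do
  before <- getAllocationCounter
  _ <- act
  after <- getAllocationCounter
  return (before - after)

-- | Hypothetical helper: run the action n times and report the average
-- allocation per iteration. The figure includes whatever the loop
-- itself allocates, so it is an upper bound on the action's own cost.
perIteration :: Int -> IO a -> IO Int64
perIteration n act = do
  total <- allocatedBy (replicateM_ n act)
  return (total `div` fromIntegral (max 1 n))
```

With the buffer and handles from hSplice in scope, something like `perIteration 100000 (hPutBuf t a len)` should give a per-call figure to compare against the profile. It won't say what is being allocated, only roughly how much.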

UPDATE: I've done some more profiling with two drastically simplified test cases. The first test case directly uses the read/write ops from System.Posix.Internals:

    echo :: Ptr Word8 -> IO ()
    echo buf = forever $ do
      threadWaitRead $ Fd 0
      len <- c_read 0 buf 1
      c_write 1 buf (fromIntegral len)
      yield

As you'd hope, this allocates no memory on the heap each time through the loop. The second test case uses the read/write ops from GHC.IO.FD:

    echo :: Ptr Word8 -> IO ()
    echo buf = forever $ do
      len <- readRawBufferPtr "read" stdin buf 0 1
      writeRawBufferPtr "write" stdout buf 0 (fromIntegral len)
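
For anyone who wants to reproduce this, here is a self-contained sketch of the GHC.IO.FD variant with its imports. It differs from the loop above in that it is bounded and stops at end of input, so the process actually exits and the RTS can print its statistics. `echoN` and `echoStdio` are hypothetical names introduced for this sketch, and it assumes `readRawBufferPtr` and `writeRawBufferPtr` have the signatures exported by GHC.IO.FD in the base version I was using.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import GHC.IO.FD (FD, readRawBufferPtr, stdin, stdout, writeRawBufferPtr)

-- | Hypothetical helper: copy single bytes from one FD to another for
-- at most n bytes, stopping early when the read returns 0 (end of
-- input). Returns the number of bytes copied.
echoN :: Int -> FD -> FD -> Ptr Word8 -> IO Int
echoN n from to buf = go 0
  where
    go copied
      | copied >= n = return copied
      | otherwise = do
          len <- readRawBufferPtr "read" from buf 0 1
          if len <= 0
            then return copied
            else do
              _ <- writeRawBufferPtr "write" to buf 0 (fromIntegral len)
              go (copied + len)

-- | Hypothetical wrapper: run the loop on stdin/stdout with a one-byte
-- buffer allocated once, outside the loop.
echoStdio :: Int -> IO Int
echoStdio n = allocaBytes 1 (echoN n stdin stdout)
```

Calling `echoStdio` from `main` with a large bound, building with `-rtsopts`, and running the binary with `+RTS -s` on a fixed-size input should show the total bytes allocated, which can then be divided by the number of bytes copied.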

UPDATE #2: I was advised to file this as a bug in GHC Trac... I'm still not sure it actually is a bug (as opposed to intentional behavior, a known limitation, or whatever) but here it is: https://ghc.haskell.org/trac/ghc/ticket/9696