6 ways to manage allocated memory in Haskell

Let’s say we have a foreign function that takes a C data structure. Our task is to allocate the structure in memory, fill in the fields, call the function, and deallocate the memory.

The data structure is not flat but contains pointers to other data structures that also need to be allocated. An example of such a data structure is a linked list:

typedef struct list { list { int car; car; struct list *cdr; list *cdr; } list;

And an example of a C function is one that computes the sum of all elements in the list:

int sum(list *l) { sum(list *l) { int s = 0 ; s = while (l) { (l) { s += l->car; l = l->cdr; } return s; s; }

In this article, I will explore different ways to track all the allocated pointers and free them reliably.

The complete code can be downloaded as a git repo:

git clone https://ro-che.info/files/2017-08-06-manage-allocated-memory-haskell.git

The modules below use the hsc2hs preprocessor; it replaces things like #{peek ...} , #{poke ...} , and #{size ...} with Haskell code.

Way 1: traverse the structure

Since all pointers are stored somewhere in the data structure, we could traverse it to recover and free those pointers.

mkList :: [ Int ] -> IO ( Ptr List ) = mkList l case l of -> return nullPtr []nullPtr x : xs -> do xs <- mallocBytes # {size list} structmallocBytes{size list} # {poke list, car} struct x {poke list, car} struct x <- mkList xs xs_cmkList xs # {poke list, cdr} struct xs_c {poke list, cdr} struct xs_c return struct struct freeList :: Ptr List -> IO () () freeList l | l == nullPtr = return () nullPtr() | otherwise = do <- # {peek list, cdr} l cdr{peek list, cdr} l free l freeList cdr way1 :: Int -> IO Int = bracket (mkList [ 1 .. n]) freeList csum way1 nbracket (mkList [n]) freeList csum

This is how we’d probably do it in C, but in Haskell it has several disadvantages compared to the other options we have:

The code to traverse the structure has to be written manually and is prone to errors. Having to do the extra work of traversing the structure makes it slower than some of the alternatives. It is not exception-safe; if something happens inside mkList , the already allocated pointers will be lost. Note that this code is async-exception-safe ( bracket masks async exceptions for mkList ), so the only exceptions we need to worry about must come from mkList or freeList (e.g. from mallocBytes ).

Way 2: allocaBytes and ContT

The Foreign.Marshal.Alloc module provides the bracket -style allocaBytes function, which allocates the memory and then automatically releases it.

-- |@'allocaBytes' n f@ executes the computation @f@, passing as argument -- a pointer to a temporarily allocated block of memory of @n@ bytes. -- The block of memory is sufficiently aligned for any of the basic -- foreign types that fits into a memory block of the allocated size. -- -- The memory is freed when @f@ terminates (either normally or via an -- exception), so the pointer passed to @f@ must /not/ be used after this. allocaBytes :: Int -> ( Ptr a -> IO b) -> IO b b)

It works great if we need to allocate a small fixed number of structures:

$ \ptr1 -> allocaBytes size1\ptr1 $ \ptr2 -> do allocaBytes size2\ptr2 -- do something with ptr1 and ptr2

But what if the number of allocations is large or even unknown, as in our case?

For that, we have the continuation monad!

mallocBytesC :: Int -> ContT r IO ( Ptr a) a) = ContT $ \k -> allocaBytes n k mallocBytesC n\kallocaBytes n k

mallocBytesC is a monadic function that returns a pointer to the allocated memory, so it is as convenient and flexible as the simple mallocBytes used in Way 1. But, unlike mallocBytes , all allocated memory will be safely and automatically released at the point where we run the ContT layer.

mkList :: [ Int ] -> ContT r IO ( Ptr List ) = mkList l case l of -> return nullPtr []nullPtr x : xs -> do xs <- mallocBytesC # {size list} structmallocBytesC{size list} $ # {poke list, car} struct x liftIO{poke list, car} struct x <- mkList xs xs_cmkList xs $ # {poke list, cdr} struct xs_c liftIO{poke list, cdr} struct xs_c return struct struct way2 :: Int -> IO Int = runContT (liftIO . csum =<< mkList [ 1 .. n]) return way2 nrunContT (liftIOcsummkList [n])

This is my favorite way: it is simple, safe, fast, and elegant.

Way 3: ResourceT

If we replace ContT r with ResourceT , we get a similar function with essentially the same semantics: the memory released when the ResourceT layer is run.

mallocBytesR :: Int -> ResourceT IO ( Ptr a) a) = snd <$> allocate (mallocBytes n) free mallocBytesR nallocate (mallocBytes n) free

The rest of the code barely changes:

mkList :: [ Int ] -> ResourceT IO ( Ptr List ) = mkList l case l of -> return nullPtr []nullPtr x : xs -> do xs <- mallocBytesR # {size list} structmallocBytesR{size list} $ # {poke list, car} struct x liftIO{poke list, car} struct x <- mkList xs xs_cmkList xs $ # {poke list, cdr} struct xs_c liftIO{poke list, cdr} struct xs_c return struct struct way3 :: Int -> IO Int = runResourceT (liftIO . csum =<< mkList [ 1 .. n]) way3 nrunResourceT (liftIOcsummkList [n])

However, as we shall see, this version is an order of magnitude slower than the ContT version.

Way 4: single-block allocation via mallocBytes

Even though our structure contains lots of pointers, we don’t have to allocate each one of them separately. Instead, we could allocate a single chunk of memory and calculate the pointers ourselves.

This method may be more involved for complex data structures, but it is the fastest one, because we only need to keep track of and deallocate a single pointer.

writeList :: Ptr List -> [ Int ] -> IO () () = writeList ptr l case l of -> error "writeList: empty list" [] -> do [x] # {poke list, car} ptr x {poke list, car} ptr x # {poke list, cdr} ptr nullPtr {poke list, cdr} ptr nullPtr x : xs -> do xs # {poke list, car} ptr x {poke list, car} ptr x let ptr' = plusPtr ptr # {size list} ptr'plusPtr ptr{size list} # {poke list, cdr} ptr ptr' {poke list, cdr} ptr ptr' writeList ptr' xs mkList :: [ Int ] -> IO ( Ptr List ) mkList l | null l = return nullPtr nullPtr | otherwise = do <- mallocBytes ( length l * # {size list}) ptrmallocBytes ({size list}) writeList ptr l return ptr ptr way4 :: Int -> IO Int = bracket (mkList [ 1 .. n]) free csum way4 nbracket (mkList [n]) free csum

Way 5: single-block allocation via allocaBytes + ContT

This is the same as Way 4, except using allocaBytes instead of mallocBytes . The two functions allocate the memory differently, so I thought I’d add this version to the benchmark.

mkList :: [ Int ] -> ContT r IO ( Ptr List ) mkList l | null l = return nullPtr nullPtr | otherwise = do <- mallocBytesC ( length l * # {size list}) ptrmallocBytesC ({size list}) $ writeList ptr l liftIOwriteList ptr l return ptr ptr way5 :: Int -> IO Int = runContT (liftIO . csum =<< mkList [ 1 .. n]) return way5 nrunContT (liftIOcsummkList [n])

Way 6: managed

This is the same as Way 2, but using the managed library by Gabriel Gonzalez. (Thanks to reddit user /u/andriusst for bringing it up.)

mallocBytesC :: Int -> Managed ( Ptr a) a) = managed $ \k -> allocaBytes n k mallocBytesC nmanaged\kallocaBytes n k mkList :: [ Int ] -> Managed ( Ptr List ) = mkList l case l of -> return nullPtr []nullPtr x : xs -> do xs <- mallocBytesC # {size list} structmallocBytesC{size list} $ # {poke list, car} struct x liftIO{poke list, car} struct x <- mkList xs xs_cmkList xs $ # {poke list, cdr} struct xs_c liftIO{poke list, cdr} struct xs_c return struct struct way6 :: Int -> IO Int = with (liftIO . csum =<< mkList [ 1 .. n]) return way6 nwith (liftIOcsummkList [n])

Managed is just a newtype wrapper around the polymorphic continuation monad (a.k.a. the Codensity monad) applied to IO . Because the underlying implementation is the same, the run time performance is exactly that of Way 2, but you get to work with simpler types: Managed instead of ContT r IO .

Is sum pure?

We expose the sum function from C as an IO function csum :

import ccall "sum" csum :: Ptr List -> IO Int foreignccall "sum" csum ::->

The alternative is to expose it as a pure function:

import ccall "sum" csum :: Ptr List -> Int foreignccall "sum" csum ::->

The Haskell FFI explicitly allows both declarations, and it may seem that a function summing up numbers deserves to be pure.

However, declaring csum as pure would break every single example above (after making the trivial changes to make it type check). Can you see why?

The answer is laziness. If we made csum pure and naively changed our code to compile, we might arrive at something like

1 .. n]) freeList ( return . csum) bracket (mkList [n]) freeList (csum)

This tells the compiler to:

Call mkList [1..n] ; Pass the result to the pure csum function, and promote the value to the IO type; Call freeList ; Return the value from step (2).

You may think that the value returned from step (2), and therefore from the whole expression, is a number — the length of the list. But because of laziness, it’s not a number yet. It is a thunk, a delayed application of csum to the pointer value. This thunk will be forced at the next step — when we pass it to the print function, for example, or compare it with the expected value. Only then will csum try to do its work. But it will be too late, because by that time freeList will have run, and csum will discover that its pointer argument is no longer a valid pointer.

To work around this problem, we could replace return with evaluate , but this just shows us that our decision to declare csum was a mistake. It matters in what order mkList , csum , and freeList are evaluated, and the principled way to enforce this order in Haskell is the IO monad.

Benchmark results

The benchmark consists of allocating, summing, and deallocating a list of numbers from 1 to 100.