This blog post discusses the conduit package. If you are not familiar with it, you can read up on it in the Yesod book chapter on conduits.

Earlier today, Oleg sent an email to the Haskell cafe about regions. Yves Parès sent a response that linked to resource-simple, a package I had not heard of until then. Reading the description on the page reminded me of one of the earlier decisions I made in designing conduits. I'd like to explain that decision, and then explain how we can work around it.

Originally, I had intended that all conduits would live in the IO monad. This is a fair assumption: the majority of the time, we want to use conduits to perform some kind of I/O (otherwise, why not just use lazy lists?). So for my first stab at the problem, I designed a ResourceT transformer that always assumed an IO monad for its base. Then, all three data types in conduit (Source, Sink, and Conduit) assumed that their actions lived in the ResourceT transformer so that they could safely acquire resources.

However, this IO assumption can be limiting. There are plenty of sources, sinks, and conduits that perform no resource allocation at all, and we would like to be able to access them from pure code. For example, xml-conduit provides a great parser and renderer for XML documents; it would be a shame to only be able to access it from IO. We could of course use unsafePerformIO, but we don't mention that in polite company.

I created an elaborate typeclass system around ResourceT, which would allow us to build monad stacks around both IO and ST. Then we could call ST from pure code, with no need to touch that unsafe stuff!

Unfortunately, there are a few downsides to this approach:

ResourceT doesn't really make sense for ST. You can't safely allocate scarce resources in the ST monad, so we're just pretending for the sake of uniformity. Just have a look at how resourceBracket is implemented for ST.

The type complexity really gets in the way. Look at the presence of both with and withIO for an example.

We are still limited in our monad choices, since we need monads that provide mutable references for ResourceT to work. This turns into a performance penalty, as we'll see later.
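To see why mutable references come into it, here is a minimal sketch of the general idea behind a cleanup registry. It is not the actual ResourceT implementation: the names `Registry`, `register`, and `withRegistry` are hypothetical, and the sketch ignores details such as asynchronous exceptions. The point is only that release actions have to be recorded somewhere mutable so they can be run when the block exits.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical registry of pending release actions, most recent first.
type Registry = IORef [IO ()]

-- Record a release action to run when the enclosing block exits.
register :: Registry -> IO () -> IO ()
register reg release = modifyIORef' reg (release :)

-- Run a block, then run every registered release action,
-- even if the block throws an exception.
withRegistry :: (Registry -> IO a) -> IO a
withRegistry body = do
  reg <- newIORef []
  body reg `finally` (readIORef reg >>= sequence_)

main :: IO ()
main = do
  logRef <- newIORef []
  withRegistry $ \reg -> do
    register reg (modifyIORef' logRef ("release A" :))
    register reg (modifyIORef' logRef ("release B" :))
  -- Actions run in reverse order of registration.
  readIORef logRef >>= print . reverse
```

Any base monad without something like an IORef or STRef cannot host this kind of registry, which is the limitation described above.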

It turns out there's a simple solution here: don't bake ResourceT into Source, Sink, and Conduit. Instead, only use it for functions that actually allocate scarce resources, such as sourceFile. The downside to this approach is that type signatures get a little longer:

    -- conduit 0.2
    sourceFile :: ResourceIO m => FilePath -> Source m ByteString

    -- conduit 0.3
    sourceFile :: ResourceIO m => FilePath -> Source (ResourceT m) ByteString

However, you can make the argument that this is in fact a Good Thing: we're now explicit in our types as to whether we're performing allocation of scarce resources.
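As a rough illustration of that point, here is a toy model that uses only base. It is not conduit's API: `ToySource`, `fromListToy`, `fileLines`, and `countElems` are hypothetical names, and plain IO stands in for `ResourceT m`. The list-backed source works in any monad, while the file-backed one says in its type that it needs a monad able to manage a file handle.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (foldM)
import Data.Functor.Identity (runIdentity)
import System.IO (IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- Hypothetical toy source: a left fold over its elements in monad m.
newtype ToySource m a = ToySource
  { foldSource :: forall r. (r -> a -> m r) -> r -> m r }

-- No scarce resources: usable in any monad, including Identity.
fromListToy :: Monad m => [a] -> ToySource m a
fromListToy xs = ToySource (\step z -> foldM step z xs)

-- Opens a file handle, so the type pins the monad down to IO.
fileLines :: FilePath -> ToySource IO String
fileLines path =
  ToySource (\step z -> withFile path ReadMode (\h -> loop step h z))
  where
    loop step h acc = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          line <- hGetLine h
          acc' <- step acc line
          loop step h acc'

-- A consumer that works for either kind of source.
countElems :: Monad m => ToySource m a -> m Int
countElems src = foldSource src (\n _ -> return $! n + 1) 0

main :: IO ()
main = do
  -- Pure: no IO needed to consume a list-backed source.
  print (runIdentity (countElems (fromListToy "abc")))
  -- The file name is an arbitrary placeholder for this demo.
  writeFile "toy-source-demo.txt" "one\ntwo\nthree\n"
  countElems (fileLines "toy-source-demo.txt") >>= print
```

A reader of the two signatures can tell which source touches the outside world without looking at either implementation.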

I've put together a separate branch on Github for this approach, and have generated some Haddocks. I'm not yet ready to release this code to Hackage, but I wanted to get people's feedback.

Beyond the theoretical issues above, I'm sure there are two big questions people want to ask.

How bad is the breakage?

Not bad at all. The Resource typeclass is completely gone now. You can replace it with Monad. In other words:

    -- old
    nums :: Resource m => Source m Int
    nums = fromList [1..10]

    -- new
    nums :: Monad m => Source m Int
    nums = fromList [1..10]

Additionally, the lesser-used ResourceThrow and ResourceUnsafeIO classes have been renamed to MonadThrow and MonadUnsafeIO. These classes are not in any way ResourceT-specific, thus the name change. ResourceIO remains as-is.

You might have to add a few explicit lift calls now, and in some cases will have to change your type signature to include ResourceT. But overall, this is a minor change.

How does this affect performance?

For code that will still live in the ResourceT transformer, this will have no performance effect. (I made a separate change to optimize the monadic bind implementation of ResourceT, which does improve performance significantly.) However, if you don't need scarce resource allocations, you can now skip out on the ResourceT overhead entirely. In fact, you can skip out on the overhead of IO and ST as well if you just need to perform pure actions.
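The idea of picking the base monad at the call site can be sketched without conduit at all. The example below uses only base; `sumM` and the other names are hypothetical and stand in for any code that asks for nothing more than a `Monad m` constraint. The same definition runs in IO, in ST, or in Identity, and the last two can be called from pure code.

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (runST)
import Data.Functor.Identity (runIdentity)
import Data.List (foldl')

-- A strict sum that only requires Monad, so the caller picks the monad.
sumM :: Monad m => Int -> [Int] -> m Int
sumM = foldM (\acc x -> return $! acc + x)

-- Runs in IO.
sumIO :: IO Int
sumIO = sumM 0 [1 .. 1000]

-- Runs in ST, callable from pure code.
sumST :: Int
sumST = runST (sumM 0 [1 .. 1000])

-- Runs in Identity, callable from pure code.
sumIdentity :: Int
sumIdentity = runIdentity (sumM 0 [1 .. 1000])

-- No monad at all, for comparison.
sumPure :: Int
sumPure = foldl' (+) 0 [1 .. 1000]

main :: IO ()
main = do
  viaIO <- sumIO
  print (viaIO, sumST, sumIdentity, sumPure)
  -- prints (500500,500500,500500,500500)
```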

I implemented a simple Criterion benchmark comparing six different ways of summing up the numbers 1 to 1000:

    main :: IO ()
    main = defaultMain
        [ bench "bigsum-resourcet-io" (whnfIO $ C.runResourceT $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) 0)
        , bench "bigsum-io" (whnfIO $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) 0)
        , bench "bigsum-st" $ whnf (\i -> (runST $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) i)) 0
        , bench "bigsum-identity" $ whnf (\i -> (runIdentity $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) i)) 0
        , bench "bigsum-foldM" $ whnf (\i -> (runIdentity $ foldM (\a b -> return $! a + b) i [1..1000 :: Int])) 0
        , bench "bigsum-pure" $ whnf (\i -> foldl' (+) i [1..1000 :: Int]) 0
        ]

The results are very promising: moving from ResourceT to the Identity monad brings runtime from 1541us to 409us. Unsurprisingly, a straight foldM is still faster (no conduit overhead at all), and a pure foldl' faster yet, but we're definitely closing the gap.