Last week, the .NET BCL team published a preview of a bunch of immutable collections. These collections support efficient operations for creating modified copies of themselves, while leaving the original unchanged (i.e. they are persistent).

Immutable collections are useful, particularly for functionality involving persistent snapshots (like undo/redo), but I also find them fundamentally fascinating. Did you know that the asymptotic performance of immutable data structures is always within a logarithmic factor of their mutable equivalents? Basically, you can take any mutable data structure and encode its data into an immutable tree instead of into mutable memory. Each memory access becomes an operation on the tree, taking logarithmic time. Of course, in practice, you can often avoid this logarithmic slowdown and achieve equivalent asymptotic complexity (or even comparable real-world performance).
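To make that encoding concrete, here's a minimal Python sketch (the names `Node`, `build`, `get`, and `with_item` are mine, not from any library) of a fixed-size 'array' stored as an immutable binary tree: reads and writes each walk one root-to-leaf path, so both take logarithmic time, and a write returns a new tree while every old version remains intact.

```python
class Node:
    """An immutable internal tree node."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def build(items):
    """Builds a complete binary tree over items (padded up to a power of two)."""
    n = 1
    while n < len(items):
        n *= 2
    level = list(items) + [None] * (n - len(items))
    while len(level) > 1:
        level = [Node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], n

def get(root, size, i):
    """Reads index i by walking one root-to-leaf path: O(log n)."""
    node, half = root, size // 2
    while half >= 1:
        if i < half:
            node = node.left
        else:
            node, i = node.right, i - half
        half //= 2
    return node

def with_item(root, size, i, value):
    """Returns a new tree with index i replaced; shares all untouched subtrees."""
    if size == 1:
        return value
    half = size // 2
    if i < half:
        return Node(with_item(root.left, half, i, value), root.right)
    return Node(root.left, with_item(root.right, half, i - half, value))
```

Each `with_item` copies only the O(log n) nodes along one path and shares everything else with the previous version, which is why keeping old snapshots around stays cheap.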

A few years ago, Eric Lippert published a series of blog posts about immutability. Among other things, he explains how to make an immutable stack and an immutable queue. The stack is downright trivial (basically just a linked list), and the queue is also pretty simple. For reference, I made a short visualization of the queue described by Eric:

(I apologize if the chosen colors look too similar if you’re colorblind.)

The simple queue described by Eric has an immutable stack for incoming items, and flips the incoming stack into an immutable outgoing stack for dequeuing (when necessary). Flipping is very expensive (linear time) but happens infrequently, so the overall amortized cost is constant time per operation.
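Here's a Python sketch of that two-stack design (Eric's actual code is C#; the names here are mine). Stacks are immutable linked lists represented as `None` or `(head, rest)` tuples:

```python
def push(stack, item):
    """Immutable stack push: returns a new stack, leaving the old one intact."""
    return (item, stack)

def reverse(stack):
    """Reverses an immutable stack in linear time."""
    result = None
    while stack is not None:
        head, stack = stack
        result = (head, result)
    return result

class Queue:
    """A simple immutable queue: amortized O(1), but an occasional O(n) flip."""
    def __init__(self, incoming=None, outgoing=None):
        self.incoming, self.outgoing = incoming, outgoing

    def enqueue(self, item):
        return Queue(push(self.incoming, item), self.outgoing)

    def dequeue(self):
        inc, out = self.incoming, self.outgoing
        if out is None:
            # The intermittent expensive operation: flip incoming into outgoing.
            inc, out = None, reverse(inc)
        head, rest = out
        return head, Queue(inc, rest)
```

Note that dequeuing never modifies the original queue; it hands back the front item plus a new queue, so older versions can still be dequeued from.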

When I read Eric’s post, it left me with a question: can that intermittent expensive operation be avoided? Is there an immutable queue where every single enqueue and dequeue is guaranteed to be constant time? Well… yes. How? Read on.

Iterative Rebuilding

The problem with the immutable queue shown in the above video is that it procrastinates. Instead of continuously doing small amounts of work, it delays doing anything until the last possible moment. When the outgoing item stack is empty, and someone wants to dequeue an item, the entire state must be reversed. A constant time queue will need to be constantly doing that sort of work as items are enqueued and dequeued, instead of all at once.

The first necessary change is obvious: we must incrementally reverse the incoming stack into the outgoing stack whenever an item is enqueued or dequeued. Because we can only add items to the top of the outgoing stack (since it’s a stack), not the bottom, we’ll need both a ‘ready to dequeue’ outgoing stack and a ‘partially rebuilt’ outgoing stack. Whenever an item is enqueued or dequeued, we’ll reverse a few more items onto the partially rebuilt outgoing stack. Once it is completely rebuilt, it replaces the true outgoing stack.
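One increment of that process can be sketched in Python (a simplification of the real C# code; the function name is made up). The rebuild works from a snapshot of the incoming stack frozen at the moment the rebuild pass began; moving items one at a time from the snapshot onto the partial stack reverses them, which is exactly what turns newest-on-top incoming order into oldest-on-top outgoing order. Stacks are `None` or `(head, rest)` tuples:

```python
def rebuild_step(snapshot, partial, outgoing):
    """Performs one constant-time rebuild step.

    snapshot: the not-yet-copied remainder of the frozen incoming stack
    partial:  the partially rebuilt (reversed-so-far) outgoing stack
    outgoing: the current true outgoing stack
    """
    if snapshot is None:
        # Rebuild finished: the partially rebuilt stack becomes the new outgoing.
        return None, None, partial
    head, rest = snapshot
    # Move one item from the snapshot onto the partial stack (reversing order).
    return rest, (head, partial), outgoing
```

Running a few of these steps per enqueue/dequeue is what replaces the single linear-time flip with many constant-time nibbles.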

The second necessary change is a consequence of the first. The incoming stack is no longer being consumed (its items are needed, in the same order, for the next outgoing stack rebuilding pass). Garbage will accumulate at the bottom of the incoming stack. We can ignore it by keeping a count and artificially decrementing it but, unless we want memory usage to grow without bound, those nodes eventually need to be cleaned up. To make that possible, the incoming stack will also have to be continuously, iteratively rebuilt.

Implementation

The source code, about 200 lines of C#, is too long to embed comfortably into this post. It’s divided into three data structures. First is the ‘stack with a count that can be decremented to ignore items’, which I call DropStack (because items ‘drop’ off the bottom). Second is the DropCollectStack, an augmentation of the drop stack that rebuilds its internal stack so that dropped items eventually become unreferenced (and can be garbage collected). Finally, there’s the constant time immutable Queue, which keeps rebuilding an outgoing stack based on an incoming drop collect stack.
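Roughly, the DropStack idea looks like this in Python (a sketch of the concept, not the actual C# source): an immutable stack paired with a live-item count, so the bottom item can be 'removed' by decrementing the count while the underlying node lingers as garbage.

```python
class DropStack:
    """An immutable stack whose bottom items can be counted out instead of removed."""
    def __init__(self, nodes=None, count=0):
        self.nodes = nodes   # immutable linked list: None or (head, rest)
        self.count = count   # how many of the topmost nodes are 'live'

    def push(self, item):
        return DropStack((item, self.nodes), self.count + 1)

    def drop_bottom(self):
        # 'Remove' the bottom item without touching the list: just count it out.
        # The ignored node stays referenced until the stack is rebuilt.
        return DropStack(self.nodes, self.count - 1)

    def live_items(self):
        """The live items, top to bottom (ignoring dropped ones at the bottom)."""
        result, nodes = [], self.nodes
        for _ in range(self.count):
            head, nodes = nodes
            result.append(head)
        return result
```

This is what makes constant-time 'removal from the bottom' possible, at the price of the garbage that the DropCollectStack's rebuilding then has to clean up.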

I assume the above high level overview is not sufficient to really ‘get’ what’s going on, so I also made a visualization of the described constant time immutable queue:

The shaded area along the top of the incoming stack shows the progress of the incremental copy into the ‘partially rebuilt outgoing’ stack. The shaded area along the bottom of the incoming stack shows the progress of the current incremental copy into the ‘partially reversed incoming’ stack. Black rectangles represent items that are ignored, but still in the underlying stack.

The visualization shows fifteen items being enqueued, with the outgoing stack being rebuilt the whole time, and then shows the items being cycled out and back into the queue, with both the outgoing and incoming stacks being rebuilt. Note how the incoming rebuild has to play catch-up with the actual current state before it can replace the incoming stack, whereas the outgoing rebuild can get away with omitting some of the freshest items (because they won’t be dequeued for a while).

Side note: the visualization has more rebuilding-per-operation than the source code, to decrease the maximum number of black squares and keep the outgoing stack more obviously up to date.

Analysis

I chose the rates of iterative rebuilding, per enqueue or dequeue, to be the simplest values that ensure items are always available to be dequeued and that the amount of garbage stays linear in the size of the queue. These rates can be tweaked to get different performance trade-offs, but ultimately rebuilding needs to happen often enough that items are available to be dequeued and that garbage nodes get collected.

For rebuilding the outgoing stack, I set every enqueue to be worth 1 step and every dequeue to be worth 2 steps. Each step pushes at least one item onto the partially rebuilt outgoing stack (and replaces the outgoing stack if possible). This number of steps ensures the number of items available to be dequeued is always at least half of all the items in the queue.

For rebuilding the incoming stack, I set every dequeue to be worth 3 steps (and enqueues are worth 0 steps). Each step pushes an item either onto the partially reversed stack or onto the partially rebuilt incoming stack (and replaces the incoming stack if possible). This bounds the number of garbage nodes to at most six times the number of nodes actually in the queue (more steps would decrease this rather large bound, but of course take more time).
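The rates from the two paragraphs above can be summarized in a tiny Python table (my framing, not the C# API): each operation funds a fixed number of rebuild steps, which is exactly what makes every operation worst-case constant time rather than merely amortized constant time.

```python
# Rebuild steps funded per operation (the constants from the analysis above).
OUTGOING_STEPS = {"enqueue": 1, "dequeue": 2}
INCOMING_STEPS = {"enqueue": 0, "dequeue": 3}

def steps_funded(ops):
    """Total (outgoing, incoming) rebuild steps funded by a sequence of ops."""
    return (sum(OUTGOING_STEPS[op] for op in ops),
            sum(INCOMING_STEPS[op] for op in ops))
```

Since every step is constant time and every operation funds at most 5 steps, no single enqueue or dequeue can ever take more than a constant amount of work.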

The performance of this queue is consistent. Consistently bad (but very consistent!). Every operation takes approximately the same amount of time, but that amount of time is, overall, at least an order of magnitude slower than the simple immutable queue’s. This could be improved by using an immutable facade instead of true end-to-end immutability, or by tweaking constants, but ultimately we’re not going to come close to beating the minimal overhead of the naive immutable queue without some sort of major revision.

In general, I recommend favoring the simple immutable queue over my constant time immutable queue. A constant time queue might be better if you had hard real time constraints, but I doubt you’d be using a language with a garbage collector in that case.

Summary

Immutable data structures are useful and interesting. It’s possible to make an immutable queue with guaranteed constant time operations, instead of just aggregate constant time, but there are severe performance trade-offs. If you need an immutable queue, use the one in the immutable collections library released by the BCL team last week.

Note that the content of this post barely scratches the surface of information available in the research literature, and elsewhere. For example, way back in 1999, Kaplan and Tarjan showed how to implement an immutable constant-time deque with catenation (pdf of paper).




Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.


