What is there to reference counting that is not obvious? In any language that supports deterministic destruction and the overloading of the copy constructor and the assignment operator it should be trivial. Or so I though until I decided to implement a simple ref-counted thread handle in D. Two problems popped up:

How does reference counting interact with garbage collection? How to avoid data races in a multithreaded environment?

In purely garbage-collected languages, like Java, you don’t implement reference counting, period. Which is pretty bad, if you ask me. GC is great at managing memory, but not so good at managing other resources. When a program runs out of memory, it forces a collection and reclaims unused memory. When it runs out of, say, system thread handles, it doesn’t reclaim unused handles–it just dies. You can’t use GC to manage system handles. So, as far as system resources go, Java forces the programmer to use the moral equivalent of C’s malloc and free . The programmer must free the resources explicitly.

In C++ you have std::shared_ptr for all your reference-counting needs, but you don’t have garbage collection for memory–at least not yet. (There is also the Microsoft’s C++/CLI which mixes the two systems.)

D offers the best of both worlds: GC and deterministic destruction. So let’s use GC to manage memory and reference counting (or other policies, like uniqueness) to manage other limited resources.

First attempt

The key to reference counting is to have a “value” type, like the shared_ptr in C++, that can be cloned and passed between functions at will. Internally this value type must have access to a shared chunk of memory that contains the reference count. In shared_ptr this chunk is a separately allocated integral type–the counter. The important thing is that all clones of the same resource share the same counter. The counter’s value reflects how many clones there are. Copy constructors and assignment operators take care of keeping the count exact. When the count goes to zero, that is the last copy of the resource goes out of scope, the resource is automatically freed (for instance, by calling CloseHandle ).

In my first attempt, I decided that the memory allocated for the counter should be garbage-collected. After all, the important thing is to release the handle–the memory will take care of itself.

struct RcHandle { shared Counter _counter; // GC'd shared Counter object HANDLE _h; ~this() { // destructor if (_counter.dec() == 0) // access the counter CloseHandle(_h); } // methods that keep the counter up to date }

RcHandle is a struct, which is a value type in D. Counter is a class, which is a reference type; so _counter really hides a pointer to shared memory.

A few tests later I got a nasty surprise. My program faulted while trying to access an already deallocated counter. How did that happen? How could garbage collector deallocate my counter if I still had a reference to it?

Here’s what I did in my test: I embedded the ref-counted handle inside another garbage-collected object:

class Embedder { // GC'd class object RcHandle _rc; }

When the time came to collect that object (which happened after the program ran to completion), its finalizer was called. Whenever an object contains fields that have non-trivial destructors, the compiler generates a finalizer that calls the destructors of those embedded objects– _rc in this case. The destructor of the ref-counted handle checks the reference count stored in the counter. Unfortunately the counter didn’t exist anymore. Hours of debugging later I had the answer.

What happened is that the garbage collector had two objects on its list: the embedder and the counter. It just so happened that the collector decided to collect those two objects in reverse order: the counter first, then the embedding object. So, by the time it got to the finalizer of the embedding object, the counter was gone!

What I discovered (with the help of other members of the D team who were involved in the discussion) was that there are some limitations on mixing garbage collection with deterministic destruction. There is a general rule:

An object’s destructor must not access any garbage-collected objects embedded in it.

Since the destructor of the ref-counted handle must have access to the counter, the counter must not be garbage-collectible. That means only one thing: it has to be allocated using malloc and explicitly deallocated using free . Which brings us to the second problem.

Concurrency

What can be simpler than an atomic reference count? On most processors you can atomically increment and decrement a memory location. You can even decrement and test the value in one uninterruptible operation. Problem solved! Or is it?

There is one tiny complication–the location that you are atomically modifying might disappear. I know, this is totally counter-intuitive. After all the management of the counter follows the simple rule: the last to leave the room turns off the light. If the destructor of RcHandle sees the reference count going from one to zero, it knows that no other RcHandle has access it, and it can safely free the counter. Who can argue with cold logic?

Here’s the troubling scenario: RcHandle is embedded in an object that is visible from two threads:

class Embedder { RcHandle _rcH; } shared Embedder emb;

Thread 1 tries to overwrite the handle:

RcHandle myHandle1; emb._rcH = myHandle1;

while Thread 2 tries to copy the same handle to a local variable:

RcHandle myHandle2 = emb._rcH;

Consider the following interleaving:

T2: Load the address of the _counter embedded in _rcH . T1: Swap emb._rcH with myHandle T1: Decrement the counter that was embedded in _rcH . If it’s zero (and it is, in this example), free the counter. T2: Increment the _counter . Oops! This memory location has just been freed.

The snag is that there is a window between T2 reading the pointer in (1), and incrementing the location it’s pointing to in (4). Within that window, the reference count does not match the actual number of clients having access to the pointer. If T1 happens to do its ugly deed of freeing the counter within that window, the race may turn out deadly. (This problem has been known for some time and there were various proposals to fix it, for instance using DCAS, as in this paper on Lock-Free Reference Counting.)

Should we worry? After all the C++ shared_ptr also exposes this race and nobody is crying havoc. It turns out that it all boils down to the responsibilities of the shared object (and I’m grateful to Andrei for pointing it out).

A shared object should not willy-nilly expose its implementation to clients

If the clients of Embedder want access to the handle, they should call a synchronized method. Here’s the correct, race-free implementation of the Embedder in the scenario I just described.

class Embedder { private: RcHandle _rcH; public: synchronized RcHandle GetHandle() const { return _rcH; } synchronized void SetHandle(RcHandle h) { _rcH = h; } ... }

The method GetHandle copies _rcH and increments its count under the Embedder ‘s lock. Another thread calling SetHandle has no chance of interleaving with that action, because it is forced to use the same lock. D2 actually enforces this kind of protection for shared objects, so my original example wouldn’t even compile.

You might be thinking right now that all this is baloney because I’m trying to fit a square peg into a round hole. I’m imposing value semantics on a non-atomic object, and you cannot atomically overwrite a non-atomic object. However, using this logic, you could convince yourself that a double cannot have value semantics (on most common architectures doubles are too large to be atomic). And yet you can safely pass doubles between threads and assign them to each other (which is the same as overwriting). It’s only when you embed a double inside a shared object, you have to protect it from concurrent access. And it’s not the double protecting itself–it’s the shared embedder that is responsible for synchronization. It’s exactly the same with RcHandle .

Conclusions

Just when you thought you knew everything about reference counting you discover something you haven’t though about. The take-home message is that mixing garbage collection with deterministic destruction is not a trivial matter, and that a race-free reference-counted object is vulnerable to races when embedded it in another shared object. Something to keep in mind when programming in D or in C++.