In my CppCon 2016 Trip Report I mentioned Herb Sutter's Plenary, and his deferred reclamation library. The problem to be solved is that some data structures cannot be represented as a DAG, where each node in the structure has a clear owner; these data structures are general graphs, where any node may hold a pointer to any other node, and thus you cannot say that a node is no longer needed just because one of the pointers to it has been removed. Likewise, it is possible that there may be a whole segment of the data structure which is no longer accessible from outside, and the only remaining pointers are those internal to the data structure. This whole section should therefore be destroyed, otherwise we have a leak.

In the data structure shown below, the left-hand set of nodes have 3 external pointers keeping them alive, but the right-hand set are not accessible from outside, and so should be destroyed.

Deferred Reclamation

Herb's solution is to use a special pointer type deferred_ptr<T> , and a special heap and allocator to create the nodes. You can then explicitly check for inaccessible nodes by calling collect() on the heap object, or rely on all the nodes being destroyed when the heap itself is destroyed. For the details, see Herb's description.

One thing that stood out for me was the idea that destruction was deferred — nodes are not reclaimed immediately, but at some later time. Also, the use of a custom allocator seemed unusual. I wondered if there was an alternative way of handling things.

Internal pointers vs Root pointers

I've been contemplating the issue for the last couple of weeks, and come up with an alternative scheme that destroys unreachable objects as soon as they become unreachable, even if those unreachable objects hold pointers to each other. Rather than using a custom heap and allocator, it uses a custom base class, and distinct pointer types for internal pointers and root pointers.

My library is also still experimental, but the source code is freely available under the BSD license.

The library provides two smart pointer class templates: root_ptr<T> and internal_ptr<T> . root_ptr<T> is directly equivalent to std::shared_ptr<T> : it is a reference-counted smart pointer. For many uses, you could use root_ptr<T> as a direct replacement for std::shared_ptr<T> and your code will have identical behaviour. root_ptr<T> is intended to represent an external owner for your data structure. For a tree it could hold the pointer to the root node. For a general graph it could be used to hold each of the external nodes of the graph.

The difference comes with internal_ptr<T> . This holds a pointer to another object within the data structure. It is an internal pointer to another part of the same larger data structure. It is also reference counted, so if there are no root_ptr<T> or internal_ptr<T> objects pointing to a given object then it is immediately destroyed, but even one internal_ptr<T> can be enough to keep an object alive as part of a larger data structure.

The "magic" is that if an object is only pointed to by internal_ptr<T> pointers, then it is only kept alive as long as the whole data structure has an accessible root in the form of an root_ptr<T> or an object with an internal_ptr<T> that is not pointed to by either an root_ptr<T> or an internal_ptr<T> .

This is made possible by the internal nodes deriving from internal_base , much like std::enable_shared_from_this<T> enables additional functionality when using std::shared_ptr<T> . This base class is then passed to the internal_ptr<T> constructor, to identify which object the internal_ptr<T> belongs to.

For example, a singly-linked list could be implemented like so:

class List{ struct Node: jss::internal_base{ jss::internal_ptr<Node> next; data_type data; Node(data_type data_):next(this),data(data_){} }; jss::root_ptr<Node> head; public: void push_front(data_type new_data){ auto new_node=jss::make_owner<Node>(new_data); new_node->next=head; head=new_node; } data_type pop_front(){ auto old_head=head; if(!old_head) throw std::runtime_error("Empty list"); head=old_head->next; return old_head->data; } void clear(){ head.reset(); } };

This actually has an advantage over using std::shared_ptr<Node> for the links in the list, due to another feature of the library. When a group of interlinked nodes becomes unreachable, then firstly each node is marked as unreachable, thus making any internal_ptr<T> s that point to them become equal to nullptr . Then all the unreachable nodes are destroyed in turn. All this is done with iteration rather than recursion, and thus avoids the deep recursive destructor chaining that can occur when using std::shared_ptr<T> . This is similar to the behaviour of Herb's deferred_ptr<T> during a collect() call on the deferred_heap .

local_ptr<T> completes the set: you can use a local_ptr<T> when traversing a data structure that uses internal_ptr<T> . local_ptr<T> does not hold a reference, and is not in any way involved in the lifetime tracking of the nodes. It is intended to be used when you need to keep a local pointer to a node, but you're not updating the data structure, and don't need that pointer to keep the node alive. e.g.

class List{ // as above public: void for_each(std::function<void(data_type&)> f){ jss::local_ptr<Node> node=head; while(node){ f(node->data); node=node->next; } } }

Warning: root_ptr<T> and internal_ptr<T> are not safe for use if multiple threads may be accessing any of the nodes in the data structure while any thread is modifying any part of it. The data structure as a whole must be protected with external synchronization in a multi-threaded context.

How it works

The key to this system is twofold. Firstly the nodes in the data structure derive from internal_base , which allows the library to store a back-pointer to the smart pointer control block in the node itself, as long as the head of the list of internal_ptr<T> s that belong to that node. Secondly, the control blocks each hold a list of back-pointers to the control blocks of the objects that point to them via internal_ptr<T> . When a reference to a node is dropped (either from an root_ptr<T> or an internal_ptr<T> ), if that node has no remaining root_ptr<T> s that point to it, the back-pointers are checked. The chain of back-pointers is followed until either a node is found that has an root_ptr<T> that points to it, or a node is found that does not have a control block (e.g. because it is allocated on the stack, or owned by std::shared_ptr<T> ). If either is found, then the data structure is reachable, and thus kept alive. If neither is found once all the back-pointers have been followed, then the set of nodes that were checked is unreachable, and thus can be destroyed. Each of the unreachable nodes is then marked as such, which causes internal_ptr<T> s that refer to them to become nullptr , and thus prevents resurrection of the nodes. Finally, the unreachable nodes are all destroyed in an unspecified order. The scan and destroy is done with iteration rather than recursion to avoid the potential for deep recursive nesting on large interconnected graphs of nodes.

The downside is that the time taken to drop a reference to a node is dependent on the number of nodes in the data structure, in particular the number of nodes that have to be examined in order to find an owned node.

Note: only dropping a reference to a node (destroying a pointer, or reassigning a pointer) incurs this cost. Constructing the data structure is still relatively low overhead.

Feedback

Please let me know if you have any comments on my internal pointer library, especially if you have either used it successfully, or have tried to use it and found it doesn't work for you.

Posted by Anthony Williams

[/ cplusplus /] permanent link

Tags: cplusplus, reference counting, garbage collection

Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.