I recently ran across a couple of blog articles describing a data structure that provides a set of properties I am often looking for:

1. Contiguous data storage, good for linear iteration and maximizing cache utilization
2. Constant-time insertion, deletion, and relocation of elements
3. Stable external references to elements regardless of internal relocation
4. Constant-time look-up from external references without hashing

(The two articles were from the makers of the Molecule Engine and the BitSquid engine, respectively.)

Fascinated by data structures and algorithms in general, and lured by the appeal of these properties, I set out to study the concept and implement my own version. It’s quite simple to implement in a basic form to achieve the above four properties, and can easily be extended to acquire some additional behaviors at the cost of additional memory or extra operations. In this first article, I’ll walk through the basic concept, as well as a simple implementation in JavaScript. Later articles will explore some of the possible extensions.

The central principle behind this data structure is double indirection. A single level of indirection, such as a raw pointer or an index into an array, is fast and simple, but any attempt to move the data elsewhere will suddenly and silently invalidate all direct references that might be stored and used elsewhere in code. But if all references merely refer to a single direct reference which serves as an intermediary, then moving the data merely requires also updating this intermediary. Look-ups from external references under this model then need only do two simple look-ups instead of one, first to get the direct reference from the indirect reference, and then to get the data itself from the direct reference.
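To make the idea of double indirection concrete before any pool is built, here is a minimal sketch. The names data, slots, lookup and swapPositions are hypothetical and exist only for this illustration; they are not part of the implementation developed in this article.

```javascript
//Hypothetical illustration only: "data" holds the values, "slots" is the intermediary.
var data = ['a', 'b', 'c'];
var slots = [0, 1, 2];      //slots[handle] is the current position of that value in data

//External code keeps a handle (an index into slots), never a position in data.
var handleToC = 2;

function lookup(handle) {
    //Two look-ups: handle -> position, then position -> value.
    return data[slots[handle]];
}

//Swap the positions of two values. Only the intermediary entries are patched;
//handles held elsewhere in the program are untouched.
function swapPositions(handleA, handleB) {
    var posA = slots[handleA];
    var posB = slots[handleB];
    var tmp = data[posA];
    data[posA] = data[posB];
    data[posB] = tmp;
    slots[handleA] = posB;
    slots[handleB] = posA;
}

swapPositions(0, 2);
console.log(data);               //[ 'c', 'b', 'a' ]
console.log(lookup(handleToC));  //c
```

The value 'c' moved from position 2 to position 0, yet the handle stored before the move still finds it.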

We’ll start with a simple dynamic array and implement constant-time insertion and deletion functions, which will give us the first two properties listed above. Then we can take advantage of double indirection to gain the third and fourth properties without losing the first two. It will require some additional memory and some simple maintenance computations, but nothing severe, and certainly nothing that extends beyond a constant size cost per element or time cost per operation.

Let’s first look at a simple dynamic array, written in JavaScript. The language of course already has its own dynamic array, but I’ll start by building a wrapper around this array, as we’ll obviously be extending it shortly.

A Primitive Dynamic Array

    function ObjectPool() {
        this.objects = [];
    }

    //Add an object to the end of contiguous storage, and return an id that can later reference the object.
    //In this implementation, the id is simply the array index of the newly added object.
    ObjectPool.prototype.add = function ObjectPool_add(object) {
        var id = this.objects.length;
        this.objects.push(object);
        return id;
    };

    //Move the object at the end of contiguous storage into the location currently occupied
    //according to the given object id, and then remove the last item from storage.
    ObjectPool.prototype.remove = function ObjectPool_remove(id) {
        var last = this.objects.pop();
        //If the removed object was itself the last one, the pop alone removed it. Writing
        //the popped value to objects[id] in that case would append it right back.
        if (id < this.objects.length) {
            this.objects[id] = last;
        }
        //Warning: This invalidates the id for whatever object was previously at the end of
        //the internal array, and which now can only be accessed by the id of the object that
        //was just removed!
    };

    ObjectPool.prototype.get = function ObjectPool_get(id) {
        return this.objects[id];
    };

    ObjectPool.prototype.set = function ObjectPool_set(id, object) {
        this.objects[id] = object;
    };

There are two key features in this first version I want to point out explicitly. First is that the handle returned to external code is abstracted as an opaque id, but is technically just an integer used for direct access of the internal array. Second is how the remove() function provides a constant-time implementation while keeping everything packed tightly in the array without any holes. Removing anything but the last item is naturally going to create a hole, so we fill that hole with the element that was previously at the end, and the array shrinks by one. An obvious drawback with this implementation is that the order of elements is not preserved across calls to remove(), a behavior that will remain as we extend the data structure below. This can be disastrous in some contexts, so keep that in mind when considering the use of this data structure.
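The removal trick can be looked at in isolation. The sketch below uses a hypothetical free-standing helper named swapRemove, which is my own name and not part of ObjectPool, to show both the constant-time removal and the change in element order.

```javascript
//Hypothetical helper, not part of ObjectPool: remove array[index] in constant time
//by moving the last element into its place.
function swapRemove(array, index) {
    var last = array.pop();
    //When index was the last position, the pop alone removed the element. Assigning
    //after the pop in that case would put the element straight back.
    if (index < array.length) {
        array[index] = last;
    }
}

var letters = ['a', 'b', 'c', 'd'];
swapRemove(letters, 1);      //remove 'b'
console.log(letters);        //[ 'a', 'd', 'c' ]
```

Note that 'd' used to sit after 'c' and now sits before it, and that any code still holding index 3 for 'd' is now pointing past the end of the array.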

In fact, this behavior is already disastrous, since the ids returned by add() are no longer reliable after one or more calls to remove()! Let’s begin to add that double indirection to solve this problem. We’ll create a second internal array which stores indexes into the object array, and which never rearranges its elements. It may develop holes, but any indexes into this second array will be guaranteed to remain valid as long as the underlying object itself isn’t removed.

Introducing Double Indirection

    function ObjectPool() {
        this.objects = [];
        this.indexes = [];
    }

    ObjectPool.prototype.add = function ObjectPool_add(object) {
        var index = this.objects.length;
        var id = this.indexes.length;
        this.indexes.push(index);
        this.objects.push(object);
        return id;
        //Warning: The indexes array never reuses any old slots, and will thus grow indefinitely, even
        //if the number of elements in the objects array remains relatively constant, but is added to
        //and removed from frequently!
    };

    ObjectPool.prototype.remove = function ObjectPool_remove(id) {
        var index = this.indexes[id];
        var last = this.objects.pop();
        //Only fill the hole if the removed object was not itself the last one.
        if (index < this.objects.length) {
            this.objects[index] = last;
        }
        this.indexes[id] = -1; //Ugh! We'll need to do something better than this.
        //Warning: The entry in the indexes array which refers to the element that just got moved
        //to fill in the hole created is now wrong, and will break if accessed!
    };

    ObjectPool.prototype.get = function ObjectPool_get(id) {
        return this.objects[this.indexes[id]];
    };

    ObjectPool.prototype.set = function ObjectPool_set(id, object) {
        this.objects[this.indexes[id]] = object;
    };

This is moving us in the right direction, but as the warning comments indicate, we still have some serious problems. Let’s tackle the issue with remove() first, since it will literally produce invalid behavior as written above. The issue is that the element at the end of the array is moving to a new location in the array, and thus the value in the indexes array that refers to it needs to be updated to refer to the new index (that is, the index of the item that just got removed). But how do we find that index in order to update it? We could scan the entire indexes array for the element that refers to the last index of the objects array, but that would make the remove() function have linear-time, not constant-time complexity.
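For comparison, the linear scan that was just ruled out could look like the following sketch. The function name findIdByScanning is hypothetical, and the sample indexes array is made up to match the state right after a removal, before the stale entry has been patched.

```javascript
//Hypothetical helper showing the rejected approach: find which slot of the indexes
//array refers to a given position in the objects array by checking every slot.
function findIdByScanning(indexes, objectIndex) {
    for (var id = 0; id < indexes.length; ++id) {
        if (indexes[id] === objectIndex) {
            return id;
        }
    }
    return -1; //No slot refers to that position.
}

//Four objects were added (ids 0 to 3), then id 1 was removed. The object that lived at
//position 3 was moved to position 1, but its entry in indexes still says 3.
var indexes = [0, -1, 2, 3];
var staleId = findIdByScanning(indexes, 3);
indexes[staleId] = 1;    //Patch the entry to the object's new position.
console.log(staleId);    //3
console.log(indexes);    //[ 0, -1, 2, 1 ]
```

This gives the right answer, but the loop may visit every slot, so each removal would cost time proportional to the number of ids ever handed out.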

Instead, we’ll just add a third array, one that maps index values of the objects array back to indexes of the indexes array. And since this terminology is getting crazy, I’ll clean it up by noting that we essentially have two sets of indexes: Those that are used to access the objects array directly are dense indexes, as they are indexing into a tightly packed array with no holes. The doubly indirect indexes, on the other hand, are used to access the indexes array, which might contain holes, so we’ll call those sparse indexes. The existing indexes array can then be renamed to sparseToDense, and the new third array named denseToSparse, since together they essentially form a two-way map between sparse and dense indexes.

Index Maps for Both Directions

    function ObjectPool() {
        this.objects = [];
        this.sparseToDense = [];
        this.denseToSparse = [];
    }

    ObjectPool.prototype.add = function ObjectPool_add(object) {
        var denseIndex = this.objects.length;
        var sparseIndex = this.sparseToDense.length;
        this.sparseToDense.push(denseIndex);
        this.denseToSparse.push(sparseIndex);
        this.objects.push(object);
        return sparseIndex;
        //Warning: Still growing indefinitely!
    };

    ObjectPool.prototype.remove = function ObjectPool_remove(sparseIndex) {
        //Look up where in the dense array the object is stored, given its sparse index.
        var denseIndex = this.sparseToDense[sparseIndex];

        //Look up where in the sparse array the last element is referenced.
        //The last element of both objects and denseToSparse are popped, as these two arrays
        //are 1-to-1, and the last element is being moved to fill in the hole of the removed object.
        var lastElementSparseIndex = this.denseToSparse.pop();
        var lastObject = this.objects.pop();

        //If the removed object was itself the last element, there is no hole to fill.
        if (denseIndex < this.objects.length) {
            this.objects[denseIndex] = lastObject;

            //The maps for the last element that just got relocated need to be updated with
            //the new index values.
            this.sparseToDense[lastElementSparseIndex] = denseIndex;
            this.denseToSparse[denseIndex] = lastElementSparseIndex;
        }

        //Still aren't doing anything intelligent with the old sparse index slot.
        this.sparseToDense[sparseIndex] = -1; //Ugh! We'll need to do something better than this.
    };

    ObjectPool.prototype.get = function ObjectPool_get(sparseIndex) {
        return this.objects[this.sparseToDense[sparseIndex]];
    };

    ObjectPool.prototype.set = function ObjectPool_set(sparseIndex, object) {
        this.objects[this.sparseToDense[sparseIndex]] = object;
    };
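One way to sanity check the two maps while experimenting is to verify that they agree with each other: following a dense index to its sparse index and back should land on the same dense index. The helper below, mapsAreConsistent, is a hypothetical debugging aid of my own and not part of the pool itself.

```javascript
//Hypothetical debugging aid, not part of ObjectPool.
//Returns true when the two maps agree with each other for every stored object.
function mapsAreConsistent(sparseToDense, denseToSparse) {
    for (var denseIndex = 0; denseIndex < denseToSparse.length; ++denseIndex) {
        var sparseIndex = denseToSparse[denseIndex];
        if (sparseToDense[sparseIndex] !== denseIndex) {
            return false;
        }
    }
    return true;
}

//State after adding four objects and removing the one with sparse index 1:
//the last object (sparse index 3) now lives at dense index 1.
console.log(mapsAreConsistent([0, -1, 2, 1], [0, 3, 2])); //true

//Forgetting to patch sparseToDense[3] leaves it pointing past the end of storage.
console.log(mapsAreConsistent([0, -1, 2, 3], [0, 3, 2])); //false
```

The check is linear in the number of stored objects, so it is meant for tests and debugging rather than for use inside add() or remove().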

Getting closer. Now let’s finish off the last problem, that the sparseToDense array grows every time a new object is added, but never shrinks, no matter how many objects are removed. We can utilize what is known as a free list, often used for memory pools. Since what we are interested in is specifically the collection of slots in the sparseToDense array that aren’t in use, we are in fact free to use those slots themselves to store that data, in the form of a singly linked list. The slots are already generally filling the role of storing indexes that reference an array. Why not just put them to use referencing the next free slot in the sparseToDense array whenever they’re not in use to reference the objects array? All we would need then is a single head reference for the linked list, which we can store in the ObjectPool instance.
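The free list on its own can be sketched separately from the pool. In this illustration the names slots, freeHead, allocateSlot and freeSlot are hypothetical; the slots array plays the role of sparseToDense, and the stored values stand in for dense indexes.

```javascript
//Hypothetical standalone sketch of a free list threaded through an array of slots.
//A slot in use holds a payload (here, a number standing in for a dense index); a free
//slot holds the index of the next free slot, or null at the end of the list.
var slots = [];
var freeHead = null;

function allocateSlot(value) {
    var slot = freeHead;
    if (slot !== null) {
        freeHead = slots[slot]; //Advance the head to the next free slot.
        slots[slot] = value;
    } else {
        slot = slots.length;    //No free slot to reuse, so grow the array.
        slots.push(value);
    }
    return slot;
}

function freeSlot(slot) {
    slots[slot] = freeHead;     //The freed slot now links to the old head.
    freeHead = slot;
}

var a = allocateSlot(10);       //slot 0
var b = allocateSlot(11);       //slot 1
var c = allocateSlot(12);       //slot 2
freeSlot(a);
freeSlot(c);
//The free list is now 2 -> 0 -> null, stored in the slots themselves.
console.log(slots, freeHead);   //[ null, 11, 0 ] 2
console.log(allocateSlot(13));  //2 (the most recently freed slot is reused first)
console.log(allocateSlot(14));  //0
console.log(allocateSlot(15));  //3 (the list is exhausted, so the array grows)
```

Nothing stored in a slot says whether it is free or in use, so in this sketch it remains the caller's job not to use a slot index after freeing it.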

Index Free List

    function ObjectPool() {
        this.objects = [];
        this.sparseToDense = [];
        this.denseToSparse = [];
        this.freeHead = null; //No free slots in the sparse array initially.
    }

    ObjectPool.prototype.add = function ObjectPool_add(object) {
        var denseIndex = this.objects.length;
        var sparseIndex = this.freeHead; //Grab the index of the first free slot, if there is one.

        if (sparseIndex !== null) {
            //Since we're about to use the first free slot, the next free slot needs to become the free
            //head. Follow the linked list to the next element and store that as the free head.
            this.freeHead = this.sparseToDense[sparseIndex];
            this.sparseToDense[sparseIndex] = denseIndex;
        } else {
            //All slots are currently used, so we'll just need to grow the array.
            sparseIndex = this.sparseToDense.length;
            this.sparseToDense.push(denseIndex);
        }

        this.denseToSparse.push(sparseIndex);
        this.objects.push(object);
        return sparseIndex;
    };

    ObjectPool.prototype.remove = function ObjectPool_remove(sparseIndex) {
        var denseIndex = this.sparseToDense[sparseIndex];
        var lastElementSparseIndex = this.denseToSparse.pop();
        var lastObject = this.objects.pop();

        //If the removed object was itself the last element, there is no hole to fill.
        if (denseIndex < this.objects.length) {
            this.objects[denseIndex] = lastObject;
            this.sparseToDense[lastElementSparseIndex] = denseIndex;
            this.denseToSparse[denseIndex] = lastElementSparseIndex;
        }

        //Point this newly freed slot at the free head. If the free head is
        //currently null, then this slot will effectively be the last element
        //in the linked list, as it will point to null as the next index.
        this.sparseToDense[sparseIndex] = this.freeHead;

        //Make the newly freed slot the head of the linked list of free slots.
        this.freeHead = sparseIndex;
    };

    ObjectPool.prototype.get = function ObjectPool_get(sparseIndex) {
        return this.objects[this.sparseToDense[sparseIndex]];
    };

    ObjectPool.prototype.set = function ObjectPool_set(sparseIndex, object) {
        this.objects[this.sparseToDense[sparseIndex]] = object;
    };

That should do it! We now have a data structure that stores items contiguously, supports constant-time insertion (at the end; amortized, since the underlying arrays occasionally have to grow) and deletion (from anywhere), and provides external references that are efficient and remain stable for the life of the element they reference. Granted, the contiguous storage is likely to be less relevant in a language like JavaScript, where the array will likely be stored in memory as a contiguous array of pointers to objects rather than the objects themselves. But the data structure is sound, and should carry over very well to statically typed languages like C++ that could benefit even more significantly from the data locality.
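To see these properties together, here is a short usage sketch. So that it runs on its own, it starts with a condensed restatement of the final pool. The guard in remove() for the case where the removed object is already last in storage is my own assumption about how that case should be handled, and the sample values are made up.

```javascript
//Condensed restatement of the final ObjectPool, so that this example runs on its own.
function ObjectPool() {
    this.objects = [];
    this.sparseToDense = [];
    this.denseToSparse = [];
    this.freeHead = null;
}

ObjectPool.prototype.add = function (object) {
    var denseIndex = this.objects.length;
    var sparseIndex = this.freeHead;
    if (sparseIndex !== null) {
        this.freeHead = this.sparseToDense[sparseIndex];
        this.sparseToDense[sparseIndex] = denseIndex;
    } else {
        sparseIndex = this.sparseToDense.length;
        this.sparseToDense.push(denseIndex);
    }
    this.denseToSparse.push(sparseIndex);
    this.objects.push(object);
    return sparseIndex;
};

ObjectPool.prototype.remove = function (sparseIndex) {
    var denseIndex = this.sparseToDense[sparseIndex];
    var lastSparseIndex = this.denseToSparse.pop();
    var lastObject = this.objects.pop();
    //Assumption: only fill the hole when the removed object was not already last.
    if (denseIndex < this.objects.length) {
        this.objects[denseIndex] = lastObject;
        this.sparseToDense[lastSparseIndex] = denseIndex;
        this.denseToSparse[denseIndex] = lastSparseIndex;
    }
    this.sparseToDense[sparseIndex] = this.freeHead;
    this.freeHead = sparseIndex;
};

ObjectPool.prototype.get = function (sparseIndex) {
    return this.objects[this.sparseToDense[sparseIndex]];
};

var pool = new ObjectPool();
var ant = pool.add('ant');   //handle 0
var bee = pool.add('bee');   //handle 1
var cat = pool.add('cat');   //handle 2
var dog = pool.add('dog');   //handle 3

pool.remove(bee);
console.log(pool.objects);   //[ 'ant', 'dog', 'cat' ]  (still packed, order changed)
console.log(pool.get(dog));  //dog  (the handle survived the relocation)

var eel = pool.add('eel');
console.log(eel);            //1  (the freed sparse slot was reused)
console.log(pool.objects);   //[ 'ant', 'dog', 'cat', 'eel' ]
```

Linear iteration is then just a plain loop over pool.objects, which never contains holes.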

Next time I’ll add some extra behaviors to this data structure that require no changes to the above core, but will allow for pairs of elements to be swapped. And this operation in turn enables the entire data structure to be sorted according to any desired criteria, still without invalidating any external references. Later articles might include subjects such as implementing this in C++ or improving the external references so that they remain valid (but essentially null) even after the object has been removed, more or less allowing them to act as safe weak references.

Dense Dynamic Arrays with Stable Handles, Part 1

Dense Dynamic Arrays with Stable Handles, Part 2

to be continued…?