What Does It Mean To Change An Object?

If you want to be able to augment a data structure in a uniformly efficient way, it seems to be necessary to use a node-based structure rather than an array-based one.



Some of the reader comments about last week's note suggest a question of definition. For example, one of the comments eventually leads to an example that includes the following code:

template< class Item >
inline auto concat( NcArray<Item> a, Item const v ) -> NcArray<Item>
{
    a.append( movable( v ) );
    return movable( a );
}

This code appears as part of the definition of a class named NcArray, which is intended to represent an array that is never copied, but that still supports operations, such as the function concat defined here, that appear to create a new object containing a copy of the original object's elements.

I say appears in the previous sentence because what really happens is that the result of concat contains the contents of its argument, augmented by concatenating a new element, and the call has the side effect that the argument is no longer usable. In other words, if I have an NcArray object named foo and an element named x, and I write

auto bar = concat(foo, x);

then when I'm done, the value of bar is an NcArray whose elements are all of the elements in foo followed by a copy of x , and as a side effect, I must promise that I will never again use the value of foo .

The question, then, is whether this call of concat modifies foo . We might argue that the definition of concat requires its users to write their code in ways that work properly regardless of whether concat modifies foo . If a program never uses a variable again after passing it as the first argument of concat , it meets this requirement.

With this notion in mind, let's look again at the collatz_vec function I showed two weeks ago. This function executes the statement

result.push_back(n);

repeatedly in a loop, where result is a vector<unsigned> . Using the technique that I described in that article, we could define a tail-recursive auxiliary function something like

vector<unsigned> collatz_vec_aux(unsigned n, unsigned count, vector<unsigned> result)

and then replace the recursive calls in collatz_vec_aux by something like

return collatz_vec_aux(n / 2, count + 1, concat(result, n));

Here, concat would be a function written along the lines of the first example in this article, which would give the appearance of creating a new vector containing the elements of result along with a new element at the end, but which would really recycle the vector 's previous contents instead of copying them.

This technique would work for this particular example. However, it does not meet what I think is the definition of a data structure that never changes after being created. For such a data structure, I think it is reasonable to insist that after executing

auto bar = concat(foo, x);

the value of foo must be whatever it was before, and there should be no prohibition against looking at that value.

I had thought until recently that such a data structure must be impossible if it were based on contiguous data structures such as arrays or vectors — and, in a sense, I still think it is impossible. However, one can almost get there through cheating. Here's how.

Consider the following simple data structure, each instance of which represents an initial segment of a vector:

template<class T>
struct Thing {
    typename vector<T>::size_type size;
    shared_ptr<vector<T>> p;
};

We can imagine implementing a nondestructive concat function along these lines:

template<class T>
Thing<T> concat(const Thing<T>& t, T n)
{
    Thing<T> result = t;
    ++result.size;
    result.p->push_back(n);
    return result;
}

This code would change the contents of the vector shared with the original Thing, but because that Thing's size member would remain unchanged, the change to the vector would be invisible through it. Accordingly, after

auto bar = concat(foo, x);

it would still be possible to access the value of foo .

The reason I said one could almost get there is that the following example fails:

auto bar = concat(foo, x);
auto baz = concat(foo, y);

Here, we would like baz to hold the (original) elements of foo followed by y, but bar has already left the underlying vector in a state that makes this impossible. The solution is for concat to check whether the Thing's size matches the vector's size; if it doesn't, concat must copy the vector. This strategy allows well-behaved programs such as collatz_vec to run efficiently, while still preserving the value of concat's argument, at a price. We can see that price: defining bar above takes O(1) time, but defining baz takes O(n) time, where n is the number of elements in foo.

I think that these examples show a fundamental limitation of array-based data structures: There is only one place where "the next element" in such a data structure can go. As a result, if you want to be able to augment a data structure in a uniformly efficient way, it seems to be necessary to use a node-based structure rather than an array-based one.