Copying Container Elements From The C++ Library: It's Trickier Than It Looks We must generally assume that adding elements to a vector will invalidate every reference, pointer, or iterator that refers to a vector element.



Last week, I pointed out that putting a new element in a vector might reallocate the vector. As a result of this behavior, we must generally assume that adding elements to a vector will invalidate every reference, pointer, or iterator that refers to a vector element.

With this assumption in mind, let's assume that t is a Thing and that v is a vector<Thing> , and consider the function call

v.push_back(t);

Obviously, this call appends a copy of t to v . What is the type of push_back 's parameter?

One's first thought is apt to be that the parameter should be a Thing . However, in that case, calling v.push_back(t) would always end up copying t twice: first from t to push_back 's parameter, and then from there into v . To avoid this double copying, push_back is actually defined to take a reference parameter. In other words, by definition a call to push_back does not copy the argument directly; any copying must be done explicitly within the body of push_back .

Now consider this call:

v.push_back(v[0]);

We assume that v already has at least one element; otherwise, the mere act of evaluating v[0] is undefined. In this call, push_back 's parameter is a reference to v[0] . Like any other reference to an element of v , this reference will become invalid if v is reallocated. This fact leads to a requirement on the implementation of push_back :

push_back must copy its parameter before it invalidates any references to elements of v .

However, in order to copy its parameter, push_back must have some place to copy it. Unless the parameter is to be copied twice, that place must be the element of v in which the copy will eventually live. But that element does not exist until v is reallocated! So we seemingly have a requirement that push_back copy its parameter after reallocating v , even though we have already seen that it must copy it before reallocating v .

Fortunately, there is a way around the seeming contradiction between these requirements: Unlike user code, library routines such as push_back can know about how vectors are implemented. For example, reallocating a vector probably has four phases:

Allocate enough memory to hold the new elements of v . Move the old elements to the new memory. Destroy the old elements. Free the old memory.

Unlike user code, push_back can copy its parameter to the appropriate place in the new memory between phases 1 and 2. At that point in the code, all of v 's elements are still in their original places, which means that any references to those elements are still valid. So long as push_back 's parameter has been copied to its destination before phase 2 begins, there is no risk of invalidating that parameter before copying its contents.

You should be able to see from even this simple example that it is harder for the C++ library to manipulate vectors than might appear on the surface. This difficulty is yet one more reason to use the library instead of trying to reimplement it.