Deleting from a collection is hard in STL. Or rather, it’s subtle. Which is roughly equivalent to hard.

Jim Beveridge talks about STL erase in his latest blog post and that inspired me to spend some time messing around with STL containers last night.

You can remove all even numbers from a set with Jim’s code:



set<int> coll; /* ... */ for (set<int>::iterator itr=coll.begin(); itr!=coll.end(); /*EMPTY*/ ) if (*itr%2 == 0) coll.erase(itr++); else ++itr;

But you can’t take this code, replace “set” with “vector”, and use it to remove all even numbers from a vector. For one thing, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point, so the trick of post-incrementing itr doesn’t help. You do indeed get a pointer to the next element which is valid until erase is called, which then invalidates the iterator. Fortunately erase doesn’t reallocate the vector’s storage, so the iterator is just off by one. Still, if you populate your vector with the values 1,2,2,4,5 and run it through the above loop, you’ll wind up with 1,2,5.

Here’s how to remove all the even numbers from a vector:



vector<int> coll; /* ... */ for (vector<int>::iterator itr=coll.begin(); itr!=coll.end(); /*EMPTY*/ ) if (*itr%2 == 0) itr = coll.erase(itr); else ++itr;

Note that vector::erase returns an iterator which points to the next value (and, incidentally, the same memory location). That’s because it’s an STL sequence, while set is an associative container. (There are five different STL containers: the three sequences vector, list and deque and two associative containers set and multiset.)

So, given that there are two different interfaces for erase corresponding to two different container types, how can I write a template function that deletes even elements from a templated container? I can’t just write two versions of delete_even with different implementations; they need to have different signatures. I can’t use iterator tags because container category isn’t a property of the iterator: set and list both have bidirectional iterators, but are in different container categories.

Here’s what I wound up with:



struct assoc_container_tag {}; struct sequence_tag {}; template <class T> sequence_tag container_category( const vector<T>& v ) { return sequence_tag(); } template <class T> sequence_tag container_category( const list<T>& v ) { return sequence_tag(); } template <class T> sequence_tag container_category( const deque<T>& v ) { return sequence_tag(); } template <class T> assoc_container_tag container_category( const set<T>& v ) { return assoc_container_tag(); } template <class T> assoc_container_tag container_category( const multiset<T>& v ) { return assoc_container_tag(); } template <typename T, typename P> void delete_if( T& t, P pred, assoc_container_tag ) { for( typename T::iterator i = t.begin(); i != t.end(); ) if( pred(*i) ) t.erase( i++ ); else ++i; } template <typename T, typename P> void delete_if( T& t, P pred, sequence_tag ) { for( typename T::iterator i = t.begin(); i != t.end(); ) if( pred(*i) ) i = t.erase( i ); else ++i; } template <typename T, typename P> void delete_if( T& t, P pred ) { delete_if( t, pred, container_category(t) ); }

I defined two empty types to differentiate between the two container categories. I wrote a templated function container_category and partially specialized it for the five basic container types, returning the appropriate container category tag. Then come two container-category-specific versions of the deletion loop, using either Jim’s postincrementing iterator trick t.erase( i++ ); or the returned iterator i = t.erase(i) . The entry point is the two-argument version of delete_if, which uses container_category to select (at compile-time) the appropriate three-argument version of delete_if to call.

Note that to make it look more like STL I’ve factored out the test for evenness, and now you can pass in an arbitrary predicate. And I’ve changed the routine name from delete_even to delete_if.

Why does that sound vaguely familiar?

[…]

Oh yes. Because there is an STL function remove_if which does exactly the same thing, only better, because you can give it an iterator range instead of an entire container.

So: was all this work wasted? No! Object categories are still a useful technique, and it’s good to play around with them so that they’re familiar when needed. It’s important to think about the properties of containers and their iterators, and to know which operations can invalidate iterators. I had an insight about vector::erase and iterator invalidation, but I’m not sure how to put it into words yet.

And anyway, remove_if has problems of its own, which I’ll get to in another post.