Introduction

When some subscribers in my email list asked me to write an article on std::vector, my first thought was: “what can I possibly write that hasn’t already been written?” So I did a quick search to find out that most of what is out there is either a very basic tutorial or a reference guide. Therefore, I decided to write an article not only for the beginner but for the intermediate developer. Someone who needs to learn not only the basic interface but some of the intricacies that may cause bugs if not understood well. This article is not meant to be an exhaustive reference. I’ll only cover parts of the interface that, in my experience, are most commonly used. I’ll also go a bit more in-depth on portions where I’ve frequently noticed developer misuse and bugs. Additionally, I’ll cover the std::vector’s advantages and disadvantages, as well as when it’s appropriate to use it.

Beginners will also benefit from this article as I’ll cover the basic concepts and interface. As for required background, understanding the basics of arrays, pointers, and references will make understanding the material easier, but I have not found it essential. For the most part, std::vector is very easy to use and it is very efficient for many applications. It is one of the most basic containers, and every C++ developer must know it well. In fact, mastering it is essential. Let’s get right to it.

What is a vector?

The std::vector is a container, and it is part of the C++ Standard Library. Its basic purpose is to store a number of elements of the same type. Specifically, the std::vector is an abstraction of a data structure known as a dynamic array, or growable array. Just like a static array, the memory containing its elements is contiguous, and it provides constant time access to any element. Unlike a static array, a vector’s size automatically grows and shrinks as elements are inserted or erased.

You can visualize a vector like this:



Each square represents enough space in memory to store a value of a particular type. A vector stores its size and a pointer to its contiguous memory where its elements are stored.

Tutorial

To use std::vector, you need to include the vector header file, like this:

#include <vector>

Once you have included the header file, you can create vectors to store values of any type. All you need to do is to pass the “value_type” as the template parameter. For example, this is how you create a vector of integers:

auto intvector = std::vector< int >{};

And a vector of strings:

auto strvector = std::vector< std::string >{};

What!? Auto? Yes! Using auto for these declarations seems, at first, a little strange, but you’ll soon get used to it. Herb Sutter has written extensively about the advantages of using auto for variable declarations, so I won’t include that here. Needless to say, my priority is to always write code for clarity and correctness.

Initialization

You can initialize a vector with values when you create it:

auto v = vector<int>(5, 0);                         // v has 5 zeros
auto w = vector<int>{0, 1, 2, 3, 4};                // w has 5 values
auto sv = vector<string>{"hi"s, "bye"s, "world"s};  // sv has 3 strings
auto uv = vector<int>(5);                           // uv has 5 value-initialized ints (i.e. 5 zeroes)
auto vcopy = v;                                     // vcopy is a copy of v; it has 5 zeros
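One detail in the initializations above is easy to miss: for a vector of ints, parentheses and braces select different constructors. The sketch below is only an illustration; the function names are hypothetical and the asserts simply spell out the expected sizes.

```cpp
#include <cassert>
#include <vector>

// Parentheses pick the (count, value) constructor: five elements, all 0.
std::vector<int> five_zeros() {
    return std::vector<int>(5, 0);
}

// Braces pick the initializer-list constructor: two elements, 5 and 0.
std::vector<int> five_and_zero() {
    return std::vector<int>{5, 0};
}

// Hypothetical self-check that spells out the difference.
void check_initialization_forms() {
    assert(five_zeros().size() == 5);
    assert(five_and_zero().size() == 2);
    assert(five_and_zero().front() == 5);
}
```

Calling check_initialization_forms() passes silently; swapping the braces and parentheses makes the asserts fire.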

Size

You can get the size of a vector:

auto elements = v.size(); // v is a vector

And you can check if it has any elements:

if (!v.empty()) {…} // v is a vector

Both size() and empty() refer to how many elements are stored in the vector and not to how much memory the vector has allocated for its elements. A vector may be empty and still have enough memory allocated to store 100 elements. In that case, the vector has a capacity of 100 even though its size is 0. More on this later…
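To make the difference between size and allocated memory concrete, here is a minimal sketch. It assumes reserve() and capacity(), which are covered later in this article; the helper name make_empty_with_room is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: an empty vector with room for at least n elements.
std::vector<int> make_empty_with_room(std::size_t n) {
    auto v = std::vector<int>{};
    v.reserve(n);               // allocates memory but adds no elements
    assert(v.empty());          // size is still 0
    assert(v.capacity() >= n);  // but there is room for n elements
    return v;
}
```

For example, make_empty_with_room(100) gives a vector for which empty() is true and capacity() is at least 100.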

Access

You can access the first element of a vector like this:

auto& first = v.front();  // returns a reference to first element
auto& also_first = v[0];  // first element is at index 0

And you can access the last element like this:

auto& last = v.back();
auto& also_last = v[v.size()-1];  // last element is at size-1

And you can access any element in between like this:

auto& elem = v[v.size()/2];
auto& e = v[2];  // returns a reference to the third element

As illustrated above, operator[]() lets you access any element of a vector. For example, you can print all elements like this:

for (auto i = size_t{0}; i != v.size(); ++i) {  // v is a vector
    cout << v[i] << ' ';
}
cout << '\n';

Using indices to traverse all elements is not the best choice, but you can do it. The best choice is using a range-based for-loop. I’ll provide an example later in this article.

Bug alert! The access operators above don’t do any bounds checking, so you must make sure you are not trying to access an element which is out of bounds, like calling front() on an empty vector for example. If you do, the result is undefined, and most likely, it will be bad. A way to prevent out-of-bounds errors is to use range-based for-loops or iterators with std algorithms. I’ll cover iterators later in this article. If you are writing your own loops, stick to using C++ loop idioms. In their book: “The Practice of Programming,” Kernighan and Pike have a great explanation of why you should use idioms:

“Consider the C, C++, or Java code for stepping through the n elements of an array, for example to initialize them. Someone may write the loop like this:

? i = 0;
? while (i <= n-1)
?     array[i++] = 1.0;

or perhaps like this:

? for (i = 0; i < n; )
?     array[i++] = 1.0;

or even:

? for (i = n; --i >= 0; )
?     v[i] = 1.0;

All of these are correct, but the idiomatic form is like this:

for (i = 0; i < n; ++i)
    v[i] = 1.0;

This is not an arbitrary choice. It visits each member of an n-element array indexed from 0 to n-1. It places all the loop control in the for itself, runs in increasing order, and uses the very idiomatic ++ operator to update the loop variable. It leaves the index variable at a known value just beyond the last array element. Native speakers recognize it without study and write it correctly without a moment’s thought.”

If you want to have bounds checking, you can use at(), which will throw an exception of type std::out_of_range if you try to access an element that is not within the range of the vector. For example:

try {
    ...
    auto& elem = v.at(5);  // returns a reference to sixth element
    ...
} catch (const std::out_of_range&) {
    // if there is no sixth element, you’ll end up here...
    // handle out_of_range
}
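As a small illustration of at(), the sketch below wraps the bounds-checked access in a helper that returns a fallback value instead of letting the exception escape. The name value_or and the fallback policy are assumptions made for this example, not part of std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns v.at(i), or fallback when i is out of range.
int value_or(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return v.at(i);  // bounds-checked access
    } catch (const std::out_of_range&) {
        return fallback;  // i >= v.size()
    }
}

// Usage sketch.
void value_or_demo() {
    const auto v = std::vector<int>{10, 20, 30};
    assert(value_or(v, 1, -1) == 20);  // in range
    assert(value_or(v, 5, -1) == -1);  // out of range: fallback
}
```

Whether swallowing the exception is appropriate depends on the application; in many cases letting std::out_of_range propagate is the better choice.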

Iterators

As I’ve already mentioned, the best way to traverse all elements of a vector is using a range-based for-loop:

// print only positive elements
for (const auto& e : v) {
    if (e > 0)
        std::cout << e << ' ';
}
std::cout << '\n';

However, if you want more control or if you want to run an algorithm on some or all of a vector’s elements, you can use iterators. You can think of an iterator as a pointer to an element (or to one-past-last element in the case of end). Since a vector stores its elements contiguously, they can be accessed not only through iterators but also through offsets to regular pointers to elements. However, use iterators instead of pointers since they are the abstraction meant to traverse not only vectors, but standard containers in general. Different containers have different memory arrangements, and their iterators will make life a lot easier rather than trying to use pointers.

You can get iterators to the beginning or to the end of a vector:

auto v = std::vector<int>{0, 1, 2, 3, 4};
auto begin = v.begin();  // begin points to first element
// or
auto b = std::begin(v);  // begin is defined in <iterator>
auto end = v.end();      // points to one past the last element
// or
auto e = std::end(v);    // end is defined in <iterator>

And you can get const_iterators, which you may think of as pointer to const elements:

auto cbegin = v.cbegin();    // v is a vector
// or
auto cb = std::cbegin(v);    // cbegin is defined in <iterator>
auto cend = v.cend();        // points to one past the last element
// or
auto ce = std::cend(v);      // cend is defined in <iterator>

You can dereference iterators just as if they were pointers:

auto v = std::vector<int>{1,2,3};
auto it = v.cbegin();    // it points to first element
auto first_elem = *it;   // first_elem’s value is 1

Similarly:

auto sv = std::vector<std::string>{"first", "second"};
auto it = sv.cbegin();  // it points to first element
auto first_str_sz = it->size();

The std::vector iterators are Random Access Iterators, which practically means that they behave exactly like pointers when it comes to iterator arithmetic and comparison. For instance:

auto v = std::vector<int>{1, 2, 3, 4};
auto it = v.cbegin();  // it points to first element
++it;                  // it now points to second element
auto jt = it + 2;      // jt points to fourth element
auto is_end = (jt == v.cend());  // false; end is one past last element
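The same arithmetic can be put to work in a small function. The sketch below finds the element at the midpoint of a vector using only iterator subtraction, division, and addition; the name middle_value is hypothetical, and the function assumes a non-empty vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: value at the midpoint of a non-empty vector.
int middle_value(const std::vector<int>& v) {
    assert(!v.empty());  // dereferencing cbegin() of an empty vector is undefined
    // Subtracting two random access iterators yields the distance between them.
    const auto distance = v.cend() - v.cbegin();
    const auto mid = v.cbegin() + distance / 2;
    return *mid;
}

// Usage sketch.
void middle_value_demo() {
    assert(middle_value(std::vector<int>{1, 2, 3, 4}) == 3);  // index 2
    assert(middle_value(std::vector<int>{1, 2, 3}) == 2);     // index 1
    assert(middle_value(std::vector<int>{7}) == 7);           // index 0
}
```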

You can use iterators to run algorithms on a vector. The code below determines if a vector contains an element of a certain value:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    auto v = std::vector<int>{0, 1, 2, 3, 4};
    auto n = 3;            // value we are looking for
    auto cend = v.cend();  // used more than once

    // find if v contains an element of value n
    // find returns a const_iterator
    auto result = std::find(v.cbegin(), cend, n);
    if (result != cend) {  // iterators can be compared
        std::cout << "v contains: " << n << '\n';
    } else {
        std::cout << "v doesn’t contain: " << n << '\n';
    }
}

The std::find() algorithm does a linear search, but since the vector is sorted, you could also have used std::equal_range(), which uses a binary search algorithm. It doesn’t make much difference with a vector so small though.
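For comparison, here is a minimal sketch of the binary search variant mentioned above. It assumes the caller passes a vector that is already sorted in ascending order; the name contains_sorted is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if the sorted vector holds an element equal to value.
bool contains_sorted(const std::vector<int>& sorted, int value) {
    assert(std::is_sorted(sorted.cbegin(), sorted.cend()));  // precondition
    // equal_range returns the [first, second) range of elements equal to value.
    const auto range = std::equal_range(sorted.cbegin(), sorted.cend(), value);
    return range.first != range.second;  // non-empty range: value is present
}

// Usage sketch.
void contains_sorted_demo() {
    const auto v = std::vector<int>{0, 1, 2, 3, 4};
    assert(contains_sorted(v, 3));
    assert(!contains_sorted(v, 7));
}
```

If only a yes/no answer is needed, std::binary_search() from the same header returns the bool directly.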

You can also use iterators to write your own loops:

auto v = std::vector<int>{-1, 3, -5, 8, 1};
auto cend = v.cend();
// print only positive elements
for (auto i = v.cbegin(); i != cend; ++i) {
    if (*i > 0) {
        std::cout << *i << ' ';
    }
}
std::cout << '\n';

However, you should prefer standard algorithms to writing your own loops since it often yields code that is more efficient and maintainable, as well as correct. Here is how you can print only the positive elements with std::for_each():

auto v = std::vector<int>{-1, 3, -5, 8, 1};
auto print_if_positive = [](auto elem) {
    if (elem > 0)
        std::cout << elem << ' ';
};
std::for_each(v.cbegin(), v.cend(), print_if_positive);
std::cout << '\n';
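Other standard algorithms follow the same pattern. As a sketch, the helper below collects the positive elements into a new vector with std::copy_if() and counts them with std::count_if(); the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: copies the positive elements of v into a new vector.
std::vector<int> positives(const std::vector<int>& v) {
    auto result = std::vector<int>{};
    // back_inserter calls result.push_back() for every element that passes.
    std::copy_if(v.cbegin(), v.cend(), std::back_inserter(result),
                 [](int elem) { return elem > 0; });
    return result;
}

// Usage sketch.
void positives_demo() {
    const auto v = std::vector<int>{-1, 3, -5, 8, 1};
    const auto expected = std::vector<int>{3, 8, 1};
    assert(positives(v) == expected);

    const auto count = std::count_if(v.cbegin(), v.cend(),
                                     [](int elem) { return elem > 0; });
    assert(count == 3);
}
```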

Modifying elements

You can modify the value of an existing vector element with operator[]():

auto v = std::vector<int>{1, 1, 2};  // first element is 1
v[0] = 0;                            // first element is now 0

Inserting and erasing elements – Bug alert!

All operations that insert or erase an element from a vector invalidate at least one, and possibly more, iterators, pointers, or references to its elements (or to end). Beware of stored iterators, pointers, or references because they may become invalid as soon as an element is inserted or erased. I’ve known developers who spent hours trying to find out what happened to their data simply because they forgot this property.

There are two vector features to consider when trying to determine which iterators or references become invalid. First, a vector stores its elements contiguously. As a consequence, previously obtained iterators, pointers, and references to elements AT and AFTER the point of insertion or deletion become invalid. This is because elements need to be shifted either to open space for a new element or to close a gap left by an erased element. The following figure illustrates how this occurs when inserting a new element:



After inserting 5 at position 3, old iterators, “it” and “end”, are invalid since 2 and end were shifted to open space for the 5.

Similarly, when erasing an element:



This time 5 was erased, and old iterators, “it” and “end”, became invalid. They are no longer pointing to the correct element or to the correct end of the array.

Before I move to the second feature, let me define capacity. The capacity of a vector is the maximum number of elements that can be stored in the memory it already has allocated. The following figure illustrates the concept of capacity:



Capacity is tracked separately from size, but it is never smaller than size.

With capacity defined, we can now move on to the second feature: a vector will reallocate all of its elements to a larger chunk of memory if the new size is greater than the old capacity. Of course, if the vector reallocates all elements to new memory, all of their addresses change, and all iterators, pointers, and references to the old addresses become invalid. So unless you’re keeping track of capacity, assume that all iterators, pointers, and references are invalidated after appending or inserting an element.
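One common way to stay clear of invalidated iterators and references while a vector grows is to work with indices. The sketch below shows that pattern under the assumption that only appends happen during the loop; the function name append_doubles is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends the double of every original element.
// push_back() may reallocate, so the loop uses indices, which stay
// meaningful, instead of iterators or references, which may not.
void append_doubles(std::vector<int>& v) {
    const auto original_size = v.size();
    for (auto i = std::size_t{0}; i != original_size; ++i) {
        const int doubled = v[i] * 2;  // read the value before the vector changes
        v.push_back(doubled);
    }
}

// Usage sketch.
void append_doubles_demo() {
    auto v = std::vector<int>{1, 2, 3};
    append_doubles(v);
    const auto expected = std::vector<int>{1, 2, 3, 2, 4, 6};
    assert(v == expected);
}
```

Writing the same loop with an iterator obtained before the first push_back() would be a bug as soon as the vector reallocates.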

Appending a new element to the end

You can append new elements to the end of a vector with push_back():

auto v = std::vector<int>{0, 1, 2};  // v.size() is 3
v.push_back(3);                      // v.size() is now 4: {0,1,2,3}

Erasing the last element

And you can remove elements from the end of a vector with pop_back():

v.pop_back(); // v.size() is back to 3: {0,1,2}

Because they can be used to implement a stack, push_back(), back(), and pop_back() are sometimes referred to as stack operations.

Bug alert! Underflowing a vector will leave it in an undefined state, so don’t do this:

auto v = std::vector<int>{};
v.pop_back();    // oops! undefined effect;
                 // the state of v becomes undefined
v.push_back(1);  // undefined effect since v’s state is undefined
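As an illustration of the stack operations together with the underflow guard, here is a minimal sketch that checks whether the parentheses in a string are balanced. The function is_balanced is a made-up example; the point is the empty() check before every pop_back().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical example: uses a vector as a stack of open parentheses.
bool is_balanced(const std::string& text) {
    auto stack = std::vector<char>{};
    for (const char c : text) {
        if (c == '(') {
            stack.push_back(c);
        } else if (c == ')') {
            if (stack.empty()) {
                return false;  // never call pop_back() on an empty vector
            }
            stack.pop_back();
        }
    }
    return stack.empty();  // leftovers mean unmatched '('
}

// Usage sketch.
void is_balanced_demo() {
    assert(is_balanced("(a(b))"));
    assert(!is_balanced(")("));
    assert(!is_balanced("(()"));
}
```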

Inserting new elements

You may also insert new elements before a specified location:

auto v = std::vector<int>{5};  // v has 1 element, 5
auto it = v.cbegin();
v.insert(it, 1);               // v now has 2 elements: {1,5}
it = v.cbegin();               // "it" is no longer valid; get a new one
v.insert(it, 2, 0);            // v now has 4 elements: {0,0,1,5}
auto w = std::vector<int>{3, 4, 5};
it = v.cbegin();               // "it" is no longer valid; get a new one
v.insert(it+1, w.cbegin(), w.cend());  // v now has: {0,3,4,5,0,1,5}
v.insert(v.cbegin(), {-2, -1});        // v now has: {-2,-1,0,3,4,5,0,1,5}
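A typical use of insert() is keeping a vector sorted as values arrive. The sketch below assumes the vector is already sorted in ascending order and uses std::upper_bound() to find the insertion point; the name insert_sorted is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: inserts value so that the vector stays sorted.
void insert_sorted(std::vector<int>& sorted, int value) {
    assert(std::is_sorted(sorted.cbegin(), sorted.cend()));  // precondition
    // First position whose element is greater than value.
    const auto position =
        std::upper_bound(sorted.cbegin(), sorted.cend(), value);
    sorted.insert(position, value);  // shifts later elements one slot right
}

// Usage sketch.
void insert_sorted_demo() {
    auto v = std::vector<int>{1, 3, 5};
    insert_sorted(v, 4);
    const auto expected = std::vector<int>{1, 3, 4, 5};
    assert(v == expected);
}
```

The iterator passed to insert() is obtained right before the call, so it cannot be stale.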

Erasing elements

Specified elements may be erased from the vector with erase():

auto v = std::vector<int>{0, 1, 8, 2, 3, 7, 9};
// erase third element
auto cit = v.cbegin();
v.erase(cit+2);             // v now has {0, 1, 2, 3, 7, 9}
// erase elements in the range [cfirst, cend)
auto cfirst = v.cend() - 2;
v.erase(cfirst, v.cend());  // v now has {0, 1, 2, 3}
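To erase every element that matches a condition, a single range erase combined with std::remove_if() avoids erasing one element at a time (this is often called the erase-remove idiom). A minimal sketch, with a made-up function name:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: removes all negative elements from v.
void erase_negatives(std::vector<int>& v) {
    // remove_if moves the kept elements to the front, preserving their
    // order, and returns an iterator to the start of the leftover tail.
    const auto new_end = std::remove_if(v.begin(), v.end(),
                                        [](int elem) { return elem < 0; });
    v.erase(new_end, v.end());  // one range erase drops the tail
}

// Usage sketch.
void erase_negatives_demo() {
    auto v = std::vector<int>{-1, 3, -5, 8, 1};
    erase_negatives(v);
    const auto expected = std::vector<int>{3, 8, 1};
    assert(v == expected);
}
```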

Clearing all elements from a vector

You can remove all elements from a vector with clear():

auto v = std::vector<int>{1, 2, 3};
v.clear();  // v is now empty; it has size 0

Capacity and memory management

When a vector is created, it allocates just enough memory from the heap to store the number of elements that it was initialized with. If it is asked to store a new element, and it doesn’t have space for it, the vector allocates a new, larger chunk of memory, commonly twice the size of its current memory. It copies or moves all of its current elements into the new memory, and it inserts the new element in the right place. It then destroys the old elements and deallocates the old memory.

[Edit] The standard doesn’t specify how large the new chunk of memory should be. g++ and clang++ double it, but MSVC uses 1.5x the size of the current chunk. Other compilers may use different ratios.

You can use capacity() to see the number of elements that a vector can currently store based on how much memory it has allocated. For example:

auto v = vector<int>{};  // size: 0, capacity: 0
v.push_back(1);          // size: 1, capacity: 1
v.push_back(2);          // size: 2, capacity: 2
v.push_back(3);          // size: 3, capacity: 4
v.push_back(4);          // size: 4, capacity: 4
v.push_back(5);          // size: 5, capacity: 8
auto w = vector<int>{0, 1, 2, 3, 4};  // size: 5, capacity: 5
w.push_back(5);          // size: 6, capacity: 10
w.pop_back();            // size: 5, capacity: 10

[Edit] As you can see in the example above, erasing elements from std::vector shrinks the size but not the capacity. In order to return memory to the system, you must explicitly request it with shrink_to_fit(), which the standard defines as a non-binding request. See below.

You may also request to return extra memory to the system by calling shrink_to_fit():

auto w = vector<int>{0, 1, 2, 3, 4};  // size: 5, capacity: 5
w.push_back(5);                       // size: 6, capacity: 10
w.shrink_to_fit();                    // size: 6, capacity: 6

The std::vector automatic reallocation is convenient. However, reallocations are usually expensive, so if you know in advance how many elements a vector will store, allocate all of the required memory at once with reserve():

auto v = vector<int>{};   // size: 0, capacity: 0
auto max_elements = 100u;
v.reserve(max_elements);  // size: 0, capacity: 100
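To see the effect of reserve(), the sketch below counts how often capacity() changes while appending n elements, with and without reserving first. The helper count_reallocations is hypothetical, and the exact count without reserve() depends on the growth ratio of your standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many times capacity() changes while
// appending n elements, optionally after reserving room for all of them.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    auto v = std::vector<int>{};
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    auto reallocations = std::size_t{0};
    auto last_capacity = v.capacity();
    for (auto i = std::size_t{0}; i != n; ++i) {
        v.push_back(0);
        if (v.capacity() != last_capacity) {  // the vector grew its memory
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}

// Usage sketch.
void count_reallocations_demo() {
    assert(count_reallocations(1000, true) == 0);   // no growth after reserve()
    assert(count_reallocations(1000, false) >= 1);  // exact count varies by library
}
```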

Moving vectors

Passing and returning vectors to and from functions

Since the introduction of move semantics, standard vectors are now considered cheap to move. This has changed the default Modern C++ advice on how to return vectors from functions. By default advice, I mean: “unless you have a reason to do otherwise, follow this advice and don’t overthink it.” Anyway, the modern default advice is: return a local vector by value. For example:

auto generate_n_random_ints(unsigned n) {
    auto v = std::vector<int>{};
    v.reserve(n);  // to allocate only once
    for (auto i = 0u; i != n; ++i)
        v.push_back(generate_random_int());
    return v;  // NOT "return std::move(v)"
}

Notice that generate_n_random_ints() returns “v” and not “move(v).” Returning “move(v)” forces your compiler to do a move instead of Return Value Optimization (RVO), which is more efficient. If RVO is not possible or if the compiler doesn’t want to perform it, the standard requires it to do a move. As a consequence, there is nothing to win by returning “move(v),” but you will lose the opportunity for RVO if you do. Therefore, never apply std::move() to local objects if they are eligible for RVO.

The default Modern C++ advice for the rest of the passing and the returning of vectors from functions hasn’t changed.

This is the default Modern C++ advice for passing and returning std::vector from a function, f():

Out: return by value, vector<T> f();

In and Out: pass an lvalue reference, f(vector<T>&);

In: pass a const lvalue reference, f(const vector<T>&);

In and retain a copy: pass a const lvalue reference, f(const vector<T>&);
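The first three cases in the list above can be sketched as three small functions. The names make_range, scale_in_place, and sum are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out: build the vector locally and return it by value.
std::vector<int> make_range(int n) {
    auto v = std::vector<int>{};
    if (n > 0) {
        v.reserve(static_cast<std::size_t>(n));  // to allocate only once
    }
    for (auto i = 0; i < n; ++i) {
        v.push_back(i);
    }
    return v;  // no std::move() needed
}

// In and Out: modify the caller's vector through an lvalue reference.
void scale_in_place(std::vector<int>& v, int factor) {
    for (auto& elem : v) {
        elem *= factor;
    }
}

// In: read-only access through a const lvalue reference.
long long sum(const std::vector<int>& v) {
    long long total = 0;
    for (const auto& elem : v) {
        total += elem;
    }
    return total;
}

// Usage sketch.
void passing_demo() {
    auto v = make_range(4);  // {0, 1, 2, 3}
    scale_in_place(v, 10);   // {0, 10, 20, 30}
    assert(sum(v) == 60);
}
```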

Destruction

When a vector is destroyed, it’s usually by going out of scope. It destroys all elements in it before deallocating its memory. There is not much else to say about destruction.

Performance

Performance of common operations on std::vector:

Random access – constant, O(1)

Removal of elements at the end – constant, O(1)

Insertion of elements at the end – amortized constant, O(1)

Insertion or removal of elements – linear in the distance to the end of the vector, O(n)

A note on performance

Adding and removing elements from the “back” of a vector is fast since it doesn’t affect any of the existing vector elements. However, inserting or erasing elements anywhere else is less efficient since it involves shifting every element from the insertion position onward. But, with modern computers, this is not as bad as it sounds. Modern computers are able to use cache very efficiently with vectors due to their compactness and predictable memory usage patterns. In addition to cache locality, vectors have low overhead compared to other data structures, such as lists. In practice, this gives vectors an advantage with many algorithms even when a lot of random insertions or deletions are required. Therefore, always consider std::vector for your applications.

Advantages of vectors:

Easy to use

Constant element access

Memory management, automatic allocation and deallocation for its elements

Works well with sort and binary search algorithms

Removal of elements at the end – it’s constant, O(1)

Insertion of elements at the end – it’s amortized constant, O(1)

Cheap to move

Provides bounds checking (with at())

Disadvantages of vectors:

Inserting or removing elements can be relatively costly, O(n), as it requires shifting (moving) all elements from the insertion point to the end. However, cache locality and compactness considerably alleviate this.

To keep elements in contiguous memory, existing elements are copied or moved to a new memory location every time the vector grows beyond its capacity. Due to the memory reallocation pattern of std::vector, this doesn’t occur often. However, you must keep it in mind since it may be critical for some applications.

When should you use a vector?

Vectors are a simple way to group data. They provide an efficient way to access their elements. They have little space overhead, and they provide a general and no less efficient alternative to manually managing memory. They are excellent for fixed-size data sets and for variable-size but small data sets. They also work well with sort and binary search algorithms. Additionally, they are not bad for many tasks requiring variable-size large data sets, especially if the type you are storing is small. All in all, you should always consider using a vector.

When should you not use a vector?

Don’t use std::vector if you need a specific behavior that it doesn’t provide or if your data set is large and your application can’t afford the time to copy all elements on the rare occasions when the vector reallocates.

Of course, decisions aren’t always black and white; there are gray areas where it’s not clear if std::vector is your best option. If you are storing large objects, maintaining a large and changing set of data with a vector can be costly. However, defining “large,” both for objects and data sets, isn’t simple, so in these cases, I would default to using std::vector. Then, if I can’t meet my performance requirements, I’ll proceed to profiling my application, and I’ll move forward from there. Don’t worry. The more experience you get, the more the gray area shrinks.

What’s next?

All you need now to master std::vector is practice, and as I said at the beginning: as a C++ developer, you must master std::vector. It is a basic, easy to use data structure with multiple applications, so spend some time writing some programs where you can use vectors. As you are coding, always remember that any operation that changes the size of a vector invalidates some or all existing iterators and references, so be careful if you have any stored. This is where I’ve seen the most bugs. Also, make sure to watch your boundaries since going out of bounds is another common source of errors. As always, feel free to ask me questions or give me suggestions in the comments, and I’ll get to them as soon as I can. Happy coding!




