std::optional: How, when, and why

Casey

September 4th, 2018

This post is part of a regular series of posts where the C++ product team here at Microsoft and other guests answer questions we have received from customers. The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc. Today’s post is by Casey Carter.

C++17 adds several new “vocabulary types” – types intended to be used in the interfaces between components from different sources – to the standard library. MSVC has been shipping implementations of std::optional, std::any, and std::variant since the Visual Studio 2017 release, but we haven’t provided any guidelines on how and when these vocabulary types should be used. This article on std::optional is the first of a series that will examine each of the vocabulary types in turn.

The need for “sometimes-a-thing”

How do you write a function that optionally accepts or returns an object? The traditional solution is to choose one of the potential values as a sentinel to indicate the absence of a value:

void maybe_take_an_int(int value = -1); // an argument of -1 means "no value"
int maybe_return_an_int(); // a return value of -1 means "no value"

This works reasonably well when one of the representable values of the type never occurs in practice. It’s less great when there’s no obvious choice of sentinel and you want to be able to pass all representable values. If that’s the case, the typical approach is to use a separate boolean to indicate whether the optional parameter holds a valid value:

void maybe_take_an_int(int value = -1, bool is_valid = false);
void or_even_better(pair<int,bool> param = std::make_pair(-1, false));
pair<int, bool> maybe_return_an_int();

This is also feasible, but awkward. The “two distinct parameters” technique of maybe_take_an_int requires the caller to pass two things instead of one to represent a single notion, and fails silently when the caller forgets the bool and simply calls maybe_take_an_int(42). The use of pair in the other two functions avoids those problems, but the user of the pair can still forget to check the bool and potentially use a garbage value in the int. Passing std::make_pair(42, true) or std::make_pair(whatever, false) is also very different from passing 42 or nothing – we’ve made the interface hard to use.
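To make the “fails silently” point concrete, here is a minimal sketch. The function describe_argument is a hypothetical stand-in for maybe_take_an_int that returns a string so we can observe what the callee believes it received:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for maybe_take_an_int: it returns a description of
// what it received so the effect of a forgotten flag is visible.
std::string describe_argument(int value = -1, bool is_valid = false) {
    if (!is_valid) {
        return "no value";
    }
    return "value " + std::to_string(value);
}

void demo_forgotten_flag() {
    // The caller meant to pass 42 but forgot the bool, so 42 is ignored.
    assert(describe_argument(42) == "no value");
    // Only the two-argument form actually delivers the value.
    assert(describe_argument(42, true) == "value 42");
}
```

The first call compiles without any diagnostic, which is exactly the problem: nothing in the signature forces the caller to keep the value and the flag in sync.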

The need for “not-yet-a-thing”

How do you write a class with a member object whose initialization is delayed, i.e., optionally contains an object? For whatever reason, you do not want to initialize this member in a constructor. The initialization may happen in a later mandatory call, or it may happen only on request. When the object is destroyed the member must be destroyed only if it has been initialized. It’s possible to achieve this by allocating raw storage for the member object, using a bool to track its initialization status, and doing horrible placement new tricks:

using T = /* some object type */;
struct S {
  bool is_initialized = false;
  alignas(T) unsigned char maybe_T[sizeof(T)];

  void construct_the_T(int arg) {
    assert(!is_initialized);
    new (&maybe_T) T(arg);
    is_initialized = true;
  }

  T& get_the_T() {
    assert(is_initialized);
    return reinterpret_cast<T&>(maybe_T);
  }

  ~S() {
    if (is_initialized) {
      get_the_T().~T(); // destroy the T
    }
  }

  // ... lots of code ...
};

The "lots of code" comment in the body of S is where you write copy/move constructors/assignment operators that do the right thing depending on whether the source and target objects contain an initialized T. If this all seems horribly messy and fragile to you, then give yourself a pat on the back – your instincts are right. We’re walking right along the cliff’s edge where small mistakes will send us tumbling into undefined behavior.

Another possible solution to many of the above problems is to dynamically allocate the “optional” value and pass it via pointer – ideally std::unique_ptr. Given that we C++ programmers are accustomed to using pointers, this solution has good usability: a null pointer indicates the no-value condition, * is used to access the value, std::make_unique<int>(42) is only slightly awkward compared to return 42, and unique_ptr handles the deallocation for us automatically. Of course, usability is not the only concern; readers accustomed to C++’s zero-overhead abstractions will immediately pounce upon this solution and complain that dynamic allocation is orders of magnitude more expensive than simply returning an integer. We’d like to solve this class of problem without requiring dynamic allocation.
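For comparison, the pointer-based approach might look like the sketch below. The functions parse_digit and digit_or are hypothetical names invented for illustration; the point is only that every successful call pays for a heap allocation:

```cpp
#include <cassert>
#include <memory>

// Hypothetical parser: a null pointer means "no value"; otherwise the
// result lives in a heap-allocated int owned by the unique_ptr.
std::unique_ptr<int> parse_digit(char c) {
    if (c < '0' || c > '9') {
        return nullptr;
    }
    return std::make_unique<int>(c - '0');  // dynamic allocation for one int
}

// Hypothetical caller: test for null before dereferencing with *.
int digit_or(char c, int fallback) {
    std::unique_ptr<int> p = parse_digit(c);
    return p ? *p : fallback;
}
```

The calling code is pleasant enough to read, but the allocation inside parse_digit is the cost we would like to avoid.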

optional is mandatory

C++17’s solution to the above problems is std::optional. optional<T> directly addresses the issues that arise when passing or storing what may-or-may-not-currently-be an object. optional<T> provides interfaces to determine if it contains a T and to query the stored value. You can initialize an optional with an actual T value, or default-initialize it (or initialize with std::nullopt) to put it in the “empty” state. optional<T> even extends T’s ordering operations <, >, <=, >= – where an empty optional compares as less than any optional that contains a T – so you can use it in some contexts exactly as if it were a T. optional<T> stores the T object internally, so dynamic allocation is not necessary and is in fact explicitly forbidden by the C++ Standard.
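As a small illustration of the empty state and the extended ordering, the following sketch sorts a vector of optional<int>. The helper sorted_copy is a hypothetical name; everything else is standard library:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: sorts using optional's own operator<, under which
// an empty optional orders before every optional that contains a value.
std::vector<std::optional<int>> sorted_copy(std::vector<std::optional<int>> values) {
    std::sort(values.begin(), values.end());
    return values;
}

void demo_ordering() {
    std::optional<int> empty;          // default-initialized: empty
    std::optional<int> also_empty = std::nullopt;
    std::optional<int> negative = -100;

    assert(!empty.has_value());
    assert(empty == also_empty);
    assert(empty < negative);          // empty is less than any contained value
}
```

Calling sorted_copy({5, std::nullopt, 1}) should therefore place the empty element first, followed by 1 and 5.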

Our functions that need to optionally pass a T would be declared as:

void maybe_take_an_int(optional<int> potential_value = nullopt);
  // or equivalently, "potential_value = {}"
optional<int> maybe_return_an_int();

Since optional<T> can be initialized from a T value, callers of maybe_take_an_int need not change unless they were explicitly passing -1 to indicate “not-a-value.” Similarly, the implementation of maybe_return_an_int need only change places that are returning -1 for “not-a-value” to instead return nullopt (or equivalently {}).
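Here is a minimal sketch of what that looks like at the call site. The function bodies are hypothetical, and for demonstration purposes maybe_take_an_int returns a string and maybe_return_an_int takes a bool parameter, neither of which is part of the declarations above:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical body: returns a description so the received state is observable.
std::string maybe_take_an_int(std::optional<int> potential_value = std::nullopt) {
    if (potential_value) {
        return "got " + std::to_string(*potential_value);
    }
    return "got nothing";
}

// Hypothetical body: the have_value parameter exists only to drive the demo.
std::optional<int> maybe_return_an_int(bool have_value) {
    if (!have_value) {
        return std::nullopt;  // formerly "return -1;"
    }
    return 42;                // a plain int converts to optional<int>
}

void demo_call_sites() {
    assert(maybe_take_an_int(42) == "got 42");       // existing callers still compile
    assert(maybe_take_an_int() == "got nothing");    // default argument is nullopt
    assert(maybe_take_an_int(-1) == "got -1");       // -1 is now an ordinary value
}
```

Note that -1 has lost its special meaning: callers that used it as a sentinel are the ones that need updating.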

Callers of maybe_return_an_int and the implementation of maybe_take_an_int require more substantial changes. You can ask explicitly if an instance of optional holds a value either by calling the has_value member or by contextual conversion to bool:

optional<int> o = maybe_return_an_int();
if (o.has_value()) { /* ... */ }
if (o) { /* ... */ } // "if" converts its condition to bool

Once you know that the optional contains a value, you can extract it with the * operator:

if (o) {
  cout << "The value is: " << *o << '\n';
}

or you can use the value member function to get the stored value or a bad_optional_access exception if there is none, and not bother with checking:

cout << "The value is: " << o.value() << '\n';

or the value_or member function if you’d rather get a fallback value than an exception from an empty optional:

cout << "The value might be: " << o.value_or(42) << '\n';

Taken together, these mean we cannot inadvertently use a garbage value, as was possible with the “traditional” solutions. Attempting to access the contained value of an empty optional results in an exception if accessed with the value() member, or in undefined behavior if accessed via the * operator; the latter can be caught by debug libraries and static analysis tools. Updating the “old” code is probably as simple as replacing validity tests like value == not_a_value_sentinel and if (is_valid) with opt_value.has_value() and if (opt_value), and replacing uses with *opt_value.
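The difference between the checked access paths can be sketched as follows. The helper names checked_read and read_or_default are hypothetical; value(), value_or(), and std::bad_optional_access are the standard library facilities described above:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: value() throws std::bad_optional_access when the
// optional is empty, so we can handle that case without checking first.
std::string checked_read(const std::optional<int>& o) {
    try {
        return std::to_string(o.value());
    } catch (const std::bad_optional_access&) {
        return "empty";
    }
}

// Hypothetical helper: value_or() substitutes a fallback instead of throwing.
int read_or_default(const std::optional<int>& o) {
    return o.value_or(42);
}
```

The sketch deliberately leaves out operator* on an empty optional, since that is undefined behavior and there is nothing meaningful to demonstrate.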

Returning to the concrete example, your function that looks up a string given an integer can simply return optional<string> . This avoids the problems of the suggested solutions; we can

easily discern the no-value case from the value-found case, unlike for the “return a default value” solution,

report the no-value case without using exception handling machinery, which is likely too expensive if such cases are frequent rather than exceptional,

avoid leaking implementation details to the caller as would be necessary to expose an “end” iterator with which they could compare a returned iterator.
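A lookup of this kind might be sketched as below. The name find_name and the choice of std::map as the underlying container are assumptions made for illustration, not details from the original example:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup: the map and its iterators stay hidden behind the
// function, and "not found" is reported as an empty optional.
std::optional<std::string> find_name(const std::map<int, std::string>& names, int key) {
    auto it = names.find(key);
    if (it == names.end()) {
        return std::nullopt;   // no value: no exception, no sentinel string
    }
    return it->second;         // value found, possibly an empty string
}
```

With this signature, a key that maps to an empty string and a key that is absent are distinguishable: the first yields an optional containing "", while the second yields an empty optional.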

Solving the delayed initialization problem is straightforward: we simply add an optional<T> member to our class. The standard library implementer is responsible for getting the placement new handling correct, and std::optional already handles all of the special cases for the copy/move constructors/assignment operators:

using T = /* some object type */;
struct S {
  optional<T> maybe_T;

  void construct_the_T(int arg) {
    // We need not guard against repeat initialization;
    // optional's emplace member will destroy any
    // contained object and make a fresh one.
    maybe_T.emplace(arg);
  }

  T& get_the_T() {
    assert(maybe_T);
    return *maybe_T;
    // Or, if we prefer an exception when maybe_T is not initialized:
    // return maybe_T.value();
  }

  // ... No error-prone handwritten special member functions! ...
};

optional is particularly well-suited to the delayed initialization problem because it is itself an instance of delayed initialization. The contained T may be initialized at construction, or sometime later, or never. Any contained T must be destroyed when the optional is destroyed. The designers of optional have already answered most of the questions that arise in this context.
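To observe that lifetime management in action, the sketch below uses a hypothetical Tracked type that counts live instances, and a hypothetical Holder class that plays the role of S. It assumes a C++17 compiler (for the inline static data member):

```cpp
#include <cassert>
#include <optional>

// Hypothetical type that counts live instances so that construction and
// destruction performed by optional are observable.
struct Tracked {
    static inline int live = 0;
    int id;
    explicit Tracked(int i) : id(i) { ++live; }
    Tracked(const Tracked& other) : id(other.id) { ++live; }
    ~Tracked() { --live; }
};

// Hypothetical class with a delayed-initialization member; it declares no
// special member functions of its own.
struct Holder {
    std::optional<Tracked> member;  // empty until set() is called
    void set(int id) { member.emplace(id); }
};

int live_after_scenario() {
    {
        Holder h;
        assert(Tracked::live == 0);   // nothing constructed yet
        h.set(1);
        assert(Tracked::live == 1);
        h.set(2);                     // emplace destroys the old object first
        assert(Tracked::live == 1 && h.member->id == 2);
        Holder copy = h;              // implicit copy constructor copies the optional
        assert(Tracked::live == 2);
    }
    return Tracked::live;             // both holders have been destroyed
}
```

If optional is doing its job, live_after_scenario() returns 0: every Tracked that was constructed has been destroyed exactly once, without Holder containing any lifetime code.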

Conclusions

Any time you need a tool to express “value-or-not-value”, or “possibly an answer”, or “object with delayed initialization”, you should reach into your toolbox for std::optional. Using a vocabulary type for these cases raises the level of abstraction, making it easier for others to understand what your code is doing. The declarations optional<T> f(); and void g(optional<T>); express intent more clearly and concisely than do pair<T, bool> f(); or void g(T t, bool is_valid);. Just as is the case with words, adding to our vocabulary of types increases our capacity to describe complex problems simply – it makes us more efficient.

If you have any questions, please feel free to post in the comments below. You can also send any comments and suggestions directly to the author via e-mail at cacarter@microsoft.com, or Twitter @CoderCasey. Thank you!