C++17/20 Features and Fixes in Visual Studio 2019

April 19th, 2019

Visual Studio 2019 version 16.0 is now available and is binary compatible with VS 2015/2017. In this first release of VS 2019, we’ve implemented more compiler and library features from the C++20 Working Paper, implemented more <charconv> overloads (C++17’s “final boss”), and fixed many correctness, performance, and throughput issues. Here’s a list of the C++17/20 compiler/library feature work and the library fixes. (As usual, many compiler bugs were also fixed, but they aren’t listed here; compiler fixes tend to be specific to certain arcane code patterns. We recently blogged about compiler optimization and build throughput improvements in VS 2019, and we maintain a documentation page about compiler conformance improvements in VS 2019.)

New Features:

Implemented P1164 from C++20 unconditionally. This changes std::filesystem::create_directory to check whether the target was already a directory on failure. Previously, all ERROR_ALREADY_EXISTS type errors were turned into success-but-directory-not-created codes.
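A minimal sketch of the directory case described above, assuming a writable temporary directory. The helper names (CreateOutcome, try_create_directory, demo_create_directory) and the scratch path are made up for illustration. Under P1164, a target that exists but is not a directory should surface as an error instead; that case is not exercised here.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Outcome of one create_directory call (hypothetical helper type).
struct CreateOutcome {
    bool created;  // true only if this call made a new directory
    bool error;    // true if the call reported a failure
};

// Non-throwing wrapper so both results can be inspected side by side.
inline CreateOutcome try_create_directory(const fs::path& p) {
    std::error_code ec;
    const bool created = fs::create_directory(p, ec);
    return {created, static_cast<bool>(ec)};
}

// Creates a scratch directory twice. The second call should neither create
// anything nor report an error, because the target is already a directory.
inline void demo_create_directory() {
    const fs::path dir = fs::temp_directory_path() / "p1164_demo_directory";
    fs::remove_all(dir);

    const CreateOutcome first = try_create_directory(dir);
    const CreateOutcome second = try_create_directory(dir);
    fs::remove_all(dir);

    assert(first.created && !first.error);
    assert(!second.created && !second.error);
}
```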

The iterator debugging feature has been taught to properly unwrap std::move_iterator. For example, std::copy(std::move_iterator<std::vector<int>::iterator>, std::move_iterator<std::vector<int>::iterator>, int*) can now engage our memcpy fast path.
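The call shape mentioned above looks roughly like this. The function names are hypothetical, and whether a given library actually lowers the call to memcpy is an implementation detail the code cannot observe; the sketch only checks the copied values.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Copies every element of src to dst through move_iterators. For int,
// "moving" is the same as copying, which is what makes a bulk copy legal.
inline void copy_through_move_iterators(std::vector<int>& src, int* dst) {
    std::copy(std::make_move_iterator(src.begin()),
              std::make_move_iterator(src.end()), dst);
}

inline void demo_move_iterator_copy() {
    std::vector<int> src{1, 2, 3};
    int dst[3] = {};
    copy_through_move_iterators(src, dst);
    assert(dst[0] == 1 && dst[1] == 2 && dst[2] == 3);
}
```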

The standard library’s macroized keyword enforcement <xkeycheck.h> was fixed to emit the actual problem keyword detected rather than a generic message, look for C++20 keywords, and avoid tricking IntelliSense into saying random keywords were macros.

We added LWG 2221’s operator<<(std::ostream, nullptr_t) for writing nullptrs to streams.
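A small sketch of what this enables, assuming a C++17 library that provides the LWG 2221 overload. The function names are made up, and the text emitted for nullptr is implementation-defined, so the sketch only checks the prefix it wrote itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams a nullptr literal. Without the nullptr_t overload this call is
// ambiguous among the pointer-like overloads and fails to compile.
inline std::string stream_nullptr() {
    std::ostringstream os;
    os << "value: " << nullptr;
    return os.str();
}

inline void demo_stream_nullptr() {
    const std::string text = stream_nullptr();
    // Only the prefix is portable; the rest is chosen by the implementation.
    assert(text.rfind("value: ", 0) == 0);
}
```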

Parallel versions of is_sorted, is_sorted_until, is_partitioned, set_difference, set_intersection, is_heap, and is_heap_until were implemented by Miya Natsuhara.

P0883 “Fixing atomic initialization”, which changes std::atomic to value-initialize the contained T rather than default-initializing it, was implemented (also by Miya) when using Clang/LLVM with our standard library. This is currently disabled for C1XX as a workaround for a bug in constexpr processing.

Implemented the “spaceship” three-way comparison operator from P0515 “Consistent comparison”, with partial support for the C++20 <compare> header as specified in P0768 (specifically, the comparison category types and common_comparison_category type trait, but not the comparison algorithms which are undergoing some redesign in WG21). (Implemented by Cameron DaCamara in the compiler.)

Implemented the new C++20 P1008 rules for aggregates: a type with a user-declared constructor – even when defaulted or deleted so as not to be user-provided – is not an aggregate. (Implemented by Andrew Marino in the compiler.)
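To make the aggregate rule concrete, here is a sketch with two made-up types. The value of guarded_is_aggregate depends on which rules the compiler applies (true under C++17, false once P1008 is in effect), so only the uncontroversial case is checked.

```cpp
#include <type_traits>

// No user-declared constructors: an aggregate under both C++17 and C++20.
struct Plain {
    int x;
    int y;
};

// The deleted default constructor is user-declared but not user-provided.
// C++17 still treats this type as an aggregate, so Guarded{} compiles there
// despite the deleted constructor; under P1008 it is no longer an aggregate.
struct Guarded {
    Guarded() = delete;
    int x;
};

constexpr bool plain_is_aggregate = std::is_aggregate_v<Plain>;
constexpr bool guarded_is_aggregate = std::is_aggregate_v<Guarded>;  // mode-dependent

// Aggregate initialization of Plain works in either mode.
constexpr Plain make_plain() { return Plain{1, 2}; }
```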

Implemented the remove_cvref and remove_cvref_t type traits from P0550, which are handy for stripping reference-ness and cv-qualification but without decaying functions and arrays to pointers (which std::decay and std::decay_t do).
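The difference from std::decay is easiest to see on arrays and functions. The sketch below uses a hand-rolled alias, strip_cvref_t (a hypothetical name), so that it also compiles in C++17 mode; it is intended to behave like std::remove_cvref_t.

```cpp
#include <type_traits>

// Hand-rolled stand-in for std::remove_cvref_t: drop the reference first,
// then the top-level cv-qualifiers.
template <class T>
using strip_cvref_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Both approaches agree on ordinary object types.
static_assert(std::is_same_v<strip_cvref_t<const int&>, int>);
static_assert(std::is_same_v<std::decay_t<const int&>, int>);

// Arrays stay arrays here, while decay turns them into pointers.
static_assert(std::is_same_v<strip_cvref_t<int (&)[3]>, int[3]>);
static_assert(std::is_same_v<std::decay_t<int (&)[3]>, int*>);

// Function types are preserved, while decay produces a function pointer.
static_assert(std::is_same_v<strip_cvref_t<void (&)()>, void()>);
static_assert(std::is_same_v<std::decay_t<void (&)()>, void (*)()>);
```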

C++17 <charconv> floating-point to_chars() has been improved: shortest chars_format::fixed is 60-80% faster (thanks to Ulf Adams at Google for suggesting long division), and shortest/precision chars_format::hex is complete. Further performance improvements for shortest fixed notation have been implemented and will ship in a future VS 2019 update, along with the decimal precision overloads that will complete the <charconv> implementation.

C++20 P0941 feature-test macros are now completely supported in the compiler and STL, including __has_cpp_attribute implemented by Phil Christensen. As a reminder, the feature-test macros are always active (i.e. defined or not defined, depending on the availability of the feature in question) regardless of the Standard mode option selected, because making them conditional on /std:c++latest would largely defeat their purpose.
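One way code might use these macros is to pick an implementation at preprocessing time. In this sketch the DEMO_ macros and compare_ints are hypothetical names; __has_cpp_attribute, __has_include, and __cpp_impl_three_way_comparison are the standard facilities being tested.

```cpp
// __has_cpp_attribute is itself a feature-test facility, so check for it first.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(nodiscard)
#    define DEMO_NODISCARD [[nodiscard]]
#  endif
#endif
#ifndef DEMO_NODISCARD
#  define DEMO_NODISCARD
#endif

// Use operator<=> only when the compiler reports it and <compare> exists.
#if defined(__cpp_impl_three_way_comparison) && defined(__has_include)
#  if __has_include(<compare>)
#    include <compare>
#    define DEMO_HAS_SPACESHIP 1
#  endif
#endif

// Returns -1, 0, or 1, using the spaceship operator where it is available
// and plain relational operators otherwise.
DEMO_NODISCARD constexpr int compare_ints(int a, int b) {
#if defined(DEMO_HAS_SPACESHIP)
    const auto order = a <=> b;
    return order < 0 ? -1 : (order > 0 ? 1 : 0);
#else
    return a < b ? -1 : (a > b ? 1 : 0);
#endif
}
```

Because the macros are defined (or not) regardless of the selected Standard mode, the same source picks the right branch on each toolset without version-number checks.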

Correctness Fixes:

std::allocator<void>, std::allocator::size_type, and std::allocator::difference_type have been un-deprecated.

A spurious static_cast not called for by the standard that accidentally suppressed C4244 narrowing warnings was removed from std::string. Attempting to call std::string::string(const wchar_t*, const wchar_t*) will now properly emit C4244 “narrowing a wchar_t into a char.”

Fixed std::filesystem::last_write_time failing when attempting to change a directory’s last write time.

std::filesystem::directory_entry’s constructor was changed to store a failed result, rather than throwing an exception, when supplied a nonexistent target path.

std::filesystem::create_directory’s 2-parameter version was changed to call the 1-parameter version, as the underlying CreateDirectoryExW function would perform copy_symlink when the existing_p was a symlink.

std::filesystem::directory_iterator no longer fails when encountering a broken symlink.

std::filesystem::space now accepts relative paths.

std::filesystem::path::lexically_relative is no longer confused by trailing slashes, reported as LWG 3096.

Worked around CreateSymbolicLinkW rejecting paths with forward slashes in std::filesystem::create_symlink.

Worked around the POSIX deletion mode delete function existing on Windows 10 LTSB 1609 but not actually being capable of deleting files.

std::boyer_moore_searcher and std::boyer_moore_horspool_searcher’s copy constructors and copy assignment operators now actually copy things.

The parallel algorithms library now properly uses the real WaitOnAddress family on Windows 8 and later, rather than always using the Windows 7 and earlier fake versions.

std::system_category::message() now trims trailing whitespace from the returned message.

Some conditions that would cause std::linear_congruential_engine to trigger divide by 0 have been fixed.

The iterator unwrapping machinery we first exposed for programmer-user integration in VS 2017 15.8 (as described in https://devblogs.microsoft.com/cppblog/stl-features-and-fixes-in-vs-2017-15-8/ ) no longer unwraps iterators derived from standard library iterators. For example, a user that derives from std::vector<int>::iterator and tries to customize behavior now gets their customized behavior when calling standard library algorithms, rather than the behavior of a pointer.

The unordered container reserve function now actually reserves for N elements, as described in LWG 2156.
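The guarantee can be observed from user code: after reserve(N), inserting N elements should not trigger a rehash, so the bucket count stays put. The function names below are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Returns true if inserting n distinct keys after reserve(n) left the
// bucket count unchanged, i.e. no rehash happened along the way.
inline bool reserve_prevents_rehash(std::size_t n) {
    std::unordered_set<std::size_t> keys;
    keys.reserve(n);
    const std::size_t buckets_before = keys.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        keys.insert(i);
    }
    return keys.size() == n && keys.bucket_count() == buckets_before;
}

inline void demo_reserve() {
    assert(reserve_prevents_rehash(1000));
}
```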

Many STL internal container functions have been made private for an improved IntelliSense experience. Additional fixes to mark members as private are expected in subsequent releases of MSVC.

Times passed to the concurrency library that would overflow (e.g. condition_variable::wait_for(seconds::max())) are now properly dealt with instead of causing overflows that changed behavior on a seemingly random 29-day cycle (when uint32_t milliseconds accepted by underlying Win32 APIs overflowed).

Exception safety correctness problems wherein the node-based containers like list, map, and unordered_map would become corrupted were fixed. During a propagate_on_container_copy_assignment or propagate_on_container_move_assignment reassignment operation, we would free the container’s sentinel node with the old allocator, do the POCCA/POCMA assignment over the old allocator, and then try to acquire the sentinel node from the new allocator. If this allocation failed, the container was left corrupted and couldn’t even be destroyed, as owning a sentinel node is a hard data structure invariant. This was fixed to allocate the new sentinel node from the source container’s allocator before destroying the existing sentinel node.

The containers were fixed to always copy/move/swap allocators according to propagate_on_container_copy_assignment, propagate_on_container_move_assignment, and propagate_on_container_swap, even for allocators declared is_always_equal.

std::basic_istream::read was fixed to not write into parts of the supplied buffer temporarily as part of \r\n => \n processing. This gives up some of the performance advantage we gained in VS 2017 15.8 for reads larger than 4k in size, but efficiency improvements from avoiding 3 virtual calls per character are still present.

std::bitset’s constructor no longer reads the ones and zeroes in reverse order for large bitsets.
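For reference, the ordering the string constructor is expected to follow is that the last character maps to bit 0. A small sketch with a bitset wider than one machine word (the 128-bit width and the function names are arbitrary choices for illustration):

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Parses a string of '0' and '1' characters into a 128-bit bitset.
inline std::bitset<128> parse_bits(const std::string& text) {
    return std::bitset<128>(text);
}

inline void demo_bitset_order() {
    // 71 characters: a leading '1', 69 zeroes, and a trailing '1'.
    const std::string text = "1" + std::string(69, '0') + "1";
    const std::bitset<128> bits = parse_bits(text);
    assert(bits[0]);   // the last character is the lowest bit
    assert(bits[70]);  // the first character is the highest bit supplied
    assert(bits.count() == 2);
}
```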

When implementing P0083 “Splicing Maps And Sets”, we managed to overlook the fact that the merge and extract members of the associative containers should have overloads that accept rvalue containers in addition to the overloads that accept lvalue containers. We’ve rectified this oversight by implementing the rvalue overloads.
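A short sketch of merge with rvalue sources, once from a temporary and once from a std::move’d container. Table, make_defaults, and merged_example are made-up names for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Table = std::map<int, std::string>;

inline Table make_defaults() {
    return {{1, "one"}, {2, "two"}};
}

// Merges from two rvalue sources. Keys already present in the destination
// keep their existing value; only the missing nodes are spliced in.
inline Table merged_example() {
    Table dst{{2, "deux"}};
    dst.merge(make_defaults());   // source is a temporary
    Table extra{{3, "three"}};
    dst.merge(std::move(extra));  // source is an xvalue
    return dst;
}

inline void demo_merge() {
    const Table t = merged_example();
    assert(t.size() == 3);
    assert(t.at(2) == "deux");
}
```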

With the advent of the Just My Code stepping feature, we no longer need to provide bespoke machinery for std::function and std::visit to achieve the same effect. Removing that machinery largely has no user-visible effects, except that the compiler will no longer produce diagnostics that indicate issues on line 15732480 or 16707566 of <type_traits> or <variant>.

The <ctime> header now correctly declares timespec and timespec_get in namespace std in addition to declaring them in the global namespace.

We’ve fixed a regression in pair’s assignment operator introduced when implementing LWG 2729 “Missing SFINAE on std::pair::operator=”; it now correctly accepts types convertible to pair again.
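The kind of code affected looks roughly like the following, where a user type (Point, a hypothetical example) converts to a pair and is then assigned to one. This is a sketch of the pattern, not the exact case from the bug report.

```cpp
#include <cassert>
#include <utility>

// A user type that is implicitly convertible to std::pair<int, int>.
struct Point {
    int x;
    int y;
    operator std::pair<int, int>() const { return {x, y}; }
};

// Assigns a Point to an existing pair. The conversion operator runs first,
// and the resulting pair is then assigned by pair's own operator=.
inline std::pair<int, int> assign_from_point(const Point& pt) {
    std::pair<int, int> result{0, 0};
    result = pt;
    return result;
}

inline void demo_pair_assignment() {
    const std::pair<int, int> p = assign_from_point(Point{3, 4});
    assert(p.first == 3 && p.second == 4);
}
```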

Fixed a minor type traits bug, where add_const_t etc. is supposed to be a non-deduced context (i.e. it needs to be an alias for typename add_const<T>::type, not const T).
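Why the non-deduced context matters: it lets one parameter drive deduction while the other merely converts. In this sketch same_value is a hypothetical function; with a plain const T& second parameter, the call same_value(1, 1L) would fail with conflicting deductions (int versus long).

```cpp
#include <type_traits>

// T is deduced from lhs only. The second parameter names T through
// add_const_t, a non-deduced context, so its argument is converted to T.
template <class T>
constexpr bool same_value(const T& lhs, std::add_const_t<T>& rhs) {
    return lhs == rhs;
}

// T = int comes from the first argument; the long is converted to int.
static_assert(same_value(1, 1L));
```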

Header Inclusion Restructuring:

The standard library’s physical design was substantially overhauled to avoid including headers when they are not necessary. A large number of customers want to use standard library containers but don’t want to use the iostreams and locales. However, the C++ standard library has a circular dependency among components:

Systems that depend on <locale> facets want to use std::string as part of their underlying implementations.

std::string wants a stream insertion operator, which depends on std::ostream, which depends on <locale>.

Historically our standard library worked around this problem by introducing a lower level header <xstring>, which defined std::string, but none of the other contents of <string>. <xstring> would be included by both <locale> components, and by <string>, restoring the directed dependency graph. However, this approach had a number of problems:

#include <string> needed to drag in all of the iostreams machinery to provide the stream insertion operator, even though most translation units wouldn’t use the stream insertion operator.

If someone included only <ostream> they got std::basic_string and std::ostream, but they did not get std::basic_string’s stream insertion operator, the std::string typedef, or the string literals. Customers found this extremely confusing. For example, if one tried to stream insert a std::basic_string after including only <ostream>, the compiler would print an incredibly long diagnostic saying operator<< couldn’t be found, listing 26 unrelated overloads. Also, attempts to use std::string_literals, std::to_string, or other <string> components, would fail, which is confusing when std::basic_string was otherwise available.

In VS 2019, we resolve the circular reference completely differently. The stream insertion operator now finds the necessary ostream components using argument-dependent lookup, allowing us to provide it in the same place as string. This restores appropriate layering (of std::string below <locale> components), and makes it possible to use <string> without dragging in the entire mass of iostreams machinery.

This matters if you have lots of .cpp files that include <string> and do something simple, for example:

#include <stdio.h>
#include <string>
void f(const std::string& s) { puts(s.c_str()); }

In VS 2017 15.9 this program takes 244 milliseconds to compile on a 7980XE test machine (average of 5 runs), while in VS 2019 16.0 it takes only 178 milliseconds (or about 73% of the time).

Moreover, seemingly unrelated headers like <vector> were pulled into this mess. For example, vector wants to throw std::out_of_range, which derives from std::logic_error, which has a constructor that takes a std::string. We already had out-of-line functions for all throw sites, so the spurious include of <stdexcept> in <vector> was unnecessary and has been removed. The following program used to take 177 milliseconds to compile in VS 2017 15.9, but now only needs 151 milliseconds (85% of the time):

#include <vector>
void f(std::vector<int>& v) { v.push_back(42); }

The one downside of this change is that several programs that were getting away with not including the correct headers may need to add #includes. If you were saying std::out_of_range before, you may need to #include <stdexcept>. If you were using a stream insertion operator, you may now need to #include <ostream>. This way, only translation units actually using <stdexcept> or <ostream> components pay the throughput cost to compile them.
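As an illustration of the fix-up, a translation unit that throws std::out_of_range and streams a std::string would spell out both headers rather than relying on <vector> or <string> to bring them in. The function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>    // stream insertion machinery
#include <sstream>    // only for the demo below
#include <stdexcept>  // std::out_of_range
#include <string>
#include <vector>

// Bounds-checked access that names std::out_of_range directly, so this
// file includes <stdexcept> itself.
inline int checked_get(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size()) {
        throw std::out_of_range("index " + std::to_string(i) + " is out of range");
    }
    return v[i];
}

// Uses the string stream insertion operator, so <ostream> is included too.
inline void print_name(std::ostream& os, const std::string& name) {
    os << name;
}

inline void demo_explicit_includes() {
    std::ostringstream os;
    print_name(os, "meow");
    assert(os.str() == "meow");
    assert(checked_get({10, 20}, 1) == 20);
}
```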

Performance and Throughput Improvements:

if constexpr was applied in more places in the standard library for improved throughput and reduced code size in the copy family, permutations like reverse and rotate, and in the parallel algorithms library.

The STL now internally uses if constexpr to reduce compile times even in C++14 mode.
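The general technique, sketched outside the library: a single function template picks its strategy with if constexpr instead of tag dispatch or overload sets, so only the chosen branch is instantiated. copy_n_elements is a made-up name and this is not the STL’s actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Copies n elements from src to dst (the ranges must not overlap).
// Trivially copyable types take a bulk memcpy; everything else is
// assigned element by element. The untaken branch is discarded.
template <class T>
void copy_n_elements(const T* src, std::size_t n, T* dst) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        if (n != 0) {
            std::memcpy(dst, src, n * sizeof(T));
        }
    } else {
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = src[i];
        }
    }
}

inline void demo_if_constexpr_copy() {
    const int ints[3] = {1, 2, 3};
    int out[3] = {};
    copy_n_elements(ints, 3, out);
    assert(out[2] == 3);
}
```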

The runtime dynamic linking detection for the parallel algorithms library no longer uses an entire page to store the function pointer array, as marking this memory read-only was deemed no longer relevant for security purposes.

std::thread’s constructor no longer waits for the thread to start, and no longer inserts so many layers of function calls between the underlying C library _beginthreadex and the supplied callable object. Previously std::thread put 6 functions between _beginthreadex and the supplied callable object, which has been reduced to only 3 (2 of which are just std::invoke). This also resolves an obscure timing bug where std::thread’s constructor would hang if the system clock changed at the exact moment a std::thread was being created.

Fixed a performance regression in std::hash that we introduced when implementing std::hash<std::filesystem::path>.

Several places where the standard library used to achieve correctness with catch blocks now use destructors instead. This results in better debugger interaction: exceptions you throw through the standard library in the affected locations will now show up as being thrown from their original throw site, rather than our rethrow. Not all standard library catch blocks have been eliminated; we expect the number of catch blocks to be reduced in subsequent releases of MSVC.
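The pattern in question, shown on user code rather than the library’s own: a guard object’s destructor performs the cleanup that a catch-and-rethrow block would otherwise do, so the exception is never caught and rethrown in between. PopBackGuard and push_then are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Undoes one push_back unless commit() is called before destruction.
class PopBackGuard {
public:
    explicit PopBackGuard(std::vector<int>& v) : target_(&v) {}
    PopBackGuard(const PopBackGuard&) = delete;
    PopBackGuard& operator=(const PopBackGuard&) = delete;
    ~PopBackGuard() {
        if (target_ != nullptr) {
            target_->pop_back();
        }
    }
    void commit() noexcept { target_ = nullptr; }

private:
    std::vector<int>* target_;
};

// Appends value, then runs step(). If step() throws, the guard's destructor
// rolls the append back during unwinding; there is no catch block, so a
// debugger sees the exception coming from its original throw site.
template <class Fn>
void push_then(std::vector<int>& v, int value, Fn&& step) {
    v.push_back(value);
    PopBackGuard guard(v);
    step();
    guard.commit();
}

inline void demo_guard() {
    std::vector<int> v;
    push_then(v, 1, [] {});
    assert(v.size() == 1);
}
```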

Suboptimal codegen in std::bitset caused by a conditional throw inside a noexcept function was fixed by factoring out the throwing path.
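Factoring out a throwing path generally means moving the throw expression into a separate [[noreturn]] function so the hot function stays small. The sketch below illustrates that general technique with made-up names (throw_bad_position, test_bit); it is not the actual std::bitset change.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Cold path: constructing and throwing the exception lives here, keeping
// that code out of every caller.
[[noreturn]] inline void throw_bad_position() {
    throw std::out_of_range("invalid bit position");
}

// Hot path: a single compare-and-branch, then the actual bit test.
inline bool test_bit(unsigned long long word, std::size_t pos) {
    if (pos >= 64) {
        throw_bad_position();
    }
    return ((word >> pos) & 1u) != 0;
}

inline void demo_test_bit() {
    assert(test_bit(5, 0));
    assert(!test_bit(5, 1));
    assert(test_bit(5, 2));
}
```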

The std::list and std::unordered_meow family use non-debugging iterators internally in more places.

Several std::list members were changed to reuse list nodes where possible rather than deallocating and reallocating them. For example, given a list<int> that already has a size of 3, a call to assign(4, 1729) will now overwrite the ints in the first 3 list nodes, and allocate one new list node with the value 1729, rather than deallocating all 3 list nodes and then allocating 4 new list nodes with the value 1729.

All locations where the standard library was calling erase(begin(), end()) were changed to call clear() instead.

std::vector now initializes and erases elements more efficiently in certain cases.

<variant> has been refactored to make it more optimizer-friendly, resulting in smaller and faster generated code. Most notably, std::visit and the inliner have now become good friends.

We’ve applied clang-format to the STL’s headers for improved readability. (There were additional manual changes, e.g. adding braces to all control flow.)

Reporting Bugs:

Please let us know what you think about VS 2019. You can report bugs via the IDE’s Report A Problem and also via the web, at the Developer Community’s C++ tab.

Billy O’Neal, Casey Carter, and Stephan T. Lavavej