Good news: strong types are (mostly) free in C++

Strong types are a simple and efficient tool for improving code expressiveness: they let you express your intentions better, both to the compiler and to your fellow human companions.

This post is part of the series about strong types, which keeps growing because it is such a rich topic.

A question that comes to mind fairly quickly when reading about strong types is: how much will they cost in terms of performance? Should I stay away from strong types in the areas of the codeline that are really sensitive to performance, and therefore forgo their benefits in terms of code clarity?

The suspicion

The proposed implementation of strong types that we saw was using a generic wrapper:

    template <typename T, typename Parameter>
    class NamedType
    {
    public:
        explicit NamedType(T const& value) : value_(value) {}
        T& get() { return value_; }
        T const& get() const { return value_; }
    private:
        T value_;
    };

…that could be declared for a specific type the following way:

    using Width = NamedType<double, struct WidthTag>;
    using Height = NamedType<double, struct HeightTag>;

and that could be used in an interface this way:

    class Rectangle
    {
    public:
        Rectangle(Width, Height);
        ....
    };

and at call site:

    Rectangle r(Width(10), Height(12));

We even saw how you could easily fit units in there in this post about strong types, but our purpose for performance here can be served with just the above example.
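As a quick reminder of what the wrapper buys us, here is a minimal sketch of what the compiler accepts and rejects with the definitions above. It repeats NamedType so that it stands on its own; the area and demoArea functions are hypothetical helpers introduced only for this illustration, not part of the original interface.

```cpp
#include <type_traits>

template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

// Hypothetical helper, only here to have a strongly typed interface to call.
inline double area(Width width, Height height)
{
    return width.get() * height.get();
}

// Width and Height wrap the same underlying type but are distinct types...
static_assert(!std::is_same<Width, Height>::value, "distinct types expected");
// ...that do not convert into each other...
static_assert(!std::is_convertible<Width, Height>::value, "no Width -> Height");
static_assert(!std::is_convertible<Height, Width>::value, "no Height -> Width");
// ...and a raw double never silently becomes a Width (explicit constructor).
static_assert(!std::is_convertible<double, Width>::value, "no implicit wrapping");
// Wrapping explicitly is of course allowed.
static_assert(std::is_constructible<Width, double>::value, "explicit wrapping works");

inline double demoArea()
{
    // area(Height(12), Width(10)); // would not compile: arguments swapped
    // area(10, 12);                // would not compile: constructor is explicit
    return area(Width(10), Height(12));
}
```

All the checks here happen at compile time, which is why it is reasonable to hope that nothing of them remains at run time.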

The suspected costs related to the usage of strong types are simple:

- allocating stack space for the Width object,
- constructing it from the passed int,
- calling .get() to retrieve the underlying value, incurring a copy of a reference,
- destructing the Width object,
- potentially having several Width objects around during parameter passing,
- and the same costs for the Height object.
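Some of these suspicions can be probed before looking at any assembly. The sketch below checks a few static properties of Width with standard type traits. It does not prove that an optimizer will remove the wrapper, but it rules out hidden work such as a non-trivial destructor. The size check is an assumption that holds on mainstream compilers rather than a guarantee of the standard, and roundTrip is a hypothetical helper added for the illustration.

```cpp
#include <type_traits>

template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;

// Assumption: true on mainstream ABIs, not mandated by the standard.
// The wrapper takes no more room than the double it holds.
static_assert(sizeof(Width) == sizeof(double), "no space overhead expected");
static_assert(alignof(Width) == alignof(double), "same alignment expected");

// Destroying a Width runs no code.
static_assert(std::is_trivially_destructible<Width>::value, "trivial destructor");

// Copying a Width (for instance when passing it by value) is a plain copy of its bytes.
static_assert(std::is_trivially_copyable<Width>::value, "trivial copy");

// Hypothetical helper: wrap a value and read it back through get().
inline double roundTrip(double value)
{
    return Width(value).get();
}
```

If one of these assertions fails on a given platform, that is a hint that the wrapper might not be as transparent there as this post suggests.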

The question is: how much will this cost? What is the price to pay for expressiveness?

Essentially, it’s free

One easy way to measure the performance impact of the usage of strong types is to compare the generated assembly with what is obtained by using the primitive types.

So we’ll compile the following class:

    class StrongRectangle
    {
    public:
        StrongRectangle (Width width, Height height) : width_(width.get()), height_(height.get()) {}
        double getWidth() const {return width_;}
        double getHeight() const {return height_;}

    private:
        double width_;
        double height_;
    };

versus the native version:

    class Rectangle
    {
    public:
        Rectangle (double width, double height) : width_(width), height_(height) {}
        double getWidth() const {return width_;}
        double getHeight() const {return height_;}

    private:
        double width_;
        double height_;
    };

with the following calling code:

    int main()
    {
        double width;
        std::cin >> width;
        double height;
        std::cin >> height;

        //Rectangle r(width, height);
        //StrongRectangle r((Width(width)), (Height((height))));

        std::cout << r.getWidth() << r.getHeight();
    }

by putting in either of the two calls to the classes' constructors. Note the extra parentheses that disambiguate the call to the StrongRectangle constructor from a function declaration. They are really annoying and are just another manifestation of the most vexing parse in C++. Note that the only case where this happens is when passing named variables to a constructor with strong types. Passing literals like numbers, or calling a function that is not a constructor, doesn’t need such extra parentheses.
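One possible way around the extra parentheses, assuming C++11 or later, is brace initialization, which cannot be parsed as a function declaration. The sketch below repeats the definitions from this post so that it is self-contained; sumOfSides is a hypothetical helper standing in for the body of main.

```cpp
template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

class StrongRectangle
{
public:
    StrongRectangle(Width width, Height height) : width_(width.get()), height_(height.get()) {}
    double getWidth() const { return width_; }
    double getHeight() const { return height_; }

private:
    double width_;
    double height_;
};

// Hypothetical helper standing in for the calling code.
inline double sumOfSides(double width, double height)
{
    // StrongRectangle r(Width(width), Height(height));
    // The line above would declare a function named r (most vexing parse),
    // so the calls to r.getWidth() below would not compile.

    // With braces, this can only be an object definition.
    StrongRectangle r{Width{width}, Height{height}};
    return r.getWidth() + r.getHeight();
}
```

Braces behave like parentheses here because NamedType has no initializer-list constructor; for a wrapper that has one, the two forms could select different constructors.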

Here is the assembly generated by clang 3.9.1 in -O2 on the very popular online compiler godbolt.org, for the version using primitive types:

    main:                                   # @main
            sub     rsp, 24
            lea     rsi, [rsp + 16]
            mov     edi, std::cin
            call    std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
            lea     rsi, [rsp + 8]
            mov     edi, std::cin
            call    std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
            movsd   xmm0, qword ptr [rsp + 16] # xmm0 = mem[0],zero
            movsd   xmm1, qword ptr [rsp + 8] # xmm1 = mem[0],zero
            movsd   qword ptr [rsp], xmm1   # 8-byte Spill
            mov     edi, std::cout
            call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
            mov     rdi, rax
            movsd   xmm0, qword ptr [rsp]   # 8-byte Reload
            call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
            xor     eax, eax
            add     rsp, 24
            ret

    _GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
            push    rax
            mov     edi, std::__ioinit
            call    std::ios_base::Init::Init()
            mov     edi, std::ios_base::Init::~Init()
            mov     esi, std::__ioinit
            mov     edx, __dso_handle
            pop     rax
            jmp     __cxa_atexit            # TAILCALL

You don’t even need to look at the code in detail: what we want to know is whether or not the strong type example generates more code than the primitive one.

And re-compiling by commenting out the primitive type and putting in the strong type gives… exactly the same generated assembly.

So no cost for the strong type. The holy zero-cost abstraction. The grail of modern C++. All the code related to the wrapping of strong types was simple enough for the compiler to understand that there was nothing to do with it in production code, and that it could be completely optimized away.

Except this was compiled in -O2.

Compiling in -O1 doesn’t give the same result with clang. Showing the exact generated assembly code is of little interest for the purpose of this post (you can have a look on godbolt if you’re interested), but it was noticeably bigger.

Note however that when compiling with gcc, the strong type machinery was optimized away with both -O2 and -O1.

What to think of this?

We can draw several conclusions from this experiment.

First, this implementation of strong types is compatible with compiler optimizations. If your optimization level is high enough, then the code related to strong types never makes it to a production binary. This leaves you with all the advantages related to the expressiveness of strong types, for free.

Second, “high enough” depends on the compiler. In this experiment, we saw that gcc did away with the code in -O1, while clang did it only in -O2.

Lastly, even if the code is not optimized away because your binary is not compiled aggressively enough, all hope is not lost. The 80-20 rule (some even say 90-10) means that in general, the vast majority of a codeline will matter little for performance. So when there is a very small likelihood of strong types being detrimental to performance, but a 100% likelihood that they will benefit the expressiveness and robustness of your code, the decision is quickly made. And it can still be reverted after profiling anyway.
