As stated in a previous post, the final keyword enables the sealing of classes and methods. This is important because it allows interesting compile-time checks, but it also enables quite a powerful optimization: devirtualization.

Devirtualization happens when the compiler can statically decide, at compile time, which function should be called, so it can produce a direct call to that function, or even inline it. This can happen in C++98/03, but it often requires link-time and/or whole-program optimization for the compiler to deduce whether a method is overridden anywhere, and in general this is very hard.

Consider the following piece of pre-C++11 code:

    class A {
    public:
        virtual int value() { return 1; }
    };

    class B : public A {
    public:
        int value() { return 2; }
    };

    int test(B* b) { return b->value() + 11; }

When compiling the call to value() in test(B* b), the compiler generally can’t determine whether b’s real type is B or a (yet) unknown subclass of B, so the generated code will contain an indirect call (a virtual call to value()). In C++11 you can optionally add the override keyword, but this doesn’t influence code generation.
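To make the problem concrete, here is a minimal sketch of the situation the compiler has to allow for. The class C and its return value 3 are hypothetical and not part of the original example; imagine C living in another translation unit that the compiler never sees while compiling test().

```cpp
class A {
public:
    virtual int value() { return 1; }
};

class B : public A {
public:
    int value() { return 2; }
};

// Hypothetical subclass, assumed to be defined in some other
// translation unit that is not visible when test() is compiled.
class C : public B {
public:
    int value() { return 3; }
};

// The same function must give the right answer for B and for C,
// so the call to value() has to go through the vtable.
int test(B* b) { return b->value() + 11; }
```

With these definitions, passing a pointer to a B gives 13, while passing a pointer to a C gives 14, which is why the compiler can’t simply hard-code B::value() here.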

When, on the other hand, you add a final specifier on either the method (1)

    class B : public A {
    public:
        int value() final { return 2; }
    };

or the whole class (2)

    class B final : public A {
    public:
        int value() override { return 2; }
    };

the compiler will know that no subclass can override the method (1), or that no subclass can exist at all (2), and can devirtualize the call, potentially inlining and optimizing the whole function until it becomes the equivalent of a simple return 13;.
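Both variants can be put side by side in one file to try them in a compiler. This is only a sketch: the names B1, B2, test_method_final and test_class_final are made up here so that the two versions can coexist, and they don’t appear in the original example.

```cpp
class A {
public:
    virtual int value() { return 1; }
};

// Variant (1): only the method is sealed. B1 can still be subclassed,
// but no subclass may override value().
class B1 : public A {
public:
    int value() final { return 2; }
};

// Variant (2): the whole class is sealed. Nothing can derive from B2.
class B2 final : public A {
public:
    int value() override { return 2; }
};

// In both functions the static type of the pointer is enough to know
// which value() will run, so the compiler is allowed (not required)
// to call it directly or inline it.
int test_method_final(B1* b) { return b->value() + 11; }
int test_class_final(B2* b) { return b->value() + 11; }

// Uncommenting either of these should be rejected at compile time:
// class D1 : public B1 { public: int value() { return 3; } };
// class D2 : public B2 {};
```

Both functions return 13 whether or not the optimization is applied; what changes is the generated code, not the observable behaviour.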

Without devirtualization, Visual Studio 2015 (Update 3) produces the following assembly code:

            sub     rsp, 40
            mov     rax, qword ptr [rcx]
            call    qword ptr [rax]
            add     eax, 11
            add     rsp, 40
            ret

GCC 6.2 without devirtualization produces this:

            mov     rax, qword ptr [rdi]
            mov     rdx, qword ptr [rax]
            cmp     rdx, offset flat:B::value()
            jne     .L12
            mov     eax, 13                 ; (*)
            ret
    .L12:
            sub     rsp, 8
            call    rdx
            add     rsp, 8
            add     eax, 11
            ret

and Clang 3.9.0 produces this:

            push    rax
            mov     rax, qword ptr [rdi]
            call    qword ptr [rax]
            add     eax, 11
            pop     rcx
            ret

All the compilers produce exactly the same code when devirtualization happens:

            mov     eax, 13
            ret

One note on GCC: as you can see, its code is more complex, because in practice it performed a partial devirtualization: the compiled code has a special case for calls that resolve to B::value(), for which it produces the result 13 immediately (*). This is probably based on the fact that the virtual method implementation is trivial, and on the compiler’s assumption that the calls will often be directed to B::value() (probably deduced from the lack of any other subclass in the compilation unit).
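In source terms, the GCC output behaves roughly like the sketch below. This is only an approximation: GCC compares the function pointer stored in the vtable against the address of B::value(), which portable C++ cannot express, so a typeid check stands in for it here. The two are not identical, since GCC’s fast path would also be taken for a subclass that doesn’t override value(). The names test_speculative and C are hypothetical.

```cpp
#include <typeinfo>

class A {
public:
    virtual int value() { return 1; }
};

class B : public A {
public:
    int value() { return 2; }
};

// Hypothetical subclass, used only to exercise the slow path.
class C : public B {
public:
    int value() { return 3; }
};

int test_speculative(B* b) {
    // Fast path: the guess "this is really a B" turned out to be right,
    // so B::value() is inlined and folded into a constant.
    if (typeid(*b) == typeid(B)) {
        return 2 + 11;
    }
    // Slow path: the guess was wrong, fall back to a real virtual call.
    return b->value() + 11;
}
```

The result is the same as for the plain test() function in every case; the guarded fast path only saves the indirect call when the guess is correct.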

You can see the generated code for several other versions of GCC and Clang by using Godbolt’s Compiler Explorer. You will discover that this optimization happens in all C++11-enabled versions of GCC (4.7+) and Clang (3.0+), even on non-x86/x64 platforms.

Curiously, I couldn’t make the Intel compiler optimize the code as all the other compilers did; maybe I’m missing some important switch for icc. Please let me know in the comments if you know how to enable devirtualization on the Intel compiler too.

Update 1: Devirtualization also happens in Visual Studio 2013 (Update 5), but not in Visual Studio 2012 (the final keyword is accepted and enforces sealing, but doesn’t activate the optimization).

Update 2: From comments on Reddit I realized that I probably wasn’t clear enough: the use of final just enables devirtualization; C++ compilers aren’t forced to actually apply it in such cases.
