A while ago a friend asked me whether pure virtual functions have higher overhead than regular virtual functions. At the time I answered that this cannot be – the pure/non-pure distinction is meaningful only at compile time, and non-existent at runtime. Only a (long) while later did I connect the dots, and understand what he meant than (sorry Mordy..)

Regular Overhead

Virtual calls are known to be more costly than calls that are resolved at compile time. Elan Ruskin measured ~50% difference – I measured a bit less, but the difference is certainly there. For functions that do real work the actual call overhead can be mostly neglected, but for functions that get called a lot in a scenario you’re struggling to optimize – you can get tangible results by eliminating virtual calls. It’s widely considered a good practice to use the added flexibility of virtual functions when you have a concrete reason and not just for the fun of it.

There are two reasons for the added call cost – main one being the CPU instruction-prefetch mechanism. A regular call is resolved at compile time into something like –

call 0xabcd1234

– which the instruction caching apparatus (a.k.a trace cache) easily resolves ahead of time, and can tell where to continue the fetching from. However, a virtual call is compiled into something like –

call eax

Faced with the impossible task of predicting the contents of eax dozens of instructions in advance, the trace cache just stalls.

The second potential extra cost is an extra dereferencing. The first DWORD_PTR of a c++ object (neglecting virtual inheritance) is a pointer to it’s virtual table – a table that is common to all instances of the same class. A call to a virtual function is resolved by first dereferencing that vtable pointer, and only then calling the function at a fixed offset from the vtable start. Maciej Sinilo tried to isolate this cost by comparing calls via explicit function pointers to calls via virtual functions, and turns out the difference is practically non-measurable. (BTW, I didn’t check but I suspect part of the reason is the compiler’s ability to resolve that dereference in compile time, in many situations).

Pure-Virtual Extra Overhead??

Well, not really. At least not directly. But the function ‘containers’ – classes that are used as interfaces – do come with some extra weight.

To put it short, an interface class by default has its own constructor and destructor – just like any other class. These are called just before/after a child class constructor/destructor – as you would expect in any hierarchy.

But wait… what? Why? If you didn’t provide an implementation for such a constructor, what can it possibly do?

What every compiler-generated constructor does: set up the class vtable pointer. In this case, it is a very short lived pointer – one that is immediately overwritten by the child ctor.

Take the following code:

class A { public: __declspec(noinline) A() { _tprintf_s(_T(" A::A "));} virtual ~A() { _tprintf_s(_T(" A::~A "));} virtual void f() = 0; virtual void g() = 0; }; class B : public A { public: __declspec(noinline) B() { _tprintf_s(_T(" B::B "));} virtual ~B() { _tprintf_s(_T(" B::~B "));} virtual void f() { _tprintf_s(_T(" B::f "));} virtual void g() { _tprintf_s(_T(" B::g "));} }; int _tmain(int argc, _TCHAR* argv[]) { B b; return 0; }

The order of events in B’s construction is as follows:

(1) The child ctor, B::B() is called and immediatly calls the parent ctor A::A().

(2) The parent ctor A::A(), setd the object vfptr to point to the common A vtable –

~A xxx xxx

(keep those xxx placeholders in mind – more on them soon). Watching the object state in VS at this point you see:

– emphasized are the instruction that populates the object vfptr, and the referenced vtable itself.

(3) B::B() continues, and modifies the object vfptr to point to the common B vtable:

~B B::f() B::g()

and again a VS view:

If you’d call f() from B::B() A::A() [thanks Roman!], which is the sort of self-foot-shots c++ allows, you’d be using A’s vtable (which is the only one the object knows at that time), end up calling the nonexistent xxx and gloriously crash in runtime. Of course it’s never that direct, and it’s pretty much a consensus you should never call a class virtuals from its own ctor/dtor.

Why all that hassle??

Frankly, I don’t know. It seems C++ goes out of its way to build and tear apart abstract-parent vtables, than exposes them briefly only during child construction/destruction, and then the community expert unanimously recommend never to use them. Herb Sutter does give 3 scenarios where you might consider using them – I find none of them convincing, and generally consider this to be one of the C++ semantic warts.

So, can this extra weight be mitigated?

Yes – at least in MS-specific ways. The direct way is by adding a __declspec(novtable) modifier to the abstract interface declaration. If you can guarantee that the interface class would never need any constructors/destructors (which can be tricky at times), it would be more readable to use __interface instead.

Beyond the direct saving of the extra ctor/dtor work, a happy side effect of novtable is that it eliminates all references to the modified interface vtable. The linker is then able to remove it from the binary altogether – thereby reducing the binary size and providing some extra boost. (When applied to a lot of interfaces, this can get tangible results!)

Bonus – so what is xxx, really?

Google xxx and see for yourself. I’ll wait until you return. It’d probably be a while.

Ok – this probably deserved it’s own post, as there still appears to be some online confusion regarding it. Apparently the ‘= 0’ pure virtual syntax is leading some to believe that the xxx entries are truly zeros. In fact MSDN columnist Paul DiLascia wrote sometime in 2000 that –

…the compiler still generates a vtable all of whose entries are NULL and still generates code to initialize the vtable in the constructor or destructor for A.

That may actually have been true than (I’m not even sure of that), but certainly isn’t now.

xxx is the address of the CRT function _purecall, which is essentially a debugging hook. You can control xxx’s value directly by overloading purecall yourself, or alternatively use _set_purecall_handler to route into your own handler from within _purecall. You might consider doing so, e.g., to collect stack traces or minidumps in production code.