In this article, I discuss a safe mechanism for dynamically casting a void* to a typed pointer at runtime, with examples of how to use void* for heterogeneous storage instead of polymorphism or alternatives such as boost::any [1].

Introduction

The following problem arose during the development of KeyValue [2]. One of this library's responsibilities is mapping string names to addresses (i.e. pointers) of objects of heterogeneous types. KeyValue answers clients' requests for an object given its name and expected type. If the name is indeed mapped to an object of the given type, then the corresponding pointer is cast to this type and returned to the client. Inheritance must be observed: when the client expects the object to be of a certain type, KeyValue must fulfill the request if the object's type derives from the expected type.

We need some type erasure mechanism that allows the compiler to treat pointers to objects of different types in a uniform way. Additionally, some type annotation must take place to allow for inheritance-conscious type checks against clients' expectations and, subsequently, safe type casts.

Polymorphism is the obvious design to tackle this problem. All types derive from a common polymorphic base class. Type erasure here means that derived types are overlooked by the compiler, which sees only pointers-to-base. Type information is saved inside objects, allowing for type checks and type casts through dynamic_cast.

Unfortunately, for KeyValue, this wasn't an option because the types in question are defined by clients. KeyValue cannot assume and doesn't want to impose that all these types have a common polymorphic ancestor.

Polymorphism aside, boost::any appears as an alternative. It's based on the external polymorphism pattern [3], which is a clever approach to type erasure for classes unrelated by inheritance. For type annotation, boost::any relies on the typeid operator. This is a major flaw for our purposes, since typeid compares types for exact equality and handles inheritance poorly. The next section covers this topic.
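To make the limitation concrete, here is a minimal sketch. It uses std::any from C++17 as a stand-in for boost::any (an assumption made only to avoid an external dependency; both match on the exact stored type). The types base and derived and the function name are hypothetical.

```cpp
#include <any>
#include <cassert>

struct base {};                 // hypothetical client types
struct derived : base {};

// Returns true if an any holding a derived* can be read back as a base*.
bool any_sees_derived_as_base() {
    static derived d;
    std::any a = &d;            // the stored type is exactly derived*

    // The pointer form of any_cast returns null on a type mismatch.
    bool exact_ok = std::any_cast<derived*>(&a) != nullptr;
    bool base_ok  = std::any_cast<base*>(&a) != nullptr;

    assert(exact_ok);           // asking for the exact type works
    return base_ok;             // asking for the base type does not
}
```

The request for base* is refused even though every derived is a base, which is exactly the behavior KeyValue cannot accept.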

KeyValue implements a replacement for boost::any, here called any_ptr. When a pointer is assigned to an any_ptr, a type-erased copy of it is held by a data member. Another member stores type information that allows for runtime casts.

For type erasure, any_ptr uses void*. This is an extreme form of type erasure because the compiler does not take any action to save type information for runtime usage. One can do very little with a void*, and this form of type erasure would be useless if the erased pointer could not be cast back to its original type. The cast back is done with static_cast:

// This is fine, provided that p points to an object of type some_type.
// Otherwise, dereferencing q has undefined behaviour.
q = static_cast<some_type*>(p);
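A small round trip illustrates the point. This is only a sketch: some_type is given a hypothetical int member, and erase, restore, and round_trip are illustrative names, not part of KeyValue.

```cpp
#include <cassert>

struct some_type { int value; };   // hypothetical payload type

// Erase the static type: nothing about some_type survives in the void*.
void* erase(some_type* p) { return p; }

// Recover the typed pointer. Correct only if p really came from a some_type*.
some_type* restore(void* p) { return static_cast<some_type*>(p); }

int round_trip() {
    some_type s = { 42 };
    void* p = erase(&s);
    some_type* q = restore(p);
    assert(q == &s);               // same address after the round trip
    // A static_cast of p to an unrelated pointer type would compile just as
    // well; the compiler cannot check it, which is why annotation is needed.
    return q->value;
}
```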

This radical type erasure must be compensated by a clever type annotation. We are looking for a type annotation through an RTTI mechanism such that:

It's not intrusive and doesn't impose any constraint on clients' classes;

Inheritance is properly observed.

Run-Time Type Information (RTTI)

Let's analyze the three C++ RTTI mechanisms: polymorphism (using dynamic_cast), typeid() and exception handling (using try-catch).

The general picture follows. An object of type A is built. Type information is lost when the object's address is stored by a pointer p of type B*. Then, we want to check whether p points to an object of type C and, if so, safely cast it to C*.

In the first two RTTI mechanisms, type checks and type casts are very common and take the form:

// Test 1: Using dynamic_cast
C* q = dynamic_cast<C*>(p);

// Test 2: Using typeid
C* q = typeid(*p) == typeid(C) ? static_cast<C*>(p) : 0;

In both cases, when p points to an object of type C, we expect q to be a cast copy of p. Otherwise q must be null. The exception handling system can be used to get the same behavior as follows:

// Test 3: Using exception handling
C* q = 0;
try { throw p; }
catch (C* t) { q = t; }
catch (...) {}

For the sake of concreteness, A, B, and C will be chosen from the three classes in this hierarchy:

struct top { ~top(){} }; // Eventually, the destructor will be virtual.
struct middle : public top {};
struct bottom : public middle {};

None of the tests ever gives a false positive; that is, the code snippets above never set q to a non-null value when they are not expected to. Unfortunately, they sometimes wrongly set q to null. We focus on these cases.

Inheritance is one of the pillars of OO: an object of type bottom is also of type middle because the former publicly inherits from the latter. Failing to observe this is what makes some of the aforementioned tests fail. In other words, an object of a derived class is sometimes not seen as an object of its base class. To catch the failures, it's enough to consider the cases where A is the most derived type (i.e. bottom).

When trying to test all combinations of B and C, the first issue is the intrusiveness of polymorphism. For down-casts, that is, when the target type derives from the source type (e.g. from top to bottom), dynamic_cast requires the source type to be polymorphic, and the compiler issues an error if it's not. The conclusions follow:

dynamic_cast succeeds for up- and no-casts (i.e. from a type to itself). For down-casts it fails to compile.

try-catch succeeds for all up- and no-casts but fails for down-casts.

typeid() succeeds only for no-casts.
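These outcomes can be reproduced with a short sketch. The helpers cast_by_typeid and cast_by_throw are hypothetical wrappers around Test 2 and Test 3, and check_non_polymorphic fixes A = bottom and B = middle as one representative combination.

```cpp
#include <cassert>
#include <typeinfo>

struct top { ~top() {} };            // not polymorphic yet
struct middle : public top {};
struct bottom : public middle {};

// Test 2 as a helper: exact type match, then static_cast.
template <class C, class B>
C* cast_by_typeid(B* p) {
    return typeid(*p) == typeid(C) ? static_cast<C*>(p) : 0;
}

// Test 3 as a helper: null unless the handler for C* matches.
template <class C, class B>
C* cast_by_throw(B* p) {
    try { throw p; }
    catch (C* t) { return t; }
    catch (...) {}
    return 0;
}

void check_non_polymorphic() {
    bottom b;
    middle* p = &b;                  // A = bottom, B = middle

    // dynamic_cast: up- and no-casts compile and succeed.
    // dynamic_cast<bottom*>(p) would not compile: middle is not polymorphic.
    assert(dynamic_cast<top*>(p) == p);
    assert(dynamic_cast<middle*>(p) == p);

    // try-catch: up- and no-casts succeed, the down-cast is wrongly null.
    assert(cast_by_throw<top>(p) == p);
    assert(cast_by_throw<middle>(p) == p);
    assert(cast_by_throw<bottom>(p) == 0);

    // typeid: only the no-cast succeeds.
    assert(cast_by_typeid<top>(p) == 0);
    assert(cast_by_typeid<middle>(p) == p);
    assert(cast_by_typeid<bottom>(p) == 0);
}
```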

As we can see, typeid() is weak in detecting inheritance, and there is even a semantic reason for that. To check if p points to an object of type bottom we test the condition

typeid(*p) == typeid(bottom)

This uses the comparison operator==, which, as expected, is symmetric. Indeed, it would be very weird to have

typeid(bottom) == typeid(top) && typeid(top) != typeid(bottom)

being true, meaning that a bottom is a top and a top isn't a bottom. What we need is the is-a semantics of inheritance, which is exactly what dynamic_cast has.

Now, satisfying dynamic_cast's requirement, we declare top's destructor to be virtual, making the three classes polymorphic. After that, the conclusions are:

dynamic_cast succeeds every time.

try-catch succeeds for all up- and no-casts but fails for down-casts.

typeid() succeeds if, and only if, the target type is bottom.
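The polymorphic case can be checked with a similar sketch, again with A = bottom and B = middle. The results struct and the with_* functions are hypothetical names introduced for illustration; each records whether a test recognizes the object as a top, a middle, and a bottom.

```cpp
#include <cassert>
#include <typeinfo>

struct top { virtual ~top() {} };    // the destructor is now virtual
struct middle : public top {};
struct bottom : public middle {};

struct results { bool to_top, to_middle, to_bottom; };

// Test 3 as a helper: null unless the handler for C* matches.
template <class C, class B>
C* cast_by_throw(B* p) {
    try { throw p; }
    catch (C* t) { return t; }
    catch (...) {}
    return 0;
}

results with_dynamic_cast(middle* p) {
    results r = { dynamic_cast<top*>(p) != 0,
                  dynamic_cast<middle*>(p) != 0,
                  dynamic_cast<bottom*>(p) != 0 };
    return r;
}

results with_typeid(middle* p) {
    // For a polymorphic type, typeid(*p) reports the most derived type.
    results r = { typeid(*p) == typeid(top),
                  typeid(*p) == typeid(middle),
                  typeid(*p) == typeid(bottom) };
    return r;
}

results with_try_catch(middle* p) {
    results r = { cast_by_throw<top>(p) != 0,
                  cast_by_throw<middle>(p) != 0,
                  cast_by_throw<bottom>(p) != 0 };
    return r;
}

void check_polymorphic() {
    bottom b;
    results d = with_dynamic_cast(&b);
    assert(d.to_top && d.to_middle && d.to_bottom);      // always succeeds
    results t = with_typeid(&b);
    assert(!t.to_top && !t.to_middle && t.to_bottom);    // only bottom
    results e = with_try_catch(&b);
    assert(e.to_top && e.to_middle && !e.to_bottom);     // no down-cast
}
```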

We see dynamic_cast at its full power, and nothing has changed regarding try-catch. For typeid(), things got weirder because the obvious no-casts for middle and top now fail. What's happening is that, for objects of polymorphic type, typeid() returns the most derived type of the object (bottom in this case).

This behavior is potentially dangerous because classes that are not polymorphic today may become so in the future. Certainly, we know that turning a non-virtual method into a virtual one can have consequences for its callers. Here, the situation is worse because the consequences are not restricted to callers and the set of potentially impacted code is wider.

In summary, dynamic_cast is intrusive, but it's the best method with respect to inheritance consciousness. The second best is try-catch, which is not intrusive but only detects types higher up in the hierarchy. Finally, typeid() is blind to inheritance, so we'll discard it entirely from further discussion.

Another important point is performance and, again, dynamic_cast wins against try-catch. This isn't a surprise, because using the latter for this purpose is a twist that the compiler cannot perfectly understand and, therefore, optimizations are not as good as for the former. For instance, for up-casts the compiler doesn't require the source type to be polymorphic because, in this case, a dynamic_cast is nothing more than a static_cast and is therefore resolved at compile time.
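The up-cast remark can be verified with a tiny sketch on hypothetical non-polymorphic types base and derived: the dynamic_cast compiles without any virtual function and yields the same pointer as a static_cast.

```cpp
#include <cassert>

struct base { int x; };              // hypothetical, no virtual functions
struct derived : base { int y; };

// Up-cast: accepted by the compiler even though base is not polymorphic.
base* up_dynamic(derived* p) { return dynamic_cast<base*>(p); }
base* up_static(derived* p)  { return static_cast<base*>(p); }

// The reverse direction is rejected at compile time:
// derived* down(base* p) { return dynamic_cast<derived*>(p); }

bool same_result() {
    derived d;
    assert(up_dynamic(&d) != 0);
    return up_dynamic(&d) == up_static(&d);
}
```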

Table 1 provides performance comparisons between dynamic_cast and try-catch for optimized code generated by MSVC 2008 and GCC 4.4.5. It shows the time, in milliseconds, taken to perform 1,000,000 type casts. Each row corresponds to a cast whose source and target types are a given number of levels apart in the inheritance hierarchy. Level 0 means a no-cast and level 1 means a cast from a class to a direct base (for try-catch) or derived class (for dynamic_cast). These timings are only indicative and are meant to show that dynamic_casts are much faster than try-catches.
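Readers who want to take comparable measurements on their own compiler could start from a sketch like the following. It is not the benchmark behind Table 1: the function names, the use of std::chrono (C++11), and the level-2 casts chosen here are all assumptions, and an optimizer may simplify the dynamic_cast loop, so careful measurements need more than this.

```cpp
#include <cassert>
#include <chrono>

struct top { virtual ~top() {} };
struct middle : public top {};
struct bottom : public middle {};

// Times n level-2 down-casts (top* to bottom*); returns milliseconds.
double time_dynamic_cast(top* p, long n, long& hits) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i)
        if (dynamic_cast<bottom*>(p)) ++hits;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times n level-2 up-casts (bottom* caught as top*); returns milliseconds.
double time_try_catch(bottom* p, long n, long& hits) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        try { throw p; }
        catch (top* t) { if (t) ++hits; }
        catch (...) {}
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Runs both loops n times and checks that every cast succeeded.
void run_benchmark(long n, double& ms_dynamic, double& ms_throw) {
    bottom b;
    long hits_dynamic = 0, hits_throw = 0;
    ms_dynamic = time_dynamic_cast(&b, n, hits_dynamic);
    ms_throw = time_try_catch(&b, n, hits_throw);
    assert(hits_dynamic == n && hits_throw == n);
}
```

Calling run_benchmark with n set to 1,000,000 matches the iteration count used for the table; the absolute numbers will depend on the compiler, its options, and the machine.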
