This post describes a technique for getting rid of the dynamic memory allocation in the C++ pimpl idiom. But before going into the implementation details, let’s consider the “motivating” example that actually got me thinking about this issue:

#include <cache.hxx>

class factory
{
public:
  factory ();
  factory (cache&);

  ...

private:
  factory (const factory&);
  factory& operator= (const factory&);
};

In the project that I am working on right now I have a factory for a certain kind of object. The factory uses a cache to determine if a requested object has already been created. The cache can be provided by a client during construction of the factory. If the cache is not provided, then the default cache implementation is automatically created by the factory. Here is the straightforward implementation of this logic:

class factory
{
public:
  factory ()
      : cache_p_ (new cache), cache_ (*cache_p_)
  {
  }

  factory (cache& c)
      : cache_ (c)
  {
  }

private:
  auto_ptr<cache> cache_p_;
  cache& cache_;

  ...
};

The problem with this implementation is the need to perform the dynamic memory allocation for the cache object in the first version of the constructor. And this was something I really wanted to avoid because in my application the factories were going to be created often and on the stack.

Where else can we then get the memory for the cache object? The most natural approach is to reserve that memory as a member variable in the factory, something along these lines:

class factory
{
public:
  factory ();
  factory (cache&);
  ~factory ();

private:
  char cache_mem_[sizeof (cache)];
  cache& cache_;

  ...
};

The problem with our first attempt is alignment. When we allocate the memory from the heap with operator new the returned buffer is guaranteed to have alignment suitable to store any object. This is not the case for member variables, however. So we somehow need to make sure that the memory we reserved in the factory object is aligned to store the cache object. The Boost type_traits library as well as the C++ TR1 make this a fairly simple task:

#include <boost/type_traits.hpp>

class factory
{
  ...

  boost::aligned_storage<
    sizeof (cache),
    boost::alignment_of<cache>::value>::type cache_mem_;
};

The implementation of the factory constructors and destructor is presented below. Here we use the placement operator new to construct the cache object in the reserved memory block. We also need to make an explicit destructor call in order to destroy the cache object:

factory::
factory ()
    : cache_ (*reinterpret_cast<cache*> (&cache_mem_))
{
  new (&cache_mem_) cache ();
}

factory::
factory (cache& c)
    : cache_ (c)
{
}

factory::
~factory ()
{
  cache* c (reinterpret_cast<cache*> (&cache_mem_));

  if (c == &cache_)
    c->~cache ();
}

What if you cannot, or do not want to, use Boost or TR1? Is there an easy way to get an aligned buffer using only C++98? Unfortunately, this is quite hard to implement without making any assumptions about the target platform and the class we are trying to construct in the reserved memory. In practice, however, it is possible to come up with a solution that works on all “reasonable” platforms without making any unreasonable assumptions about the class.

The alignment of a class is determined by the member variable with the strictest alignment requirement (if a member variable is itself of a class type, then this rule applies recursively). For a more detailed coverage of this subject see the C++ data alignment and portability post.

On all major platforms in use today the fundamental types with the strictest alignment requirements are (from more strict to less strict): long double (4, 8 or 16), long long (4 or 8), and pointer (4 or 8). So if we don’t want to make any assumptions about the class and don’t mind wasting a few bytes on alignment, then aligning the memory region to the long double requirement will take care of things. However, it is often reasonable to expect that the class we are planning to instantiate does not and never will have members of type long double or even long long . For example, in our case, it is reasonable to assume that the cache class will only contain pointers, size_t (same alignment as a pointer), and lesser-aligned types such as bool , etc. The same goes for all reasonable implementations of the STL containers. So in this case we can align our buffer to the pointer requirement. Here is how we can do this:

class factory
{
  ...

  union
  {
    void* align;
    char buf[sizeof (cache)];
  } cache_mem_;
};

In addition, if your application is only compiled using a specific C++ compiler, then you may want to check the available extensions. Many compilers have mechanisms for querying alignment of a type and specifying desired alignment of a variable. For example, GNU g++ allows you to query the alignment using the __alignof__ operator and request a specific alignment using the aligned attribute.

The above approach cannot be translated to the pimpl idiom directly, however. In the canonical pimpl form the implementation class is left undefined in the header file and as a result we cannot know its alignment and size, which are needed to allocate the buffer:

class object
{
  ...

private:
  class impl;

  impl& impl_;

  union
  {
    void* align;
    char buf[sizeof (impl)]; // error
  } impl_mem_;
};

Providing the definition of the implementation class in the header file is not an option since hiding the implementation details from the clients of our class is the reason why we chose to use the pimpl idiom in the first place.

The best we can do in this situation is to make an assumption about the alignment requirements and the size of the implementation class in the header file and then verify that they are correct using compile-time assertions in the source file, once the implementation class has been defined. The following example shows how we can do this using Boost:

// object.hxx
//
class object
{
public:
  object ();
  ~object ();

  ...

private:
  class impl;

  union impl_mem
  {
    void* align;
    char buf[16];
  };

  impl& impl_;
  impl_mem impl_mem_;
};

// object.cxx
//
#include <boost/type_traits.hpp>
#include <boost/static_assert.hpp>

using boost::alignment_of;

class object::impl
{
  ...
};

object::
object ()
    : impl_ (*reinterpret_cast<impl*> (&impl_mem_))
{
  BOOST_STATIC_ASSERT (sizeof (impl) <= sizeof (impl_mem_));
  BOOST_STATIC_ASSERT (
    alignment_of<impl>::value == alignment_of<impl_mem>::value);

  new (&impl_mem_) impl ();
}

What are the drawbacks of this approach? The obvious one is the need to manually maintain our alignment and size “guesses”, though the automatic detection by the C++ compiler of the situation when they are out of sync helps a lot.

The fact that the implementation class can have different sizes on different platforms and with different compilers is a more serious problem. For example, 32- and 64-bit platforms have different sizes for some fundamental types, and containers such as std::map can have different sizes in different STL implementations. As a result, in order to make sure that the size that we have hard-coded is sufficient, we need to compile our application on all the platforms and with all the compilers that we claim to support. One way to alleviate this problem, at the expense of extra maintenance, is to recreate the “data image” using the same or similar types as the implementation class. For example, suppose our implementation class had the following member variables:

class entry
{
  ...
};

class object::impl
{
  ...

private:
  size_t count_;
  bool initialized_;
  std::map<int, entry> map_;
};

Then the header file for this pimpl class could look like this:

class object
{
  ...

private:
  class impl;

  class impl_img
  {
    size_t count_;
    bool initialized_;
    std::map<int, int> map_; // Same size as map<int, entry>.
  };

  union impl_mem
  {
    void* align;
    char buf[sizeof (impl_img)];
  };

  impl& impl_;
  impl_mem impl_mem_;
};

This approach won’t scale to the more complex cases where, for example, the implementation class contains many member variables of other implementation-specific classes, like entry above. However, the more complex the implementation class, the smaller the benefit of this optimization. In the example above, for instance, the dynamic allocations performed by the map will presumably far outweigh the single allocation required to instantiate the implementation object. Saving the dynamic memory allocation will be most significant for simpler implementation classes, in which case it should be possible to use the above approach without too much maintenance overhead.