TL;DR: Small C++ code change with templates has large impact (2% libxul codesize reduction).

nsTArray has an inheritance structure that looks like this:

template<class E> class nsTArray : public nsTArray_Impl<E, nsTArrayInfallibleAllocator> { ... }; template<class E, class Alloc> class nsTArray_Impl : public nsTArray_base<Alloc, nsTArray_CopyElements<E> >, public nsTArray_TypedBase<E, nsTArray_Impl<E, Alloc> > { ... }; // Most classes are copied with memmove and friends. // nsTArray_CopyElements can be specialized, but we will ignore those cases here. template<class E> struct nsTArray_CopyElements : public nsTArray_CopyWithMemutils {};

…and so forth. The separation into classes and helper classes are to commonify code, when possible, and to let the magic of template specialization select appropriate definitions of things at compile time.

The problem is that this worked a little too well. nsTArray_CopyElements<uint32_t> is a different class from nsTArray_CopyElements<int32_t> , even though both of them share the same base class and neither one adds extra functionality. This means that nsTArray_base must be instantiated separately for each element type, even though the behavior of nsTArray_base is completely independent of the element type. And so methods were being unnecessarily duplicated, which impacted compile times, download size, startup and runtime performance, and so on.

[A sufficiently smart toolchain could make this problem go away: the linker can recognize duplicated methods and functions at the assembly level, discard all but one instance, and then redirect all the calls to the lone instance. (Bonus question: why does it have to be done by the linker, and not the compiler? It’s certainly more effective, but there is a correctness issue as well.) MSVC calls this “identical COMDAT folding” and the gold linker on Linux implements a similar optimization called “identical code folding”. Indeed, we enable this optimization in the linker when it’s available on our release builds, precisely because it delivers a significant code size improvement. But in a cross-platform project, you can’t necessarily rely on the linker to fix up these sorts of issues.]

In our case, however, fixing the problem is straightforward. Instead of creating new classes to describe copying behavior, we’ll use template specialization to pick the appropriate class at compile time (the class that would have been the subclass of nsTArray_CopyElements in the above scheme) and use that class directly. Then we’ll have either nsTArray_base<Alloc, nsTArray_CopyWithMemutils> (the overwhelmingly common case), or some other specialization when array elements need special treatment:

template<class E> struct nsTArray_CopyChooser { typedef nsTArray_CopyWithMemutils Type; }; // Other specializations of nsTArray_CopyChooser possible... template<class E, class Alloc> class nsTArray_Impl : public nsTArray_base<Alloc, typename nsTArray_CopyChooser<E>::Type >, public nsTArray_TypedBase<E, nsTArray_Impl<E, Alloc> > { ... };

Implementing this in bug 929494 reduced libxul’s executable code size by over 2% on Android, which is a hefty size win for such a simple change.