Alexander Stepanov's Standard Template Library provided a huge push toward making template programming an important part of C++, and helped to ensure that it was included in the first standard in 1998. But adoption of templates by most programmers has been more of an incremental process than a revolutionary one; many of us have literally decades of "Object-Oriented == The One True Path" to unlearn. And to top things off, templates needed some refinement over the years before they could do many of the things we expected of them.

In this article, I show you how generic programming with templates makes a noticeable improvement in a pedestrian but important task I deal with constantly in my work on data compression: bit-level I/O. Then I'll show the annoying but understandable problems with this approach circa 1998, and finally, how to resolve them with features added in TR1 and C++11.

Bit-Oriented I/O

Most data-compression algorithms read their input data using well-supported byte-oriented I/O. Text files are read a byte at a time, and most image files are blocked into data structures that align to byte boundaries.

But the output of compressors tends to come in odd sizes. For example, venerable LZW compression output usually starts with nine-bit-wide codes, which grow as time goes on, through ten, eleven, and twelve bits, up to some arbitrary size.

The standard C++ libraries don't have I/O APIs that support these odd sizes, so any compressor implementation has to build these I/O routines itself. And of course, the inverse task, decompression, needs to be able to read data in non-standard sizes in order to write uncompressed data. This is effectively the same problem, just turned on its head.
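To make the gap concrete, here is a minimal sketch of what such a routine has to do: accumulate bits MSB-first and emit whole bytes as they fill up. The names (`bit_packer`, `put_bits`) are my own for illustration; they don't come from any standard library or from the article's downloadable code.

```cpp
#include <cstdint>
#include <vector>

// Sketch: pack values of arbitrary bit width into a byte stream,
// most significant bit first. Hypothetical names, not a standard API.
class bit_packer {
public:
    explicit bit_packer(std::vector<uint8_t> &out) : m_out(out) {}

    // Write the low `width` bits of `code`, MSB first.
    void put_bits(uint32_t code, int width) {
        for (int i = width - 1; i >= 0; --i) {
            m_accum = (m_accum << 1) | ((code >> i) & 1);
            if (++m_count == 8) {
                m_out.push_back(static_cast<uint8_t>(m_accum));
                m_accum = 0;
                m_count = 0;
            }
        }
    }

    // Pad the final partial byte with zero bits and emit it.
    void flush() {
        if (m_count)
            m_out.push_back(static_cast<uint8_t>(m_accum << (8 - m_count)));
        m_accum = 0;
        m_count = 0;
    }

private:
    std::vector<uint8_t> &m_out;
    uint32_t m_accum = 0;
    int m_count = 0;
};
```

Writing two nine-bit LZW-style codes this way produces three bytes, with the second code straddling a byte boundary, which is exactly the bookkeeping the standard I/O APIs leave to you.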

To illustrate how this works, I'll use the example of arithmetic compression in this article. Canonical CACM89 arithmetic compression outputs one bit at a time. So whether you are writing to files, sockets, or shared memory, you are going to need some kind of shim in your library that converts those single bits into bytes suitable for use with standard libraries.

The Solution Before Templates

If you are writing a dedicated compressor that is purpose-built for some application, you would probably just embed some dedicated code in your input and output routines. This article is less important for a one-off solution like that.

Things get more complicated if you are a library writer. If you want to provide an arithmetic compression capability to end users, you don't have any idea in advance where their I/O is going. So the classical OOP solution to this is to define a base class with virtual methods that the compressor needs, then require library users to implement derived classes that provide all missing functions and obey certain rules.

In this case, my arithmetic compressor will have dedicated routines that do the bit-fiddling, and then rely on a shim class to provide byte-oriented input and output. This means the library can read from or write to a wide variety of sources. My pre-template solution is illustrated with a partial code segment from bitio_00.cpp, which can be downloaded. This uses classic OOP to create input and output classes, which are then called from inside the compressor.

class compressor {
public:
    compressor(input_bytes &input, output_bytes &output)
        : m_output(output), m_input(input), m_NextByte(0), m_Mask(0x80)
    {
    }
    ~compressor()
    {
        if (m_Mask != 0x80)
            m_output.putByte(m_NextByte);
    }
    int operator()()
    {
        // compression takes place here,
        // e.g. putBit(0); putBit(1);
        return 0;
    }
protected:
    void putBit(bool val)
    {
        if (val)
            m_NextByte |= m_Mask;
        m_Mask >>= 1;
        if (!m_Mask) {
            m_output.putByte(m_NextByte);
            m_Mask = 0x80;
            m_NextByte = 0;
        }
    }
private:
    output_bytes &m_output;
    input_bytes &m_input;
    char m_NextByte;
    int m_Mask;
};

If I were going to perform compression on a pair of iostream objects, I would need simple derived classes that implemented the necessary shim code for the putByte() and getByte() functions. A simple version of the input class used in bitio_00.cpp is shown here:

class input_bytes {
public:
    virtual int getByte() = 0;
    virtual ~input_bytes() {}
};

class input_bytes_istream : public input_bytes {
public:
    input_bytes_istream(std::istream &stream)
        : m_stream(stream)
    {
    }
    int getByte()
    {
        return m_stream.get();
    }
private:
    std::istream &m_stream;
};

This works, and at least for the library writer, it is fairly satisfactory.

As a library user, it is not really optimal. I'd like to be able to use a library without having to create shim classes. Of course, the library writer is free to ship common shim classes, which users can then adapt for more customized approaches.

Withholding any other judgment, this is an object-oriented solution that is fairly simple to understand and implement, and there are plenty of libraries out there that use this sort of implementation. You probably find it somewhat familiar and can imagine how you might deal with it as a user.

So Where Are The Problems?

Although the OOP solution works, it has a couple of problems.

First, as library writers start identifying additional things they need in their I/O classes, the abstract base classes sometimes get larded with additional methods, making it more and more complicated to implement specialized versions. While not a problem in this specific example, it definitely is a problem in the real world.

For example, the arithmetic compressor writer might decide that block I/O offers some performance improvements, leading him or her to add methods for putBytes() and getBytes() . This kind of creeping elegance leads to complex base classes that annoy users.

However, this is more a stylistic problem, and not necessarily tied directly to OOP, although it does seem to be a natural outgrowth of that type of programming.

A more important problem, and one that is really a big deal for data compression, is the cost of using abstract base classes to implement I/O. Functions like putBit() are called repeatedly in tight loops. The fact that they have to invoke virtual functions is a real cost: virtual calls are resolved through the vtable at runtime, so the compiler generally cannot inline or optimize them at compile time, and each call pays for pointer indirection plus a full subroutine call.

What would be much better, and what the template-based approach allows for, is to define the input and output routines as parameterized types using templates. The compiler can then generate the code that calls those functions inline with the compression code, avoiding those expensive subroutine calls and giving the compiler leeway to optimize at will. In some circumstances, this can result in a substantial improvement in runtimes.