In my current position, I work to optimize and parallelize codes that deal with genomic data such as DNA, RNA, and proteins. To be universally available, many of the input files holding DNA samples (called reads) are text files full of the characters 'A', 'C', 'G', and 'T'. (It's not necessary to know much about DNA to understand this post or the code I'm going to describe, but if you want a primer you can try "What is DNA?") Depending upon the application and the species being studied, data files can contain tens of thousands to millions of reads with lengths between 50 and 250 characters (bases) each. Rather than devote a full 8-bit character in memory to each base, programmers of genomic applications can encode each of the four DNA bases in 2 bits, which allows four bases to be stored within a byte.

One of the applications I've dealt with recently used the following code (with error detection) to compute the 2-bit conversion of a given base (a char parameter to the function containing this code).

switch (base) {
    case 'A': case '0': return 0;
    case 'C': case '1': return 1;
    case 'G': case '2': return 2;
    case 'T': case '3': return 3;
}
cerr << "error: unexpected character: '" << base << "'\n";
assert(false);
abort();

(The number characters '0' to '3' are an alternate input notation.) The return values of this function are shifted into position within a byte to format each read (or substring needed) to use 25% of the space that the original input string needed.
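As a minimal sketch of that packing step: the application's actual byte layout isn't shown in this post, so the bit order below (first base in the two highest bits) is an assumption, and baseToCode, packRead, and unpackBase are names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: same mapping as the switch() above, with the error
// handling reduced to an assert for brevity.
inline std::uint8_t baseToCode(char base) {
    switch (base) {
        case 'A': case '0': return 0;
        case 'C': case '1': return 1;
        case 'G': case '2': return 2;
        case 'T': case '3': return 3;
    }
    assert(false && "unexpected character");
    return 0;
}

// Packs a read into bytes, four bases per byte. Assumed layout: the first
// base of each group of four lands in the two highest bits of its byte.
inline std::vector<std::uint8_t> packRead(const std::string& read) {
    std::vector<std::uint8_t> packed((read.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < read.size(); ++i) {
        const unsigned shift = 6 - 2 * (i % 4);
        packed[i / 4] |= static_cast<std::uint8_t>(baseToCode(read[i]) << shift);
    }
    return packed;
}

// Recovers base i from a packed read (always in letter notation).
inline char unpackBase(const std::vector<std::uint8_t>& packed, std::size_t i) {
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    const unsigned shift = 6 - 2 * (i % 4);
    return kBases[(packed[i / 4] >> shift) & 0x3];
}
```

With this layout a 7-base read such as "GATTACA" occupies two bytes instead of seven, and the unused low bits of the last byte are left at zero.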

Upon compiling the above code, I found that the assembly code contains a series of indirect jumps. Because there is no dominant base in "random" strings of DNA, there are no savings to be had from branch prediction. (When one of our compiler experts saw the compiled code for this routine he was horrified.)

So, I looked for an alternate way to express this computation that would execute in fewer cycles. The solution that I came up with was to replace the switch() statement with a table lookup, as implemented with the following code.

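// The cast keeps the index in 0..255 even where plain char is signed.
uint8_t r = b2C[static_cast<unsigned char>(base)];
if (r != 0xFF) return r;
cerr << "error: unexpected character: '" << base << "'\n";
assert(false);
abort();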

The table, b2C, is declared to have 256 elements, with 248 of those elements holding the value 0xFF. The other eight elements have the integer values zero to three corresponding to the ASCII location of the characters '0' to '3', 'A', 'C', 'G', and 'T'.

Here is the most relevant part of the table declaration:

uint8_t b2C[256] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, //0 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, //1 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, //2 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x01, 0x02, 0x03, 0xFF, 0xFF, 0xFF, 0xFF, //3 ‘0’ ‘1’ ‘2’ ‘3’ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0xFF, 0x01, 0xFF, 0xFF, 0xFF, 0x02, //4 ‘A’ ‘C’ ‘G’ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x03, 0xFF, 0xFF, 0xFF, //5 ‘T’ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, . . . }
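A hand-typed table of 256 entries is easy to get subtly wrong, so one option is to build it in code and check it against the switch() mapping. The following is only a sketch of that idea, not what the application does; makeBaseTable, referenceCode, tableMatchesSwitch, baseTable, and kInvalid are all hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kInvalid = 0xFF;  // marker for "not a base"

// Builds the 256-entry table: everything invalid except the eight
// accepted characters.
inline std::array<std::uint8_t, 256> makeBaseTable() {
    std::array<std::uint8_t, 256> table;
    table.fill(kInvalid);
    table['0'] = table['A'] = 0;
    table['1'] = table['C'] = 1;
    table['2'] = table['G'] = 2;
    table['3'] = table['T'] = 3;
    return table;
}

// Reference mapping: the same cases as the original switch(), but
// returning kInvalid instead of aborting.
inline std::uint8_t referenceCode(char base) {
    switch (base) {
        case 'A': case '0': return 0;
        case 'C': case '1': return 1;
        case 'G': case '2': return 2;
        case 'T': case '3': return 3;
        default: return kInvalid;
    }
}

// Compares every one of the 256 entries against the reference mapping.
inline bool tableMatchesSwitch(const std::array<std::uint8_t, 256>& table) {
    for (int c = 0; c < 256; ++c) {
        if (table[c] != referenceCode(static_cast<char>(c))) return false;
    }
    return true;
}

// Returns the table, built once on first use; debug builds verify it
// against the switch() mapping at that point.
inline const std::array<std::uint8_t, 256>& baseTable() {
    static const auto table = [] {
        auto t = makeBaseTable();
        assert(tableMatchesSwitch(t));
        return t;
    }();
    return table;
}
```

The same check works for a hand-written table like b2C: copy it into a std::array and pass it to tableMatchesSwitch in a unit test.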

It's really a small change and is less intuitive than the original switch() statement. However, since the function now takes the same amount of time to process an 'A' as it does to process a 'T' (rather than an average of 4 indirect jumps per base), this modification led to a 1.20X speedup without any other code changes. The authors of the code were amazed that even little changes like this can have significant impact on the execution time of their application. In retrospect the speedup seems almost obvious. With a large portion of the execution dedicated to input and conversion of very large data sets, even small changes to reduce the number of instructions executed should positively impact the run time.
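If you want to try the comparison on your own machine, a rough timing harness might look like the sketch below. Everything in it is an assumption made for illustration (the function names, the fixed seed, the checksum used to keep the compiler from discarding the loop), and the ratio it reports depends heavily on compiler, optimization flags, and hardware, so don't expect it to reproduce the number above.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Switch-based variant; returns 0xFF on bad input so that both variants
// share one interface in this sketch.
inline std::uint8_t codeBySwitch(char base) {
    switch (base) {
        case 'A': case '0': return 0;
        case 'C': case '1': return 1;
        case 'G': case '2': return 2;
        case 'T': case '3': return 3;
    }
    return 0xFF;
}

inline std::array<std::uint8_t, 256> makeTable() {
    std::array<std::uint8_t, 256> t;
    t.fill(0xFF);
    t['0'] = t['A'] = 0;
    t['1'] = t['C'] = 1;
    t['2'] = t['G'] = 2;
    t['3'] = t['T'] = 3;
    return t;
}

static const std::array<std::uint8_t, 256> kTable = makeTable();

// Table-based variant.
inline std::uint8_t codeByTable(char base) {
    return kTable[static_cast<unsigned char>(base)];
}

// Generates n pseudo-random bases; the fixed default seed makes runs repeatable.
inline std::string randomBases(std::size_t n, unsigned seed = 42) {
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 3);
    std::string s(n, 'A');
    for (auto& c : s) c = kBases[pick(rng)];
    return s;
}

struct Timing {
    double seconds;
    std::uint64_t checksum;
};

// Times one pass over the input; the checksum keeps the loop observable.
template <typename Convert>
Timing timeConversion(const std::string& bases, Convert convert) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (char b : bases) sum += convert(b);
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), sum};
}

// Switch time divided by table time for a single run.
inline double speedupEstimate(const std::string& bases) {
    const Timing bySwitch = timeConversion(bases, codeBySwitch);
    const Timing byTable = timeConversion(bases, codeByTable);
    assert(bySwitch.checksum == byTable.checksum);  // variants must agree
    return bySwitch.seconds / byTable.seconds;
}
```

Calling speedupEstimate(randomBases(100000000)) several times and taking the median gives a steadier figure than a single run. An optimizing compiler may also turn the switch() into its own lookup table, in which case the two variants can end up nearly identical.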

I'm not advocating that you go and change all your switch() constructs to table lookups. This was a very special case that fit the mold and yielded good results. Besides, maintainability always matters when choosing which programming constructs to use. If something like a switch() or a multi-level if-then-elseif structure makes sense and gets the job done, then use it. Even so, converting to something less obvious can lead to execution improvement. Be open to different coding practices.