4. The Proof Backfired

The next step was to test the performance of this critical portion of the software. The original assembly code was written for the TI 320C15 DSP chip, but no Ada compiler was targeted to this chip. However, Tartan, Inc. had an Ada compiler targeted to the closely related TI 320C30 DSP chip, and this chip was chosen for the Ada application. Tartan agreed to test the newly developed Ada software at the Tartan facility in Pittsburgh.

Tartan had benchmarked its compiler at a ratio of 1.3:1, compiled Ada size to assembly size, with comparable performance. This benchmark was established using an assembly program of 10,000 lines of code written by a very experienced assembly programmer, compared against Ada written by a relatively inexperienced Ada programmer.

Dave Syiek of Tartan performed the tests for QRS in July 1991. He began with the Ada code that QRS had developed and an assembly version of the same CSC targeted to the C30 chip. This assembly version was a translation of the code QRS had originally developed for the C15 chip. [4]

Syiek tested each version for speed using the common benchmarking technique of placing the code to be timed inside a loop, timing a very large number of iterations through the loop, and then computing the average time per iteration. He experimented with several combinations of compiler options for each version and documented the fastest result for each. The results for the original working versions showed little difference in running time between the assembly and the Ada, but the Ada was significantly smaller (see Table 1). [4]
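The timing technique described above can be sketched in a few lines. This is only an illustration in Python, not the harness Syiek used on the C30; the names `average_time_per_iteration` and `sample_workload` are hypothetical, and the workload is a stand-in for the code under test.

```python
import time


def average_time_per_iteration(func, iterations=100_000):
    """Call func `iterations` times in one timed loop and return the
    mean wall-clock time per call, in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / iterations


def sample_workload():
    # Hypothetical stand-in for the routine being measured.
    total = 0
    for i in range(6):
        total += i * i
    return total


if __name__ == "__main__":
    avg = average_time_per_iteration(sample_workload)
    print(f"average per iteration: {avg * 1e6:.4f} microseconds")
```

Averaging over many iterations smooths out timer resolution and one-off start-up costs, which is why a single timed call is rarely used for routines that finish in microseconds.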

Because of the size discrepancy between the original assembly and Ada versions, Syiek examined the code to determine the reason for it. He discovered that the assembly code had "unrolled" a loop that was executed six times. That is, the body of the loop was written out six times in sequence instead of being placed inside a loop. This technique avoids the overhead of a looping structure and thus speeds up execution, at the cost of larger code. To produce something comparable in the Ada code, Syiek created a generic unit for the algorithm inside the loop and replaced the looping structure with six instantiations of the generic. At the same time, he noticed three places where the addition of a local variable would avoid an unnecessary recalculation, and he added these variables. The fastest build of this new Ada version was comparable in size to the assembly code. However, the compiled Ada code ran approximately twice as fast as the assembly (see Table 1)! [4]
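The two transformations described above, unrolling a six-pass loop and holding a repeated subexpression in a local variable, can be shown structurally as follows. This is a sketch in Python rather than the Ada generic instantiations Syiek actually wrote, and `stage`, `rolled`, `unrolled`, `recomputed`, and `with_local` are hypothetical names with made-up arithmetic. Python itself gains little from unrolling; the point is only the shape of the code.

```python
def stage(x, k):
    # Hypothetical per-pass computation standing in for the loop body.
    return x * 2 + k


def rolled(x):
    # Original form: one copy of the body, executed six times by a loop.
    for k in range(6):
        x = stage(x, k)
    return x


def unrolled(x):
    # Unrolled form: six copies of the body, no loop control overhead,
    # but roughly six times as much code for this part.
    x = stage(x, 0)
    x = stage(x, 1)
    x = stage(x, 2)
    x = stage(x, 3)
    x = stage(x, 4)
    x = stage(x, 5)
    return x


def recomputed(a, b, c):
    # The product a * b is calculated twice.
    return (a * b + c) * (a * b - c)


def with_local(a, b, c):
    # A local variable holds the product so it is calculated once.
    ab = a * b
    return (ab + c) * (ab - c)
```

Both pairs of functions return identical results; only code size and the amount of work per call differ, which mirrors the size-for-speed trade visible in Table 1.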

Table 1

                  +-----------+------------+
                  |  Size in  |  Speed in  |
   Code Version   |   Words   | Microsecs  |
+-----------------+-----------+------------+
| Assembly        |    410    |  48.4497   |
| Original Ada    |    134    |  49.3164   |
| New Ada         |    414    |  25.1892   |
+-----------------+-----------+------------+
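The ratios implied by Table 1 can be checked with a short calculation. The sketch below uses only the figures in the table; the dictionary layout and the variable names are illustrative choices, not part of the original study.

```python
# Figures copied from Table 1: (size in words, time in microseconds).
results = {
    "Assembly": (410, 48.4497),
    "Original Ada": (134, 49.3164),
    "New Ada": (414, 25.1892),
}

asm_size, asm_time = results["Assembly"]

# Size of each version relative to the assembly, and how many times
# faster it runs than the assembly (values above 1.0 mean faster).
ratios = {
    name: (size / asm_size, asm_time / run_time)
    for name, (size, run_time) in results.items()
}

if __name__ == "__main__":
    for name, (size_ratio, speedup) in ratios.items():
        print(f"{name:<13} size x{size_ratio:.2f}  speed x{speedup:.2f}")
```

On these figures the new Ada version is about 1.01 times the size of the assembly and about 1.92 times as fast, while the original Ada is roughly a third of the size at nearly the same speed.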