Programming in standardized high level languages has the benefit of being readily portable across architectures. One would think that portability exists as long as programmers constrain themselves to features specified by the language standard and employ compilers that faithfully implement the standard. Unfortunately, this is not always the case. Migration of software between architectures becomes problematic when byte order dependent code [8] exists in the source code base, only to be discovered when runtime problems surface. In large, legacy code bases consisting of millions of lines of code, it is very difficult to find all of the byte order dependencies and transform them into endian-neutral code using known techniques [1].

The Bi-Endian Compiler (BEC) enables applications to execute with the byte order semantics as they were designed. For example, the BEC implementation discussed in this article enables applications to execute with big-endian semantics on a little-endian processor. Employing BEC requires the programmer to designate the byte order of all data. During compilation, BEC inserts code sequences, where necessary, to load data into processor registers such that the data is in native byte order before operations are performed. Subsequently, code sequences are inserted that transform the results in native byte order into the resulting data's declared byte order before storing to memory.

This article first provides background on the subject by reviewing byte order dependencies and current techniques to mitigate issues involving them. The BEC is then introduced [2][6], discussing the language features necessary to express byte order and the underlying compiler implementation. The porting process is discussed showing how to effectively apply the compiler and its features to port an application. Performance optimization and evaluation is then detailed, showing techniques used to improve the performance of the implementation. The conclusion summarizes and offers thoughts on future directions.

Background

Endianness, or byte order, is the format of how multibyte data is stored in memory [3][10]. It specifies the location of the most significant and least significant bytes that comprise a multibyte type such as a 32-bit integer. The two types of endian architectures are termed big-endian and little-endian. Discussions on the advantages and disadvantages of each have been characterized as being akin to a religious war [4][7]. Regardless, both big-endian and little-endian architectures exist, and this can cause problems when migrating between architectures due to byte order dependent code. Example 1 shows a code snippet that produces different output depending upon the byte order of the processor architecture used to execute the code. On a big-endian processor, where the most significant byte is stored in the lowest memory address, the pointer ap points to 0x12. On a little-endian processor, where the least significant byte is stored in the lowest memory address, the pointer ap points to 0x78. Legacy code bases built up over several years by many different programmers can be littered with such snippets of code, motivated in many cases by optimization; assuming the location of a smaller subset of bytes in a multibyte element can save in terms of memory transactions.

#include <stdio.h>

int main(void) {
    int a = 0x12345678;
    char *ap = (char *)&a;
    printf("%2x %x\n", *ap, a);
    return 0;
}

Output on a big-endian processor:    12 12345678
Output on a little-endian processor: 78 12345678

Example 1: Byte order dependent code example and output (Source: Intel Corporation).

Techniques of transforming byte order dependent code into endian-neutral code are well understood [1][8]. In the aforementioned example, macros could be defined whose implementation would be platform dependent, but would agree upon which byte of a larger component would be considered first, second, and so on. These techniques require the programmer to first identify byte order dependent code and then make manual code changes to enforce endian neutrality. In comparison, BEC does not require the programmer to find the specific byte order dependent code, but only to identify the byte order of the data. The compiler ensures that the correct byte order semantics are executed. In Example 1, if the code was written to assume big-endian order, the programmer only specifies that the variable a is big-endian and the compiler ensures that the expectation is met. In the conservative case, the programmer could communicate that the entire program should execute with big-endian semantics and the compiler would enforce it.

A second approach to migrating byte order dependent code is encapsulated by binary translation techniques [9]. These techniques encompass more in that they enable execution of one processor's instruction set architecture (ISA) on a processor with a different ISA by intelligently and efficiently translating between the two. This approach is attractive due to its relative ease of use for the customer; Apple employed its Rosetta technology to help migrate from the PowerPC architecture to Intel architecture. Compared with BEC, this approach typically incurs greater overhead, as the application is translated during runtime without the benefit of aggressive static compiler optimization techniques.

Bi-Endian Evolution and Implementation

The BEC has extensions to C and C++ in which byte order is a type attribute that can be bound to a built-in type, a typedef, or a type as part of a variable declaration. The byte order attribute can be bound to pointer types and floating point types, and can be part of a type chain consisting of multiple pointer indirections, integral types, and floating point types. The following sections describe the evolution of the BEC, its language extensions, the dataflow analyses implemented, and big-endian data initialization.

Evolution

As the early proposals for a BEC prototype were discussed, it was necessary to dispel a common misconception. Often, engineers and managers reviewing the proposal voiced concerns that the compiler would be unable to determine the intentions of the programmer. They viewed byte order dependencies as they are encountered in a debugger or binary translator. It was often necessary for the prototyping team to explain that if the byte order of types is part of the program specification, the intentions of the programmer are clearly stated. It then only became necessary to provide mechanisms for explicitly declaring byte orders at varying granularities of scope.

The earliest proof-of-concept BEC was demonstrated using the C benchmarks in SPEC2000. These benchmarks were compiled such that all types were declared as big-endian. The resulting executables were run on a processor that supports only little-endian. The next large scale demonstration involved compiling the Linux operating system. The byte order of the fields within the Internet Protocol (IP) header in the network stack is maintained as big-endian by using ntoh() and hton() macros, converting between network order, which is big-endian, and the host byte order, which is little-endian. As a demonstration of the BEC capabilities, the header fields were explicitly declared as big-endian and the macros were removed. The functionality of the compiled operating system was unchanged. The BEC technology is an alternative to the use of ntoh() and hton() macros to maintain network byte order.

Debugging of mixed endian code, which is code containing uses of both big-endian and little-endian types, was supported by modified versions of a proprietary debugger and GNU gdb. The DWARF v3.0 specification provides a means for specifying byte order. Various other DWARF mechanisms allow specifying the location of pieces of data, as would be necessary when the actual byte order of data is other than specified by the programmer. An example where the byte order of data can differ from its specification is when a value declared as big-endian is operated upon, such as in an addition. During execution, the data element is represented in a register in little-endian format (on a little-endian processor) for correct operation. A debugger displaying the data must account for this difference. In a large development project, the person who is debugging such code may not be aware of the actual byte order of the data.



Language Features

The primary function of the language extensions is to enable the programmer to communicate the byte order of compilation units, code sections, and individual declarations to the compiler. Example 2 is a code sample employing each of the above. The source file, file.c, is compiled using the -little-endian option, which specifies that all data declarations in the compilation unit are little-endian. Similarly, the compiler supports a -big-endian option, which specifies that all data declarations in the compilation unit are big-endian. Declarations that are impacted by these options are said to have been declared implicitly in either a big-endian or little-endian context. The variable a in the example would be stored in little-endian byte order. For convenience, we refer to such a variable as a little-endian variable. In the file, #pragma byte_order(push, bigendian) specifies that declarations following the pragma are big-endian. The variable b would be stored in big-endian byte order. In addition, the optional parameter push specifies that a stack of byte orders is maintained, which enables byte order declarations spanning nested include files. This declaration method is also implicit and overrides the byte order specified at the command line. A section of code that has an implicit declaration bound to it is termed a big-endian or little-endian section (depending on the specified byte order). At the finest granularity, an explicit declaration occurs via a byte order attribute. The variable c would be stored in big-endian byte order. The byte order attribute overrides both implicit methods.

icc -little-endian file.c

/* file.c */
int a = 0x12345678;                              /* little-endian */
#pragma byte_order(push, bigendian)
int b = 0x12345678;                              /* big-endian */
#pragma byte_order(pop)
int __attribute__((bigendian)) c = 0x12345678;   /* big-endian */

Example 2: Byte order declaration example (Source: Intel Corporation).

By default, the byte order of system include files is the same as the target architecture. The BEC allows specification of the implicit byte order of individual compilation units, for instance include files, without modification of the source code. The BEC provides command line options to specify directory sub-trees and to specify the implicit byte order of the compilation units contained in each sub-tree. The sub-tree identification mechanism is based upon regular expression matching of the directory path name. The implicit byte order specification is by way of source code prolog and epilog files that contain #pragma byte_order specifications. The prolog and epilog files become part of the post-preprocessor file that is actually translated by subsequent phases of the compilation.

Compilation Phases

Similar to many modern compilers, the BEC consists of multiple phases, transforming one representation of the code to another, beginning with the source code and concluding with the executable. Figure 1 illustrates the compilation phases.





Figure 1: BEC Compilation Phases (Source: Intel Corporation, 2011).

The front-end phase parses the source code and transforms it into an abstract syntax tree (AST). During this phase, byte order attributes are associated with the program types represented in the AST, dependent upon the byte order context at the point of declaration, as discussed in the previous section.

The BEC employs a proprietary intermediate language (IL) to represent the program under compilation. The IL translation phase converts the AST representation into this IL representation. References to translated variables may have byte order conversion operations (BOCOs) inserted around them, after loads and before stores, in cases when the variable's type has a byte order opposite that of the underlying target.

The optimization phase operates on the IL representation to make execution of the code on the target platform more efficient. Since the BOCOs are represented in the IL just as any other instructions, standard compiler optimization can be applied. These optimizations and a description of each include:

Common subexpression elimination: removes redundant BOCOs that reconvert data which has already been converted.

Code motion: moves BOCOs up to the function entry, which reduces the number of BOCOs.

Constant propagation: determines if a constant that requires a BOCO has already been loaded (and converted), which eliminates unnecessary BOCOs.

In addition, a pass designed solely to remove redundant BOCOs, termed the "bswap elimination optimization," is invoked; it will be discussed in greater detail later in this article.

The code generator phase converts the IL representation into the binary code specific to the target platform. BOCOs are implemented using either hardware shift instructions or BSWAP instructions. Hardware BSWAP instructions provide an efficient method of converting between byte orders.

Data Initialization

In the BEC, pointer data types can also be attributed with a byte order. As a result, there may be situations where a pointer has the opposite byte order from the target platform. Pointers of the opposite byte order must be byte-swapped upon initialization. This presents a challenge when pointers are initialized with link time constants, because these constants are unknown at compile time and are resolved later during the linking stage.

For the initial proof-of-concept BEC, the linker was modified to support other-endian, linker or loader resolved constants. This created an unnecessary dependency within the tool chain with respect to the new technology. Subsequent versions of the BEC employed a flexible set of alternatives for resolving such constants.

In order to perform the necessary byte swap operations for pointers, the compiler generates and places special initialization data in a section of the object file, the .initdata section. This information is used in a three-step data initialization process detailed as follows:

At the static data initialization step, a post link tool is employed, which initializes data that can be initialized statically, such as data that does not have relocations associated with it.

At the dynamic loader initialization step, a dynamic loader has an opportunity to complete initialization based on the information from the .initdata section. This step is optional and requires a modified operating system loader.

At the dynamic runtime initialization step, a runtime routine initializes data stored in the opposite byte order from the underlying platform. This routine is automatically invoked prior to passing control to the main routine.

Porting Process

Porting an application from a big-endian architecture to a little-endian one consists of three main steps:

Compile the application with the BEC in big-endian compilation mode. Resolve compiler-diagnosed issues such as warnings.

Employ a symbol consistency checking mechanism to resolve possible incompatibilities between different compilation units.