This article collects the general knowledge and Best-Known-Methods (BKMs) for aligning of data within structures in order to achieve optimal performance.

Every data type has an alignment associated with it which is mandated by the processor architecture rather than the language itself. Aligning data elements allows the processor to fetch data from memory in an efficient manner and thereby improves performance. The compiler tries to maintain these alignments for data elements to provide optimal performance. The typical alignment requirements for data types on 32-bit and 64-bit Linux* systems as used by the Intel® C++ Compiler are shown below:

Data Type 32-bit (bytes) 64-bit (bytes) char 1 1 short 2 2 int 4 4 long 8 8 float 4 4 double 8 8 long long 8 8 long double 4 16 Any pointer 4 8

Table1: typical alignment requirements for data types on 32-bit and 64-bit Linux* systems as used by the Intel® C++ Compiler

In general, the compiler will try to fulfill these alignment requirements for data elements whenever possible. In the case of the Intel® C++ and Fortran compilers, you can enforce or disable natural alignment using the –align (C/C++, Fortran) compiler switch. For structures that generally contain data elements of different types, the compiler tries to maintain proper alignment of data elements by inserting unused memory between elements. This technique is known as 'Padding'. Also, the compiler aligns the entire structure to its most strictly aligned member. The compiler may also increase the size of structure, if necessary, to make it a multiple of the alignment by adding padding at the end of the structure. This is known as 'Tail Padding'. Thus padding improves performance at expense of memory. In case of the Intel® Xeon Phi™ Coprocessor, where the amount of memory available to the application is limited in itself, this poses a serious problem.

BKM: Minimizing memory wastage: One can try to minimize this memory wastage by ordering the structure elements such that the widest (largest) element comes first, followed by the second widest, and so on. The following example helps illustrate the effect of ordering of data elements on the size of the structure. struct s1 { char a; short a1; char b1; float b; int c; char e; double f; }; Size of Struct s1 = 32 The structure s1 has 11 bytes of padding as illustrated by the diagram below: Consider the following structure s2 struct s2 { double f; float b; int c; short a1; char a,b1,e; }; Size of Struct s2 = 24 This structure contains only 3 bytes of tail padding as indicated by the following figure and saves memory. Hence, simply re-arranging the elements during the structure definition may help in avoiding memory wastage. BKM: Touching only some elements at a time: An exception to this ordering of elements in structures is if your structures are bigger than a cache line, which is 64 bytes in case of Intel Xeon Phi coprocessor, and some loops or kernels touch only a part of the structure. In this case, it may be beneficial to keep the parts of the structure that are touched together in memory, which may lead to improved cache locality.

BKM: Splitting larger structures: If your structures are larger than a cache line with some loops or kernels touching only a part of the structure then you may consider reorganizing the data by splitting the large structure into multiple smaller structures which are stored as separate arrays. This potentially improves the density of touched data and thereby improves cache locality.

BKM: Forcing specific alignment using align(n): You can also use the __declpsec(align) attribute to direct the compiler to align data more strictly than it otherwise would. The syntax for this extended-attribute is as follows:

C/C++:

Fortran:

Where n is the requested alignment and is an integral power of 2, up to 4096 for the Intel C++ Compiler and up to 16384 for the Intel Fortran Compiler.

You can use this attribute to request alignments for individual variables, or structures of static or automatic storage duration. However, with that said, although you can increase the alignment of a struct, this attribute does not adjust the alignment of elements within the struct. By placing the __declspec(align) before the keyword struct, you request the appropriate alignment for only the objects of that type. Let me illustrate my point with the following example: