The #pragma clang loop directive allows loop vectorization hints to be specified for the subsequent for, while, do-while, or C++11 range-based for loop. The directive allows vectorization and interleaving to be enabled or disabled, and the vector width and interleave count can also be specified manually. Vectorization and interleaving can be enabled explicitly, or implicitly by specifying a vector width and an interleave count.
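The two ways of using the directive can be sketched as follows. The function names are hypothetical and exist only for illustration; the pragma clauses (vectorize, interleave, vectorize_width, interleave_count) are the ones clang documents for loop hints, and the values 2 and 2 are arbitrary.

```cpp
// Explicit form: request vectorization and interleaving, and leave the
// vector width and interleave count to the cost model.
int sum_explicit(const int *A, int n) {
  int sum = 0;
#pragma clang loop vectorize(enable) interleave(enable)
  for (int i = 0; i < n; ++i)
    sum += A[i];
  return sum;
}

// Implicit form: specifying a width and a count implies that
// vectorization and interleaving are enabled.
void scale_implicit(float *A, float K, int n) {
#pragma clang loop vectorize_width(2) interleave_count(2)
  for (int i = 0; i < n; ++i)
    A[i] *= K;
}
```

The pragma must appear immediately before the loop it applies to. Compilers that do not recognize it are expected to ignore it, so the code keeps its scalar meaning either way.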

The Loop Vectorizer is enabled by default, but it can be disabled through clang with a command-line flag.

The loop vectorizer uses a cost model to decide on the optimal vectorization factor and unroll factor. However, users can force the vectorizer to use specific values, and both ‘clang’ and ‘opt’ support flags for doing so.
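A minimal sketch of how these controls might be exercised. The kernel and file name are hypothetical, and the command lines in the comment are assumptions about the clang spelling of the flags (-fno-vectorize, and -force-vector-width / -force-vector-interleave passed through -mllvm); they should be checked against the installed toolchain.

```cpp
// saxpy.cpp (hypothetical file name), a small kernel for experimenting
// with the vectorizer controls.
//
// Assumed invocations:
//   clang++ -O3 -c saxpy.cpp
//       vectorizer left at its default setting
//   clang++ -O3 -fno-vectorize -c saxpy.cpp
//       loop vectorizer disabled
//   clang++ -O3 -mllvm -force-vector-width=8 -c saxpy.cpp
//       SIMD width forced to 8, overriding the cost model
//   clang++ -O3 -mllvm -force-vector-interleave=2 -c saxpy.cpp
//       interleave (unroll) count forced to 2
void saxpy(float *y, const float *x, float a, int n) {
  for (int i = 0; i < n; ++i)
    y[i] += a * x[i];
}
```

Forcing a width or interleave count bypasses the cost model's profitability decision, so it is mainly useful for experiments and for comparing against the default choice.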

Many loops cannot be vectorized, including loops with complicated control flow, unvectorizable types, and unvectorizable calls. The loop vectorizer generates optimization remarks, which can be queried using command-line options to identify and diagnose loops that the loop vectorizer skips.

-Rpass-missed=loop-vectorize identifies loops that failed vectorization and indicates if vectorization was specified.

-Rpass-analysis=loop-vectorize identifies the statements that caused vectorization to fail. If -fsave-optimization-record is also provided, multiple causes of vectorization failure may be listed (this behavior might change in the future).

For example, when a loop body contains a switch statement, -Rpass-analysis=loop-vectorize can indicate that the switch statement cannot be vectorized.

To ensure that line and column numbers are produced, include the command-line options -gline-tables-only and -gcolumn-info. See the Clang user manual for details.
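As an illustration of the diagnostics workflow, the sketch below shows a loop containing a switch statement, together with an assumed command line that requests the remarks. The function and file names are hypothetical, and whether a remark is emitted, and its exact wording, depends on the compiler version, so no output is shown.

```cpp
/* switch_loop.cpp (hypothetical file name)

   Assumed invocation, combining the remark options with the debug-info
   options needed for line and column numbers:

     clang++ -O3 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
             -gline-tables-only -gcolumn-info -c switch_loop.cpp
*/
void rewrite_by_case(int *A, int Length) {
  // Vectorization is requested explicitly, so a missed-vectorization
  // remark can report that the hint could not be honored.
#pragma clang loop vectorize(enable) interleave(enable)
  for (int i = 0; i < Length; i++) {
    switch (A[i]) {
    case 0:
      A[i] = i * 2;
      break;
    case 1:
      A[i] = i;
      break;
    default:
      A[i] = 0;
    }
  }
}
```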

The LLVM Loop Vectorizer has a number of features that allow it to vectorize complex loops.

Loops with unknown trip count

The Loop Vectorizer supports loops with an unknown trip count. In the loop below, the iteration start and end points are unknown, and the Loop Vectorizer has a mechanism to vectorize loops that do not start at zero. In this example, the trip count (end - start) may not be a multiple of the vector width, so the vectorizer has to execute the last few iterations as scalar code. Keeping a scalar copy of the loop increases the code size.

    void bar(float *A, float *B, float K, int start, int end) {
      for (int i = start; i < end; ++i)
        A[i] *= B[i] + K;
    }
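The split between a vector body and a scalar remainder can be modeled by hand as below. This is only a sketch of the idea, not the code the vectorizer emits: bar_model and kVF are hypothetical names, the width of 4 is arbitrary, and the inner lane loop stands in for a single SIMD operation.

```cpp
constexpr int kVF = 4;  // hypothetical vectorization factor

void bar_model(float *A, const float *B, float K, int start, int end) {
  int i = start;

  // "Vector" body: each outer iteration handles kVF consecutive elements.
  for (; i + kVF <= end; i += kVF)
    for (int lane = 0; lane < kVF; ++lane)
      A[i + lane] *= B[i + lane] + K;

  // Scalar remainder: the final (end - start) % kVF iterations.
  for (; i < end; ++i)
    A[i] *= B[i] + K;
}
```

The second loop is the scalar copy that the text refers to; it is what adds to the code size.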

Runtime Checks of Pointers

In the example below, if the pointers A and B point to consecutive addresses, then it is illegal to vectorize the code because some elements of A will be written before they are read from array B.

Some programmers use the ‘restrict’ keyword to notify the compiler that the pointers are disjoint, but in our example, the Loop Vectorizer has no way of knowing that the pointers A and B are unique. The Loop Vectorizer handles this loop by inserting code that checks, at run time, whether the arrays A and B point to disjoint memory locations. If arrays A and B overlap, then the scalar version of the loop is executed.

    void bar(float *A, float *B, float K, int n) {
      for (int i = 0; i < n; ++i)
        A[i] *= B[i] + K;
    }
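The effect of such a check can be sketched in source form as follows. This is a hand-written model under stated assumptions, not the check LLVM generates: rangesDisjoint, bar_checked, and kVF are hypothetical names, and the boolean return value exists only so the chosen path can be observed.

```cpp
#include <cstdint>

constexpr int kVF = 4;  // hypothetical vectorization factor

// True when the byte ranges [A, A+n) and [B, B+n) do not overlap.
static bool rangesDisjoint(const float *A, const float *B, int n) {
  std::uintptr_t a = reinterpret_cast<std::uintptr_t>(A);
  std::uintptr_t b = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t bytes = static_cast<std::uintptr_t>(n) * sizeof(float);
  return a + bytes <= b || b + bytes <= a;
}

// Returns true if the "vector" path ran, false if the scalar loop ran.
bool bar_checked(float *A, float *B, float K, int n) {
  if (n < kVF || !rangesDisjoint(A, B, n)) {
    // Scalar fallback: always safe, even when A and B overlap.
    for (int i = 0; i < n; ++i)
      A[i] *= B[i] + K;
    return false;
  }

  int i = 0;
  for (; i + kVF <= n; i += kVF) {
    float tmp[kVF];
    // Load and compute all lanes before storing any of them, which is
    // what makes this path unsafe for overlapping arrays.
    for (int lane = 0; lane < kVF; ++lane)
      tmp[lane] = A[i + lane] * (B[i + lane] + K);
    for (int lane = 0; lane < kVF; ++lane)
      A[i + lane] = tmp[lane];
  }
  for (; i < n; ++i)  // scalar remainder
    A[i] *= B[i] + K;
  return true;
}
```

Calling bar_checked with two separate arrays takes the vector path, while calling it with A = Z + 1 and B = Z for a single array Z falls back to the scalar loop.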

Reductions

In this example the sum variable is used by consecutive iterations of the loop. Normally, this would prevent vectorization, but the vectorizer can detect that ‘sum’ is a reduction variable. The variable ‘sum’ becomes a vector of integers, and at the end of the loop the elements of the vector are added together to create the correct result. We support a number of different reduction operations, such as addition, multiplication, XOR, AND and OR.

    int foo(int *A, int n) {
      unsigned sum = 0;
      for (int i = 0; i < n; ++i)
        sum += A[i] + 5;
      return sum;
    }

We support floating point reduction operations when -ffast-math is used.
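The transformation can be modeled by hand as a set of per-lane partial sums that are combined after the loop. This is a sketch of the idea only: foo_model and kLanes are hypothetical names, and the lane count of 4 is arbitrary.

```cpp
constexpr int kLanes = 4;  // hypothetical number of vector lanes

int foo_model(const int *A, int n) {
  // 'sum' widened into one partial sum per lane.
  unsigned partial[kLanes] = {0, 0, 0, 0};

  int i = 0;
  for (; i + kLanes <= n; i += kLanes)
    for (int lane = 0; lane < kLanes; ++lane)
      partial[lane] += A[i + lane] + 5;

  // Horizontal add: combine the lanes into a single scalar.
  unsigned sum = 0;
  for (int lane = 0; lane < kLanes; ++lane)
    sum += partial[lane];

  // Scalar remainder for the last n % kLanes elements.
  for (; i < n; ++i)
    sum += A[i] + 5;
  return sum;
}
```

The model adds the elements in a different order than the scalar loop. That is harmless for unsigned integer addition, but it can change the result of a floating point sum, which is why floating point reductions require -ffast-math.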

Inductions

In this example the value of the induction variable i is saved into an array. The Loop Vectorizer knows to vectorize induction variables.

    void bar(float *A, int n) {
      for (int i = 0; i < n; ++i)
        A[i] = i;
    }

If Conversion

The Loop Vectorizer is able to “flatten” the IF statement in the code and generate a single stream of instructions. The Loop Vectorizer supports any control flow in the innermost loop. The innermost loop may contain complex nesting of IFs, ELSEs and even GOTOs.

    int foo(int *A, int *B, int n) {
      unsigned sum = 0;
      for (int i = 0; i < n; ++i)
        if (A[i] > B[i])
          sum += A[i] + 5;
      return sum;
    }
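Flattening can be pictured as replacing the branch with a per-element mask and a select, so that every iteration executes the same instructions. The sketch below is a hand-written source-level model, not the vectorizer's output; foo_flat is a hypothetical name.

```cpp
int foo_flat(const int *A, const int *B, int n) {
  unsigned sum = 0;
  for (int i = 0; i < n; ++i) {
    // The condition becomes a mask instead of a branch.
    bool mask = A[i] > B[i];
    // The guarded value is computed unconditionally, using unsigned
    // arithmetic so that the speculated addition cannot overflow.
    unsigned add = static_cast<unsigned>(A[i]) + 5u;
    // Select: contribute 'add' where the mask is set, 0 elsewhere.
    sum += mask ? add : 0u;
  }
  return sum;
}
```

With the branch gone, the loop body is straight-line code, which is the form that maps directly onto vector compare and select instructions.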

Pointer Induction Variables

This example uses the “accumulate” function of the C++ standard library. This loop uses C++ iterators, which are pointers, and not integer indices. The Loop Vectorizer detects pointer induction variables and can vectorize this loop. This feature is important because many C++ programs use iterators.

    #include <numeric>

    int baz(int *A, int n) {
      return std::accumulate(A, A + n, 0);
    }

Reverse Iterators

The Loop Vectorizer can vectorize loops that count backwards.

    void foo(int *A, int n) {
      for (int i = n; i > 0; --i)
        A[i] += 1;
    }

Scatter / Gather

The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter and gather memory.

    #include <stdint.h>

    void foo(int *A, int *B, int n) {
      for (intptr_t i = 0; i < n; ++i)
        A[i] += B[i * 4];
    }

In many situations the cost model will inform LLVM that this is not beneficial, and LLVM will only vectorize such code if forced with “-mllvm -force-vector-width=#”.

Vectorization of Mixed Types

The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer cost model can estimate the cost of the type conversion and decide if vectorization is profitable.

    void foo(int *A, char *B, int n) {
      for (int i = 0; i < n; ++i)
        A[i] += 4 * B[i];
    }

Global Structures Alias Analysis

Access to global structures can also be vectorized, with alias analysis being used to make sure accesses don’t alias. Run-time checks can also be added on pointer access to structure members.

Many variations are supported, but some that rely on undefined behaviour being ignored (as other compilers do) are still left unvectorized.

    struct { int A[100], K, B[100]; } Foo;

    void foo() {
      for (int i = 0; i < 100; ++i)
        Foo.A[i] = Foo.B[i] + 100;
    }

Vectorization of function calls

The Loop Vectorizer can vectorize intrinsic math functions. See the table below for a list of these functions.

    pow      exp      exp2
    sin      cos      sqrt
    log      log2     log10
    fabs     floor    ceil
    fma      trunc    nearbyint
    fmuladd

Note that the optimizer may not be able to vectorize math library functions that correspond to these intrinsics if the library calls access external state such as “errno”. To allow better optimization of C/C++ math library functions, use “-fno-math-errno”.

The loop vectorizer knows about special instructions on the target and will vectorize a loop containing a function call that maps to those instructions. For example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps instruction is available.

    #include <math.h>

    void foo(float *f) {
      for (int i = 0; i != 1024; ++i)
        f[i] = floorf(f[i]);
    }