Writing efficient overflow checks in C is surprisingly hard (but good brain exercise). Even libraries written by security experts, such as Microsoft’s SafeInt and CERT’s IntegerLib, have shipped broken overflow checks.

As another example, I have seen many bit tricks for detecting int + int overflow, a number of which are actually broken because they rely on undefined behavior: at the C level, one cannot compute a signed addition first and check for overflow afterwards, since signed integer overflow is itself undefined. Guess what C compilers do to the following code.

```c
int sum(int a, int b)
{
	int c;
	assert(a >= 0);
	assert(b >= 0);
	c = a + b;	/* both a and b are non-negative */
	if (c < 0)	/* overflow? */
		abort();
	return c;
}
```

GCC will optimize away the check `c < 0`, while Clang keeps it. You may try another (equally broken) form, `c < a`: this time GCC keeps the check, while Clang removes it. Both compilers are within their rights, because by the time either check runs, the undefined overflow has already happened.
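To stay within defined behavior, the check has to run before the addition. A minimal sketch for the non-negative case above, using `INT_MAX` from `<limits.h>`:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* The overflow check runs *before* the addition, so the signed
 * addition itself can never overflow (no undefined behavior,
 * nothing for the compiler to optimize away). */
int sum_checked(int a, int b)
{
	assert(a >= 0);
	assert(b >= 0);
	if (a > INT_MAX - b)	/* would a + b exceed INT_MAX? */
		abort();
	return a + b;
}
```

The general (possibly negative) case needs a symmetric check against `INT_MIN` as well, which is exactly the kind of boilerplate that makes hand-written checks error-prone.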

I am proposing an overflow detection library, libo. Unlike previous ones, it consists of assembly code, and since I am lazy, that assembly is generated automatically by a short program.

Let’s start with a simple task: writing an array allocator malloc_array(n, size) , where n is the number of elements and size is the element size (i.e., the non-zeroing version of calloc ).

Here’s a popular way.

```c
void *malloc_array(size_t n, size_t size)
{
	if (size && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

Unfortunately neither GCC nor Clang is able to optimize away the division (dunno why though; looks like a straightforward transformation).

```
malloc_array:                           # @malloc_array
	.cfi_startproc
# BB#0:                                 # %entry
	testq	%rsi, %rsi
	je	.LBB0_3
# BB#1:                                 # %land.rhs
	movq	$-1, %rax
	xorl	%edx, %edx
	divq	%rsi
	cmpq	%rdi, %rax
	jae	.LBB0_3
# BB#2:                                 # %return
	xorl	%eax, %eax
	ret
.LBB0_3:                                # %if.end
	imulq	%rdi, %rsi
	movq	%rsi, %rdi
	jmp	malloc                  # TAILCALL
.Ltmp0:
	.size	malloc_array, .Ltmp0-malloc_array
	.cfi_endproc
```

If you are crazy about performance, try this trick from the Boehm GC guys.

```c
#define SQRT_SIZE_MAX (((size_t)1 << (sizeof(size_t) * 8 / 2)) - 1)

	if ((size | n) > SQRT_SIZE_MAX /* fast test */
	    && size && n > SIZE_MAX / size)
		return NULL;
```
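The idea: if both operands fit in the lower half of `size_t`, their product cannot overflow, so the expensive division runs only on the slow path. Wrapped into the allocator, a sketch might look like this (note the `(size_t)1` in the shift, so the macro is well-defined when `size_t` is 64-bit):

```c
#include <stdint.h>
#include <stdlib.h>

/* If size and n both fit in the lower half of size_t, the product
 * cannot overflow; the division runs only on the (rare) slow path. */
#define SQRT_SIZE_MAX (((size_t)1 << (sizeof(size_t) * 8 / 2)) - 1)

void *malloc_array(size_t n, size_t size)
{
	if ((size | n) > SQRT_SIZE_MAX /* fast test */
	    && size && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```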

Another way is to promote the operands to a wider type, such as 128-bit (assuming size_t is 64-bit); just don’t forget to cast before the multiplication, or the product will be computed (and possibly wrap) in size_t. I am not sure how portable this is. The grsecurity patch uses something similar to modify kmalloc.

```c
void *malloc_array(size_t n, size_t size)
{
	__uint128_t bytes = (__uint128_t)n * (__uint128_t)size;
	if (bytes > SIZE_MAX)
		return NULL;
	return malloc((size_t)bytes);
}
```

With the new libo, malloc_array could be implemented as follows.

```c
void *malloc_array(size_t n, size_t size)
{
	size_t bytes;
	if (umuloz(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

With link-time optimization, Clang produces very nice code: compared to the baseline code (without overflow checking), it has only one more jno .

```
malloc_array:                           # @malloc_array
	.cfi_startproc
# BB#0:                                 # %entry
	movq	%rdi, %rax
	mulq	%rsi
	jno	.LBB0_2
# BB#1:                                 # %return
	xorl	%eax, %eax
	ret
.LBB0_2:                                # %if.end
	movq	%rax, %rdi
	jmp	malloc                  # TAILCALL
.Ltmp0:
	.size	malloc_array, .Ltmp0-malloc_array
	.cfi_endproc
```
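For comparison, recent GCC (5+) and Clang expose the same overflow-checked operations directly as builtins; a sketch (not part of libo) using `__builtin_mul_overflow`, which returns nonzero if the product wrapped:

```c
#include <stdlib.h>

/* Same shape as the libo version, but using the compiler builtin
 * instead of an external library; compiles to a similar mul + jno
 * sequence on x86-64. */
void *malloc_array(size_t n, size_t size)
{
	size_t bytes;
	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

The builtins did not exist when hand-rolled tricks like the ones above were common, which is part of why libraries such as libo are useful: they give you the compiler’s overflow flag without waiting for compiler support.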

Here is the trick. LLVM internally supports arithmetic with overflow operations, based on which the libo generator builds up functions like this:

```llvm
define i32 @umulo64(i64, i64, i64*) alwaysinline {
entry:
	%3 = call {i64, i1} @llvm.umul.with.overflow.i64(i64 %0, i64 %1)
	%4 = extractvalue {i64, i1} %3, 0
	store i64 %4, i64* %2
	%5 = extractvalue {i64, i1} %3, 1
	%6 = sext i1 %5 to i32
	ret i32 %6
}
```

To use libo with Clang (and link-time optimization), one can use the IR directly. To use libo with GCC, one can invoke LLVM’s backends to generate assembly and link that in. Assuming LLVM is implemented correctly, this yields a correct and efficient libo for every architecture LLVM supports.
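For platforms where linking the generated code is inconvenient, a portable C fallback with the same contract as the `umulo64` IR above (store the possibly-wrapped product, return nonzero on overflow) could look like this; the function name mirrors the IR, but this implementation is just an illustrative sketch, not libo’s:

```c
#include <stdint.h>

/* Portable fallback matching the contract of the generated umulo64:
 * *res receives the (possibly wrapped) product, and the return value
 * is nonzero iff the multiplication overflowed. Unsigned wraparound
 * is well-defined in C, so this avoids undefined behavior, at the
 * cost of a division on every call. */
int umulo64(uint64_t a, uint64_t b, uint64_t *res)
{
	*res = a * b;
	return a != 0 && *res / a != b;
}
```

This is exactly the slow division-based check the post started with, which is the point: the IR version gets the overflow flag from the hardware for free, while portable C has to reconstruct it.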