One well-documented way in which a program may break in the presence of objects of more than PTRDIFF_MAX bytes is by doing a pointer subtraction that results in an overflow:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = malloc(0x81000000);
  if (!p) abort();
  char *q = p + 0x81000000;
  printf("q - p: %td\n", q - p);
}
```

On the Ubuntu Linux version that I have conveniently at hand, this program, compiled to 32-bit code, produces:

```
$ gcc -m32 t.c && ./a.out
q - p: -2130706432
```

Strange results are acceptable because the overflow in pointer subtractions is undefined behavior:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

Do not stop reading if you already know all this! Things are going to get interesting in a minute!

Incidentally, when I tried this program on OS X 10.9 with clang -m32 t.c && ./a.out, I obtained a warning that malloc had failed, and my program aborted without doing the dangerous q - p pointer subtraction. In effect, a malloc function that rejects allocations of more than half the address space goes a long way towards preventing surprising behaviors. This is interesting because, at least as late as OS X 10.5, I routinely allocated 2.5GiB from 32-bit processes. On the OS X platform, the decision was courageously made to break some 32-bit applications that used to work in order to prevent some vaguely worrying problems. But then again, preserving application compatibility was never the foremost preoccupation of the Mac platform.

Even if you already knew all this, keep reading.

Subtraction of pointers to wider types

On a 32-bit platform

GCC and Clang’s behavior on the following example is not excused by clause 6.5.6p9:

```c
#include <stdlib.h>

int main(void) {
  short *p, *q;
  p = malloc(0x70000000 * sizeof *p); // note: product does not overflow
  q = p + 0x70000000;                 // note: stays in-bounds
  return q - p;                       // note: does not overflow ptrdiff_t
}
```

The subtraction q - p in this program evaluates to -268435456. Wait, what?

The example can be made more striking by making the expected result of the pointer subtraction smaller:

```c
#include <stdlib.h>

typedef char (*arrayptr)[0x70000000];

int main(void) {
  arrayptr p, q;
  p = malloc(2 * sizeof *p);
  q = p + 2;
  return q - p;
}
```

Both Clang and GCC make this program return 0 when it’s very obvious that the correct result is 2. As a psychological experiment, when these related examples are shown to C programmers, the average subject might reluctantly let themselves be convinced that -268435456 is 0x70000000, but they are quite certain that 2 should not be 0, and they express more surprise at the second example.

The answer to the riddle (though not really an excuse in itself for Clang and GCC’s behavior) is contained in the following decomposition of pointer subtraction into elementary steps. The “sarl” instruction implements a signed operation that does not produce the correct result when the value in %eax should be interpreted as a number above 2^31. When the element size is not a power of two, the compiler uses a division instruction instead, but the problem remains the same: the sequence generated by the compiler implements a signed division that doesn’t work above 2^31.
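The decomposition can be simulated in portable C. This is a sketch, not the compiler’s actual output: it reproduces the byte subtraction followed by a signed divide by the element size. (Converting an out-of-range value to int32_t is itself implementation-defined, but wraps on mainstream compilers.)

```c
#include <stdint.h>

/* Simulate the 32-bit lowering of q - p for short *p, *q: subtract the
   byte addresses, then perform a *signed* divide by the element size
   (what "sarl $1" does when the size is a power of two). */
int32_t simulated_short_ptr_diff(uint32_t byte_diff) {
  int32_t as_signed = (int32_t)byte_diff; /* 0xE0000000 -> -536870912 */
  return as_signed / 2;                   /* signed divide, like sarl */
}
```

For the first example above, the byte distance between q and p is 0xE0000000, and `simulated_short_ptr_diff(0xE0000000u)` yields -268435456 rather than the mathematically correct 0x70000000.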

Partially excusing GCC’s behavior, it is somewhat documented (in replies to bug reports) that 32-bit GCC does not work with malloc functions that succeed for requests of more than 2GiB of memory. As long as the malloc calls fail in the examples above, GCC is not doing anything wrong, because the program invokes undefined behavior as soon as p + … is computed. (It is illegal to do pointer arithmetic from a null pointer.)

And indeed, if we re-introduce the malloc success check that I surreptitiously removed from the initial example, GCC does the right thing. “The right thing” here is emitting an actual call to malloc, which, as semi-documented, must fail for the requested size in order to be compatible with GCC.

It’s a shame that 32-bit Glibc’s malloc does not systematically fail for such requests, but that is not GCC’s problem. It is the problem of the programmer who links together code generated by GCC and code from Glibc (note that Glibc is not intended to be compiled with a compiler other than GCC, which is unfortunate, considering).

Clang, the most honey badger-like of compilers, don’t give a shit.

64-bit platforms

64-bit solves all these problems, because a malloc call requesting half the address space always fails, right?

It’s a shame that there was a difficult trade-off on 32-bit platforms, where the problem was fully realized only after some applications had gotten used to requesting 2.5GiB and getting away with it. But as even the smallest computers move to 64-bit, can we consider this problem solved by Moore’s law? 2^63-1 bytes ought to be enough for anybody?

Not really. The malloc call in the program below succeeds (the program would return 0 if the malloc invocation failed), despite Clang having the same half-the-address-space limitation for 64-bit as for 32-bit.

```c
#include <stdlib.h>

typedef char (*arrayptr)[0x700000000000000];

int main(void) {
  arrayptr p, q;
  p = malloc(32 * sizeof *p);
  if (!p) return 0;
  q = p + 32;
  return q - p;
}
```

The program not returning 0 indicates that malloc succeeded. The program returning a nonzero value other than 32 indicates that Clang is confused.

Comparisons of pointers

One might consider it an acceptable trade-off to let malloc allocate 2.5GiB on 32-bit platforms, if the developer knows that they never subsequently subtract pointers to the allocated zone. Not as char* and not as pointers to another type, since compilers are broken for these subtractions too, even when C11’s clause 6.5.6p9 perfectly defines the result.

Unfortunately, an optimizing compiler appears to be littered with optimizations that assume object sizes are less than PTRDIFF_MAX. As an example, we all know that x86 Clang and GCC implement pointer comparison as an unsigned assembly comparison (“cmpl” is an agnostic comparison instruction, but here it is followed by “cmoval”, where the “a”, for “above”, indicates a conditional move based on an unsigned comparison). This means that it is allowed to compare pointers inside a putative 2.5GiB array on a 32-bit platform, right?
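The kind of function whose comparison compiles to that unsigned cmpl/cmoval sequence might look like this (a reconstruction; the function name and strings are assumptions, the post’s actual listing being behind a Compiler Explorer link):

```c
/* On 32-bit x86, GCC and Clang compile this direct pointer comparison
   to an unsigned cmpl/cmoval sequence, which gives the right answer
   even for pointers above the 2GiB boundary. */
const char *compare(char *p, char *q) {
  return p > q ? "above" : "not above";
}
```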

Not all pointer comparisons.

Because of this optimization.
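The function being discussed is along these lines (again a reconstruction; the identifiers are assumptions, and the exact code sits behind the post’s Compiler Explorer link):

```c
#include <stddef.h>

char buffer[16]; /* stand-in; imagine a large malloc'd block instead */

const char *choose(size_t offs1, size_t offs2) {
  /* Clang and GCC fold this comparison of derived pointers into a
     signed comparison of offs1 and offs2 (hence the "cmovgl"), which
     is only correct when both offsets are below 2^31 on 32-bit. */
  return buffer + offs1 > buffer + offs2 ? "greater" : "not greater";
}
```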

Note the “g” in “cmovgl”. This “g” stands for “greater”: a mnemonic indicating that a signed comparison decides which string is returned. Determining what happens with offs1=0 and offs2=0x81000000, two values representable in a 32-bit size_t that, with a lenient malloc, can also both be valid offsets into the same array, is left as an exercise for the reader.

Conclusion

It is impossible to tell when the compiler will apply this or any other optimization that assumes that a pointer difference is always representable as a ptrdiff_t. Therefore, the only safe path on 32-bit platforms is to refrain from allocating more than 2GiB in a single malloc call; it might be a good idea if Glibc’s malloc were changed to implement this limitation. Compilers could do much, much better than they currently do. Nothing would prevent malloc(0x81000000) from causing a warning (I suggest “malloc call will always return NULL” or something equally direct). Beyond the friendliness of warnings, Clang’s eliminating, as calls that cannot fail, malloc calls that because of Clang’s own limits cannot succeed indicates a disturbing misunderstanding of classical logic.
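The suggested limitation is a one-line guard. A minimal sketch, assuming a hypothetical wrapper name (this is not part of Glibc):

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical wrapper implementing the limitation suggested above:
   refuse any single allocation of PTRDIFF_MAX bytes or more, so that
   the difference of any two pointers into the block fits a ptrdiff_t. */
void *malloc_bounded(size_t n) {
  if (n > (size_t)PTRDIFF_MAX) return NULL;
  return malloc(n);
}
```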

Acknowledgements

Alexander Cherepanov and John Regehr provided most of the examples in this post, with additional remarks coming from Daniel Micay and Rich Felker. Matt Godbolt made the absolute best platform on which to investigate these examples and show them to the world.