By pascal on Wednesday, July 31 2013, 09:47 - Permalink

Quiz time

What does the program below do?

#include <stdio.h> int main(){ struct { int t[4]; int u; } v; v.u = 3; v.t[4] = 4; printf("v.u=%d", v.u); return 0; }

Two answers are “it prints v.u=4 ” and “it prints v.u=3 ”:

$ gcc t.c && ./a.out v.u=4 $ gcc -O2 t.c && ./a.out v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v . The problem is that v.t+4 is outside object v.t . So GCC does what it pleases, and when compiling with -O2 , optimizes brutally:

$ gcc -S -O2 t.c && cat t.s .LC0: .string "v.u=%d

" … movl $3, %edx movl $.LC0, %esi movl $1, %edi xorl %eax, %eax call __printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4 , but here the index is hard-coded as 4 , so line 9 is only valid when 4 < 4 . That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h" typedef struct{ int tab[1]; } ts; int main() { ts *q = malloc(5*sizeof(int)); q->tab[2]= 5; return 1; }

If you guessed “invoke undefined behavior”, well done!

The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct { PyObject_VAR_HEAD PyObject *ob_item[1]; /* ob_item contains space for 'ob_size' elements. * Items must normally not be NULL, except during construction when * the tuple is not yet visible outside the function that builds it. */ } PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, in the C99 standard we got “incomplete types”, a safe way to implement tuples and Pascal strings:

typedef struct { PyObject_VAR_HEAD PyObject *ob_item[]; …

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).