November 15, 2019

nullprogram.com/blog/2019/11/15/

This article was discussed on Hacker News.

In software development there are many concepts that at first glance seem useful and sound, but, after considering the consequences of their implementation and use, are actually horrifying. Examples include thread cancellation, variable length arrays, and memory aliasing. GCC’s closure extension to C is another, and this little feature compromises the entire GNU toolchain.

GNU C nested functions

GCC has its own dialect of C called GNU C. One feature unique to GNU C is nested functions, which allow C programs to define functions inside other functions:

void intsort1 ( int * base , size_t nmemb ) { int cmp ( const void * a , const void * b ) { return * ( int * ) a - * ( int * ) b ; } qsort ( base , nmemb , sizeof ( * base ), cmp ); }

The nested function above is straightforward and harmless. It’s nothing groundbreaking, and it is trivial for the compiler to implement. The cmp function is really just a static function whose scope is limited to the containing function, no different than a local static variable.

With one slight variation the nested function turns into a closure. This is where things get interesting:

void intsort2 ( int * base , size_t nmemb , _Bool invert ) { int cmp ( const void * a , const void * b ) { int r = * ( int * ) a - * ( int * ) b ; return invert ? - r : r ; } qsort ( base , nmemb , sizeof ( * base ), cmp ); }

The invert variable from the outer scope is accessed from the inner scope. This has clean, proper closure semantics and works correctly just as you’d expect. It fits quite well with traditional C semantics. The closure itself is re-entrant and thread-safe. It’s automatically (read: stack) allocated, and so it’s automatically freed when the function returns, including when the stack is unwound via longjmp() . It’s a natural progression to support closures like this via nested functions. The eventual caller, qsort , doesn’t even know it’s calling a closure!

While this seems so useful and easy, its implementation has serious consequences that, in general, outweigh its benefits. In fact, in order to make this work, the whole GNU toolchain has been specially rigged!

How does it work? The function pointer, cmp , passed to qsort must somehow be associated with its lexical environment, specifically the invert variable. A static address won’t do. When I implemented closures as a toy library, I talked about the function address for each closure instance somehow needing to be unique.

GCC accomplishes this by constructing a trampoline on the stack. That trampoline has access to the local variables stored adjacent to it, also on the stack. GCC also generates a normal cmp function, like the simple nested function before, that accepts invert as an additional argument. The trampoline calls this function, passing the local variable as this additional argument.

Trampoline illustration

To illustrate this, I’ve manually implemented intsort2() below for x86-64 (System V ABI) without using GCC’s nested function extension:

int cmp ( const void * a , const void * b , _Bool invert ) { int r = * ( int * ) a - * ( int * ) b ; return invert ? - r : r ; } void intsort3 ( int * base , size_t nmemb , _Bool invert ) { unsigned long fp = ( unsigned long ) cmp ; volatile unsigned char buf [] = { // mov edx, invert 0xba , invert , 0x00 , 0x00 , 0x00 , // mov rax, cmp 0x48 , 0xb8 , fp >> 0 , fp >> 8 , fp >> 16 , fp >> 24 , fp >> 32 , fp >> 40 , fp >> 48 , fp >> 56 , // jmp rax 0xff , 0xe0 }; int ( * trampoline )( const void * , const void * ) = ( void * ) buf ; qsort ( base , nmemb , sizeof ( * base ), trampoline ); }

Here’s a complete example you can try yourself on nearly any x86-64 unix-like system: trampoline.c. It even works with Clang. The two notable systems where stack trampolines won’t work are OpenBSD and WSL.

(Note: The volatile is necessary because C compilers rightfully do not see the contents of buf as being consumed. Execution of the contents isn’t considered.)

In case you hadn’t already caught it, there’s a catch. The linker needs to link a binary that asks the loader for an executable stack ( -z execstack ):

$ cc -std=c99 -Os -Wl,-z,execstack trampoline.c

That’s because buf contains x86 code implementing the trampoline:

mov edx , invert ; assign third argument mov rax , cmp ; store cmp address in RAX register jmp rax ; jump to cmp

(Note: The absolute jump through a 64-bit register is necessary because the trampoline on the stack and the jump target will be very far apart. Further, these days the program will likely be compiled as a Position Independent Executable (PIE), so cmp might itself have an high address rather than load into the lowest 32 bits of the address space.)

However, executable stacks were phased out ~15 years ago because it makes buffer overflows so much more dangerous! Attackers can inject and execute whatever code they like, typically shellcode. That’s why we need this unusual linker option.

You can see that the stack will be executable using our old friend, readelf :

$ readelf -l a.out ... GNU_STACK 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 RWE 0x10 ...

Note the “RWE” at the bottom right, meaning read-write-execute. This is a really bad sign in a real binary. Do any binaries installed on your system right now have an executable stack? I found one on mine. (Update: A major one was found in the comments by Walter Misar.)

When compiling the original version using a nested function there’s no need for that special linker option. That’s because GCC saw that it would need an executable stack and used this option automatically.

Or, more specifically, GCC stopped requesting a non-executable stack in the object file it produced. For the GNU Binutils linker, the default is an executable stack.

Fail open design

Since this is the default, the only way to get a non-executable stack is if every object file input to the linker explicitly declares that it does not need an executable stack. To request a non-executable stack, an object file must contain the (empty) section .note.GNU-stack . If even a single object file fails to do this, then the final program gets an executable stack.

Not only does one contaminated object file infect the binary, everything dynamically linked with it also gets an executable stack. Entire processes are infected! This occurs even via dlopen() , where the stack is dynamically made executable to accomodate the new shared object.

I’ve been bit myself. In Baking Data with Serialization I did it completely by accident, and I didn’t notice my mistake until three years later. The GNU linker outputs object files without the special note by default even though the object file only contains data.

$ echo hello world >hello.txt $ ld -r -b binary -o hello.o hello.txt $ readelf -S hello.o | grep GNU-stack $

This is fixed with -z noexecstack :

$ ld -r -b binary -z noexecstack -o hello.o hello.txt $ readelf -S hello.o | grep GNU-stack [ 2] .note.GNU-stack PROGBITS 00000000 0000004c $

This may happen any time you link object files not produced by GCC, such as output from the NASM assembler or hand-crafted object files.

Nested C closures are super slick, but they’re just not worth the risk of an executable stack, and they’re certainly not worth an entire toolchain being fail open about it.

Update: A rebuttal. My short response is that the issue discussed in my article isn’t really about C the language but rather about an egregious issue with one particular toolchain. The problem doesn’t even arise if you use only C, but instead when linking in object files specifically not derived from C code.