September 06, 2011 at 20:13 Tags Articles , Assembly , Linux

A few months ago I've written an article named Where the top of the stack is on x86, which aimed to clear some misunderstandings regarding stack usage on the x86 architecture. The article concluded with a useful diagram presenting the stack frame layout of a typical function call.

In this article I will examine the stack frame layout of the newer 64-bit version of the x86 architecture, x64 . The focus will be on Linux and other OSes following the official System V AMD64 ABI. Windows uses a somewhat different ABI, and I will mention it briefly in the end.

I have no intention of detailing the complete x64 calling convention here. For that, you will literally have to read the whole AMD64 ABI.

Registers galore x86 has just 8 general-purpose registers available (eax, ebx, ecx, edx, ebp, esp, esi, edi) . x64 extended them to 64 bits (prefix "r" instead of "e") and added another 8 (r8, r9, r10, r11, r12, r13, r14, r15) . Since some of x86's registers have special implicit meanings and aren't really used as general-purpose (most notably ebp and esp ), the effective increase is even larger than it seems. There's a reason I'm mentioning this in an article focused on stack frames. The relatively large amount of available registers influenced some important design decisions for the ABI, such as passing many arguments in registers, thus rendering the stack less useful than before .

Argument passing I'm going to simplify the discussion here on purpose and focus on integer/pointer arguments . According to the ABI, the first 6 integer or pointer arguments to a function are passed in registers. The first is placed in rdi , the second in rsi , the third in rdx , and then rcx , r8 and r9 . Only the 7th argument and onwards are passed on the stack.

The stack frame With the above in mind, let's see how the stack frame for this C function looks: long myfunc ( long a, long b, long c, long d, long e, long f, long g, long h) { long xx = a * b * c * d * e * f * g * h; long yy = a + b + c + d + e + f + g + h; long zz = utilfunc(xx, yy, xx % yy); return zz + 20 ; } This is the stack frame: So the first 6 arguments are passed via registers. But other than that, this doesn't look very different from what happens on x86 , except this strange "red zone". What is that all about?

The red zone First I'll quote the formal definition from the AMD64 ABI: The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone. Put simply, the red zone is an optimization. Code can assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, and thus can use it for scratch data, without explicitly moving the stack pointer. The last sentence is where the optimization lays - decrementing rsp and restoring it are two instructions that can be saved when using the red zone for data. However, keep in mind that the red zone will be clobbered by function calls, so it's usually most useful in leaf functions (functions that call no other functions). Recall how myfunc in the code sample above calls another function named utilfunc . This was done on purpose, to make myfunc non-leaf and thus prevent the compiler from applying the red zone optimization. Looking at the code of utilfunc : long utilfunc ( long a, long b, long c) { long xx = a + 2 ; long yy = b + 3 ; long zz = c + 4 ; long sum = xx + yy + zz; return xx * yy * zz + sum; } This is indeed a leaf function. Let's see how its stack frame looks when compiled with gcc : Since utilfunc only has 3 arguments, calling it requires no stack usage since all the arguments fit into registers. In addition, since it's a leaf function, gcc chooses to use the red zone for all its local variables. Thus, rsp needs not be decremented (and later restored) to allocate space for this data.

Preserving the base pointer The base pointer rbp (and its predecessor ebp on x86), being a stable "anchor" to the beginning of the stack frame throughout the execution of a function, is very convenient for manual assembly coding and for debugging . However, some time ago it was noticed that compiler-generated code doesn't really need it (the compiler can easily keep track of offsets from rsp ), and the DWARF debugging format provides means (CFI) to access stack frames without the base pointer. This is why some compilers started omitting the base pointer for aggressive optimizations, thus shortening the function prologue and epilogue, and providing an additional register for general-purpose use (which, recall, is quite useful on x86 with its limited set of GPRs). gcc keeps the base pointer by default on x86, but allows the optimization with the -fomit-frame-pointer compilation flag. How recommended it is to use this flag is a debated issue - you may do some googling if this interests you. Anyhow, one other "novelty" the AMD64 ABI introduced is making the base pointer explicitly optional, stating: The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available. gcc adheres to this recommendation and by default omits the frame pointer on x64, when compiling with optimizations. It gives an option to preserve it by providing the -fno-omit-frame-pointer flag. For clarity's sake, the stack frames showed above were produced without omitting the frame pointer.