The Fedora Engineering Steering Committee maintains a conservative list of packages that must be built using security features of GCC. Packages not on this list have these security features enabled at the packagers' discretion. There is currently no consensus in the community as to when security-hardened binaries are necessary, so their use can be a controversial topic. Most arguments reduce to whether the security benefit outweighs the performance overhead of using the feature.

Position Independent Executables (PIE) are an output of the hardened package build process. A PIE binary and all of its dependencies are loaded into random locations within virtual memory each time the application is executed. This makes Return Oriented Programming (ROP) attacks much more difficult to execute reliably.

These blog posts showcase the results of a study I did recently on the effect of building applications using PIE. In the study I investigated the overhead incurred in the loader during program startup, with the aim of helping distributions make better security decisions based on a technical analysis. I focused on program startup because that is where PIE has the largest performance impact. Once the process is running, performance is largely comparable to that of standard Dynamic Shared Objects (DSOs) on x86_64 machines, depending on how well the program and shared libraries have been designed. As this is a security blog I am biased towards functionality that increases security. However, in the tests that I performed, the start times of a PIE application and a regular application were comparable.

One of the more interesting things for me personally whilst doing this work was looking at how compiling with PIE enabled affects the resultant binary. Consider the following "Hello World" program:

    #include "not/stdio.h"

    char message[] = "Hello World";

    int main(int argc, char *argv[], char *envp[])
    {
        puts(message);
        return 0;
    }

To reduce other influences, I used my own implementation of the standard library functions during compilation:

    $ cc -nostdlib -nodefaultlibs -I. -o static-example os/syscall.x86_64.s os/start.x86_64.s not/strlen.c not/puts.c main.c
    $ size --format=sysv static-example
    static-example  :
    section      size      addr
    .text         420   4194536
    .rodata         2   4194956
    .eh_frame     280   4194960
    .data          12   6292392
    .comment       44         0
    Total         758

The ELF binary produced by this build has no dependencies on libc or the loader. It can be loaded into memory and run without the loader having to find and bind any dependencies at run time. This makes sharing and reusing routines difficult, however. The common solution to this problem is to create a shared library:

$ cc -fpic -shared -I. -nostdlib -nodefaultlibs -o libnotc.so os/syscall.x86_64.s os/syscall.c not/strlen.c not/puts.c

The next step is to recompile the main binary indicating that some symbol definitions exist within an external shared library:

$ cc -nostdlib -nodefaultlibs -I. -o dynamic-example os/start.x86_64.s main.c -L. -lnotc

The resultant binary has a smaller .text section, as that code is now contained within the shared library libnotc.so. There are some other significant differences:

    $ size --format=sysv dynamic-example
    dynamic-example  :
    section               size      addr
    .interp                 28   4194816
    .note.gnu.build-id      36   4194844
    .gnu.hash               48   4194880
    .dynsym                144   4194928
    .dynstr                 46   4195072
    .rela.plt               48   4195120
    .plt                    48   4195168
    .text                   56   4195216
    .eh_frame_hdr           28   4195272
    .eh_frame               96   4195304
    .dynamic               272   6292552
    .got.plt                40   6292824
    .data                   12   6292864
    .comment                44         0
    Total                  946

In order for the program to execute correctly, the ELF binary needs to be constructed in such a way that it allows the loader to resolve symbols at runtime. As the address of the symbol in memory is not known when the main binary is built, the linker adds a level of indirection through the procedure linkage table (the .plt section). Instead of calling puts() directly, the program calls an entry in the .plt section, which jumps through a slot in the Global Offset Table (GOT). On the first call that slot leads back into the loader, which resolves the actual address of the function and writes it into the GOT entry. Subsequent calls to the same routine go through the same .plt entry, but the jump through the GOT now lands directly on the function.

A standard ELF binary is typically loaded at the same base address in virtual memory each time it is executed. The linker takes advantage of this in non-relocatable code by jumping to absolute addresses of symbols. This turns out to have a slight performance benefit, as it is quicker to jump to an absolute address than to use relative addressing. This is especially true for i386 applications, where position independent code needs to reserve another register for this process.

To see the difference between the dynamic and PIE applications we need to recompile the example program as a PIE. This simply requires the addition of the -fpic -pie flags to what we had previously:

    $ cc -fpic -pie -nostdlib -nodefaultlibs -I. -o pie-example os/start.x86_64.s main.c -L. -lnotc
    $ size --format=sysv pie-example
    pie-example  :
    section               size      addr
    .interp                 28       512
    .note.gnu.build-id      36       540
    .gnu.hash               52       576
    .dynsym                192       632
    .dynstr                 54       824
    .rela.dyn               24       880
    .rela.plt               48       904
    .plt                    48       960
    .text                   61      1008
    .eh_frame_hdr           28      1072
    .eh_frame               96      1104
    .dynamic               320   2098352
    .got                     8   2098672
    .got.plt                40   2098680
    .data                   12   2098720
    .comment                44         0
    Total                 1091

Note that the address listed by the size command for each of the ELF sections of pie-example is a relative address, whilst the addresses listed for dynamic-example are absolute locations. This is necessary because the program and all of its dependencies will be loaded into random locations in virtual memory upon execution. This includes prelinked libraries, and as such PIE serves as an effective exploit mitigation technology for attacks that rely on returning to known addresses of standard system libraries. The overhead that is incurred by this defense mechanism, and ways in which the number of relative relocations can be reduced, will be covered in the next post of this series.