It would obviously be very bad if everybody used the same password. It is also bad when one person reuses the same password across sites. Luckily, we have password managers that can generate a unique, random password for each site. It is no less of a problem when everybody runs the same software, for the following reason: when a software flaw is found, we want every installation of the program to be different enough that the same malicious input cannot compromise multiple users (but not so different that users notice). You don’t use the same password as anyone else; why should you run an identical copy of the software that everybody else is running?

The Multicompiler

Compilers typically optimize for speed and size. We built a compiler that instead optimizes for security. We call it the multicompiler because it builds many functionally equivalent but internally different software variants, so that an exploit only works against one specific variant. The multicompiler is based on LLVM and has been tested on lots of non-trivial programs such as Firefox, Apache, Python, and even LLVM itself.

The multicompiler is available for download from GitHub right now. If you’d like to see how to use it on real software, skip ahead to Hands On With The Multicompiler.

The work that we’re describing today is part of DARPA’s Cyber Fault-Tolerant Attack Recovery (CFAR) program. Immunant is working alongside Galois, the University of California, Irvine, and Trail of Bits as a team on CFAR. We’ll explain how our work fits into the broader CFAR effort later in this post. Since we are part of a larger team, we encourage you to also read the Galois and Trail of Bits companion posts here and here.

Security through Diversity

Before computers were networked, software was distributed on physical media such as floppies and optical disks. CD-ROMs, for instance, are cloned from a golden master copy. Even when software is distributed online rather than on physical media, every user still gets an identical program copy, which is efficient but also raises a few concerns. Chief among those is the fact that vulnerable programs that are mass distributed are susceptible to exploitation on an equally massive scale. Adversaries can develop an exploit payload once and use it to exploit every identical copy.

Nature’s solution to the problem is bio-diversity. A single plant might succumb to a pathogen but the entire species survives thanks to genetic variance. We don’t accept monocultures in our crops (and when we do, disaster ensues). Why then should we continue to accept monocultures in software?

Introducing program variance via techniques like ASLR was a start and shows the approach works. Now that we predominantly install new software over the internet, it becomes entirely feasible to give each user his or her own distinct copy of a program. The multicompiler allows us to create a diverse population of functionally identical programs that are resilient to mass exploitation.

Artificial Software Diversity

Typically, a compiler optimizes the output code for speed or size, and outputs different binaries by coincidence. For example, a compiler can output different binary code from the same source code simply by adjusting the optimization level from aggressive ( -O2 ) to none ( -O0 ). However, this method isn’t practical because it degrades performance and only allows us to create a modest number of different binaries.
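You can see this effect on any Unix-like host with a stock compiler (a toy illustration using `cc`, not the multicompiler itself): compile the same source at two optimization levels and compare the results.

```shell
# Same source, two optimization levels, two different binaries.
printf 'int add(int a, int b) { return a + b; }\n' > add.c
cc -O0 -c add.c -o add_O0.o
cc -O2 -c add.c -o add_O2.o
cmp -s add_O0.o add_O2.o && echo "identical" || echo "different"   # prints "different"
```

The object files differ (frame setup, instruction selection, and scheduling all change with the optimization level), but there are only a handful of optimization levels to choose from, which is why this is not a practical diversification strategy.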

The multicompiler optimizes for security by introducing randomness into the compilation process. It can generate an unlimited number of program variants by randomizing decisions made during compilation. Compilation decisions that can easily be randomized include the order in which functions are laid out in memory and the allocation of program variables to registers. To ensure the decisions are random but repeatable, the random number generator (RNG) is seeded via a special compiler flag. Randomized program binaries make it harder for an adversary to take control of the program’s execution. In academic circles, this is known as artificial software diversity. It is, of course, still possible to examine a single randomized binary and construct an exploit against it. However, the larger population remains unaffected: with the right kind of randomization, an input that compromises variant 42 of a binary no longer compromises any other variant.
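The "random but repeatable" property can be illustrated with an ordinary shell utility (a stand-in for the compiler's seeded RNG, not the multicompiler's actual mechanism): shuffling the same list with the same fixed byte source always yields the same order, while a different source yields another.

```shell
# Derive a fixed byte stream from each "seed" value; shuf draws its
# randomness from that stream, so the shuffle becomes repeatable.
yes 42 | head -c 1024 > seed_42
yes 7  | head -c 1024 > seed_7
seq 1 8 | shuf --random-source=seed_42 | tr '\n' ' '; echo
seq 1 8 | shuf --random-source=seed_42 | tr '\n' ' '; echo  # identical to the line above
seq 1 8 | shuf --random-source=seed_7  | tr '\n' ' '; echo  # typically a different permutation
```

The multicompiler's seed flag plays the same role: the same seed reproduces a build bit for bit, while each new seed yields a new variant.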

Randomize all the things!

From a deployment perspective, software randomization has been a spectacular success insofar that virtually any recent operating system and compiler supports Address Space Layout Randomization (ASLR). ASLR works by randomizing the base address of the stack, heap, and executable ( .text ) memory segments. While ASLR raises the difficulty of exploiting a program, it does not stop a skilled and determined adversary.

The multicompiler is a testbed and showcase for randomization techniques that go above and beyond the basic randomization provided by ASLR. As such, the multicompiler prioritizes security over deployability; ASLR does the exact opposite. In other words, the approaches complement each other and should be deployed together for maximum security benefit.

The multicompiler, which has been in development since 2010, is a set of patches and additions to the excellent clang/LLVM compiler framework. The multicompiler is well tested and works on real-world software. We have used the multicompiler to build a large number of open source packages including Firefox, Chromium, and all the packages in Linux From Scratch. In fact, we will shortly show how to diversify CPython, the official Python interpreter, with the multicompiler.

Currently, the multicompiler supports the following transformations:

- Code randomization disrupts exploits that rely on specific code addresses. Code-reuse attacks execute legitimate instructions rather than injecting new ones, and so require knowledge of the code layout to work. Aspects that the multicompiler can randomize include the function order, the allocation of variables to CPU registers, and the instruction schedule. The multicompiler can also insert extra no-op instructions and substitute one type of instruction for another instruction having the same effect.

- Stack-layout randomization interferes with attempts to corrupt stack variables. Without diversity, local variables are packed into a stack frame with each value, including the return address, residing at a well-known offset. The multicompiler re-orders stack elements and can insert padding between stack elements or pad the entire stack frame. It can also randomly convert stack-allocated buffers into heap allocations and correctly deallocate them.

- Global-variable randomization interferes with attempts to overwrite global state. Normal compilers pack global variables as efficiently as possible in their own section. An overwrite past the end of one global variable can deterministically corrupt other global values. The multicompiler shuffles the order of global variables in each variant and adds random padding to disrupt global-variable overwrite attacks.
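The determinism these transformations remove is easy to observe. The following sketch (ordinary `cc`, not the multicompiler) prints the distance between two adjacent globals; without diversity, it is identical on every run of a given binary:

```shell
# Without randomization, adjacent globals sit at a fixed relative distance,
# so an overflow off the end of 'a' corrupts 'b' deterministically.
cat > globals.c <<'EOF'
#include <stdio.h>
#include <stddef.h>
char a[16];
char b[16];
int main(void) {
    /* The distance between 'a' and 'b' is baked in at link time. */
    printf("delta=%td\n", (ptrdiff_t)((char *)b - (char *)a));
    return 0;
}
EOF
cc globals.c -o globals
./globals
./globals   # same delta every time; only a diversified build changes it
```

ASLR shifts where the whole section lands, but the delta between the two variables never changes; global-variable randomization is what varies it from build to build.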

The latest version of the multicompiler is available at https://www.github.com/securesystemslab/multicompiler

Hands-on with the Multicompiler

This part gives those of you who learn by doing a chance to try the multicompiler for yourself. We’ll be following a simplified set of build steps here; the complete process is described in the multicompiler README. Some advanced features not covered here will be unavailable or will not function correctly unless you follow the complete build process.

What we’ll cover:

- preparing to build the multicompiler
- checking that the system linker supports link-time optimization
- building the multicompiler
- diversifying CPython with the multicompiler
- testing and inspecting differences between variants

Downloading Dependencies and Sources

This section assumes you have root access to a host running Ubuntu 16.04 or later. If you’re on another Linux distribution, you’ll need to adjust the commands accordingly. If you’re on Windows or macOS, you can use docker or VirtualBox to set up a suitable Linux environment.

# install prerequisite packages on Ubuntu as root
# note: the default version of cmake on Ubuntu 14.04 is too old for LLVM.
apt-get install -y \
  build-essential libssl-dev libxml2-dev \
  libpcre3-dev binutils-dev cmake git mdm

Linker Plugin Support

The multicompiler’s link-time transformations require a linker with plugin support. Check whether your system linker supports plugins:

$ /usr/bin/ld -plugin
/usr/bin/ld: -plugin: missing argument
/usr/bin/ld: use the --help option for usage information

If you see -plugin: missing argument in the output, you’re all set. Otherwise you can build your own copy of binutils with plugin support using these instructions. You’ll need to adjust the path to point to your custom binutils build in the next step.

Configuring and Making

# building multicompiler in-tree isn't recommended
BUILD_DIR=$ROOT_DIR/multicompiler/build && mkdir -p $BUILD_DIR && cd $BUILD_DIR

# -DLLVM_BINUTILS_INCDIR is required for LTO support. If your host linker
# was built without plugin support, point LLVM_BINUTILS_INCDIR to the
# path of the binutils headers you built previously.
cmake .. -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_INSTALL_PREFIX=$ROOT_DIR/multicompiler/install \
  -DLLVM_BINUTILS_INCDIR=/usr/include \
  -DCMAKE_BUILD_TYPE=Release
make -j`nproc` && make install

Downloading CPython Sources

We’re almost ready to use the multicompiler; all that’s left is to download the CPython source code and its build dependencies.

# dependencies
apt-get install --no-install-recommends -y \
  python-setuptools tcl-dev liblzma-dev libgdbm-dev tk-dev libreadline-dev

# sources
PY2_SRC=https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz
cd $ROOT_DIR && wget $PY2_SRC -O - | tar -zx

Diversifying CPython with the Multicompiler

Kudos for making it this far; let’s compile our own diversified CPython binary!

There is no particular reason we chose CPython 2 other than that it comes with a comprehensive test suite and has a relatively sane build system. We’re using the CC and CXX environment variables to point the build system to the multicompiler you just built. We’re also passing the -flto flag to make sure we always compile in link-time optimization mode. We’re going to compile CPython twice: once without diversity ( norando ) and once with function-order randomization ( fnrando ). It is usually desirable to apply multiple types of diversity to a single build; try enabling additional randomization flags. All the flags accepted by the multicompiler are described in the README.

Note that you may have to change the ways flags are passed to the compiler if you decide to compile and diversify some other software package.

# configuring and making a link-time-optimized CPython binary
cd $ROOT_DIR/Python-2.7.12
CC="$ROOT_DIR/multicompiler/install/bin/clang -flto" \
CXX="$ROOT_DIR/multicompiler/install/bin/clang++ -flto" \
RANLIB="$ROOT_DIR/multicompiler/install/bin/llvm-ranlib" \
AR="$ROOT_DIR/multicompiler/install/bin/llvm-ar" \
CFLAGS="-O2 -g -frandom-seed=42" \
CXXFLAGS="-O2 -g -frandom-seed=42" \
LDFLAGS="-Wl,--plugin-opt,-random-seed=42" \
./configure --prefix=$ROOT_DIR/python_norando && make -j`nproc` install

# testing (excluding tests that can fail on Ubuntu hosts)
time make TESTOPTS="-x test_gdb test_ssl" quicktest
make distclean

# configuring and making a link-time-diversified CPython binary
# by adding -randomize-function-list to LDFLAGS
CC="$ROOT_DIR/multicompiler/install/bin/clang -flto" \
CXX="$ROOT_DIR/multicompiler/install/bin/clang++ -flto" \
RANLIB="$ROOT_DIR/multicompiler/install/bin/llvm-ranlib" \
AR="$ROOT_DIR/multicompiler/install/bin/llvm-ar" \
CFLAGS="-O2 -g -frandom-seed=42" \
CXXFLAGS="-O2 -g -frandom-seed=42" \
LDFLAGS="-Wl,--plugin-opt,-random-seed=42 -Wl,--plugin-opt,-randomize-function-list" \
./configure --prefix=$ROOT_DIR/python_fnrando && make -j`nproc` install

# testing diversified python
time make TESTOPTS="-x test_gdb test_ssl" quicktest

Note: we pass a random seed even for the “norando” build (where none is required) to suppress warnings.

Verifying the Effects of Function Randomization

nm --numeric-sort $ROOT_DIR/python_norando/bin/python | \
  egrep "[[:xdigit:]]+\st\s\w+" | tail

nm --numeric-sort $ROOT_DIR/python_fnrando/bin/python | \
  egrep "[[:xdigit:]]+\st\s\w+" | tail

On the host used for testing, tail returns the following for python_norando

0000000000559160 t ast_for_call
0000000000559910 t ast_for_slice
0000000000559d40 t alias_for_import_name
000000000055a280 t ast_for_suite
000000000055a520 t ast_for_funcdef
000000000055a6c0 t ast_for_classdef
00000000007a2dd0 t __frame_dummy_init_array_entry
00000000007a2dd0 t __init_array_start
00000000007a2dd8 t __do_global_dtors_aux_fini_array_entry
00000000007a2dd8 t __init_array_end

whereas the same command for python_fnrando prints

000000000055bea0 t iter_len
000000000055bfd0 t slot_nb_inplace_and
000000000055bff0 t wrap_ternaryfunc_r
000000000055c060 t formatteriter_next
000000000055c270 t frame_get_f_exc_value
000000000055c310 t builtin_repr
00000000007a2dd0 t __frame_dummy_init_array_entry
00000000007a2dd0 t __init_array_start
00000000007a2dd8 t __do_global_dtors_aux_fini_array_entry
00000000007a2dd8 t __init_array_end

The last four functions have not been shuffled by the multicompiler. This is because these functions originate from precompiled files (e.g. crtbegin.o ) on the host and were linked in automatically. In other words, precompiled code in objects or libraries is not visible to the multicompiler. For maximal security coverage, all code must pass through the multicompiler.

Note: The way we’ve built CPython is simple but not the most secure. To show what arguments were passed to the CPython configure script on your host, run the following commands in a python REPL:

import distutils.sysconfig
print(distutils.sysconfig.get_config_var('CONFIG_ARGS'))

Example output on Ubuntu 16.04:

'--enable-shared' '--prefix=/usr' '--enable-ipv6' '--enable-unicode=ucs4' '--with-dbmliborder=bdb:gdbm' '--with-system-expat' '--with-computed-gotos' '--with-system-ffi' '--with-fpectl' 'CC=x86_64-linux-gnu-gcc' 'CFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security ' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro'

Performance Impact of Randomization

Now that we’ve checked that functions are shuffled around as expected, let’s see what kind of performance impact this transformation has. In general, the overhead you’ll see depends on which transformations were enabled and on the workload itself. Function shuffling tends to have very low overhead (e.g., less than 1% on the SPEC CINT2006 benchmarks), which is not all that surprising; the layout of functions shouldn’t matter much given the caching mechanisms used in modern processors.

On an unloaded Ubuntu 16.04 host, the norando and fnrando runs of the test suite finished in 4m48s and 4m51s respectively. Re-running the test suite shows that much of the difference we observe is due to measurement noise. The performance overhead added by randomization depends on what is being randomized as well as on the underlying program and workload. Function layout randomization tends to add negligible overhead, whereas other transformations can add 5 to 10% to CPU-bound workloads in some cases.

Cyber Fault-Tolerant Attack Recovery

Since mid-2015, we’ve been working to test and enhance the multicompiler as part of DARPA’s Cyber Fault-Tolerant Attack Recovery (CFAR) program. While the multicompiler already produces binaries that are more resilient thanks to randomization, running two or more randomized programs (variants) alongside each other provides additional security. According to the CFAR program goals:

Fault-tolerant architectures run multiple subsystems in parallel and constantly cross-check results to rapidly detect, isolate and mitigate faults, which manifest as differences across the subsystems. Adapting fault-tolerant systems to run multiple variants of a vulnerable software system in parallel presents the opportunity to immediately detect and interdict cyber-attacks before they gain a foothold.

This description is a bit terse, so let’s unpack it.

To run multiple program variants in parallel, we use a type of monitoring software (called a multi-variant execution environment, or MVEE) to make sure that all variants receive the same inputs and generate the same outputs. If the MVEE sees the variants start to diverge in their behavior, it shuts them down, since the divergence might be a symptom of compromise.
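As a toy sketch of this cross-checking idea (two equivalent shell pipelines stand in for diversified variants; a real MVEE intercepts system calls and compares behavior, not just final output):

```shell
# Feed one input to two "variants" and compare their outputs; any
# divergence is treated as a possible compromise.
input="hello"
out_a=$(printf '%s' "$input" | tr 'a-z' 'A-Z')               # variant A
out_b=$(printf '%s' "$input" | awk '{ print toupper($0) }')  # variant B
if [ "$out_a" = "$out_b" ]; then
    echo "outputs agree: $out_a"
else
    echo "divergence detected; terminating variants" >&2
    exit 1
fi
```

A benign input leaves both variants in agreement; an exploit tailored to one variant's layout would perturb only that variant, and the mismatch is what the MVEE detects.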

Thanks to variant monitoring, MVEEs stop low-level exploits and contain the effects of a compromised program variant. An exploit can compromise a single variant, causing its execution to diverge from the other running variants, in which case the MVEE terminates execution. Alternatively, an exploit can try to compromise all program variants without causing any observable differences. The latter is very difficult, especially when the diversified variants are substantially different. For instance, a conventional code-reuse exploit becomes impossible if the address ranges of the .text sections of two program variants are disjoint. You can read much more about this idea in the paper Cloning your Gadgets: Complete ROP Attack Immunity with Multi-Variant Execution by Stijn Volckaert, a fellow CFAR researcher at UCI.

CFAR, like other DARPA programs, involves several teams. Immunant is part of a team led by Galois, Inc. that also includes UC Irvine and Trail of Bits (ToB). ToB provides a binary lifting tool, McSema, which converts binary programs to LLVM’s intermediate representation, bitcode. The bitcode recovered by McSema is not identical to the bitcode one would get by parsing the program source code, but it is rich enough to be fed into the multicompiler and diversified. Immunant works alongside UCI to enhance the multicompiler, and Galois integrates, tests, and carefully analyses the products of all these tools and libraries. The output of this process is, in CFAR parlance, a variant set: a set of diversified program variants randomized using options chosen to minimize the probability that a single exploit can compromise every individual variant when run in an MVEE. In many instances, Galois is able to provide formal guarantees that a given exploit cannot be constructed against the set. This is a great example of how redundancy lets us construct a whole that is more secure and resilient than any of its constituent parts.

Want to learn more?

There’s much more to be said about software randomization. Here are a few suggested papers if you want to know more.

Software Diversity

Large-scale Automated Software Diversity-Program Evolution Redux provides an in-depth description of the multicompiler and evaluates several of its most mature transformations.

SoK: Automated Software Diversity surveys the different kinds of randomization researchers had proposed and tried as of circa 2014.

Selfrando: Securing the Tor Browser against De-anonymization Exploits describes a load-time code randomizer that we built. Selfrando is focused on deployability and, unlike the multicompiler, works with existing compilers and linkers out of the box. In fact, it is practical enough that it is being tested in nightly builds of the Tor Browser for Linux. Selfrando is free and open source. Grab a copy from https://github.com/immunant/selfrando and try it out for yourself.

Code Randomization: Haven’t We Solved This Problem Yet? addresses the problem of randomizing shared libraries without interfering with memory sharing for code pages. Most academic work overlooked the fact that nobody is going to deploy diversity techniques if they cause working sets to spike.

Readactor: Practical Code Randomization Resilient to Memory Disclosure addresses the problem of keeping the randomized code layout secret. Randomization is no good if the adversary can use clever tricks (e.g. JIT-ROP) to dynamically read the code after randomization. Readactor explores execute-only memory for modern x86 systems and also prevents pointers into code from indirectly revealing the code layout.

Multi-variant Execution Environments

Cloning your Gadgets: Complete ROP Attack Immunity with Multi-Variant Execution explores the idea of loading the code pages of variants into disjoint address ranges.

ReMon ATC'2016 Paper describes a fast-yet-secure MVEE design, ReMon, which is faster because part of the monitoring is done inside the process that runs the program variants.

ReMon MVEE GitHub repository: ReMon is open source and actively maintained during the CFAR program. Download it and try it out for yourself.

Conclusions

Artificial software diversity can improve application security by preventing adversaries from making assumptions about the code and data layout and other offensively useful features of the victim program. On its own, randomization does not make binaries impervious to exploitation, but it complements and pairs well with other low-level exploit mitigations such as control-flow integrity.

Where even higher levels of resilience are sought, diversified programs can run side by side in an MVEE and be continually monitored for signs that the system is under attack. Galois has a detailed write-up on their blog and covers important caveats in this follow-up post.

Although we’ve focused on compiling from source, it is also possible to randomize source-less binaries. Trail of Bits explains how to diversify binaries with the multicompiler in their excellent companion post.

About Immunant

Immunant is a spin-off from UC Irvine and specializes in compiler-based exploit mitigation and language migration. We’re located in Southern California. If you’re interested in practical systems security, we’d love to hear from you; reach us at [email protected], @immunant, or via our contact form.

Besides the multicompiler, which is the focus of this post, we’ve also built a slightly different randomizer called selfrando. Selfrando is included in the Experimental Tor Browser for Linux. We hope to help shield Tor users from certain kinds of zero-day attacks and monitoring. One of the advantages of selfrando over the multicompiler is that it works with your existing compiler and linker on Windows, Android, and Linux.

Acknowledgements and Disclaimer

This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-15-C-0124.

We thank everyone who directly helped develop the multicompiler and all the people who provided helpful feedback. The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Distribution Statement “A” (Approved for Public Release, Distribution Unlimited).