By Nuno Lopes, Researcher, Microsoft Research Cambridge

Think compilers cannot compromise the security of your application? Think twice!

Compiler writers work around the clock to continuously deliver better compilers. They are driven by the ever-increasing importance of:

Increasing performance (everyone wants their code to run faster!);

Reducing code size (so that your app can fit on people’s phones);

Reducing energy consumption (it’s nice to not drain phone batteries and to save power in energy-hungry datacenters).


Compilers are big: most major compilers consist of several million lines of code. Their development is not stagnant either: every year, each compiler sees thousands of changes. Their sheer size and complexity, plus the pressure to continuously improve them, result in bugs slipping through. These compiler bugs may in turn introduce security vulnerabilities into your program.

Let’s look at a key optimization: the removal of bounds checks. Bounds checks are inserted automatically by memory-safe languages (such as C#, Rust, and Swift) or written manually in languages such as C++ to ensure that programs cannot access a memory region that they shouldn’t (to avoid security vulnerabilities). For example, a bounds-checked memory load could look like the following:

    while (...) {
        if (!in_bounds(buffer, idx))
            trap();
        x = buffer[idx];
    }

Compilers will try very hard to eliminate the bounds check or hoist it out of the loop. Compilers can only do these transformations when they can prove it is safe to do so (e.g., if the check always succeeds). However, a buggy compiler optimization may decide to replace the in_bounds check with true because it mistakenly proved that it wasn’t possible for it to be false, essentially removing the bounds check. This may happen because of a bug in one of the hundreds of thousands of lines of code used to prove that the check wasn’t necessary.
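To make the difference concrete, here is a minimal C++ sketch of the three outcomes: the original checked loop, a correctly hoisted check, and what a miscompiled loop effectively does. Everything in it is an assumption made for illustration: the toy memory layout (kMemory, kBufferLen), the kTrapped sentinel standing in for trap(), and the function names. It is not taken from any real compiler.

```cpp
#include <array>
#include <cstddef>

// Toy memory model (illustrative assumption): the first kBufferLen cells
// are the buffer; the remaining cells hold unrelated data that the program
// must never read through the buffer.
const std::size_t kBufferLen = 4;
const std::array<int, 8> kMemory = {{1, 2, 3, 4, 900, 900, 900, 900}};

// Sentinel standing in for trap() so the sketch can be tested.
const int kTrapped = -1;

bool in_bounds(std::size_t idx) { return idx < kBufferLen; }

// Original loop: one bounds check per access.
int sum_checked(std::size_t n) {
    int sum = 0;
    for (std::size_t idx = 0; idx < n; ++idx) {
        if (!in_bounds(idx)) return kTrapped;
        sum += kMemory[idx];
    }
    return sum;
}

// Correct optimization: idx only takes the values 0..n-1, so checking n
// once before the loop gives the same result as checking every access.
int sum_hoisted(std::size_t n) {
    if (n > kBufferLen) return kTrapped;
    int sum = 0;
    for (std::size_t idx = 0; idx < n; ++idx) sum += kMemory[idx];
    return sum;
}

// Miscompiled loop: the check was wrongly "proved" always true and removed.
// For n > kBufferLen this reads the unrelated cells (n must stay <= 8 here
// so the sketch itself never leaves kMemory).
int sum_miscompiled(std::size_t n) {
    int sum = 0;
    for (std::size_t idx = 0; idx < n; ++idx) sum += kMemory[idx];
    return sum;
}
```

With n = 6, the first two functions report a trap, while the miscompiled one silently folds data from outside the buffer into its result.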

Attacks based on compiler bugs are not yet commonplace, but are not science fiction either. It is possible to take advantage of compiler bugs to, for example, elevate user privileges, bypass authentication mechanisms, or steal information. The existence of these attacks has been known at least since 1974, when the Multics security evaluation report mentioned the possibility of injection of security vulnerabilities by compromised or simply buggy compilers. More recently, researchers managed to compromise a machine running Ubuntu and escalate privileges using a publicly known bug in LLVM.

Ensuring that compilers are correct is therefore critical to both the correctness and security of your software. Compiler bugs are hard to detect, yet a single bug can introduce a security vulnerability in your program, or make it compute the wrong result.

Together with academic and industrial partners, Microsoft researchers are working to ensure that compilers are correct. We are approaching the problem from two angles: automatically verifying that new optimizations are correct, and automatically verifying that the output of compilers is correct.

To verify the correctness of optimizations, we have developed Alive. It consists of a DSL to specify peephole optimizations and a tool to automatically prove that an optimization is correct, or else provide a counterexample showing why the optimization is wrong. You can try Alive online.

Alive verifies that an optimization is correct by checking for each possible input that the code after optimization is a refinement of the code before optimization. In practice, Alive uses the Z3 SMT solver to automatically prove small theorems about the optimization that imply that it is correct.
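The refinement condition can be illustrated with a brute-force sketch in C++. This is not how Alive works internally (Alive reasons symbolically with Z3 rather than enumerating inputs), and the Outcome type, the function names, and the 8-bit restriction are all assumptions made for this example. It checks the rewrite of (x + 1) > x into true under two source semantics: one where signed overflow is undefined, and one where it wraps.

```cpp
#include <cstdint>

// Result of evaluating a tiny expression on one input. defined == false
// models undefined behavior in the source program (a simplification of
// the undefined-behavior notions found in real compiler IRs).
struct Outcome {
    bool defined;
    bool value;
};

// Source: (x + 1) > x, where signed overflow is undefined behavior.
Outcome src_overflow_undefined(std::int8_t x) {
    if (x == INT8_MAX) return {false, false};  // x + 1 overflows
    return {true, static_cast<int>(x) + 1 > static_cast<int>(x)};
}

// Source: (x + 1) > x, where the 8-bit addition wraps around.
Outcome src_wrapping(std::int8_t x) {
    int wrapped = ((static_cast<int>(x) + 1 + 128) % 256) - 128;
    return {true, wrapped > static_cast<int>(x)};
}

// Target of the optimization: the constant true.
Outcome tgt_true(std::int8_t) { return {true, true}; }

typedef Outcome (*Expr)(std::int8_t);

// Returned when the target refines the source on every 8-bit input.
const int kNoCounterexample = 1000;

// Refinement check by enumeration: wherever the source is defined, the
// target must be defined and agree; where the source is undefined, the
// target may do anything. Returns the first failing input, if any.
int find_counterexample(Expr src, Expr tgt) {
    for (int x = INT8_MIN; x <= INT8_MAX; ++x) {
        Outcome s = src(static_cast<std::int8_t>(x));
        if (!s.defined) continue;
        Outcome t = tgt(static_cast<std::int8_t>(x));
        if (!t.defined || t.value != s.value) return x;
    }
    return kNoCounterexample;
}
```

Under the overflow-is-undefined semantics the rewrite passes; under wrapping semantics the checker reports x = 127 as a counterexample, which is the kind of answer a verifier gives when an optimization is wrong.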

Alive is now used by compiler developers on several compiler teams, including our own C++ compiler team, and also at other companies that develop LLVM. Alive has found dozens of bugs in LLVM and prevented many more from being introduced in LLVM and in Microsoft’s C++ compiler.

We are also working on verifying that the output of compilers is correct. Instead of verifying upfront that an optimization is valid for all input programs, we verify at compile time that the optimization behaved correctly for the particular input it was given (i.e., your program). This approach is called translation validation. Translation validation works by taking a snapshot of the intermediate representation (IR) before and after each optimization and automatically proving that the latter is a refinement of the former.
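The snapshot-and-compare loop can be sketched on a toy IR. The C++ below is purely illustrative and says nothing about how utcTV is built: the three-instruction IR, the two passes (one correct, one with a deliberately wrong shift amount), and all names are assumptions. Because this toy IR has no undefined behavior, refinement collapses to plain equality, which the sketch checks by running both snapshots on every 8-bit input instead of proving it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy IR: a straight-line program over one 8-bit wrapping accumulator.
// Assumption: shift amounts are always less than 8.
enum class Op { Add, Mul, Shl };
struct Instr { Op op; std::uint8_t k; };
typedef std::vector<Instr> Program;

std::uint8_t run(const Program& p, std::uint8_t x) {
    for (const Instr& i : p) {
        switch (i.op) {
            case Op::Add: x = static_cast<std::uint8_t>(x + i.k); break;
            case Op::Mul: x = static_cast<std::uint8_t>(x * i.k); break;
            case Op::Shl: x = static_cast<std::uint8_t>(x << i.k); break;
        }
    }
    return x;
}

typedef Program (*Pass)(const Program&);

// Correct strength reduction: x * 2 becomes x << 1.
Program mul2_to_shl_correct(const Program& p) {
    Program out = p;
    for (Instr& i : out)
        if (i.op == Op::Mul && i.k == 2) i = Instr{Op::Shl, 1};
    return out;
}

// Buggy variant: wrong shift amount (x << 2 is x * 4).
Program mul2_to_shl_buggy(const Program& p) {
    Program out = p;
    for (Instr& i : out)
        if (i.op == Op::Mul && i.k == 2) i = Instr{Op::Shl, 2};
    return out;
}

// Validate one pass application: both snapshots must agree on all inputs.
bool validate(const Program& before, const Program& after) {
    for (int x = 0; x <= 255; ++x) {
        std::uint8_t in = static_cast<std::uint8_t>(x);
        if (run(before, in) != run(after, in)) return false;
    }
    return true;
}

// Run the passes in order, validating after each one. Returns the index
// of the first pass whose output fails validation, or -1 if all pass.
int first_bad_pass(Program p, const std::vector<Pass>& passes) {
    for (std::size_t n = 0; n < passes.size(); ++n) {
        Program next = passes[n](p);
        if (!validate(p, next)) return static_cast<int>(n);
        p = next;
    }
    return -1;
}
```

Note that the check says nothing about whether mul2_to_shl_buggy is wrong in general; it only reports that this particular compilation went wrong, and at which pass.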

Translation validation is a powerful ally for verification: first, it lets us cover older code that may be out of reach for current automated verification techniques; second, it provides an extra safety net.

Our translation validation project is codenamed utcTV (short for translation validation for UTC, Microsoft’s C++ compiler). It is still under development, but it has already identified several bugs in development versions of the compiler that were not found using other methods. We will share more details about this project in the future.

A related line of work we are pursuing is on semantics of compiler IRs. How can you verify that something is correct if you don’t know what it really means? That’s why we’ve been studying the semantics of modern compiler IRs, and how to fix the inconsistencies we’ve discovered. More details are available in our upcoming paper that we’ll be presenting at PLDI ’17 in Barcelona later this month.

Clearly, it is important to make sure your compiler is working correctly, both for protecting the entire stack, and for reducing exploitable security vulnerabilities in applications. The tools we are working on are designed to automatically prove the correctness of parts of compilers. Microsoft will continue working with its partners to ensure there are no bugs in compilers that may compromise your application’s correctness and security.


Editor’s Note: Microsoft is a Silver sponsor of this year’s PLDI Conference taking place in Barcelona, June 18-23. PLDI is the premier forum in the field of programming languages and programming systems research.