Abstract

This whitepaper is intended to introduce Intel® Memory Protection Extensions (Intel® MPX) to Linux* application developers, showing how to enable applications to use Intel MPX and how to debug problems when they arise. This document contains entirely public information.

Overview: Why do we need Intel MPX?

As long as there have been people writing computer software, there have been software bugs caused by human error. Inevitably, a portion of these bugs manifest as security issues, one of the most common of which is a buffer overflow where a program attempts to access data outside what was intended by the programmer. This paper will give the reader another weapon in their arsenal to find and fix software bugs caused by buffer overflows.

Consider the simple program below:

    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define noinline __attribute__((noinline))

    char dog[] = "dog";
    char password[] = "secr3t";

    /* Returns dog[nr] with no bounds check of its own. */
    noinline char dog_letter(int nr)
    {
        return dog[nr];
    }

    int main(int argc, char **argv)
    {
        int max = sizeof(dog);
        int i;

        /* An optional first argument overrides how many letters are read. */
        if (argc >= 2)
            max = atoi(argv[1]);
        for (i = 0; i < max; i++)
            printf("dog[%d]: '%c'\n", i, dog_letter(i));
        return 0;
    }

Running that program with a bad input (“10” is longer than “dog”) can yield unexpected results, like:

    # gcc -Wall -o mpx-out-of-bounds mpx-out-of-bounds.c && ./mpx-out-of-bounds 10
    dog[0]: 'd'
    dog[1]: 'o'
    dog[2]: 'g'
    dog[3]: ''
    dog[4]: 's'
    dog[5]: 'e'
    dog[6]: 'c'
    dog[7]: 'r'
    dog[8]: '3'
    dog[9]: 't'

There are many tools that can detect these kinds of issues, but these tools are mostly targeted for developer use, and very few can be used in production.

Some of these tools include Valgrind*, AddressSanitizer*, sparse*, etc.

Intel MPX is a new capability introduced into Intel architecture, providing hardware assistance to make it feasible to include powerful bounds overflow detection in production software. It can be found in 6th Generation Intel Core™ processors, other processors based on the same microarchitecture, and future Atom processors.

Prerequisites

Intel MPX is a new technology, and it requires recent versions of a number of components. A fully updated Fedora* 23 is known to satisfy all of these requirements. Ubuntu* 15.10 meets these requirements, but has not been tested.

Component | Version required | Notes
Kernel | >=3.19 required, 4.1 recommended | CONFIG_X86_INTEL_MPX must be enabled when building the kernel. 4.1 contains MPX tracepoints and support for 32-bit binaries on 64-bit kernels.
GCC | >=5.0 required, >=5.2 recommended |
glibc | >=2.20 required |
binutils | >=2.24 required | Needed for objdump, ld, etc.
gdb | >=7.9 required, >=7.10 recommended |

Compiling and running programs with Intel MPX

Let’s take the earlier example program and run it with Intel MPX instead:

    # gcc -Wall -o mpx-out-of-bounds -mmpx -fcheck-pointer-bounds mpx-out-of-bounds.c
    # ./mpx-out-of-bounds 10
    dog[0]: 'd'
    dog[1]: 'o'
    dog[2]: 'g'
    dog[3]: ''
    Saw a #BR! status 1 at 0x317004
    dog[4]: 's'
    Saw a #BR! status 1 at 0x317004
    dog[5]: 'e'
    Saw a #BR! status 1 at 0x317004
    dog[6]: 'c'
    Saw a #BR! status 1 at 0x317004
    dog[7]: 'r'
    Saw a #BR! status 1 at 0x317004
    dog[8]: '3'
    Saw a #BR! status 1 at 0x317004
    dog[9]: 't'

First, notice that adding Intel MPX support does not require any changes to the application source code. All that we have to add are some compiler flags.

When we run the program, as soon as we overrun “dog”, the Intel MPX error handling kicks in and immediately detects the bad access.

We can understand how it does this by taking a look at the raw instructions that the compiler generates. The version of dog_letter() without Intel MPX is quite compact. It’s essentially a single instruction:

    00000000004006e0 <dog_letter>:
      4006e0: 48 63 ff                movslq %edi,%rdi
      4006e3: 0f b6 87 43 10 60 00    movzbl 0x601043(%rdi),%eax
      4006ea: c3                      retq

Now, examine how Intel MPX changes the generated instructions by adding several new instructions that start with bnd. The compiler automatically inserts these instructions when it sees the -fcheck-pointer-bounds -mmpx arguments.

    0000000000400750 <dog_letter>:
      400750: 66 0f 1a 05 f8 08 20    bndmov 0x2008f8(%rip),%bnd0    # <__chkp_bounds_of_dog>
      400757: 00
      400758: 48 63 ff                movslq %edi,%rdi
      40075b: 48 8d 87 67 10 60 00    lea    0x601067(%rdi),%rax
      400762: f3 0f 1a 00             bndcl  (%rax),%bnd0
      400766: 66 0f 1a 0d e2 08 20    bndmov 0x2008e2(%rip),%bnd1    # <__chkp_bounds_of_dog>
      40076d: 00
      40076e: f2 0f 1a 08             bndcu  (%rax),%bnd1
      400772: 0f b6 87 67 10 60 00    movzbl 0x601067(%rdi),%eax
      400779: f2 c3                   bnd retq

The compiler also generates the data at 0x2008f8(%rip) (referenced from location 400750 above), which the compiler helpfully calls __chkp_bounds_of_dog. If we were to go look at that location in memory as the program runs, we would see something that looks like a pair of pointers that contain &dog[0] and &dog[3]. Those are, respectively, the lower and upper bounds of valid addresses for the dog[] string.

Intel MPX instruction key (to help read the assembly)

bndmov : Fetch the bounds information (upper and lower) out of memory and put it in a bounds register.

bndcl : Check the lower bounds against an argument, ( %rax ) in this case.

bndcu : Check the upper bounds against an argument, ( %rax ) in this case.

bnd retq : Not a “true” Intel MPX instruction. The bnd here is a prefix to a normal retq instruction. It just lets the processor know that this is Intel MPX-instrumented code.
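The checks in the key above can be modeled in plain C. This is only a sketch of the logic, not how the hardware or compiler implements it: the struct and function names are hypothetical, and it assumes both bounds are inclusive, which matches the example where the bounds of dog[] are &dog[0] and &dog[3].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one bounds register (not a real MPX API). */
struct bounds {
    uintptr_t lower;  /* lowest valid address, inclusive  */
    uintptr_t upper;  /* highest valid address, inclusive */
};

/* Models bndcl: passes when addr is not below the lower bound. */
bool check_lower(struct bounds b, uintptr_t addr)
{
    return addr >= b.lower;
}

/* Models bndcu: passes when addr is not above the upper bound. */
bool check_upper(struct bounds b, uintptr_t addr)
{
    return addr <= b.upper;
}

/* An access is allowed only if both checks pass. Where this model
 * returns false, the hardware would raise a bounds (#BR) exception. */
bool access_ok(struct bounds b, uintptr_t addr)
{
    return check_lower(b, addr) && check_upper(b, addr);
}
```

With bounds set to &dog[0] and &dog[3], access_ok() accepts the address of dog[3] (the terminating NUL) and rejects the address of dog[4], because the upper check fails.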

In this case, we asked for a character that was after dog[3], so we expect the bndcu instruction to cause an exception. Let’s see how this looks when our program actually runs.

Debugging an Intel MPX exception with GDB

So now we have a program that has been compiled with Intel MPX support and hits an exception. If we run the program under GDB, we can find out what state the program was in at the time of the exception.

    # gdb --args ./mpx-out-of-bounds 10
    (gdb) run
    ...
    dog[0]: 'd'
    dog[1]: 'o'
    dog[2]: 'g'
    dog[3]: ''

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000400863 in dog_letter ()
    (gdb) disassemble $rip,+1
    Dump of assembler code from 0x400863 to 0x400864:
    => 0x0000000000400863 <dog_letter+56>: bndcu (%rax),%bnd2

The exception happened while executing a bndcu instruction. Let’s take a closer look at the registers bndcu is referencing:

    (gdb) info registers rax bnd2
    rax            0x601054 6295636
    bnd2           {lbound = 0x601050, ubound = 0x601053} {lbound = 0x601050 <dog>, ubound = 0x601053 <dog+3>}
    (gdb) print &dog
    $1 = (<data variable, no debug info> *) 0x601050 <dog>

It’s pretty easy to see from GDB what happened here. bndcu checked whether the address in rax (0x601054) was at or below the upper bound, dog+3 (0x601053). It wasn’t, so we ended up with an exception.

    0000000000400750 <dog_letter>:
      400758: 48 63 ff                movslq %edi,%rdi
      40075b: 48 8d 87 67 10 60 00    lea    0x601067(%rdi),%rax
      400772: 0f b6 87 67 10 60 00    movzbl 0x601067(%rdi),%eax
      400779: f2 c3                   bnd retq

Using Intel MPX in real programs

The example application is just a toy. Actual applications use more complicated build systems than just running gcc directly. In many cases, you can pass arguments to make to enable Intel MPX for the program as shown here:

make CFLAGS="-mmpx -fcheck-pointer-bounds -lmpx" LDFLAGS="-lmpxwrappers -lmpx"

You might also edit the Makefile for your application and modify CFLAGS/LDFLAGS there instead of from the command line. No matter what build system you are using, the important thing to remember is that you need to modify both the compiler (CFLAGS) and linker (LDFLAGS) arguments.

Another approach is to use the RPM build system. Issuing these commands sets up your build environment to always compile RPMs with support for Intel MPX:

    rpm --eval "%%__global_cflags %__global_cflags -mmpx -fcheck-pointer-bounds -lmpx" >> ~/.rpmmacros
    rpm --eval "%%__global_ldflags %__global_ldflags -lmpxwrappers -lmpx" >> ~/.rpmmacros

Now we can pick a package on the system (in this case it is redis, an in-memory database), download the source code, and build the package. Your specific versions of redis might differ slightly. There will be a lot of output, but make sure to look for the -mmpx -fcheck-pointer-bounds -lmpx options, which we need to enable Intel MPX support:

    # yumdownloader --source redis
    # rpm -i redis-3.0.4-1.fc23.src.rpm
    # rpmbuild -bc ~/rpmbuild/SPECS/redis.spec
    ...
    cc -std=c99 -pedantic -Wall -W -O2 -g -ggdb -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -mmpx -fcheck-pointer-bounds -lmpx -m64 -mtune=generic -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -DUSE_JEMALLOC -DJEMALLOC_NO_DEMANGLE -I/usr/include/jemalloc -c rand.c
    …
    # cd ~/rpmbuild/BUILD/redis-3.0.4

We now have a version of redis compiled with Intel MPX support that we can actually run and use. We can compile the vast majority of packages in a distribution like Fedora using this technique.

Why did we choose redis as the package to compile? Because the current version in Fedora (as of this writing) contains a buffer overflow flaw. If we compile redis without MPX, we get a crash from a bad pointer reference. If we compile it with MPX and use the technique described in the link, we see a bounds exception before the bad pointer reference actually happens and before any damage can be done.

By now, the reader should have an actual application compiled with Intel MPX, so we should discuss some of the real-world compromises and tradeoffs that come from using Intel MPX.

Bounds tables

Volume 1 of the Intel Software Developer’s Manual contains an exhaustive definition and explanation of the Intel MPX bounds tables.

Each logical processor has a limited number of Intel MPX bounds registers (four in the current architecture). For any program that has more objects than fit into these registers, the bounds must be kept elsewhere and shuffled in and out of the registers. This is very similar to the problem faced by non-Intel MPX programs that need to shuffle values between memory and registers.

Intel MPX provides hardware data structures called bounds tables to ease this process. Bounds tables are a two-level radix tree, indexed by the virtual (linear) address of the pointer for which you want to load/store the bounds. For each pointer, the tables contain a bounds table entry with four pointer-sized components:

Lower bound

Upper bound

Check pointer value

Unused (reserved) space

The BNDLDX/BNDSTX instructions essentially take a pointer value and move the bounds information between a bounds register and the bounds tables.
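As a rough illustration of that layout, the sketch below models a bounds table entry and the two-level lookup in C. The struct and function names are hypothetical. The bit ranges (bits 47:20 for the bounds directory index, bits 19:3 for the bounds table index) are my reading of the 64-bit layout in the Intel SDM and should be treated as an assumption to verify there.

```c
#include <stdint.h>

/* Sketch of one 64-bit bounds table entry: four pointer-sized fields,
 * in the order the list above gives them. */
struct bt_entry {
    uint64_t lower_bound;
    uint64_t upper_bound;
    uint64_t pointer_value;  /* the "check pointer value" */
    uint64_t reserved;
};

/* First level: index into the bounds directory.
 * Assumed layout: bits 47:20 of the address where the pointer is stored. */
uint64_t bd_index(uint64_t ptr_addr)
{
    return (ptr_addr >> 20) & ((1ULL << 28) - 1);
}

/* Second level: index into the bounds table selected by the directory.
 * Assumed layout: bits 19:3 of the address where the pointer is stored. */
uint64_t bt_index(uint64_t ptr_addr)
{
    return (ptr_addr >> 3) & ((1ULL << 17) - 1);
}
```

Both indexes are derived from the address at which the pointer itself is stored, not from the address it points to, which is why every pointer-sized slot in memory can have its own entry.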

Software developers need to be aware of a few key implications of this: Bounds tables can consume and reference a lot of memory.

Memory consumption

A one-page (4 KB) data structure entirely filled with pointers will consume four pages (16 KB) of bounds tables because each bounds table entry contains four pointers’ worth of data. In the worst case, an application can therefore need about five times as much memory with Intel MPX as without it (the original data plus four times that amount in bounds tables). In the above example, the redis-server process goes from having 27 MB of memory resident without Intel MPX to 102 MB with MPX.
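The four-to-one ratio can be worked through with a few lines of C. This is a back-of-the-envelope sketch with a hypothetical function name. It counts only bounds table entries and ignores the bounds directory and any allocation granularity.

```c
#include <stddef.h>

/* Worst case: every pointer-sized slot of data holds a pointer, and each
 * pointer needs one bounds table entry of four pointer-sized fields. */
size_t bounds_table_bytes(size_t pointer_data_bytes)
{
    size_t pointers = pointer_data_bytes / sizeof(void *);
    size_t entry_bytes = 4 * sizeof(void *);

    return pointers * entry_bytes;
}
```

For a 4096-byte page full of pointers this yields 16384 bytes of bounds table entries, which is the four pages mentioned above.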

Memory references and performance impact

Each time a pointer’s bounds information is loaded to/from the bounds tables, the processor has to read some memory out of the process’s virtual (linear) address space. This is like any other memory reference and has a cost in terms of memory bandwidth and pressure on TLBs and processor caches. Careful performance evaluation should be performed on performance-sensitive applications before deploying them with Intel MPX.

Verifying that Intel MPX is working

Intel MPX is intended to be largely transparent to developers and running applications. That can make it difficult to determine if Intel MPX is present and functional, or if it is disabled.

System-level checks

Check for CPU/kernel support

Intel MPX can be found in sixth-generation Intel Core processors as well as other processors with Skylake and Broxton microarchitectures. To check if the processor and the kernel both have Intel MPX support, look for mpx in the /proc filesystem’s cpuinfo file:

    # grep mpx /proc/cpuinfo | tail -1
    ...
    flags : fpu vme de pse tsc msr pae … flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1

If mpx is not present, confirm that your processor and kernel both support Intel MPX.
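If you want to make the same check from a program, one option is to read the flags line yourself and look for mpx as a whole word. The helper below is a minimal sketch with a hypothetical name. It only handles the string matching and leaves reading /proc/cpuinfo to the caller.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the space-separated flags line contains `flag` as a
 * whole word, so that "mpx" does not match inside a longer flag name. */
bool has_cpu_flag(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags_line;

    if (len == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts_word = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

        if (starts_word && ends_word)
            return true;
        p += 1;  /* keep searching after this partial match */
    }
    return false;
}
```

A caller would typically read /proc/cpuinfo line by line with fgets() and pass each line that begins with "flags" to has_cpu_flag(line, "mpx").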

You can also check the kernel’s configuration file. The method depends on your distribution, but this approach generally works. You might be able to check /proc/config.gz in some cases. This example shows a kernel with support for Intel MPX:

    $ grep INTEL_MPX /boot/config-`uname -r`
    CONFIG_X86_INTEL_MPX=y

In contrast, the following example shows a kernel that lacks Intel MPX support:

    $ grep INTEL_MPX /boot/config-3.19.0
    # CONFIG_X86_INTEL_MPX is not set

Checks on non-running programs

Does the program contain Intel MPX-specific instructions?

Compile your app with Intel MPX support, as described above.

Disassemble your application and filter for Intel MPX instructions by running this command:

    objdump -d <app name> | egrep 'bndcu|bndcl|bndmov'

Look for Intel MPX instructions. This document has a partial list of instructions, and a full list can be found in the Intel SDM (Volume 1, Section 16.4, INTEL MPX INSTRUCTION SUMMARY).

If the instructions are present, great! If they are not, the special -mmpx -fcheck-pointer-bounds compiler flags are probably missing when you compile the application. This can be caused by issues in the build system, so it is important to ensure that the compiler flags are actually getting passed all the way through the build system to the compiler.

Are the Intel MPX dynamic libraries present in the executable?

Applications that use dynamic libraries need two Intel MPX support libraries to run correctly. You can check for them with the ldd command, as shown:

    # ldd ./mpx-out-of-bounds
        linux-vdso.so.1 (0x00007ffc4f5aa000)
        libmpx.so.0 => /lib64/libmpx.so.0 (0x00007ff42d223000)
        libmpxwrappers.so.0 => /lib64/libmpxwrappers.so.0 (0x00007ff42d021000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff42cc5f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff42ca42000)
        /lib64/ld-linux-x86-64.so.2 (0x0000558cb43be000)

Checks on running programs

Is the Intel MPX-enabling prctl() call being made successfully?

When an Intel MPX-enabled program starts up, the libmpx library informs the kernel that it needs help to manage the MPX bounds tables. It makes a prctl() system call, which we can look for. The easiest way to do this is with the strace(1) utility shown below, although any mechanism to trace system calls will suffice here.

    # strace -e trace=prctl ./mpx-out-of-bounds
    prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0) = 0
    ...

Note: PR_MPX_ENABLE_MANAGEMENT might also appear as 0x2b or 43, depending on your version of strace.

If you see the system call, it indicates that libmpx is properly linked into your program and is being called at startup. If you do not see this call, confirm that Intel MPX is enabled in both your compiler and the linker invocations.

If the prctl() call returns something other than 0, it indicates a problem of some kind with the underlying kernel or processor support for Intel MPX.

Are Intel MPX bounds tables being allocated?

Intel MPX instructions are special in that if they are executed on a processor without Intel MPX support enabled, they are effectively ignored. This has the benefit of allowing the same application images to be deployed anywhere; you can take an application compiled for Intel MPX and put it on an older system without Intel MPX support and it will run correctly (although without the benefits of Intel MPX). But this means that even if the Intel MPX instructions are present, there is no guarantee that the application is using Intel MPX. One way to guarantee that an application is using Intel MPX is to look for bounds tables.

This is the method that the author uses most often since it usually works, is very quick, does not require any tooling (other than cat), and is completely unintrusive to running applications.

If you are unfamiliar with the Intel MPX bounds tables, you might want to look over the Bounds tables section first. Some programs, especially simpler programs, do not need or use bounds tables, so this is not the most precise test. But if your program has bounds tables, you can be sure that it contains Intel MPX support and that support is being actively used.

The simplest way to search for bounds tables is to examine the target application via the /proc filesystem. This example looks for a single process called mpx-mini-test and dumps out its maps file, as seen here:

    # cat /proc/`pidof mpx-mini-test`/maps
    00400000-00406000 r-xp 00000000 08:14 4850896 /root/mpx-mini-test/mpx-mini-test
    00605000-00606000 r--p 00005000 08:14 4850896 /root/mpx-mini-test/mpx-mini-test
    00606000-00607000 rw-p 00006000 08:14 4850896 /root/mpx-mini-test/mpx-mini-test
    00607000-0060b000 rw-p 00000000 00:00 0
    01e27000-01ec8000 rw-p 00000000 00:00 0 [heap]
    1325200000-1325300000 rw-p 00000000 00:00 0
    7f4f9bbba000-7f4f9bfba000 rw-p 00000000 00:00 0 [mpx]
    7f4f9cbba000-7f4f9dbba000 rw-p 00000000 00:00 0 [mpx]
    7f4f9dbba000-7f4fa63ba000 rw-p 00000000 00:00 0 [mpx]
    7f4fa63ba000-7f50a63bc000 rw-p 00000000 00:00 0
    7f50a63bc000-7f50a6573000 r-xp 00000000 08:14 5636705 /usr/lib64/libc-2.22.so
    7f50a6573000-7f50a6773000 ---p 001b7000 08:14 5636705 /usr/lib64/libc-2.22.so
    ...

The entries that say [mpx] are bounds tables. If you have these in your application, you can be confident that the application has Intel MPX support and is running with it enabled. If these entries are not present, it could simply mean that your program doesn’t use bounds tables but still does use Intel MPX.
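To total up how much address space the bounds tables cover, the [mpx] lines can be parsed programmatically. The function below is a small sketch with a hypothetical name. It assumes the maps line format shown in this section and a kernel that labels bounds tables with [mpx].

```c
#include <stdio.h>
#include <string.h>

/* Returns the size in bytes of the mapping described by one line of
 * /proc/<pid>/maps if the line is tagged [mpx], or 0 otherwise. */
unsigned long long mpx_mapping_bytes(const char *maps_line)
{
    unsigned long long start, end;

    if (strstr(maps_line, "[mpx]") == NULL)
        return 0;
    /* Each line begins with "<start>-<end>" in hexadecimal. */
    if (sscanf(maps_line, "%llx-%llx", &start, &end) != 2)
        return 0;
    return end - start;
}
```

Applied to the first [mpx] line in the listing above, this returns 0x400000 bytes (4 MB). Summing the result over every line of the file gives the total virtual size of the bounds tables, which is not the same as the memory actually resident.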

For short-lived programs, it can be difficult to read the maps file. In that case, you can use the perf utility to watch for Intel MPX events produced by the kernel:

    # perf stat -empx:mpx_new_bounds_table,mpx:mpx_unmap_search,mpx:mpx_unmap_zap ./mpx-mini-test
    Performance counter stats for './mpx-mini-test-v003':
                     1      mpx:mpx_new_bounds_table
                     1      mpx:mpx_unmap_search
                     0      mpx:mpx_unmap_zap
           0.005542554 seconds time elapsed

The mpx:mpx_new_bounds_table event indicates that the kernel demand-allocated a bounds table for the application. The same caveats apply here as with the maps approach above. Not all Intel MPX applications will use bounds tables, so never use this technique alone to determine if Intel MPX is enabled for an application.

What is in the BNDCFGU register?

Of all the techniques listed so far, this is the most precise. For this technique, you will need a recent version of GDB (version 7.10.1 is known to work). You can invoke GDB in any way you like, but in this example we attach it to an already-running instance of mpx-mini-test. Then we dump out the bndcfgu register.

    # gdb ./mpx-mini-test `pidof mpx-mini-test`
    ...might need to press Ctrl-c here for the prompt...
    (gdb) info registers bndcfgu
    bndcfgu        {raw = 0x7f4fa63bb001, config = {base = 0x7f4fa63bb, reserved = 0x0, preserved = 0x0, enabled = 0x1}} {raw = 0x7f4fa63bb001, config = {base = 34174821307, reserved = 0, preserved = 0, enabled = 1}}

bndcfgu is short for bounds configuration register for user programs. It only exists on processors that support Intel MPX, so being able to dump it out here tells us that we are on a system that supports Intel MPX. The enabled = 0x1 indicates that the support has been enabled in the current program. A program without Intel MPX support will show a raw value of 0x0.

The base = 0x7f4fa63bb value is also very important. It is the page number of the bounds directory, so the directory itself starts at 0x7f4fa63bb*0x1000, or 0x7f4fa63bb000. You can use that address as the base to examine the contents of the bounds tables from a debugger.
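The decoding that GDB performs on the raw value can be reproduced by hand. The two helpers below are a sketch with hypothetical names. They assume only what the GDB output above shows: the enabled flag is the lowest bit, and the base field is the raw value with its low 12 bits removed.

```c
#include <stdbool.h>
#include <stdint.h>

/* The lowest bit of the raw BNDCFGU value is the enabled flag. */
bool bndcfgu_enabled(uint64_t raw)
{
    return (raw & 0x1) != 0;
}

/* The bounds directory is page-aligned, so clearing the low 12 bits of
 * the raw value gives the address at which it starts. */
uint64_t bndcfgu_directory_address(uint64_t raw)
{
    return raw & ~0xfffULL;
}
```

For the raw value 0x7f4fa63bb001 shown above, bndcfgu_enabled() returns true and bndcfgu_directory_address() returns 0x7f4fa63bb000.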

If bndcfgu contains a valid address and has the enabled bit set, it is a strong indicator that the program, kernel, and processor all have functioning Intel MPX support, and that the program is actively using Intel MPX.

Summary

As we have seen, Intel MPX adds an exciting new capability to Intel architecture which can help address an important class of bugs affecting real-world applications.