Our upcoming Oakland paper (Oakland being the informal name of the IEEE Symposium on Security and Privacy) was recently released onto the internet, ahead of its formal publication in May when the conference is held. So now seemed like a good time to talk about some of the security work we've been doing, in particular our research into schemes for temporal memory safety.

When we talk about memory safety, issues fall into two categories. The first, and most well known (even if not by name), is spatial safety. This includes buffer-overflow vulnerabilities, such as the one behind the infamous Heartbleed bug: during a TLS session, a missing check on the amount of data requested by the other party caused the program to read beyond the bounds of an array whenever more data was requested than had been provided, leaking private information. Many other attacks have exploited similar bugs in all kinds of programs.

The problem arises because applications written in languages that allow pointer manipulation (e.g. C, C++) generally require the programmer to ensure that pointers stay within the bounds of allocated objects. The programmer should insert the relevant checks to catch situations where this is violated (i.e. bugs), but all too often they are omitted in the interests of speed, especially where the developer mistakenly reasons that they are not required. Obviously one solution is to use higher-level languages that don't allow pointers to arbitrary memory and automatically insert checks on, say, array accesses to catch errors. But the lower-level systems languages remain popular, with regular updates to C++, for example, not to mention the wealth of legacy code in the wild and still in development.
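To make the spatial-safety problem concrete, here is a minimal C sketch in the spirit of the Heartbleed bug. The struct and function names are invented for illustration (this is not OpenSSL's actual code): the unsafe version trusts a peer-supplied length, while the safe version clamps it to the data actually present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record echoing the Heartbleed pattern: the peer supplies
 * a requested length that must not be trusted. */
typedef struct {
    char   payload[64];
    size_t payload_len;   /* bytes actually stored in payload */
} record_t;

/* Unsafe: copies req_len bytes even if it exceeds what was provided,
 * leaking whatever happens to sit beyond the valid data. */
size_t echo_unsafe(const record_t *r, char *out, size_t req_len) {
    memcpy(out, r->payload, req_len);   /* no bounds check */
    return req_len;
}

/* Safe: clamp the request to the data actually present. */
size_t echo_safe(const record_t *r, char *out, size_t req_len) {
    size_t n = req_len < r->payload_len ? req_len : r->payload_len;
    memcpy(out, r->payload, n);
    return n;
}
```

The single comparison in `echo_safe` is exactly the kind of check that is all too easy to omit.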

However, there have been recent strides in this area. In particular, local research has developed CHERI, a processor with first-class hardware support for capabilities, which are used for loading from and storing to memory instead of generic integer pointers. A capability in this scenario is an unforgeable pointer that contains the base address of a virtual-memory region, its size, a current address and some flags. The program must use a capability to access memory (including for fetching instructions), with the access taking place at the capability's current address. The processor checks that this address (in fact, every byte that will be accessed) lies within the bounds of the capability and raises an exception if not. Capabilities can only decrease in size, so a piece of code given a capability can only access the memory reachable through it, and code can shrink a capability to restrict the memory that an untrusted piece of code (e.g. a library function) can access. Capabilities in memory are tagged so that they can be distinguished from other data, and only stores of valid capabilities can set the tags, preventing capabilities from being forged ad hoc.
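As a rough mental model only (real CHERI capabilities use a compressed hardware encoding, and the tag is a hidden bit, not a struct field), a capability, its bounds check and its monotonic shrinking can be sketched in C; all names here are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conceptual software model of a CHERI capability. */
typedef struct {
    uint64_t base;    /* start of the accessible region */
    uint64_t length;  /* size of the region */
    uint64_t cursor;  /* current address used for loads/stores */
    uint32_t perms;   /* permission flags */
    bool     tag;     /* valid-capability tag bit */
} capability_t;

/* The check performed on every access: all n bytes starting at the
 * cursor must fall inside [base, base + length). */
bool cap_check(const capability_t *c, uint64_t n) {
    return c->tag
        && c->cursor >= c->base
        && n <= c->length
        && c->cursor - c->base <= c->length - n;
}

/* Monotonic restriction: bounds may only shrink, never grow. */
bool cap_shrink(capability_t *c, uint64_t new_base, uint64_t new_length) {
    if (new_base < c->base || new_length > c->length ||
        new_base - c->base > c->length - new_length)
        return false;              /* attempted to widen: refused */
    c->base = new_base;
    c->length = new_length;
    if (c->cursor < new_base) c->cursor = new_base;
    return true;
}
```

Shrinking before handing a capability to a library function is how code confines what that function can touch.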

As can be inferred from this description, CHERI clearly prevents spatial memory-safety bugs, provided that the programmer doesn't implement their own (buggy) dynamic-memory allocator—despite all that capabilities give you, you can't always protect people from themselves. However, as yet CHERI does not help protect against the second category of memory-safety issues: temporal safety.

Temporal memory safety relates to memory locations containing different data at different times during program execution. When memory is freed by the application, then reallocated, it is important that it is not accessed under the assumption that the original data is still there. This can happen if the program holds a pointer to some memory, frees the memory but keeps the pointer around (a so-called dangling pointer) and later accesses through it again. The result is a use-after-free bug (or use-after-reallocate if the memory has been handed back to the application). Aside from causing undefined behaviour, this has the potential to leak information or allow a determined attacker to take control of the application.
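A distilled C illustration of the dangling-pointer pattern described above (the session structure and all names are invented for the example):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char secret[16]; } session_t;

static session_t *cached;   /* lingering alias: the dangling pointer */

session_t *session_open(void) {
    session_t *s = malloc(sizeof *s);
    if (!s) abort();
    strcpy(s->secret, "hunter2");
    cached = s;             /* second pointer to the same object */
    return s;
}

void session_close(session_t *s) {
    free(s);                /* `cached` now dangles */
    /* BUG: cached is not cleared here. Any later read of
     * cached->secret is a use-after-free; if the allocator has
     * reused the block, attacker-controlled data may be read
     * instead (use-after-reallocate). The fix: cached = NULL; */
}
```

The danger is precisely that `session_close` looks correct in isolation; the alias lives elsewhere in the program.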

Existing schemes to address temporal safety suffer from high overheads. One class of technique keeps track of all pointers that target each memory allocation. On freeing the memory, these pointers are all set to NULL, either immediately or in the background. But significant extra memory must be set aside to store a potentially large set of pointers for each allocation, and performance suffers considerably from all the required bookkeeping whenever pointers are created, destroyed or altered. Another class of technique leverages the virtual-memory system by only ever using a virtual page once, allocating each memory object to its own page. However, applications with large numbers of small objects (i.e. smaller than a page) experience vastly increased TLB pressure, resulting in severe performance loss.
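The first of these classes, pointer nulling on free, might be sketched as follows in C. The structures and names are purely illustrative, not any particular published scheme, but they show where the bookkeeping cost comes from:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ALIASES 8

/* Per-allocation record of every registered pointer to the block. */
typedef struct {
    void  *block;                   /* the allocation itself */
    void **aliases[MAX_ALIASES];    /* addresses of pointers to it */
    int    n_aliases;
} tracked_alloc_t;

/* Bookkeeping on every pointer creation -- this per-pointer work,
 * and the storage for the alias set, is the scheme's overhead. */
void track_pointer(tracked_alloc_t *t, void **slot) {
    if (t->n_aliases < MAX_ALIASES)
        t->aliases[t->n_aliases++] = slot;
}

/* Freeing nulls every recorded alias, so no dangling pointer survives. */
void tracked_free(tracked_alloc_t *t) {
    for (int i = 0; i < t->n_aliases; i++)
        *t->aliases[i] = NULL;
    t->n_aliases = 0;
    free(t->block);
    t->block = NULL;
}
```

A real system must also intercept pointer copies, overwrites and destruction, which is where the considerable runtime cost arises.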

Our take on addressing this security issue doesn’t seek to actively remove dangling pointers or deal with the problem by throwing lots of resources at it. Instead, we live with the fact that there may be dangling pointers to memory that has been manually freed by the programmer. To make this safe, we limit what can actually happen to any memory that may still have pointers to it by preventing it being recycled by the memory allocator—in other words, we prevent it being reallocated. We achieve this by placing all freed objects into quarantine and holding them there until we can be sure that there are no dangling pointers to them. In effect, we place the freed memory onto an intermediate list (the quarantine list), then take it off this and place it into the memory allocator’s structures once no dangling pointers to it exist. This two-stage freeing-before-reallocation of memory ensures that we capture the programmer’s intent (the manual free) and decouple it from the process of reallocation (giving it back to the memory allocator), which both allows us to ensure safety and reduces overheads.
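A simplified C sketch of this two-stage freeing (the list structure and function names are ours for illustration, not MarkUs' actual implementation):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct qnode {
    void         *object;
    struct qnode *next;
} qnode_t;

static qnode_t *quarantine;      /* freed-but-not-yet-recycled blocks */
static size_t   quarantine_len;

/* Stage 1: capture the programmer's intent; recycle nothing yet. */
void quarantined_free(void *p) {
    qnode_t *n = malloc(sizeof *n);
    if (!n) abort();
    n->object = p;
    n->next = quarantine;
    quarantine = n;
    quarantine_len++;
}

/* Stage 2: after a marking pass, hand every unreachable entry back to
 * the real allocator. `is_reachable` stands in for the mark bit. */
size_t quarantine_flush(int (*is_reachable)(void *)) {
    size_t released = 0;
    qnode_t **link = &quarantine;
    while (*link) {
        qnode_t *n = *link;
        if (!is_reachable(n->object)) {
            *link = n->next;
            free(n->object);     /* only now can it be reallocated */
            free(n);
            quarantine_len--;
            released++;
        } else {
            link = &n->next;     /* still pointed to: stays quarantined */
        }
    }
    return released;
}

/* Trivial predicate for demonstration: treat everything as unreachable. */
int never_reachable(void *p) { (void)p; return 0; }
```

The decoupling is visible in the code: `quarantined_free` records intent, while `quarantine_flush` performs the actual return to the allocator.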

The key remaining part is knowing when it is safe for memory to leave quarantine. This is obviously tricky - the programmer has freed the memory but might have left dangling pointers to it. We therefore need to look through all of memory to find out whether there are any pointers to any of the freed objects on the quarantine list. We achieve this through a garbage-collector-style marking pass: we start with roots consisting of registers, the stack and data segments and repeatedly look for plausible pointers, follow them and mark the pointed-to objects. The result is that we only consider memory that is actually reachable by the application and don't accidentally mark objects that are pointed to by memory that is itself free, and yet we don’t mistakenly delete anything the programmer has not yet freed. Having identified these reachable objects, we can remove from the quarantine list all memory allocations that are not marked (i.e. are not reachable) because, to the best of our ability, we have verified that they are truly free. Any allocations on the quarantine list that are marked must remain there because there exist dangling pointers to them, but they may get removed after a subsequent marking phase.
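The marking pass might be sketched as follows. This toy version scans a fixed table of objects rather than real registers, stack and data segments, and all names are illustrative; the essential idea, conservatively treating any word that falls inside a known allocation as a plausible pointer and marking transitively, is the same:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS 16

typedef struct {
    void  *start;
    size_t size;
    bool   marked;
} heap_obj_t;

static heap_obj_t objs[MAX_OBJS];
static int n_objs;

/* Conservative test: does this word point into a known object? */
static heap_obj_t *find_obj(uintptr_t word) {
    for (int i = 0; i < n_objs; i++) {
        uintptr_t base = (uintptr_t)objs[i].start;
        if (word >= base && word < base + objs[i].size)
            return &objs[i];
    }
    return NULL;
}

/* Mark obj, then everything its contents plausibly reference. */
static void mark(heap_obj_t *obj) {
    if (obj->marked) return;
    obj->marked = true;
    uintptr_t *p = obj->start;
    for (size_t i = 0; i < obj->size / sizeof(uintptr_t); i++) {
        heap_obj_t *target = find_obj(p[i]);
        if (target) mark(target);
    }
}

/* Entry point: mark everything reachable from the root set. */
void mark_from_roots(const uintptr_t *roots, int n_roots) {
    for (int i = 0; i < n_roots; i++) {
        heap_obj_t *target = find_obj(roots[i]);
        if (target) mark(target);
    }
}
```

After such a pass, any quarantined object left unmarked is unreachable and can safely leave quarantine.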

The diagram shown here is taken from our Oakland paper and shows a representation of quarantine compared with the stack, data segments and heap. Quarantine is simply a linked list of pointers to heap objects that have been manually freed by the programmer—in this case objects H, F, D and A. When we run a marking pass, objects that are still accessible to the program get marked (C, B, E, D, G, I, J, K), whereas those that can’t be accessed will remain unmarked (A, F, H). Unmarked objects on the quarantine list can be safely returned to the memory allocator for reallocation; marked objects must not be removed. In this example, after the marking phase, A, F and H would be returned but D would have to remain on the quarantine list. MarkUs would attempt to release it in a subsequent marking phase through the same mechanism, but D will have to stay in quarantine for as long as the dangling pointer to it persists.

We call this technique MarkUs. It also includes some optimisations to give memory back to the operating system proactively, which are described fully in the paper. It is also important to note that although MarkUs is conservative in treating data as a pointer when it looks like it might be, it can do nothing in the face of the programmer explicitly hiding pointers by, for example, XORing them with other data. However, as we argue, the vast majority of pointers within an application are not hidden and those that are will likely be carefully implemented (so unlikely to be a source of use-after-free errors), meaning that in reality MarkUs is a very practical solution to the problem.

The first two graphs here show the performance and memory requirements of MarkUs across a range of SPEC CPU2006 applications (the dashed line shows MarkUs' 33% memory-overhead threshold for quarantined data, after which a marking pass is run). Although overheads are high for some applications (e.g. a 2x slowdown on gcc), MarkUs never experiences the scale of slowdowns seen in other techniques, and its average-case behaviour is better than all others at just 10% slowdown. In terms of memory requirements, it again has the lowest average at 16% overhead, never exceeding 2x. We compare against four other schemes in the paper, showing that some incur overheads as high as a 7.5x slowdown and a 135x memory increase (both on omnetpp).

Now back to CHERI. Having developed MarkUs, we wanted to experiment with similar ideas in an environment that was much stricter about how pointers could be used and manipulated. CHERI fits the bill perfectly, and we teamed up with members of the CHERI group, who had already been thinking about these problems and how to tackle them. Collaboration was not difficult - they work on the opposite side of our corridor! The result was CHERIvoke, published at MICRO 2019. Due to the vagaries of conference publication (i.e. MarkUs got rejected a few times), CHERIvoke appeared first, even though the research actually came afterwards.

Like MarkUs, CHERIvoke quarantines memory objects when they are manually freed by the programmer. However, it takes advantage of CHERI's capabilities to avoid traversing structures in memory to find out whether any pointers to quarantined data still exist. Instead it turns the process around, sweeping systematically through memory and revoking capabilities that point to quarantined memory. In other words, instead of keeping objects quarantined until there are no remaining pointers to them, CHERIvoke removes all pointers to quarantined memory, safely allowing everything on the quarantine list to be given back to the memory allocator after each sweep.
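The sweep can be sketched in C as follows, modelling capabilities in software. On real CHERI hardware the tag is a hidden bit and the sweep would consult the actual quarantine state rather than an explicit list; everything here is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a capability; tag cleared == capability revoked. */
typedef struct {
    uintptr_t base;
    size_t    length;
    uintptr_t cursor;
    bool      tag;
} cap_t;

/* Does the capability point into any quarantined range? */
static bool in_quarantine(const cap_t *c,
                          const uintptr_t *qbase, const size_t *qlen,
                          int nq) {
    for (int i = 0; i < nq; i++)
        if (c->base >= qbase[i] && c->base < qbase[i] + qlen[i])
            return true;
    return false;
}

/* One systematic sweep over all capability-holding memory: afterwards
 * no valid capability to quarantined memory remains, so the whole
 * quarantine list can be handed back to the allocator. */
int revoke_sweep(cap_t *caps, int ncaps,
                 const uintptr_t *qbase, const size_t *qlen, int nq) {
    int revoked = 0;
    for (int i = 0; i < ncaps; i++) {
        if (caps[i].tag && in_quarantine(&caps[i], qbase, qlen, nq)) {
            caps[i].tag = false;     /* dangling capability revoked */
            revoked++;
        }
    }
    return revoked;
}
```

Note the inversion relative to marking: the sweep never asks what is reachable, only whether each stored capability targets quarantined memory, which is why one pass suffices to empty the quarantine list.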

We evaluated CHERIvoke by reproducing its behaviour on an x86 system, which allows us to model its performance and overheads on a mature CHERI implementation. (The existing FPGA prototype is an in-order MIPS-based processor.) The results, shown in the graphs here, give a 4.7% performance overhead and a 12.5% total memory overhead, demonstrating the benefit over MarkUs of leveraging hardware capabilities. The dotted line here shows CHERIvoke's default 25% memory-overhead threshold for quarantine, before a sweep is triggered. The only outlier is xalancbmk, whose performance impact is exacerbated by increased L2 misses, an effect that can be partially mitigated by increasing the allowed quarantine-buffer size.

To conclude, temporal memory safety is an important issue affecting systems programmed in languages that expose pointers, and their manipulation, to the developer. But things are looking bright in the development of techniques to address this. MarkUs is a drop-in replacement for your memory allocator, and its ideas could easily be ported to your favourite allocator if necessary. Using the notion of quarantine, it prevents objects from really being freed, even though the programmer says they can be, until it has performed a memory traversal to satisfy itself that no dangling pointers to these quarantined allocations exist. CHERIvoke builds on this to provide lower performance and memory overheads in future systems containing capabilities (or other mechanisms) that accurately identify pointers. MarkUs is freely available, and we also provide a repository of the data used for the CHERIvoke paper.