As part of my research on the security of paravirtualized devices, I reported a number of vulnerabilities to the Xen security team, which were patched today. All of them are double fetch vulnerabilities affecting the different backend components used for paravirtualized devices. While the severity and impact of these bugs varies heavily and depends on many external factors, I recommend patching them as soon as possible. In the rest of this blog post I’ll give a short teaser of my research, with full details coming out in the first quarter of 2016.

In order to understand the technical details of these bugs, some background information about the paravirtualized device architecture in Xen is needed: Paravirtualized guests and HVM guests that do not want to rely on slow emulated devices can use paravirtualized devices to interact with their virtual hardware. These devices are implemented using a split driver architecture: A frontend driver is running in the guest itself, whereas the backend component (by default) runs inside the management domain. Due to the privileged position of the management domain, this means that vulnerabilities in the backend driver can be used to break out of a virtual machine and compromise the whole system. Frontend and backend communicate using a ring buffer implemented on top of shared memory.

In comparison to higher-level IPC mechanisms such as sockets, shared memory communication can achieve much better performance due to the smaller amount of copying and fewer context switches required. In particular, after the shared mapping between domains is created, the hypervisor does not need access to any of the exchanged data.

But this performance benefit comes with a cost: From a security viewpoint, shared memory interfaces are much harder to secure than a message-based interface. The main reason for this is a bug class called double fetch, a term coined by Fermin J. Serna in a post on the Microsoft Security Research and Defense blog. Double fetches are a special type of TOCTTOU bug that affects shared memory interfaces: When a single value is fetched multiple times from a shared memory page, but validation is only performed once, a race condition is created. A malicious attacker who is able to tamper with the fetched data between these fetches can therefore potentially bypass security checks. Until now, most published examples of double fetch vulnerabilities affected communication between user space and the kernel and were discovered as part of the Bochspwn project by Mateusz “j00ru” Jurczyk and Gynvael Coldwind.
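To make the bug class concrete, here is a minimal sketch of a double fetch in C. The struct, buffer, and function names are invented for illustration; the point is only the check-then-use pattern on memory a peer can modify concurrently:

```c
#include <string.h>

struct request {
    unsigned int len;          /* attacker-controlled field in shared memory */
    unsigned char data[64];
};

unsigned char local_buf[64];

/* Vulnerable: 'req' points into memory the other side can change at any time. */
void handle_request(volatile struct request *req)
{
    if (req->len > sizeof(local_buf))          /* first fetch: validation */
        return;
    /* If the peer rewrites req->len between the check above and the copy
     * below, more than 64 bytes are copied: a classic double fetch. */
    memcpy(local_buf, (void *)req->data, req->len);   /* second fetch */
}
```

Winning the race requires the attacker to flip `len` from a small value to a large one in the narrow window between the two reads, which is practical on multi-core systems.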

My research builds on the ideas presented in Bochspwn and uses hypervisor-based memory access tracing to identify double fetch issues in the communication between frontend and backend components. I hope to present a full description of my approach and its results at some conferences next year, but for now I’ll just talk about one of the more interesting bugs that were patched today, even if its real-world impact is probably quite low:

xen-pciback is the backend component used for paravirtualized PCI devices. While this means that the following vulnerability only affects systems where guests have access to a passthrough PCI device, the rising prevalence of GPU instances in the cloud makes this assumption not completely unrealistic. xen-pciback is implemented as a kernel module and executes in the management domain by default. The following code is responsible for handling the different message types that can be sent by the frontend:

switch (op->cmd) {
case XEN_PCI_OP_conf_read:
	op->err = xen_pcibk_config_read(dev, op->offset, op->size, &op->value);
	break;
case XEN_PCI_OP_conf_write:
	//...
case XEN_PCI_OP_enable_msi:
	//...
case XEN_PCI_OP_disable_msi:
	//...
case XEN_PCI_OP_enable_msix:
	//...
case XEN_PCI_OP_disable_msix:
	//...
default:
	op->err = XEN_PCI_ERR_not_implemented;
	break;
}

Even knowing that the op variable is stored in shared memory, source code analysis alone does not show any double fetch issues. But this quickly changes when looking at the assembly code generated by gcc for this code snippet:

1 cmp  DWORD PTR [r13+0x4], 0x5
2 mov  DWORD PTR [rbp-0x4c], eax
3 ja   0x3358 <xen_pcibk_do_op+952>
4 mov  eax, DWORD PTR [r13+0x4]
5 jmp  QWORD PTR [rax*8+off_77D0]

As is often the case with switch statements, the compiler generated a jump table to dynamically jump to the correct branch. The r13 register points to the shared memory region, and r13+0x4 corresponds to the op->cmd value used in the switch statement. In lines 1 and 3, the value at r13+0x4 is compared against the upper limit 5, and execution branches to the default case if it is higher. In line 4, r13+0x4 is fetched a second time and used as an index into the jump table.

This means that an attacker who is able to manipulate the data stored at address r13+0x4 in the time between the two fetches can influence the destination of the final jmp instruction and potentially achieve arbitrary code execution. Even though the race window is very narrow (only two instructions), on multi-core systems it can be won in less than two minutes.
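The standard defense against this bug class is to copy the request out of shared memory into private backend memory exactly once, and to validate and dispatch only on that private copy. The following is a hedged sketch of that idea, not the actual Xen patch; the structure and constant names are simplified stand-ins modeled loosely on xen_pci_op:

```c
#include <string.h>

/* Simplified stand-in for the shared request structure. */
struct pci_op {
    unsigned int cmd;
    int err;
};

enum { OP_CONF_READ, OP_CONF_WRITE, ERR_NOT_IMPL = -1 };

int dispatch(volatile struct pci_op *shared)
{
    struct pci_op op;

    /* Single bulk fetch: after this copy, the guest can no longer
     * influence the value we validate and dispatch on. */
    memcpy(&op, (const void *)shared, sizeof(op));

    switch (op.cmd) {   /* the compiler may still emit a jump table,
                         * but it now indexes the private copy */
    case OP_CONF_READ:
        op.err = 0;
        break;
    case OP_CONF_WRITE:
        op.err = 0;
        break;
    default:
        op.err = ERR_NOT_IMPL;
        break;
    }

    /* Write only the result back to shared memory. */
    shared->err = op.err;
    return op.err;
}
```

With this pattern, re-reading `op.cmd` is harmless because both fetches hit memory the frontend cannot touch; the race window disappears entirely rather than merely shrinking.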

In summary, a simple switch statement operating on shared memory is compiled into a vulnerable double fetch that potentially allows arbitrary code execution in the Xen management domain — talk about well-hidden bugs 🙂

– Felix