List:       openbsd-tech
Subject:    some vulns
From:       Maxime Villard <max () m00nbsd ! net>
Date:       2020-02-15 12:21:36
Message-ID: 7d3a1553-6633-f70c-fb66-683fda313f12 () m00nbsd ! net

In vmm_update_pvclock():

	pvclock_gpa = vcpu->vc_pvclock_system_gpa & 0xFFFFFFFFFFFFFFF0;	/* <-- controlled by the guest */
	if (!pmap_extract(vm->vm_map->pmap, pvclock_gpa, &pvclock_hpa))
		return (EINVAL);
	pvclock_ti = (void*) PMAP_DIRECT_MAP(pvclock_hpa);

	/* START next cycle (must be odd) */
	pvclock_ti->ti_version =
	    (++vcpu->vc_pvclock_version << 1) | 0x1;

Three things are wrong:

 1) The RO protections are not enforced, so the guest can have data
    written to a GPA it can only access as RO.

 2) If 'pvclock_ti' crosses a page, its second half could point to an
    HPA that doesn't belong to the guest. The guest can therefore, to
    some limited extent, overwrite host kernel memory.

 3) The pmap is not locked, so if the GPA gets unmapped and its
    corresponding HPA recycled, there is a small window where the (new)
    content of the HPA can get overwritten.

There is, in fact, a fourth case. Watch closely.

On AMD CPUs the NPTs are a regular pmap. The higher half of the GPA
space is therefore mapped to host kernel memory as KVA. Given that there
is no check on PG_u here, the guest can just put a host KVA in
pvclock_gpa, and have its content be overwritten. This gives write-where
ability for the guest.

The OpenBSD kernel does not perform full ASLR, in that the PTE space and
direct map are at static addresses (contrary to e.g. NetBSD where
everything is randomized). These addresses are known.

The guest can therefore use the static address of the direct map, for
example, to write at any HPA by issuing the following instruction:

	wrmsr(KVM_MSR_SYSTEM_TIME, PMAP_DIRECT_MAP(hpa) | 1);

This means the guest can overwrite arbitrary host kernel memory, and can
control *where* to write.
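The address arithmetic behind the points above can be sketched in C.
This is only an illustration, not the vmm(4) code: the 4 KiB page size,
the 32-byte record length, the 32-bit counter and all sketch_* names are
assumptions made for the example, and guest_mem_limit is a hypothetical
stand-in for whatever bound the host uses for guest-physical memory. It
shows how the mask leaves a high-half address intact, when a
16-byte-aligned record crosses a page, and which value ends up in
ti_version.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch, not taken from the OpenBSD headers. */
#define SKETCH_PAGE_SIZE	4096ULL			/* 4 KiB pages */
#define SKETCH_TI_LEN		32ULL			/* assumed record size */
#define SKETCH_GPA_MASK		0xFFFFFFFFFFFFFFF0ULL	/* mask from the quoted code */

/* Address derived from the guest-written MSR value (low 4 bits dropped). */
static uint64_t
sketch_pvclock_gpa(uint64_t msr_value)
{
	return (msr_value & SKETCH_GPA_MASK);
}

/* True if [addr, addr + len) spans more than one page. len must be > 0. */
static bool
sketch_crosses_page(uint64_t addr, uint64_t len)
{
	return ((addr / SKETCH_PAGE_SIZE) !=
	    ((addr + len - 1) / SKETCH_PAGE_SIZE));
}

/*
 * Hypothetical sanity check: accept the address only if the whole record
 * lies below guest_mem_limit and inside a single page.
 */
static bool
sketch_gpa_ok(uint64_t gpa, uint64_t guest_mem_limit)
{
	if (gpa > UINT64_MAX - SKETCH_TI_LEN)
		return (false);		/* gpa + len would wrap */
	if (gpa + SKETCH_TI_LEN > guest_mem_limit)
		return (false);		/* outside guest memory */
	return (!sketch_crosses_page(gpa, SKETCH_TI_LEN));
}

/* Value stored in ti_version: pre-increment, shift left, force odd. */
static uint32_t
sketch_next_version(uint32_t *counter)
{
	return ((++(*counter) << 1) | 0x1);
}
```

Under these assumptions a record at page offset 0xFF0 is rejected
because its last 16 bytes fall into the next page, and a high-half
address is rejected by the range check. A check of this kind does not
address the missing pmap locking or the RO protections.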
I have tested this attack, and it works.

The guest can also choose *what* to write, because it just so happens
that 'vc_pvclock_version' is the number of VMEXITs that occurred with
pvclock enabled, and the guest can reliably craft this value.

So this is not just a write-where, this is a full guest-to-host
write-what-where.

Had there been proper ASLR, it still could have been somewhat
bypassable, because VMD does a pass-through of RDMSR on AMD CPUs (??),
which can leak HPAs such as HSAVE_PA.

(Speaking about direct map, notice how an alignment bug in locore0.S
causes the first 2MB of .text to be writable on Intel CPUs. So there is
a static address that maps the kernel .text as writable.)

There are additional assorted bugs and vulns that could be used to some
degree:

 - On AMD CPUs the CPL check on XSETBV VMEXITs must be performed by
   software. VMD forgot to do that, so from guest-userland, we can
   control the XCR0 that guest-kernel will use.

 - This XSETBV issue actually has an additional ramification. Right now
   OpenBSD doesn't check that the guest XCR0 is a subset of the host
   XCR0, which means that the guest can use more FPU states than the
   host allows. It looks like this check was lost when fixing another
   bug I reported one year ago which could cause guest-to-host DoS.

 - The TLB handling of guest pages is broken, in that the INVEPT
   instructions in the host could be issued on the wrong CPUs. This
   means that if UVM decides to swap out a guest page, the guest could
   still access it via stale TLB entries. On AMD CPUs, there is no TLB
   handling at all (??).

 - vmx_load_pdptes is broken.

In order to make this whole thing less of a security joke, I would
suggest the following:

 - Fix TLB handling, sanitize the GPAs, lock the pages correctly.

 - Don't pass-through RDMSR.

 - Fix the XSETBV issues.

 - Provide *real* ASLR: randomize the PTE space and the direct map.

 - Fix the alignment bug in the direct map to not map the text as
   writable.
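The two XSETBV checks mentioned above could look roughly like the
following C sketch. The function name and its parameters are
hypothetical, as is the assumption that the exit handler already knows
the guest CPL; this is not the vmm(4) code. The rules shown (CPL 0, XCR
index 0, x87 bit set, AVX only together with SSE, subset of the host
XCR0) are not meant as an exhaustive validity check.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XCR0_X87	0x1ULL	/* bit 0: x87 state, must always be set */
#define SKETCH_XCR0_SSE	0x2ULL	/* bit 1: SSE state */
#define SKETCH_XCR0_AVX	0x4ULL	/* bit 2: AVX state, requires SSE */

/*
 * Returns true if the guest's XSETBV may be emulated, false if the
 * handler should inject #GP instead. All parameters are hypothetical
 * inputs that an exit handler would have to supply.
 */
static bool
sketch_xsetbv_allowed(uint8_t guest_cpl, uint32_t xcr_index,
    uint64_t guest_val, uint64_t host_xcr0)
{
	if (guest_cpl != 0)
		return (false);	/* XSETBV is privileged */
	if (xcr_index != 0)
		return (false);	/* only XCR0 is handled here */
	if ((guest_val & SKETCH_XCR0_X87) == 0)
		return (false);	/* x87 bit cannot be cleared */
	if ((guest_val & SKETCH_XCR0_AVX) != 0 &&
	    (guest_val & SKETCH_XCR0_SSE) == 0)
		return (false);	/* AVX without SSE is invalid */
	if ((guest_val & ~host_xcr0) != 0)
		return (false);	/* guest asks for more than the host has */
	return (true);
}
```

For example, with a host XCR0 of 0x7, a request for 0x7 from CPL 0 is
accepted, while the same request from CPL 3, or a request for 0xE7, is
refused.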
Maxime