A nice set of recent posts has done a great job detailing the remaining ways that a root user can get at kernel memory. Part of this is driven by the ideas behind UEFI Secure Boot, but it all comes from the same goal: making sure that the root user cannot directly subvert the running kernel. My perspective on this is toward making sure that an attacker who has gained access and then gained root privileges can’t continue to elevate their access and install invisible kernel rootkits.

An outline of possible attack vectors is spelled out by Matthew Garrett’s continuing “useful kernel lockdown” patch series. The set of attacks was examined by Tyler Borland in “Bypassing modules_disabled security”. His post describes each vector in detail, and he ultimately chooses MSR writing as the way to write kernel memory (and shows an example of how to re-enable module loading). One thing not mentioned is that many distros build MSR access as a module, and it’s rarely loaded. If modules_disabled is already set, an attacker won’t be able to load the MSR module to begin with. However, the other general-purpose vector, kexec, is still available. To prove out this method, Matthew wrote a proof-of-concept for changing kernel memory via kexec.

Chrome OS is several steps ahead here, since it has hibernation disabled, MSR writing disabled, kexec disabled, modules verified, root filesystem read-only and verified, kernel verified, and firmware verified. But since not all my machines are Chrome OS, I wanted to look at some additional protections against kexec on general-purpose distro kernels that have CONFIG_KEXEC enabled, especially those without UEFI Secure Boot and Matthew’s lockdown patch series.

My goal was to disable kexec without needing to rebuild my entire kernel. For future kernels, I have proposed adding /proc/sys/kernel/kexec_disabled, a partner to the existing modules_disabled, that will one-way toggle kexec off. For existing kernels, things got uglier.
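As a rough userspace illustration of how a one-way toggle like this gets flipped, the sketch below writes 1 to a sysctl file. The helper name set_sysctl_flag is hypothetical, and the kexec_disabled path is only the name proposed above, so a given kernel may not expose it under that name; modules_disabled is the existing knob it is modeled on.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write "1" to a boolean sysctl file such as
 * /proc/sys/kernel/modules_disabled. Returns 0 on success or a
 * negative errno value on failure. (Hypothetical helper name.)
 */
int set_sysctl_flag(const char *path)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return errno ? -errno : -EIO;
	if (fputs("1\n", f) == EOF)
		err = errno ? -errno : -EIO;
	/* The kernel may only reject the write when the buffer is flushed. */
	if (fclose(f) == EOF && err == 0)
		err = errno ? -errno : -EIO;
	return err;
}
```

Run as root, set_sysctl_flag("/proc/sys/kernel/modules_disabled") would turn off module loading; with modules_disabled the flag cannot be cleared again without a reboot, which is the behavior the proposed kexec knob is meant to copy.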

What options do I have for patching a running kernel?

First I looked back at what I’d done in the past to fix vulnerabilities with systemtap. This ends up being a rather heavy-duty way to go about things, since you need all the distro kernel debug symbols, etc. It does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.

Next I looked at ksplice. The original upstream has gone away, but there is still some work being done by Jiri Slaby. However, even with his updates, which fixed various build problems, I still hit more build failures, even against a 3.2 kernel (Ubuntu 12.04 LTS). So that’s out too, which is too bad, since ksplice does exactly what I want: it modifies the running kernel’s functions via a module.

So, finally, I decided to just do it by hand, and wrote a friendly kernel rootkit. Instead of dealing with flipping page table permissions on the normally-unwritable kernel code memory, I borrowed from PaX’s KERNEXEC feature, and just turn off write-protect checking on the CPU briefly to make the changes. The return values for functions on x86_64 are stored in RAX, so I just need to overwrite the start of the kexec_load syscall with “mov $-1, %rax; ret” (a return value of -1 is -EPERM):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static unsigned long long_target;
static char *target;
module_param_named(syscall, long_target, ulong, 0644);
MODULE_PARM_DESC(syscall, "Address of syscall");

/* mov $-1, %rax; ret */
static const unsigned char bytes[] = { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff,
				       0xc3 };
static unsigned char *orig;

/* Borrowed from PaX KERNEXEC */
static inline void disable_wp(void)
{
	unsigned long cr0;

	preempt_disable();
	barrier();
	cr0 = read_cr0();
	cr0 &= ~X86_CR0_WP;
	write_cr0(cr0);
}

static inline void enable_wp(void)
{
	unsigned long cr0;

	cr0 = read_cr0();
	cr0 |= X86_CR0_WP;
	write_cr0(cr0);
	barrier();
	preempt_enable_no_resched();
}

static int __init syscall_eperm_init(void)
{
	size_t i;

	target = (char *)long_target;
	if (target == NULL)
		return -EINVAL;

	/* save original */
	orig = kmalloc(sizeof(bytes), GFP_KERNEL);
	if (!orig)
		return -ENOMEM;
	for (i = 0; i < sizeof(bytes); i++)
		orig[i] = target[i];

	pr_info("writing %zu bytes at %p\n", sizeof(bytes), target);

	/* overwrite the start of the syscall with the stub */
	disable_wp();
	for (i = 0; i < sizeof(bytes); i++)
		target[i] = bytes[i];
	enable_wp();

	return 0;
}
module_init(syscall_eperm_init);

static void __exit syscall_eperm_exit(void)
{
	size_t i;

	pr_info("restoring %zu bytes at %p\n", sizeof(bytes), target);

	/* put the saved bytes back */
	disable_wp();
	for (i = 0; i < sizeof(bytes); i++)
		target[i] = orig[i];
	enable_wp();

	kfree(orig);
}
module_exit(syscall_eperm_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kees Cook <kees@outflux.net>");
MODULE_DESCRIPTION("makes target syscall always return EPERM");

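The eight patch bytes in the module follow the fixed x86_64 encoding of “mov $imm32, %rax; ret”, so they can be generated for any return value instead of being hard-coded. The following is a minimal userspace sketch of that encoding plus the same save, overwrite, and restore sequence, applied to an ordinary buffer rather than kernel text (so no CR0.WP handling is involved). The names build_ret_stub, patch_with_backup, and restore_from_backup are hypothetical and do not appear in the module.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STUB_LEN 8

/* Encode "mov $imm32, %rax; ret"; the CPU sign-extends imm32 to 64 bits. */
void build_ret_stub(int32_t value, unsigned char out[STUB_LEN])
{
	uint32_t imm = (uint32_t)value;

	out[0] = 0x48;			/* REX.W prefix: 64-bit operand */
	out[1] = 0xc7;			/* opcode: mov imm32 to r/m64 */
	out[2] = 0xc0;			/* ModRM: register-direct, %rax */
	out[3] = imm & 0xff;		/* immediate, little-endian */
	out[4] = (imm >> 8) & 0xff;
	out[5] = (imm >> 16) & 0xff;
	out[6] = (imm >> 24) & 0xff;
	out[7] = 0xc3;			/* ret */
}

/* Save the bytes at target, then overwrite them with the stub. */
void patch_with_backup(unsigned char *target, const unsigned char *stub,
		       unsigned char *saved, size_t len)
{
	memcpy(saved, target, len);
	memcpy(target, stub, len);
}

/* Put the saved bytes back, undoing patch_with_backup(). */
void restore_from_backup(unsigned char *target, const unsigned char *saved,
			 size_t len)
{
	memcpy(target, saved, len);
}
```

Calling build_ret_stub(-1, buf) reproduces the byte array used in the module. Passing a different negative errno value, for example -38 for ENOSYS on x86_64 Linux, would make the patched syscall report that error instead.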
If I didn’t want to leave an obvious indication that the kernel had been manipulated, the module could be changed to:

not announce what it’s doing

remove the exit route to not restore the changes on module unload

error out at the end of the init function instead of staying resident

And with this in place, it’s just a matter of loading it with the address of sys_kexec_load (found via /proc/kallsyms) before I disable module loading via modprobe. Here’s my upstart script:

# modules-disable - disable modules after rc scripts are done
#
description "disable loading modules"

start on stopped module-init-tools and stopped rc

task
script
	cd /root/modules/syscall_eperm
	make clean
	make
	insmod ./syscall_eperm.ko \
		syscall=0x$(egrep ' T sys_kexec_load$' /proc/kallsyms | cut -d" " -f1)
	modprobe disable
end script
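The egrep and cut pipeline in the script picks the address out of a /proc/kallsyms line of the form “address type name”. For anyone who would rather do the lookup from C, here is a small sketch of the same match; the function names match_kallsyms_line and find_kallsyms_symbol are hypothetical. Unlike the egrep pattern, which anchors the name at the end of the line, this version ignores any trailing module tag.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one kallsyms line ("<hex address> <type> <name> [module]").
 * Returns 1 and stores the address when the line describes a global
 * text symbol (type 'T') with exactly the requested name, else 0.
 */
int match_kallsyms_line(const char *line, const char *symbol,
			unsigned long long *addr)
{
	unsigned long long value;
	char type;
	char name[256];

	if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
		return 0;
	if (type != 'T' || strcmp(name, symbol) != 0)
		return 0;
	*addr = value;
	return 1;
}

/* Scan a kallsyms-format file for the symbol; returns 1 if found. */
int find_kallsyms_symbol(const char *path, const char *symbol,
			 unsigned long long *addr)
{
	char line[512];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	while (!found && fgets(line, sizeof(line), f))
		found = match_kallsyms_line(line, symbol, addr);
	fclose(f);
	return found;
}
```

A caller would use find_kallsyms_symbol("/proc/kallsyms", "sys_kexec_load", &addr) and pass the result to the module’s syscall parameter. On kernels that restrict kernel pointers, the addresses may read as zero unless the lookup runs as root, so a zero result is worth treating as a failure.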

And now I’m safe from kexec, even before I have a kernel that contains /proc/sys/kernel/kexec_disabled.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.

