Using simple seccomp filters

Introduction

The Linux kernel (starting in version 3.5) supports " seccomp filter " (or "mode 2 seccomp"). Ubuntu 12.04 LTS had it backported to its 3.2 kernel, and Chrome OS has been using it (in various forms) for a while. This document is designed as a quick-start guide for software authors that want to take advantage of this security feature. In the simplest terms, it allows a program to declare ahead of time which system calls it expects to use, so that if an attacker gains arbitrary code execution, they cannot poke at any unexpected system calls.

The full seccomp filter documentation can be found in the Linux kernel source, here . The seccomp filter system uses the Berkley Packet Filter system. Combined with argument checking and the many possible filter return values (kill, trap, trace, errno), this is allows for extensive logic. This document seeks to show only the minimal case of defining a syscall whitelist. Everything not added to this filter causes the program to be killed.

To determine which seccomp features are available at runtime, please see the seccomp autodetection examples.

Since it is not always obvious to see which syscalls are being called by the various libraries a program might use, this document also includes example code that provides a helper to assist in discovering unwhitelisted syscalls during filter development.

Example Program

/* * seccomp example with syscall reporting * * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> * Authors: * Kees Cook <keescook@chromium.org> * Will Drewry <wad@chromium.org> * * Use of this source code is governed by a BSD-style license that can be * found in the LICENSE file. */ #define _GNU_SOURCE 1 #include <stdio.h> #include <stddef.h> #include <stdlib.h> #include <unistd.h> #include "config.h" int main(int argc, char *argv[]) { char buf[1024]; printf("Type stuff here: "); fflush(NULL); buf[0] = '\0'; fgets(buf, sizeof(buf), stdin); printf("You typed: %s", buf); printf("And now we fork, which should do quite the opposite ...

"); fflush(NULL); sleep(1); fork(); printf("You should not see this because I'm dead.

"); return 0; } First , we start with an example program that reads stdin, writes to stdout, sleeps, and exits. We want to make sure it never calls "fork", so we've added that to the end so we can verify that seccomp filter is working, once it gets added.

$ autoconf $ ./configure checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed configure: creating ./config.status config.status: creating config.h $ make gcc -Wall -c -o example.o example.c gcc example.o -o example $ ./example Type stuff here: asdf You typed: asdf And now we fork, which should do quite the opposite ... You should not see this because I'm dead. You should not see this because I'm dead. When we build and run this now, we get:

Adding basic seccomp filtering

--- step-1/example.c 2012-03-22 21:43:10.845732543 -0700 +++ step-2/example.c 2012-03-22 21:50:56.373304922 -0700 @@ -16,11 +16,54 @@ #include <unistd.h> #include "config.h" +#include "seccomp-bpf.h" + +static int install_syscall_filter(void) +{ + struct sock_filter filter[] = { + /* Validate architecture. */ + VALIDATE_ARCHITECTURE, + /* Grab the system call number. */ + EXAMINE_SYSCALL, + /* List allowed syscalls. */ + ALLOW_SYSCALL(rt_sigreturn), +#ifdef __NR_sigreturn + ALLOW_SYSCALL(sigreturn), +#endif + ALLOW_SYSCALL(exit_group), + ALLOW_SYSCALL(exit), + ALLOW_SYSCALL(read), + ALLOW_SYSCALL(write), + KILL_PROCESS, + }; + struct sock_fprog prog = { + .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), + .filter = filter, + }; + + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("prctl(NO_NEW_PRIVS)"); + goto failed; + } + if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) { + perror("prctl(SECCOMP)"); + goto failed; + } + return 0; + +failed: + if (errno == EINVAL) + fprintf(stderr, "SECCOMP_FILTER is not available. :(

"); + return 1; +} int main(int argc, char *argv[]) { char buf[1024]; + if (install_syscall_filter()) + return 1; + printf("Type stuff here: "); fflush(NULL); buf[0] = '\0'; --- step-1/configure.ac 2012-03-22 21:40:51.651435417 -0700 +++ step-2/configure.ac 2012-03-22 21:44:19.438868163 -0700 @@ -2,4 +2,5 @@ AC_PREREQ([2.59]) AC_CONFIG_HEADERS([config.h]) AC_PROG_CC +AC_CHECK_HEADERS([linux/seccomp.h]) AC_OUTPUT Next , we include the fancy " seccomp-bpf.h " header. Additionally, this also updates an example " configure.ac " to check for the new "linux/seccomp.h" include, since "seccomp-bpf.h" would like to use it. Then we build our initial list of basic system calls we expect (signal handling, read, write, exit). The flow of a simple seccomp BPF starts with verifying the architecture (since syscall numbers are tied to architecture), and then loads the syscall number and compares it against the whitelist. If no good match is found, it kills the process:

$ ./configure ... checking for linux/seccomp.h... yes configure: creating ./config.status config.status: creating config.h $ make gcc -Wall -c -o example.o example.c gcc example.o -o example $ ./example Bad system call $ echo $? 159 While this gets us to a nice starting place, it's not obvious what's still needed when we run the program, since it just blows up instead:

Adding syscall reporting

Now we can utilize one of the extra features of seccomp filter, and temporarily catch the failed syscall and report it, instead of immediately exiting. The intention is to remove this at the end, since once we've finished our syscall list, we won't need to change it (unless the program or its libraries change, in which case, we can do this again).

--- step-2/example.c 2012-03-22 21:50:56.373304922 -0700 +++ step-3/example.c 2012-03-22 21:51:04.377433872 -0700 @@ -17,6 +17,7 @@ #include "config.h" #include "seccomp-bpf.h" +#include "syscall-reporter.h" static int install_syscall_filter(void) { @@ -34,6 +35,7 @@ ALLOW_SYSCALL(exit), ALLOW_SYSCALL(read), ALLOW_SYSCALL(write), + /* Add more syscalls here. */ KILL_PROCESS, }; struct sock_fprog prog = { @@ -61,6 +63,8 @@ { char buf[1024]; + if (install_syscall_reporter()) + return 1; if (install_syscall_filter()) return 1; --- step-2/Makefile 2012-03-22 19:41:02.510347542 -0700 +++ step-3/Makefile 2012-03-22 19:41:33.706847395 -0700 @@ -3,7 +3,9 @@ all: example -example: example.o +include syscall-reporter.mk + +example: example.o syscall-reporter.o .PHONY: clean clean: Here, we add the " syscall-reporter.mk " Makefile include and the " syscall-reporter.c " object to the Makefile, and then add " syscall-reporter.h " and a call to "install_syscall_reporter" to the program.

$ make gcc -Wall -c -o example.o example.c In file included from example.c:20:0: syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp] echo "static const char *syscall_names[] = {" > syscall-names.h ;\ echo "#include <syscall.h>" | cpp -dM | grep '^#define __NR_' | \ LC_ALL=C sed -r -n -e 's/^\#define[ \t]+__NR_([a-z0-9_]+)[ \t]+([0-9]+)(.*)/ [\2] = "\1",/p' >> syscall-names.h;\ echo "};" >> syscall-names.h gcc -Wall -c -o syscall-reporter.o syscall-reporter.c In file included from syscall-reporter.c:12:0: syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp] gcc example.o syscall-reporter.o -o example $ ./example Looks like you need syscall fstat(5) too! $ vi example.c ... $ make gcc -Wall -c -o example.o example.c gcc example.o syscall-reporter.o -o example $ ./example Looks like you need syscall mmap(9) too! $ vi example.c ... $ make gcc -Wall -c -o example.o example.c gcc example.o syscall-reporter.o -o example $ ./example Type stuff here: asdf You typed: asdf And now we fork, which should do quite the opposite ... Looks like you need syscall rt_sigprocmask(14) too! $ ... Now, when we run it, we can see the missing syscalls, and progressively add them until we're up to the fork (which is implemented via the "clone" syscall):

Testing is done

--- step-3/example.c 2012-03-22 21:51:04.377433872 -0700 +++ step-4/example.c 2012-03-22 21:51:13.577583466 -0700 @@ -36,6 +36,11 @@ ALLOW_SYSCALL(read), ALLOW_SYSCALL(write), /* Add more syscalls here. */ + ALLOW_SYSCALL(fstat), + ALLOW_SYSCALL(mmap), + ALLOW_SYSCALL(rt_sigprocmask), + ALLOW_SYSCALL(rt_sigaction), + ALLOW_SYSCALL(nanosleep), KILL_PROCESS, }; struct sock_fprog prog = { $ ./example Type stuff here: asdf You typed: asdf And now we fork, which should do quite the opposite ... Looks like you need syscall clone(56) too! This continues until we hit the report of the "clone" use, and we know we're done:

Ready for prime-time

--- step-4/example.c 2012-03-22 21:51:13.577583466 -0700 +++ step-5/example.c 2012-03-22 21:51:21.785717260 -0700 @@ -17,7 +17,6 @@ #include "config.h" #include "seccomp-bpf.h" -#include "syscall-reporter.h" static int install_syscall_filter(void) { @@ -35,7 +34,6 @@ ALLOW_SYSCALL(exit), ALLOW_SYSCALL(read), ALLOW_SYSCALL(write), - /* Add more syscalls here. */ ALLOW_SYSCALL(fstat), ALLOW_SYSCALL(mmap), ALLOW_SYSCALL(rt_sigprocmask), @@ -68,8 +66,6 @@ { char buf[1024]; - if (install_syscall_reporter()) - return 1; if (install_syscall_filter()) return 1; --- step-4/Makefile 2012-03-22 19:55:27.056164102 -0700 +++ step-5/Makefile 2012-03-22 19:55:33.680270186 -0700 @@ -3,9 +3,7 @@ all: example -include syscall-reporter.mk - -example: example.o syscall-reporter.o +example: example.o .PHONY: clean clean: $ ./example Type stuff here: asdf You typed: asdf And now we fork, which should do quite the opposite ... Bad system call $ echo $? 159 Now that we're done, we can remove the syscall reporter again, and see that the program correctly dies when it hits the fork. (To be really done, the fork should be removed too!)

Conclusion

Ta-da! That's it -- you've now got a seccomp filter built into your program. To make this even more portable, you can ignore the "prctl" failures if seccomp is not available, or warn the user but not die, or put the entire thing behind a "#ifdef HAVE_LINUX_SECCOMP_H" test.

For more complex, or dynamic, BPF constructions, you'll probably want to take a look at libseccomp

For a stand-alone filtering tool, check out minijail