A library for seccomp filters

Now that it is looking like Linux will be getting an enhanced "secure computing" (seccomp) facility, some are starting to turn toward actually using the new feature in applications. To that end, Paul Moore has introduced libseccomp, which is meant to make it easier for applications to take advantage of the packet-filter-based seccomp mode. That will lead to more secure applications that can permanently reduce their ability to make "unsafe" system calls, which can only be a good thing for Linux application security overall.

Enhanced seccomp has taken a somewhat tortuous path toward the mainline—and it's not done yet. Will Drewry's BPF-based solution (aka seccomp filter or seccomp mode 2) is currently in linux-next, and recent complaints about it have been few and far between, so it would seem likely that it will appear in the 3.5 kernel. It will provide fine-grained control over the system calls that the process (and its children) can make.

What libseccomp does is make it easier for applications to add support for sandboxing themselves by providing a simpler API to use the new seccomp mode. By way of contrast, Kees Cook posted a seccomp filter tutorial that describes how to build an application using the filters directly. It is also interesting to see that the recent OpenSSH 6.0 release contains support for seccomp filtering using a (pre-libseccomp) patch from Drewry. The patch limits the privilege-separated OpenSSH child process to a handful of legal system calls, while setting up open() to fail with an EACCES error.
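To give a sense of what using the filters directly involves, here is a minimal sketch of an allow-list filter written as raw classic-BPF instructions. It is not taken from Cook's tutorial or from the OpenSSH patch; the helper names build_allowlist() and install_filter() are hypothetical, and the caller is assumed to supply the AUDIT_ARCH_* value for the architecture the program was built for.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_* */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_*, SECCOMP_MODE_FILTER */
#include <sys/prctl.h>      /* prctl(), PR_SET_SECCOMP, PR_SET_NO_NEW_PRIVS */

/*
 * Hypothetical helper: write an allow-list filter into out[].
 * Any system call not listed in allowed[] kills the caller.
 * Returns the number of instructions written, or 0 if out[] is too small.
 */
size_t build_allowlist(uint32_t arch, const int *allowed, size_t n,
                       struct sock_filter *out, size_t cap)
{
    size_t need = 4 + 2 * n + 1;
    size_t i, pc = 0;

    if (cap < need)
        return 0;

    /* Refuse to run if the calling architecture is not the expected one. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, arch));
    out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        arch, 1, 0);
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);

    /* Load the system call number, then test it against each entry. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, nr));
    for (i = 0; i < n; i++) {
        /* On a match fall through to ALLOW; otherwise skip over it. */
        out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                            (uint32_t)allowed[i], 0, 1);
        out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                            SECCOMP_RET_ALLOW);
    }

    /* Default action: nothing matched. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);
    return pc;
}

/*
 * Hypothetical helper: hand the program to the kernel.
 * Returns 0 on success, -1 on failure. The filter cannot be removed later.
 */
int install_filter(struct sock_filter *insns, size_t len)
{
    struct sock_fprog prog = {
        .len = (unsigned short)len,
        .filter = insns,
    };

    if (len == 0 || len > BPF_MAXINSNS)
        return -1;
    /* Unprivileged processes must set NO_NEW_PRIVS before loading a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller on x86-64 might pass AUDIT_ARCH_X86_64 from linux/audit.h along with a list such as __NR_read, __NR_write, and __NR_exit_group, then give the result to install_filter(). Even this small case needs hand-counted jump offsets, which is the kind of bookkeeping a library can take over.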

As described in the man pages that accompany the libseccomp code, the starting point is to include seccomp.h; an application must then call:

int seccomp_init(uint32_t def_action);

The def_action parameter governs the default action that is taken when a system call is rejected by the filter. SCMP_ACT_KILL will kill the process, while SCMP_ACT_TRAP will cause a SIGSYS signal to be issued. There are also options to force rejected system calls to return a certain error (SCMP_ACT_ERRNO(errno)), to generate a ptrace() event (SCMP_ACT_TRACE(msg_num)), or to simply allow the system call to proceed (SCMP_ACT_ALLOW).
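At the kernel level, each of those actions is a 32-bit filter return value: the upper bits select the action and the low 16 bits carry data such as the error number or the ptrace() message number. The sketch below shows that encoding using the kernel's SECCOMP_RET_* constants from linux/seccomp.h. The helper names are hypothetical, and whether libseccomp's SCMP_ACT_* macros expand to exactly these values is an assumption.

```c
#include <stdint.h>
#include <linux/seccomp.h>  /* SECCOMP_RET_ERRNO, SECCOMP_RET_TRACE, masks */

/* Hypothetical helper: "fail the call with this errno". */
uint32_t ret_errno(uint16_t err)
{
    return SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA);
}

/* Hypothetical helper: "notify the ptrace() tracer with this message". */
uint32_t ret_trace(uint16_t msg_num)
{
    return SECCOMP_RET_TRACE | (msg_num & SECCOMP_RET_DATA);
}

/* Split a return value back into its action and data parts. */
uint32_t ret_action(uint32_t ret)
{
    return ret & SECCOMP_RET_ACTION;
}

uint32_t ret_data(uint32_t ret)
{
    return ret & SECCOMP_RET_DATA;
}
```

SECCOMP_RET_KILL, SECCOMP_RET_TRAP, and SECCOMP_RET_ALLOW take no data, so they can be returned from a filter as they are.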

Next, the application will want to add its filter rules. Those rules can apply to any invocation of a particular system call, or they can restrict calls to certain values of the system call arguments. So, a rule could specify that write() can only be used on file descriptor 1, or that open() is forbidden, for example. The interface for adding rules is:

int seccomp_rule_add(uint32_t action, int syscall, unsigned int arg_cnt, ...);

The action parameter uses the same action macros as are used in seccomp_init(). The syscall argument is the system call number of interest for this rule, which could be specified using __NR_syscall values, but it is recommended that the SCMP_SYS() macro be used to properly handle multiple architectures. The arg_cnt specifies the number of rules that are being passed; those rules then follow.

In the simplest case, where the rule is just allowing a system call for example, there are no argument rules. So, if the default action is to kill the process, adding a rule to allow close() would look like:

seccomp_rule_add(SCMP_ACT_ALLOW, SCMP_SYS(close), 0);

Doing filtering based on the system call arguments relies on a set of macros that specify the argument of interest by number (SCMP_A0() through SCMP_A5()), and the comparison to be done (SCMP_CMP_EQ, SCMP_CMP_GT, and so on). So, adding a rule that allows writing to stderr would look like:

seccomp_rule_add(SCMP_ACT_ALLOW, SCMP_SYS(write), 1, SCMP_A0(SCMP_CMP_EQ, STDERR_FILENO));
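What an argument rule means can be modeled in ordinary user-space C: a rule matches when the system call number is the one named and every argument comparison holds. The sketch below evaluates rules against the kernel's struct seccomp_data. It is a model of the semantics only, not the BPF that the library emits; the enum, struct, and function names are hypothetical, and treating arguments as unsigned 64-bit values is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>  /* struct seccomp_data */

/* Hypothetical comparison operators, in the spirit of SCMP_CMP_*. */
enum cmp_op { CMP_NE, CMP_LT, CMP_LE, CMP_EQ, CMP_GE, CMP_GT };

/* Hypothetical argument rule: "argument number arg <op> datum". */
struct arg_rule {
    unsigned int arg;   /* 0..5, as with SCMP_A0() through SCMP_A5() */
    enum cmp_op op;
    uint64_t datum;
};

/* Evaluate one comparison against the arguments of a system call. */
bool arg_rule_matches(const struct arg_rule *r, const struct seccomp_data *d)
{
    uint64_t v;

    if (r->arg > 5)
        return false;
    v = d->args[r->arg];

    switch (r->op) {
    case CMP_NE: return v != r->datum;
    case CMP_LT: return v <  r->datum;
    case CMP_LE: return v <= r->datum;
    case CMP_EQ: return v == r->datum;
    case CMP_GE: return v >= r->datum;
    case CMP_GT: return v >  r->datum;
    }
    return false;
}

/* A rule matches if the call number is right and all comparisons hold. */
bool rule_matches(int syscall, const struct arg_rule *rules, size_t n,
                  const struct seccomp_data *d)
{
    size_t i;

    if (d->nr != syscall)
        return false;
    for (i = 0; i < n; i++) {
        if (!arg_rule_matches(&rules[i], d))
            return false;
    }
    return true;
}
```

With n set to zero, the rule applies to every invocation of the system call, which corresponds to passing an arg_cnt of 0.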

Once all the rules have been added, the filter is loaded into the kernel (and activated) with:

int seccomp_load(void);

The internal library state that was used to build the filter is no longer needed after the call to seccomp_load(), so it can be released with a call to:

void seccomp_release(void);

There are a handful of other functions that libseccomp provides, including two ways to extract the filter code from the library:

int seccomp_gen_bpf(int fd);
int seccomp_gen_pfc(int fd);

Those functions will write the filter code in either kernel-readable BPF or human-readable Pseudo Filter Code (PFC) to fd. One can also set the priority of system calls in the filter. That priority is used as a hint by the filter generation code to put higher priority calls earlier in the filter list to reduce the overhead of checking those calls (at the expense of the others in the rules):

int seccomp_syscall_priority(int syscall, uint8_t priority);

In addition, there are a few attributes for the seccomp filter that can be set or queried using:

int seccomp_attr_set(enum scmp_filter_attr attr, uint32_t value);
int seccomp_attr_get(enum scmp_filter_attr attr, uint32_t *value);

The attributes available are the default action for the filter (SCMP_FLTATR_ACT_DEFAULT, which is read-only), the action taken when the loaded filter does not match the architecture it is running on (SCMP_FLTATR_ACT_BADARCH, which defaults to SCMP_ACT_KILL), or whether PR_SET_NO_NEW_PRIVS is turned on or off before activating the filter (SCMP_FLTATR_CTL_NNP, which defaults to NO_NEW_PRIVS being turned on). The NO_NEW_PRIVS flag is a recent kernel addition that stops a process and its children from ever being able to get new privileges (via setuid() or capabilities for example).
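The idea behind the priority hint can be illustrated with a simple sort: system calls with a higher priority are placed first, so a linear scan of the filter reaches them after fewer comparisons. How libseccomp actually uses the hint internally is not described here, so the following is only a sketch of the ordering idea; struct sys_prio and order_by_priority() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>  /* qsort(), size_t */

/* Hypothetical record pairing a system call number with its priority. */
struct sys_prio {
    int syscall;
    uint8_t priority;
};

/* Higher priority sorts first; ties are broken by system call number. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct sys_prio *x = a;
    const struct sys_prio *y = b;

    if (x->priority != y->priority)
        return (x->priority < y->priority) - (x->priority > y->priority);
    return (x->syscall > y->syscall) - (x->syscall < y->syscall);
}

/* Reorder v[] so that the most frequently expected calls are checked first. */
void order_by_priority(struct sys_prio *v, size_t n)
{
    qsort(v, n, sizeof *v, by_priority_desc);
}
```

The tie-break on the system call number is there only to make the result deterministic, since qsort() is not guaranteed to be stable.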

The last attribute came about after some discussion in the announcement thread. The consensus on the list was that it was desirable to set NO_NEW_PRIVS by default, but allow libseccomp users to override that if desired. Other than some kudos from other developers about the project, the only other messages in the thread concerned the GPLv2 license. Moore said that the GPL was really just his default license and, since it made more sense for a library to use the LGPL, he was able to get the other contributors to agree to switch to the LGPLv2.1.

While it is by no means a panacea, the seccomp filter will provide a way for applications to make themselves more secure. In particular, programs that handle untrusted user input, like the Chromium browser, which was the original impetus for the feature, will be able to limit the ways in which damage can be done through a security hole in their code. One would guess we will see more applications using the feature via libseccomp. Seccomp mode 2 is currently available in Ubuntu kernels, and is slated for inclusion in ChromeOS; with luck we'll see it in the mainline soon too.