Applying Seccomp Filters at Runtime for Go Binaries


Seccomp (short for secure computing mode) is a feature of the Linux kernel, available since version 2.6.12, that controls which syscalls a process can make. Seccomp is used by numerous projects, including Docker, Android, OpenSSH and Firefox.

In this blog post, I am going to show you how you can implement your own seccomp filters, at runtime, for a Go binary on your Dyno.

By default, a process running on your Dyno is limited in which syscalls it can make, because the Dyno runs with a restricted set of seccomp filters. This means, for example, that your process has access to syscalls A, B and C but not H and J, as defined in the filters for your Dyno. This reduces the overall attack surface* of the Dyno (and is something of a best practice), but what if your process does not need syscall A and only needs B and C? In this case, your process has an unnecessary syscall exposed, which increases its attack surface. Limiting the syscalls available to the process improves its security posture: if the process were compromised in some way, the attacker would be limited to the syscalls available to the process. This allows for a layered, defence-in-depth approach, whereby should one security control fail, another can prevent further damage.

For example, if we were to create a program that was only required to create a folder at a specific location on the file system, then we could apply a seccomp filter which would ensure that only the syscalls required to create that folder are accessible to the program. However, if the program were to be modified, be it via the source code or some form of code injection, and then attempted to establish a network connection (e.g. via curl), the applied seccomp filter would block this behaviour. This behaviour is blocked because the syscalls required for the network connection have not been added to our program's seccomp filter.

* Attack Surface is common lingo for security folks - maybe not everyday language for developers. Attack surface might be defined as an exposure presenting a malicious actor opportunity to attack or manipulate your environment to their own will - we seek to remove or contain these from their use at every possible opportunity.

For the remainder of this post, I am going to go through the steps on how to deploy a Go binary and have it implement seccomp filters at runtime.

First, we need an application. In this case, I've created a Go program that creates a folder called moo in /tmp. The code for the program is below:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	err := syscall.Mkdir("/tmp/moo", 0755)
	if err != nil {
		panic(err)
	} else {
		fmt.Printf("I just created a file\n")
	}
}

We now have a simple Go program to create a folder. As we are working with syscalls, we need to determine which syscalls this program needs to execute successfully. There are multiple ways to determine this, but we will use the application binary and strace. Let's run the following to create the executable binary:

$ go build -o makeTheFolder

We now have the binary and if we execute it, we should get the following output:

$ ./makeTheFolder
I just created a file

We now know that our binary is working and we are going to determine what syscalls are made. To achieve this, we will run the following command:

$ strace -c ./makeTheFolder

The output of the above command will be something like this:

I just created a file
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         3           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         4           open
  0.00    0.000000           0         4           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0        25           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0       120           rt_sigaction
  0.00    0.000000           0        11           rt_sigprocmask
  0.00    0.000000           0         5         5 access
  0.00    0.000000           0         4           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           mkdirat
  0.00    0.000000           0         1           readlinkat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   210         5 total

From the output above, we have a list of the syscalls that were executed by our makeTheFolder binary. Next, we need to use this list so that when our binary is executed, its process only has access to the syscalls it requires. To achieve this we will use seccomp; more specifically, we will use the Go library libseccomp-golang, which provides Go bindings for libseccomp.
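The syscall column of the strace summary can also be collected programmatically rather than copied by hand. The following is a minimal sketch, assuming the summary has been captured as text in the layout shown above; the helper name syscallsFromSummary is hypothetical and not part of strace or any library.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// syscallsFromSummary extracts syscall names from the table printed by
// `strace -c`. Hypothetical helper: it assumes that data rows start with a
// numeric "% time" value and that the syscall name is the last column.
func syscallsFromSummary(summary string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(summary))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Data rows have at least 5 columns (the errors column may be empty).
		if len(fields) < 5 {
			continue
		}
		// Skip the header, the separator lines and the program's own output:
		// a data row begins with a digit.
		if fields[0][0] < '0' || fields[0][0] > '9' {
			continue
		}
		name := fields[len(fields)-1]
		if name == "total" {
			continue
		}
		names = append(names, name)
	}
	return names
}

func main() {
	// Shortened sample in the same layout as the summary above.
	summary := `I just created a file
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           write
  0.00    0.000000           0         5         5 access
  0.00    0.000000           0         1           mkdirat
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                     7         5 total`
	fmt.Printf("%q\n", syscallsFromSummary(summary))
}
```

The digit check is a heuristic: a program that prints lines starting with a number and containing five or more words would confuse it, so the result is worth reviewing before it is used as a whitelist.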

We will need to check if our local system supports seccomp and has the required dependencies for libseccomp-golang. To check if your kernel supports seccomp, run the following command:

$ grep CONFIG_SECCOMP= /boot/config-$(uname -r)

If your kernel supports seccomp, you should get the following returned:

CONFIG_SECCOMP=y

Additionally, we need to ensure that we have libseccomp-dev installed on our local system. To install this package, we can run the following command:

$ apt-get install libseccomp-dev

At this point, we have everything we need to start using the libseccomp-golang library. The following code will be used to achieve our goal of limiting the syscalls available to our binary at runtime:

package main

import (
	"fmt"
	"syscall"

	libseccomp "github.com/seccomp/libseccomp-golang"
)

// whiteList installs a seccomp filter that allows only the named syscalls.
// Every other syscall fails with EPERM.
func whiteList(syscalls []string) {
	// Default action: deny everything by returning EPERM.
	filter, err := libseccomp.NewFilter(libseccomp.ActErrno.SetReturnCode(int16(syscall.EPERM)))
	if err != nil {
		panic(fmt.Sprintf("Error creating filter: %s", err))
	}
	for _, element := range syscalls {
		fmt.Printf("[+] Whitelisting: %s\n", element)
		syscallID, err := libseccomp.GetSyscallFromName(element)
		if err != nil {
			panic(err)
		}
		if err := filter.AddRule(syscallID, libseccomp.ActAllow); err != nil {
			panic(err)
		}
	}
	// Load applies the filter to the running process.
	if err := filter.Load(); err != nil {
		panic(err)
	}
}

The code above implements seccomp filters using a whitelist approach. We first create the filter with a “deny all” default action, which makes every syscall fail with EPERM unless a rule allows it. This is achieved in this line of code:

filter, err := libseccomp.NewFilter(libseccomp.ActErrno.SetReturnCode(int16(syscall.EPERM)))

The function whiteList expects a slice of strings containing the names of the syscalls that we want our process to have access to. We iterate over the elements and add each syscall to our filter whitelist, which allows our binary to use the syscall with that name. AddRule returns an error, and the code panics on failure so that the program never continues with an incomplete filter.

for _, element := range syscalls {
	fmt.Printf("[+] Whitelisting: %s\n", element)
	syscallID, err := libseccomp.GetSyscallFromName(element)
	if err != nil {
		panic(err)
	}
	if err := filter.AddRule(syscallID, libseccomp.ActAllow); err != nil {
		panic(err)
	}
}

Once we are done adding our required syscalls to the filter, we load the filter, which applies it to our binary at runtime. Load also returns an error, which we check so that the program does not silently run unfiltered. The code to load our filter is:

if err := filter.Load(); err != nil {
	panic(err)
}

We now have a mechanism to limit which syscalls our process will have access to. To use this in our makeTheFolder program, we add the following code:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	var syscalls = []string{
		"rt_sigaction", "mkdirat", "clone", "mmap", "readlinkat", "futex",
		"rt_sigprocmask", "mprotect", "write", "sigaltstack", "gettid", "read",
		"open", "close", "fstat", "munmap", "brk", "access", "execve",
		"getrlimit", "arch_prctl", "sched_getaffinity", "set_tid_address",
		"set_robust_list"}
	whiteList(syscalls)
	err := syscall.Mkdir("/tmp/moo", 0755)
	if err != nil {
		panic(err)
	} else {
		fmt.Printf("I just created a file\n")
	}
}

Our addition to the code is a string slice containing the names of the syscalls we extracted from our strace output, which we pass to the whiteList function.

We can now test our modified program using the same steps mentioned above:

$ go build -o makeTheFolder && ./makeTheFolder

The above command provides us with the following output:

[+] Whitelisting: rt_sigaction
[+] Whitelisting: mkdirat
[+] Whitelisting: clone
[+] Whitelisting: mmap
[+] Whitelisting: readlinkat
[+] Whitelisting: futex
[+] Whitelisting: rt_sigprocmask
[+] Whitelisting: mprotect
[+] Whitelisting: write
[+] Whitelisting: sigaltstack
[+] Whitelisting: gettid
[+] Whitelisting: read
[+] Whitelisting: open
[+] Whitelisting: close
[+] Whitelisting: fstat
[+] Whitelisting: munmap
[+] Whitelisting: brk
[+] Whitelisting: access
[+] Whitelisting: execve
[+] Whitelisting: getrlimit
[+] Whitelisting: arch_prctl
[+] Whitelisting: sched_getaffinity
[+] Whitelisting: set_tid_address
[+] Whitelisting: set_robust_list
I just created a file
Segmentation fault (core dumped)

We can verify if our folder was created successfully by running the following command:

$ file /tmp/moo
/tmp/moo: directory

Our process successfully creates the folder we specified, but then crashes with what appears to be a segmentation fault. After much investigation (which is beyond the scope of this blog post), I discovered that the crash was due to the process not having access to the exit_group syscall. I stumbled upon this when verifying my strace output and noticed that the -c option for strace does not list syscalls that do not return a value. To verify this, I ran strace again without the -c option and wrote the raw output to a file with the following command:

$ strace -o output.txt ./makeTheFolder

The content of output.txt looks like this:

.......
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb8fc8e0000
mprotect(0x7fb8fc8e0000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fb8fd0dfff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb8fd0e09d0, tls=0x7fb8fd0e0700, child_tidptr=0x7fb8fd0e09d0) = 16335
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb8fc0df000
mprotect(0x7fb8fc0df000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fb8fc8deff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb8fc8df9d0, tls=0x7fb8fc8df700, child_tidptr=0x7fb8fc8df9d0) = 16336
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb8fb8de000
mprotect(0x7fb8fb8de000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fb8fc0ddff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb8fc0de9d0, tls=0x7fb8fc0de700, child_tidptr=0x7fb8fc0de9d0) = 16337
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x72f7c8, FUTEX_WAIT, 0, NULL) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb8fb0dd000
mprotect(0x7fb8fb0dd000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fb8fb8dcff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb8fb8dd9d0, tls=0x7fb8fb8dd700, child_tidptr=0x7fb8fb8dd9d0) = 16338
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
readlinkat(AT_FDCWD, "/proc/self/exe", "/home/brompwnie/go/src/github.co"..., 128) = 68
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb8fda88000
mkdirat(AT_FDCWD, "/tmp/moo", 0755) = 0
write(1, "I just created a file\n", 22) = 22
exit_group(0) = ?
+++ exited with 0 +++

The output above lists the syscalls that were executed, along with their arguments and return values. The return value = ? indicates that the exit_group syscall did not return a value, which is expected because it terminates the process. strace does not display such syscalls with the -c option, so it is recommended that you analyze both output formats to ensure that you capture all the syscalls needed by the process.
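Because the raw trace is the more complete of the two formats, it can be useful to derive the list of unique syscall names from it directly. The following is a minimal sketch, assuming a trace written with strace -o as above (without -f, so lines carry no PID prefix); the helper names syscallsFromTrace and isSyscallName are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// isSyscallName reports whether s looks like a syscall name, i.e. it is
// non-empty and made up of lowercase letters, digits and underscores.
func isSyscallName(s string) bool {
	for _, r := range s {
		if !(r == '_' || (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')) {
			return false
		}
	}
	return s != ""
}

// syscallsFromTrace returns the sorted, de-duplicated syscall names found in
// raw strace output. Hypothetical helper: a line counts as a syscall if it
// has the form "name(args...", so lines such as "+++ exited with 0 +++" are
// ignored.
func syscallsFromTrace(trace string) []string {
	seen := map[string]bool{}
	scanner := bufio.NewScanner(strings.NewReader(trace))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		open := strings.IndexByte(line, '(')
		if open <= 0 {
			continue
		}
		if name := line[:open]; isSyscallName(name) {
			seen[name] = true
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// The last lines of the trace shown above.
	trace := `mkdirat(AT_FDCWD, "/tmp/moo", 0755) = 0
write(1, "I just created a file\n", 22) = 22
exit_group(0) = ?
+++ exited with 0 +++`
	fmt.Printf("%q\n", syscallsFromTrace(trace))
}
```

Unlike the -c summary, this list includes exit_group, since the raw trace records the call even though it never returns.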

At this point, our process is executing successfully but crashing near the end of execution. To remediate this, we add the exit_group syscall to our list of syscalls to whitelist as shown below:

var syscalls = []string{
	"rt_sigaction", "mkdirat", "clone", "mmap", "readlinkat", "futex",
	"rt_sigprocmask", "mprotect", "write", "sigaltstack", "gettid", "read",
	"open", "close", "fstat", "munmap", "brk", "access", "execve",
	"getrlimit", "arch_prctl", "sched_getaffinity", "set_tid_address",
	"set_robust_list", "exit_group"}

We can now rebuild and check if our new whitelist of syscalls works with the following command:

$ go build -o makeTheFolder && ./makeTheFolder

The above command should result in the following output:

[+] Whitelisting: rt_sigaction
[+] Whitelisting: mkdirat
[+] Whitelisting: clone
[+] Whitelisting: mmap
[+] Whitelisting: readlinkat
[+] Whitelisting: futex
[+] Whitelisting: rt_sigprocmask
[+] Whitelisting: mprotect
[+] Whitelisting: write
[+] Whitelisting: sigaltstack
[+] Whitelisting: gettid
[+] Whitelisting: read
[+] Whitelisting: open
[+] Whitelisting: close
[+] Whitelisting: fstat
[+] Whitelisting: munmap
[+] Whitelisting: brk
[+] Whitelisting: access
[+] Whitelisting: execve
[+] Whitelisting: getrlimit
[+] Whitelisting: arch_prctl
[+] Whitelisting: sched_getaffinity
[+] Whitelisting: set_tid_address
[+] Whitelisting: set_robust_list
[+] Whitelisting: exit_group
I just created a file

The output above indicates that our process successfully created the folder moo at the correct location /tmp and exited gracefully.

At this point we have our Go program running locally as required with seccomp filters, which means that when the binary makeTheFolder is executed, its process can only use the syscalls that we specified.
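One way to confirm that a filter is in place is the Seccomp field in /proc/<pid>/status, which the kernel reports as 0 (disabled), 1 (strict) or 2 (filter) when it is built with seccomp support. The following is a minimal sketch, not part of the original program; the helper name seccompMode is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// seccompMode extracts the value of the "Seccomp:" field from the contents
// of /proc/<pid>/status. The second return value is false if the field is
// missing. Hypothetical helper for illustration.
func seccompMode(status string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := scanner.Text()
		// "Seccomp_filters:" on newer kernels does not match this prefix.
		if strings.HasPrefix(line, "Seccomp:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Seccomp:")), true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("could not read /proc/self/status:", err)
		return
	}
	mode, ok := seccompMode(string(data))
	if !ok {
		fmt.Println("no Seccomp field found")
		return
	}
	fmt.Println("seccomp mode:", mode)
}
```

Note that if this check runs inside the filtered process after the filter is loaded, the whitelist must also contain whatever syscalls the Go runtime uses to open and read the file, which would show up in a fresh strace run. On a platform that already applies its own seccomp filters, the mode may read 2 even before our filter is loaded.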

In the previous section, we implemented a whitelist to allow for the program to create a folder moo at /tmp but what would happen if the program were to be modified and attempted to execute the following code?

....
	whiteList(syscalls)
	err := syscall.Mkdir("/tmp/moo", 0755)
	if err != nil {
		panic(err)
	} else {
		fmt.Printf("I just created a file\n")
	}
	// Replace the current process image with ls -l.
	if err := syscall.Exec("/bin/ls", []string{"ls", "-l"}, nil); err != nil {
		panic(err)
	}
}

The code above attempts to run the ls -l command and if it were to be executed from within our seccomp whitelisted program, we would get the following output:

...
[+] Whitelisting: getrlimit
[+] Whitelisting: arch_prctl
[+] Whitelisting: sched_getaffinity
[+] Whitelisting: set_tid_address
[+] Whitelisting: set_robust_list
[+] Whitelisting: exit_group
I just created a file
ls: reading directory '.': Operation not permitted
total 0

The output above tells us that an operation was not permitted. The ls -l command was started by syscall.Exec, which is possible because execve is on our whitelist, and the seccomp filter stays in place across execve. However, we did not whitelist the additional syscalls that ls -l requires (ioctl, getdents and statfs), so ls fails as soon as it tries to read the directory. We just blocked non-whitelisted syscalls.

Figure 1: How whitelisted syscalls can be used to restrict the syscalls executed by a process.

We can implement this on Heroku as you would with any other Go program on Heroku. First, make sure you have the dependency libseccomp-golang added to your project via Govendor or Godeps and simply deploy. I made use of Govendor and had the following entry in my vendor.json file:

"package": [ { "checksumSHA1": "bCj0+g9CKyCA90SlDxaPA6+zZeg=", "path": "github.com/seccomp/libseccomp-golang", "revision": "f6ec81daf48e41bf48b475afc7fe06a26bfb72d1", "revisionTime": "2017-06-09T13:46:05Z" } ],

And there you go. You now know how to implement seccomp filters at runtime for your Go binaries. We have added the necessary packages required such as libseccomp-dev to the build environment so that we can achieve this. You can find the full list of packages available below.

In this blog post, we discussed how you can configure and deploy Go binaries with seccomp at runtime to harden your processes. This allows Go developers to programmatically reduce the attack surface of their deployed processes and allows developers to embrace the “shift left” philosophy for secure software development.

This post and functionality would not have been made possible without Heroku's Build Team adding the required packages to the Heroku stack images. Thank you!