In the last two posts about the main security features in Docker 1.10, we tackled the Authorization plug-in and Seccomp profiles. In this final post, we'll focus on the new support for Linux user namespaces.

Before Docker 1.10, a container running as root (user=root) also ran as root on the host itself. Linux capabilities meant it was not a fully privileged root user. Even so, it still had more privileges than it needed, and those capability limits did little to restrict its access to files. This meant that root in a container could read and write any host file it could reach, for example through a shared volume, and could easily escalate privileges using setuid programs.

As part of Docker 1.10, Docker announced support for Linux user namespaces. When the feature is enabled, even containers running as root inside Docker are given a regular, non-root user on the host. The purpose of user namespaces is the same as that of other Linux namespaces: isolation. They isolate the user and group ID number spaces, so that a process's user and group IDs can differ inside and outside a user namespace.

Because a user is assigned different uids/gids inside and outside the container, a problem related to file system permissions goes away. Containers that access host files through shared volumes no longer do so as uid 0 (root). Instead, they act as a regular user, determined by the user namespace mapping. Nice!
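To make the permission point concrete, here is a simplified model of a write-permission check, written in Python. It is a sketch rather than the kernel's real algorithm. It ignores capabilities, ACLs and supplementary groups, and it simply treats uid 0 as always allowed. The file's ownership and mode are hypothetical. The value 427680 is the host uid that appears in this post's example, and the matching gid is an assumption.

```python
import stat


def can_write(uid: int, gid: int, owner_uid: int, owner_gid: int, mode: int) -> bool:
    """Simplified POSIX write check.

    Ignores capabilities, ACLs and supplementary groups; uid 0 is simply
    treated as allowed, which is how root behaves for ordinary files.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if gid == owner_gid:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical host file owned by root:root with mode 0644, exposed via a shared volume.
HOST_FILE = {"owner_uid": 0, "owner_gid": 0, "mode": 0o644}

# Without user namespaces, container root is uid 0 on the host.
print(can_write(0, 0, **HOST_FILE))  # True

# With user namespaces, container root is a regular host uid (gid assumed to match).
print(can_write(427680, 427680, **HOST_FILE))  # False
```

The same file becomes read-only to container root once its uid is remapped, because the remapped uid falls into the "other" permission class.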

The use of user namespaces in Docker is a daemon-wide setting, which is disabled by default. To enable it, the daemon must be started with the `--userns-remap` flag. The flag's parameter names the user (and optionally the group) whose subordinate uid/gid ranges will be used for the mapping. All containers run with the same mapping range, taken from /etc/subuid and /etc/subgid. Let's see what it looks like.

Let's say we have several containers running. Without user namespaces enabled, the containerized processes' user and group IDs are exactly the same as if they were running on the host itself.

Now let's see what it looks like when user namespaces are enabled. Let's say the daemon is running with the `--userns-remap=default` option. This default setting means that a user called dockremap, defined on the host, is used for the uid/gid mapping. Docker creates this user if it does not already exist.

The mapping range for this user is defined in the /etc/subuid file.
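Each line of /etc/subuid (and of /etc/subgid for groups) has the form `name:start:count`. This grants that user `count` subordinate IDs, beginning at `start`. The Python sketch below parses that format. The sample contents are hypothetical. The start value for dockremap, the user that Docker's `default` setting relies on, was chosen to match the 427680 seen in this post, and a count of 65536 is assumed.

```python
from typing import Dict, List, Tuple


def parse_subid(text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Parse /etc/subuid or /etc/subgid content ("name:start:count" per line)."""
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, start, count = line.split(":")
        ranges.setdefault(name, []).append((int(start), int(count)))
    return ranges


# Hypothetical file contents; a user may own more than one range.
SAMPLE_SUBUID = """\
alice:100000:65536
dockremap:427680:65536
"""

print(parse_subid(SAMPLE_SUBUID)["dockremap"])  # [(427680, 65536)]
```

On a real host, you would pass it the file's contents instead, for example `parse_subid(open("/etc/subuid").read())`.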

Now the containerized processes' user and group IDs are mapped into dockremap's subordinate ranges.

All containers running as root (uid 0) inside Docker are mapped to uid 427680 on the host, because container uid 0 maps to the first ID of dockremap's subordinate range.
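For a single subordinate range, the arithmetic behind that mapping is simple. Container ID `n` becomes `start + n` on the host, for `n` in `[0, count)`. Here is a minimal Python sketch. It assumes one range whose start is the 427680 from this example and whose count is 65536. The count is an assumption.

```python
def to_host_id(container_id: int, start: int, count: int) -> int:
    """Map an ID inside the container to a host ID using one subordinate range."""
    if not 0 <= container_id < count:
        raise ValueError(f"ID {container_id} is outside the mapped range [0, {count})")
    return start + container_id


START, COUNT = 427680, 65536  # start taken from this example; count assumed

print(to_host_id(0, START, COUNT))     # 427680 (container root)
print(to_host_id(1000, START, COUNT))  # 428680 (an ordinary container user)
```

IDs outside the range have no mapping on the host, which is why the sketch raises an error instead of guessing.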



What does this mean from a security perspective?

From a security standpoint, user namespaces primarily make it harder to escalate privileges through the file system.

But there is an important additional benefit. The nproc limit (the RLIMIT_NPROC ulimit) can now be enforced on containerized applications. This limit applies per user, but it does not apply to root. Before, applying nproc to a containerized application had no real effect, because its processes ran with the root UID. With user namespaces enabled, containerized applications actually run as a non-root user on the host, so the nproc setting works as expected.
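The difference can be shown with a toy model of the per-user process limit check that happens at fork time. This Python version is a simplification. The kernel counts processes per user and exempts root, along with processes holding CAP_SYS_RESOURCE or CAP_SYS_ADMIN, and the model reduces that to "uid 0 is exempt". The limit of 3 and the process lists are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def fork_allowed(host_uid: int, running_uids: Iterable[int], nproc_limit: int) -> bool:
    """Toy model of the RLIMIT_NPROC check: may host_uid start one more process?"""
    if host_uid == 0:
        return True  # root is exempt, so the limit has no effect
    return Counter(running_uids)[host_uid] < nproc_limit


LIMIT = 3

# Container root without user namespaces: host uid 0, limit ignored.
print(fork_allowed(0, [0] * 10, LIMIT))  # True

# Container root with user namespaces: a regular host uid, limit enforced.
print(fork_allowed(427680, [427680] * 3, LIMIT))  # False
print(fork_allowed(427680, [427680] * 2, LIMIT))  # True
```

The limit is counted per host user, and all containers share one mapping range. As a result, processes from different containers that run as the same container uid count toward the same limit.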

While user namespace support is an important feature, its limitations will make adoption difficult in many cases. There is no support for `--read-only` container file systems. Containers cannot share the host's network or process namespaces (`--net=host`, `--pid=host`), and privileged containers cannot be run. These are all quite limiting. It's also important to understand that user namespaces on their own do not provide full isolation for containers. Containers still run on a shared kernel and can still exploit the network and other OS resources.