I get asked all the time about the different security measures used to control what container processes on a system can do. Most of these I covered in previous articles on Opensource.com.

Almost all of those articles are about controlling what a privileged process on a system can do in a container. For example: How do I run root in a container and not allow it to break out?

User namespaces are all about running privileged processes in containers, such that if they break out, they are no longer privileged outside the container. For example, in the container my UID is 0 (zero), but outside of the container my UID is 5000. Due to the lack of file system support, user namespaces have not been the panacea they were cracked up to be. Until now.
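You can see this mapping directly in /proc. Each line of /proc/self/uid_map lists the UID inside the namespace, the corresponding UID outside it, and the length of the mapped range; the 0-to-5000 example above would show up as a `0 5000 1` line, while a process on the host sees the identity mapping:

```shell
# Show the UID mapping for the current process. Each line is:
#   <UID inside the namespace> <UID outside> <length of mapped range>
# In a user-namespaced container mapping UID 0 to 5000 this would read
# "0 5000 1"; on the host it is the identity map "0 0 4294967295".
cat /proc/self/uid_map
```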

OpenShift is Red Hat's container platform, built on Kubernetes, Red Hat Enterprise Linux, and OCI containers, and it has a great security feature: by default, no container is allowed to run as root. An admin can override this; otherwise, all user containers run without ever being root. This is particularly important in multi-tenant OpenShift Kubernetes clusters, where a single cluster may be serving multiple applications and multiple development teams. It is not always practical, or even advisable, for administrators to run a separate cluster for each. Sadly, one of the biggest complaints about OpenShift is that users cannot easily run all of the community container images available at docker.io, because the vast majority of container images in the world today require root.

Why do all these images require root?

If you actually examine the reasons for being root on a system, they are quite limited.

Modify the host system:

One major reason for being root on the system is to change the default settings on the system, like modifying the kernel's configuration.

In Fedora, CentOS, and Red Hat Enterprise Linux, we have the concept of system containers, which are privileged containers that can be installed on a system using the atomic command. They can run fully privileged and are allowed to modify the system as well as the kernel. In the case of system containers, we are using the container image as a content-delivery system, not really looking for container separation. System containers are more for the core operating system host services, as opposed to the user app services that most containers run.

In application containers, we almost never want the processes inside the container to modify the kernel. This is definitely not required by default.

Unix/Linux tradition:

Operating system software vendors and developers have known for a long time that running processes as root is dangerous, so the kernel added lots of Linux capabilities to allow a process to start as root and then drop privileges as quickly as possible. Most of the UID/GID capabilities allow a process like a web service to start as root and then become non-root. This is done to bind to ports below 1024 (more on this later).
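You can watch those capability sets directly in /proc. Every Linux process carries capability bitmasks, and a daemon that started as root and then dropped privileges ends up with (mostly) zeroed effective capabilities, while a full-root process shows large values:

```shell
# Print the capability bitmasks of the current process. A full-root
# process shows large CapEff/CapPrm values; an unprivileged process,
# or a daemon that has dropped privileges, shows all zeros (or close).
grep '^Cap' /proc/self/status
```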

Container runtimes can start applications as non-root to begin with. Truth be known, so can systemd, but most software that has been built over the past 20 years assumes it is starting as root and dropping privileges.

Bind to ports < 1024

Way back in the 1960s and 1970s when there were few computers, the inability of unprivileged users to bind to network ports < 1024 was considered a security feature. Because only an admin could do this, you could trust the applications listening on these ports. Ports > 1024 could be bound by any user on the system so they could not be trusted. The security benefits of this are largely gone, but we still live with this restriction.

The biggest problem with this restriction involves web services, where people love to have their web servers listening on port 80. This means the main reason Apache or Nginx starts out running as root is so that it can bind to port 80 and then become non-root.

Container runtimes, using port forwarding, can solve this problem. You can set up a container to listen on any network port, and then have the container runtime map that port to port 80 on the host.

In the following command, the podman runtime runs an apache_unpriv container on your machine listening on port 80 on the host, while the Apache process inside the container is never root: it starts as the apache user and listens on port 8080.

podman run -d -p 80:8080 -u apache apache_unpriv

Alternatively:

docker run -d -p 80:8080 -u apache apache_unpriv

Installing software into a container image

When Docker introduced building containers with docker build, the content in the containers was just standard packaged software for distributions. The software usually came via RPM packages or Debian packages. Well, distributions package software to be installed by root. A package expects to be able to do things like manipulate the /etc/passwd file by adding users, and to put down content on the file system with different UIDs/GIDs. A lot of the software also expects to be started via the init system (systemd), to start as root, and then to drop privileges after it starts.

Sadly, five years into the container revolution, this is still the status quo. A few years ago, I attempted to get the httpd package to know when it is being installed by a non-root user and to have a different configuration. But I dropped the ball on this. We need to have packagers and package management systems start to understand the difference, and then we could make nice containers that run without requiring root.

One thing we could do now to fix this issue is to separate the build systems from the installing systems. One of my issues with #nobigfatdaemons is that the Docker daemon uses the same privileges for building a container image as it does for running a container.

If we change the system, or use different tools, say Buildah for building a container with looser constraints and CRI-O/Kubernetes/OpenShift for running the containers in production, then we can build with elevated privileges but run the containers with much tighter constraints, or hopefully as a non-root user.
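As a sketch of that split (assuming Buildah and Podman are installed; the image and user names here are illustrative):

```shell
# Build stage: runs with elevated privileges so package installs work.
ctr=$(buildah from registry.fedoraproject.org/fedora)
buildah run "$ctr" -- dnf install -y httpd

# Record in the image that it should run as the unprivileged apache
# user, listening on an unprivileged port.
buildah config --user apache --port 8080 "$ctr"
buildah commit "$ctr" apache_unpriv

# Run stage: a different, more constrained tool runs the result without
# root, with port 80 on the host forwarded to 8080 in the container.
podman run -d -p 80:8080 apache_unpriv
```

The point is that the privileges needed to assemble the image never leak into production, where the container runs locked down as a non-root user.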

Bottom line

Almost all software you are running in your containers does not require root. Your web applications, databases, load balancers, number crunchers, etc., never need to run as root. When we get people to start building container images that do not require root at all, and others to base their images off of non-privileged container images, we will see a giant leap in container security.

There is still a lot of educating to be done around running root inside of containers. Without that education, even smart administrators can make bad decisions.