KubeCon + CloudNativeCon sponsored this post, in anticipation of KubeCon + CloudNativeCon EU, in Amsterdam.

Sascha Grunert is a senior software engineer at SUSE, where he works on many different container-related open source projects like Kubernetes and CRI-O. He joined the open source community in November 2018, having gained container experience before joining SUSE. Sascha's passions include contributing to open source, as well as giving talks and evangelizing Kubernetes-related technologies.

Containers have evolved from the first namespace implementations in the Linux kernel in 2002 to serving as the underlying structure for full-featured cloud native applications inside cluster orchestration systems, such as Kubernetes. Many different, independently maintained projects are involved in spinning up a single container-based workload in Kubernetes. This drastically increases the attack surface of a simple application and its infrastructure when it is deployed on top of Kubernetes.

But what happens if you encounter a Common Vulnerabilities and Exposures (CVE) entry in one of the cluster components? To understand the impact, we also have to understand how the vulnerable component is interconnected with the rest of the stack and which interfaces it shares with other components. This is not easy to do, and organizations must find a way to handle software security in an economic manner. It’s not only about finding good people doing DevOps; it’s more important to fully support a DevSecOps strategy, where software engineers have the skill set to work on the full stack from conception to operations.

In this post, we describe the different layers involved in cloud native and containerized software security.

The Linux Kernel

Containers start at the Linux kernel, which isolates resources into dedicated namespaces. This is exactly where the first level of exploitation can happen: the namespace resources themselves are a possible attack vector. There are already known vulnerabilities related to namespaces, for example in connection with privilege escalation inside the user namespace. A generally good approach is therefore to keep the kernel up to date. Nevertheless, kernel-based vulnerabilities do not appear in the wild very often, which is an overall good sign.

The latest user namespace-related vulnerability is CVE-2018-18955, which exploits a bug in kernel-to-namespace ID mapping: users who have the Linux capability CAP_SYS_ADMIN in an affected user namespace can bypass access controls on resources outside the namespace. These so-called capabilities are one of the first Linux kernel features we have to deal with when restricting security accesses inside containers.

Capabilities add an additional layer of control to superuser permissions by splitting the privileges of the root user (user and group ID 0) into distinct units. When running software on Linux systems, it is recommended to run the binary with the smallest possible set of privileged capabilities, which leaves it unprivileged with respect to every other feature.
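With Docker, for example, the capability set can be restricted when the container starts. This is a minimal sketch; the image name `web-server:latest` is a placeholder:

```shell
# Drop every capability, then add back only the one this workload
# needs (binding to a port below 1024).
# "web-server:latest" is a placeholder image name.
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE web-server:latest
```

Starting from an empty capability set and adding back single units is generally safer than dropping individual capabilities from the default set.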

The list of available capabilities is long. For example, the capability CAP_SYS_ADMIN gates access to system calls like unshare(2) and clone(2) when creating most namespace types; on kernels newer than Linux 3.8, however, creating a user namespace with these calls does not require any capability at all. This means that software developers also have to take the target platform into account when developing software for it, which makes things more complicated.

The fact that we run our application in different kernel namespaces does not allow us to skip thinking about the exact set of permissions the application needs. To achieve an even higher level of security, we can also lock the application down into a better-suited container image.

Container Images

Besides the actual application running inside the container image, the runtime dependencies can introduce security issues as well. As a first principle, it is important not to add unnecessary tools or build-time dependencies to the container image. It is always worth specifying a minimal base image, and it pays to take a closer look at common base images like node as well. Most of them rely on distributions like Debian or Ubuntu, which include tools and libraries we probably do not need at all and which broaden the attack surface of the deployment.
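One common way to keep the final image minimal is a multi-stage build, where the build toolchain never ends up in the runtime image. This sketch assumes a statically linked Go application:

```dockerfile
# Build stage: full toolchain, never shipped to production
FROM golang:1.14 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Runtime stage: no shell, no package manager, only the binary
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The resulting image contains nothing but the application itself, which minimizes both its size and its attack surface.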

If we build a container image from scratch, we might run into issues when it comes to debugging in production. How do we debug a running container that writes files to maintain its state? Usually, this is the right time to utilize the higher abstraction level of Kubernetes and rely on external monitoring and logging facades like Prometheus, Kibana, Grafana or Loki.

It is also essential to never leak private secrets into container images, which can happen easily during the image build process. Temporarily exposing a secret as an environment variable still leaves it visible in the image history. To avoid this, either use multi-stage builds or the secret-mount feature of the container build tool. In a Continuous Integration and Deployment (CI/CD) pipeline, it might be better to rely on previous build steps that provide the secret file locally and copy it into the build context.
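With BuildKit's secret mounts, for example, a secret can be made available to a single build step without ever being written into a layer. This is a sketch; the secret id `npmrc` is a placeholder:

```dockerfile
# syntax=docker/dockerfile:experimental
FROM node:12-alpine
WORKDIR /src
COPY . .
# The secret is mounted only for this RUN step; it does not end up
# in any image layer or in the image history.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm install
```

The build would then be invoked with something like `DOCKER_BUILDKIT=1 docker build --secret id=npmrc,src=$HOME/.npmrc .`, keeping the secret on the build host only.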

Signing container images is important for mitigating man-in-the-middle (MITM) attacks. As seen in a previous blog post, it is easily possible to hook into a Docker build process and modify the content during the build. Having a single source of trust and verifying it by signing the images is an advantage we really should take into consideration.
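As one concrete option, Docker Content Trust signs images on push and verifies signatures on pull via Notary. Registry and image names below are placeholders:

```shell
# Enable Docker Content Trust for this shell session.
export DOCKER_CONTENT_TRUST=1

# Pushing now signs the tag; pulling verifies the signature and
# fails if no trust data exists for the image.
docker push registry.example.com/team/app:1.0.0
docker pull registry.example.com/team/app:1.0.0
```

This gives consumers a cryptographic guarantee that the image they pull is the one the publisher pushed.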

Analogous to image signing, container image encryption can add an additional level of security as well. With it, a locally stored key is required to decrypt the layers at the container runtime level. Three encryption technologies are common right now: OpenPGP, JSON Web Encryption (JWE) and PKCS#7.
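Tools from the containers/image ecosystem, such as skopeo, implement this scheme. Assuming a JWE key pair, encrypting an image while copying it might look like the following sketch (registry names and key paths are placeholders):

```shell
# Encrypt all layers with the recipient's public key while copying.
skopeo copy --encryption-key jwe:./public.pem \
    docker://registry.example.com/app:1.0.0 \
    docker://registry.example.com/app:1.0.0-enc

# Consuming the image again requires the matching private key.
skopeo copy --decryption-key ./private.pem \
    docker://registry.example.com/app:1.0.0-enc \
    containers-storage:app:1.0.0
```

Without the private key, the pulled layers remain opaque blobs and cannot be unpacked by the runtime.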

Container Runtimes

Container runtimes generally increase the security-related attack surface by adding possibly vulnerable source code on top of the overall stack. The current de facto standard low-level container runtime is runc, which is used by Podman, CRI-O, containerd and Docker. In terms of Kubernetes, the container runtimes CRI-O and containerd support any OCI-compatible low-level runtime, such as runc. We have to distinguish the level of security depending on the underlying container runtime in use: a possible vulnerability in runc has a much higher impact than one in containerd because of its broader usage scope. Additional software, such as Kata Containers, provides a higher level of security by isolating the workloads in a micro VM. This boosts application security but also shifts the vulnerable attack surface to the hypervisor and the Kata runtime itself.

Preventing container runtime issues is not possible in every case, but container-based workloads can be hardened with additional patterns, such as applying Secure Computing (seccomp) profiles.

Seccomp provides an enhanced way to filter the system calls issued by a program in order to reduce the kernel's attack surface. It is especially useful when running untrusted third-party programs. By restricting which system calls can be made, seccomp is a great building block for modern application sandboxes.

For containers, runtimes supporting seccomp can pass a seccomp profile to a container, which is basically a JSON allowlist of permitted system calls. All other system calls are denied by default. Most container runtimes ship a default seccomp profile with their packages as well.
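A minimal profile sketch could look like the following; the allowlist here is illustrative only and far too small for any real application:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "close", "exit", "exit_group", "futex", "nanosleep"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With Docker, such a profile would be passed at startup via `docker run --security-opt seccomp=profile.json …`; every system call outside the allowlist then fails with an error instead of reaching the kernel.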

In the same manner as for Linux capabilities, it is valuable to invest time in seccomp filters for applications and lock them down to the minimal subset of required system calls. Knowing which system calls are necessary for the running application also enables software developers to maintain a good understanding of its security requirements.

Even more security-related control over applications can be achieved via SELinux and AppArmor. Both projects aim to make the possible set of permissions an application holds more granular, for example with respect to file or network access. Because of their shared scope, distributions usually decide whether to go with SELinux or AppArmor. Neither can be called the better solution in general; they simply originate from different historical backgrounds.
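With Docker on an AppArmor-enabled host, for example, a container can be confined to a custom profile. The profile name and file path here are placeholders, and the profile must already be loaded on the host:

```shell
# Load (or reload) the profile into the kernel on the host.
sudo apparmor_parser -r ./my-restricted-profile.conf

# Start a container confined by that profile.
docker run --security-opt apparmor=my-restricted-profile alpine:3.11 sh
```

If no profile is specified, Docker applies its default `docker-default` AppArmor profile on supported hosts.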

Kubernetes

The next layer of security-related mechanisms resides in the hands of the container orchestrator, which is probably Kubernetes. Kubernetes adoption in the market is tremendous, and people are wondering these days how secure a Kubernetes installation really is. We will not cover Kubernetes security in detail here, because it is worth a dedicated blog post. What we can say is that securing the cluster components of Kubernetes is only one essential part of running workloads in production.

Kubernetes also provides good mechanisms to secure running workloads. Storing sensitive data in Secrets is just one of them. Another great example is the usage of Pod Security Policies (PSPs), which enable fine-grained authorization of pod creation and updates.
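A restrictive PSP sketch might look like this; it is an illustrative example rather than a drop-in policy:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities: ["ALL"]
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir", "persistentVolumeClaim"]
```

Note that a PSP only takes effect once the PodSecurityPolicy admission controller is enabled and the pod's service account is authorized via RBAC to use the policy.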

Our Application

The application code we write is the uppermost level at which security vulnerabilities can be encountered, regardless of whether we run it inside Kubernetes or not. Cloud native applications (which, of course, run inside Kubernetes clusters) need a deeper security audit because of their broader possible attack surface. The positive side effect is that this part of the overall vulnerability stack gives us the most control, and we can build up good security awareness around it.

It is in our hands, and it is our responsibility, to write secure applications that do not harm the privacy we all need.

To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon EU, in Amsterdam.

Feature image via Pixabay.