Role-based access control (RBAC) is one important tool in the larger security toolkit for Kubernetes and indeed for any system security. This much we can probably all agree on. Where we might part ways, though, is when we come face to face with the reality of implementation. Because Kubernetes roles are not assigned by default to service accounts (in other words, access to the API server is locked down by default), and because service accounts are fundamental to cluster management, it’s tempting to effectively work around RBAC altogether by granting cluster-wide access to all resources in an effort to just get things up and running. This is bad practice. But defining appropriate roles can seem like a daunting proposition.


In this post, we show you how we went about defining roles for some of the tools that we have running on our cluster. If you’re not familiar with RBAC, we encourage you to read the core Kubernetes documentation, which does a good job of explaining the implementation details of RBAC and providing some sample configurations for the required child objects. But because the samples are theoretical constructs, they don’t necessarily provide an easy place to start when writing RBAC configurations for your own resources. The real-world examples here, drawn from our team’s experience, can help you tackle RBAC for yourselves.

First principles

Kubernetes creates and assigns a number of system roles automatically when you spin up your cluster; this post focuses on the roles that you need to plan and create for your application deployments. But before we get to that, let’s go over a brief summary of how RBAC is handled in Kubernetes. If you’re familiar with the basics, feel free to skip down to What we did.

RBAC in Kubernetes is not fundamentally different from roles and permissions for any other part of your application or deployment stack. You create roles based on the different kinds of access your users and applications need to different resources, and then assign only the required permissions for appropriate access to those roles. In other words, assign the minimum permission required for a user or a service to perform a task, and no more — the practice known as the principle of least privilege.

Restricting access to only specified users who must perform specified actions on a resource is critical to securing your cluster. Even if you install only trusted applications, those applications themselves can be vulnerable to attack. For example, by exploiting a remote code execution vulnerability, an attacker could access your cluster using your application’s credentials. And even without remote code execution, Kubernetes service accounts are vulnerable to local file disclosure: the service account’s token is mounted into the pod as a file, so an attacker with nothing more than read-only access to that file can steal the JSON Web Token (JWT) and use it to authenticate to the API server. Well-defined RBAC roles limit the damage these attacks can do by allowing access only to a small, boring set of data.

Let’s move on to a brief summary of how Kubernetes lets you define and apply these roles.

The Kubernetes objects

Kubernetes defines four RBAC-related objects that can be combined in different ways to provide different layers of access control, either to your entire cluster, or to specified namespaces in the cluster.

Role, ClusterRole

As their names suggest, these objects define the kinds of access you assign to roles. A Role is scoped to a single namespace; a ClusterRole can be used to grant access across all namespaces in a cluster, to resources such as nodes that are scoped to the cluster (and therefore are not namespaced), or to non-resource endpoints such as /healthz.
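As a minimal sketch of the difference in scope, the two objects below grant read access at the namespace level and at the cluster level. The names `pod-reader` and `node-and-health-reader` and the namespace `my-app` are hypothetical, chosen only for illustration.

```yaml
# Role: applies only inside the (hypothetical) my-app namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-app
rules:
- apiGroups: [""]            # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# ClusterRole: not namespaced, so it can cover cluster-scoped
# resources (nodes) and non-resource endpoints (/healthz).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-and-health-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]
- nonResourceURLs: ["/healthz"]
  verbs: ["get"]
```

Neither object grants anything on its own; access takes effect only once a binding attaches the role to a subject.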

RoleBinding, ClusterRoleBinding

You can assign either a Role or a ClusterRole with a RoleBinding. These bindings specify the user and the role that’s granted to the user. (Note that a binding can list several “users” at once, and that a “user” in this sense can be a user, a group, or a service account.) A RoleBinding grants access only within its own namespace, even when it references a ClusterRole, while a ClusterRoleBinding grants access across the entire cluster.
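To illustrate, here is a sketch of a RoleBinding that references a ClusterRole (the built-in `view` role) but grants it only inside one namespace, to three kinds of subject at once. The subject names `jane` and `dev-team` and the namespace `my-app` are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-my-app
  namespace: my-app          # access is limited to this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole          # a RoleBinding may reference a ClusterRole
  name: view
subjects:
- kind: User                 # hypothetical user
  name: jane
  apiGroup: rbac.authorization.k8s.io
- kind: Group                # hypothetical group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount       # the namespace's default service account
  name: default
  namespace: my-app
```

Binding the same ClusterRole with a ClusterRoleBinding instead would grant read access in every namespace, which is rarely what an application needs.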

Where things can get confusing

The potential catch, however, is that a default installation of Kubernetes does not assign any user-facing roles to any Kubernetes object, nor does it assign any roles to the service accounts in either of the namespaces installed by default (namely, kube-system and default). If you enable RBAC (enabled by default with kubeadm as of Kubernetes v1.6), this means that out of the box, if you don’t specify roles and rules and bindings, there’s effectively no admin access to manage your cluster.

This matters especially if you think you need to specify the cluster-admin role. This role does just what it sounds like: it allows full admin access to all resources in the cluster, namespaced or non-namespaced. It’s tempting to think “well, I’ll just assign this role for now while I get things up and running,” to get resources talking to each other as expected. But if you start this way, you run the risk of never getting access control properly defined, and leaving too many resources more accessible than they should be. You might not need excruciatingly fine-grained control over every resource in your entire cluster. But you certainly do need to put some thought and effort into restricting access to resources appropriately.
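For reference, the shortcut usually looks something like the sketch below, shown here so that you can recognize it in a manifest or chart and remove it. The binding name `temporary-admin` is hypothetical.

```yaml
# ANTI-PATTERN: shown for recognition only, do not apply.
# Every pod in the default namespace that uses the default
# service account gets full control of the whole cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: temporary-admin      # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
```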

What we did

Unsurprisingly, we work a lot on careful management of our internal cluster. Because we’re all about Kubernetes best practices, we’ve had to pay particular attention to RBAC rules. Let’s look at two of the apps that we’re using: Fluentd, as part of our monitoring stack, and Jenkins, for CI/CD.

In both cases, notice that we do not specify any of the four pre-defined user roles that Kubernetes ships with. (For more information about these roles, see the core documentation.) Instead, we define only the specific verbs that we want roles to be able to perform on the specified resources. This approach lets us implement the principle of least privilege precisely.

Fluentd

We pull the Fluentd container from the Quay registry, which is agnostic when it comes to deployment platforms, so we can define RBAC objects out of the box as part of our Fluentd DaemonSet. Defining these objects is pretty straightforward: we want cluster-wide logging for our entire system, so we define a ClusterRole and a ClusterRoleBinding, and we assign them to a custom ServiceAccount in the kube-system namespace. Fluentd needs access only to pods (and their logs), and it needs only read access. We’ve chosen to list the API verbs explicitly (get, list, and watch) instead of using the default view ClusterRole to make sure that Fluentd can perform exactly the operations that it needs.

    apiVersion: v1
    items:
    # ServiceAccount that the Fluentd DaemonSet pods run as
    - apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: fluentd
        namespace: kube-system
    # Read-only access to pods in every namespace.
    # (rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; use v1.)
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: fluentd
      rules:
      - apiGroups:
        - ''
        resources:
        - pods
        verbs:
        - get
        - list
        - watch
    # Grant the ClusterRole to the ServiceAccount, cluster-wide
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: fluentd
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: fluentd
      subjects:
      - kind: ServiceAccount
        name: fluentd
        namespace: kube-system
    kind: List
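These objects take effect only if the Fluentd pods actually run as the `fluentd` ServiceAccount. A minimal sketch of the relevant part of a DaemonSet follows; the `app: fluentd` label and the image reference are placeholders, not the values from our deployment, and a real DaemonSet would also need volume mounts and configuration that are omitted here.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd             # placeholder label
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # Without this line the pods run as the namespace's default
      # service account, which the ClusterRoleBinding does not cover.
      serviceAccountName: fluentd
      containers:
      - name: fluentd
        image: "quay.io/EXAMPLE/fluentd:TAG"   # placeholder, replace with the image you pull
```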

Jenkins

Jenkins, on the other hand, we install using Helm, a package manager for Kubernetes. To date, there’s not a standard for RBAC objects in Helm charts, so each chart manages RBAC in its own way. Your own RBAC rules should conform to your policies, not to the default values for rules in a chart.

The Jenkins chart is an excellent example of why you should not trust these default values. This chart installs Jenkins with a ClusterRole and ClusterRoleBinding that assign the `cluster-admin` role. This is much too permissive. So we wrote our own Role and RoleBinding for our Jenkins instance. We don’t need to give Jenkins access to any cluster-wide resources, but we do create a Jenkins-specific namespace to isolate its resources. Note that because we’ve given Jenkins its own namespace, we can use the default ServiceAccount for the namespace; we don’t need to create a custom ServiceAccount in addition to the custom namespace. An alternative, depending on the rest of your deployment and your team structure, is to keep applications in the same namespace, but create a custom ServiceAccount for each application.

    apiVersion: v1
    items:
    # Namespaced Role: Jenkins can manage pods and read their logs,
    # but only inside the jenkins namespace.
    # (rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; use v1.)
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: jenkins
        namespace: jenkins
      rules:
      - apiGroups:
        - ''
        resources:
        - pods
        verbs:
        - get
        - list
        - create
        - delete
      - apiGroups:
        - ''
        resources:
        - pods/log
        verbs:
        - get
    # Grant the Role to the default ServiceAccount of the jenkins namespace
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: jenkins
        namespace: jenkins
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: jenkins
      subjects:
      - kind: ServiceAccount
        name: default
        namespace: jenkins
    kind: List
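The alternative mentioned above, a shared namespace with one ServiceAccount per application, could look roughly like this. It is a sketch rather than something we run: the shared namespace name `tools` is hypothetical, and it assumes a Role named `jenkins` with the same rules has been created in that namespace.

```yaml
# Dedicated identity for Jenkins inside a shared namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins
  namespace: tools             # hypothetical shared namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins
  namespace: tools
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins                # assumed to exist in the tools namespace
subjects:
- kind: ServiceAccount
  name: jenkins                # only Jenkins pods use this account
  namespace: tools
```

The trade-off is that the Role's pod permissions now apply to every pod in the shared namespace, so Jenkins could delete pods that belong to other applications there. A dedicated namespace avoids that.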

How we did it

How did we arrive at these configurations? TL;DR: painstakingly. There aren’t really shortcuts to doing RBAC right, although some apps require more attention than others. But our basic approach is the same for everything: we start by creating the ServiceAccount. Then we create the Role (or ClusterRole) and RoleBinding (or ClusterRoleBinding). And only then do we start adding permissions. One at a time. In other words, we begin with the most restrictive scenario (basically, no access), and only gradually add levels of access. It’s an iterative process, and it’s the only way to be sure you’re implementing the principle of least privilege.
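As a rough illustration of that loop, here is how a Role might look after two iterations. The application name and namespace `my-app` are hypothetical, and the comments describe the kind of denial that would justify each rule.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app
  namespace: my-app
rules:
# Iteration 0: the Role started out as "rules: []" (no access at all).
# Iteration 1: the app was denied when listing pods, so add exactly that.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list"]
# Iteration 2: the app was denied when reading a pod's log.
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
# Not added: create, delete, secrets, and so on.
# Nothing has needed them yet, so they stay forbidden.
```

Between iterations, one way to check the effective permissions without redeploying the application is `kubectl auth can-i list pods --as=system:serviceaccount:my-app:default -n my-app`, which answers yes or no for the impersonated service account.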

The API server audit logs are your friend here. (See the core docs on Auditing.) Every time a user can’t hit a resource, check the logs to make sure the issue is related to permissions. When you start out, you should be getting HTTP 403 Forbidden response codes frequently. (Note that HTTP 401 Unauthorized indicates an issue with user authentication, not RBAC.) As you open up role definitions, you should see fewer 403 response codes. Make sure to continue to test beyond the limits of your planned restrictions, too: requests that fall outside the access you intend to grant should still return 403.

You’ll also need to continue to check the permissions themselves whenever one of your users has trouble with authorization. Don’t assume immediately that you need to open up your role definitions; instead, check first whether you have in fact assigned the appropriate role to the user. If you assign roles to groups, make sure that the user is properly part of the group. Check whether other users have the same authorization issue. Once you’ve established a good baseline set of roles, start tweaking role definitions again only if the problem is reproducible and you know you’ve assigned the appropriate roles.
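Audit logging has to be configured before it produces anything useful. The sketch below is one possible audit policy, assuming you want to trace only the Jenkins service account from the example above; it would be passed to the API server with the `--audit-policy-file` flag.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Record request metadata (who, which verb, which resource) for
# every request made by the Jenkins service account.
- level: Metadata
  users: ["system:serviceaccount:jenkins:default"]
# Do not log anything else. Rules are evaluated in order and
# the first match wins, so this catch-all must come last.
- level: None
```

Logging at the `Metadata` level keeps request and response bodies out of the log, which is usually enough to see which requests were forbidden.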

Caveats and next steps

There’s a lot more to the security story than well-managed roles for RBAC. In the realm of access control, you should also pay careful attention to admission control plug-ins, which are not enabled by default. In the case of our Jenkins configuration, for example, we recommend adding the PodNodeSelector plug-in. And you should also look at PodSecurityPolicy and this example that shows you how to work with a PodSecurityPolicy and RBAC rules.
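As a sketch of what the PodNodeSelector recommendation involves: once the plug-in is enabled on the API server, a namespace annotation restricts which nodes the pods in that namespace can be scheduled on. The node label `role=ci` is hypothetical; it assumes you have labeled a set of nodes for build workloads.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: jenkins
  annotations:
    # Read by the PodNodeSelector admission plug-in: every pod created
    # in this namespace gets this node selector merged into its spec.
    scheduler.alpha.kubernetes.io/node-selector: "role=ci"
```

Combined with the namespaced Role above, this confines Jenkins both in what it can ask the API server to do and in where the resulting pods can run.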

Thanks to Matt Moyer, Timothy St. Clair, and Chuck Ha