In a best-practice Kubernetes cluster, every request to the Kubernetes API server is authenticated and authorized. Authorization is usually implemented by the RBAC authorization module, but there are alternatives. This blog post explains how to implement advanced authorization policies via Open Policy Agent (OPA) by leveraging the Webhook authorization module. Another blog post covers how to fine-tune this solution for production (Optimizing Open Policy Agent-based Kubernetes Authorization via Go Execution Tracer).

Motivation

We are a team providing managed Kubernetes clusters to our company-internal customers. To provide a near upstream Kubernetes experience, we want to grant our customers cluster-admin-like access. But to ensure baseline security and stability, we don’t want to grant full cluster-admin privileges. For example:

We want to allow full access to any namespace except `kube-system`, because our infrastructure (e.g. monitoring & logging) is deployed there.

We want to enforce a PodSecurityPolicy which doesn’t allow running containers as the `root` user or mounting `hostPath` volumes directly.

Our first implementation was built on Kubernetes RBAC and a custom operator. The basic idea was to grant all necessary rights via RBAC RoleBindings. So we gave our customers the ClusterRole `admin` for every namespace except `kube-system` (via the operator). Every time we found that something wasn’t working as expected, we added additional rights, either via a per-namespace Role or via a ClusterRole. This led to a lot of individual rules for specific use cases and wasn’t maintainable in the long term. Especially as our user base continues to grow, it’s not feasible to adjust the Roles whenever somebody finds an edge case which doesn’t work with our configuration.

So instead of configuring authorization based on a whitelist we switched to a blacklist-based model. What we actually wanted was to give our customers cluster-admin access and only restrict some specific rights. Therefore an implementation based on a blacklist via Open Policy Agent was a natural fit.

Whitelist vs. Blacklist-based Authorization

Most requirements regarding authorization can be implemented by simply using the RBAC authorization module via Roles and RoleBindings, which are explained in Using RBAC Authorization. But RBAC is by design limited to whitelisting, i.e. for every request it’s checked whether one of the Roles and RoleBindings applies, and in that case the request is approved. Requests are only denied if there is no match; there is no way to deny requests explicitly. At first this doesn’t sound like a big limitation, but some specific use cases require more flexibility. For example:

A user should be able to create/update/delete pods in all namespaces except `kube-system`. The only way to implement this via RBAC is to assign the rights on a per-namespace basis, e.g. by deploying a ClusterRole and a per-namespace RoleBinding. If the namespaces change over time, you have to either deploy these RoleBindings manually or run an operator for this.

A Kubernetes cluster is provided with pre-installed StorageClasses. A user should be able to create/update/delete custom StorageClasses, but they shouldn’t be able to modify the pre-installed ones. If this were implemented via RBAC, the user would need the right to create StorageClasses, and as soon as they create a StorageClass, additional rights would have to be assigned to update and delete this StorageClass. As above, this could be implemented via an operator.

When you have a lot of these use cases, you’ll end up with a lot of custom logic implemented via operators. Sooner or later this doesn’t scale, because with a lot of operators and accompanying RBAC Roles it gets really hard to understand what rights a user actually has. We will show that both cases can be implemented more easily via Open Policy Agent.
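To make the blacklist idea concrete before moving on, here is a minimal sketch of the two use cases above as explicit deny rules. It is written in Python purely for illustration; with Open Policy Agent the rules would be expressed in OPA's own policy language. The `Request` type, the StorageClass name `standard` and the list of write verbs are assumptions made for this example, not part of our setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Simplified view of an authorization request (hypothetical type)."""
    user: str
    verb: str
    resource: str
    namespace: str = ""
    name: str = ""


# Assumed values for illustration only.
PREINSTALLED_STORAGE_CLASSES = frozenset({"standard"})
WRITE_VERBS = frozenset({"create", "update", "patch", "delete", "deletecollection"})


def deny_reasons(req: Request) -> List[str]:
    """Return one reason per deny rule that matches the request."""
    reasons = []
    # Use case 1: no write access to pods in kube-system.
    if (req.resource == "pods" and req.namespace == "kube-system"
            and req.verb in WRITE_VERBS):
        reasons.append("pods in kube-system must not be modified")
    # Use case 2: pre-installed StorageClasses must not be changed or removed.
    if (req.resource == "storageclasses"
            and req.name in PREINSTALLED_STORAGE_CLASSES
            and req.verb in {"update", "patch", "delete"}):
        reasons.append("pre-installed StorageClasses must not be modified")
    return reasons


def is_allowed(req: Request) -> bool:
    """Blacklist model: everything is allowed unless a deny rule matches."""
    return not deny_reasons(req)
```

Under these assumptions, `is_allowed(Request("alice", "delete", "pods", "kube-system"))` is `False`, while the same request in any other namespace is allowed without a per-namespace RoleBinding or an operator. New namespaces and new custom StorageClasses are covered automatically because nothing has to be granted for them.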

Webhook Authorization Module vs. ValidatingWebhook & MutatingWebhook

Some advanced use cases can also be implemented via Dynamic Admission Control, i.e. ValidatingWebhook or MutatingWebhook. There are also blog posts which dive into how Open Policy Agent can be used for this: Policy Enabled Kubernetes with Open Policy Agent and Kubernetes Compliance with Open Policy Agent. Dynamic Admission Control has the limitation that the webhooks are only called for create, update and delete events on Kubernetes resources, so it’s impossible, for example, to deny get requests. But admission webhooks also have an advantage over the Webhook authorization module: they can deny requests based on the content of a Kubernetes resource. This is information the Webhook authorization module has no access to. For reference, the Webhook authorization module decides based on SubjectAccessReviews, whereas the ValidatingWebhook and MutatingWebhook decide based on AdmissionReviews. In our implementation we’ve integrated OPA via both the Webhook authorization module and a MutatingWebhook.
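The difference between the two review types is easiest to see side by side. The following sketch shows trimmed-down examples of both payloads as Python dictionaries. The field names follow the `authorization.k8s.io/v1` and `admission.k8s.io/v1` APIs; all values (user, namespaces, pod names) are made up, and the helper `uses_host_path` is a hypothetical function for this example only.

```python
# What the Webhook authorization module sees: who wants to do what,
# identified by verb, resource, namespace and name. No resource content.
subject_access_review = {
    "apiVersion": "authorization.k8s.io/v1",
    "kind": "SubjectAccessReview",
    "spec": {
        "user": "alice",
        "groups": ["customers"],
        "resourceAttributes": {
            "namespace": "kube-system",
            "verb": "get",  # read requests are visible here
            "version": "v1",
            "resource": "pods",
            "name": "monitoring-0",
        },
    },
}

# What an admission webhook sees: the full object being written,
# but only for operations such as CREATE, UPDATE and DELETE.
admission_review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "00000000-0000-0000-0000-000000000000",
        "kind": {"group": "", "version": "v1", "kind": "Pod"},
        "resource": {"group": "", "version": "v1", "resource": "pods"},
        "namespace": "team-a",
        "operation": "CREATE",
        "userInfo": {"username": "alice", "groups": ["customers"]},
        "object": {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": "debug", "namespace": "team-a"},
            "spec": {
                "containers": [{"name": "shell", "image": "busybox"}],
                "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
            },
        },
    },
}


def uses_host_path(review: dict) -> bool:
    """Content-based check that is only possible on an AdmissionReview."""
    pod = review["request"].get("object") or {}
    volumes = pod.get("spec", {}).get("volumes", [])
    return any("hostPath" in volume for volume in volumes)
```

A rule like `uses_host_path` needs the `object` field, which a SubjectAccessReview does not carry. Conversely, the `get` request in the first payload never reaches an admission webhook, so it can only be denied at the authorization stage.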

Architecture

This section shows on a conceptual level how Kubernetes is integrated with Open Policy Agent. Because the Open Policy Agent itself doesn’t implement the REST interface required by Kubernetes, the Kubernetes Policy Controller translates Kubernetes SubjectAccessReviews and AdmissionReviews into Open Policy Agent queries.
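As a rough illustration of that translation step, the sketch below turns a SubjectAccessReview into a request body for OPA's Data API (a `POST` to `/v1/data/<policy path>` with the payload wrapped in `input`) and maps OPA's answer back to a SubjectAccessReview status. This is not the Kubernetes Policy Controller's actual code. The policy path and the convention that the policy returns a list of deny reasons are assumptions for this example.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical policy path; the real one depends on how policies are packaged.
OPA_DENY_PATH = "/v1/data/kubernetes/authorization/deny"

example_sar = {
    "apiVersion": "authorization.k8s.io/v1",
    "kind": "SubjectAccessReview",
    "spec": {
        "user": "alice",
        "resourceAttributes": {
            "namespace": "kube-system",
            "verb": "delete",
            "resource": "pods",
        },
    },
}


def to_opa_query(sar: Dict[str, Any]) -> Tuple[str, str]:
    """Build the URL path and JSON body for an OPA Data API query."""
    return OPA_DENY_PATH, json.dumps({"input": sar})


def to_sar_response(sar: Dict[str, Any], opa_response: Dict[str, Any]) -> Dict[str, Any]:
    """Map OPA's answer to a SubjectAccessReview status.

    Assumes the policy evaluates to a list of deny reasons. OPA omits
    "result" when the queried document is undefined, which is treated
    here as "no deny rule matched".
    """
    reasons = opa_response.get("result") or []
    # allowed=False without denied=True means "no opinion", so other
    # configured authorizers (e.g. RBAC) still get a chance to allow.
    status: Dict[str, Any] = {"allowed": False}
    if reasons:
        # denied=True rejects the request regardless of other authorizers.
        status["denied"] = True
        status["reason"] = "; ".join(reasons)
    return {
        "apiVersion": sar["apiVersion"],
        "kind": "SubjectAccessReview",
        "status": status,
    }
```

For example, `to_sar_response(example_sar, {"result": ["pods in kube-system must not be modified"]})` yields a status with `denied` set to `True`, while an empty OPA response yields a status containing only `allowed: False`. Answering "no opinion" instead of "allowed" is one possible design for a blacklist model: the webhook only ever vetoes, and the actual grant comes from another authorizer.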