Container Platform Security at Cruise

Best practices for enterprise-grade Kubernetes security.

Authors: Karl Isenberg & Mike Ruth

This is part two of our ongoing series on the Cruise PaaS:

Stay tuned for more on observability and deployment!

Safety is one of our core values at Cruise. It’s why we challenge our cars to master the complexities of double-parked vehicles in San Francisco. It’s also why security is a top priority in everything we do.

However, security isn’t just a checkbox you mark off on project designs; it’s a series of continual improvements made at multiple layers of the stack. Since security improvements often generate new requirements for existing projects, it’s good to minimize disruption by planning ahead. Because of this, security was one of the first areas we invested in when building out our internal Platform as a Service (PaaS), kickstarting our iteration towards production readiness.

In our previous post, Building a Container Platform at Cruise, we covered how the Cruise PaaS spans multiple Google Kubernetes Engine (GKE) clusters in multiple Google Cloud Platform (GCP) environments and projects, with a bunch of add-ons to increase the functionality of GKE and make it work on our private hybrid-cloud network.

In this post, we’ll cruise through some of the many security domains that intersect with container platforms and explore how we tackled their challenges:

Identity
Authentication
Authorization
Secrets
Encryption

Identity

To better understand how all of the different domains interact with one another, we first need to look at Identity. An identity is the representation of a person or program interacting with a system. Identities always take one of two types, users or services, depending on their use case. Both types of identity include a compound unique identifier and a set of credentials made up of multiple factors.

Here are some example identifiers and credentials:

Having an identity is a prerequisite to securely interacting with a service or system of services, but it is not enough by itself. In order to prove you are who you say you are, your identity needs to be managed and authenticated. Usually, this is done by a separate service, to avoid having to implement the same functionality into every service and to enable auditing and transactions across multiple services.

Identity Management

For identity management, we leverage Okta as our Identity Provider (IdP). Okta enables a Single Sign-On (SSO) experience for users between systems with Multi-Factor Authentication (MFA). Okta isn’t required for GKE or Kubernetes — we could have used another IdP or manually managed users within GCP itself, but Okta provides integration points and management tools that make it easier to secure a wide variety of systems.

Something neither Okta nor the majority of IdPs provide is a universal service identity. Each platform and cloud provider tends to implement its own service account management (if any at all), and as such, we’re forced to either overload a user identity within Okta or use the built-in primitives of GCP and Kubernetes. Cruise has primarily chosen the latter approach for service identity, but occasionally an Okta service identity is needed for the few services that interact with the Okta API directly.

For GCP service identity, we use GCP Service Accounts (GCP SA). GCP service accounts can be granted permissions in GCP through Google Cloud IAM. GKE automatically maps GCP service accounts to user accounts within Kubernetes, allowing PaaS to leverage GCP SAs as unique identities.

Within Kubernetes, we use Kubernetes service accounts for establishing the service identity of workloads running in pods. This allows applications to authenticate to the GKE API server and allows other privileged services to look up which service account belongs to which application (using the TokenReview API). This leads us into our next topic: how does authentication work on Cruise’s PaaS?
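As a rough illustration of the TokenReview lookup mentioned above, here is a minimal Python sketch. It only builds the request body and interprets a response; it does not call a real API server. The helper names (build_token_review, service_account_from_review) are hypothetical and not part of Cruise's tooling, and the sketch assumes the standard authentication.k8s.io/v1 TokenReview shape, where service account usernames look like system:serviceaccount:<namespace>:<name>.

```python
def build_token_review(token: str) -> dict:
    """Body for POST /apis/authentication.k8s.io/v1/tokenreviews."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token},
    }


def service_account_from_review(review: dict):
    """Return (namespace, name) if the reviewed token belongs to an
    authenticated Kubernetes service account, otherwise None."""
    status = review.get("status", {})
    if not status.get("authenticated"):
        return None
    username = status.get("user", {}).get("username", "")
    parts = username.split(":")
    # Service account usernames: system:serviceaccount:<namespace>:<name>
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        return parts[2], parts[3]
    return None
```

A privileged service would send the body from build_token_review to the API server and pass the returned object to service_account_from_review to learn which workload presented the token.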

Authentication

Authentication is the means by which we confirm that an identity is who it claims to be. Together, identifiers and credentials can be used to distinguish a given identity from another and establish non-repudiation: high-confidence authenticity, proof of origin, and proof of integrity.

There are multiple factors that can be used to authenticate identities:

Something you know (knowledge factor)
Something you have (ownership factor)
Something you are (inherence factor; most common with user identities)
Somewhere you are (location factor)

Multi-factor Authentication

For Cruise PaaS, multi-factor authentication (MFA) is achieved through Duo, integrated with Okta’s IdP. Okta is configured to require a password (knowledge factor) and a secure token (ownership factor). The Duo Mobile app can generate a secure token via either push notification or a time-based one-time passcode (TOTP). Additionally, Duo can enforce security profiles on devices that require authentication to access, by passcode, certificate, or biometric scan (fingerprint or facial recognition). Duo can also be configured to track or enforce geolocation (location factor).
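To make the TOTP ownership factor concrete, here is a minimal Python sketch of the standard algorithm (RFC 6238, built on HOTP from RFC 4226). This is the generic published algorithm, not a description of how Duo implements it internally; the 30-second step and 6-digit default are common conventions assumed here.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset from the last nibble
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step count."""
    return hotp(secret, unix_time // step, digits)
```

Both the device and the verifier hold the same shared secret, so each side can compute the code for the current time step independently. Possession of the enrolled device is what makes this an ownership factor.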

For user identities, credentials are often memorized or stored in a password manager, itself accessed by one or more authentication factors. For services and other programmatic workloads, secrets management is a harder problem to solve. Check out the Secrets section to see how we securely manage credentials.

For now, let’s take a closer look at how users and services authenticate within Cruise’s PaaS.

Authentication Protocols

Google has invested heavily in OAuth2, so it may come as no surprise that GCP relies on it for both user and service authentication. For users authenticating to GCP, this means authenticating with a password and a second factor through an associated IdP. Behind the scenes, one of two things happens, depending on whether the user is authenticating manually via a browser or programmatically via GCP’s CLI (gcloud) or API.

Browsers: The browser Single Sign-On (SSO) workflow utilizes the SAML protocol. Provided the user has properly authenticated, the SAML assertion is stored for the remainder of the session (or lifetime of the assertion, whichever comes first). Backend services then transparently validate the user’s session on each interaction using the assertion, rather than requiring the user to sign in on every request.

Programs: The newer OIDC protocol is used for programmatic interactions. The user or service identity logs in with its credentials and Google generates a signed access token for use in subsequent interactions. The OIDC access token is the basis for API and CLI authentication, analogous to the SAML assertion stored in the browser flow. For terminal access, most users use the gcloud CLI, which handles the OIDC authentication flow and caches the access token.

Identity Translation

Once authenticated with the gcloud CLI, GKE users can use it to fetch kubectl credentials, allowing them access to the Cruise PaaS using kubectl, the Kubernetes CLI, provided their identity has the required role bindings. This allows users to only have to manage their GCP credentials, and generate Kubernetes credentials on-demand.

Services can use the same identity translation process, from GCP SA to Kubernetes user. For example, some of our continuous integration and deployment (CI/CD) automation uses GCP SAs to generate kubectl credentials for deployment to Cruise PaaS. This reduces the number of credentials that need to be managed in CI/CD, since it often needs to make other GCP API calls to services like Google Cloud Storage (GCS) or Cloud SQL. GCP SA credentials can even be generated on demand, with a TTL, using the Vault Google Cloud Secrets Engine, providing another layer of identity translation to reduce the number of credentials stored in CI/CD. We’ll talk about Vault a bit more in the upcoming Secrets section.

Workload Identity

Recently, Google introduced GKE Workload Identity, which allows Kubernetes SAs to act as GCP SAs, so that pods can authenticate with GCP. This replaces the legacy pattern of using GCE instance metadata, which would allow every pod on the node to have access to the same GCP SA credentials.

This feature is great for simplicity, but even without GKE, you can get a similar result with the Vault Kubernetes Auth Method. With the Vault Kubernetes Auth Method configured, pods can log into Vault using their Kubernetes SA, and use Vault Secrets Engines to generate credentials for other systems, like GCP.
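As a sketch of what that Vault login looks like from inside a pod, the following Python builds the HTTP request for the Kubernetes auth method's login endpoint without sending it. The Vault address, role name, and helper names are illustrative assumptions; the mount path defaults to kubernetes, which is Vault's default for this auth method but may differ in a given deployment.

```python
import json
import urllib.request


def build_vault_login_request(vault_addr: str, role: str, jwt: str,
                              mount: str = "kubernetes") -> urllib.request.Request:
    """Build (but do not send) the Vault Kubernetes auth login request.

    jwt is the pod's Kubernetes service account token; role is a Vault
    role bound to that service account (hypothetical name in this sketch).
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token_from_response(response_body: bytes) -> str:
    """Extract the Vault client token from a successful login response."""
    return json.loads(response_body)["auth"]["client_token"]
```

The returned client token is what the pod would then present to Vault when asking a secrets engine for short-lived credentials.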

In order for both of these methods to work, we depend on the native Kubernetes feature that allows configuring service accounts for pods. Kubernetes handles generating service account credentials and injecting them into pods based on the configuration of the Deployment, StatefulSet, or CronJob that spawned the pod as one of its replicas. The workload operator just needs to create the service account and configure the resource to use it.

Kubernetes injects the Kubernetes SA credentials (a JWT) into the pod using a bind mount. The pod can then use that JWT as a bearer token in subsequent interactions with the Kubernetes API. This way, all replicas of the pod can authenticate as the same service identity. As a result, it’s really Kubernetes that manages the workload identity across replicas. GKE and Vault just allow translating that into an identity from another IdP.
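The token handling described above can be sketched in a few lines of Python. The path below is the conventional mount location for service account tokens; the helper names are hypothetical. Note that decode_jwt_claims only inspects the payload and does not verify the signature, so it is suitable for debugging, not for making trust decisions.

```python
import base64
import json

# Conventional location where Kubernetes mounts the service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def read_token(path: str = TOKEN_PATH) -> str:
    """Read the service account JWT mounted into the pod."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def decode_jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def bearer_headers(token: str) -> dict:
    """HTTP headers presenting the token to the Kubernetes API as a bearer token."""
    return {"Authorization": f"Bearer {token.strip()}"}
```

Because every replica mounts a token for the same service account, the sub claim (of the form system:serviceaccount:<namespace>:<name>) is identical across replicas, which is what lets them share one service identity.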

Now that we have explained how authentication works, let’s take a look at what happens after a user or service authenticates to PaaS.

Authorization

Authorization is the means by which we enforce what an authenticated identity may access. There are many types of access control, but within the context of container platforms, we typically use Role-Based Access Control (RBAC).

With RBAC, actors (individual identities or groups of identities) are granted permissions through role bindings. Roles are sets of permissions. Role bindings are relationships between roles and actors. Either the role or the role binding also specifies the resource that the permissions apply to.
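The relationship between roles, role bindings, and actors can be modeled in a few lines. This is a simplified Python sketch of the concept, not the Kubernetes RBAC implementation: the class and function names are hypothetical, permissions are reduced to (verb, resource) pairs, and scoping a binding to a namespace is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset  # set of (verb, resource) pairs


@dataclass(frozen=True)
class RoleBinding:
    role: Role
    actors: frozenset  # user, service, or group identifiers
    namespace: str     # scope in which the role's permissions apply


def is_allowed(bindings, actor_ids, verb, resource, namespace) -> bool:
    """True if any binding in the namespace grants (verb, resource)
    to at least one of the caller's identities or groups."""
    wanted = (verb, resource)
    return any(
        b.namespace == namespace
        and bool(b.actors & frozenset(actor_ids))
        and wanted in b.role.permissions
        for b in bindings
    )
```

For example, binding a viewer role with ("get", "pods") and ("list", "pods") to the group team-a in one namespace lets any member of team-a read pods there, while delete requests or requests in other namespaces are denied by default.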