TL;DR

Use this guide to deploy Vault with Terraform on Google Cloud in a production-hardened architecture that follows security best practices and helps both DevOps and the business succeed!

Overview

HashiCorp’s Terraform is a tool for provisioning and managing resources through structured configuration files, an approach commonly called infrastructure as code (IaC). Security is always important, and one of the most common security exposures is storing credentials or other secrets in configuration files. HashiCorp’s Vault helps by providing secrets management, which eliminates the need to store secrets such as credentials in configuration files.

In this post, I’ll describe a reference architecture for deploying and configuring Vault in GCP using Terraform, one that follows cloud security best practices and adheres to the Principle of Least Privilege. If you stick around long enough, I’ll also list some security best practices for each component of this system.

Reference Architecture

Reference Architecture for Vault and Terraform on GCP

Project boundaries

Within GCP, you can isolate groups of resources into projects that have a hard boundary, which allows you to adhere to the Principle of Least Privilege. Each of the resources in these projects has its own set of permissions and can only talk to the others if explicitly allowed. In this case there’s a Shared VPC spanning multiple projects, which allows the application projects to communicate over the network with the Secrets project.
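As a rough illustration of the Shared VPC relationship described above, the Terraform sketch below designates one project as the Shared VPC host and attaches the Secrets project and one application project to it as service projects. Which project acts as the host is not specified in this post, so the host here is an assumption, and all variable names are hypothetical placeholders.

```hcl
# Hypothetical project IDs; none of these names come from the architecture itself.
variable "host_project_id" { type = string }
variable "secrets_project_id" { type = string }
variable "app_project_id" { type = string }

# Mark the host project as a Shared VPC host.
resource "google_compute_shared_vpc_host_project" "host" {
  project = var.host_project_id
}

# Attach the Secrets project so the Vault cluster can use the shared network.
resource "google_compute_shared_vpc_service_project" "secrets" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = var.secrets_project_id
}

# Attach an application project that needs to reach Vault over the network.
resource "google_compute_shared_vpc_service_project" "app" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = var.app_project_id
}
```

Attaching a project only makes the shared network available to it. Traffic between the application workloads and Vault still has to be allowed by firewall rules, which is what keeps the "only if explicitly allowed" property intact.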

Project functionality

This architecture lays out four major components. I’ll describe each one and then provide some best practices.

IaC Project

This is the GCP project where the CI/CD pipeline for Terraform should be deployed. The pipeline’s service account is granted a large number of Cloud IAM privileges since it is responsible for creating and maintaining the rest of your infrastructure. If someone creates a trigger with undesirable behavior, the impact can be huge. As a result, monitoring of this service account, as well as code review, becomes crucial.

As a caveat, this is only really necessary if you are using the open source version of Terraform, since Terraform Enterprise can handle automated deployments for you. In any case, this project should contain the build system (e.g. Jenkins, Spinnaker, etc.) configuration necessary to run terraform plan and terraform apply in an automated fashion and send the right logs to the right folks when a run fails. You should also store your Terraform state file in GCS within this project, protecting it with VPC Service Controls.
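To make the state storage concrete, here is a minimal sketch of a Terraform backend block that keeps state in a GCS bucket. The bucket name and prefix are hypothetical, and the bucket is assumed to already exist in the IaC project.

```hcl
terraform {
  backend "gcs" {
    # Hypothetical bucket in the IaC project; create it before running terraform init.
    bucket = "example-iac-terraform-state"

    # Hypothetical prefix so several configurations can share one bucket.
    prefix = "vault/prod"
  }
}
```

The VPC Service Controls perimeter is configured separately from this block. Enabling object versioning on the bucket is also commonly recommended so that earlier state versions can be recovered.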

Secrets Project

This GCP project includes the necessary infrastructure for a Vault cluster, including the cluster itself (which could run on GCE or GKE), the storage backend, an internal load balancer, and a bastion host (running on a Compute Engine VM) used to maintain Vault using its API.

Typically a bastion is placed on the public internet as a hardened VM whose only responsibility is to accept SSH connections. With Cloud IAP SSH tunnelling, you get the same functionality without giving the bastion a public IP address, which also keeps it out of reach of DDoS attacks from the internet. You may be asking then, why do I need a bastion host at all? Well, in many cases you don’t, but for maintaining internal services over HTTP, where SSH to the service itself isn’t needed, a bastion host becomes useful. This means I don’t have to make the Vault server itself listen on port 22, but only on 443 as it should. The same concept can be used to maintain private GKE clusters as well. You can also turn off the bastion when you aren’t using it to save some money.
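As a sketch of what the IAP-based access above might look like in Terraform, the firewall rule below allows SSH to the bastion only from 35.235.240.0/20, the source range Google documents for IAP TCP forwarding. The variable names and the bastion network tag are hypothetical, and the rule is assumed to live in whichever project owns the network.

```hcl
# Hypothetical inputs; adjust to your own network layout.
variable "network_project_id" { type = string }
variable "network_self_link" { type = string }

resource "google_compute_firewall" "allow_iap_ssh_to_bastion" {
  name    = "allow-iap-ssh-to-bastion"
  project = var.network_project_id
  network = var.network_self_link

  direction = "INGRESS"

  # Source range used by IAP TCP forwarding.
  source_ranges = ["35.235.240.0/20"]

  # Assumes the bastion VM carries this network tag.
  target_tags = ["bastion"]

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}
```

With a rule like this in place, an operator who holds the appropriate IAP IAM role can reach the bastion with gcloud compute ssh and the --tunnel-through-iap flag, even though the VM has no external IP.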

You’ll notice Vault is also behind an internal load balancer, which, though not depicted explicitly in the diagram, should be a TCP/UDP load balancer. The reason is that Vault can terminate TLS within the process itself, so traffic stays encrypted end to end. If you used an HTTPS load balancer, you would have to re-encrypt traffic to get the same effect. You might as well use the TCP listener with TLS that Vault provides.
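For reference, a minimal sketch of a Vault server configuration with a TLS-terminating TCP listener and a GCS storage backend might look like the following. The certificate paths and bucket name are hypothetical. The listener binds to port 443 to match the description above, although Vault's default port is 8200.

```hcl
# Vault terminates TLS itself, so the load balancer only passes TCP through.
listener "tcp" {
  address       = "0.0.0.0:443"
  tls_cert_file = "/etc/vault/tls/vault.crt" # hypothetical path
  tls_key_file  = "/etc/vault/tls/vault.key" # hypothetical path
}

# GCS as the storage backend.
storage "gcs" {
  bucket     = "example-vault-storage" # hypothetical bucket name
  ha_enabled = "true"
}
```

Binding to a port below 1024 usually requires extra privileges for the Vault process, so some deployments may prefer to keep the default port and have clients connect to it directly.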

Version Control System

This is not a GCP project, but the system that stores your Terraform code. I’ll talk about some best practices around securely configuring Terraform for a production environment a bit later. In general, you should pick a version control system (VCS) that gives you a high level of control over access to the master branch. As an example, many VCSs let you require approval from multiple reviewers before merging into master, or even block the use of the rebase command to rewrite history on a particular branch. This level of control is important, especially when moving toward an automated system where a merge to a branch triggers another automated job.

Application Project(s)

These GCP projects contain the GCP resources that are the consumers of this shared infrastructure. They are the projects that are maintained by the Terraform config files and need to access secrets from Vault to function. For example, let’s say we have a Java app running in a container on GKE that needs to talk to a MySQL database. You might have a file in source code that specifies environment config such as the host of that MySQL database, but you would not want to have the credentials in source. Instead of baking these values into the container, you can use Vault to pull them into the container at run time. The same process can be used for GCE images as well.

Flow of the Architecture

The arrows in the architecture diagram above indicate the primary flow of data or interactions from one entity to another. Starting from the top-left:

1. The Vault Admin goes through two flows: (a) pushing configuration changes for Vault to the Terraform repo, and (b) updating secrets in Vault via the bastion VM (through Cloud IAP). Because secrets should not live in Terraform, they must be added manually.
2. The repo update triggers the build system, which pulls the Terraform code from that same repository and interacts with the Terraform state file on GCS.
3. The net result is that the GCP resources are deployed or updated to match the state described in the Terraform config files.
4. The Vault cluster stores or updates the secrets in the “storage backend”, which is GCS in this case.

Finally, the application projects, depicted here as GKE clusters, pull secrets from Vault at startup as well as periodically using the Vault Agent.
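To illustrate the last step, here is a minimal sketch of a Vault Agent configuration that authenticates, keeps a token on disk, and renders MySQL credentials into a file the application can read. It assumes the Kubernetes auth method is enabled at auth/kubernetes and a KV version 2 engine is mounted at secret/. This post does not specify either, and the Vault address, role name, secret path, and file paths are all hypothetical.

```hcl
vault {
  address = "https://vault.example.internal:443" # hypothetical address
}

auto_auth {
  # Assumes Kubernetes auth is enabled in Vault with a role for this app.
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "java-app" # hypothetical role name
    }
  }

  # Write the resulting token where the agent's templates can use it.
  sink "file" {
    config = {
      path = "/home/vault/.vault-token"
    }
  }
}

# Render credentials to a file; the agent re-renders when the secret changes.
template {
  destination = "/etc/secrets/mysql.properties"
  contents    = <<EOT
{{ with secret "secret/data/java-app/mysql" }}
username={{ .Data.data.username }}
password={{ .Data.data.password }}
{{ end }}
EOT
}
```

Rendering to a file keeps the application free of any Vault client code: it only has to read a properties file, which matches the idea of pulling secrets in at run time instead of baking them into the image.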

So when should I use this?

This architecture should be applied when Terraform is the primary means of deploying Google Cloud infrastructure and Vault is used for secrets management as part of that infrastructure. Vault is not always an ideal solution for secrets management. If only static secrets are needed in certain contexts, you should consider using Cloud KMS to encrypt secrets and storing them in source code or GCS buckets. It’s perfectly fine to store secrets in source code if they are encrypted. Vault is an ideal solution for disparate teams storing secrets at a large scale or when you need some of Vault’s dynamic secret generation capability.
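For the static-secret case, one possible shape is sketched below: a ciphertext produced by Cloud KMS is committed alongside the Terraform code and decrypted at plan and apply time with the google_kms_secret data source. The variable names and the Cloud SQL user are hypothetical and only illustrate where the decrypted value could be consumed.

```hcl
# Hypothetical inputs: the KMS key ID and the base64 ciphertext stored in source.
variable "crypto_key_id" { type = string }
variable "db_password_ciphertext" { type = string }
variable "sql_instance_name" { type = string }

# Decrypt the ciphertext with Cloud KMS when Terraform runs.
data "google_kms_secret" "db_password" {
  crypto_key = var.crypto_key_id
  ciphertext = var.db_password_ciphertext
}

# Example consumer: a Cloud SQL user whose password is the decrypted value.
resource "google_sql_user" "app" {
  name     = "app" # hypothetical user name
  instance = var.sql_instance_name
  password = data.google_kms_secret.db_password.plaintext
}
```

One caveat with this approach is that the decrypted plaintext ends up in the Terraform state file, which is another reason to lock down the state bucket.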