Is it really self-recovering and automation-friendly? What is the catch?

Secrets management and data protection are critical. However, most solutions on the market are not designed for DevSecOps: they do not treat security as code. Instead, they are driven by manual configuration and snowflake changes. In some organizations, dedicated vendor-specific consultants have to be hired just to maintain this snowflake server.

DevSecOps Manifesto

Leaning in over Always Saying “No”

Data & Security Science over Fear, Uncertainty and Doubt

Open Contribution & Collaboration over Security-Only Requirements

Consumable Security Services with APIs over Mandated Security Controls & Paperwork

Business Driven Security Scores over Rubber Stamp Security

Red & Blue Team Exploit Testing over Relying on Scans & Theoretical Vulnerabilities

24x7 Proactive Security Monitoring over Reacting after being Informed of an Incident

Shared Threat Intelligence over Keeping Info to Ourselves

Compliance Operations over Clipboards & Checklists

HashiCorp Vault OSS provides a full-featured, code-friendly solution for secrets management, encryption as a service, privileged access management, dynamic secrets, leasing and renewal, and more.

I have tried several impressive features, including the AWS auth backend, the Kubernetes auth backend, dynamic MySQL secrets, and dynamic AWS access credentials. They are easy to set up and automation-friendly.

Before HashiCorp Open-Sourced Auto-unseal

As you may know, auto-unseal was previously available only to Vault Enterprise customers. In December 2018, HashiCorp announced Vault 1.0 and the availability of auto-unseal in Vault OSS.

Auto-unseal was developed to reduce the operational complexity of unsealing Vault while keeping the master key secure. This feature delegates the responsibility of securing the master key from operators to a trusted device or service.

Before we dive into how awesome auto-unseal is, let’s take a look at what we had to do manually with older versions of Vault.

There are two common ways for a Vault pod to become sealed:

Vault pod restarted due to failure, deployment or upgrade

Intentional seal operation: vault operator seal

In these situations, Vault pods fail the Kubernetes readiness probe and stop serving traffic. Every time I see a dashboard like this, my heart dies a little.
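For a rough picture of why a sealed pod drops out of service, here is a minimal sketch. It assumes the readiness probe is an HTTP check against Vault's `/v1/sys/health` endpoint; the chart may just as well use an exec probe, so treat this as an illustration rather than the chart's actual configuration. The status codes are Vault's documented defaults, and the helper names are made up for this example.

```python
# Default status codes returned by Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a /v1/sys/health status code into a readable state."""
    return HEALTH_STATES.get(status_code, "unknown")


def is_ready(status_code: int) -> bool:
    """Kubernetes counts any HTTP status from 200 to 399 as a passing probe."""
    return 200 <= status_code < 400


for code in (200, 429, 503):
    print(code, describe_health(code), "ready" if is_ready(code) else "NOT ready")
```

A sealed pod answers 503, so the probe fails and the pod is pulled out of the service endpoints. A standby answers 429 by default, which is why HTTP probes usually add the `standbyok=true` query parameter so that standbys return 200 instead.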

To get these pods back in business, as shown in the commands below, we have to manually kubectl port-forward to each Vault pod and run vault operator unseal at least 3 times with unique unseal keys. With three Vault pods, that is 9 manual operations with 3 operators involved.

Seriously, who wants to get up at 3 am, dial in a web conference, locate the unseal keys, and run unseal commands? 😅

Not only is the operational cost high, but the business downtime caused by a Vault outage is also unacceptable.
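To make the arithmetic concrete, here is a small sketch that only lists the manual steps and never runs them. It assumes three Vault pods (which is what nine unseal operations at a threshold of 3 imply). The pod names `vault-0` to `vault-2`, the local port, and the plain-HTTP address are hypothetical placeholders.

```python
def manual_unseal_steps(pods, threshold=3, local_port=8200):
    """List the commands an on-call team would type to unseal every pod."""
    steps = []
    for pod in pods:
        # One port-forward per pod so the CLI can reach it from a laptop.
        steps.append(f"kubectl port-forward {pod} {local_port}:8200")
        # Each unseal call takes one unique key, typically from a different operator.
        for _ in range(threshold):
            steps.append(
                f"VAULT_ADDR=http://127.0.0.1:{local_port} vault operator unseal"
            )
    return steps


pods = ["vault-0", "vault-1", "vault-2"]  # hypothetical pod names
steps = manual_unseal_steps(pods)
unseal_count = sum("operator unseal" in step for step in steps)

for step in steps:
    print(step)
print(f"{unseal_count} unseal operations across {len(pods)} pods")
```

Every one of those unseal lines needs a human with a key, which is the part auto-unseal removes.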

# Check Vault Status, seems it is already sealed

(⎈ |:)bash-3.2$ vault status

Key Value

--- -----

Seal Type shamir

Sealed true

Total Shares 5

Threshold 3

Unseal Progress 0/3

Unseal Nonce n/a

Version 0.10.1

HA Enabled true

# First unseal; the key is hidden, of course

(⎈ |:)bash-3.2$ vault operator unseal

Unseal Key (will be hidden):

Key Value

--- -----

Seal Type shamir

Sealed true

Total Shares 5

Threshold 3

Unseal Progress 1/3

Unseal Nonce 12345678-2c03-320a-6dea-12345678

Version 0.10.1

HA Enabled true

# Second Unseal

(⎈ |:)bash-3.2$ vault operator unseal

Unseal Key (will be hidden):

Key Value

--- -----

Seal Type shamir

Sealed true

Total Shares 5

Threshold 3

Unseal Progress 2/3

Unseal Nonce 12345678-2c03-320a-6dea-12345678

Version 0.10.1

HA Enabled true

# Third Unseal is a charm!

(⎈ |:)bash-3.2$ vault operator unseal

Unseal Key (will be hidden):

Key Value

--- -----

Seal Type shamir

Sealed false

Total Shares 5

Threshold 3

Version 0.10.1

Cluster Name vault-cluster-e17ad79e

Cluster ID ab0dd9a0-dfaa-25ef-0d30-12345678

HA Enabled true

HA Cluster https://10.0.3.25:8201

HA Mode standby

Active Node Address http://10.0.3.25:8200
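The "Total Shares 5 / Threshold 3" lines in the status output come from Shamir's secret sharing: the master key is split into five shares, and any three of them rebuild it. Below is a minimal, illustrative sketch of that idea over a prime field. It is not Vault's implementation, which differs in detail, and the function names and the demo secret are made up for this example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret


def split(secret, shares=5, threshold=3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for coeff in reversed(coeffs):  # Horner's rule
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = 123456789  # stand-in for the real master key
key_shares = split(master_key, shares=5, threshold=3)
print(combine(key_shares[:3]) == master_key)   # any three shares are enough
print(combine(key_shares[2:5]) == master_key)  # a different three work too
```

Two shares alone reveal nothing useful about the key, which is why three different operators have to show up for every unseal.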

Auto-unseal in a K8S chart

Our Vault deployment pipeline is simple, with only two Helm chart deployment tasks.

First, deploy the Consul Helm chart as the Vault storage backend with the following values.yaml. This chart is built by the OSS community; if you prefer the official HashiCorp version, you can get it from here.

Notice that the Consul replica count is 5. We pre-provisioned an Auto Scaling group across at least 3 availability zones (AZs). The number of AZs varies by region and cloud provider (the minimum requirement is 3).

VM instances in the Auto Scaling group are labeled in Kubernetes with kubectl label nodes <node-name> class=consul. The node affinity and pod anti-affinity below make sure the 5 Consul pods are distributed onto 5 nodes across at least 3 AZs.
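As a rough sketch of what such scheduling rules look like, the snippet below builds a Kubernetes affinity block and prints it as JSON (which Kubernetes accepts alongside YAML). The node label `class=consul` comes from the command above. The pod label `app: consul` is an assumption, and the real chart values may be structured differently.

```python
import json


def consul_affinity(node_key="class", node_value="consul", pod_labels=None):
    """Build an affinity block: labeled nodes only, one Consul pod per node."""
    pod_labels = pod_labels or {"app": "consul"}  # hypothetical pod label
    return {
        # Schedule only onto nodes carrying the class=consul label.
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": node_key,
                                "operator": "In",
                                "values": [node_value],
                            }
                        ]
                    }
                ]
            }
        },
        # Never place two Consul pods on the same node.
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": pod_labels},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        },
    }


print(json.dumps({"affinity": consul_affinity()}, indent=2))
```

Note that the hostname-based anti-affinity only guarantees five distinct nodes. In this sketch, the spread across AZs comes from the Auto Scaling group placing those labeled nodes in different zones.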

Second, deploy the Vault Helm chart with the values.yaml below. It enables some key features:

Consul agent is used as a sidecar to talk to Consul service

Auto-unseal is set up using AWS KMS service

Statsd-exporter is running as a sidecar to expose metrics to Prometheus

Vault is exposed as a LoadBalancer service with SSL through AWS Certificate Manager (ACM)
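Of these features, the auto-unseal part boils down to a single `seal "awskms"` stanza in the Vault server configuration. Here is a minimal sketch that renders that stanza. The region and the key ID are placeholders, and the helper name is made up for this example. How the stanza gets into the chart's values depends on the chart.

```python
def render_awskms_seal(region: str, kms_key_id: str) -> str:
    """Render the Vault server config stanza that enables AWS KMS auto-unseal."""
    return (
        'seal "awskms" {\n'
        f'  region     = "{region}"\n'
        f'  kms_key_id = "{kms_key_id}"\n'
        "}\n"
    )


# Placeholder values; substitute your own region and KMS key ID.
print(render_awskms_seal("us-east-1", "REPLACE-WITH-KMS-KEY-ID"))
```

The Vault pods also need AWS credentials that allow kms:Encrypt, kms:Decrypt, and kms:DescribeKey on that key, for example through the IAM role attached to the nodes.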

Auto-unseal: the bomb!

In a few seconds, we have both Consul and Vault service up and running.

First, let's initialize Vault, configuring the number of key shares and the key threshold.