Introducing Vaultenv: Keeping your secrets secure with Vault and Haskell Posted on July 6, 2017 by Laurens Duijvesteijn

We’re pleased to announce our first bit of open source code. It is a CLI utility that fetches secrets from the HashiCorp Vault secret store and makes them available to a process of your choosing through environment variables. Vaultenv generalizes fetching secrets from Vault so you don’t have to reinvent the wheel for each program in your infrastructure.

We wrote it in Haskell, and you can find it on GitHub at channable/vaultenv.

This companion post discusses:

Background on HashiCorp Vault

Gaps in the existing tooling around Vault

How this project helped fill them

Our experiences writing infrastructure glue code in Haskell

Usage instructions for Vaultenv

Background

Almost every program that needs to interface with external services or databases needs API keys or access credentials. These bits of information have collectively come to be called secrets.

You likely want to control access to secrets, manage their life cycle, and audit their use. We used to store these secrets in our git repositories. This is not ideal:

Access control is binary. You either have access to the secrets for a repository or you don’t.

Updating/revoking secrets is a pain and sometimes requires building new releases.

There is no way of knowing which person or machine accessed what secret at what time.

Secrets are not encrypted at rest without jumping through additional hoops.

These problems can be solved with a central service which manages access, life cycle, and audit trails of secrets. Such a piece of software is called a secret store.

We went with HashiCorp Vault. In Vault, secrets are encrypted at rest in a central location and available over an API. Vault is a program that you can install and run on your own infrastructure.

Some notes on terminology before we continue. A Vault secret consists of arbitrary key/value pairs, stored under a path in a storage backend. The path serves as an identifier for the secret.
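As a mental model, such a secret could be sketched in Haskell as a path paired with a map of key/value data. The types below are purely illustrative, not vaultenv's actual internals:

```haskell
import qualified Data.Map as Map

-- Illustrative model of a Vault secret: the path identifies it,
-- the data is an arbitrary set of key/value pairs.
data Secret = Secret
  { secretPath :: String
  , secretData :: Map.Map String String
  } deriving Show

main :: IO ()
main =
  let hello = Secret "secret/hello" (Map.fromList [("foo", "world")])
  in putStrLn (secretPath hello ++ " has keys " ++ show (Map.keys (secretData hello)))
```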

We now use Vault to store API keys and access credentials as Vault secrets.

Vault in practice

So Vault can store secrets, but how do you make these available to your programs? The default answer is to fetch them over the HTTP API. The documentation is pretty good and it works as advertised. This approach has one big architectural problem, though:

Every program that requires secrets needs to know about the Vault API.

This has a couple of consequences:

You must make potentially invasive changes to the startup sequence of the application. This can be easy if you have a main or equivalent entrypoint, but difficult if you use a framework that initializes resources, spawns threads and runs before your code does.

You have to implement fetching secrets multiple times. Nice-to-have features – retries and exponential backoff – will likely not survive three ports.

You’re out of luck if you want to run programs that need the secrets but that you have not written yourself. Do you store them on disk in a config file? That defeats the point of using Vault. Do you send a patch to the project? They’ll likely not want to accept it, because it is a fringe use case. Do you maintain a fork? Expensive. If the program isn’t open source, you can submit a feature request, but you’ll likely be out of luck.

Candidate solution: generalize secret fetching and make secrets available through environment variables (like all other configuration).

Existing solutions

HashiCorp has a project, envconsul, which can fetch KV pairs from Consul and make them available through environment variables. It also supports Vault.

You can give it a list of secrets as CLI flags, and it will fetch those from Vault. In our eyes, there were a few problems. It:

Daemonizes so it can detect changes in Consul keys (now optional)

Attempts to manage the life cycle of the processes it fetches secrets for

Catches signals we’d like to send to the managed process

Does not support arbitrary renaming of environment variables

Cannot read lists of required secrets from disk

We had to decide: fork and send patches, or write our own. We went with the latter. Simplicity, as well as control over the codebase and the direction of the tool, was a requirement here.

The tool should only do these three simple things:

On startup, read a list of secrets

Fetch them over the HTTP API, using exponential backoff with jitter

Start whatever program you want to run using exec().
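The third step can be sketched with executeFile from the unix package, which replaces the current process image much like exec() does. This is a minimal sketch, not vaultenv's actual code; the secrets here are hard-coded for illustration, where vaultenv would have fetched them from Vault and merged them with the inherited environment:

```haskell
import System.Posix.Process (executeFile)

main :: IO ()
main = do
  -- Pretend these were fetched from Vault.
  let secrets = [("HELLO_FOO", "world"), ("BAR", "supersecret")]
  -- Replace the current process with /usr/bin/env, passing the
  -- secrets as its environment. True means: search $PATH.
  executeFile "env" True [] (Just secrets)
```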

Experiences

If you’re not interested in Haskell, skip straight to usage and download instructions.

We chose to write vaultenv in Haskell, because of a previous success story. It was the second project that we used Haskell for at Channable. We’d like to give another experience report.

Vaultenv was mostly written by someone at the “medium-to-advanced beginner” level. The experience was mostly positive. An advanced type system, a compiler that tells you when you have made mistakes, easy refactoring – what more could you want?

Well…

Documentation of the cookbook variety. Most libraries come with excellent API documentation, but it can be difficult to get up to speed with how the author intended the functions to be used together. It helps to have more experienced coworkers.

A Prelude that aligns with best practices. The current Prelude contains non-total functions and uses Strings for everything. It should use Maybe more often and ship faster types for working with text.

Apart from the above, there are lots of things to love about Haskell libraries. Some of my favorites:

Adding concurrency was an afterthought and a two-line change. Want to do a bunch of HTTP requests concurrently? Before:

```haskell
newEnvOrErrors <- mapM (requestSecret opts) secrets
```

And after, fetching all secrets concurrently:

```haskell
import qualified Control.Concurrent.Async as Async

newEnvOrErrors <- Async.mapConcurrently (requestSecret opts) secrets
```

We measured a 3x speedup because of this two-line change.
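A self-contained sketch of the same pattern, assuming the async package and with a fake request standing in for the HTTP call:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (mapConcurrently)

-- A stand-in for an HTTP request: wait 100ms, then return a value.
fakeRequest :: Int -> IO Int
fakeRequest n = threadDelay 100000 >> pure (n * 2)

main :: IO ()
main = do
  -- All five "requests" run concurrently, so the whole batch takes
  -- roughly as long as a single one. Results keep their input order.
  results <- mapConcurrently fakeRequest [1 .. 5]
  print results
```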

Another cool thing I learned about: Lenses. It turns out you can use these without fully understanding the type theory behind them. The code is pretty readable. Think getters and setters.

Want to get the "foo" key out of the "data" dictionary in the following blob of JSON?

```json
{
  "auth": null,
  "data": { "foo": "bar" },
  "lease_duration": 2764800,
  "lease_id": "",
  "renewable": false
}
```

Use a Lens!

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens (preview)
import Data.Text (Text)
import qualified Data.Aeson.Lens as Lens (key, _String)
import qualified Data.ByteString.Lazy as LBS

parseResponse :: LBS.ByteString -> Maybe Text
parseResponse response =
  let getter = Lens.key "data" . Lens.key "foo" . Lens._String
  in preview getter response
```

The OverloadedStrings extension lets us use "data" instead of pack "data" – the same goes for "foo" . The extension automatically converts string literals to the right type based on the function you pass them to.
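As a runnable illustration (assuming the lens and lens-aeson packages), applying preview to the JSON blob above yields a Maybe, so a missing key or malformed response never crashes the program:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens (preview)
import Data.Aeson.Lens (key, _String)
import qualified Data.ByteString.Lazy.Char8 as LBS

main :: IO ()
main = do
  let body = LBS.pack "{\"auth\": null, \"data\": {\"foo\": \"bar\"}}"
  -- preview returns Nothing when the path does not exist.
  print (preview (key "data" . key "foo" . _String) body)
```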

Implementing retries was a piece of cake thanks to the retry package. You can specify a RetryPolicyM, which details how often and with which delays to retry an action. Here, we use exponential backoff with jitter – a backoff pattern that keeps contention and call volume low:

```haskell
import qualified Control.Retry as Retry

-- We use a limited exponential backoff with the fullJitterBackoff
-- policy that comes with the retry package.
vaultRetryPolicy :: (MonadIO m) => Retry.RetryPolicyM m
vaultRetryPolicy =
  let maxRetries = 9  -- Try at most 10 times in total
      baseDelayMicroSeconds = 40000
  in Retry.fullJitterBackoff baseDelayMicroSeconds
       <> Retry.limitRetries maxRetries
```

And then you pass this into retrying , which also expects a predicate to determine when retries should happen and the action to retry:

```haskell
Retry.retrying vaultRetryPolicy shouldRetry retryAction
```

APIs that separate the generic from the specific are really prevalent in Haskell. As long as your action lives in MonadIO , you don’t have to change the logic of the action itself. Lovely.
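A minimal, runnable sketch of retrying in action (assuming the retry package; the flaky action and its names are invented for illustration):

```haskell
import Data.IORef
import qualified Control.Retry as Retry

main :: IO ()
main = do
  attempts <- newIORef (0 :: Int)
  -- An action that fails (returns Nothing) on its first two calls
  -- and succeeds on the third.
  let flakyAction _status = do
        n <- atomicModifyIORef' attempts (\x -> (x + 1, x + 1))
        pure (if n < 3 then Nothing else Just "the-secret")
      -- Retry whenever the action returned Nothing.
      shouldRetry _status result = pure (result == Nothing)
  result <- Retry.retrying (Retry.limitRetries 5) shouldRetry flakyAction
  print result
```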

How to use it

To show off Vaultenv, we need Vault itself. Download a binary from the Vault site and make sure it is in your $PATH.

Then run the following command to start a development Vault server:

```shell
$ vault server -dev
```

Copy the root token that has been printed to the console; we’ll need it later.

Write some secrets to the test server:

```shell
# Tell the vault client to connect over HTTP
$ export VAULT_ADDR='http://127.0.0.1:8200'
$ vault write secret/hello foo=world bar=supersecret
```

Let’s try to load up the values of the foo and bar keys into a program of our choosing using vaultenv. For the purposes of this demonstration, we’ll use env – pretend it is a program you want to run to get something done.

First, we need to create a file that specifies the secrets we want vaultenv to fetch. Let’s create a file /etc/env.secrets with the following content:

```
hello#foo
BAR=hello#bar
```

This tells vaultenv to fetch the contents of the foo and bar keys from the hello secret. It will make each of these available through an environment variable of its own.

The default behaviour is to infer the name of the environment variables. The contents of the foo key will be available under HELLO_FOO . For the bar key, we tell vaultenv to use the BAR environment variable. This allows interoperability with programs that you haven’t written yourself and that expect environment variables with certain names.
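The inference rule described above could be sketched like this – a hypothetical reimplementation for illustration, not vaultenv's actual code:

```haskell
import Data.Char (toUpper)

-- Sketch of the default naming rule: uppercase the secret spec and
-- replace '#' with '_', so "hello#foo" becomes "HELLO_FOO".
inferVarName :: String -> String
inferVarName = map (toUpper . underscore)
  where
    underscore '#' = '_'
    underscore c   = c

main :: IO ()
main = putStrLn (inferVarName "hello#foo")
```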

We’ll invoke vaultenv as follows:

```shell
$ /usr/bin/vaultenv \
    --no-connect-tls \
    --token YOUR_VAULT_TOKEN_HERE \
    --secrets-file /etc/env.secrets \
    -- /usr/bin/env
HELLO_FOO=world
BAR=supersecret
USER=laurens
LANGUAGE=en_US
HOME=/home/laurens
[ ... ]
```

Notice that:

The specified secrets are available in the process environment under configurable names.

Environment variables set by our shell are passed on. This means vaultenv can compose with service managers and other configuration management tools.

We need to pass --no-connect-tls to vaultenv so it can connect to the development server. It connects via HTTPS by default.

The -- disambiguates between options passed to vaultenv and those passed to the program. Add flags or arguments to the program like you would expect.

In production, Vault probably runs on dedicated instances, instead of together with your program. Use the --host and --port options if you want to use something different from localhost:8200 .

Conclusions and future work

Vault is a stable piece of our infrastructure at Channable. It has never stopped functioning on its own, although we had some trouble due to operator error.

There were some gaps in tooling, so we had to write some glue code ourselves. This went pretty well. We fetch around 5.5 million secrets a day from Vault using vaultenv. Our biggest application needs around 50 secrets; fetching these generally takes between 300 and 600 milliseconds on our infrastructure.

There are some opportunities for future work:

We don’t have a clear integration testing story. We currently use a bash script that sort of works, but it is not pretty. There is plenty of room for improvement here.

We make two HTTP requests if we want two keys from the same secret. This rarely induces extra overhead for our own use cases – we don’t often store multiple secrets in a single path. For general use cases, this data fetching could be optimized.

All in all, we had a nice time writing this piece of infrastructural glue. If you use HashiCorp Vault, or are thinking about using it, you might also enjoy vaultenv. Get it here.