I’m pleased to announce the initial open source release of Confidant, a secret management service for AWS users, built by engineers at Lyft.

Creating a more efficient secret management system

As Lyft has grown, we’ve added numerous services to our infrastructure. These services have credentials to internal services and external services, SSL keys, and other types of secrets. Each of these services has multiple environments, and to insulate these environments from each other, they have a version of each of these secrets for each environment. In many cases some of these secrets may be shared across a few services. Given a large number of services, this leads to a very large number of credentials.

The rotation of these secrets can be a laborious process, especially credentials for external services, since a large number of external services don’t have rotation methods that can be done without some amount of downtime and coordination. Coordination of the rotation of secrets became a difficult and time-consuming process for us pretty early on, and we knew the problem would only get worse as we added more internal services and more external dependencies.

To reduce time spent rotating secrets, we decided to build a secret management system with a basic design: a database of secrets and mappings of secrets to services. We started in late January 2015, and one month later we had an MVP that stored the first secret and mapping to a service. Since then we’ve made Confidant production ready and it’s used across our infrastructure as the means for secret management.

Designing a secure, scalable, and user-friendly system

We had some specific design goals in mind as we created Confidant:

- Secrets should be mappable to multiple services and environments.
- Secrets and services should have history, since they’re essentially configuration data.
- Secret rotations are critical, but can be eventually consistent.
- Access to secrets is critical and must be highly available.
- Secrets and their backups must be stored in a secure manner.
- Access to secrets must be controllable by service and environment.
- All forms of access to secrets must be auditable.
- Authentication from services to the secret management system must be secure and we must handle the problem of securely bootstrapping authentication credentials onto EC2 instances in autoscale groups.
- We don’t care about being cloud agnostic and assume at least for now that a solution that only works for AWS is acceptable.

The AWS-only design goal made the majority of our other design goals much easier, particularly the authentication goal, which is one of the more difficult problems in secret management.

Since Confidant is intended to be run in AWS, we designed it to act and feel like an extension of AWS. We extended the concept of IAM roles to represent services in Confidant so that they can be used to provide access to secrets. We chose IAM roles to represent our services because we were already providing a unique IAM role for every service and every environment for each service, so we could easily use roles to map secrets to a specific environment of a particular service.
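To make the role-based mapping concrete, here is a minimal in-memory sketch. The class, the role names, and the secret IDs are all hypothetical and chosen for illustration; this is not Confidant’s actual data model or storage layer. It only shows the idea that one IAM role per service per environment is enough to scope secrets, and that a secret can be shared by several roles.

```python
from collections import defaultdict


class SecretMappings:
    """Toy mapping of IAM role names to secret IDs (illustrative only)."""

    def __init__(self):
        self._by_role = defaultdict(set)

    def grant(self, role, secret_id):
        """Map a secret to the service/environment represented by `role`."""
        self._by_role[role].add(secret_id)

    def secrets_for(self, role):
        """Return the secret IDs a role may read, in sorted order."""
        return sorted(self._by_role.get(role, set()))

    def roles_for(self, secret_id):
        """Return every role that shares a given secret, in sorted order."""
        return sorted(r for r, ids in self._by_role.items() if secret_id in ids)


mappings = SecretMappings()
# Hypothetical names: one IAM role per service per environment.
mappings.grant("ridesapi-production", "payments_api_key_production")
mappings.grant("ridesapi-staging", "payments_api_key_staging")
# The same production secret shared with a second service.
mappings.grant("billing-production", "payments_api_key_production")

print(mappings.secrets_for("ridesapi-production"))
print(mappings.roles_for("payments_api_key_production"))
```

Listing the roles that share a secret is the lookup that matters during a rotation, since it tells you which services will pick up the new value.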

Securely storing secrets in the cloud is a challenge, due to the turtles-all-the-way-down nature of securing secrets. At some point, there must be a plaintext encryption key that’s used to encrypt the secrets, and the safe-keeping of that encryption key is essential. The same problem exists for authentication credentials from services to the secret management service. A secret that can be used for authentication must originate somewhere, and the plaintext version of that secret needs to be available for EC2 instances at boot time, if autoscaling is to be used. Confidant uses AWS’ KMS service to provide a solution for both of these problems.

KMS provides access to master encryption keys, which can be used for encryption and decryption actions, but doesn’t provide direct access to the master key itself, so it can’t be stolen. Confidant uses two master keys: one for encryption, which only it can access, and another for authentication, which can be used by both Confidant and all services that authenticate to Confidant. Access to the authentication key is provided by KMS key grants, which are controlled by Confidant. The way KMS is used for authentication is a bit involved, so I won’t describe it in detail in this post, but the initial concept was implemented as a Lyft hackathon project that was mentioned in a blog post in June.

For at-rest encryption, KMS is also being used. For every revision of every secret, Confidant generates a unique data key from KMS. The secrets are encrypted using cryptography.io’s implementation of Fernet, with the encrypted version of the data key stored along with the secret.
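The pattern described above is often called envelope encryption, and a rough sketch may help. The code below uses the real Fernet class from the cryptography library, but `FakeKMS`, `encrypt_revision`, `decrypt_revision`, and the record layout are hypothetical stand-ins so the example runs without AWS. It is not Confidant’s implementation; a real deployment would ask KMS for the data key instead of generating it locally.

```python
import base64
import os

from cryptography.fernet import Fernet


class FakeKMS:
    """Local stand-in for KMS: the master key never leaves this object."""

    def __init__(self):
        self._master = Fernet(Fernet.generate_key())

    def generate_data_key(self):
        # Return (plaintext key, encrypted key), mirroring the shape of
        # a KMS data key response. Fernet expects a url-safe base64
        # encoding of 32 random bytes.
        plaintext = base64.urlsafe_b64encode(os.urandom(32))
        return plaintext, self._master.encrypt(plaintext)

    def decrypt(self, encrypted_key):
        return self._master.decrypt(encrypted_key)


def encrypt_revision(kms, secret_value):
    """Encrypt one revision of a secret with its own fresh data key."""
    data_key, encrypted_data_key = kms.generate_data_key()
    ciphertext = Fernet(data_key).encrypt(secret_value)
    # Only the *encrypted* data key is stored alongside the ciphertext.
    return {"data_key": encrypted_data_key, "value": ciphertext}


def decrypt_revision(kms, record):
    """Recover the data key through the KMS stand-in, then decrypt."""
    data_key = kms.decrypt(record["data_key"])
    return Fernet(data_key).decrypt(record["value"])


kms = FakeKMS()
record = encrypt_revision(kms, b"s3cr3t-value")
print(decrypt_revision(kms, record))
```

Because every revision gets its own data key, a stored record is useless without a decrypt call against the master key, and that call is something the key service can both control and log.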

Through a combination of Flask API access logs, action logs and CloudTrail logs, all actions in Confidant can be tracked to a specific user or service. Additionally, the service was designed to keep full history of secrets and mappings, so that it’s easy to revert to an older revision of a secret or mapping, in case of emergency. Graphite events can be sent for changes to secrets or mappings to make it easy to correlate Confidant events to time-series data in your services.
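As an illustration of why full history makes reverting easy, here is a small sketch of an append-only revision list. `SecretHistory` and its field names are hypothetical, and recording a revert as a new revision is an assumption made for this example rather than a description of how Confidant stores data.

```python
class SecretHistory:
    """Append-only revisions; a revert is recorded as a new revision."""

    def __init__(self):
        self._revisions = []

    def save(self, value, modified_by):
        """Store a new revision and return its 1-based revision number."""
        revision = len(self._revisions) + 1
        self._revisions.append(
            {"revision": revision, "value": value, "modified_by": modified_by}
        )
        return revision

    def current(self):
        """Return the latest revision record."""
        return self._revisions[-1]

    def revert_to(self, revision, modified_by):
        """Copy an older revision forward so the audit trail stays intact."""
        if not 1 <= revision <= len(self._revisions):
            raise ValueError("unknown revision: %r" % (revision,))
        old = self._revisions[revision - 1]
        return self.save(old["value"], modified_by)


history = SecretHistory()
history.save("old-password", "alice")
history.save("new-password", "bob")
# Emergency: the new credential is broken, so go back to revision 1.
history.revert_to(1, "carol")
print(history.current())
```

Nothing is ever overwritten in this sketch, so the record of who changed what survives the revert itself.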

Based on our design goals, our internal client implementation fetches secrets and caches them in-memory, encrypted at rest using KMS. If later calls to Confidant fail for any reason, we log the failure and use the cached credentials until a successful call is made. If fetches fail for a long enough period of time, we send alerts. Confidant provides metadata with the secrets so that it’s possible to know if anything has changed since last call. When we detect changes, we notify processes using the credentials, which causes them to update their secrets in-memory.
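The fallback behavior described above can be sketched roughly as follows. This is not Lyft’s internal client: `CachingSecretClient`, the injected `fetch`, `on_change`, and `alert` callables, and the `revision` metadata field are all assumptions made for illustration, and the KMS encryption of the in-memory cache is left out to keep the example short.

```python
import logging
import time

log = logging.getLogger("secrets-client")


class CachingSecretClient:
    """Serve cached secrets when fetches fail; alert if failures persist."""

    def __init__(self, fetch, on_change, alert, alert_after=300.0,
                 clock=time.monotonic):
        self._fetch = fetch          # returns {"revision": ..., "secrets": {...}}
        self._on_change = on_change  # called with new secrets on a change
        self._alert = alert          # called with seconds since last success
        self._alert_after = alert_after
        self._clock = clock
        self._cached = None
        self._last_success = clock()

    def refresh(self):
        try:
            fresh = self._fetch()
        except Exception as exc:
            # Log the failure and keep serving the cached copy.
            log.warning("fetch failed, using cached secrets: %s", exc)
            stale_for = self._clock() - self._last_success
            if stale_for >= self._alert_after:
                self._alert(stale_for)
            return self._cached
        self._last_success = self._clock()
        # Metadata tells us whether anything changed since the last call;
        # the very first load is not treated as a change.
        changed = (self._cached is not None
                   and fresh["revision"] != self._cached["revision"])
        self._cached = fresh
        if changed:
            self._on_change(fresh["secrets"])
        return self._cached
```

Calling `refresh()` on a timer is enough for this design: a rotation shows up on the next successful fetch, which matches the goal that rotations can be eventually consistent while access stays highly available.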

Flexible implementation for any project

On the server side, Confidant is primarily implemented in Python, using Flask for the API. AngularJS is used for the frontend. DynamoDB is used for primary storage, and Redis is used for end-user session storage. Google OAuth2 is used for end-user authentication and KMS is used for service authentication and encryption-at-rest. The Flask application is written in a stateless manner and can easily be run on an autoscale group behind a round-robin ELB.

On the client side (services fetching secrets), we’re using SaltStack and have written clients as SaltStack execution modules and external pillars. SaltStack isn’t a requirement, though. Writing a Confidant client is simple: a basic implementation needs only the AWS SDK and a single REST API call.

We assume that client-side use cases for Confidant will vary a lot, so we don’t provide an opinionated implementation. We do, however, provide a basic Python implementation that will generate an authentication token, make a REST call, and return the secrets data. This is meant as a starting point and can be used directly from the CLI, or as a library for more sophisticated implementations.

Contributing to Confidant

To contribute to the project, good places to start are the documentation and GitHub repository. We’re excited about the release and gladly welcome contributions. Our GitHub repository already tracks a set of issues, which are marked easy/medium/hard and backend/frontend/design to make it easier to pick up issues relevant to your skillset. The documentation site has installation and configuration information that should make it quick and easy to get a version of Confidant working in your infrastructure.

FAQ

How is Confidant different from Vault or Keywhiz?

Ostensibly Confidant, Vault, and Keywhiz provide the same function. The main difference between Confidant and the others is that Confidant is purposely not cloud agnostic, choosing to use AWS’s features to deliver a more integrated experience. By leveraging AWS’s KMS service, Confidant is able to ensure that the master encryption key can’t be stolen, that authentication credentials for Confidant don’t need to be distributed to clients, and that authentication credentials for Confidant clients don’t need to be generated and trusted through quasi-trustable metadata.

Why build another secret management system, rather than contributing to Vault or Keywhiz?

We wrote Confidant months before either Vault or Keywhiz was released; otherwise we likely would have started by contributing to an upstream project. By the time either was released, we had brought Confidant to a point where it wasn’t worth the effort for us to switch to another project, and the general model and simplicity of Confidant made it easier for us to continue maintaining a service than to switch.

Are there plans to build Confidant clients for other languages?

Yes. We have an open GitHub issue for this. The initial targets will be languages frequently used for systems development or scripting (such as Go, bash, and Ruby). Of course, it’s not necessary to have your application read directly from Confidant. Instead you can have an out-of-band cron or daemon that calls Confidant and caches the returned value in a ramdisk (potentially encrypted at rest using KMS), to be used by your applications or configuration management systems.

Is there a mailing list, or IRC channel?

Yes, we have a users mailing list, a low-volume announcements mailing list for updates and security releases, and a #confidant IRC channel on freenode.

What open source license is Confidant released under?

Confidant is released under the Apache License, Version 2.0.