While working on a new project, we faced a task where an end user needs to upload a file with sensitive data (e.g. the national IDs of most people in a country, yay), which should be persisted within a minimal security perimeter.

I do not think this is the only right approach; there is always a tradeoff. This solution was simply the best for our use case.

Picking the data store

Since we are running our projects on Google Container Engine, we decided to keep leveraging cloud services and store files in Google Cloud Storage. This adds no real vendor lock-in: there are plenty of similar solutions that can replace the existing implementation by rewriting only the API connector, such as Amazon S3, OpenStack Swift, and even some containerized open-source solutions with compatible APIs.

When you deploy to Kubernetes, cloud storage brings an additional benefit: you don't need to deal with persistent volumes and backups for the data.

And there is a security benefit: cloud storage guarantees that there is no way to perform an Unrestricted File Upload to Remote Code Execution attack, which PayPal faced recently.

What is the common approach for handling uploads?

Most projects I’ve seen take a simple approach: API consumers upload files directly to the application, which uses libraries like arc to take care of further file processing and store the result in upstream storage.

The common approach

The main advantage is that it is simple to understand and allows post-processing the data synchronously: crop the image, validate its MIME type and contents, etc., and respond to the client immediately.

But what are the disadvantages?

Sensitive data is temporarily stored in the application, which extends the security perimeter. It can be intercepted, accidentally persisted along with a VM snapshot, or you can simply forget to remove temporary copies. This problem is magnified in microservice architectures, where such requests hit multiple application services;

You can run out of disk space while processing large uploads;

Increased RAM usage, since some API clients are not able to perform chunked uploads;

Credentials for cloud storage are long-lived. The longer a credential lives, the less secure it is, because the probability of leaking it grows over time. In a monolith, any part of the code can use them. Any microservice that works with these files needs a copy or a separate service account.

Can we do better?

Signed URLs

Most cloud storage providers (1, 2, 3) allow signing a URL that can be used as a temporary access token to send a file directly to the cloud storage. Such a URL can grant only a fraction of what the application's IAM role allows, limiting it to a single file upload within a limited time frame.

The IAM role itself can allow write access only; thus, even someone holding the service account's private key is not able to read data from the storage.
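Our service is written in Elixir, but the signing scheme itself is language-agnostic. Here is a minimal Python sketch of how a V2-style signed URL for Google Cloud Storage is put together; the bucket, object, and access ID are made up for illustration, and the actual RSA-SHA256 signing with the service account's private key is delegated to a `sign` callable rather than implemented here.

```python
import base64
from urllib.parse import quote

def build_string_to_sign(http_verb, expires, resource,
                         content_md5="", content_type=""):
    # The signature covers the HTTP verb, optional content headers,
    # the expiry timestamp, and the /bucket/object path.
    return "\n".join([http_verb, content_md5, content_type,
                      str(expires), resource])

def build_signed_url(sign, access_id, bucket, object_name, expires,
                     http_verb="PUT"):
    # `sign` is a callable that signs bytes with the service
    # account's private key (RSA-SHA256); stubbed out in this sketch.
    resource = f"/{bucket}/{object_name}"
    payload = build_string_to_sign(http_verb, expires, resource).encode()
    signature = base64.b64encode(sign(payload)).decode()
    return (f"https://storage.googleapis.com{resource}"
            f"?GoogleAccessId={access_id}"
            f"&Expires={expires}"
            f"&Signature={quote(signature, safe='')}")
```

A client holding such a URL can PUT exactly one object until `Expires` passes, and nothing else.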

Upload with a Signed URL

There are plenty of options for running post-processing on uploads:

Spin up a worker GenServer with a delayed message and a timeout greater than the signed URL TTL;

Depending on your storage provider, you might also be able to define a callback (e.g. with AWS Lambda or Google Cloud Pub/Sub) that triggers the file processing pipeline in your back-end;

Or make your front-end responsible for notifying the back-end of a successful file upload by posting the related metadata, as we did.
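The first option above (in our stack, a GenServer with a delayed message) boils down to a generic pattern: once the signed URL has expired, check whether the client actually uploaded anything, and only then kick off processing. A rough Python sketch, where `object_exists` and `process` stand in for hypothetical storage and pipeline calls:

```python
import threading

def schedule_upload_check(object_name, ttl_seconds, object_exists, process):
    # Fire once the signed URL's TTL has elapsed: if the client
    # uploaded the file, run the processing pipeline; otherwise the
    # upload slot simply lapses and nothing happens.
    def check():
        if object_exists(object_name):
            process(object_name)
    timer = threading.Timer(ttl_seconds, check)
    timer.daemon = True
    timer.start()
    return timer
```

The same check-after-TTL logic maps directly onto a GenServer receiving `Process.send_after/3` messages.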

Async job processing

Do not go this direction when you have a monolith: without an isolated microservice, your application would still hold a secret with full bucket permissions, and the approach would be meaningless in terms of security.

Also, it negates most of the performance benefits we discussed above, but it allows queuing jobs and doing that work asynchronously, which lets us keep fewer resources up and running.

Secret management for Cloud Storage

We took this one step further and built Cloud Storage secret management around this idea. Similarly to HashiCorp Vault, each time some part of our application or the front-end (which verifies document contents) needs access to a file in storage, it requests a Signed URL with only the rights it needs and a reasonable TTL.

Working via Storage Secret Manager

These requests can be authorized based on the service role and logged for a security audit trail.
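The authorization step can be sketched as a small policy check in front of the URL signer. The role names, grants, and `sign_url` callable below are illustrative assumptions, not our actual configuration:

```python
import logging
import time

# Hypothetical per-role grants: which HTTP actions each service role
# may request, and the maximum TTL it is ever given.
GRANTS = {
    "uploader": {"actions": {"PUT"}, "max_ttl": 900},
    "verifier": {"actions": {"GET"}, "max_ttl": 300},
}

def issue_signed_url(role, action, object_name, sign_url, ttl=None):
    # Authorize the request against the role's grant, log it for the
    # audit trail, and delegate the actual signing to `sign_url`.
    grant = GRANTS.get(role)
    if grant is None or action not in grant["actions"]:
        logging.warning("denied %s %s for role %s", action, object_name, role)
        raise PermissionError(f"role {role!r} may not {action} {object_name}")
    ttl = min(ttl or grant["max_ttl"], grant["max_ttl"])
    expires = int(time.time()) + ttl
    logging.info("issued %s %s to role %s, expires at %d",
                 action, object_name, role, expires)
    return sign_url(action, object_name, expires)
```

Because every grant is scoped to one object, one action, and a short TTL, a leaked URL is worth far less than a leaked service account key.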

Implementation

Erlang/OTP has a :public_key application to handle public-key infrastructure, so building and signing a URL takes only a few lines of code.

The full source code is available in a GitHub repo; it is a dockerized Phoenix 1.3 microservice.

Thanks

I want to thank Pavel Vesnin, who was responsible for most of the implementation code.