SeatGeek open sourced seatgeek/docker-build-cacher: builds a service with Docker and caches the intermediate stages.

At SeatGeek we use multi-stage Dockerfiles to build the container images that we deploy to production. We have found them to be a simple and effective way of building projects with dependencies in different languages or tools. If you are not familiar with multi-stage Dockerfiles, we recommend you take a look at this blog post.

In our first days of using them in our build pipeline, we found a few shortcomings that made our deploys take longer than they should have. We traced these shortcomings to one missing feature: it is not possible to carry statically generated cache files from one build to another once certain source files in the project change.

For example, when building our frontend pipeline we have to invoke yarn first to fetch all the npm packages. But this command can only be executed after adding the yarn.lock and package.json files to the Docker container. Because of how Docker caching works, each time those files are modified, the node_modules folder cached in previous builds is also trashed. As you may already know, building that folder from scratch is not a cheap operation.

Here’s an example that illustrates the issue.

Imagine you create a generic Dockerfile for building node projects:

    FROM nodejs

    # -y keeps apt-get from waiting for confirmation during the build
    RUN apt-get update && apt-get install -y nodejs yarn

    WORKDIR /app

    # Whenever this image is used as a base, execute these triggers
    ONBUILD ADD package.json yarn.lock ./

    # Download npm packages
    ONBUILD RUN yarn

    # Build the assets pipeline
    ONBUILD RUN yarn run dist

We can now build and tag a Docker image for building yarn-based projects:

    docker build -t nodejs-build .

The tagged image can be used in a generic way like this:

    # Automatically build yarn dependencies
    FROM nodejs-build as nodedeps

    # Build the final container image
    FROM scratch

    # Copy the generated app.js from yarn run dist
    COPY --from=nodedeps /app/app.js .

    # Rest of the Dockerfile
    ...

So far so good: we have built a pretty lean Docker image that discards the node_modules folder and only keeps the final artifact, for example a set of JS bundles from a React application.

It’s also very fast to build! This is because each individual step is cleverly cached by Docker during the build process. That is, as long as none of the steps or the files used in a step have changed.

And that’s exactly where the problem is: whenever the package.json or yarn.lock files change, Docker will trash all the files in the node_modules directory as well as the cached yarn packages, and will start downloading, linking and building every single dependency from scratch.
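To make the invalidation concrete, here is a toy model in Python. It is not Docker's actual implementation, only a sketch that assumes each step's cache key depends on the previous step's key, the instruction text, and the contents of any files the step adds. The `layer_keys` helper and the sample file contents are made up for illustration.

```python
import hashlib


def layer_keys(steps, file_contents):
    """Return one cache key per step; each key is chained to the previous one.

    steps: list of (instruction, [names of files the instruction adds]).
    file_contents: dict mapping file name -> bytes.
    """
    keys = []
    parent = b""
    for instruction, files in steps:
        h = hashlib.sha256()
        h.update(parent)                    # chain to the previous layer
        h.update(instruction.encode())      # the instruction text itself
        for name in files:
            h.update(file_contents[name])   # contents of the added files
        parent = h.digest()
        keys.append(h.hexdigest())
    return keys


steps = [
    ("WORKDIR /app", []),
    ("ADD package.json yarn.lock .", ["package.json", "yarn.lock"]),
    ("RUN yarn", []),
    ("RUN yarn run dist", []),
]

before = layer_keys(steps, {"package.json": b'{"v": 1}', "yarn.lock": b"lock-1"})
after = layer_keys(steps, {"package.json": b'{"v": 1}', "yarn.lock": b"lock-2"})

# Count the steps whose key is unchanged after editing yarn.lock.
reusable = sum(1 for old, new in zip(before, after) if old == new)
print(reusable)
```

In this model only the WORKDIR step keeps its key. The ADD step changes because its input changed, and both yarn steps change because they are chained to it, so node_modules is rebuilt even though the RUN instructions themselves are identical.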

That’s far from ideal, as it takes significant time to rebuild all dependencies. What if we could make a change to the process so that changes to those files do not bust the yarn cache? It turns out we can!

Enter docker-build-cacher

We have built a slim utility that helps overcome the problem by providing a way to build the Dockerfile and cache all of the intermediate stages. On subsequent builds, it will make sure that the static cache files that were generated during previous builds will also be present.

The effect it has should be obvious: your builds will be consistently fast, at the cost of a bit of extra disk space.

Building and caching is done in separate steps. The first step is a replacement for the docker build command and the second step is the cache persisting phase.

    export APP_NAME=fancyapp
    export GIT_BRANCH=master # Used to internally tag cache artifacts
    export DOCKER_TAG=fancyapp:latest

    docker-build-cacher build # This will build the docker file
    docker-build-cacher cache # This will cache each of the stage results separately

How It Works

The docker-build-cacher tool works by parsing the Dockerfile and extracting COPY or ADD instructions nested inside ONBUILD for each of the stages found in the file.
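The extraction step can be pictured with a small Python sketch. This is not the tool's own code: `parse_stages` and `onbuild_sources` are hypothetical helpers, the parsing ignores FROM flags and line continuations, and the ONBUILD lines are passed in as plain text without assuming how the real tool obtains them.

```python
def parse_stages(dockerfile_text):
    """Return a list of (base_image, alias) pairs, one per FROM line."""
    stages = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0].upper() == "FROM":
            alias = None
            if len(words) >= 4 and words[2].upper() == "AS":
                alias = words[3]
            stages.append((words[1], alias))
    return stages


def onbuild_sources(onbuild_lines):
    """Collect the source paths named by ONBUILD COPY/ADD triggers."""
    sources = []
    for line in onbuild_lines:
        words = line.split()
        if (len(words) >= 4
                and words[0].upper() == "ONBUILD"
                and words[1].upper() in ("COPY", "ADD")):
            # Skip flags such as --chown=...; the last argument is the destination.
            args = [w for w in words[2:] if not w.startswith("--")]
            sources.extend(args[:-1])
    return sources


dockerfile = """\
# Automatically build yarn dependencies
FROM nodejs-build as nodedeps

# Build the final container image
FROM scratch
COPY --from=nodedeps /app/app.js .
"""

triggers = [
    "ONBUILD ADD package.json yarn.lock .",
    "ONBUILD RUN yarn",
    "ONBUILD RUN yarn run dist",
]

stages = parse_stages(dockerfile)
sources = onbuild_sources(triggers)
print(stages)
print(sources)
```

For the example Dockerfiles from earlier in this post, the sketch finds two stages and identifies package.json and yarn.lock as the files to watch.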

It will compare the source files present in such COPY or ADD instructions to check for changes. If it detects changes, it rewrites the Dockerfile on the fly, such that FROM directives in each of the stages use the locally cached images instead of the original base image.

The effect of this FROM swap is that the disk state of the image is preserved between builds.
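The change detection and the FROM swap could look roughly like the Python sketch below. It rests on several assumptions: changes are detected by hashing file contents, the `fingerprint` and `rewrite_from` helpers are hypothetical, and the cache tag `fancyapp-cache:nodedeps` is a made-up name, since this post does not show how the tool names its cached images. It also leaves out everything else the tool has to do around the swap.

```python
import hashlib


def fingerprint(contents):
    """Hash file names and contents (dict of name -> bytes) in a stable order."""
    h = hashlib.sha256()
    for name in sorted(contents):
        h.update(name.encode())
        h.update(b"\0")
        h.update(contents[name])
        h.update(b"\0")
    return h.hexdigest()


def rewrite_from(dockerfile_text, cached_images):
    """Point FROM lines at cached images.

    cached_images maps an original base image to the tag of its cached
    counterpart (the tag naming is an assumption of this sketch).
    """
    out = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if (len(words) >= 2
                and words[0].upper() == "FROM"
                and words[1] in cached_images):
            words[1] = cached_images[words[1]]
            line = " ".join(words)
        out.append(line)
    return "\n".join(out)


old = fingerprint({"package.json": b"{}", "yarn.lock": b"lock-1"})
new = fingerprint({"package.json": b"{}", "yarn.lock": b"lock-2"})

dockerfile = (
    "FROM nodejs-build as nodedeps\n"
    "FROM scratch\n"
    "COPY --from=nodedeps /app/app.js ."
)

if old != new:
    # Hypothetical cache tag, used here only for illustration.
    dockerfile = rewrite_from(dockerfile, {"nodejs-build": "fancyapp-cache:nodedeps"})

print(dockerfile)
```

Only the first FROM line is rewritten in this example. The scratch stage and the COPY --from line are left untouched, so the final image is assembled the same way as before.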

docker-build-cacher is available now on GitHub under the BSD 3-Clause License.

Make sure to grab the binary files from the releases page.

If you think these kinds of things are interesting, consider working with us as a Software Engineer at SeatGeek. Or, if backend development isn’t your thing, we have other openings in engineering and beyond!