We now have a solid understanding of Docker and where it fits into continuous delivery. So how do we implement it, and how do we do so efficiently? In this post we will explore common patterns, optimizations, and best practices that will allow your organization to leverage Docker with efficiency and consistency.

Core Principles

Before detailing any specifics, it will be helpful to briefly discuss some key principles of Docker usage. These principles come up frequently throughout the rest of the post, and they should always be in the back of your mind when creating and running Docker containers.

Keep your Docker images as small as possible. The smaller the image, the fewer bytes are sent over the wire and the less overhead there is when running a container from the image. Smaller images also mean your Docker hosts fill up much more slowly.

Standardize on everything. As you standardize on things like build tools and commands for your applications, you will see many opportunities to reduce code duplication in your Dockerfiles and pipelines. Standardization also makes things easier for developers to use now and easier to change sweepingly later.

Decide on your tagging scheme early and enforce that decision. The tagging scheme will determine much of your pipeline logic, and it is also important for providing consistency in your base images (more on this later in the post).
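As a sketch of what enforcing a tagging scheme can look like, here is a minimal script that derives a tag of the form namespace/app:version-sha. Every name and value below is a placeholder; in a real pipeline the version would come from package.json or a VERSION file and the SHA from git rev-parse --short HEAD.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tagging scheme: <namespace>/<app>:<version>-<short git sha>
APP="myorg/myapp"     # placeholder: your registry namespace and app name
VERSION="1.4.2"       # placeholder: normally read from package.json
SHA="a1b2c3d"         # placeholder: normally `git rev-parse --short HEAD`

TAG="${APP}:${VERSION}-${SHA}"
echo "${TAG}"

# The pipeline would then build and push with the computed tag:
# docker build -t "${TAG}" . && docker push "${TAG}"
```

Because every image produced by the pipeline follows the same scheme, downstream deploy steps can parse the tag rather than special-casing each project.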

With that out of the way, let’s dig into some common patterns and techniques to better utilize Docker.

Make Use of Your Docker Image Layers

Layers can be your best friend or your worst enemy when creating Docker images. Used effectively, they can speed up build times through layer caching; used poorly, they can greatly increase both build times and image sizes.

Each layer is associated with a single command in the underlying Dockerfile. The layers are stacked one atop the other, with the bottommost corresponding to the first command in the Dockerfile and the topmost to the last. None of this is news; it is covered in my previous post and in the Docker documentation. What is important here is understanding how Docker caches and reuses these layers.

Docker caches layers based on what has changed in the files added to the image and the commands run to build it. The stacking order of image layers is very important because if a layer is unchanged, Docker will pull it from the cache or the registry rather than rebuilding it. A change to a layer, however, invalidates that layer and every layer above it, all of which must then be rebuilt. What this means for us is that image layers should be ordered by their likelihood to change.

In general, commands like CMD and ENV, which are unlikely to change frequently, should be near the top of the file, directly underneath FROM, whereas commands like COPY or ADD, which move your code into the image, should be at the bottom, since the code changes on most builds. Commands that install other dependencies or configure the environment belong in the middle.
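Put together, a skeleton ordered this way might look like the following sketch. The base image, package, and paths are illustrative placeholders, not taken from any example in this post:

```dockerfile
FROM node:6.11.1-alpine
# Rarely changes: runtime configuration goes near the top
ENV NODE_ENV production
CMD ["node", "/app/index.js"]
# Changes occasionally: OS packages and environment setup in the middle
RUN apk add --no-cache curl
# Changes on most builds: application code goes last
COPY . /app
```

With this ordering, a code-only change rebuilds just the final COPY layer while everything above it comes from the cache.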

Now we can extrapolate upon these ideas with some common patterns useful in your images.

Reducing Layer Sizes

Aside from stacking order, how you build a single layer can be very important as well. Let's look at the example from my previous post:

RUN apt-get update -y && apt-get install -y curl

You can see that rather than creating two separate RUN commands, one to do the update and the other to do the install, the two are concatenated. You will see this often in Dockerfiles because the two commands are logically associated and should make up a single layer of the image. Since the layers of a Docker image are zipped, stored, and sent separately and then assembled into the overall image, fewer layers mean less overhead for things like network latency and running containers.

One useful bit of information about layers is that Docker only commits what is in the filesystem at the time the layer's task completes. That being the case, your Dockerfile commands should clean up after themselves within the same RUN command, freeing space on the filesystem before the layer is committed so that extraneous files are never committed at all. Let's refactor the previous command to use this method:

RUN apt-get update -y && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

Notice the command we added at the end: rm -rf /var/lib/apt/lists/*. What you may not know about apt (and many other package managers) is that when you run an update command, it pulls down a manifest of all available packages and where to find them. This manifest is surprisingly large, roughly 40 MB in this example, and we do not need it after installing curl. By cleaning it out as part of the installation command, we keep that space out of the layer and the overall image. The same technique applies to anything that is no longer needed to run the image: tar files, binaries, or scripts used only during the build can all be removed.
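As a sketch of that broader cleanup pattern, here is a RUN command that downloads, unpacks, and deletes an archive in a single layer. The URL and file names are placeholders, not taken from the original post:

```dockerfile
# Download, unpack, and delete the archive in one RUN so the
# tarball itself is never committed to a layer.
RUN curl -fsSL https://example.com/some-tool.tar.gz -o /tmp/some-tool.tar.gz && \
    tar -xzf /tmp/some-tool.tar.gz -C /usr/local/bin && \
    rm /tmp/some-tool.tar.gz
```

If the rm were in a separate RUN command, the tarball would already be committed in the previous layer and the deletion would save nothing.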

Clearing the Cache

A common pattern you will see for leveraging layering in a Dockerfile is something similar to this:

ENV ENVIRONMENT_REFRESH 2017-06-22

This sets an environment variable called ENVIRONMENT_REFRESH, positioned directly underneath the FROM command. The variable is not used anywhere in the code; it serves as a way to break the layer cache, since changing its value forces Docker to rebuild all subsequent layers. This is necessary when you want to install updated versions of dependencies or refresh configurations.
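In context, the variable sits near the top of the Dockerfile. This is a minimal sketch; the base image and packages are just examples:

```dockerfile
FROM ubuntu:16.04
# Change this value to bust the cache and force every
# layer below it to be rebuilt on the next build.
ENV ENVIRONMENT_REFRESH 2017-06-22
RUN apt-get update -y && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
```

Bumping the date re-runs apt-get update, which is otherwise cached and would keep serving stale package versions.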

Caching for Reduced Build Time

We have seen how we can use our knowledge of image layers to reduce our overall image size, but layers can also greatly reduce build times when used intelligently. As I have said, layers are cached and reused in subsequent image builds when nothing in the layer has changed. This can be used to great effect to cache long-running commands used to build the image, and it is especially useful with external dependencies, which can take a great deal of time to download and install. The problem is that dependencies can change with the code. Let's take a look at a naive approach with a front-end project built with Node.js:

FROM node:6.7.0-wheezy
ENV REFRESHED_AT 2017-03-21
CMD ["npm", "run", "build"]
RUN mkdir /app && \
    mkdir /app/dist
WORKDIR /app
COPY . /app
RUN npm set progress=false && \
    npm install --silent

Here we copy in the current version of our code, including the package.json file which defines the project's dependencies, before we run the npm install command. This means we must install dependencies every time we run a build in our container, but this is avoidable. If we want to cache a layer containing our dependencies, we need an appropriate file to key the cache on. Since a change in a dependency requires a change to the package.json file, we can cache the layer based on that file. Let's see how this would look:

FROM node:6.7.0-wheezy
ENV REFRESHED_AT 2017-03-21
CMD ["npm", "run", "build"]
RUN mkdir /app && \
    mkdir /app/dist
WORKDIR /app
COPY package.json /app
RUN npm set progress=false && \
    npm install --silent
COPY . /app

With this change we first copy in the package.json file separately from the rest of our code, run the install, then copy in the remainder of the code. Since package.json is copied before the install, Docker will rebuild these two layers only when the package.json file changes.

Using this technique is a great way to reduce build times and overhead for your images, but be wary if you use any mutable versions or version sugar (e.g. Maven SNAPSHOTs or "^1.0.0" in a package.json), which allow different versions of a dependency to be installed depending on what is available at installation time. In that case, a break can appear in the artifact produced by the build that is not apparent on a developer's machine.
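One common mitigation is to pin exact dependency versions rather than ranges, so the cached install layer and a fresh install resolve identically. A hypothetical package.json fragment (the package names and versions are illustrative):

```json
{
  "dependencies": {
    "express": "4.15.3",
    "left-pad": "1.1.3"
  }
}
```

With exact versions, a changed dependency always means a changed package.json, which is exactly the signal the cached layer keys on.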

Create Base Images

Even if the public Docker image you are using for your app is as small as it can be and covers all the dependencies you need, you may still want to consider making your own base images. You can bake common steps, logic, and authentication into these images, making your developers' lives easier. This will likely require a private Docker registry, but those can be fairly simple to set up. As always, keep the images as small as possible.

Common Logic in Base Images

Putting common logic into base images will reduce the amount of duplicate code and also will make sweeping changes simpler. A great example of some common logic would be the inclusion of some tools used globally for certain types of applications in your organization. Another common example is including information on a private artifact repository which your app relies on. Here is an example of a Dockerfile for a base image of a Node.js application:

FROM node:6.11.1-alpine
RUN npm config set registry http://myprivatenpmrepo.com/

We are using the official Node.js Alpine image but setting the npm registry to a private registry. Any image built from this base will automatically pull Node modules from the private repository rather than the public one. If the URL of the repository ever changes, we can update the base image and its users will receive the update.
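A downstream project would then consume the base image like this. The tag thehipbot/node-base:6.11.1 is a hypothetical name for the base image above, published to your private registry:

```dockerfile
FROM thehipbot/node-base:6.11.1
WORKDIR /app
COPY . /app
# npm resolves packages through the private registry
# already configured in the base image.
RUN npm install --silent
```

The project's Dockerfile never mentions the registry URL, so a registry migration touches only the base image.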

Use ONBUILD Commands in Images

Docker provides a directive for your Dockerfiles called ONBUILD. It specifies commands that are run when a new image is built from the image containing the ONBUILD instructions. One way to think of it is as a form of inheritance for images, and it can be very useful for reducing code duplication in your Dockerfiles. In fact, the Dockerfiles in your individual projects can often be just a single FROM command using your onbuild image. The one caveat is that you must have a standard set of build and run commands for all projects of that type.

Let’s look at an example for a Node.js project:

FROM node:6.11.1-alpine
ONBUILD CMD ["node", "/app/index.js"]
ONBUILD COPY package.json /app
ONBUILD RUN npm set progress=false && \
    npm install --silent
ONBUILD RUN npm test && \
    npm run coverage
ONBUILD COPY . /app

Now, say we build an image from the above Dockerfile, tag it as thehipbot/node:6.11-onbuild, and publish it. We can then use this image for any Node.js project, so long as the project has npm scripts for test and coverage and can be run from the file index.js in the root of the project. The Dockerfile for each of our projects will be one line:

FROM thehipbot/node:6.11-onbuild

And all of the logic will live in the shared image.

It is very beneficial for your organization to create and maintain common base images for each type of application you plan to support, and even to create onbuild images on top of those bases. This gives you a single place to make sweeping changes, reduces the amount of code, and simplifies your CI pipelines.

Separating Build and Run Dependencies

Typically, your project's dependencies and tasks can be divided into two categories: those needed to build the application and those needed to run it. If we can separate these two stages, we will at the very least reduce the size of our images, and we can also speed up things like the time it takes to start our app.

Ideally we want both stages running in a containerized environment to ensure reproducibility, but we would want the two phases to take place in separate containers with the result of the build being copied into the run image. In that case, we would need only the production dependencies or an artifact generated by the build in our run image.

As an example of this we will be looking at a sample build for a front-end application, but the same pattern will apply to compiled languages as well. With our front-end code our build stage will verify the code with unit tests and linting, then run a build tool like webpack to minify and bundle. For our run stage, we would like to create an NGINX image to serve up our bundled JavaScript and CSS.

Assuming we have an npm script build which will test, lint, then package our front-end code, here is what our build stage Dockerfile would look like:

FROM node:8-wheezy
WORKDIR /app
CMD ["cp", "-r", "./dist", "./build/"]
COPY package.json /app
RUN npm set progress=false && \
    npm install --silent
COPY . /app
RUN npm run build

Ignore, for a moment, the third line with our CMD directive. It will become important when we combine our stages.

And here is our Dockerfile for the runnable NGINX container:

FROM nginx
COPY build/ /usr/share/nginx/html

We are copying the contents of a folder build (look familiar?) into the default directory out of which the NGINX container will serve files.

Now, in older versions of Docker, the only option for connecting these pieces was to do so manually, typically in a shell script. The script runs the build, pulls the result out of the build container, and drops it into an NGINX container that can be run to serve those files. For this approach we need two Dockerfiles, which we will call Dockerfile.build and Dockerfile.run. Here is what that would look like:

# build the build-stage image
docker build -t thehipbot/fe-build -f Dockerfile.build .

# run the build-stage image
docker run -v $(pwd)/build:/app/build thehipbot/fe-build

# build the runnable image
docker build -t thehipbot/fe-run -f Dockerfile.run .

# now if we run the new image, we will have NGINX serving
# up our static files
docker run -p 80:80 thehipbot/fe-run

So, all in all, what we do here is volume-map (-v $(pwd)/build:/app/build) a folder called build from our local filesystem into the first container so that the container can copy the result of the build out to the host. From there, the build of our runnable image picks up those files and copies them into the image.

This technique works great, but as of Docker CE version 17.05.0-ce, released in May 2017, we have a feature called multi-stage builds which provides a cleaner way to do this.

Multi-stage builds

Multi-stage builds allow us to include multiple FROM commands in our Dockerfiles. Each FROM starts a new stage, and later stages have a way to retrieve and use artifacts from previous stages. Let's see our example refactored to use this.

Here we have our single Dockerfile with multiple stages:

FROM node:8-wheezy as build
WORKDIR /app
COPY package.json /app
RUN npm set progress=false && \
    npm install --silent
COPY . /app
RUN npm run build

FROM nginx
COPY --from=build /app/build/ /usr/share/nginx/html

With this small change we have just one Dockerfile and no longer need to worry about copying files into or out of containers. Notice that in our first FROM command we have appended as build, which lets us reference this first stage from other stages in the same Dockerfile. We then use this reference in our COPY --from=build command to pull files from the build stage into our final stage.

Using multi-stage builds (or scripts if you are stuck on an older version of Docker) to reduce image sizes is a great idea for your images. I also have an example of a multi-stage build in Golang if you would like a similar example in a compiled language.

The techniques we have covered over the course of this post will benefit your Docker and CI practices in many ways. To name a few, they will reduce the space used on your Docker hosts and in your artifact repository, reduce build and deploy times, and reduce duplicate code. If you take nothing else from this post, keep the principles mentioned at the start in mind, since many of these patterns arise directly from them.

Do you have any other patterns you have found useful? Please tell me in the comments.