Backstory

At Nordstrom we use Docker and Kubernetes everywhere. We’ve deeply integrated these container technologies into every aspect of our day-to-day development, from CI/CD builds and scheduled jobs to hosting public services that handle thousands of transactions per second.

We’ve been doing this for years, which has given us time to mature it as a platform. That means proper engineering standards, shared distributed tracing and logging, and, most importantly for what spawned this deep dive: restrictive pod security policies.

On the search team we’ve recently introduced another container orchestrator: Amazon’s Elastic Container Service. We did this to reduce the complexity of working with AWS-specific services, but new infrastructure invariably leads to learning opportunities.

Specifically, I ran into the problem of getting our Gitlab CI/CD runners in Kubernetes to build a docker image from a Dockerfile and deploy it to ECR. The root of the problem was that a single Gitlab CI/CD stage could either:

Acquire AWS credentials (e.g. aws ecr get-login ), OR

Run the docker daemon to pull, build, and push (e.g. docker pull )

I could do one in one container, and I could do the other in another container, but I could not do both in the same container. This is because our shared Gitlab CI/CD runners lock down which services can run in parallel with which images. This gave me the following options:

Build the docker image without a daemon process (daemonless docker)

Find a way to securely generate and pass credentials for ECR from one stage into another stage that had docker and could use them.

While the final solution ended up being the latter (using Gitlab’s secret manager) I started the process by looking into daemonless docker builders. I learned the following:

There are daemonless container builders, and they use runc to build images and run containers against the same Open Container specification that Docker does.

There are two very promising container builders that are not docker:

Google’s Kaniko was specifically built with the intention of being used on Kubernetes as a docker image builder.

Genuine Tools’ img, which bills itself as a “standalone, daemon-less, unprivileged Dockerfile and OCI compatible container image builder”

Sounds great! Right? Well… sort of.

Remember those pod security policies I mentioned earlier? One of those policies on our Kubernetes cluster is ReadOnlyRootFilesystem, which means your container cannot modify the root filesystem. With Kaniko, this is a problem because it writes to hardcoded root folder paths. We could make our own customized build, but for now, let’s continue.

So I took a look at Genuine Tools’ img and it looks like it meets all of the criteria:

Daemonless

Runs just fine in an unprivileged setting

No building in the root directory

Supports ECR authentication ( docker login {creds} → img login {creds} )

However, in my research I found the following ticket on GitHub:

Barely squeezing through on features

Turns out img only supports ECR as a private registry, which made me wonder:

How does docker authentication work?

Why would this project work for one private repository but not all the others?

How hard would it be to implement support for GCR in this project? (We also have Google Cloud projects at Nordstrom)

How Docker Authentication Works

We’re going to look at this in detail by using the Google Container Registry as an example of a private registry. We’ll first look at the relevant documentation on what it SHOULD be doing, and then set up a MITM proxy on my machine to watch all the outgoing requests and see exactly what is being sent.

If you want to follow along, you’ll need to have the Google Cloud SDK installed, be properly authenticated via gcloud auth login , and have run the SDK’s gcloud auth configure-docker setup command.

The Documentation

When you perform a docker login gcr.io and it just works™, it’s easy to miss just how much is going on under the hood.

The docker authentication model

The above image is from the Docker Token Authentication Specification. It shows us at a high level how authentication occurs, and it tells us what each call represents:

1. Attempt to begin a push/pull operation with the registry.

2. If the registry requires authorization it will return a 401 Unauthorized HTTP response with information on how to authenticate.

3. The registry client makes a request to the authorization service for a Bearer token.

4. The authorization service returns an opaque Bearer token representing the client’s authorized access.

5. The client retries the original request with the Bearer token embedded in the request’s Authorization header.

6. The Registry authorizes the client by validating the Bearer token and the claim set embedded within it and begins the push/pull session as usual.
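Steps 2 and 3 are worth making concrete. Here’s a minimal Go sketch of how a client might parse the WWW-Authenticate challenge from the 401 response and build the token request URL. The function names ( parseChallenge , tokenURL ) are mine for illustration, not docker’s actual API:

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// parseChallenge extracts the key="value" pairs from a Bearer challenge,
// e.g. `Bearer realm="https://gcr.io/v2/token",service="gcr.io",scope="repository:my/app:pull"`
func parseChallenge(header string) map[string]string {
	params := map[string]string{}
	re := regexp.MustCompile(`(\w+)="([^"]*)"`)
	for _, m := range re.FindAllStringSubmatch(header, -1) {
		params[m[1]] = m[2]
	}
	return params
}

// tokenURL builds the request the client sends to the authorization
// service (step 3) from the parsed challenge parameters.
func tokenURL(params map[string]string) (string, error) {
	u, err := url.Parse(params["realm"])
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("service", params["service"])
	q.Set("scope", params["scope"])
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	challenge := `Bearer realm="https://gcr.io/v2/token",service="gcr.io",scope="repository:my-project/app:pull"`
	params := parseChallenge(challenge)
	u, _ := tokenURL(params)
	fmt.Println(u)
}
```

The client then calls that URL with its credentials and gets back the opaque Bearer token used for steps 5 and 6.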

This is a great start, but how does the Docker Daemon, or maybe its client (it’s unclear which), get the credentials to acquire the Bearer Token from the “authorization service”?

Credential Helpers

Per the documentation, credential helpers are how the docker client bootstraps this process. Credential helpers are binaries that the docker client uses to call service-specific authentication mechanisms. They implement a very simple interface and are called via a very simple mechanism.

When you set up docker with the gcloud SDK’s gcloud auth configure-docker , it writes the following to your ~/.docker/config.json :

Example gcloud docker config

When you perform a docker login {HOST_NAME} , docker performs a lookup in this file to find out which helper to call. With the above configuration, docker login gcr.io will call docker-credential-gcloud .

Meaning the config has the following structure:

{
  "credHelpers": {
    [HOST_NAME]: [HELPER_SUFFIX]
  }
}
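To make that lookup concrete, here’s a small Go sketch of how a client could resolve which helper binary to call for a given registry host. The type and function names ( dockerConfig , helperFor ) are made up for illustration, not docker’s internals:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig models just the credHelpers section of ~/.docker/config.json.
type dockerConfig struct {
	CredHelpers map[string]string `json:"credHelpers"`
}

// helperFor returns the credential helper binary name for a registry
// host, mirroring the lookup docker performs on `docker login`.
func helperFor(raw []byte, host string) (string, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	suffix, ok := cfg.CredHelpers[host]
	if !ok {
		return "", fmt.Errorf("no credential helper configured for %s", host)
	}
	return "docker-credential-" + suffix, nil
}

func main() {
	cfg := []byte(`{"credHelpers": {"gcr.io": "gcloud", "us.gcr.io": "gcloud"}}`)
	helper, _ := helperFor(cfg, "gcr.io")
	fmt.Println(helper) // prints "docker-credential-gcloud"
}
```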

The helper interface is straightforward:

docker-credential-{suffix} get [host_name]

docker-credential-{suffix} store [host_name]

docker-credential-{suffix} erase [host_name]

get asks the helper to retrieve the authentication credentials it will require

store asks the helper to save the credentials

erase asks the helper to erase any saved credentials, for a docker logout call

We can actually test this ourselves by calling the helper directly:

Authenticate and print our credentials:

echo "gcr.io" | docker-credential-gcloud get

Which will give you something like

{

"Secret": "ya29.[REDACTED]",

"Username": "_dcgcloud_token"

}
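Under the hood this is just a subprocess call. Here’s a hedged sketch of how a client like docker or img might shell out to a helper; the function names are mine, and per the docker-credential-helpers convention the registry host is written to the helper’s stdin:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

// helperCreds mirrors the JSON a credential helper prints for "get".
type helperCreds struct {
	Username string
	Secret   string
}

// parseHelperOutput decodes the helper's JSON response.
func parseHelperOutput(out []byte) (*helperCreds, error) {
	var c helperCreds
	if err := json.Unmarshal(out, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

// getCredentials runs docker-credential-<suffix> get, passing the
// registry host on stdin, as the docker client does.
func getCredentials(suffix, host string) (*helperCreds, error) {
	cmd := exec.Command("docker-credential-"+suffix, "get")
	cmd.Stdin = strings.NewReader(host)
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("helper failed: %w", err)
	}
	return parseHelperOutput(out)
}

func main() {
	creds, err := getCredentials("gcloud", "gcr.io")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(creds.Username)
}
```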

By MITM

Let’s see exactly what requests are being made. We can do this by setting up a man-in-the-middle proxy between:

The credential helper and the internet

The docker daemon and the internet

The docker client and the docker daemon

Warning: we will be configuring both Docker and the gcloud SDK to be insecure, making it trivially easy to set up a MITM attack against us. Do not try this on your machine unless you are comfortable with undoing this yourself.

The MITM attack we will be setting up

Setup

Open your terminal, and let’s get started by installing what we’re going to need to set up a MITM attack:

docker pull mitmproxy/mitmproxy

sudo apt-get install socat

Now set up gcloud to allow insecure connections and route through a proxy we will set up soon:

gcloud config set proxy/address 127.0.0.1

gcloud config set proxy/port 8080

gcloud config set proxy/type http

gcloud config set auth/disable_ssl_validation True

Now set up the docker daemon to go through our proxy and allow it to make insecure connections to the registry:

in /etc/docker/daemon.json

{

"insecure-registries" : ["gcr.io"]

}

in /etc/systemd/system/docker.service.d/http-proxy.conf

in /etc/systemd/system/docker.service.d/https-proxy.conf
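The contents of those two drop-in files aren’t reproduced here; assuming the standard systemd proxy configuration from Docker’s documentation, pointed at the mitmproxy we’re about to start on 127.0.0.1:8080, they would look something like:

```ini
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:8080"

# /etc/systemd/system/docker.service.d/https-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://127.0.0.1:8080"
```

After writing them, run sudo systemctl daemon-reload && sudo systemctl restart docker so the daemon picks up the proxy settings.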

Now in two other terminals we will need to start the following:

Our HTTP proxy

Our SOCKS proxy

HTTP proxy:

docker run --rm -it -v ~/.mitmproxy:/home/mitmproxy/.mitmproxy -p 8080:8080 mitmproxy/mitmproxy

SOCKS proxy:

socat -v UNIX-LISTEN:/tmp/fake,fork UNIX-CONNECT:/var/run/docker.sock

Execution

We now have a man in the middle proxy running. In a third terminal we can start authentication.

export DOCKER_HOST=unix:///tmp/fake

docker login gcr.io

You’ll immediately see 4 HTTP requests and a flurry of SOCKS messages. The most important SOCKS message is the authentication request that kicks off the whole process:

> 2020/01/11 16:29:33.415001 length=353 from=165 to=517

POST /v1.39/auth HTTP/1.1\r

Host: docker\r

User-Agent: Docker-Client/18.09.0 (linux)\r

Content-Length: 214\r

Content-Type: application/json\r

\r

{"username":"_dcgcloud_token","password":"ya29.[REMOVED]","serveraddress":"gcr.io"}

< 2020/01/11 16:29:33.850583 length=250 from=2826 to=3075

HTTP/1.1 200 OK\r

Api-Version: 1.39\r

Content-Type: application/json\r

Docker-Experimental: false\r

Ostype: linux\r

Server: Docker/18.09.0 (linux)\r

Date: Sun, 12 Jan 2020 00:29:33 GMT\r

Content-Length: 48\r

\r

From there the docker daemon picks up the rest of the process and performs the following 4 calls:

1. The gcloud credential helper makes a call to /oauth/v4/token with the user’s OAuth token (retrieved and stored by browser log in during gcloud auth ).

2. The first call to the registry returns a 401 Unauthorized, which triggers the next call:

3. The call to the authorization service with the proper credentials as GET parameters.

4. A test call with the bearer token to the registry to make sure that the authentication was successful.

And there you have it. The full set of network calls described by the documentation.

If you followed along, now would be a good time to undo this insecure configuration.

Implementation

So why doesn’t img support authentication for GCR using its credential helper, or any registry auth helper other than ECR’s? Turns out they almost do.

img uses the exact same code that the docker client uses; in fact it’s a direct dependency. It’s just ignoring the credential helper’s work and always prompting for a password if one isn’t provided via the img login command.

So the fix?

Pull Request img

My first three lines of golang. Submitted as a pull request to the project.

Conclusion

As software engineers, we stand on the shoulders of giants. A large amount of our daily work relies on a Rube Goldberg machine of binaries, libraries and services that we can take for granted on our way to solve OUR problem. Taking a moment to deep dive into how the sausage gets made can teach us things that we never knew we wanted to learn about, like:

That the images we build are all a part of the same Open Container Specification, and there are different runtimes outside of docker.

Almost the entirety of the docker/runc/img/kaniko/jib ecosystem relies on the same shared plumbing.

The Infrastructure Planning Manifesto and Docker’s attempts to break out all of their infrastructure into modular components.

Setting up MITM proxies for reverse engineering socket requests is almost trivially easy.

So when you face a problem, take a moment to consider what you expected to just work™ and dive into what it’s doing to help keep the bits flowing.