In our recent MesosCon survey of existing Mesos users, one of the most requested features was Docker integration in Mesos. Although users can already launch Docker images with Mesos thanks to the external containerizer work with Deimos, that approach still requires an external component to be installed on each slave. We also see that integrating Docker directly into Mesos gives us a longer-term roadmap for how Docker could provide future features to Mesos.

What’s been added?

At the API level, we added a new ContainerInfo message that serves as the base protobuf message for all future container types, and a DockerInfo message that holds Docker-specific options. We also added ContainerInfo to both TaskInfo and ExecutorInfo, so users can launch either a task or an executor with Docker.
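To illustrate how these messages nest, here is a simplified Python model. It is not the generated protobuf code: the field set is trimmed to the essentials, and the `docker_image_for` helper is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContainerType(Enum):
    # Kinds of containers a task or executor can request.
    DOCKER = 1
    MESOS = 2


@dataclass
class DockerInfo:
    # Docker-specific options; only the image is modeled in this sketch.
    image: str


@dataclass
class ContainerInfo:
    # Base message shared by all container types.
    type: ContainerType
    docker: Optional[DockerInfo] = None


@dataclass
class TaskInfo:
    # Heavily simplified: the real TaskInfo also carries resources, slave ID, etc.
    name: str
    task_id: str
    container: Optional[ContainerInfo] = None


def docker_image_for(task: TaskInfo) -> Optional[str]:
    """Hypothetical helper: the Docker image a task asks for, or None."""
    c = task.container
    if c is None or c.type is not ContainerType.DOCKER or c.docker is None:
        return None
    return c.docker.image
```

A task without a ContainerInfo keeps its old behavior, which is why the field is optional on TaskInfo.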

Internally, we created a Docker Containerizer that encapsulates Docker as a containerizer for Mesos; it will be released with Mesos 0.20.

The Docker Containerizer takes the specified Docker image and launches, waits on, and removes the Docker container, following the life cycle of the containerizer itself. It also redirects Docker logs into your sandbox's stdout/stderr log files for you.
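The life cycle above can be sketched in Python as a sequence of Docker CLI calls. This is only an illustration under assumptions: the real containerizer is C++ inside the slave, and the `run` callback, the `subprocess_runner` helper, and the choice to follow logs with `docker logs --follow` are all hypothetical here.

```python
import os
import subprocess
from typing import List, Optional, Tuple


def subprocess_runner(argv: List[str], stdout_path: Optional[str] = None,
                      stderr_path: Optional[str] = None) -> Tuple[int, str]:
    """Execute argv. If log paths are given, append output to those files;
    otherwise capture stdout and return it."""
    if stdout_path is None or stderr_path is None:
        proc = subprocess.run(argv, capture_output=True, text=True)
        return proc.returncode, proc.stdout
    with open(stdout_path, "ab") as out, open(stderr_path, "ab") as err:
        proc = subprocess.run(argv, stdout=out, stderr=err)
    return proc.returncode, ""


def run_container_lifecycle(run, name: str, image: str, sandbox: str) -> int:
    """Launch, wait for, and remove one container; return its exit status."""
    # Launch: start the container detached under a known name.
    code, _ = run(["docker", "run", "-d", "--name", name, image])
    if code != 0:
        raise RuntimeError(f"docker run failed for container {name}")
    try:
        # Logs: with --follow this streams until the container stops,
        # appending to the sandbox's stdout/stderr files.
        run(["docker", "logs", "--follow", name],
            os.path.join(sandbox, "stdout"), os.path.join(sandbox, "stderr"))
        # Wait: `docker wait` prints the container's exit code.
        _, out = run(["docker", "wait", name])
        return int(out.strip())
    finally:
        # Remove: always clean up so the container does not leak.
        run(["docker", "rm", "-f", name])
```

On a host with Docker installed, `run_container_lifecycle(subprocess_runner, "mesos-123", "busybox", "/tmp/sandbox")` would exercise the whole sequence. Passing the runner in keeps the sequencing testable without a Docker daemon.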

We also added a Docker abstraction that currently maps each Docker operation to a Docker CLI command issued on the slave.
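A minimal sketch of such an abstraction, assuming it only needs to build argument lists for the docker binary. The `DockerCli` class and its method names are hypothetical; the flags themselves (`-d`, `--name`, `--net=host`, `--cpu-shares`, `--memory`, `rm -f`, `ps -a`) are standard Docker CLI options.

```python
from typing import List, Optional


class DockerCli:
    """Hypothetical sketch: turn Docker operations into docker CLI argv lists."""

    def __init__(self, path: str = "docker"):
        self.path = path  # location of the docker binary on the slave

    def run(self, name: str, image: str, cpu_shares: Optional[int] = None,
            memory_bytes: Optional[int] = None,
            command: Optional[List[str]] = None) -> List[str]:
        # Detached, named container on the host network.
        argv = [self.path, "run", "-d", "--name", name, "--net=host"]
        if cpu_shares is not None:
            argv += ["--cpu-shares", str(cpu_shares)]
        if memory_bytes is not None:
            argv += ["--memory", str(memory_bytes)]
        argv.append(image)
        # Anything after the image is the command passed to the container.
        return argv + (command or [])

    def wait(self, name: str) -> List[str]:
        return [self.path, "wait", name]

    def kill(self, name: str) -> List[str]:
        return [self.path, "kill", name]

    def rm(self, name: str, force: bool = True) -> List[str]:
        return [self.path, "rm", "-f", name] if force else [self.path, "rm", name]

    def ps(self, include_stopped: bool = True) -> List[str]:
        return [self.path, "ps", "-a"] if include_stopped else [self.path, "ps"]
```

Building argv lists rather than shell strings avoids quoting problems, and swapping the CLI for Docker's remote API later would only touch this one class.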

For more information about the Docker changes, please take a look at the documentation in the Mesos repo (https://github.com/apache/mesos/blob/master/docs/docker-containerizer.md).

Challenges

The first big challenge in integrating Docker into Mesos was finding a way to fit Docker into the Mesos slave's containerizer model while keeping it as simple as possible.
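For orientation, here is a simplified Python rendering of the operations a containerizer has to support. The method names follow the Mesos containerizer interface, but the signatures are heavily simplified assumptions (the real C++ methods take executor info, sandbox directory, user, slave ID, and so on), and the Docker mapping in the comments reflects the approach described in this post.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set


class Containerizer(ABC):
    """Simplified, hypothetical sketch of the containerizer contract."""

    @abstractmethod
    def recover(self, checkpointed: Set[str]) -> None:
        """Rebuild state after a slave restart (Docker: match container names)."""

    @abstractmethod
    def launch(self, container_id: str, image: str) -> None:
        """Start a container (Docker: `docker run`)."""

    @abstractmethod
    def update(self, container_id: str, resources: Dict[str, float]) -> None:
        """Resize a running container (Docker: no CLI command, so cgroups directly)."""

    @abstractmethod
    def usage(self, container_id: str) -> Dict[str, float]:
        """Report resource usage statistics."""

    @abstractmethod
    def wait(self, container_id: str) -> int:
        """Block until the container exits (Docker: `docker wait`)."""

    @abstractmethod
    def destroy(self, container_id: str) -> None:
        """Tear the container down (Docker: `docker kill`, `docker rm -f`)."""

    @abstractmethod
    def containers(self) -> Set[str]:
        """IDs of all containers this containerizer currently manages."""
```

Seen this way, most operations map onto a single CLI command, and the interesting work is in the two that do not: `update` and `recover`.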

Because we decided to integrate with the Docker CLI, we got to really learn what the Docker CLI provides and how to map starting a container (docker run), waiting on it (docker wait), and destroying it (docker kill, docker rm -f) to Mesos.
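The wait and destroy halves of that mapping can be sketched as follows. `docker wait` blocks until the container stops and then prints its exit code, so the slave only has to parse one integer. The function names and the exit-code-to-state mapping are hypothetical, though the state names mirror Mesos task states.

```python
from typing import List


def parse_docker_wait(stdout: str) -> int:
    """`docker wait` prints the container's exit code followed by a newline."""
    return int(stdout.strip())


def task_state_from_exit(exit_code: int, killed: bool = False) -> str:
    # Hypothetical mapping from a container's exit to a Mesos task state.
    if killed:
        return "TASK_KILLED"
    return "TASK_FINISHED" if exit_code == 0 else "TASK_FAILED"


def destroy_commands(name: str) -> List[List[str]]:
    # Stop the container, then remove it so nothing is left behind.
    return [["docker", "kill", name], ["docker", "rm", "-f", name]]
```

Issuing `docker rm -f` after `docker kill` is deliberately redundant: `rm -f` also stops a running container, so cleanup still succeeds if the kill step failed.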

Although Docker's run command has options to specify the CPU and memory resources allocated to a container, the first gap we identified was that Docker provides no interface to update those resources afterwards. Part of the Containerizer interface is a way to update the resources used by a container. Luckily, Mesos already has utilities that deal with cgroups as part of the Mesos Containerizer, so we decided to reuse that code to update the Docker container's cgroup values underneath Docker.
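A sketch of what "updating cgroups underneath Docker" amounts to, assuming cgroup v1 mounted at `/sys/fs/cgroup` and Docker's cgroupfs layout of `<subsystem>/docker/<full container id>`. The 1024-shares-per-CPU convention and the kernel's minimum of 2 for `cpu.shares` are standard; the 32 MB memory floor and the function names are assumptions for illustration.

```python
from typing import Dict

CPU_SHARES_PER_CPU = 1024            # common cgroup v1 convention
MIN_CPU_SHARES = 2                   # kernel minimum for cpu.shares
MIN_MEMORY_BYTES = 32 * 1024 * 1024  # assumption: floor for memory limits


def cgroup_updates(docker_id: str, cpus: float, mem_mb: int,
                   root: str = "/sys/fs/cgroup") -> Dict[str, str]:
    """Map cgroup control files to the values a resource update writes.

    docker_id is Docker's full container ID (e.g. from `docker inspect`),
    not the Mesos container ID.
    """
    shares = max(int(cpus * CPU_SHARES_PER_CPU), MIN_CPU_SHARES)
    limit = max(mem_mb * 1024 * 1024, MIN_MEMORY_BYTES)
    return {
        f"{root}/cpu/docker/{docker_id}/cpu.shares": str(shares),
        f"{root}/memory/docker/{docker_id}/memory.limit_in_bytes": str(limit),
    }


def apply_updates(updates: Dict[str, str]) -> None:
    # Writing these files requires root on the slave.
    for path, value in updates.items():
        with open(path, "w") as f:
            f.write(value)
```

Computing the values separately from writing them keeps the arithmetic testable on machines without cgroups.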

One of the biggest concerns for a Mesos slave is being able to recover Docker tasks after the slave recovers from a crash, and making sure we don't leak Docker containers or any other resources when the slave crashes. We decided to name every container that Mesos creates with a prefix plus the container ID, and to use that name during recovery to tell what is still running and what should be destroyed because it is not part of the slave's checkpointed state.
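The recovery decision can be sketched as a pure function over the container names Docker reports and the container IDs the slave checkpointed. The `mesos-` prefix and the function names are assumptions for illustration; containers without the prefix were not launched by Mesos and are left alone.

```python
from typing import Iterable, List, Set, Tuple

NAME_PREFIX = "mesos-"  # assumption: prefix for Mesos-launched containers


def container_name(container_id: str) -> str:
    return NAME_PREFIX + container_id


def plan_recovery(docker_names: Iterable[str],
                  checkpointed_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Split Docker containers into (IDs to recover, names to destroy)."""
    recover, destroy = [], []
    for name in docker_names:
        name = name.lstrip("/")  # `docker inspect` reports names as "/name"
        if not name.startswith(NAME_PREFIX):
            continue  # not ours: never touch containers Mesos did not create
        cid = name[len(NAME_PREFIX):]
        if cid in checkpointed_ids:
            recover.append(cid)   # still known to the slave: reattach
        else:
            destroy.append(name)  # orphan from before the crash: clean up
    return recover, destroy
```

Encoding the container ID in the name means the slave needs no extra bookkeeping to find its own containers after a crash.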

After mapping all the CLI commands and seeing things work with a simple docker run, we started to realize that Docker images have various ways of affecting what command actually runs after docker run, such as ENTRYPOINT and CMD in the image itself. It became obvious that our Mesos API was not flexible enough, since the only way to specify a command was a required string field. We needed to make the command value optional so users can fall back to the image's default command. We also used to wrap the command in /bin/sh so that commands containing pipes or other shell operators execute entirely inside the Docker container rather than on the host. However, when an image defines an ENTRYPOINT, /bin/sh becomes an argument to that ENTRYPOINT and causes bad behavior. We therefore added a shell flag and made the command value an optional field in Mesos.
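Here is a sketch of how the shell flag and the optional value could shape the tail of a `docker run` command line. The simplified `CommandInfo` mirrors the relevant fields (`value`, `arguments`, `shell`), and `docker_run_tail` is a hypothetical helper, not the Mesos implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommandInfo:
    # Simplified mirror of the relevant CommandInfo fields.
    value: Optional[str] = None  # optional: None means "use the image's default"
    arguments: List[str] = field(default_factory=list)
    shell: bool = True


def docker_run_tail(image: str, cmd: CommandInfo) -> List[str]:
    """The image plus whatever command follows it on the docker run line."""
    if cmd.shell:
        if cmd.value is None:
            raise ValueError("shell=True requires a command value")
        # The shell runs inside the container, so pipes and operators work there.
        # If the image has an ENTRYPOINT, "/bin/sh" becomes its argument instead.
        return [image, "/bin/sh", "-c", cmd.value]
    argv = [image]
    if cmd.value is not None:
        argv.append(cmd.value)
    # With shell=False and no value, the image's ENTRYPOINT/CMD decide what runs.
    return argv + cmd.arguments
```

For an image with an ENTRYPOINT, setting `shell=False` and passing only arguments hands them straight to the entrypoint, which is the case the shell wrapper used to break.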

The last, and one of the biggest, challenges was making sure we handle Mesos timeouts at each stage of the Docker Containerizer launch. Part of the Mesos containerizer life cycle is to trigger a destroy when a launch exceeds a certain timeout; however, it is up to the containerizer to properly destroy the container and log what is going on. We went through each stage and made sure that when the containerizer is downloading large files or Docker is pulling a large image, we show a sensible error message and clean up correctly.
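The idea of one overall launch deadline, with a stage-specific error message and cleanup on expiry, can be sketched with `asyncio.wait_for`. The stage names (`fetch`, `pull`, `run`), the `cleanup` callback, and the message format are hypothetical; Mesos itself implements this in C++.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Stage = Tuple[str, Callable[[], Awaitable[None]]]


async def launch_with_timeout(stages: List[Stage], timeout: float,
                              cleanup: Callable[[], Awaitable[None]]) -> str:
    """Run launch stages under one overall deadline. On expiry, clean up and
    report which stage was in progress, so the error is actionable."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    for name, stage in stages:
        remaining = deadline - loop.time()
        try:
            if remaining <= 0:
                raise asyncio.TimeoutError
            # Cancels the stage if it outlives the remaining budget.
            await asyncio.wait_for(stage(), remaining)
        except asyncio.TimeoutError:
            await cleanup()
            return f"launch timed out after {timeout}s during stage '{name}'"
    return "launched"
```

Naming the stage in the message is what separates "your image is too large to pull in time" from a generic launch failure.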

We also ran into a couple of Docker bugs, which are logged on GitHub. Along the way, I found the Docker community to be really responsive, and the project is definitely moving fast.

Further Work

Currently we default to host networking when launching Docker containers, as it is the simplest way to get a Mesos executor running in a Docker image to talk to the slave with its advertised PID. More work is needed to support other networking options.

There are also many more features to consider, especially around allowing Docker containers to be linked and to communicate with each other.

There is a lot more that I won't list in this post, but I'm happy with the shape of the integration and look forward to community feedback.

Credits

I'd really like to thank Ben Hindman for his overall guidance and for the many late nights spent resolving Docker-on-Mesos issues. Thanks also to Yifan Gu, who worked on many patches in the beginning!