The purpose of this post is to expand on some existing information in the community regarding the topic of using HashiCorp’s Consul with Amazon’s Elastic Container Service (ECS). (I continue the series on this topic in Deploying Consul with ECS — Part 2)

If you’ve ever looked into this topic before, you’ve likely come across this article by Matt McClean & Chris Barclay: https://aws.amazon.com/blogs/compute/service-discovery-via-consul-with-amazon-ecs/

The article does a nice job of setting the stage for what service discovery is and why you might want it as a supporting service for your ECS based workloads. Additionally, the article includes good sample code for deploying the example scenario described. If you’ve not read it and reviewed the sample code, I would recommend this as a starting point in understanding the architecture and components you will need to configure and deploy.

Having gone through this exercise recently, there are a few areas that are worth highlighting that will hopefully help save some of you time in getting things deployed in your own environment.

The Consul Container

Since the article was published, HashiCorp has released an official Docker image that is available on Docker Hub: https://hub.docker.com/r/hashicorp/consul/

The image is excellent to simply pull in and start using right away, and it also serves as a good resource on how to build minimal containers and how to manage Consul in a containerized environment. The documentation is very well done and definitely worth the read.

Some points of distinction for the official Docker image are as follows:

The image is based on Alpine Linux. If you are unfamiliar with Alpine, it is a very minimal Linux distribution that is ideal for containers due to its small footprint (~2 MB). As Consul is written in Go and has no external dependencies, Alpine is an ideal pairing, keeping the final image to just 10 MB.

The image utilizes dumb-init to run as PID 1 in the container to better manage sub-processes.

The image utilizes gosu for running the Consul process as a non-root user.

The image has a well featured entry point script for easing configuration at runtime.

Consul on ECS

While the documentation provided with the official image is fairly comprehensive, it is geared towards environments where setting the container networking to the host mode is possible, which at present is not supported by ECS.

When running the Consul container on the default bridge network, an additional configuration parameter, advertise, is required to enable host-to-host gossip between Consul agents. It is intended for scenarios where the routable IP of the agent is not discoverable and needs to be configured manually. In the case of the container, we need to advertise the IP of the ECS cluster host instead of the container’s IP on the host’s bridge network.
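To make the advertise requirement concrete, here is a minimal sketch of what launching an agent on the bridge network might look like. The helper name build_consul_run_cmd, the container name, the published ports and the placeholder address are all assumptions for illustration, not taken from the original article. The function only prints the command so it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch: print a docker run command for a Consul agent on the default
# bridge network. The argument is assumed to be the routable private IP of
# the cluster host. Nothing is executed; the command is only printed.

build_consul_run_cmd() {
  host_ip="$1"
  printf 'docker run -d --name consul-agent'
  # Publish Serf LAN gossip (TCP and UDP) and the HTTP API on the host so
  # other agents can reach this one at the advertised address.
  printf ' -p 8301:8301 -p 8301:8301/udp -p 8500:8500'
  # -client=0.0.0.0 lets the HTTP API answer on the published port;
  # -advertise tells peers to use the host IP, not the container IP.
  printf ' hashicorp/consul agent -client=0.0.0.0 -advertise=%s\n' "$host_ip"
}

# Example with a placeholder private address.
build_consul_run_cmd "10.0.1.25"
```

Without the advertise flag, the agent would announce its bridge network address to its peers, which agents on other hosts cannot route to.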

This step is covered in the original article’s sample code, but you need to dig for it a little, and the requirement is not discussed in the article itself.

The step is captured in the CloudFormation template that deploys the cluster: https://github.com/awslabs/service-discovery-ecs-consul/blob/master/service-discovery-blog-template

The template starts the Consul container directly on each cluster host, outside of ECS. While this is a valid approach to deploying default services on your cluster hosts, containers started this way are neither managed nor properly accounted for by the ECS scheduler. This is a problem for centrally managing workloads across the cluster, but more importantly it results in inaccurate reporting of resource availability per host.

A superior approach in my opinion is detailed in the official ECS documentation:

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/start_task_at_launch.html

Following this approach, a Consul task is defined in ECS. That task is then instantiated at boot time of the cluster host using an Upstart service unit (this assumes Amazon Linux). This solves the aforementioned problems inherent to running the container on the host directly, but it also introduces a new challenge.

Unfortunately, the ECS task definition does not support in-line scripting, so the host’s IP cannot simply be looked up and passed to the advertise parameter when the task starts. Primarily this has to do with being agnostic to the configuration of the host system, but it is also an important security precaution. As a general statement, getting information about the cluster hosts within the containers that are scheduled to run on them can be quite difficult. Ironically, this is exactly the type of scenario one could use Consul to solve.

While not a universal solution to this obstacle, there is a method of acquiring the required metadata that is unique to the EC2 service. The same mechanism was actually used in the original example above, even though it was not a requirement in that scenario.

Amazon has its own metadata registry, the EC2 instance metadata service, that in many ways is comparable to Consul. When an EC2 instance is launched, metadata about that instance and its associated resources is registered in this catalog. That metadata can then be leveraged from the host using tools like curl, as demonstrated in the sample code. The only thing that needs to change in the ECS scenario is that this information needs to be sourced from within the container. As the metadata is contextually specific to the host the container is running on, the logic itself can be generic.

In order to introduce this capability into the container, I forked the official image and added some additional logic to the entry point script.

    # Set advertisement and node name for use with AWS ECS
    if [ "$DEPLOY_TYPE" = "ecs" ]; then
      ADVERTISE_IP=$(wget -qO- 169.254.169.254/latest/meta-data/local-ipv4)
      NODE_ID=$(wget -qO- 169.254.169.254/latest/meta-data/instance-id)

      CONSUL_AD="-advertise=$ADVERTISE_IP"
      echo "==> Found address '$ADVERTISE_IP' for use with ECS"

      CONSUL_NODE="-node=$NODE_ID"
      echo "==> Found name '$NODE_ID' for use with ECS"
    fi
    .
    .
    .
    # Look for Consul subcommands.
    if [ "$1" = 'agent' ]; then
      shift
      set -- consul agent \
        -data-dir="$CONSUL_DATA_DIR" \
        -config-dir="$CONSUL_CONFIG_DIR" \
        $CONSUL_BIND \
        $CONSUL_CLIENT \
        $CONSUL_NODE \
        $CONSUL_AD \
        "$@"

As shown above, the DEPLOY_TYPE environment variable needs to be set to a value of ecs in the task definition to trigger this logic.
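As a rough illustration of where that variable goes, the sketch below writes a minimal task definition to a scratch file. The family name, memory value, port mappings and file location are assumptions chosen for the example. Only the DEPLOY_TYPE setting and the image name come from this post.

```shell
#!/bin/sh
# Sketch: write a minimal ECS task definition for the Consul agent.
# Family name, memory and port mappings are illustrative assumptions.

TASK_DEF_FILE="${TMPDIR:-/tmp}/consul-agent-task.json"

write_consul_task_definition() {
  cat > "$1" <<'EOF'
{
  "family": "consul-agent",
  "containerDefinitions": [
    {
      "name": "consul-agent",
      "image": "unifio/consul",
      "memory": 128,
      "essential": true,
      "command": ["agent"],
      "environment": [
        { "name": "DEPLOY_TYPE", "value": "ecs" }
      ],
      "portMappings": [
        { "containerPort": 8301, "hostPort": 8301, "protocol": "tcp" },
        { "containerPort": 8301, "hostPort": 8301, "protocol": "udp" },
        { "containerPort": 8500, "hostPort": 8500, "protocol": "tcp" }
      ]
    }
  ]
}
EOF
}

write_consul_task_definition "$TASK_DEF_FILE"

# Registration is left commented out because it needs AWS credentials:
# aws ecs register-task-definition --cli-input-json "file://$TASK_DEF_FILE"
```

A real agent would also need arguments for joining the cluster in the command list. Those depend on how the Consul servers are deployed, so they are left out here.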

My modified image can be found at the following location: https://hub.docker.com/r/unifio/consul/ (https://github.com/unifio/docker-consul)

The Upstart service unit can be added to the cluster host in several ways depending on your infrastructure strategy. The ECS documentation demonstrates how to inject the logic at instance launch using user data.
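A sketch of what such user data logic might look like follows. It is modeled loosely on the pattern in the ECS documentation rather than copied from it, and it assumes Amazon Linux with Upstart, plus curl, jq and the AWS CLI on the host. The job file name, the task definition name consul-agent and the started-by label are placeholders.

```shell
#!/bin/sh
# Sketch of user data logic: write an Upstart job that starts the Consul
# task once the ECS agent is running. Assumes curl, jq and the AWS CLI are
# installed on the host. "consul-agent" is a placeholder task definition.

write_consul_upstart_job() {
  init_dir="${1:-/etc/init}"
  cat > "$init_dir/ecs-consul-task.conf" <<'EOF'
description "Start the Consul agent task on this ECS container instance"
start on started ecs

script
  task_definition="consul-agent"
  metadata_url="http://localhost:51678/v1/metadata"

  # Wait for the ECS agent introspection endpoint to respond.
  until curl -s "$metadata_url" > /dev/null; do sleep 1; done

  metadata=$(curl -s "$metadata_url")
  cluster=$(echo "$metadata" | jq -r '.Cluster')
  instance_arn=$(echo "$metadata" | jq -r '.ContainerInstanceArn')
  # The region is the fourth colon-separated field of the ARN.
  region=$(echo "$instance_arn" | cut -d: -f4)

  aws ecs start-task --cluster "$cluster" \
    --task-definition "$task_definition" \
    --container-instances "$instance_arn" \
    --started-by "upstart" \
    --region "$region"
end script
EOF
}

# Demonstration: write the job into a scratch directory, not /etc/init.
demo_dir=$(mktemp -d)
write_consul_upstart_job "$demo_dir"
```

In real user data the function would be called without an argument so that the job lands in /etc/init. The instance role would also need permission to call ecs:StartTask.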

I will continue to expand on this topic in the following upcoming posts:

Consul server deployment with ECS

ECS immutable AMI build using HashiCorp Packer

ECS cluster deployment using Terraform

If you need additional guidance or want to find out more about our services, contact us at contact@unif.io or visit unif.io/services. Unif.io offers consulting services to supplement your team and DevOps enablement services through our team of expert developers, architects and engineers.

Wilson Carey, CTO

Unif.io, Inc.