There are some very good articles and posts out there which describe in fantastic detail how Docker works and how easy it is to start a container (now a “service” in Docker 1.12). However, not many posts cover the other work needed to actually get an application responding to requests from the web, and fewer still talk about a 1000 node cluster.

This post is for those of you who have not yet set up a cluster in production. We’ll go through every little detail you need to know before you set up your own 1000 node cluster.

Important note about 1000 nodes

I keep mentioning 1000 nodes in this post just to take it to the extreme. The steps in this post will leave you with a working 1000 node cluster, but if you’re determined to actually start one then you will end up with a hefty bill at the end of the month. Since adding 997 more nodes to a 3 node swarm is trivial, you’re probably better off substituting 3 wherever you see 1000 in this post.

Overview

Here are the steps we’re going to go through:

1. Buy 1000 servers
2. Install Docker
3. Join the nodes into a swarm
4. Configure DNS to point to your domain
5. Deploy the application
6. Deploy a reverse proxy (optional)

Buy 1000 servers

This one is easy. Pick your favourite cloud provider and buy lots of servers. Docker can be installed on most (not too old) *NIX operating systems; if in doubt, use Ubuntu Xenial.

Once you have your 1000 nodes, let’s install Docker!

Install Docker

This one is fairly straightforward too. Just install Docker on each node. If you have lots of nodes then it might be worthwhile to use Ansible to install Docker on all of them at the same time. If you only have a handful of nodes then just go ahead and install Docker manually on each one.
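Whichever route you take, the installation itself is one command per node. Here is a minimal sketch of the parallel-SSH approach (the `node1..nodeN.example.com` hostnames and the `ubuntu` user are assumptions; it uses Docker’s convenience install script from get.docker.com):

```shell
# Sketch: install Docker on N nodes in parallel over SSH.
# Hostnames (nodeX.example.com) and the "ubuntu" user are assumptions.
install_docker_on_all() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    # Run each install in the background so 1000 nodes don't take all day
    ssh "ubuntu@node${i}.example.com" \
      'curl -fsSL https://get.docker.com | sh' &
    i=$((i + 1))
  done
  wait   # block until every install has finished
}

# install_docker_on_all 1000
```

With Ansible you would express the same thing as a play over an inventory group, which also gives you retries and reporting for free.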

Join the nodes into a Swarm

Creating a swarm is easy and the official docs do a great job of explaining the process.

Basically you will run docker swarm init on the first node and then docker swarm join on all the other nodes. There are a few other arguments that you’ll need to add to those commands but if you follow the docs you’ll have no problems at all.
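For concreteness, here is a sketch of the join step scripted (hostnames and user are assumptions as before; the worker token and manager IP come from the output of `docker swarm init` on the first node):

```shell
# On node1, initialise the swarm and note the join token it prints:
#   $ docker swarm init --advertise-addr <MANAGER_IP>
#
# Then join the remaining nodes (node1 is already the manager).
join_workers() {
  token=$1
  manager_ip=$2
  n=$3
  i=2
  while [ "$i" -le "$n" ]; do
    ssh "ubuntu@node${i}.example.com" \
      "docker swarm join --token $token ${manager_ip}:2377"
    i=$((i + 1))
  done
}

# join_workers SWMTKN-1-<token-from-init-output> 10.0.0.1 1000
```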

Configure DNS to point to your domain

Here’s where we’re going to depart from the other posts and articles. Not only do we want a working swarm but we also want to deploy an application that responds to web requests. For that we need to configure DNS.

Presuming you have already bought your domain name (if not, buy it now), we’ll log in to the domain name provider and set up multiple A records to point the domain name to a few of the IPs of the nodes in our swarm.

Let’s discuss this for a moment. We’re going to take advantage of the routing mesh that was introduced in Docker 1.12. The routing mesh routes a request arriving on a published port of any node to the correct service, which means we will be able to access our web application from any node! We’re using that feature to provide DNS failover: by specifying multiple A records, if one node goes down then it won’t be the end of the world for everybody.

Let’s say that you want to have 3 nodes accepting requests. Your A records might look like this:

1.1.1.1 A example.com
1.1.1.2 A example.com
1.1.1.3 A example.com

Where 1.1.1.X are the public IPs of your nodes.

You don’t have to put all your nodes in the DNS, you can get away with just one IP in the A records even with a 1000 node cluster. The only issue is that there is one single point of failure (if that one server behind that IP goes down, apocalypse now).

Pro tip: If you genuinely are setting up a 1000 node cluster then you probably want a few load balancers in front of the cluster. This is because DNS load balancing is highly dependent on the client’s implementation, so for more control use a load balancer. In that case you would point the A records at the load balancers and configure the load balancers to balance over the 1000 nodes.
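For example, an nginx load balancer in front of the swarm might be configured like this (a sketch; the IPs are the example node addresses from above, and in practice you would list whichever nodes you want in the rotation):

```nginx
upstream swarm_nodes {
    server 1.1.1.1:80;
    server 1.1.1.2:80;
    server 1.1.1.3:80;
    # add more nodes as needed
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://swarm_nodes;
    }
}
```

The A records would then point at the load balancer machines rather than at the swarm nodes directly.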

At this point we have a working swarm. Presuming the DNS has been given time to propagate (people recommend a few hours, but it usually takes less than 30 minutes), typing our domain name into a browser will route the request to one of our nodes. However, since nothing is listening on the nodes yet, the request will probably just time out. Time to deploy our application!

Deploy the application

Now we just deploy our application to the swarm as a service. This is pretty easy to do and is covered by other tutorials as well so we won’t go into too much detail. The important thing is that we will publish port 80 on the nodes (using the --publish 80:80 argument when creating the service). This will tell swarm to route port 80 on any node to this service. This is what makes the DNS failover that we mentioned before work so beautifully.

Here’s how we would deploy a simple web application to a 1000 node swarm. We’re launching 3000 replicas; that works out to about 3 per node, which is totally manageable. Log in to any manager node of the swarm and run this:

$ docker service create \
    --name app1 \
    --replicas 3000 \
    --publish 80:80 \
    dockercloud/hello-world

That’s it! Now hit your domain name in any browser and you should see “Hello World” shining brightly on your screen!
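If you’d rather check from a terminal first, you can verify the rollout from any manager node and hit the domain with curl:

```
$ docker service ls                      # app1 should show 3000/3000 replicas once converged
$ docker service inspect --pretty app1   # summary of the service, including the published port
$ curl -s http://example.com             # should return the hello-world page
```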

Deploy a reverse proxy (optional)

Right now everything is working perfectly. We have a huge swarm and a service that is balanced across it, and we even have DNS failover! The swarm is completely self-healing; the only remaining improvement would be adding load balancers, which mainly helps when there are lots of nodes.

Still, in terms of resilience we’re happy for now; the cluster is already very fault tolerant. Let’s take a look at another type of improvement we can make: running multiple web applications on the same cluster.

The problem

You’ve probably noticed that we are using port 80 on the swarm for our hello world service. This means that we cannot deploy another web application, since port 80 is already taken. The solution is to put a reverse proxy on the cluster which listens to port 80 and proxies the request to the correct service depending on the domain that was requested.

There is a fantastic container called jwilder/nginx-proxy which listens to Docker events and dynamically updates the proxy’s configuration as it sees containers starting up and shutting down. This is exactly what we need. However, at the time of writing the container doesn’t fully support Docker 1.12 swarms, so it only works on single-node clusters. Expect this to be fixed quickly though.

Instead, we’re going to create our own proxy with the correct configuration baked in. The disadvantage is that we’ll need to redeploy the proxy whenever we want to add a new service to the cluster, but that’s not a big price to pay. We’ll set up the proxy to map each domain name to the name of a service; the swarm will take care of resolving the service name to an actual container.

If you use nginx as the reverse proxy the configuration might look something like this:

# App 1
upstream app1_upstream {
    server app1:80;
}

server {
    listen 80;
    server_name app1.example.com;

    location / {
        proxy_pass http://app1_upstream;
    }
}

# App 2
upstream app2_upstream {
    server app2:80;
}

server {
    listen 80;
    server_name app2.example.com;

    location / {
        proxy_pass http://app2_upstream;
    }
}

You get the idea: the reverse proxy inspects the requested domain name and routes to the correct service.
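One way to bake that configuration in is a tiny custom image (a sketch; the `proxy.conf` filename and the nginx base tag are assumptions):

```dockerfile
FROM nginx:1.11

# Replace the default site with our baked-in proxy configuration
COPY proxy.conf /etc/nginx/conf.d/default.conf
```

Build it, push it to a registry your nodes can pull from, and deploy it as the one service that publishes port 80.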

You’ll also need to set up DNS A records for the second application, otherwise the browser won’t know how to resolve the new domain name.

When we launch the proxy we will publish port 80 on it instead, and when we launch the app services we won’t publish any ports on the cluster. Instead, set the EXPOSE 80 directive in each app’s Dockerfile to tell Docker which port the container listens on.
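Putting it together, the launch commands might look like the sketch below. The network and image names (proxy-net, my-registry/...) are placeholders; note that the proxy and the apps must share a user-defined overlay network for the service-name resolution mentioned above to work:

```shell
# Sketch: deploy a proxy plus two apps behind it.
# "proxy-net" and the "my-registry/..." image names are placeholders.
deploy_stack() {
  # Shared overlay network: gives the proxy DNS resolution of "app1"/"app2"
  docker network create --driver overlay proxy-net

  # Only the proxy publishes a port on the cluster
  docker service create --name proxy --network proxy-net \
    --replicas 3 --publish 80:80 my-registry/proxy

  # The apps rely on EXPOSE 80 in their Dockerfiles; no --publish here
  docker service create --name app1 --network proxy-net \
    --replicas 3000 my-registry/app1
  docker service create --name app2 --network proxy-net \
    --replicas 3000 my-registry/app2
}
```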

Wrapping up

That’s it! We have successfully created a production swarm of 1000 nodes and deployed onto it a real application that responds to web requests! This works just as well with a single-node swarm as with 1000 nodes, so give it a try!

Docker 1.12 has made things a lot easier, among other things it brings orchestration and better out-of-the-box security. For more info on the goodies that Docker 1.12 brings, check this out.