Setting up your own Docker swarm

Scaling a service has traditionally been the domain of system operators, who installed servers, and of developers, who tweaked software when the load grew high enough to warrant scaling. Soon enough you'd be looking at tens or even hundreds of instances, which took a lot of time to manage. With the release of Docker 1.12, orchestration is built in - you can scale to as many instances as your hosts allow. And setting up a Docker swarm is easy-peasy.

Initialize swarm

First off - I'm starting with a clean Docker 1.12.0 installation. I'll be creating a swarm with a few simple steps:

root@swarm1:~$ docker swarm init
Swarm initialized: current node (4i0lko1qdwqp4x1aqwn6o7obh) is now a manager.

To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-9ycry5kc20rnw5cbxhyduzg1f \
    10.55.0.248:2377

To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
    10.55.0.248:2377

I now have a swarm consisting of exactly 1 manager node. You can attach additional swarm workers, or add new managers for high-availability. If you’re running a swarm cluster with only one manager and several workers, you’re risking an interruption of service if the manager node fails.

“In Docker Swarm, the Swarm manager is responsible for the entire cluster and manages the resources of multiple Docker hosts at scale. If the Swarm manager dies, you must create a new one and deal with an interruption of service.”

As we're interested in setting up a two-node swarm cluster, it makes sense to make both nodes managers. If one goes down, the other would take its place.

root@swarm2:~# docker swarm join \
>     --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
>     10.55.0.248:2377
This node joined a swarm as a manager.

To list the nodes in the swarm, run docker node ls.

root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Leader
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Reachable

Creating a service

As you see, a newly joined manager node is added automatically but not promoted to leader. Let's start a service that does something we can scale across both hosts - pinging google.com, for example. I want 5 instances of this service available from the start, which I can request with the --replicas flag.

root@swarm2:~# docker service create --replicas 5 --name helloworld alpine ping google.com
31zloagja1dlkt4kaicvgeahn

As the service started without problems, we just get back the ID of the service. Using docker service ls we can get more information about the running service.

root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
31zloagja1dl  helloworld  5/5       alpine  ping google.com

Of course, as we're talking orchestration, the service's tasks in the examples are split between the swarm1 and swarm2 nodes. You can still use docker ps -a on individual nodes to inspect single containers, but there's also the handy docker service ps [name].

root@swarm1:~# docker service ps helloworld
ID                         NAME          IMAGE   NODE    DESIRED STATE  CURRENT STATE          ERROR
5fxtllouvmd91tmgzoudtt7a4  helloworld.1  alpine  swarm1  Running        Running 7 minutes ago
cqvgixx3djhvtiahba971ivr7  helloworld.2  alpine  swarm2  Running        Running 7 minutes ago
99425nw3r4rf5nd66smjm13f5  helloworld.3  alpine  swarm2  Running        Running 7 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj  helloworld.4  alpine  swarm1  Running        Running 7 minutes ago
0hy3yzwqzlnee10gat6w2lnp2  helloworld.5  alpine  swarm1  Running        Running 7 minutes ago

Testing fault tolerance

As we connected two managers to run our service, let's bring one of them down. I'm going to power off swarm1, the current leader, hoping the swarm will do the following:

- elect a new leader (swarm2)
- start up additional helloworld containers to cover the outage

root@swarm1:~# poweroff
Connection to 10.55.0.248 closed by remote host.
Connection to 10.55.0.248 closed.

First off, let’s list the cluster state.

root@swarm2:~# docker node ls
Error response from daemon: rpc error: code = 2 desc = raft: no elected cluster leader

Uh oh, this was slightly unexpected. After bringing swarm1 back up, I see that swarm2 has been promoted to leader. But it's not exactly the fail-over I imagined. While swarm1 was offline, the ping service ran at only 2/5 replicas and didn't automatically scale out on swarm2 as expected.

root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Reachable
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Leader
root@swarm2:~# docker service ps helloworld
ID                         NAME              IMAGE   NODE    DESIRED STATE  CURRENT STATE           ERROR
4x0zgeiucsizvmys5orih2bru  helloworld.1      alpine  swarm1  Running        Running 3 minutes ago
5fxtllouvmd91tmgzoudtt7a4   \_ helloworld.1  alpine  swarm1  Shutdown       Complete 3 minutes ago
cqvgixx3djhvtiahba971ivr7  helloworld.2      alpine  swarm2  Running        Running 21 minutes ago
99425nw3r4rf5nd66smjm13f5  helloworld.3      alpine  swarm2  Running        Running 21 minutes ago
5xzldwvoplqpg1qllg28kh2ef  helloworld.4      alpine  swarm1  Running        Running 3 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj   \_ helloworld.4  alpine  swarm1  Shutdown       Complete 3 minutes ago
avm36h718yihd5nomy2kzhy7m  helloworld.5      alpine  swarm1  Running        Running 3 minutes ago
0hy3yzwqzlnee10gat6w2lnp2   \_ helloworld.5  alpine  swarm1  Shutdown       Complete 3 minutes ago

So, what went wrong? A bit of reading later, I came across the following explanation of how Docker uses the Raft consensus algorithm for leader election:

Consensus is fault-tolerant up to the point where quorum is available. If a quorum of nodes is unavailable, it is impossible to process log entries or reason about peer membership. For example, suppose there are only 2 peers: A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum.
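The quorum size follows directly from the majority rule: a cluster of N managers needs N/2 + 1 of them (integer division) reachable to commit anything. A quick shell sketch of the arithmetic - nothing Docker-specific here, just the numbers behind the quote above:

```shell
# Raft commits only with a majority: quorum(N) = N/2 + 1 (integer division).
# With two managers the quorum is 2, so losing either one stalls the cluster.
for n in 1 2 3 5 7; do
  echo "$n managers -> quorum of $(( n / 2 + 1 ))"
done
# 1 managers -> quorum of 1
# 2 managers -> quorum of 2
# 3 managers -> quorum of 2
# 5 managers -> quorum of 3
# 7 managers -> quorum of 4
```

This is exactly why our two-manager setup locked up: with swarm1 gone, swarm2 alone couldn't reach the quorum of 2.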

Adding an additional manager to enable fault tolerance

So, if you have three managers, one manager can fail, and the remaining two represent a majority, which can elect a leader from among themselves. I quickly added a swarm3 node to the swarm. You can retrieve the credentials for adding nodes by issuing docker swarm join-token [type], where type is either worker or manager.

root@swarm2:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
    10.55.0.238:2377

And we run this command on our swarm3 machine.

root@swarm3:~# docker swarm join \
>     --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
>     10.55.0.238:2377
This node joined a swarm as a manager.
root@swarm3:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Reachable
9gyk5t22ngndbwtjof80hpg54    swarm2    Ready   Active        Leader
b9dyyc08ehtnl62z7e3ll0ih3 *  swarm3    Ready   Active        Reachable

Yay! Our swarm3 is ready. I cleared out the container inventory to start with a clean swarm.

Scaling our service with fault tolerance

I deleted the service with docker service rm helloworld, and cleaned up the containers with docker ps -a -q | xargs docker rm. Now I can start the service again from zero.

root@swarm1:~# docker service create --replicas 5 --name helloworld alpine ping google.com
5gmrllue1sgdwl1yd5ubl16md
root@swarm1:~# docker service scale helloworld=10
helloworld scaled to 10
root@swarm1:~# docker service ps helloworld
ID                         NAME           IMAGE   NODE    DESIRED STATE  CURRENT STATE               ERROR
2hb76h8m7oop9pit4jgok2jiu  helloworld.1   alpine  swarm1  Running        Running about a minute ago
5lxefcjclasna9as4oezn34i8  helloworld.2   alpine  swarm3  Running        Running about a minute ago
95cab7hte5xp9e8mfj1tbxms0  helloworld.3   alpine  swarm2  Running        Running about a minute ago
a6pcl2fce4hwnh347gi082sc2  helloworld.4   alpine  swarm2  Running        Running about a minute ago
61rez4j8c5h6g9jo81xhc32wv  helloworld.5   alpine  swarm1  Running        Running about a minute ago
2lobeil8sndn0loewrz8n9i4s  helloworld.6   alpine  swarm1  Running        Running 20 seconds ago
0gieon36unsggqjel48lcax05  helloworld.7   alpine  swarm1  Running        Running 21 seconds ago
91cdmnxarluy2hc2fejvxnzfg  helloworld.8   alpine  swarm3  Running        Running 21 seconds ago
02x6ppzyseak8wsdcqcuq545d  helloworld.9   alpine  swarm3  Running        Running 20 seconds ago
4gmn24kjfv7apioy6t8e5ibl8  helloworld.10  alpine  swarm2  Running        Running 21 seconds ago
root@swarm1:~#

And powering off swarm1 gives us:

root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Unreachable
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Leader
b9dyyc08ehtnl62z7e3ll0ih3    swarm3    Ready   Active        Reachable

and additional containers have spawned, just as intended:

root@swarm2:~# docker service ps helloworld
ID                         NAME              IMAGE   NODE    DESIRED STATE  CURRENT STATE               ERROR
bb8nwud2h75xpvkxouwt8rftm  helloworld.1      alpine  swarm2  Running        Running 26 seconds ago
2hb76h8m7oop9pit4jgok2jiu   \_ helloworld.1  alpine  swarm1  Shutdown       Running 2 minutes ago
5lxefcjclasna9as4oezn34i8  helloworld.2      alpine  swarm3  Running        Running 2 minutes ago
95cab7hte5xp9e8mfj1tbxms0  helloworld.3      alpine  swarm2  Running        Running 2 minutes ago
a6pcl2fce4hwnh347gi082sc2  helloworld.4      alpine  swarm2  Running        Running 2 minutes ago
8n1uonzp2roy608kd6v888y3d  helloworld.5      alpine  swarm3  Running        Running 26 seconds ago
61rez4j8c5h6g9jo81xhc32wv   \_ helloworld.5  alpine  swarm1  Shutdown       Running 2 minutes ago
17czblq9saww4e2wok235kww8  helloworld.6      alpine  swarm2  Running        Running 26 seconds ago
2lobeil8sndn0loewrz8n9i4s   \_ helloworld.6  alpine  swarm1  Shutdown       Running about a minute ago
6f3tm5vvhq07kwqt3zu0xr5mi  helloworld.7      alpine  swarm3  Running        Running 26 seconds ago
0gieon36unsggqjel48lcax05   \_ helloworld.7  alpine  swarm1  Shutdown       Running about a minute ago
91cdmnxarluy2hc2fejvxnzfg  helloworld.8      alpine  swarm3  Running        Running about a minute ago
02x6ppzyseak8wsdcqcuq545d  helloworld.9      alpine  swarm3  Running        Running about a minute ago
4gmn24kjfv7apioy6t8e5ibl8  helloworld.10     alpine  swarm2  Running        Running about a minute ago

Move the services away from a specific node (drain)

With this setup we can tolerate the failure of one manager node. But what if we want a more graceful procedure for removing containers from a node? We can set the node's availability to drain to empty it of containers.

root@swarm2:~# docker node update --availability drain swarm3
swarm3
root@swarm2:~# docker service ps helloworld | grep swarm3
5lxefcjclasna9as4oezn34i8   \_ helloworld.2  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
8n1uonzp2roy608kd6v888y3d   \_ helloworld.5  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
6f3tm5vvhq07kwqt3zu0xr5mi   \_ helloworld.7  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
91cdmnxarluy2hc2fejvxnzfg   \_ helloworld.8  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
02x6ppzyseak8wsdcqcuq545d   \_ helloworld.9  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
root@swarm2:~# docker service ps helloworld | grep swarm2 | wc -l
10

All the containers on swarm3 were shut down and started up on the remaining node, swarm2. Let's scale the example down to only one instance.

root@swarm2:~# docker service scale helloworld=1
helloworld scaled to 1
root@swarm2:~# docker service ps helloworld | grep swarm2 | grep -v Shutdown
17czblq9saww4e2wok235kww8  helloworld.6  alpine  swarm2  Running  Running 7 minutes ago

Cleaning up stopped containers is still very much the sysadmin's domain. I started swarm1 back up, and scaled our service to 20 instances.

root@swarm2:~# docker service scale helloworld=20
helloworld scaled to 20
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  2/20      alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  10/20     alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  16/20     alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  20/20     alpine  ping google.com

As you can see, it does take some time for the instances to start up. Let's see how they were distributed.

root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
     10 swarm1
     10 swarm2
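Incidentally, that grep | awk | sort | uniq -c pipeline is plain text processing - it just counts the NODE column (the 4th field) and works on any captured output. A self-contained sketch of the same counting trick on a few made-up task lines (IDs and node names hypothetical):

```shell
# Count tasks per node: extract column 4, sort, then count duplicates.
printf '%s\n' \
  'id1 helloworld.1 alpine swarm1 Running' \
  'id2 helloworld.2 alpine swarm2 Running' \
  'id3 helloworld.3 alpine swarm1 Running' |
  awk '{print $4}' | sort | uniq -c
```

This prints swarm1 with a count of 2 and swarm2 with a count of 1 (uniq -c's exact padding varies between implementations).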

Enabling and scaling to a new node

As we put swarm3 into drain availability, no instances are running on it. Let's fix that very quickly by putting it back into active availability.

root@swarm3:~# docker node update --availability active swarm3
swarm3

As already-running tasks stay where they are, we need to scale the service up to populate swarm3.

root@swarm3:~# docker service scale helloworld=30
helloworld scaled to 30
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
     10 swarm1
     10 swarm2
     10 swarm3

It takes a bit of getting used to, but docker service is a powerful way to scale out your microservices. Things get slightly trickier when it comes to data volumes (mounts), but that's a subject for another post.

Closing words

Keep in mind that if you're provisioning swarm managers, you need a majority to resolve failures gracefully. That means you should have an odd number of managers, with N > 2. A cluster of N managers can tolerate the failure of floor((N-1)/2) of them: 3 managers tolerate 1 failed node, 5 managers tolerate 2, 7 managers tolerate 3, and so on.
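The formula is easy to check with a small shell loop (plain arithmetic, nothing Docker-specific):

```shell
# Tolerated manager failures: floor((N-1)/2). Note that an even manager
# count tolerates no more failures than the next lower odd count, which
# is why odd numbers of managers are recommended.
for n in 3 4 5 6 7; do
  echo "$n managers tolerate $(( (n - 1) / 2 )) failure(s)"
done
# 3 managers tolerate 1 failure(s)
# 4 managers tolerate 1 failure(s)
# 5 managers tolerate 2 failure(s)
# 6 managers tolerate 2 failure(s)
# 7 managers tolerate 3 failure(s)
```

Going from 3 to 4 managers buys you nothing in fault tolerance - it only grows the quorum you must keep alive.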

A worker, in comparison, doesn't replicate the manager state, and you can't start or query services from a worker. You can only do that from one of the manager nodes - the commands will be run on the leader node.

About the author

I'm the author of API Foundations in Go. Consider buying it if you liked this article. If you'd like to be notified of new posts, sign up for my mailing list - I'll let you know when I publish new articles like this one, possibly minutes before I post them on Twitter.

You should also give me a follow on Twitter and let’s talk. I’m also available for consulting / development jobs. Fixing bottlenecks and scaling services to cope with high traffic is my thing.

While I have you here...

It would be great if you bought one of my books.

I promise you'll learn a lot more if you buy one. Buying a copy supports me writing more about similar topics. Say thank you and buy my books.

Feel free to send me an email if you want to book my time for consultancy/freelance services. I'm great at APIs, Go, Docker, VueJS and scaling services, among many other things.