I really like RethinkDB, and after a period of uncertainty they have now joined the Linux Foundation, so we're hoping for a stable future.

One of the big selling points of RethinkDB is how easy it is to set up a cluster of servers. Basically you do it like this (straight from the RethinkDB docs):

On first server:

$:>rethinkdb --bind all

On second server:

$:>rethinkdb --bind all --join IP_OF_FIRST_MACHINE:29015

Now the two machines are in a cluster and everything is peachy. If server two dies, restart it with the same command. If server one dies restart it with --join IP_OF_SECOND_MACHINE:29015 .

If you have more than two servers, each additional server only needs to join any one of the servers already in the cluster, and they will all find each other.

Problems... and their implications for dockerizing

If you want to put your RethinkDB cluster in Docker there are a few snags, especially if you want it to run in an orchestrated Docker environment like Docker Cloud, Swarm or Rancher.

Startup command

One big challenge is the start command of each RethinkDB container. Say that you have five containers that go into your cluster. The first one can just start normally and the subsequent four can --join the first one. The problem arises if the first one dies and one of the other four restarts. If it tries to join server one it will not find it, and it will either die or create its own little one-server cluster.

Multiple joins

Fortunately, RethinkDB supports multiple --join arguments at startup, so each server container can have a --join for each of the other four server containers in the cluster. This means defining one service per RethinkDB container, but that will have to be fine for now. The command for server one would be:

$:>rethinkdb --bind all --server-name rethinkdb1 --join rethinkdb2 --join rethinkdb3 --join rethinkdb4 --join rethinkdb5

Not the prettiest solution, but it would do for now, if it weren't for the fact that it doesn't work either. Not quite.

Joining non-existing hosts

When RethinkDB starts with --join arguments it will not fail if none of the servers respond immediately. As soon as one of them replies, the cluster is up. However, if you try to join a server to a hostname that does not exist, the server will exit with an error.

You could use IPs, but how would you find out which IPs the RethinkDB containers will be assigned? And when containers restart, their IPs will be reassigned. So the sensible idea is to use the link names provided by the Docker network defined in docker-compose.yml or set up by your orchestration, right? Like I did in the example in the section above.

Well... not exactly. The problem is that when a container dies, its hostname is removed from the internal DNS of the Docker network. This means a server starting with --join arguments for the four other servers in the cluster will fail if any one of those four is down at that moment. You could change the startup command of the container, but that sort of defeats the whole purpose of orchestration, automatic restarts etc.

I made a script

I did not find any super-simple solution to this problem. I did however make a custom startup script for the RethinkDB Docker image that takes a bunch of hostnames of other members of the cluster and then checks if they exist before adding --join arguments when starting RethinkDB.

The script and a short demonstration are in this GitHub repo:

https://github.com/osirisguitar/rethinkdb-cluster-docker
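The repo has the actual script. Purely as an illustration of the idea, here is a minimal sketch of how such a startup script could work. It is not the code from the repo: the function names are made up for this example, it assumes getent is available in the image, and it assumes the peers listen on the default cluster port 29015.

```shell
#!/bin/sh
# Illustrative sketch only, not the script from the linked repo.
# host_resolves and build_join_args are hypothetical helper names.

# Succeeds if the hostname currently resolves in the container's DNS.
# Assumption: getent is available in the image.
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Prints a "--join host:29015" argument for every peer that resolves,
# silently skipping peers whose hostname does not exist right now.
build_join_args() {
    args=""
    for peer in "$@"; do
        if host_resolves "$peer"; then
            args="$args --join $peer:29015"
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${args# }"
}

# Peers are passed as script arguments, e.g.: ./start.sh rethinkdb2 rethinkdb3
# The command is only printed here; a real entrypoint would use exec instead of echo.
echo rethinkdb --bind all $(build_join_args "$@")
```

With this approach a container whose peers are all down simply starts without any --join arguments instead of exiting on an unknown hostname.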

Next steps

Here are some future improvements:

Make a Docker image and publish it to Docker Hub

Create a way to use just one service definition for all RethinkDB containers. Neater, less typing and, most importantly, scalable within your orchestration.

Other solutions 1: join by proxy

You can join separate RethinkDB servers by launching a new server in proxy mode. In proxy mode the server just relays queries to other servers, but it does join those servers together in a cluster (if you provide --join arguments). You can also use this method to join two separate clusters into one. It's all described here:

http://blog.hiphipjorge.com/connecting-2-rethinkdb-clusters-with-proxy-node/

I didn't really like this solution since it requires launching the proxy whenever one of the servers is restarted: not very orchestratable, does not respond well to autorestart etc.

Other solutions 2: bait and switch with Swarm

With Swarm you can set up a primary server, then add secondary servers that join the primary, and finally kill the primary and recreate it with joins to the secondary ones.

https://stefanprodan.com/2016/rethinkdb-cluster-docker-swarm-mode/

A bit convoluted, but it works. What's really nice is that you can scale your services with the normal scale commands in Swarm.

However, I didn't like this solution for two reasons: