Get blog posts to your inbox. Facebook Twitter Linkedin Link Email

Why is a multi-cloud database important? Suppose you’re operating an application that receives the bulk of its traffic from the east coast of the United States. You want to minimize latency by locating your database on the eastern seaboard, but it’s important that availability doesn’t suffer as a result.

Even with a distributed database like CockroachDB, if you locate your database nodes in just one or two data centers, you’re vulnerable to an outage. You’ll need at least a majority of replicas (i.e., two replicas, under the default three-way replication) online to serve traffic, so you’ll need nodes in at least three data centers to survive a one data center failure.

How often do data centers fail? Exact statistics are hard to come by, but examples of major outages abound: last year, many services in an entire Amazon Web Services region were unavailable for several hours, and an entire Google Compute Engine region was offline for an hour.

Unfortunately, most cloud providers offer only one data center on the east coast; those that offer two data centers in the “eastern United States” tend to either colocate them in the same state (see Azure), putting you at risk in the case of a major power outage or natural disaster, or put the second data center in the midwest (see Amazon Web Services), meaning your latency would suffer.

But with CockroachDB, you aren’t bound by the data centers of just one cloud provider! Creating a multi-cloud deployment is a breeze. Here’s how.

For the sake of this example, we’ll use the following data centers[^1]:

Digital Ocean, region nyc1

Amazon Web Services, region us-east-1

Google Compute Engine, zone us-east1

That gives us one data center in New York, one data center in Virginia, and one data center in South Carolina, respectively. Not bad for creating a cluster out of stock parts.

Note: This is not an endorsement of these particular cloud platforms. Choosing a cloud provider requires evaluating hardware performance, price, and geographic location for your particular workload.

Get started by launching one VM in each of the three data centers listed above. Then launch an additional VM in one of the data centers, for a total of four VMs. Why the extra VM? Running a CockroachDB cluster with less than three nodes will trigger warnings about “underreplication,” and later we’ll be taking one of the four VMs offline to simulate a data center failure.

You might find the following guides helpful:

CockroachDB works with any recent x64 Linux distribution. If you don’t have a preference, we recommend using the latest long-term support (LTS) release of Ubuntu. Be sure to configure networking to allow inbound and output TCP connections on both port 26257, for inter-node communication and SQL access, and port 8080, for the web admin UI.

We have more detailed walkthroughs in our deployment guides:

Once you have VMs on each cloud provider up and running, it’s time to install and boot Cockroach. You can follow our comprehensive installation instructions, or, as a quickstart, you can run the following commands on each VM, replacing PUBLIC-IP and DATA-CENTER appropriately:

$ wget -qO- https://binaries.cockroachdb.com/cockroach-v1.1.3.linux-amd64.tgz | sudo tar -xvz -C /usr/local/bin --strip=1 $ cockroach start --insecure --background \ --advertise-host PUBLIC-IP-SELF \ --join PUBLIC-IP-1,PUBLIC-IP-2,PUBLIC-IP-3,PUBLIC-IP-4 \ --locality data-center=DATA-CENTER

Remember: Real production deployments must never use --insecure! Attackers can freely read and write data on insecure cross-cloud deployments.

You can choose any value for DATA-CENTER, provided that nodes in the same data center use the same value. For example, you might launch two Digital Ocean nodes, each with --locality data-center=digital-ocean . CockroachDB uses this locality information to increase “replica diversity,” that is, preferring to store copies of data on machines in different localities rather than machines in the same locality.

From your local machine, initialize the cluster. You can use any of the nodes’ public IPs in this command.

$ cockroach init --host=PUBLIC-IP --insecure

That’s it! You’ve created a cross-cloud CockroachDB cluster. If you take a peek at the admin UI, served at port 8080 on any of the nodes, you should see that four nodes are connected:

Now, let’s insert some data into the cluster. CockroachDB ships with some example data:

$ cockroach gen example-data | cockroach sql --host=PUBLIC-IP --insecure

Here’s a quick tour of this data:

$ cockroach sql --host=PUBLIC-IP --insecure root@52.91.188.221:26257/> SHOW DATABASES; +--------------------+ | Database | +--------------------+ | crdb_internal | | information_schema | | pg_catalog | | startrek | | system | +--------------------+ > SHOW TABLES FROM startrek; +----------+ | Table | +----------+ | episodes | | quotes | +----------+ > SELECT * FROM startrek.episodes ORDER BY random() LIMIT 5; +----+--------+-----+----------------------+----------+ | id | season | num | title | stardate | +----+--------+-----+----------------------+----------+ | 33 | 2 | 4 | Mirror, Mirror | NULL | | 56 | 3 | 1 | Spock's Brain | 5431.4 | | 35 | 2 | 6 | The Doomsday Machine | 4202.9 | | 72 | 3 | 17 | That Which Survives | NULL | | 55 | 2 | 26 | Assignment: Earth | NULL | +----+--------+-----+----------------------+----------+

Looks like our CockroachDB cluster is a Star Trek whiz. We can verify that our episode table is safely stored on at least three nodes:

> SHOW TESTING_RANGES FROM TABLE startrek.episodes; +-----------+---------+----------+--------------+ | Start Key | End Key | Replicas | Lease Holder | +-----------+---------+----------+--------------+ | NULL | NULL | {1,2,3} | 1 | +-----------+---------+----------+--------------+ (1 row)

CockroachDB automatically splits data within a table into “ranges,” which are then copied to three “replicas” for redundancy in case of node failure. In this case, the entire table fits into one range, as indicated by the Start Key and End Key columns. The Replicas column indicates this range is replicated onto node 1, node 2, and node 3[^2].

Note: If your Replicas column only lists one node, your nodes likely can’t communicate over the network. Remember, port 26257 needs to allow both inbound and outbound connections. Consult our Cluster Setup Troubleshooting guide, or ask for help in our Gitter channel or on our forum.

In the cluster shown here, nodes 1 and 4 are in the same data center. Since the table is stored on nodes 1, 2, and 3, that means we have one copy of the table in each of the data centers, just as we’d hoped! Fair warning: node numbers are not assigned deterministically and will likely differ in your cluster.

Now, let’s simulate a full data center failure. Power off the VM in one of the data centers with only one VM. In this example, we’ll power off node 3. Make sure your SQL session is connected to a node besides the node you’re taking offline. (In a real deployment, to avoid this point of failure, you’d use a load balancer to automatically reroute traffic to a live node.) First, notice that even in the moment after node 3 goes offline, querying the episodes table still succeeds!

> SELECT * FROM startrek.episodes ORDER BY random() LIMIT 5; +----+--------+-----+-----------------------------+----------+ | id | season | num | title | stardate | +----+--------+-----+-----------------------------+----------+ | 27 | 1 | 27 | The Alternative Factor | 3087.6 | | 59 | 3 | 4 | And the Children Shall Lead | 5029.5 | | 18 | 1 | 18 | Arena | 3045.6 | | 13 | 1 | 13 | The Conscience of the King | 2817.6 | | 21 | 1 | 21 | The Return of the Archons | 3156.2 | +----+--------+-----+-----------------------------+----------+

When a node dies, CockroachDB immediately and automatically reroutes traffic to one of the remaining replicas. After five minutes, by default, the cluster will mark a down node as permanently “dead” and move all of its replicas to other nodes. In this example, node 3 is no longer in the replica set:

> SHOW TESTING_RANGES FROM TABLE startrek.episodes; +-----------+---------+----------+--------------+ | Start Key | End Key | Replicas | Lease Holder | +-----------+---------+----------+--------------+ | NULL | NULL | {1,2,4} | 2 | +-----------+---------+----------+--------------+

Note that the replica set now includes two nodes in the same data center, node 1 and node 4. With only two data centers, CockroachDB will reluctantly put two copies of the data in the same data center, as a total of three copies is better than only two copies. If the third data center does come back online, or if another data center is connected to the cluster, CockroachDB will again balance replicas across data centers.

That’s all there is to it! With very little configuration, we have a multi-cloud deployment of CockroachDB that checks all the boxes: nodes in three data centers along the east coast to minimize latency without compromising availability, even in the face of total data center failure.

If you’re not yet ready to commit to a permanent multi-cloud deployment, CockroachDB’s multi-cloud support can still be instrumental in providing operational flexibility. Check out how CockroachDB unlocks zero-downtime migrations from one cloud to another.

Illustration by Zoë van Dijk

[^1]: Most cloud providers charge an egress bandwidth fee for network traffic that leaves the data center. Be sure to factor this additional cost into account before launching a multi-cloud cluster.

[^2]: Technically, the Replicas column lists store IDs, not node IDs. In the default configuration we’ve described in this tutorial, however, every node has exactly one store, so store IDs correspond one-to-one to node IDs.