Design A Highly Available PostgreSQL Cluster With Patroni In GCP — Part 1
Bhuvanesh · Sep 16, 2019

Patroni is the new-age HA solution for PostgreSQL, with cloud-native features and advanced options for failover and failback. PostgreSQL is one of the world's top open-source databases, but the worst part is that it has no built-in automatic failover. I still remember my old days fighting with repmgr for HA; it was very hard to add a failed master back to the cluster. Later, pg_rewind made that process much simpler. The world is moving fast to adopt the cloud, so vintage HA solutions like DRBD, corosync + pacemaker, repmgr, and a few other technologies are starting to look out of date.

Problems in traditional replication:

The default streaming replication does not handle failover on its own.

Disk-based replication sometimes ends in data corruption.

External failover tools take additional effort to keep them up, running, and monitored.

Automatically adding a failed node back to the cluster is a nightmare (though scripting geeks can do it).

Handling split-brain situations is hard.

The Bot approach:

Credit: Zalando

I learned this concept from a conference talk by Zalando. PostgreSQL takes care of its own processes, but we need something to monitor the PostgreSQL service and its replication status, and to store that state in a distributed configuration store (DCS) such as etcd, ZooKeeper, or Consul. PostgreSQL can't talk to the DCS directly, so a bot runs alongside each node. If PostgreSQL goes down, the bots elect a new master; and if the old master comes back up, its bot rejoins it to the cluster as a replica. Here, the bot is Patroni. Patroni is the successor of Compose's Governor.
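The heart of that bot approach is leader election through the DCS: every bot races to create a leader key with a TTL, and exactly one wins. Here is a minimal sketch of the idea in Python, using a toy in-memory store in place of a real DCS (the key path roughly mirrors Patroni's default etcd namespace, but this is an illustration, not Patroni's actual code):

```python
import time

class SimDCS:
    """Toy in-memory stand-in for etcd/ZooKeeper/Consul (illustration only)."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def create_if_absent(self, key, value, ttl):
        """Atomic 'create if not exists' -- the primitive leader election needs."""
        now = time.monotonic()
        cur = self.store.get(key)
        if cur is not None and cur[1] > now:
            return False  # someone else holds an unexpired leader key
        self.store[key] = (value, now + ttl)
        return True

def try_become_leader(dcs, node_name, ttl=30):
    """Each bot races to create the leader key; exactly one node wins."""
    return dcs.create_if_absent("/service/pgcluster/leader", node_name, ttl)

dcs = SimDCS()
print(try_become_leader(dcs, "node1"))  # True  -- node1 takes the leader key
print(try_become_leader(dcs, "node2"))  # False -- leader key already held
```

The TTL is what makes failover automatic: a healthy leader keeps refreshing its key, and if it dies, the key expires and the surviving bots race to take it.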

The Patroni:

Credit: Zalando

It's developed by Zalando.

Completely Python based and 100% Open source.

Maintains the cluster state in a DCS (etcd, ZooKeeper, or Consul).

REST API — get the state of a node, trigger switchovers, and more.

When adding a node to the replication, you can use a custom method to achieve this (barman, WAL-E, or your own scripts).

You can prevent some nodes (for example, nodes you want only for reporting) from becoming the master.

You can decide, when adding a new node to the replication, which node to sync the data from.

Hooks — trigger custom actions on bootstrap, or when PostgreSQL starts, stops, or fails over.

Manual switchover during a maintenance window, and integration with HAProxy.

And a lot more features.
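To make the REST API concrete: Patroni's default port is 8008, and health-check endpoints like /master and /replica answer with 200 or 503 depending on the node's role. The sketch below spins up a tiny mock server that imitates that behavior (this is a stand-in for illustration, not Patroni itself):

```python
# Toy stand-in for Patroni's REST API health checks (default port 8008):
# GET /master returns 200 on the leader and 503 elsewhere; /replica is the inverse.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen

ROLE = "master"  # pretend this node currently holds the leader lock

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ok = (self.path == "/master" and ROLE == "master") or \
             (self.path == "/replica" and ROLE == "replica")
        self.send_response(200 if ok else 503)
        self.end_headers()
    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), Handler)  # ephemeral port for the demo
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def probe(path):
    """Return the HTTP status code a load balancer's health check would see."""
    try:
        return urlopen(f"http://127.0.0.1:{port}{path}").status
    except HTTPError as e:
        return e.code

print(probe("/master"))   # 200 -- this "node" is the leader
print(probe("/replica"))  # 503
```

Against a real cluster, the equivalent probe is simply a GET to http://node:8008/master from your load balancer's health check.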

ETCD:

etcd is a distributed, consistent key-value store. We'll use etcd to keep the PostgreSQL cluster's health, node state, and other information about the cluster. The other important thing is that etcd itself must also be highly available. So you can either run it on a GKE cluster, or create a cluster on Compute Engine and put a load balancer on top of it. Unfortunately, GCP's HTTP load balancer only supports port 80 (and a handful of other ports) on the front end, so for etcd's client port you would use a TCP load balancer instead. But no worries, it won't affect anything.

Credit: Zalando

Do we need HAProxy:

To use Patroni, we need at least 3 nodes. In case of failover, we would normally need a virtual (or floating) IP address so the application can keep accessing the database. To remove this dependency, HAProxy can sit in front and always talk to the master node. Here is how HAProxy knows who the master is: in HAProxy, we configure a health-check port, and 8008 is the default Patroni REST API port. HAProxy sends a GET request to all the nodes in the cluster on port 8008. Only the master node returns 200 OK; the other nodes return 503 Service Unavailable.

But as I mentioned above, it's a cloud era. All the cloud providers have their own load balancers, which work better than HAProxy in terms of scalability, availability, and customization. Patroni's REST API makes it possible for any load balancer to determine the role of a node: we can send a GET request to the /master and /replica URLs on each node. If the node is the master, /master returns 200 OK; on the replicas it returns 503. Similarly, /replica returns 200 OK on replicas and 503 on the master.
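Whatever sits in front (HAProxy or a cloud LB), its routing decision reduces to the same logic: send traffic only to the node whose /master probe answered 200. A small sketch of that decision, with probe results supplied as plain data (the IPs are made up for illustration):

```python
def pick_master(health):
    """health maps node address -> HTTP status from GET /master on port 8008."""
    masters = [node for node, status in health.items() if status == 200]
    if len(masters) > 1:
        # A sane cluster never has two nodes answering 200 on /master.
        raise RuntimeError(f"split brain: multiple masters {masters}")
    return masters[0] if masters else None

print(pick_master({"10.0.0.1": 200, "10.0.0.2": 503, "10.0.0.3": 503}))  # 10.0.0.1
print(pick_master({"10.0.0.1": 503, "10.0.0.2": 503}))                   # None (failover in progress)
```

During a failover there is a brief window where every node answers 503, which the LB sees as "no healthy backend" until the new leader starts returning 200.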

GCP Internal Load Balancer:

This feature helps us deploy GCP TCP load balancers that talk to the master and the replicas separately. We can create 2 load balancers.

Writer LB — add all the PostgreSQL nodes under this LB, and use /master as the health-check path.

Reader LB — add all the PostgreSQL nodes under this LB, and use /replica as the health-check path.

Let's say you have a 3-node cluster; then the Writer LB shows 1/3 nodes healthy, while the Reader LB shows 2/3 nodes healthy.
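Those "1/3 healthy" and "2/3 healthy" figures fall straight out of the role split, since each LB's health check only passes on nodes matching its path. A trivial check of the arithmetic:

```python
def lb_health(roles):
    """roles maps node -> 'master' or 'replica'.
    Returns (writer_healthy, total), (reader_healthy, total)."""
    n = len(roles)
    writer = sum(1 for r in roles.values() if r == "master")
    reader = sum(1 for r in roles.values() if r == "replica")
    return (writer, n), (reader, n)

# A 3-node cluster: one master, two replicas.
print(lb_health({"node1": "master", "node2": "replica", "node3": "replica"}))
# ((1, 3), (2, 3))  -> Writer LB 1/3 healthy, Reader LB 2/3 healthy
```

After a failover the counts stay the same; only which node counts toward the Writer LB changes.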

Standby Cluster:

Credit: Zalando

Patroni is good for HA, but if you have DR servers in a different region, or a reporting replica with minimal hardware, automatic failover should not promote those nodes to master. The same goes for any delayed replica. For these cases, Patroni lets us define which nodes are eligible to become the master.
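Patroni expresses this through per-node tags in its configuration file. A sketch of the relevant fragment for a reporting or DR replica (node name is made up; adapt to your cluster):

```yaml
# patroni.yml on a reporting/DR replica (sketch only)
name: reporting-replica
tags:
  nofailover: true     # never promote this node to master
  noloadbalance: true  # /replica returns 503, so the reader LB skips it too
```

With nofailover set, the node keeps replicating but is excluded from leader elections, which is exactly what you want for delayed or under-provisioned replicas.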

Final Architecture:

In GCP, I have designed the architecture below for Patroni.