In Parts One through Three, we laid out the orchestration concerns of PostgreSQL on Kubernetes: how the application should behave, and then how to coordinate that behavior the Kubernetes way. The underlying (or overarching) pattern is to represent the behavior of the application as a set of agents, each working to move its component toward a centrally defined desired state. Kubernetes itself works this way.

Understanding the basic framework for orchestration is a big step towards being able to automate any distributed application. In this final post, we’ll look at a few of the remaining details.

Backing Up

The primary (master) PostgreSQL instance continuously writes chunks of write-ahead log (WAL) to a backup volume. Periodically, it also writes full backups. There are a couple of requirements for how this volume is managed:

The volume isn’t unbounded; eventually it will run out of space.

The volume must always contain enough information to bring up a new PostgreSQL instance.

One answer is to have an additional worker (perhaps a separate Pod) that throws out WAL older than the latest full backup and moves old full backups to long-term storage. As long as the backup volume always contains at least the latest full backup and the WAL since then, there’s enough information to bring a new instance up to date.
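Here’s a rough Go sketch of what such a retention worker might look like. The paths (/backups/base, /backups/wal), the modification-time comparison, and the ten-minute loop are all illustrative assumptions, and moving old full backups to long-term storage is left out for brevity.

```go
// retention.go - a minimal sketch of a backup retention worker. The directory
// layout and time-based comparison are assumptions, not prescribed by the series.
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"
)

// latestFullBackupTime returns the modification time of the newest full backup.
func latestFullBackupTime(baseDir string) (time.Time, error) {
	entries, err := os.ReadDir(baseDir)
	if err != nil {
		return time.Time{}, err
	}
	var newest time.Time
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return time.Time{}, err
		}
		if info.ModTime().After(newest) {
			newest = info.ModTime()
		}
	}
	return newest, nil
}

// pruneWAL deletes WAL segments older than the newest full backup, so the
// volume always retains the latest base backup plus the WAL written since.
func pruneWAL(walDir string, cutoff time.Time) error {
	entries, err := os.ReadDir(walDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return err
		}
		if info.ModTime().Before(cutoff) {
			if err := os.Remove(filepath.Join(walDir, e.Name())); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	for {
		cutoff, err := latestFullBackupTime("/backups/base")
		if err != nil {
			log.Printf("could not inspect full backups: %v", err)
		} else if err := pruneWAL("/backups/wal", cutoff); err != nil {
			log.Printf("could not prune WAL: %v", err)
		}
		time.Sleep(10 * time.Minute)
	}
}
```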

Monitoring

Both users and the application orchestration layer need to keep an eye on the health of our PostgreSQL instances. For example, when the primary fails, the controller needs to start the failover process, which means it first needs to know that the primary has failed.

One approach might be to use Kubernetes’ built-in Pod health checks (liveness and readiness probes). If our PostgreSQL instance Pod implements a health check, the controller can see in the Kubernetes API when the Pod becomes unhealthy. For more granular monitoring, the controller can open a direct connection to each PostgreSQL instance Pod and stream whatever monitoring information we require. Of course, there are more complex options as well, depending on your needs.
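As a rough illustration, a health-check sidecar could be as small as the following Go sketch. The connection string, port 8080, and the /healthz path are assumptions on my part; a Pod’s httpGet liveness or readiness probe would then point at that endpoint.

```go
// healthz.go - a minimal health-check sidecar sketch, assuming the PostgreSQL
// instance is reachable on localhost:5432 (credentials and ports are illustrative).
package main

import (
	"database/sql"
	"net/http"

	_ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
	db, err := sql.Open("postgres", "host=localhost port=5432 user=postgres sslmode=disable")
	if err != nil {
		panic(err)
	}

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// A trivial query: if it succeeds, the instance is up and accepting connections.
		var one int
		if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	http.ListenAndServe(":8080", nil)
}
```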

Load Balancing & Failover

While there’s only one primary instance to handle write queries, PostgreSQL can offload read-only queries to any number of standby instances. To route incoming requests, we can use two Kubernetes Services: one for the primary and one for the standbys. When a standby is promoted to primary, the controller simply updates the Pod’s labels so it belongs to the primary Service rather than the standby Service. Components that use the PostgreSQL Services can rely on the Services’ DNS names, so they never need to know that a change has occurred.
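Here’s a rough sketch of that relabeling step using client-go. The namespace, Pod name, and the role=primary/standby label scheme are illustrative assumptions, not something prescribed by Kubernetes or this series.

```go
// promote.go - a minimal sketch of relabeling a promoted standby so the
// primary Service's selector picks it up (names and labels are illustrative).
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Flip the Pod's role label: the primary Service's selector now matches it,
	// and the standby Service's selector no longer does.
	patch := []byte(`{"metadata":{"labels":{"role":"primary"}}}`)
	_, err = clientset.CoreV1().Pods("default").Patch(
		context.TODO(), "postgres-standby-0", types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		log.Fatal(err)
	}
}
```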

Distribution Across Failure Zones

Distributing PostgreSQL instances across failure zones is very straightforward. In fact, it doesn’t require any custom code; it’s handled by a built-in Kubernetes feature, Pod affinity and anti-affinity. I’ve written a separate tutorial on exactly that subject. Read it here. The short version is that you declare a Pod anti-affinity, keyed on the failure zone, for each PostgreSQL instance Pod. When the scheduler chooses where to run them, they repel each other into different zones.
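For reference, here’s roughly what that declaration looks like expressed with the Kubernetes Go API types; in a Pod manifest, the same fields live under spec.affinity. The app=postgres label is an assumption, and topology.kubernetes.io/zone is the standard zone topology key.

```go
// affinity.go - a minimal sketch of a zone anti-affinity declaration for the
// PostgreSQL Pods (the "app: postgres" label is illustrative).
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func postgresAntiAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			// Require that no two Pods labeled app=postgres land in the same zone.
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "postgres"},
				},
				TopologyKey: "topology.kubernetes.io/zone",
			}},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", postgresAntiAffinity())
}
```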

Want more? Ask away!

Automating a complex application like PostgreSQL is a large topic. I’ve covered what I believe are the more interesting and valuable pieces. If you want to know more about anything in this series (or anything I’ve left out), leave a comment!

I’m excited to hear what you all are interested in learning about, and I’d love to write more!