In Part One, we outlined how PostgreSQL Pods in Kubernetes need to behave in a production-ready PostgreSQL installation. Today, we’ll explore how these behaviors can be coordinated.

At a high level, the PostgreSQL application is orchestrated by a control layer. The control layer is composed of a central store (the “source of truth”) and distributed agents that make decisions based on the central store's state and feed state changes back into it.

The Controller

Just like parts of Kubernetes itself, a PostgreSQL application has a centralized controller that represents how the application acts as a whole. Application-level behaviors such as scaling, upgrading to a new PostgreSQL version, choosing a new primary (master) to start failover, and the initial deployment itself are all the responsibility of the controller.

If you want to reason about a distributed PostgreSQL application like it’s a single process, that’s what the controller lets you do.

What’s under the hood?

From the outside, the controller makes PostgreSQL look like a single process, but inside it’s more complicated.

First of all, the controller itself is more than just a single process. If the controller fails, a new instance is created to replace it. One way to do this is with a single-replica Kubernetes Deployment resource. Kubernetes will do its best to make sure there’s always one controller running. Additionally, where a single process keeps its state in memory, the controller uses the central store, typically a distributed database running across multiple machines.
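As a rough illustration of the single-replica Deployment idea, here is a minimal sketch that builds such a manifest as a Python dictionary and serializes it to JSON. The name `postgres-controller` and the container image are placeholders invented for this example, not anything from a real project; only the Deployment fields themselves (`apiVersion`, `replicas`, `selector`, `template`) come from the Kubernetes API.

```python
import json

# Hypothetical label and names, chosen only for this illustration.
LABELS = {"app": "postgres-controller"}

controller_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "postgres-controller"},
    "spec": {
        # One replica: Kubernetes replaces the controller Pod if it fails.
        "replicas": 1,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "controller",
                        # Placeholder image reference.
                        "image": "example.com/postgres-controller:latest",
                    }
                ]
            },
        },
    },
}

# Kubernetes accepts manifests in JSON as well as YAML.
manifest_json = json.dumps(controller_deployment, indent=2)
```

Printing `manifest_json` and piping it to `kubectl apply -f -` would be one way to submit it, assuming the placeholder image were replaced with a real one.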

How does a distributed system like the combination of controller and central store present a unified application-level interface for PostgreSQL? The answer lies in categorizing the possible states of the PostgreSQL application. For example:

Pre-deployment: PostgreSQL isn’t running yet.

Healthy: Running with 1 primary and N standbys.

Failover Required: Running with N standbys but no primary.
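To make the categorization concrete, here is a small sketch that maps instance counts to the three example states. The `AppState` enum and the `classify` function are names made up for this post, and a real controller would likely track more states than these three.

```python
from enum import Enum


class AppState(Enum):
    PRE_DEPLOYMENT = "Pre-deployment"
    HEALTHY = "Healthy"
    FAILOVER_REQUIRED = "Failover Required"


def classify(primaries: int, standbys: int) -> AppState:
    """Map instance counts to one of the three example states."""
    if primaries == 0 and standbys == 0:
        # Nothing is running at all.
        return AppState.PRE_DEPLOYMENT
    if primaries == 1:
        # One primary and any number of standbys, including zero.
        return AppState.HEALTHY
    if primaries == 0:
        # Standbys exist, but there is no primary to follow.
        return AppState.FAILOVER_REQUIRED
    # More than one primary is not covered by the example states.
    raise ValueError("unexpected number of primaries: %d" % primaries)
```

For instance, `classify(0, 2)` returns `AppState.FAILOVER_REQUIRED`, while `classify(1, 0)` still counts as `AppState.HEALTHY`.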

Consider a scenario where the primary fails and the controller must select a standby to promote. The state changes might look like this:

Healthy: 1 primary and 2 standbys

Failover Required: 0 primary and 2 standbys

Healthy: 1 primary and 1 standby

Healthy: 1 primary and 2 standbys

If a disaster kills the whole PostgreSQL deployment, we might see this:

Healthy: 1 primary and 2 standbys

Pre-deployment: PostgreSQL isn’t running any more.

Healthy: 1 primary and 0 standbys

Healthy: 1 primary and 2 standbys
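Both sequences above can be reproduced by replaying a list of events against a pair of instance counts. The sketch below is purely illustrative: the event functions (`primary_fails`, `promote_standby`, and so on) are hypothetical names, and each one simply returns the counts we would expect to observe after that event.

```python
def label(primaries, standbys):
    """Render instance counts in the same style as the sequences above."""
    if primaries == 0 and standbys == 0:
        return "Pre-deployment"
    state = "Healthy" if primaries == 1 else "Failover Required"
    noun = "standby" if standbys == 1 else "standbys"
    return f"{state}: {primaries} primary and {standbys} {noun}"


# Each event maps (primaries, standbys) to the counts after the event.
def primary_fails(p, s):
    return (0, s)


def promote_standby(p, s):
    return (1, s - 1)


def disaster(p, s):
    return (0, 0)


def deploy_primary(p, s):
    return (1, 0)


def add_standbys(n):
    return lambda p, s: (p, s + n)


def replay(start, events):
    """Apply events in order and record the state label after each one."""
    counts = start
    trace = [label(*counts)]
    for event in events:
        counts = event(*counts)
        trace.append(label(*counts))
    return trace


failover = replay((1, 2), [primary_fails, promote_standby, add_standbys(1)])
recovery = replay((1, 2), [disaster, deploy_primary, add_standbys(2)])
```

Here `failover` walks through the four states of the first scenario, and `recovery` walks through the four states of the disaster scenario.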

These examples raise a few important questions:

How does the controller decide what to do in a given state?

How is the state populated, and what external information is it based on?

How are these states represented in the store?

How does the controller decide what to do?

Kubernetes follows a pattern of keeping separate spec and status fields. The spec represents the desired state, while the status is the actual state. A controller in Kubernetes acts to push the status closer to the spec.

The PostgreSQL controller does the same. The central store holds the desired state, as specified by the application administrator, and the actual state, as collected by the controller and other agents (e.g. PostgreSQL Pod sidecars). The controller operates as a loop, continually performing actions to unify the actual state with the desired state.
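Here is a minimal sketch of such a loop. Everything in it is an assumption made for illustration: `ClusterState` reduces the desired and actual state to two counters, the action names are invented, and `apply_action` simulates the effect of each action instead of talking to a real store or cluster.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterState:
    primaries: int
    standbys: int


def next_action(desired: ClusterState, actual: ClusterState) -> Optional[str]:
    """Pick one action that moves the actual state toward the desired state."""
    if actual.primaries < desired.primaries:
        # Prefer promoting an existing standby over starting from scratch.
        return "promote_standby" if actual.standbys > 0 else "create_primary"
    if actual.standbys < desired.standbys:
        return "create_standby"
    if actual.standbys > desired.standbys:
        return "remove_standby"
    return None  # Actual already matches desired.


def apply_action(action: str, actual: ClusterState) -> None:
    """Simulate the action's effect; a real controller would observe it instead."""
    if action == "promote_standby":
        actual.primaries += 1
        actual.standbys -= 1
    elif action == "create_primary":
        actual.primaries += 1
    elif action == "create_standby":
        actual.standbys += 1
    elif action == "remove_standby":
        actual.standbys -= 1


def reconcile(desired: ClusterState, actual: ClusterState, max_steps: int = 10) -> List[str]:
    """Run the loop until the states match or max_steps is reached."""
    actions = []
    for _ in range(max_steps):
        action = next_action(desired, actual)
        if action is None:
            break
        apply_action(action, actual)
        actions.append(action)
    return actions
```

With a desired state of 1 primary and 2 standbys and an actual state of 0 primaries and 2 standbys, `reconcile` returns `["promote_standby", "create_standby"]`, which mirrors the failover scenario described earlier.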

How does the controller actually do things?

The controller triggers changes to the rest of the application by modifying the central store. In order to create a new PostgreSQL instance, the controller creates a PostgreSQL Pod in the Kubernetes API (a kind of central store). In order to promote a standby to primary, the controller updates that instance’s desired state to say “primary”. The sidecar for that PostgreSQL instance actually performs the promotion.
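The division of labor between controller and sidecar might be sketched like this. `CentralStore` is an in-memory stand-in for whatever store is actually used, and `controller_promote`, `sidecar_sync`, and the `promote` callback are hypothetical names; in a real sidecar the callback might wrap something like `pg_ctl promote`.

```python
class CentralStore:
    """In-memory stand-in for the central store (illustration only)."""

    def __init__(self):
        self.desired_role = {}  # instance name -> role the controller wants
        self.actual_role = {}   # instance name -> role reported by the sidecar


def controller_promote(store, instance):
    """The controller only records intent; it does not touch PostgreSQL."""
    store.desired_role[instance] = "primary"


def sidecar_sync(store, instance, promote):
    """One sidecar pass: act on a desired/actual mismatch, then report back.

    `promote` is a caller-supplied callback that performs the promotion.
    Returns True if a promotion was performed on this pass.
    """
    desired = store.desired_role.get(instance)
    actual = store.actual_role.get(instance)
    if desired == "primary" and actual != "primary":
        promote()
        # Feed the new actual state back into the store.
        store.actual_role[instance] = "primary"
        return True
    return False
```

The point of the split is that the controller never needs direct access to a PostgreSQL process: it changes desired state, and the agent closest to the instance makes it true.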

Coming Up Next

It’s a complex design problem to figure out how to represent the state of a distributed application. It deserves special attention. The next post in this series will address the questions we left open today:

How is the PostgreSQL application state represented in the central store?

How is this representation kept in sync with the real-world state of the application?

Thanks for reading, and I’ll see you all in Part Three!