Why Do I Need PostgreSQL on Kubernetes?

I’ve decided to migrate GolangCI from Heroku to Kubernetes. Before that, GolangCI used Heroku PostgreSQL as its primary database. The popular way to migrate is to use a managed cloud PostgreSQL service like AWS Aurora or Google Cloud SQL: for example, Algolia migrated from Heroku to GKE and used Google Cloud SQL. But I needed a fully controlled solution, because GolangCI has an on-premise version that should work on a local Kubernetes cluster without access to AWS or Google Cloud.

With Helm I Will Do It in 5 Minutes

I was new to Kubernetes, so at first I tried to install PostgreSQL with `helm install stable/postgresql`. This Helm chart uses a Bitnami image, and Bitnami has an article about using this image to run production-ready PostgreSQL on Kubernetes.

Soon I realized that this installation doesn’t have a production-ready config and that we need to install it like this:

```sh
helm install --name pg --namespace postgres -f ./values-production.yaml stable/postgresql
kubectl -n postgres scale statefulset my-postgresql-slave --replicas=3
```

Where Is the Automatic Master Failover?

An HA PostgreSQL setup should automatically survive and recover when the node with the database master (primary) crashes. With the pure Kubernetes solution from the previous step, it works in the following way:

1. A PostgreSQL master pod or node crashes.
2. Kubernetes runs a healthcheck every 10 seconds by default. We can tune `periodSeconds` and run a healthcheck every second (it’s the minimum interval), so Kubernetes detects the crash one second after it happens; a probe sketch follows this list.
3. Kubernetes schedules a new pod for the PostgreSQL master server. If a node crashed, it may need to start a new node first; if just a pod failed, Kubernetes can simply restart the pod.
4. The PostgreSQL master starts.
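For illustration, here is a minimal sketch of such an aggressive liveness probe inside a pod spec; the probe command and thresholds are my assumptions, not defaults of any chart:

```yaml
# Sketch only: tighten the liveness probe so a dead PostgreSQL container
# is detected in about a second. Command and thresholds are illustrative.
livenessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres", "-h", "127.0.0.1"]
  initialDelaySeconds: 30
  periodSeconds: 1     # 1 second is the minimum allowed interval
  failureThreshold: 1  # consider the container dead after a single failed check
```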

This pure Kubernetes HA solution has the advantage of a simple setup, but it has the following drawbacks:

- we have downtime between steps 1 and 4, and the downtime can take minutes if Kubernetes needs to restart a node;
- there is no stable and trusted cross-zone availability: the PostgreSQL master runs in a StatefulSet with persistent volumes.

By default, persistent volumes exist only in one zone. If a full zone outage occurs, we can’t automatically fail over the master: the Kubernetes pod with the master can’t be scheduled in another availability zone.

AWS EBS doesn’t support cross-zone replication of disks. Google Cloud does support regional persistent disks, but it’s a beta feature.
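For reference, a regional persistent disk can be requested through a StorageClass; this is a minimal sketch assuming the GCE PD provisioner, with an illustrative name:

```yaml
# Sketch: a StorageClass backed by Google Cloud regional persistent disks
# (a beta feature at the time of writing). The name is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd  # replicate the disk across two zones of the region
```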

I don’t trust cross-zone high availability based on closed-source disk replication; it’s better when HA is based on open-source consensus algorithms. In this paper, the authors don’t call this solution “modern HA” because it’s based on opaque disk replication and not on consensus.

There is an interesting comparison of pure Kubernetes HA and Stolon HA. Stolon is an HA wrapper for PostgreSQL that can work on Kubernetes; we will look at it later.

Choosing a Cross-DC Consensus-Based HA PostgreSQL

I decided that I need a cross-datacenter (cross-zone) HA PostgreSQL solution based on consensus. There are two primary choices:

stolon

patroni - created by Zalando

Stolon has the following architecture:

Stolon architecture (image from https://github.com/sorintlab/stolon)

Both projects support a similar set of features. A small comparison:

- Patroni is supported by the Postgres Operator.
- Both projects support synchronous replication: patroni and stolon.
- Both projects support quorum-based synchronous replication (which isn’t natively supported by PostgreSQL): patroni and stolon.

I’ve chosen Stolon (like lwolfs did) because its architecture is clearer to me.

Running Stolon PostgreSQL on Kubernetes

Setup

The steps I performed to set up Stolon PostgreSQL in k8s with Helm:

Make a copy of values.yml (I’ve used the one from this commit). Create secrets for the PostgreSQL superuser and for replication in the namespace postgres:

Superuser:

```sh
kubectl create secret generic pg-su \
  --namespace postgres \
  --from-literal=username='su_username' \
  --from-literal=password='su_password'
```

and set them:

values.yml

```yaml
superuserSecret:
  name: pg-su
  usernameKey: username
  passwordKey: password
```

Replication:

```sh
kubectl create secret generic pg-repl \
  --namespace postgres \
  --from-literal=username='repl_username' \
  --from-literal=password='repl_password'
```

and set them:

values.yml

```yaml
replicationSecret:
  name: pg-repl
  usernameKey: username
  passwordKey: password
```

Remove the parts that are no longer needed from values.yml:

values.yml

```yaml
superuserUsername: "stolon"
superuserPassword:
replicationUsername: "repluser"
replicationPassword:
```

Configure replication. Replace

values.yml

```yaml
clusterSpec: {}
```

with the following:

values.yml

```yaml
clusterSpec:
  synchronousReplication: true
  minSynchronousStandbys: 1
  maxSynchronousStandbys: 1
  initMode: new
```
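If the cluster is already initialized, the same spec can also be patched at runtime with stolonctl; a sketch, assuming the Kubernetes store backend (the cluster name and store flags depend on your installation):

```sh
# Sketch: patch the cluster spec of a running Stolon cluster.
# The cluster name and store flags below are illustrative.
stolonctl --cluster-name pg-stolon \
  --store-backend kubernetes \
  --kube-resource-kind configmap \
  update --patch '{"synchronousReplication": true, "minSynchronousStandbys": 1, "maxSynchronousStandbys": 1}'
```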

Set the pod disruption budget so that voluntary disruptions (e.g. node drains) always leave at least 2 pods running:

values.yml

```yaml
podDisruptionBudget:
  minAvailable: 2
```

Configure pgParameters and set the replica count to 3 (3 instances in total: 1 master, 1 sync standby and 1 async standby):

values.yml

```yaml
pgParameters:
  max_connections: 100
keeper:
  replicaCount: 3
proxy:
  replicaCount: 3
sentinel:
  replicaCount: 3
```

Set up Prometheus service discovery annotations:

values.yml

```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
```
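These annotations only take effect if Prometheus is configured with the common annotation-based pod discovery; a sketch of that scrape config on the Prometheus side (not part of the chart itself):

```yaml
# Sketch: the usual annotation-driven pod scrape job in prometheus.yml.
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # scrape the port given in the prometheus.io/port annotation
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
```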

Install the Helm chart:

```
$ helm install --name pg --namespace postgres -f local-values.yml ~/charts/stable/stolon/
NAME:   pg
LAST DEPLOYED: Thu Feb 28 13:16:20 2019
NAMESPACE: postgres
STATUS: DEPLOYED

RESOURCES:
==> v1/ServiceAccount
NAME       SECRETS  AGE
pg-stolon  1        5s

==> v1beta1/Role
NAME       AGE
pg-stolon  5s

==> v1beta1/RoleBinding
NAME       AGE
pg-stolon  5s

==> v1/Service
NAME                       CLUSTER-IP     EXTERNAL-IP  PORT(S)   AGE
pg-stolon-keeper-headless  None           <none>       5432/TCP  5s
pg-stolon-proxy            10.39.250.171  <none>       5432/TCP  5s

==> v1beta2/Deployment
NAME                KIND
pg-stolon-proxy     Deployment.v1beta2.apps
pg-stolon-sentinel  Deployment.v1beta2.apps

==> v1beta2/StatefulSet
pg-stolon-keeper  StatefulSet.v1beta2.apps

==> v1beta1/PodDisruptionBudget
NAME                MIN-AVAILABLE  MAX-UNAVAILABLE  ALLOWED-DISRUPTIONS  AGE
pg-stolon-keeper    2              N/A              0                    5s
pg-stolon-proxy     2              N/A              0                    5s
pg-stolon-sentinel  2              N/A              0                    5s

==> v1/ConfigMap
NAME       DATA  AGE
pg-stolon  0     5s

NOTES:
Stolon cluster installed and initialized.
To get superuser password run

  PGPASSWORD=$(kubectl get secret --namespace postgres pg-su -o jsonpath="{.data.password}" | base64 --decode; echo)
```

Check the installed cluster:

```
$ kubectl -n postgres get all
NAME                                      READY   STATUS      RESTARTS   AGE
pod/pg-stolon-create-cluster-mhmxm        0/1     Completed   0          4m
pod/pg-stolon-keeper-0                    1/1     Running     0          4m
pod/pg-stolon-keeper-1                    1/1     Running     0          4m
pod/pg-stolon-proxy-6f648b49d4-cnnpz      1/1     Running     0          4m
pod/pg-stolon-proxy-6f648b49d4-rsv8j      1/1     Running     0          4m
pod/pg-stolon-sentinel-694cf8f76f-8fmr9   1/1     Running     0          4m
pod/pg-stolon-sentinel-694cf8f76f-d5f6r   1/1     Running     0          4m

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kubernetes                  ClusterIP   10.39.240.1     <none>        443/TCP    3d
service/pg-stolon-keeper-headless   ClusterIP   None            <none>        5432/TCP   4m
service/pg-stolon-proxy             ClusterIP   10.39.250.171   <none>        5432/TCP   4m

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pg-stolon-proxy      2         2         2            2           4m
deployment.apps/pg-stolon-sentinel   2         2         2            2           4m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/pg-stolon-proxy-6f648b49d4      2         2         2       4m
replicaset.apps/pg-stolon-sentinel-694cf8f76f   2         2         2       4m

NAME                                DESIRED   CURRENT   AGE
statefulset.apps/pg-stolon-keeper   2         2         4m

NAME                                 DESIRED   SUCCESSFUL   AGE
job.batch/pg-stolon-create-cluster   1         1            4m
```

and show the distribution among nodes:

```
$ kubectl -n postgres get pods -o wide
NAME                                  READY   STATUS      RESTARTS   AGE   IP           NODE
pg-stolon-create-cluster-mhmxm        0/1     Completed   0          6m    10.36.2.28   gke-primary-services-default-pool-8dd9a533-3tkl
pg-stolon-keeper-0                    1/1     Running     0          6m    10.36.2.29   gke-primary-services-default-pool-8dd9a533-3tkl
pg-stolon-keeper-1                    1/1     Running     0          6m    10.36.2.30   gke-primary-services-default-pool-8dd9a533-3tkl
pg-stolon-proxy-6f648b49d4-cnnpz      1/1     Running     0          6m    10.36.2.27   gke-primary-services-default-pool-8dd9a533-3tkl
pg-stolon-proxy-6f648b49d4-rsv8j      1/1     Running     0          6m    10.36.0.30   gke-primary-services-default-pool-8dd9a533-s2vg
pg-stolon-sentinel-694cf8f76f-8fmr9   1/1     Running     0          6m    10.36.0.29   gke-primary-services-default-pool-8dd9a533-s2vg
pg-stolon-sentinel-694cf8f76f-d5f6r   1/1     Running     0          6m    10.36.1.18   gke-primary-services-default-pool-8dd9a533-h4zs
```

Errors I’ve Caught During the First Attempts

First, I got:

```
2019-03-01 06:19:51.982 GMT [108194] LOG:  unrecognized configuration parameter "maxConnections" in file "/stolon-data/postgres/stolon-temp-postgresql.conf" line 6
2019-03-01 06:19:51.982 GMT [108194] FATAL:  configuration file "/stolon-data/postgres/stolon-temp-postgresql.conf" contains errors
2019-03-01T06:19:52.172Z ERROR cmd/keeper.go:1083 failed to start instance {"error": "postgres exited unexpectedly"}
2019-03-01T06:19:53.897Z ERROR cmd/keeper.go:641 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432:
```

To fix it I’ve changed

values.yml

```yaml
pgParameters:
  maxConnections: 100
```

to

values.yml

```yaml
pgParameters:
  max_connections: "100"
```

After that I got an error because of an unknown sha256sum function in a template:

charts/stable/stolon/templates/sentinel-deployment.yaml

```yaml
annotations:
  checksum/config: {{ include (print .Template.BasePath "/hooks/update-cluster-spec-job.yaml") . | sha256sum }}
```

To fix it I’ve updated Helm to the latest version.

Test the Cluster

Check Simple Commands

Find the pg cluster IP:

```
$ kubectl -n postgres get services | fgrep proxy
pg-stolon-proxy   ClusterIP   10.39.252.208   <none>   5432/TCP   18m
```

Find the password from the secret:

```sh
echo $(kubectl get secret --namespace postgres pg-su -o jsonpath="{.data.password}" | base64 --decode)
```
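If you run psql locally instead of through kubectl exec, the password can be supplied via PGPASSWORD so psql doesn’t prompt interactively; a small convenience sketch, assuming a port-forward to the proxy service:

```sh
# Sketch: forward the proxy port locally and let a local psql pick up
# the password from PGPASSWORD instead of the interactive -W prompt.
kubectl -n postgres port-forward svc/pg-stolon-proxy 5432:5432 &
export PGPASSWORD=$(kubectl get secret --namespace postgres pg-su -o jsonpath="{.data.password}" | base64 --decode)
psql --host 127.0.0.1 --port 5432 --username superuser_name -d test
```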

Connect to PostgreSQL through the proxy and create a table:

```
$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 10.39.252.208 --port 5432 --username superuser_name -W
postgres=# create database test;
CREATE DATABASE
$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 10.39.252.208 --port 5432 --username superuser_name -W -d test
test=# create table test(id int, value text);
CREATE TABLE
test=# insert into test values (1, 'value1');
INSERT 0 1
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)
```

Check a Master Death

Find the master pod: pg-stolon-keeper-0 is the master because pg_is_in_recovery returns false for it:

```
$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 127.0.0.1 --port 5432 --username superuser_name -W -d test
test=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)
```

Now check the master death. Connect to the current master (keeper-0) through the proxy:

```
$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 10.39.252.208 --port 5432 --username superuser_name -W -d test
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)
```

Don’t close this SQL shell and in another window run:

```
$ kubectl -n postgres delete pod pg-stolon-keeper-0
pod "pg-stolon-keeper-0" deleted
```

Let’s check the sentinels’ logs:

```
2019-03-01T08:25:38.090Z WARN cmd/sentinel.go:264 no keeper info available {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:43.140Z WARN cmd/sentinel.go:264 no keeper info available {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:48.201Z WARN cmd/sentinel.go:264 no keeper info available {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:53.412Z WARN cmd/sentinel.go:264 no keeper info available {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:58.462Z INFO cmd/sentinel.go:976 master db is failed {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:58.462Z INFO cmd/sentinel.go:987 trying to find a new master to replace failed master
2019-03-01T08:25:58.463Z INFO cmd/sentinel.go:1013 electing db as the new master {"db": "6784fde9", "keeper": "keeper1"}
2019-03-01T08:26:08.564Z INFO cmd/sentinel.go:1132 removing old master db {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:26:08.564Z INFO cmd/sentinel.go:1218 one of the new synchronousStandbys has been removed {"db": "66ced60c", "inSyncStandbys": [], "synchronousStandbys": ["66ced60c"]}
2019-03-01T08:26:08.565Z INFO cmd/sentinel.go:1230 setting the expected sync-standbys to the current known in sync sync-standbys {"inSyncStandbys": [], "synchronousStandbys": ["66ced60c"]}
```

We see that keeper-1 became the new master. Check it:

```
$ kubectl -n postgres exec -it pg-stolon-keeper-1 -- psql --host 127.0.0.1 --port 5432 --username superuser_name -W -d test -c 'select pg_is_in_recovery()'
 pg_is_in_recovery
-------------------
 f
(1 row)
```

Kubernetes restarts the pod:

```
$ kubectl -n postgres get pods | fgrep keeper
pg-stolon-keeper-0   1/1   Running   0   1m
pg-stolon-keeper-1   1/1   Running   0   49m
pg-stolon-keeper-2   1/1   Running   0   49m
```

Before the restart we had:

keeper-0 - master

keeper-1 - sync replica

keeper-2 - async replica

Check it after the restart:

```
$ kubectl -n postgres exec -it pg-stolon-keeper-1 cat /stolon-data/postgres/postgresql.conf | fgrep sync
synchronous_standby_names = 'stolon_46cb4a21'
$ kubectl -n postgres exec -it pg-stolon-keeper-0 cat /stolon-data/postgres/recovery.conf | fgrep slot_name
primary_slot_name = 'stolon_3a667e52'
$ kubectl -n postgres exec -it pg-stolon-keeper-2 cat /stolon-data/postgres/recovery.conf | fgrep slot_name
primary_slot_name = 'stolon_46cb4a21'
```

After the restart we have:

keeper-0 - async replica

keeper-1 - master

keeper-2 - sync replica

In the PostgreSQL shell connected to the proxy, run the select again:

```
test=# select * from test;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)
```

We see that the proxy closed the connection to the old master, as expected.

Notes

I should note two things:

- When killing the master, Stolon re-elected a new master faster than the pod was restarted.
- The downside is that the proxy routes connections only to the master; we need to set up haproxy to route read-only queries to the slaves too (see the sketch below).
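Until such a load balancer is in place, read-only queries can at least reach a standby directly through the keeper pods’ stable DNS names under the headless service; a hypothetical example, with the pod and service names following the chart’s naming above:

```sh
# Hypothetical: query a standby keeper pod directly via its headless-service
# DNS name (resolvable inside the cluster); pg-stolon-keeper-2 is an async
# standby in the layout described above.
kubectl -n postgres exec -it pg-stolon-keeper-0 -- \
  psql --host pg-stolon-keeper-2.pg-stolon-keeper-headless.postgres \
       --port 5432 --username superuser_name -W -d test
```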

Conclusion

This Stolon setup is now running in production at GolangCI. We haven’t encountered any problems with it yet.