repmgr: Replication Manager for PostgreSQL clusters

repmgr 2.0 released

What is repmgr?

repmgr is a set of open source tools that help DBAs and system administrators manage a cluster of PostgreSQL databases.

By taking advantage of the Hot Standby capability introduced in PostgreSQL 9, repmgr greatly simplifies the process of setting up and managing databases with high availability and scalability requirements.

repmgr simplifies administration and daily management, enhances productivity and reduces the overall costs of a PostgreSQL cluster by:

monitoring the replication process;

allowing DBAs to issue high availability operations such as switch-overs and fail-overs

repmgr is production-quality software and is used widely across the world with PostgreSQL. Many users rely on repmgr to maintain their replication setups, so we take new releases seriously and mark the level of maturity as a guide for users. As repmgr 2.0 moves towards production, we continue to regard the autofailover feature as beta; we expect it to fully mature in the 2.1 release.

2ndQuadrant provides contract support for PostgreSQL that includes both repmgr and the autofailover feature.

Features

Improved Documentation

General refactoring, code quality improvements and stabilization work

Support for daemonizing (-d / --daemonize)

PID file handling (-p / --pid-file)

New config option: monitor_interval_secs

New config option: retry_promote_interval

New config option: logfile

New config option: pg_bindir

New config option: pgctl_options

Add timestamps to log line in stderr

Add a ssh_options parameter

Make the CLONE command try to make an exact copy, including the $PGDATA location

Add detection of master failure

Add the notion of a witness server

Add experimental autofailover capabilities

Add a configuration parameter to indicate the script to execute on failover or follow

Make monitoring optional and turned off by default; it can be turned on with the --monitoring-history switch

Add tunables to specify the number of retries to reconnect to the master and the time between them
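The retry tunables in the last item of this list can be pictured with a small sketch. It is not repmgr's code: the struct reconnect_policy, its field names, the callback type and the helper succeed_after_countdown are all hypothetical names chosen for illustration, and the real configuration keys are not stated here.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical tunables; the names are assumptions, not repmgr's config keys. */
typedef struct {
    int attempts;            /* how many times to try reconnecting */
    unsigned interval_secs;  /* pause between attempts, in seconds */
} reconnect_policy;

/* Callback that makes one connection attempt; returns true on success. */
typedef bool (*connect_fn)(void *ctx);

/*
 * Try to reconnect up to policy->attempts times, sleeping
 * policy->interval_secs between failed attempts.
 * Returns the 1-based number of the attempt that succeeded,
 * or 0 if every attempt failed.
 */
int reconnect_with_retries(const reconnect_policy *policy,
                           connect_fn try_connect, void *ctx)
{
    for (int attempt = 1; attempt <= policy->attempts; attempt++) {
        if (try_connect(ctx))
            return attempt;
        /* No point in sleeping after the final failed attempt. */
        if (attempt < policy->attempts && policy->interval_secs > 0)
            sleep(policy->interval_secs);
    }
    return 0;
}

/* Illustration only: fails until the counter in ctx reaches zero. */
bool succeed_after_countdown(void *ctx)
{
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

Passing the connection attempt as a callback keeps the retry logic independent of how the connection is actually made, which also makes it easy to exercise with a stand-in such as succeed_after_countdown.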

Bugfixes

Fixed PQexec() calls: several calls checked only the return value of PQexec() and not the result status, so a failed query could go unnoticed

Flush stderr after each log message: the log file appeared empty for a long time because of file buffering, so we now call fflush() after every log message to get it written out quickly

Fixed repmgr repl_status columns: the repl_status view had a column time_lag, documented as the time a standby is behind the master. In fact it only worked like that when viewed on the standby; on the master it was just the time of the last status update. We dropped that column and replaced it with a new column, "communication_time_lag", which is the content of the repl_status column on the master. On the standby, the time of the last update is kept in shared memory, so the column refers to the correct time wherever repl_status is queried. We also added a new column, "replication_time_lag", which refers to the apply delay.

Set connections to NULL when calling PQfinish() on them

Performance improvements: the old implementation took about 8 seconds per monitoring interval because it got caught in a sleep call and had to wait for timeouts. That is much too long, especially given the default monitor_interval value of 2 seconds, which we could never meet. The new implementation uses PQgetResult() and select() to avoid the sleep, so the monitoring routine now takes only a fraction of the previous time (<1s).

Leak and memory fixes: fixed some leaks and an overlapping strcpy() call

Overhauled CloseConnections(): it was a macro, had no NULL check before the PQisBusy() call, and did not set the connections to NULL. It is now a function that checks for NULL before calling functions on connection variables and sets the connections to NULL.

Ignore pg_log when cloning

Correctly check wal_keep_segments

General code refactoring

Log format fixes

Handle stdin/stdout/stderr for repmgrd

Added format checking for printf()-like functions

Added forgotten priority value when creating a witness

pg_config is now settable from outside of the makefile

Split install targets into install_prog and install_ext, with doing both as the default

Flush output before calling system()

Initialize variables, as sscanf() leaves them untouched upon error

No longer exit when standby connection drops

Several typos have been corrected

Fixed string comparison when reloading config files

Do not create the data directory before the sanity checks have succeeded when creating a witness

Also check if query was successful when registering a new standby

Remove master node earlier so that master register --force succeeds when it is already registered

Do not exit with database in backup mode (pg_start_backup())

Debian control file now accepts PostgreSQL 9.0, 9.1, 9.2 and 9.3

Now compiles with 9.3
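Two of the logging items in this release (timestamps on log lines and flushing after every message) can be illustrated together. The following is a minimal sketch in plain C, not repmgr's logging code; the function name log_line and the timestamp format are assumptions made for the example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/*
 * Write one log line to `out`, prefixed with a local timestamp, and
 * flush right away so the line does not sit in a stdio buffer.
 * Returns the number of characters written for the message part,
 * or a negative value if formatting failed.
 */
int log_line(FILE *out, const char *fmt, ...)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm *local = localtime(&now);

    /* Fall back to an empty timestamp if the time cannot be formatted. */
    if (local == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", local) == 0)
        stamp[0] = '\0';

    fprintf(out, "[%s] ", stamp);

    va_list args;
    va_start(args, fmt);
    int written = vfprintf(out, fmt, args);
    va_end(args);

    fputc('\n', out);
    fflush(out);  /* push the line out of the buffer immediately */
    return written;
}
```

A call such as log_line(stderr, "node %d ok", 7) would produce a line ending in "node 7 ok" after the bracketed timestamp. Flushing on every message costs a little throughput, which is usually acceptable for a low-volume daemon log.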
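The sscanf() item in the list above refers to a general C pitfall: when a conversion fails, sscanf() does not assign to the corresponding argument, so an uninitialized variable stays uninitialized. A small sketch of the defensive pattern follows; parse_version is a hypothetical helper for illustration, not a repmgr function.

```c
#include <stdio.h>

/*
 * Parse a "major.minor" version string.
 * Both outputs are set to -1 first, because sscanf() leaves an argument
 * untouched when its conversion is never reached or fails.
 * Returns 1 if both numbers were parsed, 0 otherwise.
 */
int parse_version(const char *text, int *major, int *minor)
{
    *major = -1;
    *minor = -1;
    return sscanf(text, "%d.%d", major, minor) == 2;
}
```

With this in place, a caller that ignores the return value still sees a well-defined sentinel (-1) instead of an indeterminate value.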
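For the item on format checking of printf()-like functions, one common way to get such checks with GCC or Clang is the format function attribute. Whether repmgr uses exactly this mechanism is an assumption; the macro PRINTF_LIKE and the helper format_message below are hypothetical names for the sketch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Enable the check on GCC and Clang; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 3 is the format string, variadic arguments start at 4. */
int format_message(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Format a message into buf, printf style; returns vsnprintf()'s result. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int needed = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return needed;
}
```

With the attribute in place and format warnings enabled (for example via -Wall), the compiler can warn about a call such as format_message(buf, sizeof buf, "%s", 42), where the argument type does not match the format specifier.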

Upcoming

Features we are working on in the near future:

Timeline increase when a standby gets promoted

A better check of which standby received the most data

Respect the fact that a standby can be delayed on purpose as a factor in the voting algorithm

Include support for delayed standbys

Community and development

repmgr is free and open source software and is licensed under the GPLv3.

Contributions to repmgr are welcome. See the README.rst file for information about how to contribute.