14.1. Pausing the repmgrd service

14.1.1. Prerequisites for pausing repmgrd
14.1.2. Pausing/unpausing repmgrd
14.1.3. Details on the repmgrd pausing mechanism

In normal operation, repmgrd monitors the state of the PostgreSQL node it is running on and takes appropriate action if problems are detected, e.g. (if so configured) promoting the node to primary if the existing primary is determined to have failed.

However, repmgrd is unable to distinguish between a planned outage (such as performing a switchover or installing a PostgreSQL maintenance release) and an actual server outage. In versions prior to repmgr 4.2 it was necessary to stop repmgrd on all nodes (or at least on all nodes where repmgrd is configured for automatic failover) to prevent repmgrd from making unintentional changes to the replication cluster.

From repmgr 4.2, repmgrd can now be "paused", i.e. instructed not to take any action such as performing a failover. This can be done from any node in the cluster, removing the need to stop/restart each repmgrd individually.

Note

For major PostgreSQL upgrades, e.g. from PostgreSQL 11 to PostgreSQL 12, repmgrd should be shut down completely and only started up once the repmgr packages for the new PostgreSQL major version have been installed.

14.1.1. Prerequisites for pausing repmgrd

In order to be able to pause/unpause repmgrd, the following prerequisites must be met:

  • repmgr 4.2 or later must be installed on all nodes.
  • The same major repmgr version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).
  • PostgreSQL on all nodes must be accessible from the node where the pause/unpause operation is executed, using the conninfo string shown by repmgr cluster show.

Note

These conditions are required for normal repmgr operation in any case.

14.1.2. Pausing/unpausing repmgrd

To pause repmgrd, execute repmgr service pause (repmgr 4.2 - 4.4: repmgr daemon pause), e.g.:

$ repmgr -f /etc/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused

The state of repmgrd on each node can be checked with repmgr service status (repmgr 4.2 - 4.4: repmgr daemon status), e.g.:

$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status  | repmgrd | PID  | Paused?
----+-------+---------+---------+---------+------+---------
 1  | node1 | primary | running | running | 7851 | yes
 2  | node2 | standby | running | running | 7889 | yes
 3  | node3 | standby | running | running | 7918 | yes
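Before starting maintenance it can be useful to confirm that every node really reports itself as paused. The following is a minimal sketch, assuming the table layout shown above (a "Paused?" column, pipe-separated, with a numeric ID in the first column); the function name count_unpaused and the variable sample_status are illustrative and not part of repmgr. Other repmgr versions may print additional columns, which is why the column is located by its header rather than by position.

```shell
#!/bin/sh
# count_unpaused reads a "repmgr service status" table on stdin and prints
# the number of node rows whose "Paused?" column is anything other than "yes".
count_unpaused() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        col == 0 {                          # still looking for the header row
            for (i = 1; i <= NF; i++)
                if (trim($i) == "Paused?") col = i
            next
        }
        # data rows start with a numeric node ID
        trim($1) ~ /^[0-9]+$/ && trim($col) != "yes" { n++ }
        END {
            if (col == 0) exit 1            # header not found: layout differs
            print n + 0
        }
    '
}

# Sample input copied from the example above; against a live cluster the
# command output would be piped in instead:
#   repmgr -f /etc/repmgr.conf service status | count_unpaused
sample_status=' ID | Name  | Role    | Status  | repmgrd | PID  | Paused?
----+-------+---------+---------+---------+------+---------
 1  | node1 | primary | running | running | 7851 | yes
 2  | node2 | standby | running | running | 7889 | yes
 3  | node3 | standby | running | running | 7918 | yes'

unpaused=$(printf '%s\n' "$sample_status" | count_unpaused)
echo "nodes not paused: $unpaused"
```

A result of 0 means every listed node is paused; any other value suggests checking the status table by hand before proceeding.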

Note

If executing a switchover with repmgr standby switchover, repmgr will automatically pause/unpause the repmgrd service as part of the switchover process.

If the primary (in this example, node1) is stopped, repmgrd running on one of the standbys (here: node2) will react like this:

[2019-08-28 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2019-08-28 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2019-08-28 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2019-08-28 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2019-08-28 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2019-08-28 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2019-08-28 12:22:25] [NOTICE] node is paused
[2019-08-28 12:22:33] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2019-08-28 12:22:33] [DETAIL] repmgrd paused by administrator
[2019-08-28 12:22:33] [HINT] execute "repmgr service unpause" to resume normal failover mode

If the primary becomes available again (e.g. following a software upgrade), repmgrd will automatically reconnect, e.g.:

[2019-08-28 12:25:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring

To unpause the repmgrd service, execute repmgr service unpause (repmgr 4.2 - 4.4: repmgr daemon unpause), e.g.:

$ repmgr -f /etc/repmgr.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
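The pause and unpause steps can be wrapped around a maintenance task so that neither is forgotten. This is only a sketch under stated assumptions: the function with_repmgrd_paused and the variables REPMGR and REPMGR_CONF are hypothetical helpers for illustration, not repmgr settings, and the maintenance command is whatever the administrator supplies.

```shell
#!/bin/sh
# Illustrative variables (not repmgr settings): the repmgr executable and
# the configuration file passed to it with -f.
REPMGR=${REPMGR:-repmgr}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

# with_repmgrd_paused CMD [ARGS...]
# Pauses repmgrd cluster-wide, runs CMD, and unpauses only if CMD succeeded.
with_repmgrd_paused() {
    # Do not start maintenance if pausing failed.
    $REPMGR -f "$REPMGR_CONF" service pause || return 1

    # Run the maintenance command passed as arguments.
    "$@"
    rc=$?

    if [ "$rc" -ne 0 ]; then
        # Leave repmgrd paused so the cluster can be inspected first.
        echo "maintenance failed (exit $rc); repmgrd left paused" >&2
        return "$rc"
    fi

    $REPMGR -f "$REPMGR_CONF" service unpause
}

# Example with a hypothetical maintenance command:
#   with_repmgrd_paused sudo systemctl restart postgresql
```

Leaving repmgrd paused after a failed maintenance step is a deliberately conservative choice in this sketch; the administrator can then decide when to run repmgr service unpause.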

Note

If the previous primary is no longer accessible when repmgrd is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using repmgr standby promote, and any standbys attached to the new primary with repmgr standby follow.

This is to prevent execution of repmgr service unpause resulting in the automatic promotion of a new primary, which may be a problem particularly in larger clusters, where repmgrd could select a different promotion candidate to the one intended by the administrator.
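The manual recovery described in this note can be laid out as a simple per-node plan. The sketch below only prints which command would be run on which node; it does not contact any server. The function name recovery_plan and the node names are illustrative assumptions, and the configuration file path follows the examples above.

```shell
#!/bin/sh
# recovery_plan NEW_PRIMARY [STANDBY...]
# Prints the repmgr command to run locally on each node: the chosen standby
# is promoted, and every remaining standby is told to follow the new primary.
recovery_plan() {
    new_primary=$1
    shift
    echo "on $new_primary: repmgr -f /etc/repmgr.conf standby promote"
    for node in "$@"; do
        echo "on $node: repmgr -f /etc/repmgr.conf standby follow"
    done
}

# Example: node1 (the old primary) is gone; node2 is the intended new primary.
recovery_plan node2 node3
```

Choosing the promotion candidate explicitly, as here, is the point of the manual procedure: the administrator, rather than repmgrd, decides which standby becomes the new primary.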

14.1.3. Details on the repmgrd pausing mechanism

The pause state of each node is preserved across a PostgreSQL restart.

repmgr service pause and repmgr service unpause can be executed even if repmgrd is not running; in this case, repmgrd will start up in whichever pause state has been set.

Note

repmgr service pause and repmgr service unpause do not start/stop repmgrd.

The commands repmgr daemon start and repmgr daemon stop (if correctly configured) can be used to start/stop repmgrd on individual nodes.