8.1. Preparing for switchover

As mentioned in the previous section, the success of a switchover depends on repmgr being able to shut down the current primary server quickly and cleanly.

Ensure that the promotion candidate has sufficient free walsenders available (PostgreSQL configuration item max_wal_senders) and, if replication slots are in use, that at least one free slot is available for the demotion candidate (PostgreSQL configuration item max_replication_slots).
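The headroom described above can be checked by hand. The following is a minimal sketch, assuming psql can connect to the promotion candidate with its default connection settings; the function name free_capacity and the PSQL override variable are hypothetical, while pg_stat_replication and pg_replication_slots are standard PostgreSQL views.

```shell
#!/bin/sh
# Hypothetical helper: print the number of free walsenders and free
# replication slots on the server that psql connects to.
# PSQL can be overridden, e.g. with a full path or connection options.
PSQL="${PSQL:-psql}"

free_capacity() {
    # -A unaligned, -t tuples only, -X skip psqlrc: prints bare values.
    $PSQL -AtX -c "
        SELECT current_setting('max_wal_senders')::int
                 - (SELECT count(*) FROM pg_stat_replication),
               current_setting('max_replication_slots')::int
                 - (SELECT count(*) FROM pg_replication_slots);"
}

# Usage (run on the promotion candidate):
#   free_capacity
# The two values are printed on one line, separated by "|".
```

Both values should be at least 1 before attempting the switchover; inactive slots still count against max_replication_slots.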

Ensure that a passwordless SSH connection is possible from the promotion candidate (standby) to the demotion candidate (current primary). If --siblings-follow will be used, ensure that passwordless SSH connections are possible from the promotion candidate to all nodes attached to the demotion candidate (including the witness server, if in use).
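One way to verify the SSH requirement before a switchover is sketched below. It assumes it is run on the promotion candidate as the system user repmgr runs as (typically postgres); the function name check_ssh, the SSH override variable and the host names in the usage comment are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: verify non-interactive SSH access to each host given.
# SSH can be overridden, e.g. with a full path to the ssh binary.
SSH="${SSH:-ssh}"

check_ssh() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail rather than prompt for a password.
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK: $host"
        else
            echo "FAILED: $host"
            rc=1
        fi
    done
    return $rc
}

# Usage: list the demotion candidate and, with --siblings-follow,
# every sibling node and the witness server:
#   check_ssh node1 node3
```

Any host reported as FAILED needs its key-based login fixed before the switchover is attempted.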

Note

repmgr expects to find the repmgr binary in the same path on the remote server as on the local server.

Double-check which commands will be used to stop, start and restart the current primary, for example by executing repmgr node service on the current primary:

     repmgr -f /etc/repmgr.conf node service --list-actions --action=stop
     repmgr -f /etc/repmgr.conf node service --list-actions --action=start
     repmgr -f /etc/repmgr.conf node service --list-actions --action=restart

These commands can be defined in repmgr.conf with service_start_command, service_stop_command and service_restart_command.

Important

If repmgr is installed from a package, you should set these commands to use the appropriate service commands defined by the package/operating system, as these will ensure PostgreSQL is stopped/started properly, taking into account configuration and log file locations, etc.

If the service_*_command options aren't defined, repmgr will fall back to using pg_ctl to stop/start/restart PostgreSQL, which may not work properly, particularly when executed on a remote server.

For more details, see service command settings.

Note

On systemd systems we strongly recommend using the appropriate systemctl commands (typically run via sudo) to ensure systemd is informed about the status of the PostgreSQL service.

If using sudo for the systemctl calls, make sure the sudo specification doesn't require a real tty for the user. If not set this way, repmgr will fail to stop the primary.

See the service command settings documentation section for further details.
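As an illustration, on a systemd-based system the three settings might look as follows. This is a sketch only: the unit name postgresql is a placeholder, and the real name varies by distribution and PostgreSQL version.

```
# repmgr.conf (fragment); "postgresql" is a placeholder unit name
service_start_command   = 'sudo systemctl start postgresql'
service_stop_command    = 'sudo systemctl stop postgresql'
service_restart_command = 'sudo systemctl restart postgresql'
```

With this setup, the user repmgr runs as (assumed here to be postgres) would need to be allowed to run these commands through sudo without a password and without a tty, for example with a sudoers line such as Defaults:postgres !requiretty. The repmgr node service --list-actions commands shown earlier should then report the configured values.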

Check that access from applications is minimized or, preferably, blocked completely, so applications are not unexpectedly interrupted.

Note

If an exclusive backup is running on the current primary, or if WAL replay is paused on the standby, repmgr will not perform the switchover.

Check there is no significant replication lag on standbys attached to the current primary.

If WAL file archiving is set up, check that there is no backlog of files waiting to be archived, as PostgreSQL will not complete its shutdown until all of these have been archived. If there is a backlog exceeding archive_ready_warning WAL files, repmgr will emit a warning before attempting to perform a switchover; you can also check manually with repmgr node check --archive-ready.
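For a quick look at the backlog without repmgr, the pending files can also be counted directly. The sketch below assumes the standard PostgreSQL layout, in which each WAL segment awaiting archiving has a .ready marker in the archive_status directory (pg_wal/archive_status from PostgreSQL 10, pg_xlog/archive_status before that); the function name count_archive_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count WAL segments still waiting to be archived.
# $1 is the archive_status directory, e.g. "$PGDATA/pg_wal/archive_status".
count_archive_ready() {
    # Each pending segment is represented by a "<segment>.ready" file.
    find "$1" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
}

# Usage (run on the current primary as a user able to read the data directory):
#   count_archive_ready "$PGDATA/pg_wal/archive_status"
```

A count that stays above zero suggests the archive command is failing or falling behind, which would delay the primary's shutdown.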

Note

From repmgr 4.2, repmgr will instruct any running repmgrd instances to pause operations while the switchover is being carried out, to prevent repmgrd from unintentionally promoting a node. For more details, see pausing the repmgrd service.

Users of repmgr versions prior to 4.2 should ensure that repmgrd is not running on any nodes while a switchover is being executed.

Finally, consider executing repmgr standby switchover with the --dry-run option; this will perform any necessary checks and inform you about success/failure, and stop before the first actual command is run (which would be the shutdown of the current primary). Example output:

      $ repmgr standby switchover -f /etc/repmgr.conf --siblings-follow --dry-run
      NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode
      INFO: SSH connection to host "node1" succeeded
      INFO: archive mode is "off"
      INFO: replication lag on this standby is 0 seconds
      INFO: all sibling nodes are reachable via SSH
      NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
      INFO: following shutdown command would be run on node "node1":
        "pg_ctl -l /var/log/postgresql/startup.log -D '/var/lib/postgresql/data' -m fast -W stop"
      INFO: parameter "shutdown_check_timeout" is set to 60 seconds
    

Important

Be aware that --dry-run checks the prerequisites for performing the switchover and some basic sanity checks on the state of the database which might affect the switchover operation (e.g. replication lag); it cannot, however, guarantee that the switchover operation will succeed. In particular, if the current primary does not shut down cleanly, repmgr will not be able to reliably execute the switchover (as there would be a danger of divergence between the former and new primary nodes).
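In a scripted procedure, the dry run can serve as a gate before the real command. The following is a minimal sketch, assuming that repmgr exits with a non-zero status when a dry-run check fails; the function name switchover_dry_run and the REPMGR override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run the switchover checks and report the result.
# REPMGR can be overridden, e.g. with a full path to the repmgr binary.
REPMGR="${REPMGR:-repmgr}"

switchover_dry_run() {
    conf="$1"
    # Assumption: a non-zero exit status means at least one check failed.
    if $REPMGR standby switchover -f "$conf" --siblings-follow --dry-run; then
        echo "dry run passed"
    else
        echo "dry run failed" >&2
        return 1
    fi
}

# Usage (run on the promotion candidate):
#   switchover_dry_run /etc/repmgr.conf
```

A passing dry run only means the prerequisites were met at that moment; the caveats above still apply to the actual switchover.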

Note

See repmgr standby switchover for a full list of available command line options and repmgr.conf settings relevant to performing a switchover.

8.1.1. Switchover and pg_rewind

If the demotion candidate does not shut down smoothly or cleanly, there's a risk it will have a slightly divergent timeline and will not be able to attach to the new primary. To fix this situation without needing to reclone the old primary, it's possible to use the pg_rewind utility, which will usually be able to resync the two servers.

To have repmgr execute pg_rewind if it detects this situation after promoting the new primary, add the --force-rewind option.

Note

If repmgr detects a situation where it needs to execute pg_rewind, it will execute a CHECKPOINT on the new primary before executing pg_rewind.

For more details on pg_rewind, see section Using pg_rewind in the repmgr node rejoin documentation and the PostgreSQL documentation at https://www.postgresql.org/docs/current/app-pgrewind.html.