8.4. Troubleshooting switchover issues

As emphasised previously, performing a switchover is a non-trivial operation and there are a number of potential issues which can occur. While repmgr attempts to perform sanity checks, there's no guaranteed way of determining the success of a switchover without actually carrying it out.

8.4.1. Demotion candidate (old primary) does not shut down

repmgr may abort a switchover with a message like:

ERROR: shutdown of the primary server could not be confirmed
HINT: check the primary server status before performing any further actions

This means the shutdown of the old primary has taken longer than repmgr expected, and it has given up waiting.

In this case, check the PostgreSQL log on the primary server to see what is going on. It's entirely possible the shutdown process is just taking longer than the timeout set by the configuration parameter shutdown_check_timeout (default: 60 seconds), in which case you may need to adjust this parameter.

Note: Note that shutdown_check_timeout is set on the node where repmgr standby switchover is executed (promotion candidate); setting it on the demotion candidate (former primary) will have no effect.

If the primary server has shut down cleanly, and no other node has been promoted, it is safe to restart it, in which case the replication cluster will be restored to its original configuration.

8.4.2. Switchover aborts with an "exclusive backup" error

repmgr may abort a switchover with a message like:

ERROR: unable to perform a switchover while primary server is in exclusive backup mode
HINT: stop backup before attempting the switchover

This means an exclusive backup is running on the current primary; interrupting this will not only abort the backup, but potentially leave the primary with an ambiguous backup state.

To proceed, either wait until the backup has finished, or cancel it with the command SELECT pg_stop_backup(). For more details see the PostgreSQL documentation section Making an exclusive low level backup.