repmgr 4.3 Documentation | |||
---|---|---|---|
Prev | Up | Chapter 12. Automatic failover with repmgrd | Next |
From repmgr 4.3, repmgr makes it possible to provide a script to repmgrd which, in a failover situation, will be executed by the promotion candidate (the node which has been selected to be the new primary) to confirm whether the node should actually be promoted.
To use this, failover_validation_command in repmgr.conf to a script executable by the postgres system user, e.g.:
failover_validation_command=/path/to/script.sh %n %a
The %n parameter will be replaced with the node ID, and the %a parameter will be replaced by the node name when the script is executed.
This script must return an exit code of 0 to indicate the node should promote itself. Any other value will result in the promotion being aborted and the election rerun. There is a pause of election_rerun_interval seconds before the election is rerun.
Sample repmgrd log file output during which the failover validation script rejects the proposed promotion candidate:
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds [2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2) [2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command" [2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2 [2019-03-13 21:01:30] [INFO] output returned by failover validation command: Node ID: 2 [2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1" [2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun [2019-03-13 21:01:30] [INFO] 1 followers to notify [2019-03-13 21:01:30] [NOTICE] notifying node "node3" (node ID: 3) to rerun promotion candidate selection INFO: node 3 received notification to rerun promotion candidate election [2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")