To demonstrate automatic failover, set up a 3-node replication cluster (one primary and two standbys streaming directly from the primary) so that the cluster looks something like this:
    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr
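For repmgrd to perform automatic failover, each node's repmgr.conf must enable it and tell repmgrd how to promote itself or follow a new primary. A minimal sketch for node2 (paths and connection details here are illustrative, not prescriptive):

```ini
# /etc/repmgr.conf on node2 (example values)
node_id=2
node_name='node2'
conninfo='host=node2 dbname=repmgr user=repmgr'
data_directory='/var/lib/postgresql/data'

# enable automatic failover and define what repmgrd runs
# when this node must promote or follow a new primary
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
```

The `%n` placeholder in follow_command is replaced by repmgrd with the node ID of the new primary at failover time.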
Start repmgrd on each node and verify that it's running by examining the log output, which at log level INFO will look like this:
    [2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf"
    [2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr"
    [2017-08-24 17:31:00] [NOTICE] starting monitoring of node node2 (ID: 2)
    [2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1)
Each repmgrd should also have recorded its successful startup as an event:
    $ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
     Node ID | Name  | Event         | OK | Timestamp           | Details
    ---------+-------+---------------+----+---------------------+-------------------------------------------------------------
     3       | node3 | repmgrd_start | t  | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1)
     2       | node2 | repmgrd_start | t  | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1)
     1       | node1 | repmgrd_start | t  | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1)
Now stop the current primary server, e.g. with:

    pg_ctl -D /var/lib/postgresql/data -m immediate stop
This forces the primary to shut down immediately, aborting all backend processes and open transactions. It also causes a flurry of activity in the repmgrd log files as each repmgrd detects the failure of the primary and a failover decision is made. The following is an extract from the log of a standby server (node2) which promoted itself to primary after the failure of the original primary (node1):
    [2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
    [2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
    [2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts
    [2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts
    [2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts
    [2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts
    INFO:  setting voting term to 1
    INFO:  node 2 is candidate
    INFO:  node 3 has received request from node 2 for electoral term 1 (our term: 0)
    [2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes
    INFO:  connecting to standby database
    NOTICE:  promoting standby
    DETAIL:  promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote'
    INFO:  reconnecting to promoted server
    NOTICE:  STANDBY PROMOTE successful
    DETAIL:  node 2 was successfully promoted to primary
    INFO:  node 3 received notification to follow node 2
    [2017-08-24 23:32:13] [INFO] switching to primary monitoring mode
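The five reconnection checks visible in the log above are governed by repmgrd's reconnect_attempts and reconnect_interval configuration parameters (here 5 attempts at 1-second intervals): only when every attempt fails does the node proceed to a failover election. A minimal Python sketch of that retry loop, using a hypothetical can_connect callback in place of a real libpq connection check:

```python
import time

def upstream_has_failed(can_connect, attempts=5, interval=1):
    """Return True if the upstream is deemed failed, i.e. every
    reconnection attempt is exhausted without success (mirrors the
    reconnect_attempts / reconnect_interval behaviour in the log)."""
    for attempt in range(1, attempts + 1):
        if can_connect():
            return False  # upstream reachable again; no failover
        if attempt < attempts:
            time.sleep(interval)  # wait before the next check
    return True  # all attempts failed: proceed to a failover election

# An upstream that never responds is declared failed after all attempts.
assert upstream_has_failed(lambda: False, attempts=3, interval=0) is True
# An upstream that answers on the first check is not failed over.
assert upstream_has_failed(lambda: True, attempts=3, interval=0) is False
```

Tuning these two parameters trades off failover speed against tolerance of brief network glitches.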
The cluster status will now look like this, with the original primary (node1) marked as inactive, and standby node3 now following the new primary (node2):
    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+----------------------------------------------------
     1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr
The repmgr cluster event command will display a summary of what happened to each server during the failover:
    $ repmgr -f /etc/repmgr.conf cluster event
     Node ID | Name  | Event                    | OK | Timestamp           | Details
    ---------+-------+--------------------------+----+---------------------+-----------------------------------------------------------------------------------
     3       | node3 | repmgrd_failover_follow  | t  | 2017-08-24 23:32:16 | node 3 now following new upstream node 2
     3       | node3 | standby_follow           | t  | 2017-08-24 23:32:16 | node 3 is now attached to node 2
     2       | node2 | repmgrd_failover_promote | t  | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed
     2       | node2 | standby_promote          | t  | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary