13.1. repmgrd configuration

13.1.1. Required configuration for automatic failover
13.1.2. Optional configuration for automatic failover
13.1.3. Configuring repmgrd and pgbouncer to fence a failed primary node
13.1.4. PostgreSQL service configuration
13.1.5. repmgrd service configuration
13.1.6. Monitoring configuration
13.1.7. Applying configuration changes to repmgrd

To use repmgrd, its associated function library must be included via postgresql.conf with:

        shared_preload_libraries = 'repmgr'

Changing this setting requires a restart of PostgreSQL; for more details see the PostgreSQL documentation.

The following configuration options apply to repmgrd in all circumstances:

monitor_interval_secs

The interval (in seconds, default: 2) at which repmgrd checks the availability of the upstream node.

connection_check_type

The option connection_check_type is used to select the method repmgrd uses to determine whether the upstream node is available.

Possible values are:

ping (default) - uses PQping() to determine server availability
connection - determines server availability by attempting to make a new connection to the upstream node
query - determines server availability by executing an SQL statement on the node via the existing connection. The statement is a minimal throwaway query (SELECT 1), used only to confirm that the server can accept queries.

reconnect_attempts

The number of attempts (default: 6) that will be made to reconnect to an unreachable upstream node before initiating a failover.

There will be an interval of reconnect_interval seconds between each reconnection attempt.

reconnect_interval

Interval (in seconds, default: 10) between attempts to reconnect to an unreachable upstream node.

The number of reconnection attempts is defined by the parameter reconnect_attempts.

degraded_monitoring_timeout

Interval (in seconds) after which repmgrd will terminate if either of the servers being monitored (the local node and/or the upstream node) is no longer available (degraded monitoring mode).

-1 (default) disables this timeout completely.

See also repmgr.conf.sample for an annotated sample configuration file.
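
As a rough illustration of how reconnect_attempts and reconnect_interval combine, the following shell sketch estimates how long repmgrd keeps retrying an unreachable upstream node before it initiates a failover. It uses the default values listed above; the variable retry_window_secs is introduced here for illustration only, and the estimate ignores the time each connection attempt itself takes.

```shell
#!/bin/sh
# Defaults documented above; adjust to match repmgr.conf.
reconnect_attempts=6
reconnect_interval=10

# Approximate retry window, counting one interval per attempt.
# (Illustrative estimate only; connection timeouts add to this.)
retry_window_secs=$(( reconnect_attempts * reconnect_interval ))

echo "approximate retry window: ${retry_window_secs} seconds"
```

Lowering either value shortens the time before a failover is initiated, at the cost of making repmgrd more sensitive to brief network interruptions.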

13.1.1. Required configuration for automatic failover

The following repmgrd options must be set in repmgr.conf:

failover
promote_command
follow_command

Example:

          failover=automatic
          promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
          follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

Details of each option are as follows:

failover

failover can be one of automatic or manual.

Note

If failover is set to manual, repmgrd will not take any action if a failover situation is detected, and the node may need to be modified manually (e.g. by executing repmgr standby follow).

promote_command

The program or script defined in promote_command will be executed in a failover situation when repmgrd determines that the current node is to become the new primary node.

Normally promote_command is set to repmgr's own repmgr standby promote command.

Note

When invoking repmgr standby promote (either directly via the promote_command, or in a script called via promote_command), --siblings-follow must not be included as a command line option for repmgr standby promote.

It is also possible to provide a shell script to e.g. perform user-defined tasks before promoting the current node. In this case the script must at some point execute repmgr standby promote to promote the node; if this is not done, repmgr metadata will not be updated and repmgr will no longer function reliably.

Example:

                promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

Note that the --log-to-file option will cause output generated by the repmgr command, when executed by repmgrd, to be logged to the same destination configured to receive log output for repmgrd.

Note

repmgr will not apply pg_bindir when executing promote_command or follow_command; these can be user-defined scripts so must always be specified with the full path.
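
A wrapper script for promote_command, as described above, might look like the following minimal sketch. The function names, the REPMGR_BIN and REPMGR_CONF variables and the contents of pre_promote_tasks are assumptions made for illustration; only the repmgr standby promote invocation is taken from the example above.

```shell
#!/bin/sh
# Hypothetical promote_command wrapper (names and paths are assumptions).
REPMGR_BIN="${REPMGR_BIN:-/usr/bin/repmgr}"
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

pre_promote_tasks() {
    # Placeholder for site-specific work, e.g. notifying operators.
    echo "running pre-promote tasks"
}

promote_node() {
    # A failing hook is reported but does not block the promotion.
    pre_promote_tasks || echo "warning: pre-promote tasks failed" >&2

    # The wrapper must end up executing "repmgr standby promote";
    # --siblings-follow is deliberately not passed.
    "$REPMGR_BIN" standby promote -f "$REPMGR_CONF" --log-to-file
}

# A real script would finish with:  promote_node; exit $?
```

The script would then be referenced by its full path in promote_command, since pg_bindir is not applied to it.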

follow_command

The program or script defined in follow_command will be executed in a failover situation when repmgrd determines that the current node is to follow the new primary node.

Normally follow_command is set to repmgr's own repmgr standby follow command.

The follow_command parameter should provide the --upstream-node-id=%n option to repmgr standby follow; the %n will be replaced by repmgrd with the ID of the new primary node. If this is not provided, repmgr standby follow will attempt to determine the new primary by itself, but if the original primary comes back online after the new primary is promoted, there is a risk that repmgr standby follow will result in the node continuing to follow the original primary.

It is also possible to provide a shell script to e.g. perform user-defined tasks before following the new primary. In this case the script must at some point execute repmgr standby follow so that the node follows the new primary; if this is not done, repmgr metadata will not be updated and repmgr will no longer function reliably.

Example:

          follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

Note that the --log-to-file option will cause output generated by the repmgr command, when executed by repmgrd, to be logged to the same destination configured to receive log output for repmgrd.

Note

repmgr will not apply pg_bindir when executing promote_command or follow_command; these can be user-defined scripts so must always be specified with the full path.
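
A wrapper script for follow_command could be sketched along the following lines. It assumes the script receives the new primary's node ID as its first argument, e.g. via follow_command='/usr/local/bin/repmgr_follow.sh %n' (a hypothetical path); the function name and the REPMGR_BIN and REPMGR_CONF variables are likewise assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical follow_command wrapper (names and paths are assumptions).
REPMGR_BIN="${REPMGR_BIN:-/usr/bin/repmgr}"
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

follow_new_primary() {
    new_primary_id="$1"

    # Refuse to guess: without the ID, repmgr would have to determine
    # the new primary by itself, which the text above warns against.
    if [ -z "$new_primary_id" ]; then
        echo "the new primary's node ID is required" >&2
        return 2
    fi

    # Site-specific tasks to run before following would go here.

    "$REPMGR_BIN" standby follow -f "$REPMGR_CONF" --log-to-file \
        --upstream-node-id="$new_primary_id"
}

# A real script would finish with:  follow_new_primary "$1"; exit $?
```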

13.1.2. Optional configuration for automatic failover

The following configuration options can be used to fine-tune automatic failover:

priority

Indicates a preferred priority (default: 100) for promoting nodes.

Note that the priority setting is only applied if two or more nodes are determined as promotion candidates; in that case the node with the higher priority is selected.

A value of zero will always prevent the node being promoted to primary, even if there is no other promotion candidate.

failover_validation_command

User-defined script to execute for an external mechanism to validate the failover decision made by repmgrd.

Note

This option must be identically configured on all nodes.

One or more of the following parameter placeholders may be provided, which will be replaced by repmgrd with the appropriate value:

%n: node ID
%a: node name
%v: number of visible nodes
%u: number of shared upstream nodes
%t: total number of nodes
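
A validation script could, for example, approve a failover only when a strict majority of the cluster's nodes is visible. The sketch below assumes the hypothetical setting failover_validation_command='/usr/local/bin/validate_failover.sh %v %t'; the script path, the function name and the majority rule itself are illustrative assumptions rather than repmgr behaviour, and exactly which nodes are counted by %v and %t should be checked before relying on such a rule. A non-zero exit status signals that the failover decision is rejected.

```shell
#!/bin/sh
# Hypothetical failover validation helper.
# $1 = number of visible nodes (%v), $2 = total number of nodes (%t)
validate_failover() {
    visible="$1"
    total="$2"

    # Reject anything that is not a non-negative integer.
    for n in "$visible" "$total"; do
        case "$n" in
            ''|*[!0-9]*) return 2 ;;
        esac
    done

    # Approve only if a strict majority of nodes is visible.
    [ $(( visible * 2 )) -gt "$total" ]
}

# A real script would finish with:  validate_failover "$1" "$2"; exit $?
```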

always_promote

Default: false.

If true, promote the local node even if its repmgr metadata is not up-to-date.

Normally repmgr expects its metadata (stored in the repmgr.nodes table) to be up-to-date so repmgrd can take the correct action during a failover. However, it is possible that updates made on the primary have not yet propagated to the standby (promotion candidate). In this case repmgrd will by default not promote the standby. This behaviour can be overridden by setting always_promote to true.

standby_disconnect_on_failover

In a failover situation, disconnect the local node's WAL receiver.

This option is available in PostgreSQL 9.5 and later.

Note

This option must be identically configured on all nodes.

Additionally the repmgr user must be a superuser for this option.

repmgrd will refuse to start if this option is set but either of these prerequisites is not met.

repmgrd_exit_on_inactive_node

This parameter is available in repmgr 5.3 and later.

If a node was marked as inactive but is running, and this option is set to true, repmgrd will abort on startup.

By default, repmgrd_exit_on_inactive_node is set to false, in which case repmgrd will set the node record to active on startup.

Setting this parameter to true causes repmgrd to behave in the same way it did in repmgr 5.2 and earlier.

The following options can be used to further fine-tune failover behaviour. In practice it's unlikely these will need to be changed from their default values, but they are available as configuration options should the need arise.

election_rerun_interval: If failover_validation_command is set, and the command returns an error, pause the specified number of seconds (default: 15) before rerunning the election.
sibling_nodes_disconnect_timeout: If standby_disconnect_on_failover is true, the maximum length of time (in seconds, default: 30) to wait for other standbys to confirm they have disconnected their WAL receivers.
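
Taken together, the optional settings described in this section might appear in repmgr.conf as in the following fragment. The values shown are the documented defaults except where a comment notes otherwise; the validation script path is hypothetical.

```conf
# A priority of 0 prevents this node from ever being promoted.
priority=100
always_promote=false

# repmgr 5.3 and later.
repmgrd_exit_on_inactive_node=false

# Enabled here for illustration; must be set identically on all nodes.
standby_disconnect_on_failover=true
sibling_nodes_disconnect_timeout=30

# Hypothetical script path; must be set identically on all nodes.
failover_validation_command='/usr/local/bin/validate_failover.sh %v %t'
election_rerun_interval=15
```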

13.1.3. Configuring repmgrd and pgbouncer to fence a failed primary node

For further details and a reference implementation, see the separate document Fencing a failed master node with repmgrd and PgBouncer.

13.1.4. PostgreSQL service configuration

When automatic failover is in use, repmgrd currently needs to execute repmgr standby follow, which restarts PostgreSQL on the standbys so that they follow the new primary.

To ensure this happens smoothly, it's essential to provide the service restart command appropriate to your operating system via service_restart_command in repmgr.conf. If you don't do this, repmgrd will default to using pg_ctl, which can result in unexpected problems, particularly on systemd-based systems.

For more details, see service command settings.
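
On a systemd-based system the setting might look like the fragment below. The unit name postgresql-12 is an assumption and depends on how PostgreSQL was packaged; the use of sudo additionally assumes that the user running repmgrd is permitted to run this command without a password prompt.

```conf
# Assumed unit name for a PostgreSQL 12 package on a systemd-based
# system; substitute the unit name used by your installation.
service_restart_command='sudo systemctl restart postgresql-12'
```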

13.1.5. repmgrd service configuration

If you are intending to use the repmgr daemon start and repmgr daemon stop commands, the following parameters must be set in repmgr.conf:

repmgrd_service_start_command
repmgrd_service_stop_command

Example (for repmgr with PostgreSQL 12 on CentOS 7):

          repmgrd_service_start_command='sudo systemctl start repmgr12'
          repmgrd_service_stop_command='sudo systemctl stop repmgr12'

For more details see the reference page for each command.

13.1.6. Monitoring configuration

To enable monitoring, set:

          monitoring_history=yes

in repmgr.conf.

Monitoring data is written at the interval defined by the option monitor_interval_secs (see above).

For more details on monitoring, see Storing monitoring data. For information on monitoring standby disconnections, see Monitoring standby disconnections on the primary.

13.1.7. Applying configuration changes to repmgrd

To apply configuration file changes to a running repmgrd daemon, execute the operating system's repmgrd service reload command (see Package details for examples) or, for instances which were started manually, send the process a SIGHUP, e.g. kill -HUP `cat /tmp/repmgrd.pid`.

Tip

Check the repmgrd log to see what changes were applied, or if any issues were encountered when reloading the configuration.
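
For a manually started repmgrd, the kill -HUP step can be wrapped in a small helper that first checks that the PID file points at a live process. This is a minimal sketch: the function name reload_repmgrd is an assumption, and the default PID file path is simply the one used in the example above.

```shell
#!/bin/sh
# Send SIGHUP to a manually started repmgrd, given its PID file.
reload_repmgrd() {
    pid_file="${1:-/tmp/repmgrd.pid}"

    if [ ! -r "$pid_file" ]; then
        echo "PID file $pid_file not found or not readable" >&2
        return 1
    fi

    pid=$(cat "$pid_file")

    # kill -0 only tests whether the process exists and can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no running process with PID '$pid'" >&2
        return 1
    fi

    kill -HUP "$pid"
}

# Usage:  reload_repmgrd /tmp/repmgrd.pid
```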

Note that only the following subset of configuration file parameters can be changed on a running repmgrd daemon:

async_query_timeout
child_nodes_check_interval
child_nodes_connected_include_witness
child_nodes_connected_min_count
child_nodes_disconnect_command
child_nodes_disconnect_min_count
child_nodes_disconnect_timeout
connection_check_type
conninfo
degraded_monitoring_timeout
event_notification_command
event_notifications
failover_validation_command
failover
follow_command
log_facility
log_file
log_level
log_status_interval
monitor_interval_secs
monitoring_history
primary_notification_timeout
primary_visibility_consensus
always_promote
promote_command
reconnect_attempts
reconnect_interval
retry_promote_interval_secs
repmgrd_standby_startup_timeout
sibling_nodes_disconnect_timeout
standby_disconnect_on_failover

The following set of configuration file parameters must be updated via repmgr standby register --force, as they require changes to the repmgr.nodes table so they are visible to all nodes in the replication cluster:

node_id
node_name
data_directory
location
priority

Note

After executing repmgr standby register --force, repmgrd must be restarted for the changes to take effect.
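
The two lists above can be turned into a quick lookup that suggests how a changed parameter is applied. The following sketch is illustrative only: the function name and the labels it prints (reload, register, other) are assumptions, and "other" simply means the parameter appears in neither list.

```shell
#!/bin/sh
# Suggest how a changed repmgr.conf parameter is applied, based on the
# two lists above. Function name and output labels are illustrative.
change_action() {
    case "$1" in
        node_id|node_name|data_directory|location|priority)
            # repmgr standby register --force, then restart repmgrd
            echo "register" ;;
        async_query_timeout|child_nodes_check_interval|\
        child_nodes_connected_include_witness|\
        child_nodes_connected_min_count|child_nodes_disconnect_command|\
        child_nodes_disconnect_min_count|child_nodes_disconnect_timeout|\
        connection_check_type|conninfo|degraded_monitoring_timeout|\
        event_notification_command|event_notifications|\
        failover_validation_command|failover|follow_command|\
        log_facility|log_file|log_level|log_status_interval|\
        monitor_interval_secs|monitoring_history|\
        primary_notification_timeout|primary_visibility_consensus|\
        always_promote|promote_command|reconnect_attempts|\
        reconnect_interval|retry_promote_interval_secs|\
        repmgrd_standby_startup_timeout|\
        sibling_nodes_disconnect_timeout|standby_disconnect_on_failover)
            # service reload command or kill -HUP
            echo "reload" ;;
        *)
            # in neither list; cannot be changed on a running repmgrd
            echo "other" ;;
    esac
}

# Usage:  change_action log_level
```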