repmgr 4.1.1 Documentation

2ndQuadrant Ltd

This is the official documentation of repmgr 4.1.1 for use with PostgreSQL 9.3 - PostgreSQL 10. It describes the functionality supported by the current version of repmgr.

repmgr was developed by 2ndQuadrant along with contributions from other individuals and companies. Contributions from the community are appreciated and welcome - get in touch via github or the mailing list/forum. Multiple 2ndQuadrant customers contribute funding to make repmgr development possible.

2ndQuadrant, a Platinum sponsor of the PostgreSQL project, continues to develop repmgr to meet internal needs and those of customers. Other companies as well as individual developers are welcome to participate in the efforts.

Legal Notice

repmgr is Copyright © 2010-2018 by 2ndQuadrant, Ltd. All rights reserved.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/ to obtain one.


Table of Contents
I. Getting started
1. repmgr overview
2. Installation
3. Quick-start guide
II. repmgr administration manual
4. repmgr configuration
5. Cloning standbys
6. Promoting a standby server with repmgr
7. Following a new primary
8. Performing a switchover with repmgr
9. Using a witness server
10. Event Notifications
11. Upgrading repmgr
III. Using repmgrd
12. Automatic failover with repmgrd
13. repmgrd configuration
14. repmgrd demonstration
15. repmgrd and cascading replication
16. Handling network splits with repmgrd
17. Using a witness server with repmgrd
18. "degraded monitoring" mode
19. Monitoring with repmgrd
20. BDR failover with repmgrd
IV. repmgr command reference
repmgr primary register -- initialise a repmgr installation and register the primary node
repmgr primary unregister -- unregister an inactive primary node
repmgr standby clone -- clone a PostgreSQL standby node from another PostgreSQL node
repmgr standby register -- add a standby's information to the repmgr metadata
repmgr standby unregister -- remove a standby's information from the repmgr metadata
repmgr standby promote -- promote a standby to a primary
repmgr standby follow -- attach a standby to a new primary
repmgr standby switchover -- promote a standby to primary and demote the existing primary to a standby
repmgr witness register -- add a witness node's information to the repmgr metadata
repmgr witness unregister -- remove a witness node's information from the repmgr metadata
repmgr node status -- show overview of a node's basic information and replication status
repmgr node check -- performs some health checks on a node from a replication perspective
repmgr node rejoin -- rejoin a dormant (stopped) node to the replication cluster
repmgr cluster show -- display information about each registered node in the replication cluster
repmgr cluster matrix --  runs repmgr cluster show on each node and summarizes output
repmgr cluster crosscheck -- cross-checks connections between each combination of nodes
repmgr cluster event -- output a formatted list of cluster events
repmgr cluster cleanup -- purge monitoring history
A. Release notes
A.1. Release 4.1.1
A.2. Release 4.1.0
A.3. Release 4.0.6
A.4. Release 4.0.5
A.5. Release 4.0.4
A.6. Release 4.0.3
A.7. Release 4.0.2
A.8. Release 4.0.1
A.9. Release 4.0.0
B. Verifying digital signatures
B.1. repmgr source code signing key
C. FAQ (Frequently Asked Questions)
C.1. General
C.2. repmgr
C.3. repmgrd
D. repmgr package details
D.1. CentOS Packages
D.2. Debian/Ubuntu Packages
D.3. Snapshot packages
D.4. Installing old package versions
D.5. Information for packagers
Index

Chapter 1. repmgr overview

This chapter provides a high-level overview of repmgr's components and functionality.


1.1. Concepts

This guide assumes that you are familiar with PostgreSQL administration and streaming replication concepts. For further details on streaming replication, see the PostgreSQL documentation section on streaming replication.

The following terms are used throughout the repmgr documentation.

replication cluster

In the repmgr documentation, "replication cluster" refers to the network of PostgreSQL servers connected by streaming replication.

node

A node is a single PostgreSQL server within a replication cluster.

upstream node

The node a standby server connects to, in order to receive streaming replication. This is either the primary server, or in the case of cascading replication, another standby.

failover

This is the action which occurs if a primary server fails and a suitable standby is promoted as the new primary. The repmgrd daemon supports automatic failover to minimise downtime.

switchover

In certain circumstances, such as hardware or operating system maintenance, it's necessary to take a primary server offline; in this case a controlled switchover is necessary, whereby a suitable standby is promoted and the existing primary removed from the replication cluster in a controlled manner. The repmgr command line client provides this functionality.

fencing

In a failover situation, following the promotion of a new standby, it's essential that the previous primary does not unexpectedly come back on line, which would result in a split-brain situation. To prevent this, the failed primary should be isolated from applications, i.e. "fenced off".

witness server

repmgr provides functionality to set up a so-called "witness server" to assist in determining a new primary server in a failover situation with more than one standby. The witness server itself is not part of the replication cluster, although it does contain a copy of the repmgr metadata schema.

The purpose of a witness server is to provide a "casting vote" where servers in the replication cluster are split over more than one location. In the event of a loss of connectivity between locations, the presence or absence of the witness server will decide whether a server at that location is promoted to primary; this is to prevent a "split-brain" situation where an isolated location interprets a network outage as a failure of the (remote) primary and promotes a (local) standby.

A witness server only needs to be created if repmgrd is in use.


1.2. Components

repmgr is a suite of open-source tools to manage replication and failover within a cluster of PostgreSQL servers. It supports and enhances PostgreSQL's built-in streaming replication, which provides a single read/write primary server and one or more read-only standbys containing near-real time copies of the primary server's database. It provides two main tools:

repmgr

A command-line tool used to perform administrative tasks such as:

  • setting up standby servers

  • promoting a standby server to primary

  • switching over primary and standby servers

  • displaying the status of servers in the replication cluster

repmgrd

A daemon which actively monitors servers in a replication cluster and performs the following tasks:

  • monitoring and recording replication performance

  • performing failover by detecting failure of the primary and promoting the most suitable standby server

  • provide notifications about events in the cluster to a user-defined script which can perform tasks such as sending alerts by email


1.3. Repmgr user and metadata

In order to effectively manage a replication cluster, repmgr needs to store information about the servers in the cluster in a dedicated database schema. This schema is automatically created by the repmgr extension, which is installed during the first step in initializing a repmgr-administered cluster (repmgr primary register) and contains the following objects:

Tables

  • repmgr.events: records events of interest

  • repmgr.nodes: connection and status information for each server in the replication cluster

  • repmgr.monitoring_history: historical standby monitoring information written by repmgrd

Views

  • repmgr.show_nodes: based on the table repmgr.nodes, additionally showing the name of the server's upstream node

  • repmgr.replication_status: when repmgrd's monitoring is enabled, shows current monitoring status for each standby.

The repmgr metadata schema can be stored in an existing database or in its own dedicated database. Note that the repmgr metadata schema cannot reside on a database server which is not part of the replication cluster managed by repmgr.

A database user must be available for repmgr to access this database and perform necessary changes. This user does not need to be a superuser, however some operations such as initial installation of the repmgr extension will require a superuser connection (this can be specified where required with the command line option --superuser).


Chapter 2. Installation

repmgr can be installed from binary packages provided by your operating system's packaging system, or from source.

In general we recommend using binary packages, unless unavailable for your operating system.

Source installs are mainly useful if you want to keep track of the very latest repmgr development and contribute to development. They're also the only option if there are no packages for your operating system yet.

Before installing repmgr make sure you satisfy the installation requirements.


2.1. Requirements for installing repmgr

repmgr is developed and tested on Linux and OS X, but should work on any UNIX-like system supported by PostgreSQL itself. There is no support for Microsoft Windows.

From version 4.0, repmgr is compatible with all PostgreSQL versions from 9.3, including PostgreSQL 10. Note that some repmgr functionality is not available in PostgreSQL 9.3 and PostgreSQL 9.4.

Note: If upgrading from repmgr 3.x, please see the section Upgrading from repmgr 3.x.

All servers in the replication cluster must be running the same major version of PostgreSQL, and we recommend that they also run the same minor version.

repmgr must be installed on each server in the replication cluster. If installing repmgr from packages, the package version must match the PostgreSQL version. If installing from source, repmgr must be compiled against the same major version.

A dedicated system user for repmgr is *not* required; as many repmgr and repmgrd actions require direct access to the PostgreSQL data directory, these commands should be executed by the postgres user.

Passwordless ssh connectivity between all servers in the replication cluster is not required, but is necessary for certain operations, for example when performing a switchover with repmgr standby switchover, or when repmgr needs to copy configuration files located outside the PostgreSQL data directory.

Tip: We recommend using a session multiplexer utility such as screen or tmux when performing long-running actions (such as cloning a database) on a remote server - this will ensure the repmgr action won't be prematurely terminated if your ssh session to the server is interrupted or closed.


2.2. Installing repmgr from packages

We recommend installing repmgr using the available packages for your system.


2.2.1. RedHat/CentOS/Fedora

repmgr RPM packages for RedHat/CentOS variants and Fedora are available from the 2ndQuadrant public repository; see the following section for details.

RPM packages for repmgr are also available via Yum through the PostgreSQL Global Development Group RPM repository (http://yum.postgresql.org/). Follow the instructions for your distribution (RedHat, CentOS, Fedora, etc.) and architecture as detailed there. Note that it can take some days for new repmgr packages to become available via this repository.

Note: repmgr packages are designed to be compatible with the community-provided PostgreSQL packages. They may not work with vendor-specific packages such as those provided by RedHat for RHEL customers, as the filesystem layout may be different to the community RPMs. Please contact your support vendor for assistance.

For more information on the package contents, including details of installation paths and relevant service commands, see the appendix section CentOS packages.


2.2.1.1. 2ndQuadrant public RPM yum repository

Beginning with repmgr 4.0.5, 2ndQuadrant provides a dedicated yum public repository for 2ndQuadrant software, including repmgr. We recommend using this for all future repmgr releases.

General instructions for using this repository can be found on its homepage. Specific instructions for installing repmgr follow below.

Installation

  • Locate the repository RPM for your PostgreSQL version from the list at: https://dl.2ndquadrant.com/

  • Install the repository definition for your distribution and PostgreSQL version (this enables the 2ndQuadrant repository as a source of repmgr packages).

    For example, for PostgreSQL 10 on CentOS, execute:

    curl https://dl.2ndquadrant.com/default/release/get/10/rpm | sudo bash

    Verify that the repository is installed with:

    sudo yum repolist

    The output should contain two entries like this:

    2ndquadrant-dl-default-release-pg10/7/x86_64        2ndQuadrant packages (PG10) for 7 - x86_64          4
    2ndquadrant-dl-default-release-pg10-debug/7/x86_64  2ndQuadrant packages (PG10) for 7 - x86_64 - Debug  3

  • Install the repmgr version appropriate for your PostgreSQL version (e.g. repmgr10):

    $ yum install repmgr10

Compatibility with PGDG Repositories

The 2ndQuadrant repmgr yum repository packages use the same definitions and file system layout as the main PGDG repository.

Normally yum will prioritize the repository with the most recent repmgr version. Once the PGDG repository has been updated, it doesn't matter which repository the packages are installed from.

To ensure the 2ndQuadrant repository is always prioritised, install yum-plugin-priorities and set the repository priorities accordingly.
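
A minimal sketch follows (the repository file name is illustrative and will vary by distribution and PostgreSQL version; with yum-plugin-priorities, a lower priority number means higher precedence):

    # install the priorities plugin
    sudo yum install yum-plugin-priorities

    # then add a "priority" setting to the 2ndQuadrant repository definition,
    # e.g. in /etc/yum.repos.d/2ndquadrant-dl-default-release-pg10.repo:
    #
    #   [2ndquadrant-dl-default-release-pg10]
    #   ...
    #   priority=1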

Installing a specific package version

To install a specific package version, execute yum --showduplicates list for the package in question:

        [root@localhost ~]# yum --showduplicates list repmgr10
        Loaded plugins: fastestmirror
        Loading mirror speeds from cached hostfile
         * base: ftp.iij.ad.jp
         * extras: ftp.iij.ad.jp
         * updates: ftp.iij.ad.jp
        Available Packages
        repmgr10.x86_64                       4.0.3-1.rhel7                        pgdg10
        repmgr10.x86_64                       4.0.4-1.rhel7                        pgdg10
        repmgr10.x86_64                       4.0.5-1.el7                          2ndquadrant-repo-10

then append the appropriate version number to the package name with a hyphen, e.g.:

        [root@localhost ~]# yum install repmgr10-4.0.3-1.rhel7


2.2.2. Debian/Ubuntu

.deb packages for repmgr are available from the PostgreSQL Community APT repository (http://apt.postgresql.org/). Instructions can be found in the APT section of the PostgreSQL Wiki (https://wiki.postgresql.org/wiki/Apt).

For more information on the package contents, including details of installation paths and relevant service commands, see the appendix section Debian/Ubuntu packages.


2.2.2.1. 2ndQuadrant public apt repository for Debian/Ubuntu

Beginning with repmgr 4.0.5, 2ndQuadrant provides a public apt repository for 2ndQuadrant software, including repmgr.

General instructions for using this repository can be found on its homepage. Specific instructions for installing repmgr follow below.

Installation

  • Install the repository definition for your distribution and PostgreSQL version (this enables the 2ndQuadrant repository as a source of repmgr packages) by executing:

    curl https://dl.2ndquadrant.com/default/release/get/deb | sudo bash

    Note: This will automatically install the following additional packages, if not already present:

    • lsb-release
    • apt-transport-https

  • Install the repmgr package appropriate for your PostgreSQL version (e.g. postgresql-10-repmgr):

    $ apt-get install postgresql-10-repmgr

    Note: For packages for PostgreSQL 9.6 and earlier, the package name includes a period between major and minor version numbers, e.g. postgresql-9.6-repmgr.


2.3. Installing repmgr from source

2.3.1. Prerequisites for installing from source

To install repmgr from source, the prerequisites for compiling PostgreSQL must be installed. These are described in PostgreSQL's documentation on build requirements and build requirements for documentation.

Most mainstream Linux distributions and other UNIX variants provide simple ways to install the prerequisites from packages.

  • Debian and Ubuntu: First add the apt.postgresql.org repository to your sources.list if you have not already done so. Then install the pre-requisites for building PostgreSQL with:

           sudo apt-get update
           sudo apt-get build-dep postgresql-9.6

  • RHEL or CentOS 6.x or 7.x: install the appropriate repository RPM for your system from yum.postgresql.org. Then install the prerequisites for building PostgreSQL with:

           sudo yum check-update
           sudo yum groupinstall "Development Tools"
           sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
           sudo yum-builddep postgresql96

Note: Select the appropriate PostgreSQL versions for your target repmgr version.


2.3.2. Getting repmgr source code

There are two ways to get the repmgr source code: with git, or by downloading tarballs of released versions.


2.3.2.1. Using git to get the repmgr sources

Use git if you expect to update often, want to keep track of development, or want to contribute changes to repmgr. There is no reason not to use git if you're familiar with it.

The source for repmgr is maintained at https://github.com/2ndQuadrant/repmgr.

There are also tags for each repmgr release, e.g. 4.0.5.

Clone the source code using git:

     git clone https://github.com/2ndQuadrant/repmgr

For more information on using git see git-scm.com.


2.3.2.2. Downloading release source tarballs

Official release source code is uploaded as tarballs to the repmgr website along with a tarball checksum and a matching GnuPG signature. See http://repmgr.org/ for the download information. See Verifying digital signatures for information on verifying digital signatures.

You will need to download the repmgr source, e.g. repmgr-4.0.tar.gz. You may optionally verify the package checksums from the .md5 files and/or verify the GnuPG signatures per Verifying digital signatures.
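
For example, assuming the tarball, checksum file and signature file have all been downloaded to the current directory (file names are illustrative):

    # verify the MD5 checksum
    md5sum -c repmgr-4.0.tar.gz.md5

    # verify the GnuPG signature (the repmgr signing key must first be imported;
    # see "Verifying digital signatures")
    gpg --verify repmgr-4.0.tar.gz.asc repmgr-4.0.tar.gz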

After you unpack the source code archives using tar xf the installation process is the same as if you were installing from a git clone.


2.3.3. Installation of repmgr from source

To install repmgr from source, simply execute:

    ./configure && make install

Ensure pg_config for the target PostgreSQL version is in $PATH.
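
For example, on a CentOS system with the PGDG PostgreSQL 10 packages installed (paths are illustrative and will differ by distribution and PostgreSQL version):

    export PATH=/usr/pgsql-10/bin:$PATH
    pg_config --version      # should report the target PostgreSQL version
    ./configure && make install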


2.3.4. Building repmgr documentation

The repmgr documentation is (like the main PostgreSQL project) written in DocBook format. To build it locally as HTML, you'll need to install the required packages as described in the PostgreSQL documentation, then execute:

    ./configure && make install-doc

The generated HTML files will be placed in the doc/html subdirectory of your source tree.

To build the documentation as a single HTML file, execute:

    cd doc/ && make repmgr.html

Note: Due to changes in PostgreSQL's documentation build system from PostgreSQL 10, the documentation can currently only be built against PostgreSQL 9.6 or earlier. This limitation will be fixed when time and resources permit.


Chapter 3. Quick-start guide

This section gives a quick introduction to repmgr, including setting up a sample repmgr installation and a basic replication cluster.

These instructions are for demonstration purposes and are not suitable for a production installation, as issues such as account security and system administration best practices are omitted.

Note: To upgrade an existing repmgr 3.x installation, see section Upgrading from repmgr 3.x.


3.1. Prerequisites for setting up a basic replication cluster with repmgr

This section describes how to set up a basic replication cluster with a primary and a standby server using the repmgr command line tool.

We'll assume the primary is called node1 with IP address 192.168.1.11, and the standby is called node2 with IP address 192.168.1.12.

The following software must be installed on both servers:

  • PostgreSQL
  • repmgr (matching the installed PostgreSQL major version)

At the network level, connections to the PostgreSQL port (default: 5432) must be possible in both directions between the two servers.

If you want repmgr to copy configuration files which are located outside the PostgreSQL data directory, and/or to test switchover functionality, you will also need passwordless SSH connections between both servers, and rsync should be installed.
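
As a sketch, passwordless SSH can be set up by generating a key pair for the postgres system user on each server and copying the public key to the other server (host names follow the example above; adjust to your environment):

    # as the postgres user on node1
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    ssh-copy-id postgres@node2

    # repeat in the other direction on node2, then verify from node1:
    ssh postgres@node2 true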

Tip: For testing repmgr, it's possible to use multiple PostgreSQL instances running on different ports on the same computer, with passwordless SSH access to localhost enabled.


3.2. PostgreSQL configuration

On the primary server, a PostgreSQL instance must be initialised and running. The following replication settings may need to be adjusted:


    # Enable replication connections; set this figure to at least one more
    # than the number of standbys which will connect to this server
    # (note that repmgr will execute `pg_basebackup` in WAL streaming mode,
    # which requires two free WAL senders)

    max_wal_senders = 10

    # Ensure WAL files contain enough information to enable read-only queries
    # on the standby.
    #
    #  PostgreSQL 9.5 and earlier: one of 'hot_standby' or 'logical'
    #  PostgreSQL 9.6 and later: one of 'replica' or 'logical'
    #    ('hot_standby' will still be accepted as an alias for 'replica')
    #
    # See: https://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-LEVEL

    wal_level = 'hot_standby'

    # Enable read-only queries on a standby
    # (Note: this will be ignored on a primary but we recommend including
    # it anyway)

    hot_standby = on

    # Enable WAL file archiving
    archive_mode = on

    # Set archive command to a script or application that will safely store
    # your WALs in a secure place. /bin/true is an example of a command that
    # ignores archiving. Use something more sensible.
    archive_command = '/bin/true'

    # If you have configured "pg_basebackup_options"
    # in "repmgr.conf" to include the setting "--xlog-method=fetch" (from
    # PostgreSQL 10 "--wal-method=fetch"), *and* you have not set
    # "restore_command" in "repmgr.conf"to fetch WAL files from another
    # source such as Barman, you'll need to set "wal_keep_segments" to a
    # high enough value to ensure that all WAL files generated while
    # the standby is being cloned are retained until the standby starts up.
    #
    # wal_keep_segments = 5000
   

Tip: Rather than editing these settings in the default postgresql.conf file, create a separate file such as postgresql.replication.conf and include it from the end of the main configuration file with: include 'postgresql.replication.conf'.

Additionally, if you are intending to use pg_rewind, and the cluster was not initialised using data checksums, you may want to consider enabling wal_log_hints; for more details see Using pg_rewind.
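
If so, a minimal addition to the replication configuration shown above would be (note that changing this parameter requires a server restart):

    # required by pg_rewind if data checksums are not enabled
    wal_log_hints = on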


3.3. Create the repmgr user and database

Create a dedicated PostgreSQL superuser account and a database for the repmgr metadata, e.g.

   createuser -s repmgr
   createdb repmgr -O repmgr
  

For the examples in this document, the name repmgr will be used for both user and database, but any names can be used.

Note: For the sake of simplicity, the repmgr user is created as a superuser. If desired, it's possible to create the repmgr user as a normal user. However for certain operations superuser permissions are required; in this case the command line option --superuser can be provided to specify a superuser.

It's also assumed that the repmgr user will be used to make the replication connection from the standby to the primary; again this can be overridden by specifying a separate replication user when registering each node.

Tip: repmgr will install the repmgr extension, which creates a repmgr schema containing repmgr's metadata tables as well as other functions and views. We also recommend that you set the repmgr user's search path to include this schema name, e.g.

       ALTER USER repmgr SET search_path TO repmgr, "$user", public;


3.4. Configuring authentication in pg_hba.conf

Ensure the repmgr user has appropriate permissions in pg_hba.conf and can connect in replication mode; pg_hba.conf should contain entries similar to the following:

    local   replication   repmgr                              trust
    host    replication   repmgr      127.0.0.1/32            trust
    host    replication   repmgr      192.168.1.0/24          trust

    local   repmgr        repmgr                              trust
    host    repmgr        repmgr      127.0.0.1/32            trust
    host    repmgr        repmgr      192.168.1.0/24          trust
  

Note that these are simple settings for testing purposes. Adjust according to your network environment and authentication requirements.


3.5. Preparing the standby

On the standby, do not create a PostgreSQL instance, but do ensure the destination data directory (and any other directories which you want PostgreSQL to use) exist and are owned by the postgres system user. Permissions must be set to 0700 (drwx------).
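
As a sketch, assuming the data directory used elsewhere in this guide:

    sudo mkdir -p /var/lib/postgresql/data
    sudo chown postgres:postgres /var/lib/postgresql/data
    sudo chmod 0700 /var/lib/postgresql/data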

Check the primary database is reachable from the standby using psql:

    psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'

Note: repmgr stores connection information as libpq connection strings throughout. This documentation refers to them as conninfo strings; an alternative name is DSN (data source name). We'll use these in place of the -h hostname -d databasename -U username syntax.


3.6. repmgr configuration file

Create a repmgr.conf file on the primary server. The file must contain at least the following parameters:

    node_id=1
    node_name=node1
    conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/var/lib/postgresql/data'
  

repmgr.conf should not be stored inside the PostgreSQL data directory, as it could be overwritten when setting up or reinitialising the PostgreSQL server. See sections Configuration and configuration file location for further details about repmgr.conf.

Tip: For Debian-based distributions we recommend explicitly setting pg_bindir to the directory where pg_ctl and other binaries not in the standard path are located. For PostgreSQL 9.6 this would be /usr/lib/postgresql/9.6/bin/.

Note: repmgr only uses pg_bindir when it executes PostgreSQL binaries directly.

For user-defined scripts such as promote_command and the various service_*_commands, you must always explicitly provide the full path to the binary or script being executed, even if it is repmgr itself.

This is because these options can contain user-defined scripts in arbitrary locations, so prepending pg_bindir may break them.

See the file repmgr.conf.sample for details of all available configuration parameters.


3.7. Register the primary server

To enable repmgr to support a replication cluster, the primary node must be registered with repmgr. This installs the repmgr extension and metadata objects, and adds a metadata record for the primary server:

    $ repmgr -f /etc/repmgr.conf primary register
    INFO: connecting to primary database...
    NOTICE: attempting to install extension "repmgr"
    NOTICE: "repmgr" extension successfully installed
    NOTICE: primary node record (id: 1) registered

Verify status of the cluster like this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Connection string
    ----+-------+---------+-----------+----------+--------------------------------------------------------
     1  | node1 | primary | * running |          | host=node1 dbname=repmgr user=repmgr connect_timeout=2
  

The record in the repmgr metadata table will look like this:

    repmgr=# SELECT * FROM repmgr.nodes;
    -[ RECORD 1 ]----+-------------------------------------------------------
    node_id          | 1
    upstream_node_id |
    active           | t
    node_name        | node1
    type             | primary
    location         | default
    priority         | 100
    conninfo         | host=node1 dbname=repmgr user=repmgr connect_timeout=2
    repluser         | repmgr
    slot_name        |
    config_file      | /etc/repmgr.conf

Each server in the replication cluster will have its own record. If repmgrd is in use, the fields upstream_node_id, active and type will be updated when the node's status or role changes.


3.8. Clone the standby server

Create a repmgr.conf file on the standby server. It must contain at least the same parameters as the primary's repmgr.conf, but with the mandatory values node_id, node_name, conninfo (and possibly data_directory) adjusted accordingly, e.g.:

    node_id=2
    node_name=node2
    conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/var/lib/postgresql/data'

Use the --dry-run option to check the standby can be cloned:

    $ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run
    NOTICE: using provided configuration file "/etc/repmgr.conf"
    NOTICE: destination directory "/var/lib/postgresql/data" provided
    INFO: connecting to source node
    NOTICE: checking for available walsenders on source node (2 required)
    INFO: sufficient walsenders available on source node (2 required)
    NOTICE: standby will attach to upstream node 1
    HINT: consider using the -c/--fast-checkpoint option
    INFO: all prerequisites for "standby clone" are met

If no problems are reported, the standby can then be cloned with:

    $ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone

    NOTICE: using configuration file "/etc/repmgr.conf"
    NOTICE: destination directory "/var/lib/postgresql/data" provided
    INFO: connecting to source node
    NOTICE: checking for available walsenders on source node (2 required)
    INFO: sufficient walsenders available on source node (2 required)
    INFO: creating directory "/var/lib/postgresql/data"...
    NOTICE: starting backup (using pg_basebackup)...
    HINT: this may take some time; consider using the -c/--fast-checkpoint option
    INFO: executing:
      pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h node1 -U repmgr -X stream
    NOTICE: standby clone (using pg_basebackup) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: pg_ctl -D /var/lib/postgresql/data start
  

This has cloned the PostgreSQL data directory files from the primary node1 using PostgreSQL's pg_basebackup utility. A recovery.conf file containing the correct parameters to start streaming from this primary server will be created automatically.

Note: By default, any configuration files in the primary's data directory will be copied to the standby. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.

Make any adjustments to the standby's PostgreSQL configuration files now, then start the server.
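
For example, either directly with pg_ctl as suggested in the hint above, or preferably using your distribution's service command (the service name shown is illustrative):

    # as the postgres user
    pg_ctl -D /var/lib/postgresql/data start

    # or, e.g. on CentOS 7 with the PGDG PostgreSQL 10 packages:
    sudo systemctl start postgresql-10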

For more details on repmgr standby clone, see the command reference. A more detailed overview of cloning options is available in the administration manual.


3.9. Verify replication is functioning

Connect to the primary server and execute:

    repmgr=# SELECT * FROM pg_stat_replication;
    -[ RECORD 1 ]----+------------------------------
    pid              | 19111
    usesysid         | 16384
    usename          | repmgr
    application_name | node2
    client_addr      | 192.168.1.12
    client_hostname  |
    client_port      | 50378
    backend_start    | 2017-08-28 15:14:19.851581+09
    backend_xmin     |
    state            | streaming
    sent_location    | 0/7000318
    write_location   | 0/7000318
    flush_location   | 0/7000318
    replay_location  | 0/7000318
    sync_priority    | 0
    sync_state       | async

This shows that the previously cloned standby (node2 shown in the field application_name) has connected to the primary from IP address 192.168.1.12.

From PostgreSQL 9.6 you can also use the view pg_stat_wal_receiver to check the replication status from the standby.

    repmgr=# SELECT * FROM pg_stat_wal_receiver;
    Expanded display is on.
    -[ RECORD 1 ]---------+--------------------------------------------------------------------------------
    pid                   | 18236
    status                | streaming
    receive_start_lsn     | 0/3000000
    receive_start_tli     | 1
    received_lsn          | 0/7000538
    received_tli          | 1
    last_msg_send_time    | 2017-08-28 15:21:26.465728+09
    last_msg_receipt_time | 2017-08-28 15:21:26.465774+09
    latest_end_lsn        | 0/7000538
    latest_end_time       | 2017-08-28 15:20:56.418735+09
    slot_name             |
    conninfo              | user=repmgr dbname=replication host=node1 application_name=node2
   

Note that the conninfo value is that generated in recovery.conf and will differ slightly from the primary's conninfo as set in repmgr.conf - among others it will contain the connecting node's name as application_name.


3.10. Register the standby

Register the standby server with:

    $ repmgr -f /etc/repmgr.conf standby register
    NOTICE: standby node "node2" (ID: 2) successfully registered

Check the node is registered by executing repmgr cluster show on the standby:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr

Both nodes are now registered with repmgr and the records have been copied to the standby server.


Chapter 4. repmgr configuration

4.1. Configuration file location

repmgr and repmgrd use a common configuration file, by default called repmgr.conf (although any name can be used if explicitly specified). repmgr.conf must contain a number of required parameters, including the database connection string for the local node and the location of its data directory; other values will be inferred from defaults if not explicitly supplied. See section required configuration file settings for more details.

The configuration file will be searched for in the following locations:

  • a configuration file specified by the -f/--config-file command line option

  • a location specified by the package maintainer (if repmgr was installed from a package and the package maintainer has specified the configuration file location)

  • repmgr.conf in the local directory

  • /etc/repmgr.conf

  • repmgr.conf in the directory reported by pg_config --sysconfdir

Note that if a file is explicitly specified with -f/--config-file, an error will be raised if it is not found or not readable, and no attempt will be made to check default locations; this is to prevent repmgr unexpectedly reading the wrong configuration file.

Note: If providing the configuration file location with -f/--config-file, avoid using a relative path, particularly when executing repmgr primary register and repmgr standby register, as repmgr stores the configuration file location in the repmgr metadata for use when repmgr is executed remotely (e.g. during repmgr standby switchover). repmgr will attempt to convert a relative path into an absolute one, but this may not be the same as the path you would explicitly provide (e.g. ./repmgr.conf might be converted to /path/to/./repmgr.conf, whereas you'd normally write /path/to/repmgr.conf).


4.2. Required configuration file settings

Each repmgr.conf file must contain the following parameters:

node_id (int)

A unique integer greater than zero which identifies the node.

node_name (string)

An arbitrary (but unique) string; we recommend using the server's hostname or another identifier unambiguously associated with the server to avoid confusion. Avoid choosing names which reflect the node's current role, e.g. primary or standby1, as roles can change; if you end up in a situation where the current primary is called standby1 (for example), things will be confusing to say the least.

conninfo (string)

Database connection information as a conninfo string. All servers in the cluster must be able to connect to the local node using this string.

For details on conninfo strings, see section Connection Strings in the PostgreSQL documentation.

If repmgrd is in use, consider explicitly setting connect_timeout in the conninfo string to determine the length of time which elapses before a network connection attempt is abandoned; for details see the PostgreSQL documentation.

data_directory (string)

The node's data directory. This is needed by repmgr when performing operations when the PostgreSQL instance is not running and there's no other way of determining the data directory.

For a full list of annotated configuration items, see the file repmgr.conf.sample.

For repmgrd-specific settings, see Chapter 13.

Note: The following parameters in the configuration file can be overridden with command line options:

  • -L/--log-level overrides log_level in repmgr.conf

  • -b/--pg_bindir overrides pg_bindir in repmgr.conf


4.3. Log settings

By default, repmgr and repmgrd write log output to STDERR. An alternative log destination can be specified (either a file or syslog).

Note: The repmgr application itself will continue to write log output to STDERR even if another log destination is configured, as otherwise any output resulting from a command line operation will "disappear" into the log.

This behaviour can be overridden with the command line option --log-to-file, which will redirect all logging output to the configured log destination. This is recommended when repmgr is executed by another application, particularly repmgrd, to enable log output generated by the repmgr application to be stored for later reference.

log_level (string)

One of DEBUG, INFO, NOTICE, WARNING, ERROR, ALERT, CRIT or EMERG.

Default is INFO.

Note that DEBUG will produce a substantial amount of log output and should not be enabled in normal use.

log_facility (string)

Logging facility: possible values are STDERR (default), or for syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER.

log_file (string)

If log_facility is set to STDERR, log output can be redirected to the specified file.

See Section 13.4 for information on configuring log rotation.
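
For example, to redirect log output to a file, repmgr.conf might contain the following (the log file path is illustrative and must be writable by the user repmgr runs as):

    log_level=INFO
    log_facility=STDERR
    log_file='/var/log/repmgr/repmgr.log'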

log_status_interval (integer)

This setting causes repmgrd to emit a status log line at the specified interval (in seconds, default 300) describing repmgrd's current state, e.g.:

      [2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (node ID: 1)

4.4. Service command settings

In some circumstances, repmgr (and repmgrd) need to be able to stop, start or restart PostgreSQL. repmgr commands which need to do this include repmgr standby follow, repmgr standby switchover and repmgr node rejoin.

By default, repmgr will use PostgreSQL's pg_ctl to control the PostgreSQL server. However this can lead to various problems, particularly when PostgreSQL has been installed from packages, and especially so if systemd is in use.

Note: If using systemd, ensure you have RemoveIPC set to off. See the systemd entry in the PostgreSQL wiki for details.

With this in mind, we recommend always configuring repmgr to use the available system service commands.

To do this, specify the appropriate command for each action in repmgr.conf using the following configuration parameters:

    service_start_command
    service_stop_command
    service_restart_command
    service_reload_command

Note: repmgr will not apply pg_bindir when executing any of these commands; these can be user-defined scripts so must always be specified with the full path.

Note: It's also possible to specify a service_promote_command. This is intended for systems which provide a package-level promote command, such as Debian's pg_ctlcluster, to promote the PostgreSQL instance from standby to primary.

If your packaging system does not provide such a command, it can be left empty, and repmgr will generate the appropriate `pg_ctl ... promote` command.

Do not confuse this with promote_command, which is used by repmgrd to execute repmgr standby promote.
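
For reference, a typical promote_command setting in repmgr.conf looks like the following; in line with the rule above, the full paths to the repmgr binary and configuration file must be given (paths are illustrative):

    promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'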

To confirm which command repmgr will execute for each action, use repmgr node service --list --action=..., e.g.:

      repmgr -f /etc/repmgr.conf node service --list --action=stop
      repmgr -f /etc/repmgr.conf node service --list --action=start
      repmgr -f /etc/repmgr.conf node service --list --action=restart
      repmgr -f /etc/repmgr.conf node service --list --action=reload

These commands will be executed by the system user which repmgr runs as (usually postgres) and will probably require passwordless sudo access to be able to execute the command.

For example, using systemd on CentOS 7, the service commands can be set as follows:

      service_start_command   = 'sudo systemctl start postgresql-9.6'
      service_stop_command    = 'sudo systemctl stop postgresql-9.6'
      service_restart_command = 'sudo systemctl restart postgresql-9.6'
      service_reload_command  = 'sudo systemctl reload postgresql-9.6'

and /etc/sudoers should be set as follows:

      Defaults:postgres !requiretty
      postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \
        /usr/bin/systemctl start postgresql-9.6, \
        /usr/bin/systemctl restart postgresql-9.6, \
        /usr/bin/systemctl reload postgresql-9.6

Important: Debian/Ubuntu users: instead of calling sudo systemctl directly, use sudo pg_ctlcluster, e.g.:

      service_start_command   = 'sudo pg_ctlcluster 9.6 main start'
      service_stop_command    = 'sudo pg_ctlcluster 9.6 main stop'
      service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'
      service_reload_command  = 'sudo pg_ctlcluster 9.6 main reload'

and set /etc/sudoers accordingly.

While pg_ctlcluster will work when executed as user postgres, it's strongly recommended to use sudo pg_ctlcluster on systemd systems, to ensure systemd has a correct picture of the PostgreSQL application state.


4.5. repmgr database user permissions

repmgr creates an extension whose schema contains the objects needed to administer repmgr metadata. The user defined in the conninfo setting must be able to access all of these objects. Additionally, superuser permissions are required to install the repmgr extension. The easiest way to achieve this is to create the repmgr user as a superuser; however, if this is not desirable, the repmgr user can be created as a normal user and a superuser specified with --superuser when registering a repmgr node.
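
As a sketch, assuming the repmgr user is created as a normal user with the REPLICATION privilege and a superuser named postgres is available for installing the extension:

    createuser --replication repmgr
    createdb repmgr -O repmgr
    repmgr -f /etc/repmgr.conf primary register --superuser=postgres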


Chapter 5. Cloning standbys

5.1. Cloning a standby from Barman

repmgr standby clone can use 2ndQuadrant's Barman application to clone a standby (and also as a fallback source for WAL files).

Tip: Barman (aka PgBarman) should be considered as an integral part of any PostgreSQL replication cluster. For more details see: https://www.pgbarman.org/.

Barman support provides the following advantages:

  • the primary node does not need to perform a new backup every time a new standby is cloned

  • a standby node can be disconnected for longer periods without losing the ability to catch up, and without causing accumulation of WAL files on the primary node

  • WAL management on the primary becomes much easier as there's no need to use replication slots, and wal_keep_segments does not need to be set.


5.1.1. Prerequisites for cloning from Barman

In order to enable Barman support for repmgr standby clone, the following prerequisites must be met:

  • the barman_server setting in repmgr.conf is the same as the server configured in Barman;

  • the barman_host setting in repmgr.conf is set to the SSH hostname of the Barman server;

  • the restore_command setting in repmgr.conf is configured to use a copy of the barman-wal-restore script shipped with the barman-cli package (see section Using Barman as a WAL file source below).

  • the Barman catalogue includes at least one valid backup for this server.

Note: Barman support is automatically enabled if barman_server is set. Normally it is good practice to use Barman, for instance when fetching a base backup while cloning a standby; in any case, Barman mode can be disabled using the --without-barman command line option.

Tip: If you have a non-default SSH configuration on the Barman server, e.g. using a port other than 22, then you can set those parameters in a dedicated Host section in ~/.ssh/config corresponding to the value of barman_host in repmgr.conf. See the Host section in man 5 ssh_config for more details.

It's now possible to clone a standby from Barman, e.g.:

    $ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
    NOTICE: using configuration file "/etc/repmgr.conf"
    NOTICE: destination directory "/var/lib/postgresql/data" provided
    INFO: connecting to Barman server to verify backup for test_cluster
    INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
    INFO: creating directory "/var/lib/postgresql/data/repmgr"...
    INFO: connecting to Barman server to fetch server parameters
    INFO: connecting to upstream node
    INFO: connected to source node, checking its state
    INFO: successfully connected to source node
    DETAIL: current installation size is 29 MB
    NOTICE: retrieving backup from Barman...
    receiving file list ...
    (...)
    NOTICE: standby clone (from Barman) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: pg_ctl -D /var/lib/postgresql/data start


5.1.2. Using Barman as a WAL file source

As a fallback in case streaming replication is interrupted, PostgreSQL can optionally retrieve WAL files from an archive, such as that provided by Barman. This is done by setting restore_command in recovery.conf to a valid shell command which can retrieve a specified WAL file from the archive.

barman-wal-restore is a Python script provided as part of the barman-cli package (Barman 2.0 and later; for Barman 1.x the script is provided separately as barman-wal-restore.py) which performs this function for Barman.

To use barman-wal-restore with repmgr, assuming Barman is located on the barmansrv host and barman-wal-restore is installed as an executable at /usr/bin/barman-wal-restore, repmgr.conf should include the following lines:

    barman_host=barmansrv
    barman_server=somedb
    restore_command=/usr/bin/barman-wal-restore barmansrv somedb %f %p

Note: barman-wal-restore supports command line switches to control parallelism (--parallel=N) and compression (--bzip2, --gzip).

Note: To use a non-default Barman configuration file on the Barman server, specify this in repmgr.conf with barman_config:

      barman_config=/path/to/barman.conf


5.2. Cloning and replication slots

Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure that any standby connected to the primary using a replication slot will always be able to retrieve the required WAL files. This removes the need to manually manage WAL file retention by estimating the number of WAL files that need to be maintained on the primary using wal_keep_segments. Do however be aware that if a standby is disconnected, WAL will continue to accumulate on the primary until either the standby reconnects or the replication slot is dropped.

To enable repmgr to use replication slots, set the boolean parameter use_replication_slots in repmgr.conf:

       use_replication_slots=true

Replication slots must be enabled in postgresql.conf by setting the parameter max_replication_slots to at least the number of expected standbys (changes to this parameter require a server restart).
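
For example, for a cluster expected to contain up to ten standbys (the figure is illustrative):

    # postgresql.conf - a restart is required for changes to take effect
    max_replication_slots = 10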

When cloning a standby, repmgr will automatically generate an appropriate slot name, which is stored in the repmgr.nodes table, and create the slot on the upstream node:

    repmgr=# SELECT node_id, upstream_node_id, active, node_name, type, priority, slot_name
               FROM repmgr.nodes ORDER BY node_id;
     node_id | upstream_node_id | active | node_name |  type   | priority |   slot_name
    ---------+------------------+--------+-----------+---------+----------+---------------
           1 |                  | t      | node1     | primary |      100 | repmgr_slot_1
           2 |                1 | t      | node2     | standby |      100 | repmgr_slot_2
           3 |                1 | t      | node3     | standby |      100 | repmgr_slot_3
     (3 rows)

    repmgr=# SELECT slot_name, slot_type, active, active_pid FROM pg_replication_slots ;
       slot_name   | slot_type | active | active_pid
    ---------------+-----------+--------+------------
     repmgr_slot_2 | physical  | t      |      23658
     repmgr_slot_3 | physical  | t      |      23687
    (2 rows)

Note that a slot name will be created by default for the primary but not actually used unless the primary is converted to a standby using e.g. repmgr standby switchover.

Further information on replication slots in the PostgreSQL documentation: https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS

Tip: While replication slots can be useful for streaming replication, it's recommended to monitor for inactive slots as these will cause WAL files to build up indefinitely, possibly leading to server failure.

As an alternative we recommend using 2ndQuadrant's Barman, which offloads WAL management to a separate server, negating the need to use replication slots to reserve WAL. See section Cloning from Barman for more details on using repmgr together with Barman.


5.3. Cloning and cascading replication

Cascading replication, introduced with PostgreSQL 9.2, enables a standby server to replicate from another standby server rather than directly from the primary, meaning replication changes "cascade" down through a hierarchy of servers. This can be used to reduce load on the primary and minimize bandwidth usage between sites. For more details, see the PostgreSQL cascading replication documentation.

repmgr supports cascading replication. When cloning a standby, set the command-line parameter --upstream-node-id to the node_id of the server the standby should connect to, and repmgr will create recovery.conf to point to it. Note that if --upstream-node-id is not explicitly provided, repmgr will set the standby's recovery.conf to point to the primary node.

To demonstrate cascading replication, first ensure you have a primary and standby set up as shown in the Quick-start guide. Then create an additional standby server with repmgr.conf looking like this:

    node_id=3
    node_name=node3
    conninfo='host=node3 user=repmgr dbname=repmgr'
    data_directory='/var/lib/postgresql/data'

Clone this standby (using the connection parameters for the existing standby), ensuring --upstream-node-id is provided with the node_id of the previously created standby (if following the example, this will be 2):

    $ repmgr -h node2 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --upstream-node-id=2
    NOTICE: using configuration file "/etc/repmgr.conf"
    NOTICE: destination directory "/var/lib/postgresql/data" provided
    INFO: connecting to upstream node
    INFO: connected to source node, checking its state
    NOTICE: checking for available walsenders on upstream node (2 required)
    INFO: sufficient walsenders available on upstream node (2 required)
    INFO: successfully connected to source node
    DETAIL: current installation size is 29 MB
    INFO: creating directory "/var/lib/postgresql/data"...
    NOTICE: starting backup (using pg_basebackup)...
    HINT: this may take some time; consider using the -c/--fast-checkpoint option
    INFO: executing: 'pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h node2 -U repmgr -X stream '
    NOTICE: standby clone (using pg_basebackup) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: pg_ctl -D /var/lib/postgresql/data start

then register it (note that --upstream-node-id must be provided here too):

     $ repmgr -f /etc/repmgr.conf standby register --upstream-node-id=2
     NOTICE: standby node "node3" (ID: 3) successfully registered
    

After starting the standby, the cluster will look like this, showing that node3 is attached to node2, not the primary (node1).

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr
    

Tip: Under some circumstances when setting up a cascading replication cluster, you may wish to clone a downstream standby whose upstream node does not yet exist. In this case you can clone from the primary (or another upstream node); provide the parameter --upstream-conninfo to explicitly set the upstream's primary_conninfo string in recovery.conf.


5.4. Advanced cloning options

5.4.1. pg_basebackup options when cloning a standby

As repmgr uses pg_basebackup to clone a standby, it's possible to provide additional parameters for pg_basebackup to customise the cloning process.

By default, pg_basebackup performs a checkpoint before beginning the backup process. However, a normal checkpoint may take some time to complete; a fast checkpoint can be forced with the -c/--fast-checkpoint option. Note that this may impact performance of the server being cloned from (typically the primary) so should be used with care.

Tip: If Barman is set up for the cluster, it's possible to clone the standby directly from Barman, without any impact on the server the standby is being cloned from. For more details see Cloning from Barman.

Other options can be passed to pg_basebackup by including them in the repmgr.conf setting pg_basebackup_options.

If using a separate directory to store WAL files, provide the option --waldir (--xlogdir in PostgreSQL 9.6 and earlier) with the absolute path to the WAL directory. Any WALs generated during the cloning process will be copied here, and a symlink will automatically be created from the main data directory.
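
For example, to place WAL files in a separate directory during cloning (the path is illustrative; use --xlogdir instead of --waldir for PostgreSQL 9.6 and earlier):

    pg_basebackup_options='--waldir=/var/lib/postgresql/wal'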

See the PostgreSQL pg_basebackup documentation for more details of available options.


5.4.2. Managing passwords

If replication connections to a standby's upstream server are password-protected, the standby must be able to provide the password so it can begin streaming replication.

The recommended way to do this is to store the password in the postgres system user's ~/.pgpass file. It's also possible to store the password in the environment variable PGPASSWORD, however this is not recommended for security reasons. For more details see the PostgreSQL password file documentation.

Note: If using a pgpass file, an entry for the replication user (by default the user who connects to the repmgr database) must be provided, with database name set to replication, e.g.:

          node1:5432:replication:repmgr:12345

If, for whatever reason, you wish to include the password in recovery.conf, set use_primary_conninfo_password to true in repmgr.conf. This will read a password set in PGPASSWORD (but not ~/.pgpass) and place it into the primary_conninfo string in recovery.conf. Note that PGPASSWORD will need to be set during any action which causes recovery.conf to be rewritten, e.g. repmgr standby follow.

It is of course also possible to include the password value in the conninfo string for each node, but this is obviously a security risk and should be avoided.

From PostgreSQL 9.6, libpq supports the passfile parameter in connection strings, which can be used to specify a password file other than the default ~/.pgpass.

To have repmgr write a custom password file in primary_conninfo, specify its location in passfile in repmgr.conf.
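
For example (the file path is illustrative):

    passfile='/var/lib/postgresql/.pgpass_repmgr'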


5.4.3. Separate replication user

In some circumstances it might be desirable to create a dedicated replication-only user (in addition to the user who manages the repmgr metadata). In this case, the replication user should be set in repmgr.conf via the parameter replication_user; repmgr will use this value when making replication connections and generating recovery.conf. This value will also be stored in the repmgr.nodes table for each node; it no longer needs to be explicitly specified when cloning a node or executing repmgr standby follow.
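
For example, assuming a dedicated replication user named repl_user exists with the REPLICATION privilege and suitable pg_hba.conf entries, repmgr.conf would contain:

    replication_user=repl_user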


Chapter 6. Promoting a standby server with repmgr

If a primary server fails or needs to be removed from the replication cluster, a new primary server must be designated, to ensure the cluster continues to function correctly. This can be done with repmgr standby promote, which promotes the standby on the current server to primary.

To demonstrate this, set up a replication cluster with a primary and two attached standby servers so that the cluster looks like this:

     $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr

Stop the current primary with e.g.:

   $ pg_ctl -D /var/lib/postgresql/data -m fast stop

At this point the replication cluster will be in a partially disabled state, with both standbys accepting read-only connections while attempting to connect to the stopped primary. Note that the repmgr metadata table will not yet have been updated; executing repmgr cluster show will note the discrepancy:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status        | Upstream | Location | Connection string
    ----+-------+---------+---------------+----------+----------+--------------------------------------
     1  | node1 | primary | ? unreachable |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running     | node1    | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running     | node1    | default  | host=node3 dbname=repmgr user=repmgr

    WARNING: following issues were detected
    node "node1" (ID: 1) is registered as an active primary but is unreachable

Now promote the first standby with:

   $ repmgr -f /etc/repmgr.conf standby promote

This will produce output similar to the following:

    INFO: connecting to standby database
    NOTICE: promoting standby
    DETAIL: promoting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' promote"
    server promoting
    INFO: reconnecting to promoted server
    NOTICE: STANDBY PROMOTE successful
    DETAIL: node 2 was successfully promoted to primary

Executing repmgr cluster show will show the current state; as there is now an active primary, the previous warning will not be displayed:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr

However the sole remaining standby (node3) is still trying to replicate from the failed primary; repmgr standby follow must now be executed to rectify this situation (see Chapter 7 for example).


Chapter 7. Following a new primary

Following the failure or removal of the replication cluster's existing primary server, repmgr standby follow can be used to make 'orphaned' standbys follow the new primary and catch up to its current state.

To demonstrate this, assuming a replication cluster in the same state as the end of the preceding section (Promoting a standby), execute this:

    $ repmgr -f /etc/repmgr.conf standby follow
    INFO: changing node 3's primary to node 2
    NOTICE: restarting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' restart"
    waiting for server to shut down......... done
    server stopped
    waiting for server to start.... done
    server started
    NOTICE: STANDBY FOLLOW successful
    DETAIL: node 3 is now attached to node 2
  

The standby is now replicating from the new primary and repmgr cluster show output reflects this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr

Note that with cascading replication, repmgr standby follow can also be used to detach a standby from its current upstream server and follow the primary. However it's currently not possible to have it follow another standby; we hope to improve this in a future release.


Chapter 8. Performing a switchover with repmgr

A typical use-case for replication is a combination of primary and standby server, with the standby serving as a backup which can easily be activated in case of a problem with the primary. Such an unplanned failover would normally be handled by promoting the standby, after which an appropriate action must be taken to restore the old primary.

In some cases however it's desirable to promote the standby in a planned way, e.g. so maintenance can be performed on the primary; this kind of switchover is supported by the repmgr standby switchover command.

repmgr standby switchover differs from other repmgr actions in that it also performs actions on another server (the demotion candidate), which means passwordless SSH access is required to that server from the one where repmgr standby switchover is executed.

Note: repmgr standby switchover performs a relatively complex series of operations on two servers, and should therefore be performed after careful preparation and with adequate attention. In particular you should be confident that your network environment is stable and reliable.

Additionally you should be sure that the current primary can be shut down quickly and cleanly. In particular, access from applications should be minimized or preferably blocked completely. Also be aware that if there is a backlog of files waiting to be archived, PostgreSQL will not shut down until archiving completes.

We recommend running repmgr standby switchover at the most verbose logging level (--log-level=DEBUG --verbose) and capturing all output to assist troubleshooting any problems.

Please also read carefully the sections Preparing for switchover and Caveats below.


8.1. Preparing for switchover

As mentioned in the previous section, success of the switchover operation depends on repmgr being able to shut down the current primary server quickly and cleanly.

Ensure that the promotion candidate has sufficient free walsenders available (PostgreSQL configuration item max_wal_senders), and, if replication slots are in use, that at least one free slot is available for the demotion candidate (PostgreSQL configuration item max_replication_slots).

Ensure that a passwordless SSH connection is possible from the promotion candidate (standby) to the demotion candidate (current primary). If --siblings-follow will be used, ensure that passwordless SSH connections are possible from the promotion candidate to all standbys attached to the demotion candidate.

Note: repmgr expects to find the repmgr binary in the same path on the remote server as on the local server.

Double-check which commands will be used to stop/start/restart the current primary; on the current primary execute:

     repmgr -f /etc/repmgr.conf node service --list --action=stop
     repmgr -f /etc/repmgr.conf node service --list --action=start
     repmgr -f /etc/repmgr.conf node service --list --action=restart

These commands can be defined in repmgr.conf with service_start_command, service_stop_command and service_restart_command.

Important: If repmgr is installed from a package, you should set these commands to use the appropriate service commands defined by the package/operating system, as these will ensure PostgreSQL is stopped/started properly, taking into account configuration and log file locations etc.

If the service_*_command options aren't defined, repmgr will fall back to using pg_ctl to stop/start/restart PostgreSQL, which may not work properly, particularly when executed on a remote server.

For more details, see service command settings.

Note: On systemd systems we strongly recommend using the appropriate systemctl commands (typically run via sudo) to ensure systemd is informed about the status of the PostgreSQL service.
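
As an illustration, on a systemd-based system the service command settings might look like this (the unit name "postgresql-10" is an assumption and varies between distributions and PostgreSQL versions):

    service_start_command='sudo systemctl start postgresql-10'
    service_stop_command='sudo systemctl stop postgresql-10'
    service_restart_command='sudo systemctl restart postgresql-10'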

If using sudo for the systemctl calls, make sure the sudo specification doesn't require a real tty for the user. If not set this way, repmgr will fail to stop the primary.
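
For example, a sudoers entry along the following lines (a sketch; adjust the user name to the account which runs repmgr) disables the tty requirement for the postgres user:

    Defaults:postgres !requiretty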

Check that access from applications is minimized or preferably blocked completely, so applications are not unexpectedly interrupted.

Note: If an exclusive backup is running on the current primary, repmgr will not perform the switchover.

Check there is no significant replication lag on standbys attached to the current primary.

If WAL file archiving is set up, check that there is no backlog of files waiting to be archived, as PostgreSQL will not complete its shutdown until all of these have been archived. If there is a backlog exceeding archive_ready_warning WAL files, repmgr will emit a warning before attempting to perform a switchover; you can also check manually with repmgr node check --archive-ready.
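
For example, to check the archiving backlog manually:

    $ repmgr -f /etc/repmgr.conf node check --archive-ready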

Note: Ensure that repmgrd is *not* running anywhere to prevent it unintentionally promoting a node. This restriction will be removed in a future repmgr version.

Finally, consider executing repmgr standby switchover with the --dry-run option; this will perform any necessary checks and inform you about success/failure, and stop before the first actual command is run (which would be the shutdown of the current primary). Example output:

      $ repmgr standby switchover -f /etc/repmgr.conf --siblings-follow --dry-run
      NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode
      INFO: SSH connection to host "node1" succeeded
      INFO: archive mode is "off"
      INFO: replication lag on this standby is 0 seconds
      INFO: all sibling nodes are reachable via SSH
      NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
      INFO: following shutdown command would be run on node "node1":
        "pg_ctl -l /var/log/postgresql/startup.log -D '/var/lib/postgresql/data' -m fast -W stop"
    

Important: Be aware that --dry-run checks the prerequisites for performing the switchover and some basic sanity checks on the state of the database which might affect the switchover operation (e.g. replication lag); it cannot however guarantee the switchover operation will succeed. In particular, if the current primary does not shut down cleanly, repmgr will not be able to reliably execute the switchover (as there would be a danger of divergence between the former and new primary nodes).

Note: See repmgr standby switchover for a full list of available command line options and repmgr.conf settings relevant to performing a switchover.


8.1.1. Switchover and pg_rewind

If the demotion candidate does not shut down smoothly or cleanly, there's a risk it will have a slightly divergent timeline and will not be able to attach to the new primary. To fix this situation without needing to reclone the old primary, it's possible to use the pg_rewind utility, which will usually be able to resync the two servers.

To have repmgr execute pg_rewind if it detects this situation after promoting the new primary, add the --force-rewind option.

Note: If repmgr detects a situation where it needs to execute pg_rewind, it will execute a CHECKPOINT on the new primary before executing pg_rewind.

For more details on pg_rewind, see: https://www.postgresql.org/docs/current/static/app-pgrewind.html.

pg_rewind has been part of the core PostgreSQL distribution since version 9.5. Users of versions 9.3 and 9.4 will need to manually install it; the source code is available here: https://github.com/vmware/pg_rewind. If the pg_rewind binary is not installed in the PostgreSQL bin directory, provide its full path on the demotion candidate with --force-rewind.

Note that building the 9.3/9.4 version of pg_rewind requires the PostgreSQL source code. Also, PostgreSQL 9.3 does not provide wal_log_hints, meaning data checksums must have been enabled when the database was initialized.


8.2. Executing the switchover command

To demonstrate switchover, we will assume a replication cluster with a primary (node1) and one standby (node2); after the switchover node2 should become the primary with node1 following it.

The switchover command must be run from the standby which is to be promoted, and in its simplest form looks like this:

    $ repmgr -f /etc/repmgr.conf standby switchover
    NOTICE: executing switchover on node "node2" (ID: 2)
    INFO: searching for primary node
    INFO: checking if node 1 is primary
    INFO: current primary node is 1
    INFO: SSH connection to host "node1" succeeded
    INFO: archive mode is "off"
    INFO: replication lag on this standby is 0 seconds
    NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
    NOTICE: stopping current primary node "node1" (ID: 1)
    NOTICE: issuing CHECKPOINT
    DETAIL: executing server command "pg_ctl -l /var/log/postgres/startup.log -D '/var/lib/pgsql/data' -m fast -W stop"
    INFO: checking primary status; 1 of 6 attempts
    NOTICE: current primary has been cleanly shut down at location 0/3001460
    NOTICE: promoting standby to primary
    DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote"
    server promoting
    NOTICE: STANDBY PROMOTE successful
    DETAIL: server "node2" (ID: 2) was successfully promoted to primary
    INFO: setting node 1's primary to node 2
    NOTICE: starting server using  "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' restart"
    NOTICE: NODE REJOIN successful
    DETAIL: node 1 is now attached to node 2
    NOTICE: switchover was successful
    DETAIL: node "node2" is now primary
    NOTICE: STANDBY SWITCHOVER is complete
   

The old primary is now replicating as a standby from the new primary, and the cluster status will now look like this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | standby |   running | node2    | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
   


8.3. Caveats

  • If using PostgreSQL 9.3 or 9.4, you should ensure that the shutdown command is configured to use PostgreSQL's fast shutdown mode (the default in 9.5 and later). If relying on pg_ctl to perform database server operations, you should include -m fast in pg_ctl_options in repmgr.conf.
  • pg_rewind *requires* that either wal_log_hints is enabled, or that data checksums were enabled when the cluster was initialized. See the pg_rewind documentation for details.
  • repmgrd should not be running with failover=automatic set in repmgr.conf when a switchover is carried out, otherwise the repmgrd daemon may try to promote a standby by itself.

We hope to remove some of these restrictions in future versions of repmgr.


Chapter 9. Using a witness server

A witness server is a normal PostgreSQL instance which is not part of the streaming replication cluster; its purpose is, if a failover situation occurs, to provide proof that the primary server itself is unavailable.

A typical use case for a witness server is a two-node streaming replication setup, where the primary and standby are in different locations (data centres). By creating a witness server in the same location (data centre) as the primary, if the primary becomes unavailable it's possible for the standby to decide whether it can promote itself without risking a "split brain" scenario: if it can't see either the witness or the primary server, it's likely there's a network-level interruption and it should not promote itself. If it can see the witness but not the primary, this proves there is no network interruption and the primary itself is unavailable, and it can therefore promote itself (and ideally take action to fence the former primary).

Note: Never install a witness server on the same physical host as another node in the replication cluster managed by repmgr - it's essential the witness is not affected in any way by failure of another node.

For more complex replication scenarios, e.g. with multiple data centres, it may be preferable to use location-based failover, which ensures that only nodes in the same location as the primary will ever be promotion candidates; see Handling network splits with repmgrd for more details.

Note: A witness server will only be useful if repmgrd is in use.


9.1. Creating a witness server

To create a witness server, set up a normal PostgreSQL instance on a server in the same physical location as the cluster's primary server.

This instance should *not* be on the same physical host as the primary server, as otherwise if the primary server fails due to hardware issues, the witness server will be lost too.

Note: repmgr 3.3 and earlier provided a repmgr create witness command, which would automatically create a PostgreSQL instance. However this often resulted in an unsatisfactory, hard-to-customise instance.

The witness server should be configured in the same way as a normal repmgr node; see section Configuration.

Register the witness server with repmgr witness register. This will create the repmgr extension on the witness server, and make a copy of the repmgr metadata.

Note: As the witness server is not part of the replication cluster, further changes to the repmgr metadata will be synchronised by repmgrd.

Once the witness server has been configured, repmgrd should be started; for more details see Using a witness server with repmgrd.

To unregister a witness server, use repmgr witness unregister.


Chapter 10. Event Notifications

Each time repmgr or repmgrd performs a significant action, a record of that event is written into the repmgr.events table together with a timestamp, an indication of failure or success, and further details if appropriate. This is useful for gaining an overview of events affecting the replication cluster. However, note that this table is advisory in nature and should be used in combination with the repmgr and PostgreSQL logs to obtain details of any events.

Example output after a primary was registered and a standby cloned and registered:

    repmgr=# SELECT * from repmgr.events ;
     node_id |      event       | successful |        event_timestamp        |                                       details
    ---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
           1 | primary_register | t          | 2016-01-08 15:04:39.781733+09 |
           2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
           2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
    (3 rows)

Alternatively, use repmgr cluster event to output a formatted list of events.

Additionally, event notifications can be passed to a user-defined program or script which can take further action, e.g. send email notifications. This is done by setting the event_notification_command parameter in repmgr.conf.

The following format placeholders are provided for all event notifications:

%n

node ID

%e

event type

%s

success (1) or failure (0)

%t

timestamp

%d

details

The values provided for %t and %d will probably contain spaces, so should be quoted in the provided command configuration, e.g.:

    event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
  

The following parameters are provided for a subset of event notifications:

%p

node ID of the current primary (repmgr standby register and repmgr standby follow)

node ID of the demoted primary (repmgr standby switchover only)

%c

conninfo string of the primary node (repmgr standby register and repmgr standby follow)

conninfo string of the next available node (bdr_failover and bdr_recovery)

%a

name of the current primary node (repmgr standby register and repmgr standby follow)

name of the next available node (bdr_failover and bdr_recovery)

The values provided for %c and %a will probably contain spaces, so should always be quoted.

By default, all notification types will be passed to the designated script; the notification types can be filtered to explicitly named ones using the event_notifications parameter.
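
As a sketch, assuming event_notifications accepts a comma-separated list of event types, repmgr.conf could restrict notifications to a handful of events (the selection shown here is arbitrary):

    event_notifications=standby_register,standby_promote,repmgrd_failover_promote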

Events generated by the repmgr command:

Events generated by repmgrd (streaming replication mode):

  • repmgrd_start
  • repmgrd_shutdown
  • repmgrd_reload
  • repmgrd_failover_promote
  • repmgrd_failover_follow
  • repmgrd_failover_aborted
  • repmgrd_standby_reconnect
  • repmgrd_promote_error
  • repmgrd_local_disconnect
  • repmgrd_local_reconnect
  • repmgrd_upstream_disconnect
  • repmgrd_upstream_reconnect
  • standby_disconnect_manual
  • standby_failure
  • standby_recovery

Events generated by repmgrd (BDR mode):

  • bdr_failover
  • bdr_reconnect
  • bdr_recovery
  • bdr_register
  • bdr_unregister

Note that under some circumstances (e.g. when no replication cluster primary could be located), it will not be possible to write an entry into the repmgr.events table, in which case executing a script via event_notification_command can serve as a fallback by generating some form of notification.


Chapter 11. Upgrading repmgr

repmgr is updated regularly with point releases (e.g. 4.0.1 to 4.0.2) containing bugfixes and other minor improvements. Any substantial new functionality will be included in a feature release (e.g. 4.0.x to 4.1.x).


11.1. Upgrading repmgr 4.x and later

repmgr 4.x is implemented as a PostgreSQL extension; normally the upgrade consists of the following steps:

  1. Install the updated package (or compile the updated source)

  2. Restart repmgrd (if running).

  3. For major releases, e.g. from 4.0.x to 4.1, execute ALTER EXTENSION repmgr UPDATE on the primary node in the database where the repmgr extension is installed.

    This will update the extension metadata and, if necessary, apply changes to the repmgr extension objects.

Always check the release notes for every release as they may contain upgrade instructions particular to individual versions.

Note that it may be necessary to restart the PostgreSQL server if the upgrade contains changes to the shared object file used by repmgrd; check the release notes for details.


11.2. pg_upgrade and repmgr

pg_upgrade requires that if any functions are dependent on a shared library, this library must be present in both the old and new installations before pg_upgrade can be executed.

To minimize the risk of any upgrade issues (particularly if an upgrade to a new major repmgr version is involved), we recommend upgrading repmgr on the old server before running pg_upgrade to ensure that old and new versions are the same.

Note: This issue applies to any PostgreSQL extension which has dependencies on a shared library.

For further details please see the pg_upgrade documentation.

If replication slots are in use, bear in mind that these will not be recreated by pg_upgrade and will need to be recreated manually.


11.3. Upgrading from repmgr 3.x

The upgrade process consists of two steps:

  1. converting the repmgr.conf configuration files

  2. upgrading the repmgr schema using CREATE EXTENSION

A script is provided to assist with converting repmgr.conf.

The schema upgrade (which converts the repmgr metadata into a packaged PostgreSQL extension) is normally carried out automatically when the repmgr extension is created.

The shared library has been renamed from repmgr_funcs to repmgr - if it's set in shared_preload_libraries in postgresql.conf it will need to be updated to the new name:

    shared_preload_libraries = 'repmgr'


11.3.1. Converting repmgr.conf configuration files

With a completely new repmgr version, we've taken the opportunity to rename some configuration items for clarity and consistency, both between the configuration file and the column names in repmgr.nodes (e.g. node to node_id), and also for consistency with PostgreSQL naming conventions (e.g. loglevel to log_level).

Other configuration items have been changed to command line options, and vice-versa, e.g. to avoid hard-coding items such as a node's upstream ID, which might change over time.

repmgr will issue a warning about deprecated/altered options.


11.3.1.1. Changed parameters in "repmgr.conf"

The following parameters have been added:

  • data_directory: this is mandatory and must contain the path to the node's data directory
  • monitoring_history: this replaces the repmgrd command line option --monitoring-history

The following parameters have been renamed:

Table 11-1. Parameters renamed in repmgr4

     repmgr3                 | repmgr4
    -------------------------+---------------------
     node                    | node_id
     loglevel                | log_level
     logfacility             | log_facility
     logfile                 | log_file
     barman_server           | barman_host
     master_response_timeout | async_query_timeout

Note: From repmgr 4, barman_server refers to the server configured in Barman (in repmgr 3, the deprecated cluster parameter was used for this); the physical Barman hostname is configured with barman_host (see Section 5.1.1 for details).

The following parameters have been removed:

  • cluster: is no longer required and will be ignored.
  • upstream_node: is replaced by the command-line parameter --upstream-node-id


11.3.1.2. Conversion script

To assist with conversion of repmgr.conf files, a Perl script is provided in contrib/convert-config.pl. Use it like this:

    $ ./convert-config.pl /etc/repmgr.conf
    node_id=2
    node_name=node2
    conninfo=host=node2 dbname=repmgr user=repmgr connect_timeout=2
    pg_ctl_options='-l /var/log/postgres/startup.log'
    rsync_options=--exclude=postgresql.local.conf --archive
    log_level=INFO
    pg_basebackup_options=--no-slot
    data_directory=

The converted file is printed to STDOUT and the original file is not changed.

Please note that the conversion script will add an empty placeholder parameter for data_directory, which is a required parameter in repmgr4 and which must be provided.


11.3.2. Upgrading the repmgr schema

Ensure that repmgrd is not running, and that no cron jobs which execute the repmgr binary are active.

Install repmgr 4 packages; any repmgr 3.x packages should be uninstalled (if not automatically uninstalled already by your packaging system).


11.3.2.1. Upgrading from repmgr 3.1.1 or earlier

If your repmgr version is 3.1.1 or earlier, you will need to update the schema to the latest version in the 3.x series (3.3.2) before converting the installation to repmgr 4.

To do this, apply the following upgrade scripts as appropriate for your current version:

For more details see the repmgr 3 upgrade notes.


11.3.2.2. Manually create the repmgr extension

In the database used by the existing repmgr installation, execute:

      CREATE EXTENSION repmgr FROM unpackaged;

This will move and convert all objects from the existing schema into the new, standard repmgr schema.

Note: there must be only one schema matching repmgr_% in the database, otherwise this step may not work.


11.3.2.3. Re-register each node

This is necessary to update the repmgr metadata with some additional items.

On the primary node, execute e.g.

      repmgr primary register -f /etc/repmgr.conf --force

On each standby node, execute e.g.

      repmgr standby register -f /etc/repmgr.conf --force

Check the data is updated as expected by examining the repmgr.nodes table; restart repmgrd if required.

The original repmgr_$cluster schema can be dropped at any time.

Tip: If you don't care about any data from the existing repmgr installation, (e.g. the contents of the events and monitoring tables), the manual CREATE EXTENSION step can be skipped; just re-register each node, starting with the primary node, and the repmgr extension will be automatically created.


Chapter 12. Automatic failover with repmgrd

repmgrd is a management and monitoring daemon which runs on each node in a replication cluster. It can automate actions such as failover and updating standbys to follow the new primary, as well as providing monitoring information about the state of each standby.


Chapter 13. repmgrd configuration

repmgrd is a daemon which runs on each PostgreSQL node, monitoring the local node, and (unless it's the primary node) the upstream server (the primary server or with cascading replication, another standby) which it's connected to.

repmgrd can be configured to provide failover capability in case the primary upstream node becomes unreachable, and/or provide monitoring data to the repmgr metadatabase.


13.1. repmgrd basic configuration

To use repmgrd, its associated function library must be included via postgresql.conf with:

        shared_preload_libraries = 'repmgr'

Changing this setting requires a restart of PostgreSQL; for more details see the PostgreSQL documentation.

To apply configuration file changes to a running repmgrd daemon, execute the operating system's repmgrd service reload command (see Package details for examples), or for instances which were manually started, execute kill -HUP, e.g. kill -HUP `cat /tmp/repmgrd.pid`.

Note: Check the repmgrd log to see what changes were applied, or if any issues were encountered when reloading the configuration.

Note that only a subset of configuration file parameters can be changed on a running repmgrd daemon.


13.1.1. automatic failover configuration

If using automatic failover, the following repmgrd options *must* be set in repmgr.conf:

          failover=automatic
          promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
          follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

Adjust file paths as appropriate; always specify the full path to the repmgr binary.

Note: repmgr will not apply pg_bindir when executing promote_command or follow_command; these can be user-defined scripts so must always be specified with the full path.

Note that the --log-to-file option will cause output generated by the repmgr command, when executed by repmgrd, to be logged to the same destination configured to receive log output for repmgrd. See repmgr.conf.sample for further repmgrd-specific settings.

When failover is set to automatic, upon detecting failure of the current primary, repmgrd will execute one of:

  • promote_command (if the current server is to become the new primary)
  • follow_command (if the current server needs to follow another server which has become the new primary)

Note: These commands can be any valid shell script which results in one of these two actions happening, but if repmgr's standby follow or standby promote commands are not executed (either directly as shown here, or from a script which performs other actions), the repmgr metadata will not be updated and repmgr will no longer function reliably.

The follow_command should provide the --upstream-node-id=%n option to repmgr standby follow; the %n will be replaced by repmgrd with the ID of the new primary node. If this is not provided, repmgr will attempt to determine the new primary by itself, but if the original primary comes back online after the new primary is promoted, there is a risk that repmgr standby follow will result in the node continuing to follow the original primary.


13.1.2. PostgreSQL service configuration

If using automatic failover, currently repmgrd will need to execute repmgr standby follow to restart PostgreSQL on standbys to have them follow a new primary.

To ensure this happens smoothly, it's essential to provide the appropriate system/service restart command appropriate to your operating system via service_restart_command in repmgr.conf. If you don't do this, repmgrd will default to using pg_ctl, which can result in unexpected problems, particularly on systemd-based systems.

For more details, see service command settings.


13.1.3. Monitoring configuration

To enable monitoring, set:

          monitoring_history=yes

in repmgr.conf.

The default monitoring interval is 2 seconds; this value can be explicitly set using:

          monitor_interval_secs=<seconds>

in repmgr.conf.

For more details on monitoring, see Monitoring with repmgrd.


13.2. repmgrd daemon

If installed from a package, repmgrd can be started via the operating system's service command, e.g. on systemd systems using systemctl.

See appendix Package details for details of service commands for different distributions.

repmgrd can be started manually like this:

        repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid

and stopped with kill `cat /tmp/repmgrd.pid`. Adjust paths as appropriate.


13.2.1. repmgrd's PID file

repmgrd will generate a PID file by default.

Note: This is a behaviour change from previous versions (earlier than 4.1), where the PID file had to be explicitly specified with the command line parameter --pid-file.

The PID file can be specified in repmgr.conf with the configuration parameter repmgrd_pid_file.

It can also be specified on the command line (as in previous versions) with the command line parameter --pid-file. Note this will override any value set in repmgr.conf with repmgrd_pid_file. --pid-file may be deprecated in future releases.
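
For example (the path shown is an assumption; it must be writable by the user running repmgrd):

    repmgrd_pid_file='/var/run/repmgr/repmgrd.pid'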

If repmgr was installed from a package and the package maintainer has specified a PID file location, repmgrd will use that.

If none of the above apply, repmgrd will create a PID file in the operating system's temporary directory (as determined by the environment variable TMPDIR; if that is not set, /tmp will be used).

To prevent a PID file being generated at all, provide the command line option --no-pid-file.

To see which PID file repmgrd would use, execute repmgrd with the option --show-pid-file. repmgrd will not start if this option is provided. Note that the value shown is the file repmgrd would use next time it starts, and is not necessarily the PID file currently in use.


13.2.2. repmgrd daemon configuration on Debian/Ubuntu

If repmgr was installed from Debian/Ubuntu packages, additional configuration is required before repmgrd is started as a daemon.

This is done via the file /etc/default/repmgrd, which by default looks like this:

# default settings for repmgrd. This file is source by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no

# configuration file (required)
#REPMGRD_CONF="/path/to/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid

Set REPMGRD_ENABLED to yes, and REPMGRD_CONF to the repmgr.conf file you are using.

If using systemd, you may need to execute systemctl daemon-reload. Also, if you attempted to start repmgrd using systemctl start repmgrd, you'll need to execute systemctl stop repmgrd. Because that's how systemd rolls.


13.3. repmgrd connection settings

In addition to the repmgr configuration settings, parameters in the conninfo string influence how repmgr makes a network connection to PostgreSQL. In particular, if another server in the replication cluster is unreachable at network level, system network settings will influence the length of time it takes to determine that the connection is not possible.

In particular, consider explicitly setting connect_timeout; the effective minimum value of 2 (seconds) will ensure that a connection failure at network level is reported as soon as possible. Otherwise, depending on the system settings (e.g. tcp_syn_retries in Linux), a delay of a minute or more is possible.
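
For example, a node's conninfo entry with an explicit connection timeout might look like this:

    conninfo='host=node1 dbname=repmgr user=repmgr connect_timeout=2'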

For further details on conninfo network connection parameters, see the PostgreSQL documentation.


13.4. repmgrd log rotation

To ensure the current repmgrd logfile (specified in repmgr.conf with the parameter log_file) does not grow indefinitely, configure your system's logrotate to regularly rotate it.

Sample configuration to rotate logfiles weekly, with retention for up to 52 weeks, and rotation forced if a file grows beyond 100MB:

    /var/log/repmgr/repmgrd.log {
        missingok
        compress
        rotate 52
        maxsize 100M
        weekly
        create 0600 postgres postgres
        postrotate
            /usr/bin/killall -HUP repmgrd
        endscript
    }


Chapter 14. repmgrd demonstration

To demonstrate automatic failover, set up a 3-node replication cluster (one primary and two standbys streaming directly from the primary) so that the cluster looks something like this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr

Start repmgrd on each standby and verify that it's running by examining the log output, which at log level INFO will look like this:

    [2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf"
    [2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr"
    [2017-08-24 17:31:00] [NOTICE] starting monitoring of node node2 (ID: 2)
    [2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1)

Each repmgrd should also have recorded its successful startup as an event:

    $ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
     Node ID | Name  | Event         | OK | Timestamp           | Details
    ---------+-------+---------------+----+---------------------+-------------------------------------------------------------
     3       | node3 | repmgrd_start | t  | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1)
     2       | node2 | repmgrd_start | t  | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1)
     1       | node1 | repmgrd_start | t  | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1)  

Now stop the current primary server with e.g.:

    pg_ctl -D /var/lib/postgresql/data -m immediate stop

This will force the primary to shut down straight away, aborting all processes and transactions. This will cause a flurry of activity in the repmgrd log files as each repmgrd detects the failure of the primary and a failover decision is made. This is an extract from the log of a standby server (node2) which has promoted to new primary after failure of the original primary (node1).

    [2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
    [2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
    [2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts
    [2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts
    [2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts
    [2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts
    INFO:  setting voting term to 1
    INFO:  node 2 is candidate
    INFO:  node 3 has received request from node 2 for electoral term 1 (our term: 0)
    [2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes
    INFO: connecting to standby database
    NOTICE: promoting standby
    DETAIL: promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote'
    INFO: reconnecting to promoted server
    NOTICE: STANDBY PROMOTE successful
    DETAIL: node 2 was successfully promoted to primary
    INFO:  node 3 received notification to follow node 2
    [2017-08-24 23:32:13] [INFO] switching to primary monitoring mode

The cluster status will now look like this, with the original primary (node1) marked as inactive, and standby node3 now following the new primary (node2):

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+----------------------------------------------------
     1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr

repmgr cluster event will display a summary of what happened to each server during the failover:

    $ repmgr -f /etc/repmgr.conf cluster event
     Node ID | Name  | Event                    | OK | Timestamp           | Details
    ---------+-------+--------------------------+----+---------------------+-----------------------------------------------------------------------------------
     3       | node3 | repmgrd_failover_follow  | t  | 2017-08-24 23:32:16 | node 3 now following new upstream node 2
     3       | node3 | standby_follow           | t  | 2017-08-24 23:32:16 | node 3 is now attached to node 2
     2       | node2 | repmgrd_failover_promote | t  | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed
     2       | node2 | standby_promote          | t  | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary


Chapter 15. repmgrd and cascading replication

Cascading replication - where a standby can connect to an upstream node and not the primary server itself - was introduced in PostgreSQL 9.2. repmgr and repmgrd support cascading replication by keeping track of the relationship between standby servers - each node record is stored with the node id of its upstream ("parent") server (except of course the primary server).

In a failover situation where the primary node fails and a top-level standby is promoted, a standby connected to another standby will not be affected and continue working as normal (even if the upstream standby it's connected to becomes the primary node). If however the node's direct upstream fails, the "cascaded standby" will attempt to reconnect to that node's parent.


Chapter 16. Handling network splits with repmgrd

A common pattern for replication cluster setups is to spread servers over more than one data centre. This can provide benefits such as geographically-distributed read replicas and DR (disaster recovery) capability. However this also means there is a risk of disconnection at network level between data centre locations, which would result in a split-brain scenario if servers in a secondary data centre were no longer able to see the primary in the main data centre and promoted a standby among themselves.

repmgr enables provision of a "witness server" to artificially create a quorum of servers in a particular location, ensuring that nodes in another location will not elect a new primary if they are unable to see the majority of nodes. However this approach does not scale well, particularly with more complex replication setups, e.g. where the majority of nodes are located outside of the primary data centre. It also means the witness node needs to be managed as an extra PostgreSQL instance outside of the main replication cluster, which adds administrative and programming complexity.

repmgr4 introduces the concept of location: each node is associated with an arbitrary location string (default is default); this is set in repmgr.conf, e.g.:

    node_id=1
    node_name=node1
    conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/var/lib/postgresql/data'
    location='dc1'

In a failover situation, repmgrd will check if any servers in the same location as the current primary node are visible. If not, repmgrd will assume a network interruption and not promote any node in any other location (it will however enter degraded monitoring mode until a primary becomes visible).


Chapter 17. Using a witness server with repmgrd

In a situation caused e.g. by a network interruption between two data centres, it's important to avoid a "split-brain" situation where both sides of the network assume they are the active segment and the side without an active primary unilaterally promotes one of its standbys.

To prevent this situation happening, it's essential to ensure that one network segment has a "voting majority", so other segments will know they're in the minority and not attempt to promote a new primary. Where an odd number of servers exists, this is not an issue. However, if each network has an even number of nodes, it's necessary to provide some way of ensuring a majority, which is where the witness server becomes useful.

The witness server is not a fully-fledged standby node and is not integrated into replication, but it effectively represents the "casting vote" when deciding which network segment has a majority. A witness server can be set up using repmgr witness register. Note that it only makes sense to create a witness server in conjunction with running repmgrd; the witness server will require its own repmgrd instance.


Chapter 18. "degraded monitoring" mode

In certain circumstances, repmgrd is not able to fulfill its primary mission of monitoring the node's upstream server. In these cases it enters "degraded monitoring" mode, where repmgrd remains active but is waiting for the situation to be resolved.

Situations where this happens are:

  • a failover situation has occurred, but no nodes in the primary node's location are visible
  • a failover situation has occurred, but no promotion candidate is available
  • a failover situation has occurred, but the promotion candidate could not be promoted
  • a failover situation has occurred, but the node was unable to follow the new primary
  • a failover situation has occurred, but no primary has become available
  • a failover situation has occurred, but automatic failover is not enabled for the node
  • repmgrd is monitoring the primary node, but it is not available (and no other node has been promoted as primary)

Example output in a situation where there is only one standby with failover=manual, and the primary node is unavailable (but is later restarted):

    [2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
    [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
    [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
    (...)
    [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
    [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
    [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
    [2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
    [2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
    [2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)

By default, repmgrd will continue in degraded monitoring mode indefinitely. However a timeout (in seconds) can be set with degraded_monitoring_timeout, after which repmgrd will terminate.
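
For example, to have repmgrd terminate after 24 hours in degraded monitoring mode (the value shown is illustrative):

    degraded_monitoring_timeout=86400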

Note: If repmgrd is monitoring a primary node which has been stopped and manually restarted as a standby attached to a new primary, it will automatically detect the status change and update the node record to reflect the node's new status as an active standby. It will then resume monitoring the node as a standby.


Chapter 19. Monitoring with repmgrd

When repmgrd is running with the option monitoring_history=true, it will constantly write standby node status information to the monitoring_history table, providing a near-real time overview of replication status on all nodes in the cluster.

The view replication_status shows the most recent state for each node, e.g.:

    repmgr=# select * from repmgr.replication_status;
    -[ RECORD 1 ]-------------+------------------------------
    primary_node_id           | 1
    standby_node_id           | 2
    standby_name              | node2
    node_type                 | standby
    active                    | t
    last_monitor_time         | 2017-08-24 16:28:41.260478+09
    last_wal_primary_location | 0/6D57A00
    last_wal_standby_location | 0/5000000
    replication_lag           | 29 MB
    replication_time_lag      | 00:00:11.736163
    apply_lag                 | 15 MB
    communication_time_lag    | 00:00:01.365643

The interval in which monitoring history is written is controlled by the configuration parameter monitor_interval_secs; default is 2.

As this can generate a large amount of monitoring data in the table repmgr.monitoring_history, it's advisable to regularly purge historical data using the repmgr cluster cleanup command; use the -k/--keep-history option to specify how many days' worth of data should be retained.
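
For example, to retain 30 days' worth of monitoring history (the retention period is arbitrary; this could be run from a daily cron job):

    $ repmgr -f /etc/repmgr.conf cluster cleanup --keep-history=30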

It's possible to run repmgrd in monitoring-only mode (without automatic failover capability) for some or all nodes by setting failover=manual in the node's repmgr.conf file. In the event of the node's upstream failing, no failover action will be taken and the node will require manual intervention to be reattached to replication. If this occurs, an event notification standby_disconnect_manual will be created.

Note that when a standby node is not streaming directly from its upstream node, e.g. recovering WAL from an archive, apply_lag will always appear as 0 bytes.

Tip: If monitoring history is enabled, the contents of the repmgr.monitoring_history table will be replicated to attached standbys. This means there will be a small but constant stream of replication activity which may not be desirable. To prevent this, convert the table to an UNLOGGED one with:

     ALTER TABLE repmgr.monitoring_history SET UNLOGGED;

This will however mean that monitoring history will not be available on another node following a failover, and the view repmgr.replication_status will not work on standbys.


Chapter 20. BDR failover with repmgrd

repmgr 4.x provides support for monitoring BDR nodes and taking action in case one of the nodes fails.

Note: Due to the nature of BDR 1.x/2.x, it's only safe to use this solution for a two-node scenario. Introducing additional nodes will create an inherent risk of node desynchronisation if a node goes down without being cleanly removed from the cluster.

In contrast to streaming replication, there's no concept of "promoting" a new primary node with BDR. Instead, "failover" involves monitoring both nodes with repmgrd and redirecting queries from the failed node to the remaining active node. This can be done by using an event notification script which is called by repmgrd to dynamically reconfigure a proxy server/connection pooler such as PgBouncer.


20.1. Prerequisites

repmgr 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension enabled and configured for a two-node BDR network. repmgr 4 packages must be installed on each node before attempting to configure repmgr.

Note: repmgr 4 will refuse to install if it detects more than two BDR nodes.

Application database connections *must* be passed through a proxy server/connection pooler such as PgBouncer, and it must be possible to reconfigure that dynamically from repmgrd. The example demonstrated in this document uses PgBouncer.

The proxy server/connection pooler must not be installed on the database servers.

For this example, it's assumed password-less SSH connections are available from the PostgreSQL servers to the servers where PgBouncer runs, and that the user on those servers has permission to alter the PgBouncer configuration files.

PostgreSQL connections must be possible between each node, and each node must be able to connect to each PgBouncer instance.


20.2. Configuration

A sample configuration for repmgr.conf on each BDR node would look like this:

        # Node information
        node_id=1
        node_name='node1'
        conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2'
        data_directory='/var/lib/postgresql/data'
        replication_type='bdr'

        # Event notification configuration
        event_notifications=bdr_failover
        event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'

        # repmgrd options
        monitor_interval_secs=5
        reconnect_attempts=6
        reconnect_interval=5

Adjust settings as appropriate; copy and adjust for the second node (particularly the values node_id, node_name and conninfo).

Note that the values provided for the conninfo string must be valid for connections from both nodes in the replication cluster. The database must be the BDR-enabled database.

If defined, the event_notifications parameter will restrict execution of the script defined in event_notification_command to the specified event(s).

Note: event_notification_command is the script which does the actual "heavy lifting" of reconfiguring the proxy server/connection pooler. It is fully user-definable; see section Defining the BDR failover "event_notification_command" for a reference implementation.


20.3. repmgr setup

Register both nodes; example on node1:

        $ repmgr -f /etc/repmgr.conf bdr register
        NOTICE: attempting to install extension "repmgr"
        NOTICE: "repmgr" extension successfully installed
        NOTICE: node record created for node 'node1' (ID: 1)
        NOTICE: BDR node 1 registered (conninfo: host=node1 dbname=bdrtest user=repmgr)

and on node2:

        $ repmgr -f /etc/repmgr.conf bdr register
        NOTICE: node record created for node 'node2' (ID: 2)
        NOTICE: BDR node 2 registered (conninfo: host=node2 dbname=bdrtest user=repmgr)

The repmgr extension will be automatically created when the first node is registered, and will be propagated to the second node.

Important: Ensure the repmgr package is available on both nodes before attempting to register the first node.

At this point the metadata for both nodes has been created; executing repmgr cluster show (on either node) should produce output like this:

        $ repmgr -f /etc/repmgr.conf cluster show
        ID | Name  | Role | Status    | Upstream | Location | Connection string
       ----+-------+------+-----------+----------+----------+---------------------------------------------------------
        1  | node1 | bdr  | * running |          | default  | host=node1 dbname=bdrtest user=repmgr connect_timeout=2
        2  | node2 | bdr  | * running |          | default  | host=node2 dbname=bdrtest user=repmgr connect_timeout=2

Additionally, it's possible to display a log of significant events; executing repmgr cluster event (on either node) should produce output like this:

        $ repmgr -f /etc/repmgr.conf cluster event
        Node ID | Event        | OK | Timestamp           | Details
       ---------+--------------+----+---------------------+----------------------------------------------
        2       | bdr_register | t  | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2)
        1       | bdr_register | t  | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1)
      

At this point there will only be records for the two node registrations (displayed here in reverse chronological order).


20.4. Defining the BDR failover "event_notification_command"

Key to "failover" execution is the event_notification_command, which is a user-definable script specified in repmpgr.conf and which can use a repmgr event notification to reconfigure the proxy server / connection pooler so it points to the other, still-active node. Details of the event will be passed as parameters to the script.

The following parameter placeholders are available for the script definition in repmgr.conf; these will be replaced with the appropriate value when the script is executed:

%n

node ID

%e

event type

%s

success (1 or 0)

%t

timestamp

%d

details

%c

conninfo string of the next available node (bdr_failover and bdr_recovery)

%a

name of the next available node (bdr_failover and bdr_recovery)

Note that %c and %a are only provided with particular events, in this case bdr_failover and bdr_recovery.

The provided sample script (scripts/bdr-pgbouncer.sh) is configured as follows:

        event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a"'

and parses the placeholder parameters like this:

        NODE_ID=$1
        EVENT_TYPE=$2
        SUCCESS=$3
        NEXT_CONNINFO=$4
        NEXT_NODE_NAME=$5

Note: The sample script also contains some hard-coded values for the PgBouncer configuration for both nodes; these will need to be adjusted for your local environment (ideally the scripts would be maintained as templates and generated by some kind of provisioning system).

The script performs the following steps:

  • pauses PgBouncer on all nodes
  • recreates the PgBouncer configuration file on each node using the information provided by repmgrd (primarily the conninfo string) to configure PgBouncer
  • reloads the PgBouncer configuration
  • executes the RESUME command (in PgBouncer)

Following successful script execution, any connections to PgBouncer on the failed BDR node will be redirected to the active node.
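A minimal sketch of such a script is shown below. This is illustrative only, not the bundled scripts/bdr-pgbouncer.sh; it assumes PgBouncer runs on two hosts named "pgbouncer1" and "pgbouncer2" with its admin console on port 6432, reachable as the "pgbouncer" admin user, that the application database is "appdb", and that /etc/pgbouncer/pgbouncer.database.ini is %included from the main pgbouncer.ini and is writable by the SSH user. Adjust all of these for your environment.

        #!/bin/bash
        # Minimal illustrative sketch -- NOT the bundled scripts/bdr-pgbouncer.sh.
        # Hosts, port, database name, admin user and file paths below are
        # assumptions; adapt them to your local setup.

        NODE_ID=$1
        EVENT_TYPE=$2
        SUCCESS=$3
        NEXT_CONNINFO=$4
        NEXT_NODE_NAME=$5

        PGBOUNCER_HOSTS="pgbouncer1 pgbouncer2"
        PGBOUNCER_PORT=6432
        PGBOUNCER_DATABASE="appdb"

        for HOST in $PGBOUNCER_HOSTS; do
            # pause the application database so no new queries are routed to the failed node
            psql -h "$HOST" -p "$PGBOUNCER_PORT" -U pgbouncer pgbouncer -c "PAUSE $PGBOUNCER_DATABASE"

            # rewrite the database definition so it points to the next available node
            ssh "$HOST" "printf '[databases]\n%s = %s\n' '$PGBOUNCER_DATABASE' '$NEXT_CONNINFO' \
                > /etc/pgbouncer/pgbouncer.database.ini"

            # reload the configuration and resume traffic
            psql -h "$HOST" -p "$PGBOUNCER_PORT" -U pgbouncer pgbouncer -c "RELOAD"
            psql -h "$HOST" -p "$PGBOUNCER_PORT" -U pgbouncer pgbouncer -c "RESUME $PGBOUNCER_DATABASE"
        done

In practice the regenerated configuration file would more likely be produced from a template by a provisioning system, as noted above.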


20.5. Node monitoring and failover

At the intervals specified by monitor_interval_secs in repmgr.conf, repmgrd will ping each node to check whether it is available. If a node isn't available, repmgrd will check reconnect_attempts times at intervals of reconnect_interval to confirm the node is definitely unreachable. This buffer period is necessary to avoid false positives caused by transient network outages.

If the node is still unavailable, repmgrd will enter failover mode and execute the script defined in event_notification_command; an entry will be logged in the repmgr.events table and repmgrd will (unless otherwise configured) resume monitoring of the node in "degraded" mode until it reappears.

repmgrd logfile output during a failover event will look something like this on one node (usually the node which has failed, here node2):

            ...
    [2017-07-27 21:08:39] [INFO] starting continuous BDR node monitoring
    [2017-07-27 21:08:39] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:08:55] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:11] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
    [2017-07-27 21:09:28] [NOTICE] setting node record for node 2 to inactive
    [2017-07-27 21:09:28] [INFO] executing notification command for event "bdr_failover"
    [2017-07-27 21:09:28] [DETAIL] command is:
      /path/to/bdr-pgbouncer.sh 2 bdr_failover 1 "host=node1 dbname=bdrtest user=repmgr connect_timeout=2" "node1"
    [2017-07-27 21:09:28] [INFO] node 'node2' (ID: 2) detected as failed; next available node is 'node1' (ID: 1)
    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
    ...

Output on the other node (node1) during the same event will look like this:

    ...
    [2017-07-27 21:08:35] [INFO] starting continuous BDR node monitoring
    [2017-07-27 21:08:35] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:08:51] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:07] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
    [2017-07-27 21:09:28] [NOTICE] other node's repmgrd is handling failover
    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
    ...

This assumes only the PostgreSQL instance on node2 has failed. In this case the repmgrd instance running on node2 has performed the failover. However, if the entire server becomes unavailable, repmgrd on node1 will perform the failover.


20.6. Node recovery

Following failure of a BDR node, if the node subsequently becomes available again, a bdr_recovery event will be generated. This could potentially be used to reconfigure PgBouncer automatically to bring the node back into the available pool; however, it would be prudent to manually verify the node's status before exposing it to the application.

If the failed node comes back up and connects correctly, output similar to this will be visible in the repmgrd log:

        [2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
        [2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
        [2017-07-27 21:25:46] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
        [2017-07-27 21:25:55] [INFO] active replication slot for node "node1" found after 1 seconds
        [2017-07-27 21:25:55] [NOTICE] node "node2" (ID: 2) has recovered after 986 seconds


20.7. Shutdown of both nodes

If both PostgreSQL instances are shut down, repmgrd will try to handle the situation as gracefully as possible, though with no failover candidates available there's not much it can do. Should this case ever occur, we recommend shutting down repmgrd on both nodes and restarting it once the PostgreSQL instances are running properly.

IV. repmgr command reference

Table of Contents
repmgr primary register -- initialise a repmgr installation and register the primary node
repmgr primary unregister -- unregister an inactive primary node
repmgr standby clone -- clone a PostgreSQL standby node from another PostgreSQL node
repmgr standby register -- add a standby's information to the repmgr metadata
repmgr standby unregister -- remove a standby's information from the repmgr metadata
repmgr standby promote -- promote a standby to a primary
repmgr standby follow -- attach a standby to a new primary
repmgr standby switchover -- promote a standby to primary and demote the existing primary to a standby
repmgr witness register -- add a witness node's information to the repmgr metadata
repmgr witness unregister -- remove a witness node's information from the repmgr metadata
repmgr node status -- show overview of a node's basic information and replication status
repmgr node check -- performs some health checks on a node from a replication perspective
repmgr node rejoin -- rejoin a dormant (stopped) node to the replication cluster
repmgr cluster show -- display information about each registered node in the replication cluster
repmgr cluster matrix --  runs repmgr cluster show on each node and summarizes output
repmgr cluster crosscheck -- cross-checks connections between each combination of nodes
repmgr cluster event -- output a formatted list of cluster events
repmgr cluster cleanup -- purge monitoring history

repmgr primary register

Name

repmgr primary register -- initialise a repmgr installation and register the primary node

Description

repmgr primary register registers a primary node in a streaming replication cluster, and configures it for use with repmgr, including installing the repmgr extension. This command needs to be executed before any standby nodes are registered.

Execution

Execute with the --dry-run option to check what would happen without actually registering the primary.

repmgr master register can be used as an alias for repmgr primary register.
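A typical invocation, first checking prerequisites with --dry-run, might look like this (configuration file path illustrative):

        $ repmgr -f /etc/repmgr.conf primary register --dry-run
        $ repmgr -f /etc/repmgr.conf primary register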

Note: If providing the configuration file location with -f/--config-file, avoid using a relative path, as repmgr stores the configuration file location in the repmgr metadata for use when repmgr is executed remotely (e.g. during repmgr standby switchover). repmgr will attempt to convert a relative path into an absolute one, but this may not be the same as the path you would explicitly provide (e.g. ./repmgr.conf might be converted to /path/to/./repmgr.conf, whereas you'd normally write /path/to/repmgr.conf).

Options

--dry-run

Check prerequisites but don't actually register the primary.

-F, --force

Overwrite an existing node record

Event notifications

Following event notifications will be generated:

  • cluster_created
  • primary_register

repmgr primary unregister

Name

repmgr primary unregister -- unregister an inactive primary node

Description

repmgr primary unregister unregisters an inactive primary node from the repmgr metadata. This is typically done when the primary has failed and is being removed from the cluster after a new primary has been promoted.

Execution

repmgr primary unregister can be run on any active repmgr node, with the ID of the node to unregister passed as --node-id.

Execute with the --dry-run option to check what would happen without actually unregistering the node.

repmgr master unregister can be used as an alias for repmgr primary unregister.
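For example, to check and then unregister a failed primary with node ID 1 (node ID illustrative):

        $ repmgr -f /etc/repmgr.conf primary unregister --node-id=1 --dry-run
        $ repmgr -f /etc/repmgr.conf primary unregister --node-id=1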

Options

--dry-run

Check prerequisites but don't actually unregister the primary.

--node-id

ID of the inactive primary to be unregistered.

Event notifications

A primary_unregister event notification will be generated.

repmgr standby clone

Name

repmgr standby clone -- clone a PostgreSQL standby node from another PostgreSQL node

Description

repmgr standby clone clones a PostgreSQL node from another PostgreSQL node, typically the primary, but optionally from any other node in the cluster or from Barman. It creates the recovery.conf file required to attach the cloned node to the primary node (or another standby, if cascading replication is in use).

Note: repmgr standby clone does not start the standby, and after cloning a standby, the command repmgr standby register must be executed to notify repmgr of its existence.

Handling configuration files

Note that by default, all configuration files in the source node's data directory will be copied to the cloned node. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.

In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's configuration files are located outside of the data directory and will not be copied by default. repmgr can copy these files, either to the same location on the standby server (provided appropriate directory and file permissions are available), or into the standby's data directory. This requires passwordless SSH access to the primary server. Add the option --copy-external-config-files to the repmgr standby clone command; by default files will be copied to the same path as on the upstream server. Note that the user executing repmgr must have write access to those directories.

To have the configuration files placed in the standby's data directory, specify --copy-external-config-files=pgdata, but note that any include directives in the copied files may need to be updated.

Note: When executing repmgr standby clone with the --copy-external-config-files and --dry-run options, repmgr will check the SSH connection to the source node, but will not verify whether the files can actually be copied.

During the actual clone operation, a check will be made before the database itself is cloned to determine whether the files can actually be copied; if any problems are encountered, the clone operation will be aborted, enabling the user to fix any issues before retrying the clone operation.

Tip: For reliable configuration file management we recommend using a configuration management tool such as Ansible, Chef, Puppet or Salt.

Customising recovery.conf

By default, repmgr will create a minimal recovery.conf containing the following parameters:

  • standby_mode (always 'on')
  • recovery_target_timeline (always 'latest')
  • primary_conninfo
  • primary_slot_name (if replication slots in use)

The following additional parameters can be specified in repmgr.conf for inclusion in recovery.conf:

  • restore_command
  • archive_cleanup_command
  • recovery_min_apply_delay

Note: We recommend using Barman to manage WAL file archiving. For more details on combining repmgr and Barman, in particular using restore_command to configure Barman as a backup source of WAL files, see Cloning from Barman.
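For example, a hypothetical repmgr.conf entry using Barman's barman-wal-restore utility might look like this (the Barman host name "barman-server" and server name "node1" are placeholders, and the path to the script may differ on your system):

        restore_command='/usr/bin/barman-wal-restore barman-server node1 %f %p'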

Managing WAL during the cloning process

When initially cloning a standby, you will need to ensure that all required WAL files remain available while the cloning is taking place. To ensure this happens when using the default pg_basebackup method, repmgr will set pg_basebackup's --xlog-method parameter to stream, which will ensure all WAL files generated during the cloning process are streamed in parallel with the main backup. Note that this requires two replication connections to be available (repmgr will verify sufficient connections are available before attempting to clone, and this can be checked before performing the clone using the --dry-run option).

To override this behaviour, in repmgr.conf set pg_basebackup's --xlog-method parameter to fetch:

      pg_basebackup_options='--xlog-method=fetch'

and ensure that wal_keep_segments is set to an appropriately high value. See the pg_basebackup documentation for details.
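For example, in the upstream node's postgresql.conf (the appropriate value depends on write volume and how long the clone takes; the figure below is purely illustrative):

      # postgresql.conf on the upstream node; value is illustrative
      wal_keep_segments = 1000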

Note: From PostgreSQL 10, pg_basebackup's --xlog-method parameter has been renamed to --wal-method.

Using a standby cloned by another method

repmgr supports standbys cloned by another method (e.g. using barman's barman recover command).

To integrate the standby as a repmgr node, ensure the repmgr.conf file is created for the node, and that it has been registered using repmgr standby register. Then execute the command repmgr standby clone --recovery-conf-only. This will create the recovery.conf file needed to attach the node to its upstream, and will also create a replication slot on the upstream node if required.

Note that the upstream node must be running. An existing recovery.conf will not be overwritten unless the -F/--force option is provided.

Execute repmgr standby clone --recovery-conf-only --dry-run to check the prerequisites for creating the recovery.conf file, and display the contents of the file without actually creating it.

Note: --recovery-conf-only was introduced in repmgr 4.0.4.

Options

-d, --dbname=CONNINFO

Connection string of the upstream node to use for cloning.

--dry-run

Check prerequisites but don't actually clone the standby.

If --recovery-conf-only is specified, the contents of the generated recovery.conf file will be displayed, but the file itself will not be written.

-c, --fast-checkpoint

Force fast checkpoint (not effective when cloning from Barman).

--copy-external-config-files[={samepath|pgdata}]

Copy configuration files located outside the data directory on the source node to the same path on the standby (default) or to the PostgreSQL data directory.

--no-upstream-connection

When using Barman, do not connect to upstream node.

-R, --remote-user=USERNAME

Remote system username for SSH operations (default: current local system username).

--recovery-conf-only

Create a recovery.conf file for a previously cloned instance (repmgr 4.0.4 and later).

--replication-user

User to make replication connections with (optional, not usually required).

--superuser

If the repmgr user is not a superuser, the name of a valid superuser must be provided with this option.

--upstream-conninfo

primary_conninfo value to write in recovery.conf when the intended upstream server does not yet exist.

--upstream-node-id

ID of the upstream node to replicate from (optional, defaults to primary node)

--without-barman

Do not use Barman even if configured.

Event notifications

A standby_clone event notification will be generated.

See also

See cloning standbys for details about various aspects of cloning.

repmgr standby register

Name

repmgr standby register -- add a standby's information to the repmgr metadata

Description

repmgr standby register adds a standby's information to the repmgr metadata. This command needs to be executed to enable promote/follow operations and to allow repmgrd to work with the node. An existing standby can be registered using this command. Execute with the --dry-run option to check what would happen without actually registering the standby.

Note: If providing the configuration file location with -f/--config-file, avoid using a relative path, as repmgr stores the configuration file location in the repmgr metadata for use when repmgr is executed remotely (e.g. during repmgr standby switchover). repmgr will attempt to convert a relative path into an absolute one, but this may not be the same as the path you would explicitly provide (e.g. ./repmgr.conf might be converted to /path/to/./repmgr.conf, whereas you'd normally write /path/to/repmgr.conf).

Waiting for the standby to start

By default, repmgr will wait 30 seconds for the standby to become available before aborting with a connection error. This is useful when setting up a standby from a script, as the standby may not have fully started up by the time repmgr standby register is executed.

To change the timeout, pass the desired value with the --wait-start option. A value of 0 will disable the timeout.

The timeout will be ignored if -F/--force was provided.

Waiting for the registration to propagate to the standby

Depending on your environment and workload, it may take some time for the standby's node record to propagate from the primary to the standby. Some actions (such as starting repmgrd) require that the standby's node record is present and up-to-date to function correctly.

By providing the option --wait-sync to the repmgr standby register command, repmgr will wait until the record is synchronised before exiting. An optional timeout (in seconds) can be added to this option (e.g. --wait-sync=60).
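For example, to wait up to 60 seconds for the node record to synchronise to the standby (timeout value illustrative):

        $ repmgr -f /etc/repmgr.conf standby register --wait-sync=60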

Registering an inactive node

Under some circumstances you may wish to register a standby which is not yet running; this can be the case when using provisioning tools to create a complex replication cluster. In this case, by using the -F/--force option and providing the connection parameters to the primary server, the standby can be registered.

Similarly, with cascading replication it may be necessary to register a standby whose upstream node has not yet been registered - in this case, using -F/--force will result in the creation of an inactive placeholder record for the upstream node, which will however later need to be registered with the -F/--force option too.

When used with repmgr standby register, care should be taken that use of the -F/--force option does not result in an incorrectly configured cluster.
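For example, to register a standby which has not yet been started, providing connection parameters for the primary server (hostname and user illustrative):

        $ repmgr -f /etc/repmgr.conf standby register --force -h node1 -U repmgr -d repmgr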

Registering a node not cloned by repmgr

If you've cloned a standby using another method (e.g. barman's barman recover command), first execute repmgr standby clone --recovery-conf-only to add the recovery.conf file, then register the standby as usual.

Options

--dry-run

Check prerequisites but don't actually register the standby.

-F, --force

Overwrite an existing node record

--upstream-node-id

ID of the upstream node to replicate from (optional)

--wait-start

wait for the standby to start (timeout in seconds, default 30 seconds)

--wait-sync

wait for the node record to synchronise to the standby (optional timeout in seconds)

Event notifications

A standby_register event notification will be generated immediately after the node record is updated on the primary.

If the --wait-sync option is provided, a standby_register_sync event notification will be generated immediately after the node record has synchronised to the standby.

If an event notification script is defined, repmgr will substitute the placeholder %p with the node ID of the primary node, %c with its conninfo string, and %a with its node name.

repmgr standby unregister

Name

repmgr standby unregister -- remove a standby's information from the repmgr metadata

Description

Unregisters a standby with repmgr. This command does not affect the actual replication, just removes the standby's entry from the repmgr metadata.

Execution

To unregister a running standby, execute:

        repmgr standby unregister -f /etc/repmgr.conf

This will remove the standby record from repmgr's internal metadata table (repmgr.nodes). A standby_unregister event notification will be recorded in the repmgr.events table.

If the standby is not running, the command can be executed on another node by providing the ID of the node to be unregistered using the command line parameter --node-id, e.g. executing the following command on the primary server will unregister the standby with ID 3:

        repmgr standby unregister -f /etc/repmgr.conf --node-id=3

Options

--node-id

node_id of the node to unregister (optional)

Event notifications

A standby_unregister event notification will be generated.

repmgr standby promote

Name

repmgr standby promote -- promote a standby to a primary

Description

Promotes a standby to a primary if the current primary has failed. This command requires a valid repmgr.conf file for the standby, either specified explicitly with -f/--config-file or located in a default location; no additional arguments are required.

If the standby promotion succeeds, the server will not need to be restarted. However, any other standbys will need to follow the new primary using repmgr standby follow; if repmgrd is active, it will handle this automatically.

Note that repmgr will wait for up to promote_check_timeout seconds (default: 60 seconds) to verify that the standby has been promoted, and will check the promotion every promote_check_interval seconds (default: 1 second). Both values can be defined in repmgr.conf.

Example

      $ repmgr -f /etc/repmgr.conf standby promote
      NOTICE: promoting standby to primary
      DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' promote"
      server promoting
      DEBUG: setting node 2 as primary and marking existing primary as failed
      NOTICE: STANDBY PROMOTE successful
      DETAIL: server "node2" (ID: 2) was successfully promoted to primary

Event notifications

A standby_promote event notification will be generated.

repmgr standby follow

Name

repmgr standby follow -- attach a standby to a new primary

Description

Attaches the standby to a new primary. This command requires a valid repmgr.conf file for the standby, either specified explicitly with -f/--config-file or located in a default location; no additional arguments are required.

This command will force a restart of the standby server, which must be running. It can only be used to attach an active standby to the current primary node (and not to another standby).

Tip: To re-add an inactive node to the replication cluster, use repmgr node rejoin.

repmgr standby follow will wait up to standby_follow_timeout seconds (default: 30) to verify the standby has actually connected to the new primary.

Example

      $ repmgr -f /etc/repmgr.conf standby follow
      INFO: setting node 3's primary to node 2
      NOTICE: restarting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' restart"
      waiting for server to shut down........ done
      server stopped
      waiting for server to start.... done
      server started
      NOTICE: STANDBY FOLLOW successful
      DETAIL: node 3 is now attached to node 2

Options

--dry-run

Check prerequisites but don't actually follow a new standby.

Important: This does not guarantee the standby can follow the primary; in particular, whether the primary and standby timelines have diverged can currently only be determined by actually attempting to attach the standby to the primary.

-w
--wait

Wait for a primary to appear. repmgr will wait for up to primary_follow_timeout seconds (default: 60 seconds) to verify that the standby is following the new primary. This value can be defined in repmgr.conf.

Event notifications

A standby_follow event notification will be generated.

If an event notification script is defined, repmgr will substitute the placeholder %p with the node ID of the primary being followed, %c with its conninfo string, and %a with its node name.

repmgr standby switchover

Name

repmgr standby switchover -- promote a standby to primary and demote the existing primary to a standby

Description

Promotes a standby to primary and demotes the existing primary to a standby. This command must be run on the standby to be promoted, and requires a passwordless SSH connection to the current primary.

If other standbys are connected to the demotion candidate, repmgr can instruct these to follow the new primary if the option --siblings-follow is specified. This requires a passwordless SSH connection between the promotion candidate (new primary) and the standbys attached to the demotion candidate (existing primary).

Note: Performing a switchover is a non-trivial operation. In particular it relies on the current primary being able to shut down cleanly and quickly. repmgr will attempt to check for potential issues but cannot guarantee a successful switchover.

For more details on performing a switchover, including preparation and configuration, see section Performing a switchover with repmgr.

Note: repmgrd should not be active on any nodes while a switchover is being executed. This restriction may be lifted in a later version.

repmgr will not perform the switchover if an exclusive backup is running on the current primary.

Options

--always-promote

Promote standby to primary, even if it is behind original primary (original primary will be shut down in any case).

--dry-run

Check prerequisites but don't actually execute a switchover.

Important: Success of --dry-run does not imply the switchover will complete successfully, only that the prerequisites for performing the operation are met.

-F
--force

Ignore warnings and continue anyway.

Specifically, if a problem is encountered when shutting down the current primary, using -F/--force will cause repmgr to continue by promoting the standby to be the new primary, and if --siblings-follow is specified, attach any other standbys to the new primary.

--force-rewind[=/path/to/pg_rewind]

Use pg_rewind to reintegrate the old primary if necessary (and the prerequisites for using pg_rewind are met). If using PostgreSQL 9.3 or 9.4, and the pg_rewind binary is not installed in the PostgreSQL bin directory, provide its full path. For more details see also Switchover and pg_rewind.

-R
--remote-user

System username for remote SSH operations (defaults to local system user).

--siblings-follow

Have standbys attached to the old primary follow the new primary.

Configuration file settings

Note that the following parameters in repmgr.conf are relevant to the switchover operation:

  • reconnect_attempts: number of times to check the original primary for a clean shutdown after executing the shutdown command, before aborting
  • reconnect_interval: interval (in seconds) to check the original primary for a clean shutdown after executing the shutdown command (up to a maximum of reconnect_attempts tries)
  • replication_lag_critical: if replication lag (in seconds) on the standby exceeds this value, the switchover will be aborted (unless the -F/--force option is provided)
  • standby_reconnect_timeout: number of seconds to attempt to wait for the demoted primary to reconnect to the promoted primary (default: 60 seconds)

Execution

Execute with the --dry-run option to test the switchover as far as possible without actually changing the status of either node.

Important: repmgrd must be shut down on all nodes while a switchover is being executed. This restriction will be removed in a future repmgr version.

External database connections, e.g. from an application, should not be permitted while the switchover is taking place. In particular, active transactions on the primary can potentially disrupt the shutdown process.
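For example, a typical invocation on the standby to be promoted, first checking prerequisites with --dry-run, might look like this (options illustrative):

        $ repmgr -f /etc/repmgr.conf standby switchover --siblings-follow --dry-run
        $ repmgr -f /etc/repmgr.conf standby switchover --siblings-follow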

Event notifications

standby_switchover and standby_promote event notifications will be generated for the new primary, and a node_rejoin event notification for the former primary (new standby).

If using an event notification script, standby_switchover will populate the placeholder parameter %p with the node ID of the former primary.

Exit codes

Following exit codes can be emitted by repmgr standby switchover:

SUCCESS (0)

The switchover completed successfully.

ERR_SWITCHOVER_FAIL (18)

The switchover could not be executed.

ERR_SWITCHOVER_INCOMPLETE (22)

The switchover was executed but a problem was encountered. Typically this means the former primary could not be reattached as a standby. Check preceding log messages for more information.

See also

For more details see the section Performing a switchover with repmgr.

repmgr witness register

Name

repmgr witness register -- add a witness node's information to the repmgr metadata

Description

repmgr witness register adds a witness server's node record to the repmgr metadata, and if necessary initialises the witness node by installing the repmgr extension and copying the repmgr metadata to the witness server. This command needs to be executed to enable use of the witness server with repmgrd.

When executing repmgr witness register, connection information for the cluster primary server must also be provided. repmgr will automatically use the user and dbname values defined in the conninfo string defined in the witness node's repmgr.conf, if these are not explicitly provided.

Execute with the --dry-run option to check what would happen without actually registering the witness server.

Example

    $ repmgr -f /etc/repmgr.conf witness register -h node1
    INFO: connecting to witness node "node3" (ID: 3)
    INFO: connecting to primary node
    NOTICE: attempting to install extension "repmgr"
    NOTICE: "repmgr" extension successfully installed
    INFO: witness registration complete
    NOTICE: witness node "node3" (ID: 3) successfully registered
      

Event notifications

A witness_register event notification will be generated.

repmgr witness unregister

Name

repmgr witness unregister -- remove a witness node's information from the repmgr metadata

Description

repmgr witness unregister removes a witness server's node record from the repmgr metadata.

The node does not have to be running to be unregistered; however, if it is not running, either provide connection information for the primary server, or execute repmgr witness unregister on a running node and provide the parameter --node-id with the node ID of the witness server.

Execute with the --dry-run option to check what would happen without actually unregistering the witness server.

Examples

Unregistering a running witness node:

    $ repmgr -f /etc/repmgr.conf witness unregister
    INFO: connecting to witness node "node3" (ID: 3)
    INFO: unregistering witness node 3
    INFO: witness unregistration complete
    DETAIL: witness node with ID 3 successfully unregistered

Unregistering a non-running witness node:

        $ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501  -F
        INFO: connecting to node "node3" (ID: 3)
        NOTICE: unable to connect to node "node3" (ID: 3), removing node record on cluster primary only
        INFO: unregistering witness node 3
        INFO: witness unregistration complete
        DETAIL: witness node with ID 3 successfully unregistered

Notes

This command will not make any changes to the witness node itself and will neither remove any data from the witness database nor stop the PostgreSQL instance.

A witness node which has been unregistered can be re-registered with repmgr witness register --force.

Options

--dry-run

Check prerequisites but don't actually unregister the witness.

--node-id

Unregister witness server with the specified node ID.

Event notifications

A witness_unregister event notification will be generated.

repmgr node status

Name

repmgr node status -- show overview of a node's basic information and replication status

Description

Displays an overview of a node's basic information and replication status. This command must be run on the local node.

Example

        $ repmgr -f /etc/repmgr.conf node status
        Node "node1":
            PostgreSQL version: 10beta1
            Total data size: 30 MB
            Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
            Role: primary
            WAL archiving: off
            Archive command: (none)
            Replication connections: 2 (of maximal 10)
            Replication slots: 0 (of maximal 10)
            Replication lag: n/a

Output format

  • --csv: generate output in CSV format

Exit codes

Following exit codes can be emitted by repmgr node status:

SUCCESS (0)

No issues were detected.

ERR_NODE_STATUS (25)

One or more issues were detected.

See also

See repmgr node check to diagnose issues and repmgr cluster show for an overview of all nodes in the cluster.

repmgr node check

Name

repmgr node check -- performs some health checks on a node from a replication perspective

Description

Performs some health checks on a node from a replication perspective. This command must be run on the local node.

Example

       $ repmgr -f /etc/repmgr.conf node check
       Node "node1":
            Server role: OK (node is primary)
            Replication lag: OK (N/A - node is primary)
            WAL archiving: OK (0 pending files)
            Downstream servers: OK (2 of 2 downstream nodes attached)
            Replication slots: OK (node has no replication slots)

Individual checks

Each check can be performed individually by supplying an additional command line parameter, e.g.:

        $ repmgr node check --role
        OK (node is primary)

Parameters for individual checks are as follows:

  • --role: checks if the node has the expected role
  • --replication-lag: checks if the node is lagging by more than replication_lag_warning or replication_lag_critical
  • --archive-ready: checks for WAL files which have not yet been archived, and returns WARNING or CRITICAL if the number exceeds archive_ready_warning or archive_ready_critical respectively.
  • --downstream: checks that the expected downstream nodes are attached
  • --slots: checks there are no inactive replication slots
  • --missing-slots: checks there are no missing replication slots

Output format

  • --csv: generate output in CSV format (not available for individual checks)
  • --nagios: generate output in a Nagios-compatible format

Exit codes

When executing repmgr node check with one of the individual checks listed above, repmgr will emit one of the following Nagios-style exit codes (even if --nagios is not supplied):

  • 0: OK
  • 1: WARNING
  • 2: ERROR
  • 3: UNKNOWN

Following exit codes can be emitted by repmgr node check if no individual check was specified:

SUCCESS (0)

No issues were detected.

ERR_NODE_STATUS (25)

One or more issues were detected.

repmgr node rejoin

Name

repmgr node rejoin -- rejoin a dormant (stopped) node to the replication cluster

Description

Enables a dormant (stopped) node to be rejoined to the replication cluster.

This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.

Tip: If the node is running and needs to be attached to the current primary, use repmgr standby follow.

Note that repmgr standby follow can only be used for standbys which have not diverged from the rest of the cluster.

Usage

      repmgr node rejoin -d '$conninfo'

where $conninfo is the conninfo string of any reachable node in the cluster. repmgr.conf for the stopped node *must* be supplied explicitly if not otherwise available.

Options

--dry-run

Check prerequisites but don't actually execute the rejoin.

--force-rewind[=/path/to/pg_rewind]

Execute pg_rewind.

It is only necessary to provide the pg_rewind path if using PostgreSQL 9.3 or 9.4, and pg_rewind is not installed in the PostgreSQL bin directory.

--config-files

comma-separated list of configuration files to retain after executing pg_rewind.

Currently pg_rewind will overwrite the local node's configuration files with the files from the source node, so it's advisable to use this option to ensure they are kept.

--config-archive-dir

Directory to temporarily store configuration files specified with --config-files; default: /tmp.

-W/--no-wait

Don't wait for the node to rejoin cluster.

If this option is supplied, repmgr will restart the node but not wait for it to connect to the primary.

Configuration file settings

  • node_rejoin_timeout: the maximum length of time (in seconds) to wait for the node to reconnect to the replication cluster (defaults to the value set in standby_reconnect_timeout, 60 seconds).

Event notifications

A node_rejoin event notification will be generated.

Notes

Currently repmgr node rejoin can only be used to attach a standby to the current primary, not another standby.

The node must have been shut down cleanly; if this was not the case, it will need to be manually started (remove any existing recovery.conf file first) until it has reached a consistent recovery point, then shut down cleanly.

Tip: If PostgreSQL is started in single-user mode and input is directed from /dev/null, it will perform recovery then immediately quit, and will then be in a state suitable for use by pg_rewind.

          rm -f /var/lib/pgsql/data/recovery.conf
          postgres --single -D /var/lib/pgsql/data/ < /dev/null

Using pg_rewind

repmgr node rejoin can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary. pg_rewind is available in PostgreSQL 9.5 and later as part of the core distribution, and can be installed from external sources for PostgreSQL 9.3 and 9.4.

Note: pg_rewind requires that either wal_log_hints is enabled, or that data checksums were enabled when the cluster was initialized. See the pg_rewind documentation for details.

To have repmgr node rejoin use pg_rewind, pass the command line option --force-rewind, which will tell repmgr to execute pg_rewind to ensure the node can be rejoined successfully.

Be aware that if pg_rewind is executed and actually performs a rewind operation, any configuration files in the PostgreSQL data directory will be overwritten with those from the source server.

To prevent this happening, provide a comma-separated list of files to retain using the --config-files command line option; the specified files will be archived in a temporary directory (whose parent directory can be specified with --config-archive-dir) and restored once the rewind operation is complete.

Example: first using --dry-run, then actually executing the node rejoin command.

    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \
         --force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --dry-run
    NOTICE: using provided configuration file "/etc/repmgr.conf"
    INFO: prerequisites for using pg_rewind are met
    INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node1/postgresql.local.conf"
    INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-node1/postgresql.local.conf"
    INFO: 2 files would have been copied to "/tmp/repmgr-config-archive-node1"
    INFO: directory "/tmp/repmgr-config-archive-node1" deleted
    INFO: pg_rewind would now be executed
    DETAIL: pg_rewind command is:
      pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node1 dbname=repmgr user=repmgr'

Note: If --force-rewind is used with the --dry-run option, this checks the prerequisites for using pg_rewind, but cannot predict the outcome of actually executing pg_rewind.

    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \
         --force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose
    NOTICE: using provided configuration file "/etc/repmgr.conf"
    INFO: prerequisites for using pg_rewind are met
    INFO: 2 files copied to "/tmp/repmgr-config-archive-node1"
    NOTICE: executing pg_rewind
    NOTICE: 2 files copied to /var/lib/pgsql/data
    INFO: directory "/tmp/repmgr-config-archive-node1" deleted
    INFO: deleting "recovery.done"
    INFO: setting node 1's primary to node 2
    NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
    waiting for server to start.... done
    server started
    NOTICE: NODE REJOIN successful
    DETAIL: node 1 is now attached to node 2

repmgr cluster show

Name

repmgr cluster show -- display information about each registered node in the replication cluster

Description

Displays information about each registered node in the replication cluster. This command polls each registered server and shows its role (primary / standby / bdr) and status. It polls each server directly and can be run on any node in the cluster; this is also useful when analyzing connectivity from a particular node.

Execution

This command requires either a valid repmgr.conf file or a database connection string to one of the registered nodes; no additional arguments are needed.

To show database connection errors when polling nodes, run the command in --verbose mode.

Example

    $ repmgr -f /etc/repmgr.conf cluster show

     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+-----------------------------------------
     1  | node1 | primary | * running |          | default  | host=db_node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=db_node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=db_node3 dbname=repmgr user=repmgr

Notes

The column Role shows the expected server role according to the repmgr metadata. Status shows whether the server is running or unreachable. If the node has an unexpected role not reflected in the repmgr metadata, e.g. a node was manually promoted to primary, this will be highlighted with an exclamation mark, e.g.:

    $ repmgr -f /etc/repmgr.conf cluster show

     ID | Name  | Role    | Status               | Upstream | Location | Connection string
    ----+-------+---------+----------------------+----------+----------+-----------------------------------------
     1  | node1 | primary | ? unreachable        |          | default  | host=db_node1 dbname=repmgr user=repmgr
     2  | node2 | standby | ! running as primary | node1    | default  | host=db_node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running            | node1    | default  | host=db_node3 dbname=repmgr user=repmgr

    WARNING: following issues were detected
      node "node1" (ID: 1) is registered as an active primary but is unreachable
      node "node2" (ID: 2) is registered as standby but running as primary

Node availability is tested by connecting from the node where repmgr cluster show is executed, and does not necessarily imply the node is down. See repmgr cluster matrix and repmgr cluster crosscheck to get a better overview of connections between nodes.

Options

--csv

repmgr cluster show accepts an optional parameter --csv, which outputs the replication cluster's status in a simple CSV format, suitable for parsing by scripts:

    $ repmgr -f /etc/repmgr.conf cluster show --csv
    1,-1,-1
    2,0,0
    3,0,1

The columns have the following meanings:

  • node ID
  • availability (0 = available, -1 = unavailable)
  • recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)

--verbose

Display the full text of any database connection error messages

Exit codes

Following exit codes can be emitted by repmgr cluster show:

SUCCESS (0)

No issues were detected.

ERR_NODE_STATUS (25)

One or more issues were detected.

repmgr cluster matrix

Name

repmgr cluster matrix --  runs repmgr cluster show on each node and summarizes output

Description

repmgr cluster matrix runs repmgr cluster show on each node and arranges the results in a matrix, recording success or failure.

repmgr cluster matrix requires a valid repmgr.conf file on each node. Additionally, passwordless ssh connections are required between all nodes.

Example

Example 1 (all nodes up):

    $ repmgr -f /etc/repmgr.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  *
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *

Example 2 (node1 and node2 up, node3 down):

    $ repmgr -f /etc/repmgr.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  x
     node3 |  3 |  ? |  ? |  ?
    

Each row corresponds to one server, and indicates the result of testing an outbound connection from that server.

Since node3 is down, all the entries in its row are filled with ?, meaning we cannot test its outbound connections.

The other two nodes are up; the corresponding rows have x in the column corresponding to node3, meaning that inbound connections to that node have failed, and * in the columns corresponding to node1 and node2, meaning that inbound connections to these nodes have succeeded.

Example 3 (all nodes up, firewall dropping packets originating from node1 and directed to port 5432 on node3) - running repmgr cluster matrix from node1 gives the following output:

    $ repmgr -f /etc/repmgr.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  ? |  ? |  ?

Note that this may take some time depending on the connect_timeout setting in the node conninfo strings; the default is 1 minute, which means that without modification the above command would take around 2 minutes to run (see the comment elsewhere about setting connect_timeout).
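For example, setting an explicit connect_timeout in each node's conninfo string (hostname and value illustrative) keeps these connection checks short:

        conninfo='host=node3 dbname=repmgr user=repmgr connect_timeout=2'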

The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3.

In this case, the repmgr cluster crosscheck command will produce a more useful result.

Exit codes

Following exit codes can be emitted by repmgr cluster matrix:

SUCCESS (0)

The check completed successfully and all nodes are reachable.

ERR_NODE_STATUS (25)

One or more nodes could not be reached.

repmgr cluster crosscheck

Name

repmgr cluster crosscheck -- cross-checks connections between each combination of nodes

Description

repmgr cluster crosscheck is similar to repmgr cluster matrix, but cross-checks connections between each combination of nodes. In "Example 3" in repmgr cluster matrix we have no information about the state of node3. However by running repmgr cluster crosscheck it's possible to get a better overview of the cluster situation:

    $ repmgr -f /etc/repmgr.conf cluster crosscheck

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *

What happened is that repmgr cluster crosscheck merged its own repmgr cluster matrix with the repmgr cluster matrix output from node2; the latter is able to connect to node3 and therefore determine the state of outbound connections from that node.

Exit codes

Following exit codes can be emitted by repmgr cluster crosscheck:

SUCCESS (0)

The check completed successfully and all nodes are reachable.

ERR_NODE_STATUS (25)

One or more nodes could not be reached.

repmgr cluster event

Name

repmgr cluster event -- output a formatted list of cluster events

Description

Outputs a formatted list of cluster events, as stored in the repmgr.events table.

Usage

Output is in reverse chronological order, and can be filtered with the following options:

  • --all: outputs all entries
  • --limit: set the maximum number of entries to output (default: 20)
  • --node-id: restrict entries to node with this ID
  • --node-name: restrict entries to node with this name
  • --event: filter specific event (see event notifications for a full list)

The "Details" column can be omitted by providing --terse.

Output format

  • --csv: generate output in CSV format. Note that the Details column will currently not be emitted in CSV format.

Example

    $ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
     Node ID | Name  | Event            | OK | Timestamp           | Details
    ---------+-------+------------------+----+---------------------+--------------------------------
     3       | node3 | standby_register | t  | 2017-08-17 10:28:55 | standby registration succeeded
     2       | node2 | standby_register | t  | 2017-08-17 10:28:53 | standby registration succeeded

repmgr cluster cleanup

Name

repmgr cluster cleanup -- purge monitoring history

Description

Purges monitoring history from the repmgr.monitoring_history table to prevent excessive table growth.

By default all data will be removed; use the -k/--keep-history option to specify the number of days of monitoring history to retain.

This command can be executed manually or as a cronjob.

Usage

This command requires a valid repmgr.conf file for the node on which it is executed; no additional arguments are required.
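For example, a hypothetical /etc/cron.d entry (schedule, user and retention period are illustrative) which purges monitoring history older than 30 days once a day:

        # /etc/cron.d/repmgr-cleanup -- run daily at 00:30 as the postgres user
        30 0 * * * postgres repmgr -f /etc/repmgr.conf cluster cleanup --keep-history=30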

Notes

Monitoring history will only be written if repmgrd is active, and monitoring_history is set to true in repmgr.conf.

Event notifications

A cluster_cleanup event notification will be generated.

See also

For more details see the sections Monitoring with repmgrd and repmgrd monitoring configuration.


Appendix A. Release notes

Changes to each repmgr release are documented in the release notes. Please read the release notes for all versions between your current version and the version you plan to upgrade to before performing an upgrade, as there may be version-specific upgrade steps.

See also: Upgrading repmgr


A.1. Release 4.1.1

Wed September 5, 2018

repmgr 4.1.1 contains a number of usability enhancements and bug fixes.

We recommend upgrading to this version as soon as possible. This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.


A.1.1. repmgr enhancements


A.1.2. repmgrd enhancements

  • Always reopen the log file after receiving SIGHUP. Previously this only happened if a configuration file change was detected. (GitHub #485).

  • Report version number after logger initialisation. (GitHub #487).

  • Improve cascaded standby failover handling. (GitHub #480).

  • Improve reconnection handling after brief network outages; if monitoring data is being collected, this could lead to orphaned sessions on the primary. (GitHub #480).

  • Check promote_command and follow_command are defined when reloading configuration. These were checked on startup but not on reload by repmgrd, which made it possible to provide repmgrd with invalid values. It's unlikely anyone would want to do this, but we should make it impossible anyway. (GitHub #486).


A.1.3. Other

  • Text of any failed queries will now be logged as ERROR to assist logfile analysis at log levels higher than DEBUG. (GitHub #498).


A.1.4. Bug fixes

  • repmgr node rejoin: remove new upstream's replication slot if it still exists on the rejoined standby. (GitHub #499).

  • repmgrd: fix startup on witness node when local data is stale. (GitHub #488, #489).

  • Truncate version string reported by PostgreSQL if necessary; some distributions insert additional detail after the actual version. (GitHub #490).


A.2. Release 4.1.0

Tue July 31, 2018

repmgr 4.1.0 introduces some changes to repmgrd behaviour and some additional configuration parameters.

This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.6. The following post-upgrade steps must be carried out:

  • Execute ALTER EXTENSION repmgr UPDATE on the primary server in the database where repmgr is installed.

  • repmgrd must be restarted on all nodes where it is running.

A restart of the PostgreSQL server is not required for this release (unless upgrading from repmgr 3.x).

See Upgrading repmgr 4.x and later for more details.

Configuration changes are backwards-compatible and no changes to repmgr.conf are required. However users should review the changes listed below.

Note: Repository changes

Coinciding with this release, the 2ndQuadrant repository structure has changed. See section Installing from packages for details, particularly if you are using an RPM-based system.


A.2.1. Configuration file changes

  • Default for log_level is now INFO. This produces additional informative log output, without creating excessive additional log file volume, and matches the setting assumed for examples in the documentation. (GitHub #470).

  • recovery_min_apply_delay now accepts a minimum value of zero (GitHub #448).


A.2.2. repmgr enhancements

  • repmgr: always exit with an error if an unrecognised command line option is provided. This matches the behaviour of other PostgreSQL utilities such as psql. (GitHub #464).

  • repmgr: add -q/--quiet option to suppress non-error output. (GitHub #468).

  • repmgr cluster show, repmgr node check and repmgr node status return non-zero exit code if node status issues detected. (GitHub #456).

  • Add --csv output option for repmgr cluster event. (GitHub #471).

  • repmgr witness unregister can be run on any node, by providing the ID of the witness node with --node-id. (GitHub #472).

  • repmgr standby switchover will refuse to run if an exclusive backup is taking place on the current primary. (GitHub #476).


A.2.3. repmgrd enhancements

  • repmgrd: create a PID file by default (GitHub #457). For details, see repmgrd's PID file.

  • repmgrd: daemonize process by default. If, for whatever reason, the user does not wish to daemonize the process, provide --daemonize=false. (GitHub #458).


A.2.4. Bug fixes


A.3. Release 4.0.6

Thu June 14, 2018

repmgr 4.0.6 contains a number of bug fixes and usability enhancements.

We recommend upgrading to this version as soon as possible. This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.5; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.


A.3.1. Usability enhancements

  • repmgr cluster crosscheck and repmgr cluster matrix: return non-zero exit code if node connection issues detected (GitHub #447)

  • repmgr standby clone: Improve handling of external configuration file copying, including consideration in --dry-run check (GitHub #443)

  • When using --dry-run, force log level to INFO to ensure output will always be displayed (GitHub #441)

  • repmgr standby clone: Improve documentation of --recovery-conf-only mode (GitHub #438)

  • repmgr standby clone: Don't require presence of user parameter in conninfo string (GitHub #437)


A.3.2. Bug fixes

  • repmgr witness register: prevent registration of a witness server with the same name as an existing node

  • repmgr standby follow: check node has actually connected to new primary before reporting success (GitHub #444)

  • repmgr node rejoin: Fix bug when parsing --config-files parameter (GitHub #442)

  • repmgrd: ensure local node is counted as quorum member (GitHub #439)


A.4. Release 4.0.5

Wed May 2, 2018

repmgr 4.0.5 contains a number of usability enhancements related to pg_rewind usage, recovery.conf generation and (in repmgrd) handling of various corner-case situations, as well as several bug fixes.


A.4.1. Usability enhancements

  • Various documentation improvements, with particular emphasis on the importance of setting appropriate service commands instead of relying on pg_ctl.

  • Poll demoted primary after restart as a standby during a switchover operation (GitHub #408).

  • Add configuration parameter config_directory (GitHub #424).

  • Add sanity check if --upstream-node-id not supplied when executing repmgr standby register (GitHub #395).

  • Enable pg_rewind to be used with PostgreSQL 9.3/9.4 (GitHub #413).

  • When generating replication connection strings, set dbname=replication if appropriate (GitHub #421).

  • Enable provision of archive_cleanup_command in recovery.conf (GitHub #416).

  • Actively check for node to rejoin cluster (GitHub #415).

  • repmgrd: set connect_timeout=2 (if not explicitly set) when pinging a server.


A.4.2. Bug fixes

  • Fix display of conninfo parsing error messages.

  • Fix minimum accepted value for degraded_monitoring_timeout (GitHub #411).

  • Fix superuser password handling (GitHub #400)

  • Fix parsing of archive_ready_critical configuration file parameter (GitHub #426).

  • Fix repmgr cluster crosscheck output (GitHub #389)

  • Fix memory leaks in witness code (GitHub #402).

  • repmgrd: handle pg_ctl promote timeout (GitHub #425).

  • repmgrd: handle failover situation with only two nodes in the primary location, and at least one node in another location (GitHub #407).

  • repmgrd: prevent standby connection handle from going stale.


A.5. Release 4.0.4

Fri Mar 9, 2018

repmgr 4.0.4 contains some bug fixes and a number of usability enhancements related to logging/diagnostics, event notifications and pre-action checks.

This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.3; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.

Note: It is not possible to perform a switchover where the demotion candidate is running repmgr 4.0.2 or lower; all nodes should be upgraded to the latest version (4.0.4). This is due to additional checks introduced in 4.0.3 which require the presence of 4.0.3 or later versions on all nodes.


A.5.1. Usability enhancements

  • add repmgr standby clone --recovery-conf-only option to enable integration of a standby cloned from another source into a repmgr cluster (GitHub #382)

  • remove restriction on using replication slots when cloning from a Barman server (GitHub #379)

  • make repmgr standby promote timeout values configurable (GitHub #387)

  • add missing options to main --help output (GitHub #391, #392)


A.5.2. Bug fixes

  • ensure repmgr node rejoin honours the --dry-run option (GitHub #383)

  • improve replication slot warnings generated by repmgr node status (GitHub #385)

  • fix --superuser handling when cloning a standby (GitHub #380)

  • repmgrd: improve detection of status change from primary to standby

  • repmgrd: improve reconnection to the local node after a failover (previously a connection error due to the node starting up was being interpreted as the node being unavailable)

  • repmgrd: when running on a witness server, correctly connect to new primary after a failover

  • repmgrd: add event notification repmgrd_shutdown (GitHub #393)


A.6. Release 4.0.3

Thu Feb 15, 2018

repmgr 4.0.3 contains some bug fixes and a number of usability enhancements related to logging/diagnostics, event notifications and pre-action checks.

This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.2; repmgrd (if running) should be restarted.

Note: It is not possible to perform a switchover where the demotion candidate is running repmgr 4.0.2 or lower; all nodes should be upgraded to 4.0.3. This is due to additional checks introduced in 4.0.3 which require the presence of 4.0.3 or later versions on all nodes.


A.6.1. Usability enhancements

  • improve repmgr standby switchover behaviour when pg_ctl is used to control the server and logging output is not explicitly redirected

  • improve repmgr standby switchover log messages and provide new exit code ERR_SWITCHOVER_INCOMPLETE when old primary could not be shut down cleanly

  • add check to verify the demotion candidate can make a replication connection to the promotion candidate before executing a switchover (GitHub #370)

  • add check for sufficient walsenders and replication slots on the promotion candidate before executing repmgr standby switchover (GitHub #371)

  • add --dry-run mode to repmgr standby follow (GitHub #368)

  • provide information about the primary node for repmgr standby register and repmgr standby follow event notifications (GitHub #375)

  • add standby_register_sync event notification, which is fired when repmgr standby register is run with the --wait-sync option and the new or updated standby node record has synchronised to the standby (GitHub #374)

  • when running repmgr cluster show, if any node is unreachable, output the error message encountered in the list of warnings (GitHub #369)


A.6.2. Bug fixes

  • ensure an inactive data directory can be overwritten when cloning a standby (GitHub #366)

  • repmgr node status upstream node display fixed (GitHub #363)

  • repmgr primary unregister: clarify usage and fix --help output (GitHub #373)

  • parsing of pg_basebackup_options fixed (GitHub #376)

  • ensure the pg_subtrans directory is created when cloning a standby in Barman mode

  • repmgr witness register: fix primary node check (GitHub #377).


A.7. Release 4.0.2

Thu Jan 18, 2018

repmgr 4.0.2 contains some bug fixes and small usability enhancements.

This release can be installed as a simple package upgrade from repmgr 4.0.1 or 4.0; repmgrd (if running) should be restarted.


A.7.1. Usability enhancements


A.7.2. Bug fixes

  • Add missing -W option to getopt_long() invocation (GitHub #350)

  • Automatically create slot name if missing (GitHub #343)

  • Fixes to parsing output of remote repmgr invocations (GitHub #349)

  • When registering BDR nodes, automatically create missing connection replication set (GitHub #347)

  • Handle missing node record in repmgr node rejoin (GitHub #358)


A.7.3. Documentation

  • The documentation can now be built as a single HTML file (GitHub pull request #353)


A.8. Release 4.0.1

Wed Dec 13, 2017

repmgr 4.0.1 is a bugfix release.


A.8.1. Bug fixes

  • ensure correct return codes are returned for repmgr node check --action= operations (GitHub #340)

  • Fix repmgr cluster show when repmgr schema not set in search path (GitHub #341)

  • When using --force-rewind with repmgr node rejoin delete any replication slots copied by pg_rewind (GitHub #334)

  • Only perform sanity check on accessibility of configuration files outside the data directory when --copy-external-config-files provided (GitHub #342)

  • Initialise "voting_term" table in application, not extension SQL (GitHub #344)


A.9. Release 4.0.0

Tue Nov 21, 2017

repmgr 4.0 is an entirely new version of repmgr, implementing repmgr as a native PostgreSQL extension, adding new and improving existing features, and making repmgr more user-friendly and intuitive to use. The new code base will make it easier to add additional functionality for future releases.

Note: With the new version, the opportunity has been taken to make some changes in the way repmgr is set up and configured. In particular, changes have been made to some configuration file settings for consistency and clarity. These changes are covered in detail below.

To standardise terminology, from this release primary is used to denote the read/write node in a streaming replication cluster. master is still accepted as an alias for repmgr commands (e.g. repmgr master register).

For detailed instructions on upgrading from repmgr 3.x, see Upgrading from repmgr 3.x.


A.9.1. Features and improvements

  • improved switchover: the switchover process has been improved and streamlined; it is now faster, and repmgr can also instruct other standbys to follow the new primary once the switchover has completed. See Performing a switchover with repmgr for more details.

  • "--dry-run" option: many repmgr commands now provide a --dry-run option which will execute the command as far as possible without making any changes, which will enable possible issues to be identified before the intended operation is actually carried out.

  • easier upgrades: repmgr is now implemented as a native PostgreSQL extension, which means future upgrades can be carried out by installing the upgraded package and issuing ALTER EXTENSION repmgr UPDATE.

  • improved logging output: repmgr (and repmgrd) now provide more explicit logging output giving a better picture of what is going on. Where appropriate, DETAIL and HINT log lines provide additional detail and suggestions for resolving problems. Additionally, repmgrd now emits informational log lines at regular, configurable intervals to confirm that it's running correctly and which node(s) it's monitoring.

  • automatic configuration file location in packages: Many operating system packages place the repmgr configuration files in a version-specific subdirectory, e.g. /etc/repmgr/9.6/repmgr.conf; repmgr now makes it easy for package maintainers to provide a patch with the actual file location, meaning repmgr.conf does not need to be provided explicitly. This is currently the case for 2ndQuadrant-provided .deb and .rpm packages.

  • monitoring and status checks: new commands repmgr node check and repmgr node status provide information about a node's status and replication-related monitoring output.

  • node rejoin: the new command repmgr node rejoin enables a failed primary to be rejoined to a replication cluster, optionally using pg_rewind to synchronise its data (note that pg_rewind may not be usable in some circumstances).

  • automatic failover: improved detection of node status; promotion decisions are based on a consensus model, with the promoted primary explicitly informing other standbys to follow it. The repmgrd daemon will continue functioning even if the monitored PostgreSQL instance is down, and will resume monitoring if it reappears. Additionally, if the instance's role has changed (typically from primary to standby, e.g. following reintegration of a failed primary using repmgr node rejoin), repmgrd will automatically resume monitoring it as a standby.

  • new documentation: the existing documentation spread over multiple text files has been consolidated into DocBook format (as used by the main PostgreSQL project) and is now available online in HTML format.

    The DocBook files can easily be used to create versions of the documentation in other formats such as PDF.


A.9.2. New command line options

  • --dry-run: repmgr will attempt to perform the action as far as possible without making any changes to the database

  • --upstream-node-id: used to specify the upstream node the standby will later connect to and stream from, when cloning and registering a standby.

    This replaces the configuration file parameter upstream_node: the upstream node is set when the standby is initially cloned, but can change over the lifetime of an installation (due to failovers, switchovers etc.), so it is pointless and potentially confusing to keep the original value in repmgr.conf.


A.9.3. Changed command line options

repmgr

  • --replication-user has been deprecated; it has been replaced by the configuration file option replication_user. The value (which defaults to the user provided in the conninfo string) will be stored in the repmgr metadata for use by repmgr standby clone and repmgr standby follow.

  • --recovery-min-apply-delay is now a configuration file parameter recovery_min_apply_delay, to ensure the setting does not get lost when a standby follows a new upstream.

  • --no-conninfo-password is deprecated; a password included in the environment variable PGPASSWORD will no longer be added to primary_conninfo by default; to force the inclusion of a password (not recommended), use the new configuration file parameter use_primary_conninfo_password. For details, see section Managing passwords.

repmgrd

  • --monitoring-history is deprecated and is replaced by the configuration file option monitoring_history. This enables the setting to be changed without having to modify system service files.


A.9.4. Configuration file changes

Required settings

The following 4 parameters are mandatory in repmgr.conf:

  • node_id
  • node_name
  • conninfo
  • data_directory

Renamed settings

Some settings have been renamed for clarity and consistency:

  • node is now node_id
  • name is now node_name
  • barman_server is now barman_host
  • master_response_timeout is now async_query_timeout (to better indicate its purpose)

The following configuration file parameters have been renamed for consistency with other parameters (and conform to the pattern used by PostgreSQL itself, which uses the prefix log_ for logging parameters):

  • loglevel is now log_level
  • logfile is now log_file
  • logfacility is now log_facility

Removed settings

  • cluster has been removed
  • upstream_node - see note about --upstream-node-id above
  • retry_promote_interval_secs - this is now redundant due to changes in the failover/promotion mechanism; the new equivalent is primary_notification_timeout

Logging changes

  • default value for log_level is INFO rather than NOTICE.
  • new parameter log_status_interval, which causes repmgrd to emit a status log line at the specified interval


A.9.5. repmgrd

The shared library has been renamed from repmgr_funcs to repmgr, meaning shared_preload_libraries in postgresql.conf needs to be updated to the new name:

        shared_preload_libraries = 'repmgr'


Appendix B. Verifying digital signatures

B.1. repmgr source code signing key

The signing key ID used for repmgr source code bundles is: 0x297F1DCC.

To download the repmgr source key to your computer:

       curl -s https://repmgr.org/download/SOURCE-GPG-KEY-repmgr | gpg --import
       gpg --fingerprint 0x297F1DCC
     

then verify that the fingerprint is the expected value:

       085A BE38 6FD9 72CE 6365  340D 8365 683D 297F 1DCC

For checking tarballs, first download and import the repmgr source signing key as shown above. Then download both the source tarball and the detached signature file (e.g. repmgr-4.0beta1.tar.gz and repmgr-4.0beta1.tar.gz.asc) from https://repmgr.org/download/ and use gpg to verify the signature, e.g.:

       gpg --verify repmgr-4.0beta1.tar.gz.asc


Appendix C. FAQ (Frequently Asked Questions)

C.1. General

C.1.1. What's the difference between the repmgr versions?

repmgr 4 is a complete rewrite of the existing repmgr code base and implements repmgr as a PostgreSQL extension. It supports all PostgreSQL versions from 9.3 (although some repmgr features are not available for PostgreSQL 9.3 and 9.4).

repmgr 3.x builds on the improved replication facilities added in PostgreSQL 9.3, as well as improved automated failover support via repmgrd, and is not compatible with PostgreSQL 9.2 and earlier. We recommend upgrading to repmgr 4, as the repmgr 3.x series will no longer be actively maintained.

repmgr 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible with PostgreSQL 9.3, we recommend using repmgr 4.x. repmgr 2.x is no longer maintained.


C.1.2. What's the advantage of using replication slots?

Replication slots, introduced in PostgreSQL 9.4, ensure that the primary server will retain WAL files until they have been consumed by all standby servers. This makes WAL file management much easier, and if replication slots are used, repmgr will no longer insist on a fixed minimum number (default: 5000) of WAL files being retained.

However this does mean that if a standby is no longer connected to the primary, the presence of the replication slot will cause WAL files to be retained indefinitely.
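
If a standby has been removed without its slot being dropped, the orphaned slot can be identified and removed manually. The following is a minimal sketch (connection options are omitted for brevity, and the slot name standby2_slot is a hypothetical example):

        psql -c "SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots WHERE NOT active"
        psql -c "SELECT pg_drop_replication_slot('standby2_slot')"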


C.1.3. How many replication slots should I define in max_replication_slots?

Normally at least the same number as the number of standbys which will connect to the node. Note that changes to max_replication_slots require a server restart to take effect, and as there is no particular penalty for unused replication slots, setting a higher figure will make adding new nodes easier.
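
For example, a cluster expected to have up to four standbys attached to the primary might set the following in postgresql.conf (the value shown is purely illustrative) to leave headroom for adding nodes later; remember that a server restart is required for the change to take effect:

        max_replication_slots = 8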


C.1.4. Does repmgr support hash indexes?

Prior to PostgreSQL 10, hash indexes were not WAL-logged and are therefore not suitable for use in streaming replication. See the PostgreSQL documentation for details.

From PostgreSQL 10, this restriction has been lifted and hash indexes can be used in a streaming replication cluster.


C.1.5. Can repmgr assist with upgrading a PostgreSQL cluster?

For minor version upgrades, e.g. from 9.6.7 to 9.6.8, a common approach is to upgrade a standby to the latest version, perform a switchover promoting it to a primary, then upgrade the former primary.
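
As an illustration of the switchover step, once the standby's packages have been updated it can be promoted with a single command executed on that standby (a sketch only; the configuration file path is an assumption):

        repmgr -f /etc/repmgr.conf standby switchover --dry-run
        repmgr -f /etc/repmgr.conf standby switchover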

For major version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10), the traditional approach is to "reseed" a cluster by upgrading a single node with pg_upgrade and recloning standbys from this.

To minimize downtime during major upgrades, for more recent PostgreSQL versions (PostgreSQL 9.4 and later), pglogical can be used to set up a parallel cluster using the newer PostgreSQL version, which can be kept in sync with the existing production cluster until the new cluster is ready to be put into production.


C.1.6. What does this error mean: ERROR: could not access file "$libdir/repmgr"?

It means the repmgr extension code is not installed in the PostgreSQL application directory. This typically happens when using PostgreSQL packages provided by a third-party vendor, which often have different filesystem layouts.

Either use PostgreSQL packages provided by the community or 2ndQuadrant; if this is not possible, contact your vendor for assistance.


C.2. repmgr

C.2.1. Can I register an existing PostgreSQL server with repmgr?

Yes, any existing PostgreSQL server which is part of the same replication cluster can be registered with repmgr. There's no requirement for a standby to have been cloned using repmgr.


C.2.2. Can I use a standby not cloned by repmgr as a repmgr node?

For a standby which has been manually cloned or recovered from an external backup manager such as Barman, the command repmgr standby clone --recovery-conf-only can be used to create the correct recovery.conf file for use with repmgr (and will create a replication slot if required). Once this has been done, register the node as usual.
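
For example (a sketch; the host, user, database and configuration file path are placeholders to adapt to your environment):

        repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --recovery-conf-only
        repmgr -f /etc/repmgr.conf standby register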


C.2.4. How can a failed primary be re-added as a standby?

This is a two-stage process. First, the failed primary's data directory must be re-synced with the current primary; second, the failed primary must be re-registered as a standby.

It's possible to use pg_rewind to re-synchronise the existing data directory, which will usually be much faster than re-cloning the server. However pg_rewind can only be used if PostgreSQL either has wal_log_hints enabled, or data checksums were enabled when the cluster was initialized.

Note that pg_rewind is available as part of the core PostgreSQL distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4.

repmgr provides the command repmgr node rejoin which can optionally execute pg_rewind; see the repmgr node rejoin documentation for details, in particular the section Using pg_rewind.

If pg_rewind cannot be used, then the data directory will need to be re-cloned from scratch.
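
As a sketch of the rejoin step (the connection string points at the current primary and is a placeholder; --force-rewind causes pg_rewind to be executed if necessary):

        repmgr -f /etc/repmgr.conf -d 'host=node2 user=repmgr dbname=repmgr' node rejoin --force-rewind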


C.2.5. Is there an easy way to check my primary server is correctly configured for use with repmgr?

Execute repmgr standby clone with the --dry-run option; this will report any configuration problems which need to be rectified.
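
For example (connection details shown are placeholders), executed on the intended standby:

        repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run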


C.2.7. Do I need to include shared_preload_libraries = 'repmgr' in postgresql.conf if I'm not using repmgrd?

No, the repmgr shared library is only needed when running repmgrd. If you later decide to run repmgrd, you just need to add shared_preload_libraries = 'repmgr' and restart PostgreSQL.


C.2.8. I've provided replication permission for the repmgr user in pg_hba.conf but repmgr/repmgrd complains it can't connect to the server... Why?

repmgr and repmgrd need to be able to connect to the repmgr database with a normal connection to query metadata. The replication connection permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the repmgr user).
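
By way of illustration, pg_hba.conf usually needs entries both for the repmgr database connection and for replication, along these lines (the network range and authentication method are placeholders to adapt to your environment):

        host    repmgr         repmgr    192.168.1.0/24    md5
        host    replication    repmgr    192.168.1.0/24    md5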


C.2.9. When cloning a standby, why do I need to provide the connection parameters for the primary server on the command line, not in the configuration file?

Cloning a standby is a one-time action; the role of the server being cloned from could change, so fixing it in the configuration file would create confusion. If repmgr needs to establish a connection to the primary server, it can retrieve this from the repmgr.nodes table on the local node, and if necessary scan the replication cluster until it locates the active primary.


C.2.10. When cloning a standby, how do I ensure the WAL files are placed in a custom directory?

Provide the option --waldir (--xlogdir in PostgreSQL 9.6 and earlier) with the absolute path to the WAL directory in pg_basebackup_options. For more details see pg_basebackup options when cloning a standby.
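
For example, in repmgr.conf (the path shown is a placeholder):

        pg_basebackup_options='--waldir=/path/to/wal_directory'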


C.2.11. Why is there no foreign key on the node_id column in the repmgr.events table?

Under some circumstances event notifications can be generated for servers which have not yet been registered; it's also useful to retain a record of events which includes servers removed from the replication cluster which no longer have an entry in the repmgr.nodes table.


C.2.12. Why are some values in recovery.conf surrounded by pairs of single quotes?

This is to ensure that user-supplied values which are written as parameter values in recovery.conf are escaped correctly and do not cause errors when recovery.conf is parsed.

The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes, even if the string does not contain any characters which need escaping.


C.3. repmgrd

C.3.1. How can I prevent a node from ever being promoted to primary?

In repmgr.conf, set its priority to a value of 0; apply the changed setting with repmgr standby register --force.

Additionally, if failover is set to manual, the node will never be considered as a promotion candidate.
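
For example, a minimal sketch: in the node's repmgr.conf set

        priority=0

and then update the node's metadata with:

        repmgr -f /etc/repmgr.conf standby register --force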


C.3.2. Does repmgrd support delayed standbys?

repmgrd can monitor delayed standbys - those set up with recovery_min_apply_delay set to a non-zero value in recovery.conf - but as it's not currently possible to directly examine the value applied to the standby, repmgrd may not be able to properly evaluate the node as a promotion candidate.

We recommend that delayed standbys are explicitly excluded from promotion by setting priority to 0 in repmgr.conf.

Note that after registering a delayed standby, repmgrd will only start once the metadata added in the primary node has been replicated.


C.3.3. How can I get repmgrd to rotate its logfile?

Configure your system's logrotate service to do this; see Section 13.4.
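
As an illustration only (the log file path and rotation policy are assumptions to adapt to your setup), a logrotate configuration file such as the following, placed in /etc/logrotate.d/repmgrd, rotates the repmgrd log weekly; copytruncate is used so repmgrd can keep writing to the same file handle:

        /var/log/repmgr/repmgrd.log {
            weekly
            rotate 4
            compress
            missingok
            notifempty
            copytruncate
        }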


C.3.4. I've recloned a failed primary as a standby, but repmgrd refuses to start?

Check you registered the standby after recloning. If unregistered, the standby cannot be considered as a promotion candidate even if failover is set to automatic, which is probably not what you want. repmgrd will start if failover is set to manual so the node's replication status can still be monitored, if desired.


C.3.5. repmgrd ignores pg_bindir when executing promote_command or follow_command

promote_command and follow_command can be user-defined scripts, so repmgr will not apply pg_bindir, even if the command being executed is repmgr itself. Always provide the full path; see Section 13.1.1 for more details.
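
For example (the paths shown match the CentOS 7 package layout described in Appendix D and should be adjusted to your installation):

        promote_command='/usr/pgsql-10/bin/repmgr standby promote -f /etc/repmgr/10/repmgr.conf'
        follow_command='/usr/pgsql-10/bin/repmgr standby follow -f /etc/repmgr/10/repmgr.conf'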


C.3.6. repmgrd aborts startup with the error "upstream node must be running before repmgrd can start"

repmgrd does this to avoid starting up on a replication cluster which is not in a healthy state. If the upstream is unavailable, repmgrd may initiate a failover immediately after starting up, which could have unintended side-effects, particularly if repmgrd is not running on other nodes.

In particular, it's possible that the node's local copy of the repmgr.nodes table is out-of-date, which may lead to incorrect failover behaviour.

The onus is therefore on the administrator to manually return the cluster to a stable, healthy state before starting repmgrd.


Appendix D. repmgr package details

This section provides technical details about various repmgr binary packages, such as location of the installed binaries and configuration files.


D.1. CentOS Packages

Currently, repmgr RPM packages are provided for versions 6.x and 7.x of CentOS. These should also work on matching versions of Red Hat Enterprise Linux, Scientific Linux and Oracle Enterprise Linux; together with CentOS, these are the same RedHat-based distributions for which the main community project (PGDG) provides packages (see the PostgreSQL RPM Building Project page for details).

Note these repmgr RPM packages are not designed to work with SuSE/OpenSuSE.

Note: repmgr packages are designed to be compatible with community-provided PostgreSQL packages. They may not work with vendor-specific packages such as those provided by RedHat for RHEL customers, as the filesystem layout may be different to the community RPMs. Please contact your support vendor for assistance.


D.1.1. CentOS repositories

repmgr packages are available from the public 2ndQuadrant repository, and also the PostgreSQL community repository. The 2ndQuadrant repository is updated immediately after each repmgr release.

Table D-1. 2ndQuadrant public repository

Repository URL: https://dl.2ndquadrant.com/
Repository documentation: https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ

Table D-2. PostgreSQL community repository (PGDG)

Repository URL: https://yum.postgresql.org/repopackages.php
Repository documentation: https://yum.postgresql.org/

D.1.2. CentOS package details

The two tables below list relevant information, paths, commands etc. for the repmgr packages on CentOS 7 (with systemd) and CentOS 6 (no systemd). Substitute the appropriate PostgreSQL major version number for your installation.

Note: For PostgreSQL 9.6 and lower, the CentOS packages use a mixture of 9.6 and 96 in various places to designate the major version; e.g. the package name is repmgr96, but the data directory is /var/lib/pgsql/9.6/data.

From PostgreSQL 10, the first part of the version number (e.g. 10) is the major version, so there is more consistency in file/path/package naming (package repmgr10, data directory /var/lib/pgsql/10/data).

Table D-3. CentOS 7 packages

Package name example: repmgr10-4.0.4-1.rhel7.x86_64
Metapackage: (none)
Installation command: yum install repmgr10
Binary location: /usr/pgsql-10/bin
repmgr in default path: NO
Configuration file location: /etc/repmgr/10/repmgr.conf
Data directory: /var/lib/pgsql/10/data
repmgrd service command: systemctl [start|stop|restart|reload] repmgr10
repmgrd service file location: /usr/lib/systemd/system/repmgr10.service
repmgrd log file location: (not specified by package; set in repmgr.conf)

Table D-4. CentOS 6 packages

Package name example: repmgr96-4.0.4-1.rhel6.x86_64
Metapackage: (none)
Installation command: yum install repmgr96
Binary location: /usr/pgsql-9.6/bin
repmgr in default path: NO
Configuration file location: /etc/repmgr/9.6/repmgr.conf
Data directory: /var/lib/pgsql/9.6/data
repmgrd service command: service repmgr-9.6 [start|stop|restart|reload]
repmgrd service file location: /etc/init.d/repmgr-9.6
repmgrd log file location: /var/log/repmgr/repmgrd-9.6.log

D.2. Debian/Ubuntu Packages

repmgr .deb packages are provided via the PostgreSQL Community APT repository, and are available for each community-supported PostgreSQL version, currently supported Debian releases, and currently supported Ubuntu LTS releases.


D.2.1. APT repository

repmgr packages are available from the PostgreSQL Community APT repository, which is updated immediately after each repmgr release.

Table D-5. 2ndQuadrant public repository

Repository URL: https://dl.2ndquadrant.com/
Repository documentation: https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN

Table D-6. PostgreSQL Community APT repository (PGDG)

Repository URL: http://apt.postgresql.org/
Repository documentation: https://wiki.postgresql.org/wiki/Apt

D.2.2. Debian/Ubuntu package details

The table below lists relevant information, paths, commands etc. for the repmgr packages on Debian 9.x ("Stretch"). Substitute the appropriate PostgreSQL major version number for your installation.

See also Section 13.2.2 for some specifics related to configuring the repmgrd daemon.

Table D-7. Debian 9.x packages

Package name example: postgresql-10-repmgr
Metapackage: repmgr-common
Installation command: apt-get install postgresql-10-repmgr
Binary location: /usr/lib/postgresql/10/bin
repmgr in default path: Yes (via wrapper script /usr/bin/repmgr)
Configuration file location: (not set by package)
Data directory: /var/lib/postgresql/10/main
PostgreSQL service command: systemctl [start|stop|restart|reload] postgresql@10-main
repmgrd service command: systemctl [start|stop|restart|reload] repmgrd
repmgrd service file location: /etc/init.d/repmgrd (defaults in: /etc/default/repmgrd)
repmgrd log file location: (not specified by package; set in repmgr.conf)

Note: Instead of using the systemd service command directly, it's recommended to execute pg_ctlcluster (as root, either directly or via sudo), e.g.:

            pg_ctlcluster 10 main [start|stop|restart|reload]

For pre-systemd systems, pg_ctlcluster can be executed directly by the postgres user.


D.3. Snapshot packages

For testing new features and bug fixes, from time to time 2ndQuadrant provides so-called "snapshot packages" via its public repository. These packages are built from the repmgr source at a particular point in time, and are not formal releases.

Note: We do not recommend installing these packages in a production environment unless specifically advised.

To install a snapshot package, it's necessary to install the 2ndQuadrant public snapshot repository, following the instructions here: https://dl.2ndquadrant.com/default/release/site/ but replacing release with snapshot in the appropriate URL.

For example, to install the snapshot RPM repository for PostgreSQL 9.6, execute (as root):

curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | bash

or as a normal user with root sudo access:

curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | sudo bash

Alternatively you can browse the repository here: https://dl.2ndquadrant.com/default/snapshot/browse/.

Once the repository is installed, installing or updating repmgr will result in the latest snapshot package being installed.

The package name will be formatted like this:

repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm

containing the snapshot build number (here: 320) and the hash of the git commit it was built from (here: g5113ab0).

Note that the next formal release (in the above example 4.1.1), once available, will install in place of any snapshot builds.


D.4. Installing old package versions


D.4.1. Debian/Ubuntu

An archive of old packages (3.3.2 and later) for Debian/Ubuntu-based systems is available here: http://atalia.postgresql.org/morgue/r/repmgr/


D.4.2. RHEL/CentOS

Old RPM packages (3.2 and later) can be retrieved from the (deprecated) 2ndQuadrant repository at http://packages.2ndquadrant.com/ by installing the appropriate repository RPM.

Old versions can be located with e.g.:

          yum --showduplicates list repmgr96

(substitute the appropriate package name; see CentOS packages) and installed with:

          yum install {package_name}-{version}

where {package_name} is the base package name (e.g. repmgr96) and {version} is the version listed by the yum --showduplicates list ... command, e.g. 4.0.6-1.rhel6.

For example:

          yum install repmgr96-4.0.6-1.rhel6


D.5. Information for packagers

We recommend patching the following parameters when building the package, so that sensible built-in default values are provided for user convenience. These values can nevertheless be overridden by the user, if desired.

  • Configuration file location: the default configuration file location can be hard-coded by patching package_conf_file in configfile.c:

    		/* packagers: if feasible, patch configuration file path into "package_conf_file" */
    		char		package_conf_file[MAXPGPATH] = "";

    See also: configuration file location

  • PID file location: the default repmgrd PID file location can be hard-coded by patching package_pid_file in repmgrd.c:

    		/* packagers: if feasible, patch PID file path into "package_pid_file" */
    		char		package_pid_file[MAXPGPATH] = "";

    See also: repmgrd's PID file


Index


C

cloning
advanced options, Advanced cloning options
cascading replication, Cloning and cascading replication
from Barman, Cloning a standby from Barman
replication slots, Cloning and replication slots
using passwords, Managing passwords
concepts, Concepts
configuration
database user permissions, repmgr database user permissions
repmgr.conf location, Configuration file location
conninfo configuration file parameter, Required configuration file settings

D

data_directory configuration file parameter, Required configuration file settings
Debian/Ubuntu
repmgrd daemon configuration, repmgrd daemon configuration on Debian/Ubuntu

E

event notifications, Event Notifications

F

FAQ (Frequently Asked Questions), FAQ (Frequently Asked Questions)
Following a new primary, Following a new primary
see also repmgr standby follow

I

installation, Installation
from source, Installing repmgr from source
on Debian/Ubuntu etc., Debian/Ubuntu
on Red Hat/CentOS/Fedora etc., RedHat/CentOS/Fedora
requirements, Requirements for installing repmgr

L

log rotation
repmgrd, repmgrd log rotation
log settings
configuration in repmgr.conf, Log settings
log_facility configuration file parameter, Log settings
log_file configuration file parameter, Log settings
log_level configuration file parameter, Log settings
log_status_interval configuration file parameter, Log settings

M

monitoring
with repmgrd, Monitoring with repmgrd

N

node_id configuration file parameter, Required configuration file settings
node_name configuration file parameter, Required configuration file settings

P

packages, repmgr package details
CentOS packages, CentOS Packages
Debian/Ubuntu packages, Debian/Ubuntu Packages
information for packagers, Information for packagers
old versions, Installing old package versions
snapshots, Snapshot packages
pg_ctlcluster
service command settings, Service command settings
pg_rewind
using with "repmgr node rejoin", Using pg_rewind
using with "repmgr standby switchover", Switchover and pg_rewind
pg_upgrade, pg_upgrade and repmgr
PID file
repmgrd, repmgrd's PID file
promoting a standby, Promoting a standby server with repmgr
see also repmgr standby promote

R

recovery.conf
customising with "repmgr standby clone", Customising recovery.conf
generating for a standby cloned by another method, Using a standby cloned by another method
Release notes, Release notes
replication slots
cloning, Cloning and replication slots
repmgr cluster cleanup, repmgr cluster cleanup
repmgr cluster crosscheck, repmgr cluster crosscheck
repmgr cluster event, repmgr cluster event
repmgr cluster matrix, repmgr cluster matrix
repmgr cluster show, repmgr cluster show
repmgr node check, repmgr node check
repmgr node rejoin, repmgr node rejoin
repmgr node status, repmgr node status
repmgr primary register, repmgr primary register
repmgr primary unregister, repmgr primary unregister
repmgr standby clone, repmgr standby clone
see also cloning
repmgr standby follow, repmgr standby follow
repmgr standby promote, repmgr standby promote
repmgr standby register, repmgr standby register
repmgr standby switchover, repmgr standby switchover
repmgr standby unregister, repmgr standby unregister
repmgr witness register, repmgr witness register
see also witness server
repmgr witness unregister, repmgr witness unregister
repmgr.conf
location, Configuration file location
log settings, Log settings
required settings, Required configuration file settings
service command settings, Service command settings
repmgrd
automatic failover, Automatic failover with repmgrd
BDR, BDR failover with repmgrd
cascading replication, repmgrd and cascading replication
configuration, repmgrd configuration
Debian/Ubuntu and daemon configuration, repmgrd daemon configuration on Debian/Ubuntu
degraded monitoring, "degraded monitoring" mode
log rotation, repmgrd log rotation
monitoring, Monitoring with repmgrd
monitoring configuration, Monitoring configuration
network splits, Handling network splits with repmgrd
PID file, repmgrd's PID file
PostgreSQL service configuration, PostgreSQL service configuration
starting and stopping, repmgrd daemon
witness server, Using a witness server with repmgrd

S

service command settings
configuration in repmgr.conf, Service command settings
snapshot packages, Snapshot packages
switchover, Performing a switchover with repmgr
caveats, Caveats
execution, Executing the switchover command
preparation, Preparing for switchover

U

upgrading, Upgrading repmgr
from repmgr 3.x, Upgrading from repmgr 3.x
pg_upgrade, pg_upgrade and repmgr
repmgr 4.x and later, Upgrading repmgr 4.x and later

W

witness server, Using a witness server
see also Using a witness server with repmgrd