Copyright © 2010-2024 EDB
Legal Notice
repmgr is Copyright © 2010-2024 by EDB All rights reserved.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/ to obtain one.
Abstract
This is the official documentation of repmgr 5.5.0 for use with PostgreSQL 12 - PostgreSQL 17.
repmgr is being continually developed and we strongly recommend using the latest version. Please check the repmgr website for details about the current repmgr version as well as the current repmgr documentation.
repmgr is developed by EDB along with contributions from other individuals and organisations. Contributions from the community are appreciated and welcome - get in touch via github or the mailing list/forum. Multiple EDB customers contribute funding to make repmgr development possible.
repmgr is fully supported by EDB's 24/7 Production Support. EDB, a Major Sponsor of the PostgreSQL project, continues to maintain repmgr. We welcome participation from other organisations and individual developers.
This chapter provides a high-level overview of repmgr's components and functionality.
This guide assumes that you are familiar with PostgreSQL administration and streaming replication concepts. For further details on streaming replication, see the PostgreSQL documentation section on streaming replication.
The following terms are used throughout the repmgr documentation.
repmgr provides functionality to set up a so-called "witness server" to assist in determining a new primary server in a failover situation with more than one standby. The witness server itself is not part of the replication cluster, although it does contain a copy of the repmgr metadata schema.
The purpose of a witness server is to provide a "casting vote" where servers in the replication cluster are split over more than one location. In the event of a loss of connectivity between locations, the presence or absence of the witness server will decide whether a server at that location is promoted to primary; this is to prevent a "split-brain" situation where an isolated location interprets a network outage as a failure of the (remote) primary and promotes a (local) standby.
A witness server only needs to be created if repmgrd is in use.
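For example, a witness server running on a dedicated third host could be registered against the current primary with something like the following (hostnames are illustrative; see the repmgr witness register command reference for details):

repmgr -f /etc/repmgr.conf witness register -h node1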
repmgr is a suite of open-source tools to manage replication and failover within a cluster of PostgreSQL servers. It supports and enhances PostgreSQL's built-in streaming replication, which provides a single read/write primary server and one or more read-only standbys containing near-real time copies of the primary server's database. It provides two main tools:

repmgr: a command-line tool used to perform administrative tasks such as setting up standby servers, promoting a standby server to primary, switching over primary and standby servers, and displaying the status of servers in the replication cluster.

repmgrd: a daemon which actively monitors servers in a replication cluster and performs tasks such as monitoring and recording replication performance, performing failover by detecting failure of the primary and promoting the most suitable standby server, and providing notifications about events in the cluster to a user-defined script.
In order to effectively manage a replication cluster, repmgr needs to store information about the servers in the cluster in a dedicated database schema. This schema is automatically created by the repmgr extension, which is installed during the first step in initializing a repmgr-administered cluster (repmgr primary register) and contains the following objects:

repmgr.events: records events of interest

repmgr.nodes: connection and status information for each server in the replication cluster

repmgr.monitoring_history: historical standby monitoring information written by repmgrd

repmgr.show_nodes: view based on the table repmgr.nodes, additionally showing the name of the server's upstream node
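As an illustration of how this metadata can be inspected, recent entries in repmgr.events can be queried with psql (the database and user names shown are the defaults used in this documentation):

psql -U repmgr -d repmgr -c 'SELECT node_id, event, successful, event_timestamp FROM repmgr.events ORDER BY event_timestamp DESC LIMIT 5'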
The repmgr metadata schema can be stored in an existing database or in its own dedicated database. Note that the repmgr metadata schema cannot reside on a database server which is not part of the replication cluster managed by repmgr.
A database user must be available for repmgr to access this database and perform necessary changes. This user does not need to be a superuser; however, some operations such as initial installation of the repmgr extension will require a superuser connection (this can be specified where required with the command line option --superuser).
repmgr can be installed from binary packages provided by your operating system's packaging system, or from source.
In general we recommend using binary packages, unless unavailable for your operating system.
Source installs are mainly useful if you want to keep track of the very latest repmgr development and contribute to development. They're also the only option if there are no packages for your operating system yet.
Before installing repmgr make sure you satisfy the installation requirements.
repmgr is developed and tested on Linux and OS X, but should work on any UNIX-like system supported by PostgreSQL itself. There is no support for Microsoft Windows.
repmgr 5.5.0 is compatible with all PostgreSQL versions from 9.4. See section repmgr compatibility matrix for an overview of version compatibility.
If upgrading from repmgr 3.x, please see the section Upgrading from repmgr 3.x.
All servers in the replication cluster must be running the same major version of PostgreSQL, and we recommend that they also run the same minor version.
repmgr must be installed on each server in the replication cluster. If installing repmgr from packages, the package version must match the PostgreSQL version. If installing from source, repmgr must be compiled against the same major version.
The same "major" repmgr version (e.g. 5.5.0.x
) must
be installed on all node in the replication cluster. We strongly recommend keeping all
nodes on the same (preferably latest) "minor" repmgr version to minimize the risk
of incompatibilities.
If different "major" repmgr versions (e.g. 4.1.x and 5.5.0.x) are installed on different nodes, in the best case repmgr (in particular repmgrd) will not run. In the worst case, you will end up with a broken cluster.
A dedicated system user for repmgr is not required; as many repmgr and repmgrd actions require direct access to the PostgreSQL data directory, these commands should be executed by the postgres user.
See also Prerequisites for configuration for information on networking requirements.
We recommend using a session multiplexer utility such as screen or tmux when performing long-running actions (such as cloning a database) on a remote server - this will ensure the repmgr action won't be prematurely terminated if your ssh session to the server is interrupted or closed.
The following table provides an overview of which repmgr version supports which PostgreSQL version.
Table 2.1. repmgr compatibility matrix
repmgr version | Supported? | Latest release | Supported PostgreSQL versions | Notes
---|---|---|---|---
repmgr 5.4 | YES (dev) | 5.5.0 (2024-XX-XX) | 9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15 |
repmgr 5.3 | YES | 5.5.0 (2024-XX-XX) | 9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15 | PostgreSQL 15 supported from repmgr 5.3.3
repmgr 5.2 | NO | 5.2.1 (2020-12-07) | 9.4, 9.5, 9.6, 10, 11, 12, 13 |
repmgr 5.1 | NO | 5.1.0 (2020-04-13) | 9.3, 9.4, 9.5, 9.6, 10, 11, 12 |
repmgr 5.0 | NO | 5.0 (2019-10-15) | 9.3, 9.4, 9.5, 9.6, 10, 11, 12 |
repmgr 4.x | NO | 4.4 (2019-06-27) | 9.3, 9.4, 9.5, 9.6, 10, 11 |
repmgr 3.x | NO | 3.3.2 (2017-05-30) | 9.3, 9.4, 9.5, 9.6 |
repmgr 2.x | NO | 2.0.3 (2015-04-16) | 9.0, 9.1, 9.2, 9.3, 9.4 |
The repmgr 2.x and 3.x series are no longer maintained or supported. We strongly recommend upgrading to the latest repmgr version.
Following the release of repmgr 5.0, there will be no further releases of the repmgr 4.x series. Note that repmgr 5.x is an incremental development of the 4.x series and repmgr 4.x users should upgrade to this as soon as possible.
Note that some repmgr functionality is not available in PostgreSQL 9.4:

In PostgreSQL 9.4, pg_rewind is not part of the core distribution. pg_rewind will need to be compiled separately to be able to use any repmgr functionality which takes advantage of it.
PostgreSQL 9.3 has reached the end of its community support period (final release was 9.3.25 in November 2018) and will no longer be updated with security or bugfixes.
Beginning with repmgr 5.2, repmgr no longer supports PostgreSQL 9.3.
PostgreSQL 9.4 has reached the end of its community support period (final release was 9.4.26 in February 2020) and will no longer be updated with security or bugfixes.
We recommend that users of these versions migrate to a supported PostgreSQL version as soon as possible.
For further details, see the PostgreSQL Versioning Policy.
We recommend installing repmgr using the available packages for your system.
repmgr RPM packages for RedHat/CentOS variants and Fedora are available from the EDB public repository; see following section for details.
Currently the EDB public repository provides support for RedHat/CentOS versions 6, 7 and 8.

RPM packages for repmgr are also available via Yum through the PostgreSQL Global Development Group (PGDG) RPM repository (https://yum.postgresql.org/). Follow the instructions for your distribution (RedHat, CentOS, Fedora, etc.) and architecture as detailed there. Note that it can take some days for new repmgr packages to become available via this repository.
repmgr RPM packages are designed to be compatible with the community-provided PostgreSQL packages and EDB's PostgreSQL Extended Server (formerly 2ndQPostgres). They may not work with vendor-specific packages such as those provided by RedHat for RHEL customers, as the PostgreSQL filesystem layout may be different to the community RPMs. Please contact your support vendor for assistance.
See also FAQ entry Compatibility with third party vendor packages.
For more information on the package contents, including details of installation paths and relevant service commands, see the appendix section CentOS packages.
EDB provides a dedicated yum public repository for EDB software, including repmgr. We recommend using this for all future repmgr releases.
General instructions for using this repository can be found on its homepage. Specific instructions for installing repmgr follow below.
Installation
Locate the repository RPM for your PostgreSQL version from the list at: https://dl.enterprisedb.com/
Install the repository definition for your distribution and PostgreSQL version (this enables the EDB repository as a source of repmgr packages).
For example, for PostgreSQL 14 on Rocky Linux 8, execute:
curl https://dl.enterprisedb.com/default/release/get/14/rpm | sudo bash
Verify that the repository is installed with:
sudo dnf repolist
The output should contain two entries like this:
2ndquadrant-dl-default-release-pg14        2ndQuadrant packages (PG14) for 8 - x86_64
2ndquadrant-dl-default-release-pg14-debug  2ndQuadrant packages (PG14) for 8 - x86_64 - Debug
Install the repmgr version appropriate for your PostgreSQL version (e.g. repmgr14):
sudo dnf install repmgr14
To determine the names of available packages, execute:
dnf search repmgr
In CentOS 7 and earlier, use yum instead of dnf.
Compatibility with PGDG Repositories
The EDB repmgr yum repository packages use the same definitions and file system layout as the main PGDG repository.
Normally yum will prioritize the repository with the most recent repmgr version. Once the PGDG repository has been updated, it doesn't matter which repository the packages are installed from.
To ensure the EDB repository is always prioritised, set the priority option in the repository configuration file (e.g. /etc/yum.repos.d/2ndquadrant-dl-default-release-pg14.repo) accordingly.

With CentOS 7 and earlier, the package yum-plugin-priorities must be installed to be able to set the repository priority.
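As a sketch, assuming the repository file named above, this means adding a priority line to the repository definition (with yum-plugin-priorities, lower values take precedence):

[2ndquadrant-dl-default-release-pg14]
...
priority=1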
Installing a specific package version
To install a specific package version, execute dnf --showduplicates list for the package in question:
[root@localhost ~]# dnf --showduplicates list repmgr10
Last metadata expiration check: 0:09:15 ago on Fri 11 Mar 2022 01:09:19 AM UTC.
Installed Packages
repmgr10.x86_64        5.3.1-1.el8        @2ndquadrant-dl-default-release-pg10
Available Packages
repmgr10.x86_64        5.0.0-1.rhel8      pgdg10
repmgr10.x86_64        5.1.0-1.el8        2ndquadrant-dl-default-release-pg10
repmgr10.x86_64        5.1.0-1.rhel8      pgdg10
repmgr10.x86_64        5.1.0-2.el8        2ndquadrant-dl-default-release-pg10
repmgr10.x86_64        5.2.0-1.el8        2ndquadrant-dl-default-release-pg10
repmgr10.x86_64        5.2.0-1.rhel8      pgdg10
repmgr10.x86_64        5.2.1-1.el8        2ndquadrant-dl-default-release-pg10
repmgr10.x86_64        5.3.0-1.el8        2ndquadrant-dl-default-release-pg10
repmgr10.x86_64        5.3.1-1.el8        2ndquadrant-dl-default-release-pg10
then append the appropriate version number to the package name with a hyphen, e.g.:
[root@localhost ~]# dnf install repmgr10-5.3.0-1.el8
Installing old packages
See appendix Installing old package versions for details on how to retrieve older package versions.
.deb packages for repmgr are available from the PostgreSQL Community APT repository (https://apt.postgresql.org/). Instructions can be found in the APT section of the PostgreSQL Wiki (https://wiki.postgresql.org/wiki/Apt).
For more information on the package contents, including details of installation paths and relevant service commands, see the appendix section Debian/Ubuntu packages.
EDB provides a public apt repository for EDB software, including repmgr.
General instructions for using this repository can be found on its homepage. Specific instructions for installing repmgr follow below.
Installation
Install the repository definition for your distribution and PostgreSQL version (this enables the EDB repository as a source of repmgr packages) by executing:
curl https://dl.enterprisedb.com/default/release/get/deb | sudo bash
This will automatically install the following additional packages, if not already present:
lsb-release
apt-transport-https
Install the repmgr package appropriate for your PostgreSQL version (e.g. postgresql-11-repmgr for PostgreSQL 11):

sudo apt-get install postgresql-11-repmgr

For packages for PostgreSQL 9.6 and earlier, the package name includes a period between major and minor version numbers, e.g. postgresql-9.6-repmgr.
Installing old packages
See appendix Installing old package versions for details on how to retrieve older package versions.
To install repmgr the prerequisites for compiling PostgreSQL must be installed. These are described in PostgreSQL's documentation on build requirements and build requirements for documentation.
Most mainstream Linux distributions and other UNIX variants provide simple ways to install the prerequisites from packages.
Debian and Ubuntu: First add the apt.postgresql.org repository to your sources.list if you have not already done so, and ensure the source repository is enabled. If not configured, the source repository can be added by including a deb-src line as a copy of the existing deb line in the repository file, which is usually /etc/apt/sources.list.d/pgdg.list, e.g.:
deb https://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main
deb-src https://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main
Then install the prerequisites for building PostgreSQL with e.g.:
sudo apt-get update
sudo apt-get build-dep postgresql-9.6
Select the appropriate PostgreSQL version for your target repmgr version.
If using apt-get build-dep is not possible, the following packages may need to be installed manually:
flex
libedit-dev
libkrb5-dev
libpam0g-dev
libreadline-dev
libselinux1-dev
libssl-dev
libxml2-dev
libxslt1-dev
RHEL or CentOS 6.x or 7.x: install the appropriate repository RPM for your system from yum.postgresql.org. Then install the prerequisites for building PostgreSQL with:
sudo yum check-update
sudo yum groupinstall "Development Tools"
sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
sudo yum-builddep postgresql96
Select the appropriate PostgreSQL version for your target repmgr version.
If using yum-builddep is not possible, the following packages may need to be installed manually:
flex
libselinux-devel
libxml2-devel
libxslt-devel
openssl-devel
pam-devel
readline-devel
If building against PostgreSQL 11 or later configured with the --with-llvm option (this is the case with the PGDG-provided packages) you'll also need to install the llvm-toolset-7-clang package. This is available via the Software Collections (SCL) Repository.
There are two ways to get the repmgr source code: with git, or by downloading tarballs of released versions.
Use git if you expect to update often, you want to keep track of development or if you want to contribute changes to repmgr. There is no reason not to use git if you're familiar with it.
The source for repmgr is maintained at https://github.com/EnterpriseDB/repmgr.
There are also tags for each repmgr release, e.g. v4.4.0.
Clone the source code using git:
git clone https://github.com/EnterpriseDB/repmgr
For more information on using git see git-scm.com.
Official release source code is uploaded as tarballs to the repmgr website along with a tarball checksum and a matching GnuPG signature. See http://repmgr.org/ for the download information. See Verifying digital signatures for information on verifying digital signatures.
You will need to download the repmgr source, e.g. repmgr-4.0.tar.gz. You may optionally verify the package checksums from the .md5 files and/or verify the GnuPG signatures per Verifying digital signatures.
After you unpack the source code archives using tar xf, the installation process is the same as if you were installing from a git clone.
To install repmgr from source, simply execute:
./configure && make install
Ensure pg_config for the target PostgreSQL version is in $PATH.
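For example, if pg_config is not already on your PATH, you could prepend the target PostgreSQL installation's bin directory before building (the path shown is illustrative and will vary by distribution and version):

export PATH=/usr/pgsql-14/bin:$PATH
./configure && make install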
The repmgr documentation is (like the main PostgreSQL project) written in DocBook XML format. To build it locally as HTML, you'll need to install the required packages as described in the PostgreSQL documentation.
The minimum PostgreSQL version for building the repmgr documentation is PostgreSQL 9.5.
In repmgr 4.3 and earlier, the documentation can only be built against PostgreSQL 9.6 or earlier.
To build the documentation as HTML, execute:
./configure && make doc
The generated HTML files will be placed in the doc/html subdirectory of your source tree.
To build the documentation as a single HTML file, after configuring and building the main repmgr source as described above, execute:
./configure && make doc-repmgr.html
To build the documentation as a PDF file, after configuring and building the main repmgr source as described above, execute:
./configure && make doc-repmgr-A4.pdf
This section gives a quick introduction to repmgr, including setting up a sample repmgr installation and a basic replication cluster.
These instructions are for demonstration purposes and are not suitable for a production install, as issues such as account security considerations and system administration best practices are omitted.
To upgrade an existing repmgr 3.x installation, see section Upgrading from repmgr 3.x.
The following section will describe how to set up a basic replication cluster with a primary and a standby server using the repmgr command line tool.
We'll assume the primary is called node1 with IP address 192.168.1.11, and the standby is called node2 with IP address 192.168.1.12.

The following software must be installed on both servers:

PostgreSQL

repmgr (matching the installed PostgreSQL major version)
At network level, connections between the PostgreSQL port (default: 5432) must be possible in both directions.
If you want repmgr to copy configuration files which are located outside the PostgreSQL data directory, and/or to test switchover functionality, you will also need passwordless SSH connections between both servers, and rsync should be installed.

For testing repmgr, it's possible to use multiple PostgreSQL instances running on different ports on the same computer, with passwordless SSH access to localhost enabled.
On the primary server, a PostgreSQL instance must be initialised and running. The following replication settings may need to be adjusted:
# Enable replication connections; set this value to at least one more
# than the number of standbys which will connect to this server
# (note that repmgr will execute "pg_basebackup" in WAL streaming mode,
# which requires two free WAL senders).
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS

max_wal_senders = 10

# If using replication slots, set this value to at least one more
# than the number of standbys which will connect to this server.
# Note that repmgr will only make use of replication slots if
# "use_replication_slots" is set to "true" in "repmgr.conf".
# (If you are not intending to use replication slots, this value
# can be set to "0").
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS

max_replication_slots = 10

# Ensure WAL files contain enough information to enable read-only queries
# on the standby.
#
# PostgreSQL 9.5 and earlier: one of 'hot_standby' or 'logical'
# PostgreSQL 9.6 and later: one of 'replica' or 'logical'
#   ('hot_standby' will still be accepted as an alias for 'replica')
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL

wal_level = 'hot_standby'

# Enable read-only queries on a standby
# (Note: this will be ignored on a primary but we recommend including
# it anyway, in case the primary later becomes a standby)
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY

hot_standby = on

# Enable WAL file archiving
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE

archive_mode = on

# Set archive command to a dummy command; this can later be changed without
# needing to restart the PostgreSQL instance.
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND

archive_command = '/bin/true'
Rather than editing these settings in the default postgresql.conf file, create a separate file such as postgresql.replication.conf and include it from the end of the main configuration file with: include 'postgresql.replication.conf'.
Additionally, if you are intending to use pg_rewind, and the cluster was not initialised using data checksums, you may want to consider enabling wal_log_hints; for more details see Using pg_rewind.
See also the PostgreSQL configuration section in the repmgr configuration guide.
Create a dedicated PostgreSQL superuser account and a database for the repmgr metadata, e.g.
createuser -s repmgr
createdb repmgr -O repmgr
For the examples in this document, the name repmgr will be used for both user and database, but any names can be used.
For the sake of simplicity, the repmgr user is created as a superuser. If desired, it's possible to create the repmgr user as a normal user. However for certain operations superuser permissions are required; in this case the command line option --superuser can be provided to specify a superuser.

It's also assumed that the repmgr user will be used to make the replication connection from the standby to the primary; again this can be overridden by specifying a separate replication user when registering each node.
repmgr will install the repmgr extension, which creates a repmgr schema containing the repmgr metadata tables as well as other functions and views. We also recommend that you set the repmgr user's search path to include this schema name, e.g.:
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
Ensure the repmgr user has appropriate permissions in pg_hba.conf and can connect in replication mode; pg_hba.conf should contain entries similar to the following:
local   replication   repmgr                        trust
host    replication   repmgr   127.0.0.1/32         trust
host    replication   repmgr   192.168.1.0/24       trust

local   repmgr        repmgr                        trust
host    repmgr        repmgr   127.0.0.1/32         trust
host    repmgr        repmgr   192.168.1.0/24       trust
Note that these are simple settings for testing purposes. Adjust according to your network environment and authentication requirements.
On the standby, do not create a PostgreSQL instance (i.e. do not execute initdb or any database creation scripts provided by packages), but do ensure the destination data directory (and any other directories which you want PostgreSQL to use) exist and are owned by the postgres system user. Permissions must be set to 0700 (drwx------).
repmgr will place a copy of the primary's database files in this directory. It will however refuse to run if a PostgreSQL instance has already been created there.
Check the primary database is reachable from the standby using psql:
psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'
repmgr stores connection information as libpq connection strings throughout. This documentation refers to them as conninfo strings; an alternative name is DSN (data source name). We'll use these in place of the -h hostname -d databasename -U username syntax.
Create a repmgr.conf file on the primary server. The file must contain at least the following parameters:
node_id=1
node_name='node1'
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
repmgr.conf should not be stored inside the PostgreSQL data directory, as it could be overwritten when setting up or reinitialising the PostgreSQL server. See sections Configuration and configuration file for further details about repmgr.conf.
repmgr only uses pg_bindir when it executes PostgreSQL binaries directly. For user-defined scripts such as promote_command and the various service_*_commands, you must always explicitly provide the full path to the binary or script being executed, even if it is repmgr itself. This is because these options can contain user-defined scripts in arbitrary locations, so prepending pg_bindir may break them.
For Debian-based distributions we recommend explicitly setting pg_bindir to the directory where pg_ctl and other binaries not in the standard path are located. For PostgreSQL 9.6 this would be /usr/lib/postgresql/9.6/bin/.

If your distribution places the repmgr binaries in a location other than the PostgreSQL installation directory, specify this with repmgr_bindir to enable repmgr to perform operations (e.g. repmgr cluster crosscheck) on other nodes.
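A sketch of the relevant repmgr.conf entries on a Debian-style system (paths are illustrative and depend on the PostgreSQL version and packaging):

pg_bindir='/usr/lib/postgresql/9.6/bin/'
repmgr_bindir='/usr/lib/postgresql/9.6/bin/'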
See the file repmgr.conf.sample for details of all available configuration parameters.
To enable repmgr to support a replication cluster, the primary node must be registered with repmgr. This installs the repmgr extension and metadata objects, and adds a metadata record for the primary server:
$ repmgr -f /etc/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (id: 1) registered
Verify the status of the cluster like this:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Connection string
----+-------+---------+-----------+----------+---------------------------------------------------------
 1  | node1 | primary | * running |          | host=node1 dbname=repmgr user=repmgr connect_timeout=2
The record in the repmgr metadata table will look like this:
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+-------------------------------------------------------
node_id          | 1
upstream_node_id |
active           | t
node_name        | node1
type             | primary
location         | default
priority         | 100
conninfo         | host=node1 dbname=repmgr user=repmgr connect_timeout=2
repluser         | repmgr
slot_name        |
config_file      | /etc/repmgr.conf
Each server in the replication cluster will have its own record. If repmgrd is in use, the fields upstream_node_id, active and type will be updated when the node's status or role changes.
Create a repmgr.conf file on the standby server. It must contain at least the same parameters as the primary's repmgr.conf, but with the mandatory values node_id, node_name, conninfo (and possibly data_directory) adjusted accordingly, e.g.:
node_id=2
node_name='node2'
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
Use the --dry-run option to check the standby can be cloned:
$ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run
NOTICE: using provided configuration file "/etc/repmgr.conf"
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to source node
NOTICE: checking for available walsenders on source node (2 required)
INFO: sufficient walsenders available on source node (2 required)
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
If no problems are reported, the standby can then be cloned with:
$ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
NOTICE: using configuration file "/etc/repmgr.conf"
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to source node
NOTICE: checking for available walsenders on source node (2 required)
INFO: sufficient walsenders available on source node (2 required)
INFO: creating directory "/var/lib/postgresql/data"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h node1 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/data start
This has cloned the PostgreSQL data directory files from the primary node1 using PostgreSQL's pg_basebackup utility. Replication configuration containing the correct parameters to start streaming from this primary server will be automatically appended to postgresql.auto.conf. (In PostgreSQL 11 and earlier the file recovery.conf will be created).
By default, any configuration files in the primary's data directory will be copied to the standby. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.
Make any adjustments to the standby's PostgreSQL configuration files now, then start the server.
For more details on repmgr standby clone, see the command reference. A more detailed overview of cloning options is available in the administration manual.
Connect to the primary server and execute:
repmgr=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 19111
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.1.12
client_hostname  |
client_port      | 50378
backend_start    | 2017-08-28 15:14:19.851581+09
backend_xmin     |
state            | streaming
sent_location    | 0/7000318
write_location   | 0/7000318
flush_location   | 0/7000318
replay_location  | 0/7000318
sync_priority    | 0
sync_state       | async
This shows that the previously cloned standby (node2 shown in the field application_name) has connected to the primary from IP address 192.168.1.12.
From PostgreSQL 9.6 you can also use the view pg_stat_wal_receiver to check the replication status from the standby:
repmgr=# \x
Expanded display is on.
repmgr=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+--------------------------------------------------------------------------------
pid                   | 18236
status                | streaming
receive_start_lsn     | 0/3000000
receive_start_tli     | 1
received_lsn          | 0/7000538
received_tli          | 1
last_msg_send_time    | 2017-08-28 15:21:26.465728+09
last_msg_receipt_time | 2017-08-28 15:21:26.465774+09
latest_end_lsn        | 0/7000538
latest_end_time       | 2017-08-28 15:20:56.418735+09
slot_name             |
sender_host           | node1
sender_port           | 5432
conninfo              | user=repmgr dbname=replication host=node1 application_name=node2
Note that the conninfo value is that generated in postgresql.auto.conf (PostgreSQL 11 and earlier: recovery.conf) and will differ slightly from the primary's conninfo as set in repmgr.conf - among others it will contain the connecting node's name as application_name.
Register the standby server with:
$ repmgr -f /etc/repmgr.conf standby register
NOTICE: standby node "node2" (ID: 2) successfully registered
Check the node is registered by executing repmgr cluster show on the standby:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=node2 dbname=repmgr user=repmgr
Both nodes are now registered with repmgr and the records have been copied to the standby server.
The following software must be installed on all servers:

PostgreSQL

repmgr (matching the installed PostgreSQL major version)
At network level, connections between the PostgreSQL port (default: 5432) must be possible between all nodes.
Passwordless SSH connectivity between all servers in the replication cluster is not required, but is necessary in the following cases:

to copy configuration files located outside the PostgreSQL data directory with repmgr standby clone (in which case rsync must also be installed on all servers)

to perform switchover operations with repmgr standby switchover

to execute repmgr cluster matrix and repmgr cluster crosscheck
Consider setting ConnectTimeout to a low value in your SSH configuration. This will make it faster to detect any SSH connection errors.
The following PostgreSQL configuration parameters may need to be changed in order for repmgr (and replication itself) to function correctly.
hot_standby

hot_standby must always be set to on, as repmgr needs to be able to connect to each server it manages.

Note that hot_standby defaults to on from PostgreSQL 10 and later; in PostgreSQL 9.6 and earlier, the default was off.

PostgreSQL documentation: hot_standby.
wal_level

wal_level must be one of replica or logical (PostgreSQL 9.5 and earlier: one of hot_standby or logical).

PostgreSQL documentation: wal_level.
max_wal_senders

max_wal_senders must be set to a value of 2 or greater. In general you will need one WAL sender for each standby which will attach to the PostgreSQL instance; additionally repmgr will require two free WAL senders in order to clone further standbys.

max_wal_senders should be set to an appropriate value on all PostgreSQL instances in the replication cluster which may potentially become a primary server or (in cascading replication) the upstream server of a standby.

PostgreSQL documentation: max_wal_senders.

From PostgreSQL 12, max_wal_senders must be set to the same or a higher value as the primary node (at the time the node was cloned), otherwise the standby will refuse to start (unless hot_standby is set to off, which will prevent the node from accepting queries).
max_replication_slots

If you are intending to use replication slots, max_replication_slots must be set to a non-zero value.

max_replication_slots should be set to an appropriate value on all PostgreSQL instances in the replication cluster which may potentially become a primary server or (in cascading replication) the upstream server of a standby.

PostgreSQL documentation: max_replication_slots.
wal_log_hints

If you are intending to use pg_rewind, and the cluster was not initialised using data checksums, you may want to consider enabling wal_log_hints. For more details see Using pg_rewind.

PostgreSQL documentation: wal_log_hints.
archive_mode

We suggest setting archive_mode to on (and archive_command to /bin/true; see below) even if you are currently not planning to use WAL file archiving. This will make it simpler to set up WAL file archiving if it is ever required, as changes to archive_mode require a full PostgreSQL server restart, while archive_command changes can be applied via a normal configuration reload. However, repmgr itself does not require WAL file archiving.

PostgreSQL documentation: archive_mode.
archive_command

If you have set archive_mode to on but are not currently planning to use WAL file archiving, set archive_command to a command which does nothing but return true, such as /bin/true. See above for details.

PostgreSQL documentation: archive_command.
wal_keep_segments / wal_keep_size

Normally there is no need to set wal_keep_segments (PostgreSQL 13 and later: wal_keep_size; default: 0), as it is not a reliable way of ensuring that all required WAL segments are available to standbys. Replication slots and/or an archiving solution such as Barman are recommended to ensure standbys have a reliable source of WAL segments at all times.

The only reason ever to set wal_keep_segments / wal_keep_size is if you have configured pg_basebackup_options in repmgr.conf to include the setting --wal-method=fetch (PostgreSQL 9.6 and earlier: --xlog-method=fetch) and you have not set restore_command in repmgr.conf to fetch WAL files from a reliable source such as Barman, in which case you'll need to set wal_keep_segments to a sufficiently high number to ensure that all WAL files required by the standby are retained. However we do not recommend WAL retention in this way.

PostgreSQL documentation: wal_keep_segments.
See also the PostgreSQL configuration section in the Quick-start guide.
repmgr and repmgrd use a common configuration file, by default called repmgr.conf (although any name can be used if explicitly specified). repmgr.conf must contain a number of required parameters, including the database connection string for the local node and the location of its data directory; other values will be inferred from defaults if not explicitly supplied. See section required configuration file settings for more details.
repmgr.conf is a plain text file with one parameter/value combination per line.
Whitespace is insignificant (except within a quoted parameter value) and blank lines are ignored.
Hash marks (#
) designate the remainder of the line as a comment.
Parameter values that are not simple identifiers or numbers should be single-quoted.
To embed a single quote in a parameter value, write either two quotes (preferred) or backslash-quote.
Example of a valid repmgr.conf file:
# repmgr.conf

node_id=1
node_name= node1
conninfo ='host=node1 dbname=repmgr user=repmgr connect_timeout=2'
data_directory = '/var/lib/pgsql/12/data'
Beginning with repmgr 5.0, configuration file parsing has been tightened up and now matches the way PostgreSQL itself parses configuration files.
This means repmgr.conf files used with earlier repmgr versions may need slight modification before they can be used with repmgr 5 and later.
The main change is that repmgr requires most string values to be enclosed in single quotes. For example, this was previously valid:
conninfo=host=node1 user=repmgr dbname=repmgr connect_timeout=2
but must now be changed to:
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
From repmgr 5.2, the configuration file can contain the following include directives:

include: include the specified file, either as an absolute path or a path relative to the current file

include_if_exists: include the specified file. The file is specified as an absolute path or a path relative to the current file. However, if it does not exist, an error will not be raised.

include_dir: include files in the specified directory which have the .conf suffix. The directory is specified either as an absolute path or a path relative to the current file.

These behave in exactly the same way as the PostgreSQL configuration file processing; see the PostgreSQL documentation for additional details.
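For example, a repmgr.conf might be split up as follows (the file and directory names are illustrative):

# repmgr.conf
include 'repmgr.common.conf'
include_if_exists 'repmgr.local.conf'
include_dir 'conf.d'
node_id=1
node_name='node1'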
The following sections document some of the configuration file settings.
For a full list of annotated configuration items, see the file repmgr.conf.sample.
For repmgrd-specific settings, see Chapter 13.
The following parameters in the configuration file can be overridden with command line options:

-L/--log-level overrides log_level in repmgr.conf

-b/--pg_bindir overrides pg_bindir in repmgr.conf
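For example, to run a single command with more verbose logging without editing repmgr.conf:

repmgr -L DEBUG -f /etc/repmgr.conf cluster show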
The configuration file will be searched for in the following locations:

a configuration file specified by the -f/--config-file command line option

a location specified by the package maintainer (if repmgr was installed from a package and the package maintainer has specified the configuration file location)

repmgr.conf in the local directory

/etc/repmgr.conf

the directory reported by pg_config --sysconfdir
In examples provided in this documentation, it is assumed the configuration file is located at /etc/repmgr.conf. If repmgr is installed from a package, the configuration file will probably be located at another location specified by the packager; see appendix Package details for configuration file locations in different packaging systems.
Note that if a file is explicitly specified with -f/--config-file, an error will be raised if it is not found or not readable, and no attempt will be made to check default locations; this is to prevent repmgr unexpectedly reading the wrong configuration file.
If providing the configuration file location with -f/--config-file, avoid using a relative path, particularly when executing repmgr primary register and repmgr standby register, as repmgr stores the configuration file location in the repmgr metadata for use when repmgr is executed remotely (e.g. during repmgr standby switchover). repmgr will attempt to convert a relative path into an absolute one, but this may not be the same as the path you would explicitly provide (e.g. ./repmgr.conf might be converted to /path/to/./repmgr.conf, whereas you'd normally write /path/to/repmgr.conf).
When upgrading the PostgreSQL cluster to a new major version, repmgr.conf will probably need to be updated. Usually pg_bindir and data_directory will need to be modified, particularly if the default package locations are used, as these usually change. It's also possible the location of repmgr.conf itself will change (e.g. from /etc/repmgr/11/repmgr.conf to /etc/repmgr/12/repmgr.conf). This is stored as part of the repmgr metadata and is used by repmgr to execute repmgr remotely (e.g. during a switchover operation).

If the content and/or location of repmgr.conf has changed, the repmgr metadata needs to be updated to reflect this by re-registering each node with the --force option.
Each repmgr.conf file must contain the following parameters:

node_id (int)

A unique integer greater than zero which identifies the node.

node_name (string)

An arbitrary (but unique) string; we recommend using the server's hostname or another identifier unambiguously associated with the server to avoid confusion. Avoid choosing names which reflect the node's current role, e.g. primary or standby1, as roles can change; if you end up in a situation where the current primary is called standby1 (for example), things will be confusing to say the least.
The string's maximum length is 63 characters and it should contain only printable ASCII characters.
conninfo (string)

Database connection information as a conninfo string. All servers in the cluster must be able to connect to the local node using this string.

For details on conninfo strings, see section Connection Strings in the PostgreSQL documentation.
If repmgrd is in use, consider explicitly setting connect_timeout in the conninfo string to determine the length of time which elapses before a network connection attempt is abandoned; for details see the PostgreSQL documentation.
data_directory (string)

The node's data directory. This is needed by repmgr when performing operations when the PostgreSQL instance is not running and there's no other way of determining the data directory.
See optional configuration file settings for further configuration options.
This section documents a subset of optional configuration settings; for a full and annotated view of all configuration options see the sample repmgr.conf file.
config_directory (string)

If PostgreSQL configuration files are located outside the data directory, specify the directory where the main postgresql.conf file is located. This enables explicit provision of an external configuration file directory, which if set will be passed to pg_ctl as the -D parameter. Otherwise pg_ctl will default to using the data directory, which will cause some operations to fail if the configuration files are not present there.
This is implemented primarily for feature completeness and for development/testing purposes. Users who have installed repmgr from a package should not rely on pg_ctl to stop/start/restart PostgreSQL; instead they should set the appropriate service_..._command for their operating system. For more details see service command settings.
replication_user (string)

PostgreSQL user to make replication connections with. If not set, defaults to the user defined in conninfo.
replication_type (string)

Must be physical (the default).
location (string)

An arbitrary string defining the location of the node; this is used during failover to check visibility of the current primary node.
For more details see Handling network splits with repmgrd.
use_replication_slots (boolean)

Whether to use physical replication slots. When using replication slots, max_replication_slots should be configured for at least the number of standbys which will connect to the primary.
ssh_options (string)

Options to append to the ssh command when executed by repmgr. We recommend adding -q to suppress any superfluous SSH chatter such as login banners, and also an explicit ConnectTimeout value, e.g.:
ssh_options='-q -o ConnectTimeout=10'
pg_bindir (string)

Path to the PostgreSQL binary directory (location of pg_ctl, pg_basebackup etc.). Only required if these are not in the system PATH.

When repmgr is executed via SSH (e.g. when running repmgr standby switchover, repmgr cluster matrix or repmgr cluster crosscheck), or if it is executed as a cronjob, a login shell will not be used and only the default system PATH will be set. Therefore it's recommended to set pg_bindir so repmgr can correctly invoke binaries on a remote system and avoid potential path issues.

Debian/Ubuntu users: you will probably need to set this to the directory where pg_ctl is located, e.g. /usr/lib/postgresql/9.6/bin/.

NOTE: pg_bindir is only used when repmgr directly executes PostgreSQL binaries; any user-defined scripts must be specified with the full path.
See the sample repmgr.conf file for a full and annotated view of all configuration options.
By default, repmgr and repmgrd write log output to STDERR. An alternative log destination can be specified (either a file or syslog).

The repmgr application itself will continue to write log output to STDERR even if another log destination is configured, as otherwise any output resulting from a command line operation will "disappear" into the log.
This behaviour can be overridden with the command line option --log-to-file, which will redirect all logging output to the configured log destination. This is recommended when repmgr is executed by another application, particularly repmgrd, to enable log output generated by the repmgr application to be stored for later reference.
log_level (string)

One of DEBUG, INFO, NOTICE, WARNING, ERROR, ALERT, CRIT or EMERG. Default is INFO.

Note that DEBUG will produce a substantial amount of log output and should not be enabled in normal use.
log_facility (string)

Logging facility: possible values are STDERR (default), or for syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER.
log_file (string)

If log_facility is set to STDERR, log output can be redirected to the specified file.

See Section 13.4 for information on configuring log rotation.
log_status_interval (integer)

This setting causes repmgrd to emit a status log line at the specified interval (in seconds, default 300) describing repmgrd's current state, e.g.:
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (ID: 1)
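Putting these together, a minimal logging configuration in repmgr.conf might look like the following sketch (the log file path is illustrative; ensure it is writable by the system user repmgr runs as):

log_level='INFO'
log_facility='STDERR'
log_file='/var/log/repmgr/repmgr.log'
log_status_interval=300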
In some circumstances, repmgr (and repmgrd) need to be able to stop, start or restart PostgreSQL. repmgr commands which need to do this include repmgr standby follow, repmgr standby switchover and repmgr node rejoin.
By default, repmgr will use PostgreSQL's pg_ctl
utility to control the PostgreSQL
server. However this can lead to various problems, particularly when PostgreSQL has been
installed from packages, and especially so if systemd is in use.
If using systemd, ensure you have RemoveIPC set to off. See the PostgreSQL documentation section systemd RemoveIPC and also the systemd entry in the PostgreSQL wiki for details.
With this in mind, we recommend always configuring repmgr to use the available system service commands. To do this, specify the appropriate command for each action in repmgr.conf using the following configuration parameters:

service_start_command
service_stop_command
service_restart_command
service_reload_command
repmgr will not apply pg_bindir when executing any of these commands; these can be user-defined scripts so must always be specified with the full path.
It's also possible to specify a service_promote_command. This is intended for systems which provide a package-level promote command, such as Debian's pg_ctlcluster, to promote the PostgreSQL from standby to primary. If your packaging system does not provide such a command, it can be left empty, and repmgr will generate the appropriate pg_ctl ... promote command.
Do not confuse this with promote_command, which is used by repmgrd to execute repmgr standby promote.
To confirm which command repmgr will execute for each action, use repmgr node service --list-actions --action=..., e.g.:
repmgr -f /etc/repmgr.conf node service --list-actions --action=stop
repmgr -f /etc/repmgr.conf node service --list-actions --action=start
repmgr -f /etc/repmgr.conf node service --list-actions --action=restart
repmgr -f /etc/repmgr.conf node service --list-actions --action=reload
These commands will be executed by the system user which repmgr runs as (usually postgres) and will probably require passwordless sudo access to be able to execute the command.
For example, using systemd on CentOS 7, the service commands can be set as follows:
service_start_command = 'sudo systemctl start postgresql-9.6'
service_stop_command = 'sudo systemctl stop postgresql-9.6'
service_restart_command = 'sudo systemctl restart postgresql-9.6'
service_reload_command = 'sudo systemctl reload postgresql-9.6'
and /etc/sudoers should be set as follows:
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \
    /usr/bin/systemctl start postgresql-9.6, \
    /usr/bin/systemctl restart postgresql-9.6, \
    /usr/bin/systemctl reload postgresql-9.6
Debian/Ubuntu users: instead of calling sudo systemctl directly, use sudo pg_ctlcluster, e.g.:
service_start_command = 'sudo pg_ctlcluster 9.6 main start'
service_stop_command = 'sudo pg_ctlcluster 9.6 main stop'
service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'
service_reload_command = 'sudo pg_ctlcluster 9.6 main reload'
and set /etc/sudoers accordingly.
While pg_ctlcluster will work when executed as user postgres, it's strongly recommended to use sudo pg_ctlcluster on systemd systems, to ensure systemd has a correct picture of the PostgreSQL application state.
If the repmgr database user (the PostgreSQL user defined in the conninfo setting) is a superuser, no further user permissions need to be granted.
In principle the repmgr database user does not need to be a superuser. In this case the repmgr user will need to be granted execution permissions on certain functions, and membership of certain roles. However, be aware that repmgr does expect to be able to execute certain commands which are restricted to superusers; in this case either a superuser must be specified with the -S/--superuser option (where available), or the corresponding action should be executed manually as a superuser.
The following sections describe the actions needed to use repmgr with a non-superuser, and relevant caveats.
repmgr requires a database user with the REPLICATION role to be able to create a replication connection and (if configured) to administer replication slots.

By default this is the database user defined in the conninfo setting. This user can be:

a superuser

a non-superuser with the REPLICATION role

Alternatively, a dedicated replication user can be defined with the repmgr.conf parameter replication_user; this user must also have the REPLICATION role.
A non-superuser repmgr database user should be a member of the following predefined roles (PostgreSQL 10 and later):

pg_read_all_stats (to read the status column of pg_stat_replication and execute pg_database_size() on all databases)

pg_read_all_settings (to access the data_directory setting)

Alternatively the meta-role pg_monitor can be granted, which includes membership of the above predefined roles.
PostgreSQL 15 introduced the pg_checkpoint predefined role, which allows a non-superuser repmgr database user to execute the CHECKPOINT command.
Membership of these roles can be granted with e.g. GRANT pg_read_all_stats TO repmgr.
Users of PostgreSQL 9.6 or earlier should upgrade to a supported PostgreSQL version, or provide the -S/--superuser option where available.
repmgr requires that the database defined in the conninfo setting contains the repmgr extension. The database user defined in the conninfo setting must be able to access this database and the database objects contained within the extension.
The repmgr extension can only be installed by a superuser. If the repmgr user is a superuser, repmgr will create the extension automatically. Alternatively, the extension can be created manually by a superuser (with "CREATE EXTENSION repmgr") before executing repmgr primary register.
If the repmgr database user is not a superuser, EXECUTE permission should be granted on the following functions:

pg_wal_replay_resume() (required by repmgrd during failover operations; if permission is not granted, the failover process may not function reliably if a node has WAL replay paused)

pg_promote() (PostgreSQL 12 and later; if permission is not granted, repmgr will fall back to pg_ctl promote)
EXECUTE permission on functions can be granted with e.g.: GRANT EXECUTE ON FUNCTION pg_catalog.pg_wal_replay_resume() TO repmgr.
In some circumstances, repmgr may need to perform an operation which cannot be delegated to a non-superuser.
The CHECKPOINT command is executed by repmgr standby switchover. This can only be executed by a superuser; if the repmgr user is not a superuser, the -S/--superuser option should be used.
From PostgreSQL 15, the pg_checkpoint predefined role removes the need for superuser permissions to execute the CHECKPOINT command.
If repmgr is not able to execute CHECKPOINT, there is a risk that the demotion candidate may not be able to shut down as smoothly as might otherwise have been the case.
ALTER SYSTEM is executed by repmgrd if standby_disconnect_on_failover is set to true in repmgr.conf. Until PostgreSQL 14, ALTER SYSTEM can only be executed by a superuser; if the repmgr user is not a superuser, this functionality will not be available. From PostgreSQL 15, a specific ALTER SYSTEM privilege can be granted with e.g. GRANT ALTER SYSTEM ON PARAMETER wal_retrieve_retry_interval TO repmgr.
The following repmgr commands provide the -S/--superuser option:

repmgr standby clone (if --copy-external-config-files is provided)

repmgr standby switchover (to execute CHECKPOINT)

repmgr node check (for repmgr node check --data-directory-config; note this is also called by repmgr standby switchover)

repmgr node service (to execute CHECKPOINT via the --checkpoint option; note this is also called by repmgr standby switchover)
For security purposes it's desirable to protect database access using a password.
PostgreSQL has three ways of providing a password:

explicitly included in the conninfo string (e.g. "host=node1 dbname=repmgr user=repmgr password=foo")

via an environment variable (PGPASSWORD)

in a password file
We strongly advise against including the password in the conninfo string, as this will result in the database password being exposed in various places, including in the repmgr.conf file, the repmgr.nodes table, any output generated by repmgr which lists the node conninfo strings (e.g. repmgr cluster show) and in the repmgr log file, particularly at log_level=DEBUG.
Currently repmgr does not fully support use of the password option in the conninfo string.
Exporting the password as an environment variable (PGPASSWORD) is considered less insecure, but the PostgreSQL documentation explicitly recommends against doing this, as some operating systems allow non-root users to see process environment variables.
The most secure option for managing passwords is to use a dedicated password file; see the following section for more details.
The most secure way of storing passwords is in a password file, which by default is ~/.pgpass. This file can only be read by the system user who owns the file, and PostgreSQL will refuse to use the file unless read/write permissions are restricted to the file owner. The password(s) contained in the file will not be directly accessed by repmgr (or any other libpq-based client software such as psql).

For full details see the PostgreSQL password file documentation.
For use with repmgr, the ~/.pgpass
must two entries for each
node in the replication cluster: one for the repmgr user who accesses the repmgr metadatabase,
and one for replication connections (regardless of whether a dedicated replication user is used).
The file must be present on each node in the replication cluster.
A ~/.pgpass file for a 3-node cluster where the repmgr database user is used both for accessing the repmgr metadatabase and for replication connections would look like this:
node1:5432:repmgr:repmgr:foo
node1:5432:replication:repmgr:foo
node2:5432:repmgr:repmgr:foo
node2:5432:replication:repmgr:foo
node3:5432:repmgr:repmgr:foo
node3:5432:replication:repmgr:foo
If a dedicated replication user (here: repluser) is in use, the file would look like this:
node1:5432:repmgr:repmgr:foo
node1:5432:replication:repluser:foo
node2:5432:repmgr:repmgr:foo
node2:5432:replication:repluser:foo
node3:5432:repmgr:repmgr:foo
node3:5432:replication:repluser:foo
If you are planning to use the -S/--superuser option, there must also be an entry enabling the superuser to connect to the repmgr database. Assuming the superuser is postgres, the file would look like this:
node1:5432:repmgr:repmgr:foo
node1:5432:repmgr:postgres:foo
node1:5432:replication:repluser:foo
node2:5432:repmgr:repmgr:foo
node2:5432:repmgr:postgres:foo
node2:5432:replication:repluser:foo
node3:5432:repmgr:repmgr:foo
node3:5432:repmgr:postgres:foo
node3:5432:replication:repluser:foo
The ~/.pgpass file can be simplified with the use of wildcards if there is no requirement to restrict provision of passwords to particular hosts, ports or databases. The preceding file could then be formatted like this:
*:*:*:repmgr:foo
*:*:*:postgres:foo
It's possible to specify an alternative location for the ~/.pgpass file, either via the environment variable PGPASSFILE, or (from PostgreSQL 9.6) using the passfile parameter in connection strings.
If using the passfile parameter, it's essential to ensure the file is in the same location on all nodes, as when connecting to a remote node, the file referenced is the one on the local node. Additionally, you must specify the passfile location in repmgr.conf with the passfile option, so repmgr can write the correct path when creating the primary_conninfo parameter for the replication configuration on standbys.
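For example, assuming a password file in a non-default location (a hypothetical path), repmgr.conf would contain:
passfile='/var/lib/postgresql/pgpass.repmgr'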
repmgr standby clone can use EDB's Barman application to clone a standby (and also as a fallback source for WAL files).
Barman (aka PgBarman) should be considered as an integral part of any PostgreSQL replication cluster. For more details see: https://www.pgbarman.org/.
Barman support provides the following advantages:
the primary node does not need to perform a new backup every time a new standby is cloned
a standby node can be disconnected for longer periods without losing the ability to catch up, and without causing accumulation of WAL files on the primary node
WAL management on the primary becomes much easier as there's no need to use replication slots, and wal_keep_segments (PostgreSQL 13 and later: wal_keep_size) does not need to be set.
Currently repmgr's support for cloning from Barman is implemented by using rsync to clone from the Barman server.
It is therefore not able to make use of Barman's parallel restore facility, which is executed on the Barman server and clones to the target server.
Barman's parallel restore facility can be used by executing it manually on the Barman server and configuring replication on the resulting cloned standby using repmgr standby clone --replication-conf-only.
In order to enable Barman support for repmgr standby clone, the following prerequisites must be met:
the Barman catalogue must include at least one valid backup for this server;
the barman_host setting in repmgr.conf is set to the SSH hostname of the Barman server;
the barman_server setting in repmgr.conf is the same as the server configured in Barman.
For example, assuming Barman is located on the host "barmansrv" under the "barman" user account, repmgr.conf should contain the following entries:
barman_host='barman@barmansrv'
barman_server='pg'
Here pg corresponds to a section in Barman's configuration file for a specific server backup configuration, which would look something like:
[pg]
description = "Main cluster"
...
More details on Barman configuration can be found in the Barman documentation's configuration section.
To use a non-default Barman configuration file on the Barman server, specify this in repmgr.conf with barman_config:
barman_config='/path/to/barman.conf'
We also recommend configuring the restore_command setting in repmgr.conf to use the barman-wal-restore script (see section Using Barman as a WAL file source below).
If you have a non-default SSH configuration on the Barman server, e.g. using a port other than 22, you can set those parameters in a dedicated Host section in ~/.ssh/config corresponding to the value of barman_host in repmgr.conf. See the Host section in man 5 ssh_config for more details.
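For example, assuming the Barman server accepts SSH connections on port 2222 (a hypothetical value), ~/.ssh/config could contain:
Host barmansrv
    User barman
    Port 2222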
If you wish to place WAL files in a location outside the main PostgreSQL data directory, set --waldir (PostgreSQL 9.6 and earlier: --xlogdir) in pg_basebackup_options to the target directory (must be an absolute filepath). repmgr will create this directory and symlink to it in exactly the same way pg_basebackup would.
It's now possible to clone a standby from Barman, e.g.:
$ repmgr -f /etc/repmgr.conf -h node1 -U repmgr -d repmgr standby clone
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to Barman server to verify backup for "test_cluster"
INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
INFO: creating directory "/var/lib/postgresql/data/repmgr"...
INFO: connecting to Barman server to fetch server parameters
INFO: connecting to source node
DETAIL: current installation size is 30 MB
NOTICE: retrieving backup from Barman...
(...)
NOTICE: standby clone (from Barman) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/data start
Barman support is automatically enabled if barman_server is set. Normally it is good practice to use Barman, for instance when fetching a base backup while cloning a standby; in any case, Barman mode can be disabled using the --without-barman command line option.
As a fallback in case streaming replication is interrupted, PostgreSQL can optionally retrieve WAL files from an archive, such as that provided by Barman. This is done by setting restore_command in the replication configuration to a valid shell command which can retrieve a specified WAL file from the archive.
barman-wal-restore is a Python script provided as part of the barman-cli package (Barman 2.0 ~ 2.7) or as part of the core Barman distribution (Barman 2.8 and later).
To use barman-wal-restore with repmgr, assuming Barman is located on the host "barmansrv" under the "barman" user account, and that barman-wal-restore is installed as an executable at /usr/bin/barman-wal-restore, repmgr.conf should include the following lines:
barman_host='barman@barmansrv'
barman_server='pg'
restore_command='/usr/bin/barman-wal-restore barmansrv pg %f %p'
barman-wal-restore supports command line switches to control parallelism (--parallel=N) and compression (--bzip2, --gzip).
You can find information on how to install and setup pg-backup-api in the pg-backup-api documentation.
This mode (`pg-backupapi`) was introduced in v5.4.0 as a way to further integrate with Barman, letting Barman handle the restore. It also reduces the number of SSH keys that need to be shared between the backup and PostgreSQL nodes. As long as you can reach the API service via HTTP calls, you can perform recoveries right away: you just need to instruct Barman through the API which backup you need and which node it should be restored on.
In order to enable pg-backupapi mode support for repmgr standby clone, you need the following lines in repmgr.conf:
pg_backupapi_host: Where pg-backup-api is hosted
pg_backupapi_node_name: Name of the server as understood by Barman
pg_backupapi_remote_ssh_command: How Barman will be connecting as to the node
pg_backupapi_backup_id: ID of the existing backup you need to restore
This is an example of how repmgr.conf would look:
pg_backupapi_host = '192.168.122.154'
pg_backupapi_node_name = 'burrito'
pg_backupapi_remote_ssh_command = 'ssh john_doe@192.168.122.1'
pg_backupapi_backup_id = '20230223T093201'
pg_backupapi_host is the variable which enables this mode; when it is set, all of the other variables above are required. Also, remember that this service is just an interface between Barman and repmgr, so if something fails during a recovery, you should check Barman's logs to establish why the process couldn't finish properly.
Although Barman lets you define shortcuts like "latest" or "oldest", these are not supported by pg-backup-api for the time being; they will be supported in a future release.
This is a real example of repmgr's output when cloning via the API. Note that during this operation the service was stopped for a little while and repmgr had to retry, but that didn't affect the final outcome. The primary is listening on localhost port 6001:
$ repmgr -f ~/nodes/node_3/repmgr.conf standby clone -U repmgr -p 6001 -h localhost
NOTICE: destination directory "/home/mario/nodes/node_3/data" provided
INFO: Attempting to use `pg_backupapi` new restore mode
INFO: connecting to source node
DETAIL: connection string is: user=repmgr port=6001 host=localhost
DETAIL: current installation size is 8541 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr dbname=repmgr host=localhost port=6001 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 1
INFO: Attempting to use `pg_backupapi` new restore mode
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: starting backup (using pg_backupapi)...
INFO: Success creating the task: operation id '20230309T150647'
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
Incorrect reply received for that operation ID.
INFO: Retrying...
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status DONE
NOTICE: standby clone (from pg_backupapi) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/mario/nodes/node_3/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure that any standby connected to the primary using a replication slot will always be able to retrieve the required WAL files. This removes the need to manually manage WAL file retention by estimating the number of WAL files that need to be maintained on the primary using wal_keep_segments (PostgreSQL 13 and later: wal_keep_size).
Do however be aware that if a standby is disconnected, WAL will continue to accumulate on the primary until either the standby reconnects or the replication slot is dropped.
To enable repmgr to use replication slots, set the boolean parameter use_replication_slots in repmgr.conf:
use_replication_slots=true
Replication slots must be enabled in postgresql.conf by setting the parameter max_replication_slots to at least the number of expected standbys (changes to this parameter require a server restart).
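For example, for a cluster expected to contain up to ten standbys (an assumed figure), postgresql.conf on each node might contain:
max_replication_slots = 10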
When cloning a standby, repmgr will automatically generate an appropriate slot name, which is stored in the repmgr.nodes table, and create the slot on the upstream node:
repmgr=# SELECT node_id, upstream_node_id, active, node_name, type, priority, slot_name
           FROM repmgr.nodes ORDER BY node_id;
 node_id | upstream_node_id | active | node_name |  type   | priority |   slot_name
---------+------------------+--------+-----------+---------+----------+---------------
       1 |                  | t      | node1     | primary |      100 | repmgr_slot_1
       2 |                1 | t      | node2     | standby |      100 | repmgr_slot_2
       3 |                1 | t      | node3     | standby |      100 | repmgr_slot_3
(3 rows)

repmgr=# SELECT slot_name, slot_type, active, active_pid FROM pg_replication_slots;
   slot_name   | slot_type | active | active_pid
---------------+-----------+--------+------------
 repmgr_slot_2 | physical  | t      |      23658
 repmgr_slot_3 | physical  | t      |      23687
(2 rows)
Note that a slot name will be created by default for the primary but not actually used unless the primary is converted to a standby using e.g. repmgr standby switchover.
Further information on replication slots in the PostgreSQL documentation: https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS
While replication slots can be useful for streaming replication, it's recommended to monitor for inactive slots as these will cause WAL files to build up indefinitely, possibly leading to server failure.
As an alternative we recommend using EDB's Barman, which offloads WAL management to a separate server, removing the requirement to use a replication slot for each individual standby to reserve WAL. See section Cloning from Barman for more details on using repmgr together with Barman.
Cascading replication, introduced with PostgreSQL 9.2, enables a standby server to replicate from another standby server rather than directly from the primary, meaning replication changes "cascade" down through a hierarchy of servers. This can be used to reduce load on the primary and minimize bandwidth usage between sites. For more details, see the PostgreSQL cascading replication documentation.
repmgr supports cascading replication. When cloning a standby, set the command-line parameter --upstream-node-id to the node_id of the server the standby should connect to, and repmgr will create a replication configuration file pointing to it. Note that if --upstream-node-id is not explicitly provided, repmgr will set the standby's replication configuration to point to the primary node.
To demonstrate cascading replication, first ensure you have a primary and standby
set up as shown in the Quick-start guide.
Then create an additional standby server with repmgr.conf looking like this:
node_id=3
node_name=node3
conninfo='host=node3 user=repmgr dbname=repmgr'
data_directory='/var/lib/postgresql/data'
Clone this standby (using the connection parameters for the existing standby), ensuring --upstream-node-id is provided with the node_id of the previously created standby (if following the example, this will be 2):
$ repmgr -h node2 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --upstream-node-id=2
NOTICE: using configuration file "/etc/repmgr.conf"
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to upstream node
INFO: connected to source node, checking its state
NOTICE: checking for available walsenders on upstream node (2 required)
INFO: sufficient walsenders available on upstream node (2 required)
INFO: successfully connected to source node
DETAIL: current installation size is 29 MB
INFO: creating directory "/var/lib/postgresql/data"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: 'pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h node2 -U repmgr -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/data start
then register it (note that --upstream-node-id must be provided here too):
$ repmgr -f /etc/repmgr.conf standby register --upstream-node-id=2
NOTICE: standby node "node3" (ID: 3) successfully registered
After starting the standby, the cluster will look like this, showing that node3 is attached to node2, not the primary (node1):
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr
Under some circumstances when setting up a cascading replication cluster, you may wish to clone a downstream standby whose upstream node does not yet exist. In this case you can clone from the primary (or another upstream node); provide the parameter --upstream-conninfo to explicitly set the upstream's primary_conninfo string in the replication configuration.
As repmgr uses pg_basebackup to clone a standby, it's possible to provide additional parameters for pg_basebackup to customise the cloning process.
By default, pg_basebackup performs a checkpoint before beginning the backup process. However, a normal checkpoint may take some time to complete; a fast checkpoint can be forced with repmgr standby clone's -c/--fast-checkpoint option. Note that this may impact the performance of the server being cloned from (typically the primary), so it should be used with care.
If Barman is set up for the cluster, it's possible to clone the standby directly from Barman, without any impact on the server the standby is being cloned from. For more details see Cloning from Barman.
Other options can be passed to pg_basebackup by including them in the repmgr.conf setting pg_basebackup_options.
Note that by default, repmgr executes pg_basebackup with -X/--wal-method (PostgreSQL 9.6 and earlier: -X/--xlog-method) set to stream.
From PostgreSQL 9.6, if replication slots are in use, repmgr will also create a replication slot before running the base backup, and execute pg_basebackup with the -S/--slot option set to the name of the previously created replication slot.
These parameters can be set by the user in pg_basebackup_options, in which case they will override the repmgr default values. However, normally there's no reason to do this.
If using a separate directory to store WAL files, provide the option --waldir (--xlogdir in PostgreSQL 9.6 and earlier) with the absolute path to the WAL directory. Any WAL files generated during the cloning process will be copied here, and a symlink will automatically be created from the main data directory.
The --waldir (--xlogdir) option, if present in pg_basebackup_options, will be honoured by repmgr when cloning from Barman (repmgr 5.2 and later).
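For example (assuming a hypothetical WAL location of /var/lib/postgresql/wal), repmgr.conf could contain:
pg_basebackup_options='--waldir=/var/lib/postgresql/wal'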
See the PostgreSQL pg_basebackup documentation for more details of available options.
If replication connections to a standby's upstream server are password-protected, the standby must be able to provide the password so it can begin streaming replication.
The recommended way to do this is to store the password in the postgres system user's ~/.pgpass file. For more information on using the password file, see the documentation section Password file.
If using a pgpass file, an entry for the replication user (by default the user who connects to the repmgr database) must be provided, with the database name set to replication, e.g.:
node1:5432:replication:repmgr:12345
If, for whatever reason, you wish to include the password in the replication configuration file, set use_primary_conninfo_password to true in repmgr.conf. This will read a password set in PGPASSWORD (but not ~/.pgpass) and place it in the primary_conninfo string in the replication configuration. Note that PGPASSWORD will need to be set during any action which causes the replication configuration file to be rewritten, e.g. repmgr standby follow.
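For example (a sketch, assuming the replication password is foo):
export PGPASSWORD='foo'
repmgr -f /etc/repmgr.conf standby follow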
In some circumstances it might be desirable to create a dedicated replication-only user (in addition to the user who manages the repmgr metadata). In this case, the replication user should be set in repmgr.conf via the parameter replication_user; repmgr will use this value when making replication connections and generating the replication configuration. This value will also be stored in the repmgr.nodes table for each node; it no longer needs to be explicitly specified when cloning a node or executing repmgr standby follow.
repmgr provides a tablespace_mapping configuration file option, which makes it possible to map a tablespace on the source node to a different location on the local node.
To use this, add tablespace_mapping to repmgr.conf like this:
tablespace_mapping='/var/lib/pgsql/tblspc1=/data/pgsql/tblspc1'
where the left-hand value represents the tablespace on the source node, and the right-hand value represents the tablespace on the standby to be cloned.
This parameter can be provided multiple times.
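For example, to remap two tablespaces (hypothetical paths on both nodes), repmgr.conf could contain:
tablespace_mapping='/var/lib/pgsql/tblspc1=/data/pgsql/tblspc1'
tablespace_mapping='/var/lib/pgsql/tblspc2=/data/pgsql/tblspc2'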
If a primary server fails or needs to be removed from the replication cluster, a new primary server must be designated, to ensure the cluster continues to function correctly. This can be done with repmgr standby promote, which promotes the standby on the current server to primary.
To demonstrate this, set up a replication cluster with a primary and two attached standby servers so that the cluster looks like this:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr
Stop the current primary with e.g.:
$ pg_ctl -D /var/lib/postgresql/data -m fast stop
At this point the replication cluster will be in a partially disabled state, with both standbys accepting read-only connections while attempting to connect to the stopped primary. Note that the repmgr metadata table will not yet have been updated; executing repmgr cluster show will note the discrepancy:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Connection string
----+-------+---------+---------------+----------+----------+--------------------------------------
 1  | node1 | primary | ? unreachable |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running     | node1    | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running     | node1    | default  | host=node3 dbname=repmgr user=repmgr

WARNING: following issues were detected
  node "node1" (ID: 1) is registered as an active primary but is unreachable
Now promote the first standby with:
$ repmgr -f /etc/repmgr.conf standby promote
This will produce output similar to the following:
INFO: connecting to standby database
NOTICE: promoting standby
DETAIL: promoting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' promote"
server promoting
INFO: reconnecting to promoted server
NOTICE: STANDBY PROMOTE successful
DETAIL: node 2 was successfully promoted to primary
Executing repmgr cluster show will show the current state; as there is now an active primary, the previous warning will not be displayed:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr
However the sole remaining standby (node3) is still trying to replicate from the failed primary; repmgr standby follow must now be executed to rectify this situation (see Chapter 7 for an example).
Following the failure or removal of the replication cluster's existing primary server, repmgr standby follow can be used to make "orphaned" standbys follow the new primary and catch up to its current state.
To demonstrate this, assuming a replication cluster in the same state as the end of the preceding section (Promoting a standby), execute this:
$ repmgr -f /etc/repmgr.conf standby follow
INFO: changing node 3's primary to node 2
NOTICE: restarting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' restart"
waiting for server to shut down......... done
server stopped
waiting for server to start.... done
server started
NOTICE: STANDBY FOLLOW successful
DETAIL: node 3 is now attached to node 2
The standby is now replicating from the new primary, and repmgr cluster show output reflects this:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr
Note that with cascading replication, repmgr standby follow can also be used to detach a standby from its current upstream server and follow the primary. However it's currently not possible to have it follow another standby; we hope to improve this in a future release.
A typical use-case for replication is a combination of primary and standby server, with the standby serving as a backup which can easily be activated in case of a problem with the primary. Such an unplanned failover would normally be handled by promoting the standby, after which an appropriate action must be taken to restore the old primary.
In some cases however it's desirable to promote the standby in a planned way, e.g. so maintenance can be performed on the primary; this kind of switchover is supported by the repmgr standby switchover command.
repmgr standby switchover differs from other repmgr actions in that it also performs actions on other servers (the demotion candidate, and optionally any other servers which are to follow the new primary), which means passwordless SSH access is required to those servers from the one where repmgr standby switchover is executed.
repmgr standby switchover performs a relatively complex series of operations on two servers, and should therefore only be performed after careful preparation and with adequate attention. In particular you should be confident that your network environment is stable and reliable.
Additionally you should be sure that the current primary can be shut down quickly and cleanly. In particular, access from applications should be minimized or preferably blocked completely. Also be aware that if there is a backlog of files waiting to be archived, PostgreSQL will not shut down until archiving completes.
We recommend running repmgr standby switchover at the most verbose logging level (--log-level=DEBUG --verbose) and capturing all output to assist with troubleshooting any problems.
Please also read carefully the sections Preparing for switchover and Caveats below.
As mentioned in the previous section, success of the switchover operation depends on repmgr being able to shut down the current primary server quickly and cleanly.
Ensure that the promotion candidate has sufficient free walsenders available (PostgreSQL configuration item max_wal_senders), and, if replication slots are in use, that at least one free slot is available for the demotion candidate (PostgreSQL configuration item max_replication_slots).
Ensure that a passwordless SSH connection is possible from the promotion candidate (standby) to the demotion candidate (current primary). If --siblings-follow will be used, ensure that passwordless SSH connections are possible from the promotion candidate to all nodes attached to the demotion candidate (including the witness server, if in use).
repmgr expects to find the repmgr binary in the same path on the remote server as on the local server.
Double-check which commands will be used to stop/start/restart the current primary; this can be done by executing repmgr node service on the current primary, e.g.:
repmgr -f /etc/repmgr.conf node service --list-actions --action=stop
repmgr -f /etc/repmgr.conf node service --list-actions --action=start
repmgr -f /etc/repmgr.conf node service --list-actions --action=restart
These commands can be defined in repmgr.conf with service_start_command, service_stop_command and service_restart_command.
If repmgr is installed from a package, you should set these commands to use the appropriate service commands defined by the package/operating system, as these will ensure PostgreSQL is stopped/started properly, taking into account configuration and log file locations, etc.
If the service_*_command options aren't defined, repmgr will fall back to using pg_ctl to stop/start/restart PostgreSQL, which may not work properly, particularly when executed on a remote server.
For more details, see service command settings.
On systemd systems we strongly recommend using the appropriate systemctl commands (typically run via sudo) to ensure systemd is informed about the status of the PostgreSQL service.
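For example, on a systemd system where the PostgreSQL unit is named postgresql-12 (an assumed unit name; adjust for your distribution), repmgr.conf might contain:
service_start_command='sudo systemctl start postgresql-12'
service_stop_command='sudo systemctl stop postgresql-12'
service_restart_command='sudo systemctl restart postgresql-12'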
If using sudo for the systemctl calls, make sure the sudo specification doesn't require a real tty for the user. If not set this way, repmgr will fail to stop the primary.
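As a sketch, assuming repmgr and repmgrd run as the postgres system user, the requiretty setting can be disabled for that user in /etc/sudoers (edited via visudo):
Defaults:postgres !requiretty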
See the service command settings documentation section for further details.
Check that access from applications is minimized or preferably blocked completely, so applications are not unexpectedly interrupted.
If an exclusive backup is running on the current primary, or if WAL replay is paused on the standby, repmgr will not perform the switchover.
Check there is no significant replication lag on standbys attached to the current primary.
If WAL file archiving is set up, check that there is no backlog of files waiting to be archived, as PostgreSQL will not finally shut down until all of these have been archived. If there is a backlog exceeding archive_ready_warning WAL files, repmgr will emit a warning before attempting to perform a switchover; you can also check manually with repmgr node check --archive-ready.
From repmgr 4.2, repmgr will instruct any running repmgrd instances to pause operations while the switchover is being carried out, to prevent repmgrd from unintentionally promoting a node. For more details, see pausing the repmgrd service.
Users of repmgr versions prior to 4.2 should ensure that repmgrd is not running on any nodes while a switchover is being executed.
Finally, consider executing repmgr standby switchover with the --dry-run option; this will perform any necessary checks and inform you about success/failure, and stop before the first actual command is run (which would be the shutdown of the current primary). Example output:
$ repmgr standby switchover -f /etc/repmgr.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode
INFO: SSH connection to host "node1" succeeded
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
INFO: all sibling nodes are reachable via SSH
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
INFO: following shutdown command would be run on node "node1": "pg_ctl -l /var/log/postgresql/startup.log -D '/var/lib/postgresql/data' -m fast -W stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
Be aware that --dry-run checks the prerequisites for performing the switchover and some basic sanity checks on the state of the database which might affect the switchover operation (e.g. replication lag); it cannot however guarantee the switchover operation will succeed. In particular, if the current primary does not shut down cleanly, repmgr will not be able to reliably execute the switchover (as there would be a danger of divergence between the former and new primary nodes).
See repmgr standby switchover for a full list of available command line options and repmgr.conf settings relevant to performing a switchover.
If the demotion candidate does not shut down smoothly or cleanly, there's a risk it will have a slightly divergent timeline and will not be able to attach to the new primary. To fix this situation without needing to reclone the old primary, it's possible to use the pg_rewind utility, which will usually be able to resync the two servers.
To have repmgr execute pg_rewind if it detects this situation after promoting the new primary, add the --force-rewind option.
If repmgr detects a situation where it needs to execute pg_rewind, it will execute a CHECKPOINT on the new primary before executing pg_rewind.
For more details on pg_rewind, see the section Using pg_rewind in the repmgr node rejoin documentation and the PostgreSQL documentation at https://www.postgresql.org/docs/current/app-pgrewind.html.
To demonstrate switchover, we will assume a replication cluster with a primary (node1) and one standby (node2); after the switchover node2 should become the primary, with node1 following it.
The switchover command must be run from the standby which is to be promoted, and in its simplest form looks like this:
$ repmgr -f /etc/repmgr.conf standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
INFO: searching for primary node
INFO: checking if node 1 is primary
INFO: current primary node is 1
INFO: SSH connection to host "node1" succeeded
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "pg_ctl -l /var/log/postgres/startup.log -D '/var/lib/pgsql/data' -m fast -W stop"
INFO: checking primary status; 1 of 6 attempts
NOTICE: current primary has been cleanly shut down at location 0/3001460
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote"
server promoting
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
INFO: setting node 1's primary to node 2
NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' restart"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: switchover was successful
DETAIL: node "node2" is now primary
NOTICE: STANDBY SWITCHOVER is complete
The old primary is now replicating as a standby from the new primary, and the cluster status will now look like this:
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | standby |   running | node2    | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
If repmgrd is in use, it's worth double-checking that all nodes are unpaused by executing repmgr service status (repmgr 4.2 - 4.4: repmgr daemon status).
Users of repmgr versions prior to 4.2 will need to manually restart repmgrd on all nodes after the switchover is completed.
The demotion candidate must be shut down using PostgreSQL's fast shutdown mode (the default in 9.5 and later). If relying on pg_ctl to perform database server operations, you should include -m fast in pg_ctl_options in repmgr.conf.
pg_rewind *requires* that either wal_log_hints is enabled, or that data checksums were enabled when the cluster was initialized. See the pg_rewind documentation for details.
As emphasised previously, performing a switchover is a non-trivial operation and there are a number of potential issues which can occur. While repmgr attempts to perform sanity checks, there's no guaranteed way of determining the success of a switchover without actually carrying it out.
repmgr may abort a switchover with a message like:
ERROR: shutdown of the primary server could not be confirmed
HINT: check the primary server status before performing any further actions
This means the shutdown of the old primary has taken longer than repmgr expected, and it has given up waiting.
In this case, check the PostgreSQL log on the primary server to see what is going on. It's entirely possible the shutdown process is just taking longer than the timeout set by the configuration parameter shutdown_check_timeout (default: 60 seconds), in which case you may need to adjust this parameter.
Note that shutdown_check_timeout is set on the node where repmgr standby switchover is executed (the promotion candidate); setting it on the demotion candidate (former primary) will have no effect.
If the primary server has shut down cleanly, and no other node has been promoted, it is safe to restart it, in which case the replication cluster will be restored to its original configuration.
repmgr may abort a switchover with a message like:
ERROR: unable to perform a switchover while primary server is in exclusive backup mode
HINT: stop backup before attempting the switchover
This means an exclusive backup is running on the current primary; interrupting this will not only abort the backup, but potentially leave the primary with an ambiguous backup state.
To proceed, either wait until the backup has finished, or cancel it with the command SELECT pg_stop_backup(). For more details see the PostgreSQL documentation section Making an exclusive low level backup.
Each time repmgr or repmgrd performs a significant event, a record of that event is written into the repmgr.events table together with a timestamp, an indication of failure or success, and further details if appropriate. This is useful for gaining an overview of events affecting the replication cluster. Note however that this table is advisory in character and should be used in combination with the repmgr and PostgreSQL logs to obtain details of any events.
Example output after a primary was registered and a standby cloned and registered:
repmgr=# SELECT * from repmgr.events;
 node_id |      event       | successful |        event_timestamp        |                                        details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
       1 | primary_register | t          | 2016-01-08 15:04:39.781733+09 |
       2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
       2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
(3 rows)
Alternatively, use repmgr cluster event to output a formatted list of events.
Additionally, event notifications can be passed to a user-defined program or script which can take further action, e.g. send email notifications. This is done by setting the event_notification_command parameter in repmgr.conf.
The following format placeholders are provided for all event notifications:
%n: node ID
%e: event type
%s: success (1) or failure (0)
%t: timestamp
%d: details
The values provided for %t and %d may contain spaces, so they should be quoted in the provided command configuration, e.g.:
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
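As a minimal sketch of such a handler, assuming the script path and log location above are hypothetical:
#!/bin/sh
# /path/to/some/script: append each repmgr event to a log file.
# Arguments, as configured above: node ID, event type, success flag,
# timestamp (quoted) and details (quoted).
NODE_ID="$1"
EVENT="$2"
SUCCESS="$3"
TIMESTAMP="$4"
DETAILS="$5"
echo "$TIMESTAMP node=$NODE_ID event=$EVENT ok=$SUCCESS $DETAILS" >> /var/log/repmgr/events.log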
The following parameters are provided for a subset of event notifications; their meaning may change according to context:
%p: node ID of the current primary (repmgr standby register and repmgr standby follow); node ID of the demoted primary (repmgr standby switchover only); node ID of the former primary (repmgrd_failover_promote only)
%c: conninfo string of the primary node (repmgr standby register and repmgr standby follow)
%a: name of the current primary node (repmgr standby register and repmgr standby follow)
The values provided for %c and %a may contain spaces, so they should always be quoted.
By default, all notification types will be passed to the designated script; the notification types can be filtered to explicitly named ones using the event_notifications parameter, e.g.:
event_notifications='primary_register,standby_register,witness_register'
Events generated by the repmgr command:
Events generated by repmgrd (streaming replication mode):
repmgrd_start
repmgrd_shutdown
repmgrd_reload
repmgrd_failover_promote
repmgrd_failover_follow
repmgrd_failover_aborted
repmgrd_standby_reconnect
repmgrd_promote_error
repmgrd_local_disconnect
repmgrd_local_reconnect
repmgrd_upstream_disconnect
repmgrd_upstream_reconnect
standby_disconnect_manual
standby_failure
standby_recovery
child_node_disconnect
child_node_reconnect
child_node_new_connect
child_nodes_disconnect_command
Note that under some circumstances (e.g. when no replication cluster primary could be located), it will not be possible to write an entry into the repmgr.events table, in which case executing a script via event_notification_command can serve as a fallback by generating some form of notification.
repmgr is updated regularly with minor releases (e.g. 4.0.1 to 4.0.2) containing bugfixes and other minor improvements. Any substantial new functionality will be included in a major release (e.g. 4.0 to 4.1).
From version 4, repmgr consists of three elements:
With minor releases, usually changes are only made to the repmgr and repmgrd executables. In this case, the upgrade is quite straightforward, and is simply a case of installing the new version, and restarting repmgrd (if running).
For major releases, the repmgr PostgreSQL extension will need to be updated to the latest version. Additionally, if the shared library module has been updated (this is sometimes, but not always the case), PostgreSQL itself will need to be restarted on each node.
Always check the release notes for every release as they may contain upgrade instructions particular to individual versions.
A minor release upgrade involves updating repmgr from one minor release to another minor release within the same major release (e.g. 5.3.1 to 5.3.2). An upgrade between minor releases of differing major releases (e.g. 5.2.1 to 5.3.2) is a major upgrade.
The process for installing minor version upgrades is quite straightforward:
Some packaging systems (e.g. Debian/Ubuntu) may restart repmgrd as part of the package upgrade process.
Minor version upgrades can be performed in any order on the nodes in the replication cluster.
A PostgreSQL restart is usually not required for minor version upgrades within the same major version (e.g. 5.3.1 to 5.3.2). Be sure to check the release notes.
The same repmgr "major version" (e.g. 5.3) must be installed on all nodes in the replication cluster. While it's possible to have differing repmgr "minor versions" (e.g. 5.3.1 and 5.3.2) on different nodes, we strongly recommend updating all nodes to the latest minor version.
"major version" upgrades need to be planned more carefully, as they may include changes to the repmgr metadata (which need to be propagated from the primary to all standbys) and/or changes to the shared object file used by repmgrd (which require a PostgreSQL restart).
With this in mind, the following general steps should be followed:
If running a systemd-based Linux distribution, execute (as root, or with appropriate sudo permissions):
systemctl daemon-reload
If the repmgr shared library module has been updated (check the release notes!), restart PostgreSQL, then repmgrd (if in use), on each node. The order in which this is applied to individual nodes is not critical, and it's also fine to restart PostgreSQL on all nodes first before starting repmgrd.
Note that if the upgrade requires a PostgreSQL restart, repmgrd will only function correctly once all nodes have been restarted.
On the primary node, execute ALTER EXTENSION repmgr UPDATE in the database where repmgr is installed.
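For example, using psql (assuming the repmgr database and database user are both named repmgr):
psql -U repmgr -d repmgr -c 'ALTER EXTENSION repmgr UPDATE'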
If the repmgr upgrade requires a PostgreSQL restart, combine the repmgr upgrade with a PostgreSQL minor version upgrade, which will require a restart in any case.
New PostgreSQL minor versions are usually released every couple of months; see the Roadmap for the current schedule.
From repmgr 4.2, once the upgrade is complete, execute the repmgr service status command (repmgr 4.2 - 4.4: repmgr daemon status) on any node to show an overview of the status of repmgrd on all nodes.
pg_upgrade requires that if any functions are dependent on a shared library, this library must be present in both the old and new installations before pg_upgrade can be executed.
To minimize the risk of any upgrade issues (particularly if an upgrade to a new major repmgr version is involved), we recommend upgrading repmgr on the old server before running pg_upgrade to ensure that old and new versions are the same.
This issue applies to any PostgreSQL extension which has dependencies on a shared library.
For further details please see the pg_upgrade documentation.
If replication slots are in use, bear in mind these will not be recreated by pg_upgrade. These will need to be recreated manually.
Use repmgr node check to determine which replication slots need to be recreated.
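For example, a missing physical slot could be recreated on the relevant upstream node with (assuming the slot name repmgr_slot_2, as recorded in repmgr.nodes):
SELECT pg_create_physical_replication_slot('repmgr_slot_2');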
If you are intending to upgrade a standby using the rsync method described in the pg_upgrade documentation, you must ensure the standby's replication configuration is present and correct before starting the standby.
Use repmgr standby clone --replication-conf-only to generate the correct replication configuration.
If upgrading from PostgreSQL 11 or earlier, be sure to delete recovery.conf, if present, otherwise PostgreSQL will refuse to start.
The upgrade process consists of two steps:
converting the repmgr.conf configuration files
upgrading the repmgr schema using CREATE EXTENSION (PostgreSQL 12 and earlier)
A script is provided to assist with converting repmgr.conf.
The schema upgrade (which converts the repmgr metadata into a packaged PostgreSQL extension) is normally carried out automatically when the repmgr extension is created.
The shared library has been renamed from repmgr_funcs to repmgr - if it's set in shared_preload_libraries in postgresql.conf, it will need to be updated to the new name:
shared_preload_libraries = 'repmgr'
With a completely new repmgr version, we've taken the opportunity to rename some configuration items for clarity and consistency, both between the configuration file and the column names in repmgr.nodes (e.g. node to node_id), and also for consistency with PostgreSQL naming conventions (e.g. loglevel to log_level).
Other configuration items have been changed to command line options, and vice versa, e.g. to avoid hard-coding items such as a node's upstream ID, which might change over time.
repmgr will issue a warning about deprecated/altered options.
Following parameters have been added:
data_directory
: this is mandatory and must
contain the path to the node's data directorymonitoring_history
: this replaces the
repmgrd command line option
--monitoring-history
Following parameters have been renamed:
Table 10.1. Parameters renamed in repmgr4
repmgr3                 | repmgr4
------------------------+--------------------
node                    | node_id
loglevel                | log_level
logfacility             | log_facility
logfile                 | log_file
barman_server           | barman_host
master_response_timeout | async_query_timeout
From repmgr 4, barman_server refers to the server configured in Barman (in repmgr 3, the deprecated cluster parameter was used for this); the physical Barman hostname is configured with barman_host (see Section 5.1.1 for details).
The following parameters have been removed:
cluster: no longer required and will be ignored
upstream_node: replaced by the command-line parameter --upstream-node-id
To assist with conversion of repmgr.conf files, a Perl script is provided in contrib/convert-config.pl. Use it like this:
$ ./convert-config.pl /etc/repmgr.conf
node_id=2
node_name='node2'
conninfo='host=node2 dbname=repmgr user=repmgr connect_timeout=2'
pg_ctl_options='-l /var/log/postgres/startup.log'
rsync_options='--exclude=postgresql.local.conf --archive'
log_level='INFO'
pg_basebackup_options='--no-slot'
data_directory=''
The converted file is printed to STDOUT and the original file is not changed.
Please note that the conversion script will add an empty placeholder parameter for data_directory, which is a required parameter from repmgr 4. This must be manually modified to contain the correct data directory.
Ensure repmgrd is not running, and that no cron jobs execute the repmgr binary.
Install the latest repmgr package; any repmgr 3.x packages should be uninstalled (if not automatically uninstalled already by your packaging system).
If you don't care about any data from the existing repmgr installation (e.g. the contents of the events and monitoring tables), the following steps can be skipped; proceed to Section 10.3.4.
If your repmgr version is 3.1.1 or earlier, you will need to update the schema to the latest version in the 3.x series (3.3.2) before converting the installation to repmgr 4.
To do this, apply the following upgrade scripts as appropriate for your current version:
For more details see the repmgr 3 upgrade notes.
In the database used by the existing repmgr installation, execute:
CREATE EXTENSION repmgr FROM unpackaged
This will move and convert all objects from the existing schema into the new, standard repmgr schema.
There must be only one schema matching repmgr_% in the database, otherwise this step may not work.
Beginning with PostgreSQL 13, the CREATE EXTENSION ... FROM unpackaged syntax is no longer available. In the unlikely event you have ended up with an installation running PostgreSQL 13 or later and containing the legacy repmgr schema, there is no convenient way of upgrading this; instead you'll need to re-register the nodes as detailed in the following section, which will create the repmgr extension automatically.
Any historical data you wish to retain (e.g. the contents of the events and monitoring tables) will need to be exported manually.
This is necessary to update the repmgr metadata with some additional items.
On the primary node, execute e.g.:
repmgr primary register -f /etc/repmgr.conf --force
If not already present (e.g. after executing CREATE EXTENSION repmgr FROM unpackaged), the repmgr extension will be automatically created by repmgr primary register.
On each standby node, execute e.g.
repmgr standby register -f /etc/repmgr.conf --force
Check the data is updated as expected by examining the repmgr.nodes table; restart repmgrd if required.
Once the cluster has been registered with the current repmgr version, the legacy repmgr_$cluster schema can be dropped at any time with:
DROP SCHEMA repmgr_$cluster CASCADE
(substituting $cluster with the value of the clustername variable used in repmgr 3.x).
repmgrd ("replication manager daemon
")
is a management and monitoring daemon which runs
on each node in a replication cluster. It can automate actions such as
failover and updating standbys to follow the new primary, as well as
providing monitoring information about the state of each standby.
repmgrd is designed to be straightforward to set up and does not require additional external infrastructure.
Functionality provided by repmgrd includes, among other things, the ability to pause repmgrd operation on all nodes with a single command.
To demonstrate automatic failover, set up a 3-node replication cluster (one primary and two standbys streaming directly from the primary) so that the cluster looks something like this:
$ repmgr -f /etc/repmgr.conf cluster show --compact
 ID | Name  | Role    | Status    | Upstream | Location | Prio.
----+-------+---------+-----------+----------+----------+-------
 1  | node1 | primary | * running |          | default  | 100
 2  | node2 | standby |   running | node1    | default  | 100
 3  | node3 | standby |   running | node1    | default  | 100
See section Required configuration for automatic failover for an example of minimal repmgr.conf file settings suitable for use with repmgrd.
Start repmgrd on each standby and verify that it's running by examining the log output, which at log level INFO will look like this:
[2019-08-15 07:14:42] [NOTICE] repmgrd (repmgrd 5.0) starting up
[2019-08-15 07:14:42] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr connect_timeout=2"
INFO: set_repmgrd_pid(): provided pidfile is /var/run/repmgr/repmgrd-12.pid
[2019-08-15 07:14:42] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2019-08-15 07:14:42] [INFO] monitoring connection to upstream node "node1" (ID: 1)
Each repmgrd should also have recorded its successful startup as an event:
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
 Node ID | Name  | Event         | OK | Timestamp           | Details
---------+-------+---------------+----+---------------------+---------------------------------------------------------
 3       | node3 | repmgrd_start | t  | 2019-08-15 07:14:42 | monitoring connection to upstream node "node1" (ID: 1)
 2       | node2 | repmgrd_start | t  | 2019-08-15 07:14:41 | monitoring connection to upstream node "node1" (ID: 1)
 1       | node1 | repmgrd_start | t  | 2019-08-15 07:14:39 | monitoring cluster primary "node1" (ID: 1)
Now stop the current primary server with e.g.:
pg_ctl -D /var/lib/postgresql/data -m immediate stop
This will force the primary to shut down straight away, aborting all processes and transactions. This will cause a flurry of activity in the repmgrd log files as each repmgrd detects the failure of the primary and a failover decision is made. This is an extract from the log of a standby server (node2) which has promoted itself to new primary after the failure of the original primary (node1):
[2019-08-15 07:27:50] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2019-08-15 07:27:50] [INFO] checking state of node 1, 1 of 3 attempts
[2019-08-15 07:27:50] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-08-15 07:27:55] [INFO] checking state of node 1, 2 of 3 attempts
[2019-08-15 07:27:55] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-08-15 07:28:00] [INFO] checking state of node 1, 3 of 3 attempts
[2019-08-15 07:28:00] [WARNING] unable to reconnect to node 1 after 3 attempts
[2019-08-15 07:28:00] [INFO] primary and this node have the same location ("default")
[2019-08-15 07:28:00] [INFO] local node's last receive lsn: 0/900CBF8
[2019-08-15 07:28:00] [INFO] node 3 last saw primary node 12 second(s) ago
[2019-08-15 07:28:00] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/900CBF8
[2019-08-15 07:28:00] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
[2019-08-15 07:28:00] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-08-15 07:28:00] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-08-15 07:28:00] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2019-08-15 07:28:00] [INFO] promote_command is: "/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote"
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
[2019-08-15 07:28:01] [INFO] 3 followers to notify
[2019-08-15 07:28:01] [NOTICE] notifying node "node3" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2019-08-15 07:28:01] [INFO] switching to primary monitoring mode
[2019-08-15 07:28:01] [NOTICE] monitoring cluster primary "node2" (ID: 2)
The cluster status will now look like this, with the original primary (node1) marked as inactive, and standby node3 now following the new primary (node2):
$ repmgr -f /etc/repmgr.conf cluster show --compact
 ID | Name  | Role    | Status    | Upstream | Location | Prio.
----+-------+---------+-----------+----------+----------+-------
 1  | node1 | primary | - failed  |          | default  | 100
 2  | node2 | primary | * running |          | default  | 100
 3  | node3 | standby |   running | node2    | default  | 100
repmgr cluster event will display a summary of what happened to each server during the failover:
$ repmgr -f /etc/repmgr.conf cluster event
 Node ID | Name  | Event                    | OK | Timestamp           | Details
---------+-------+--------------------------+----+---------------------+-------------------------------------------------------------
 3       | node3 | repmgrd_failover_follow  | t  | 2019-08-15 07:38:03 | node 3 now following new upstream node 2
 3       | node3 | standby_follow           | t  | 2019-08-15 07:38:02 | standby attached to upstream node "node2" (ID: 2)
 2       | node2 | repmgrd_reload           | t  | 2019-08-15 07:38:01 | monitoring cluster primary "node2" (ID: 2)
 2       | node2 | repmgrd_failover_promote | t  | 2019-08-15 07:38:01 | node 2 promoted to primary; old primary 1 marked as failed
 2       | node2 | standby_promote          | t  | 2019-08-15 07:38:01 | server "node2" (ID: 2) was successfully promoted to primary
repmgrd is a management and monitoring daemon which runs on each node in a replication cluster. It can automate actions such as failover and updating standbys to follow the new primary, as well as providing monitoring information about the state of each standby.
A witness server is a normal PostgreSQL instance which is not part of the streaming replication cluster; its purpose is, if a failover situation occurs, to provide proof that it is the primary server itself which is unavailable, rather than e.g. a network split between different physical locations.
A typical use case for a witness server is a two-node streaming replication setup, where the primary and standby are in different locations (data centres). By creating a witness server in the same location (data centre) as the primary, if the primary becomes unavailable it's possible for the standby to decide whether it can promote itself without risking a "split brain" scenario: if it can't see either the witness or the primary server, it's likely there's a network-level interruption and it should not promote itself. If it can see the witness but not the primary, this proves there is no network interruption and the primary itself is unavailable, and it can therefore promote itself (and ideally take action to fence the former primary).
Never install a witness server on the same physical host as another node in the replication cluster managed by repmgr - it's essential the witness is not affected in any way by failure of another node.
For more complex replication scenarios, e.g. with multiple datacentres, it may be preferable to use location-based failover, which ensures that only nodes in the same location as the primary will ever be promotion candidates; see Handling network splits with repmgrd for more details.
A witness server will only be useful if repmgrd is in use.
To create a witness server, set up a normal PostgreSQL instance on a server in the same physical location as the cluster's primary server.
This instance should not be on the same physical host as the primary server, as otherwise if the primary server fails due to hardware issues, the witness server will be lost too.
A PostgreSQL instance can only accommodate a single witness server.
If you are planning to use a single server to support more than one witness server, a separate PostgreSQL instance is required for each witness server in use.
The witness server should be configured in the same way as a normal repmgr node; see section Configuration.
Register the witness server with repmgr witness register. This will create the repmgr extension on the witness server, and make a copy of the repmgr metadata.
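For example (a minimal sketch; the host name node1 is assumed here to be the cluster's current primary):

$ repmgr -f /etc/repmgr.conf witness register -h node1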
As the witness server is not part of the replication cluster, further changes to the repmgr metadata will be synchronised by repmgrd.
Once the witness server has been configured, repmgrd should be started.
To unregister a witness server, use repmgr witness unregister.
A common pattern for replication cluster setups is to spread servers over more than one datacentre. This can provide benefits such as geographically distributed read replicas and disaster recovery (DR) capability. However this also means there is a risk of disconnection at network level between datacentre locations, which would result in a split-brain scenario if servers in a secondary data centre were no longer able to see the primary in the main data centre and promoted a standby among themselves.
repmgr enables provision of a "witness server" to artificially create a quorum of servers in a particular location, ensuring that nodes in another location will not elect a new primary if they are unable to see the majority of nodes. However this approach does not scale well, particularly with more complex replication setups, e.g. where the majority of nodes are located outside of the primary datacentre. It also means the witness node needs to be managed as an extra PostgreSQL instance outside of the main replication cluster, which adds administrative and programming complexity.
repmgr 4 introduces the concept of location: each node is associated with an arbitrary location string (default is "default"); this is set in repmgr.conf, e.g.:
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc1'
In a failover situation, repmgrd will check if any servers in the same location as the current primary node are visible. If not, repmgrd will assume a network interruption and not promote any node in any other location (it will however enter degraded monitoring mode until a primary becomes visible).
In more complex replication setups, particularly where replication occurs between multiple datacentres, it's possible that some but not all standbys get cut off from the primary (but not from the other standbys).
In this situation it's normally not desirable for any of the standbys which have been cut off to initiate a failover, as the primary is still functioning and some standbys remain connected to it. Beginning with repmgr 4.4, it is possible for the affected standbys to build a consensus about whether the primary is still available to some standbys ("primary visibility consensus"). This is done by polling each standby (and the witness, if present) for the time it last saw the primary; if any have seen the primary very recently, it's reasonable to infer that the primary is still available and a failover should not be started.
The time the primary was last seen by each node can be checked by executing repmgr service status (repmgr 4.2 - 4.4: repmgr daemon status), which includes this in its output, e.g.:
$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 27259 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 27272 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 27282 | no      | 0 second(s) ago
 4  | node4 | witness | * running | node1    | running | 27298 | no      | 1 second(s) ago
To enable this functionality, in repmgr.conf set:

primary_visibility_consensus=true

primary_visibility_consensus must be set to true on all nodes for it to be effective.
The following sample repmgrd log output demonstrates the behaviour in a situation where one of three standbys is no longer able to connect to the primary, but can connect to the two other standbys ("sibling nodes"):
[2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
[2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
[2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
[2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
[2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
[2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
[2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
[2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
[2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
[2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
[2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
[2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
[2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
[2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
[2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
 - node "node3" (ID: 3): 1 second(s) ago
 - node "node4" (ID: 4): 0 second(s) ago
[2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
[2019-05-17 05:36:12] [NOTICE] election cancelled
[2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state
In this situation it will cancel the failover and enter degraded monitoring mode, waiting for the primary to reappear.
If standby_disconnect_on_failover is set to true in repmgr.conf, in a failover situation repmgrd will forcibly disconnect the local node's WAL receiver, and wait for the WAL receiver on all sibling nodes to be disconnected, before making a failover decision.
standby_disconnect_on_failover is available with PostgreSQL 9.5 and later. Until PostgreSQL 14 this requires that the repmgr database user is a superuser. From PostgreSQL 15, a specific ALTER SYSTEM privilege can be granted to the repmgr database user, e.g. with GRANT ALTER SYSTEM ON PARAMETER wal_retrieve_retry_interval TO repmgr.
By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes are receiving data from the primary and their LSN location will be static.
standby_disconnect_on_failover must be set to the same value on all nodes.
Note that when using standby_disconnect_on_failover there will be a delay of 5 seconds, plus however many seconds it takes to confirm the WAL receiver is disconnected, before repmgrd proceeds with the failover decision.
repmgrd will wait up to sibling_nodes_disconnect_timeout seconds (default: 30) to confirm that the WAL receiver on all sibling nodes has been disconnected before proceeding with the failover operation. If the timeout is reached, the failover operation will go ahead anyway.
Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
If using standby_disconnect_on_failover, we recommend that the primary_visibility_consensus option is also used.
From repmgr 4.3, repmgr makes it possible to provide a script to repmgrd which, in a failover situation, will be executed by the promotion candidate (the node which has been selected to be the new primary) to confirm whether the node should actually be promoted.
To use this, set failover_validation_command in repmgr.conf to a script executable by the postgres system user, e.g.:
failover_validation_command=/path/to/script.sh %n
The %n parameter will be replaced with the node ID when the script is executed. A number of other parameters are also available; see section "Optional configuration for automatic failover" for details.
This script must return an exit code of 0 to indicate the node should promote itself. Any other value will result in the promotion being aborted and the election rerun. There is a pause of election_rerun_interval seconds before the election is rerun.
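For illustration only, a minimal validation script might look like the following; the site-specific check (a hypothetical helper /usr/local/bin/check-app-vip.sh) is an assumption and should be replaced with whatever criteria are appropriate for your environment:

#!/bin/sh
# Hypothetical failover validation script; receives the candidate node ID (%n).
# Exit 0 to allow promotion; any other value aborts the promotion and reruns the election.
NODE_ID=$1
echo "Node ID: ${NODE_ID}"
# Site-specific check goes here (placeholder helper, not part of repmgr):
if /usr/local/bin/check-app-vip.sh; then
    exit 0
fi
exit 1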
Sample repmgrd log file output during which the failover validation script rejects the proposed promotion candidate:
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command: Node ID: 2
[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
INFO: node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")
Cascading replication - where a standby can connect to an upstream node and not the primary server itself - was introduced in PostgreSQL 9.2. repmgr and repmgrd support cascading replication by keeping track of the relationship between standby servers - each node record is stored with the node id of its upstream ("parent") server (except of course the primary server).
In a failover situation where the primary node fails and a top-level standby is promoted, a standby connected to another standby will not be affected and will continue working as normal (even if the upstream standby it's connected to becomes the primary node). If however the node's direct upstream fails, the "cascaded standby" will attempt to reconnect to that node's parent (unless failover is set to manual in repmgr.conf).
This functionality is available in repmgr 4.4 and later.
When running on the primary node, repmgrd can monitor connections and in particular disconnections by its attached child nodes (standbys, and if in use, the witness server), and optionally execute a custom command if certain criteria are met (such as the number of attached nodes falling to zero following a failover to a new primary); this command can be used for example to "fence" the node and ensure it is isolated from any applications attempting to access the replication cluster.
Currently repmgrd can only detect disconnections of streaming replication standbys and cannot determine whether a standby has disconnected and fallen back to archive recovery.
See the section on caveats below.
repmgrd monitors attached child nodes and decides whether to invoke the user-defined command based on the following process and criteria:
Every few seconds (defined by the configuration parameter child_nodes_check_interval; default: 5 seconds, a value of 0 disables this altogether), repmgrd queries the pg_stat_replication system view and compares the nodes present there against the list of nodes registered with repmgr which should be attached to the primary.
If a witness server is in use, repmgrd connects to it and checks which upstream node it is following.
If a child node (standby) is no longer present in pg_stat_replication, repmgrd notes the time it detected the node's absence, and additionally generates a child_node_disconnect event.
If a witness server is in use, and it is no longer following the primary, or is not reachable at all, repmgrd notes the time it detected the node's absence, and additionally generates a child_node_disconnect event.
If a child node (standby) which was absent from pg_stat_replication reappears, repmgrd clears the time it detected the node's absence, and additionally generates a child_node_reconnect event.
If a witness server is in use which was previously not reachable or not following the primary node, and it has since become reachable and is following the primary node, repmgrd clears the time it detected the node's absence, and additionally generates a child_node_reconnect event.
If an entirely new child node (standby or witness) is detected, repmgrd adds it to its internal list and additionally generates a child_node_new_connect event.
If the child_nodes_disconnect_command parameter is set in repmgr.conf, repmgrd will then loop through all child nodes. If it determines that insufficient child nodes are connected, and a minimum of child_nodes_disconnect_timeout seconds (default: 30) has elapsed since the last node became disconnected, repmgrd will then execute the child_nodes_disconnect_command script.
By default, the child_nodes_disconnect_command will only be executed if all child nodes are disconnected. If child_nodes_connected_min_count is set, the child_nodes_disconnect_command script will be triggered if the number of connected child nodes falls below the specified value (e.g. if set to 2, the script will be triggered if only one child node is connected). Alternatively, if child_nodes_disconnect_min_count is set and more than that number of child nodes disconnects, the script will be triggered.
By default, a witness node, if in use, will not be counted as a child node for the purposes of determining whether to execute child_nodes_disconnect_command. To enable the witness node to be counted as a child node, set child_nodes_connected_include_witness in repmgr.conf to true (and reload the configuration if repmgrd is running).
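For example, a repmgr.conf excerpt for a primary which should run a fencing script once fewer than two standbys remain attached might look like this (the script path is an assumption; the %p placeholder, described below, is replaced with the ID of the node executing the command):

child_nodes_check_interval=5
child_nodes_connected_min_count=2
child_nodes_disconnect_timeout=30
child_nodes_disconnect_command='/usr/local/bin/fence-node.sh %p'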
Note that child nodes which are not attached when repmgrd starts will not be considered as missing, as repmgrd cannot know why they are not attached.
This example shows typical repmgrd log output from a three-node cluster (primary and two child nodes), with child_nodes_connected_min_count set to 2.
repmgrd on the primary has started up, while two child nodes are being provisioned:
[2019-04-24 15:25:33] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:25:35] [NOTICE] new node "node2" (ID: 2) has connected
[2019-04-24 15:25:35] [NOTICE] 1 (of 1) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:25:35] [INFO] no child nodes have detached since repmgrd startup
(...)
[2019-04-24 15:25:44] [NOTICE] new node "node3" (ID: 3) has connected
[2019-04-24 15:25:46] [INFO] monitoring primary node "node1" (ID: 1) in normal state
(...)
One of the child nodes has disconnected; repmgrd is now waiting child_nodes_disconnect_timeout seconds before executing child_nodes_disconnect_command:
[2019-04-24 15:28:11] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:17] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
[2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set to 30 seconds
(...)
child_nodes_disconnect_command is executed once:
[2019-04-24 15:28:49] [INFO] most recently detached child node was 3 (ca. 30 seconds ago), triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:49] [INFO] "child_nodes_disconnect_command" is: "/usr/bin/fence-all-the-things.sh"
[2019-04-24 15:28:51] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:51] [INFO] "child_nodes_disconnect_command" was previously executed, taking no action
The following caveats should be considered if you are intending to use this functionality.
If a child node is configured to use archive recovery, it's possible that the child node will disconnect from the primary node and fall back to archive recovery. In this case repmgrd will nevertheless register a node disconnection.
repmgr relies on the application_name in the child node's primary_conninfo string being the same as the node name defined in the node's repmgr.conf file. Furthermore, this application_name must be unique across the replication cluster.
If a custom application_name is used, or the application_name is not unique across the replication cluster, repmgr will not be able to reliably monitor child node connections.
The following parameters, set in repmgr.conf, control how child node disconnection monitoring operates.
child_nodes_check_interval

Interval (in seconds) after which repmgrd queries the pg_stat_replication system view and compares the nodes present there against the list of nodes registered with repmgr which should be attached to the primary.

Default is 5 seconds; a value of 0 disables this check altogether.
child_nodes_disconnect_command

User-definable script to be executed when repmgrd determines that an insufficient number of child nodes are connected. By default the script is executed when no child nodes are connected, but the execution threshold can be modified by setting one of child_nodes_connected_min_count or child_nodes_disconnect_min_count (see below).

The child_nodes_disconnect_command script can be any user-defined script or program. It must be able to be executed by the system user under which the PostgreSQL server itself runs (usually postgres).
If child_nodes_disconnect_command is not set, no action will be taken.

If specified, the following format placeholder will be substituted when executing child_nodes_disconnect_command:

%p

ID of the node executing the child_nodes_disconnect_command script.
The child_nodes_disconnect_command script will only be executed once while the criteria for its execution are met. If the criteria cease to be met (i.e. some child nodes have reconnected) and are subsequently met again, the script will be executed again.
The child_nodes_disconnect_command script will not be executed if repmgrd is paused.
child_nodes_disconnect_timeout

If repmgrd determines that an insufficient number of child nodes are connected, it will wait the specified number of seconds before executing the child_nodes_disconnect_command.

Default: 30 seconds.
child_nodes_connected_min_count

If the number of connected child nodes falls below the number specified in this parameter, the child_nodes_disconnect_command script will be executed. For example, if child_nodes_connected_min_count is set to 2, the child_nodes_disconnect_command script will be executed if one or no child nodes are connected.

Note that child_nodes_connected_min_count overrides any value set in child_nodes_disconnect_min_count.

If neither child_nodes_connected_min_count nor child_nodes_disconnect_min_count is set, the child_nodes_disconnect_command script will be executed when no child nodes are connected.

A witness node, if in use, will not be counted as a child node unless child_nodes_connected_include_witness is set to true.
child_nodes_disconnect_min_count

If the number of disconnected child nodes exceeds the number specified in this parameter, the child_nodes_disconnect_command script will be executed. For example, if child_nodes_disconnect_min_count is set to 2, the child_nodes_disconnect_command script will be executed if more than two child nodes are disconnected.

Note that any value set in child_nodes_disconnect_min_count will be overridden by child_nodes_connected_min_count.

If neither child_nodes_connected_min_count nor child_nodes_disconnect_min_count is set, the child_nodes_disconnect_command script will be executed when no child nodes are connected.

A witness node, if in use, will not be counted as a child node unless child_nodes_connected_include_witness is set to true.
child_nodes_connected_include_witness

Whether to count the witness node (if in use) as a child node when determining whether to execute child_nodes_disconnect_command.

Defaults to false.
The following event notifications may be generated:
child_node_disconnect
This event is generated after repmgrd detects that a child node is no longer streaming from the primary node.
Example:
$ repmgr cluster event --event=child_node_disconnect
 Node ID | Name  | Event                 | OK | Timestamp           | Details
---------+-------+-----------------------+----+---------------------+----------------------------------------
 1       | node1 | child_node_disconnect | t  | 2019-04-24 12:41:36 | node "node3" (ID: 3) has disconnected
child_node_reconnect
This event is generated after repmgrd detects that a child node has resumed streaming from the primary node.
Example:
$ repmgr cluster event --event=child_node_reconnect
 Node ID | Name  | Event                | OK | Timestamp           | Details
---------+-------+----------------------+----+---------------------+--------------------------------------------------------
 1       | node1 | child_node_reconnect | t  | 2019-04-24 12:42:19 | node "node3" (ID: 3) has reconnected after 42 seconds
child_node_new_connect
This event is generated after repmgrd detects that a new child node has been registered with repmgr and has connected to the primary.
Example:
$ repmgr cluster event --event=child_node_new_connect
 Node ID | Name  | Event                  | OK | Timestamp           | Details
---------+-------+------------------------+----+---------------------+------------------------------------------
 1       | node1 | child_node_new_connect | t  | 2019-04-24 12:41:30 | new node "node3" (ID: 3) has connected
child_nodes_disconnect_command

This event is generated after repmgrd detects that sufficient child nodes have been disconnected for a sufficient amount of time to trigger execution of the child_nodes_disconnect_command.
Example:
$ repmgr cluster event --event=child_nodes_disconnect_command
 Node ID | Name  | Event                          | OK | Timestamp           | Details
---------+-------+--------------------------------+----+---------------------+---------------------------------------------------------
 1       | node1 | child_nodes_disconnect_command | t  | 2019-04-24 13:08:17 | "child_nodes_disconnect_command" successfully executed
repmgrd is a daemon process which runs on each PostgreSQL node, monitoring the local node, and (unless it's the primary node) the upstream server (the primary server or with cascading replication, another standby) which it's connected to.
repmgrd can be configured to provide failover capability in case the primary or upstream node becomes unreachable, and/or provide monitoring data to the repmgr metadatabase.
From repmgr 4.4, when running on the primary node, repmgrd can also monitor standby disconnections/reconnections (see Monitoring standby disconnections on the primary).
To use repmgrd, its associated function library must be included via postgresql.conf with:
shared_preload_libraries = 'repmgr'
Changing this setting requires a restart of PostgreSQL; for more details see the PostgreSQL documentation.
The following configuration options apply to repmgrd in all circumstances:
monitor_interval_secs

The interval (in seconds, default: 2) at which to check the availability of the upstream node.
connection_check_type

The option connection_check_type is used to select the method repmgrd uses to determine whether the upstream node is available. Possible values are:

ping (default) - uses PQping() to determine server availability

connection - determines server availability by attempting to make a new connection to the upstream node

query - determines server availability by executing an SQL statement on the node via the existing connection. The query is a minimal throwaway query - SELECT 1 - which is used to determine that the server can accept queries.
reconnect_attempts

The number of attempts (default: 6) which will be made to reconnect to an unreachable upstream node before initiating a failover. There will be an interval of reconnect_interval seconds between each reconnection attempt.
reconnect_interval

Interval (in seconds, default: 10) between attempts to reconnect to an unreachable upstream node. The number of reconnection attempts is defined by the parameter reconnect_attempts.
degraded_monitoring_timeout

Interval (in seconds) after which repmgrd will terminate if either of the servers being monitored (the local node and/or the upstream node) is no longer available (degraded monitoring mode).

-1 (default) disables this timeout completely.
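Putting these together, a repmgr.conf excerpt showing the default values described above would look like:

monitor_interval_secs=2
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
degraded_monitoring_timeout=-1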
See also repmgr.conf.sample for an annotated sample configuration file.
The following repmgrd options must be set in repmgr.conf:
failover
promote_command
follow_command
Example:
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
Details of each option are as follows:
failover

failover can be one of automatic or manual.

If failover is set to manual, repmgrd will not take any action if a failover situation is detected, and the node may need to be modified manually (e.g. by executing repmgr standby follow).
promote_command

The program or script defined in promote_command will be executed in a failover situation when repmgrd determines that the current node is to become the new primary node. Normally promote_command is set as repmgr's repmgr standby promote command.

When invoking repmgr standby promote (either directly via the promote_command, or in a script called via promote_command), --siblings-follow must not be included as a command line option for repmgr standby promote.
It is also possible to provide a shell script to e.g. perform user-defined tasks before promoting the current node. In this case the script must at some point execute repmgr standby promote to promote the node; if this is not done, repmgr metadata will not be updated and repmgr will no longer function reliably.
Example:
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
Note that the --log-to-file option will cause output generated by the repmgr command, when executed by repmgrd, to be logged to the same destination configured to receive log output for repmgrd.

repmgr will not apply pg_bindir when executing promote_command or follow_command; these can be user-defined scripts, so they must always be specified with the full path.
follow_command

The program or script defined in follow_command will be executed in a failover situation when repmgrd determines that the current node is to follow the new primary node. Normally follow_command is set as repmgr's repmgr standby follow command.
The follow_command parameter should provide the --upstream-node-id=%n option to repmgr standby follow; the %n will be replaced by repmgrd with the ID of the new primary node. If this is not provided, repmgr standby follow will attempt to determine the new primary by itself, but if the original primary comes back online after the new primary is promoted, there is a risk that repmgr standby follow will result in the node continuing to follow the original primary.
It is also possible to provide a shell script to e.g. perform user-defined tasks before attaching the current node to the new primary. In this case the script must at some point execute repmgr standby follow to attach the node; if this is not done, repmgr metadata will not be updated and repmgr will no longer function reliably.
Example:
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
Note that the --log-to-file option will cause output generated by the repmgr command, when executed by repmgrd, to be logged to the same destination configured to receive log output for repmgrd.

repmgr will not apply pg_bindir when executing promote_command or follow_command; these can be user-defined scripts, so they must always be specified with the full path.
The following configuration options can be used to fine-tune automatic failover:
priority

Indicates a preferred priority (default: 100) for promoting nodes. Note that the priority setting is only applied if two or more nodes are determined as promotion candidates; in that case the node with the higher priority is selected.

A value of zero will always prevent the node being promoted to primary, even if there is no other promotion candidate.
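For example, to ensure a particular node is never promoted (say, one reserved for reporting queries - an illustrative scenario), set in that node's repmgr.conf:

priority=0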
failover_validation_command
User-defined script to execute for an external mechanism to validate the failover decision made by repmgrd.
This option must be identically configured on all nodes.
One or more of the following parameter placeholders may be provided, which will be replaced by repmgrd with the appropriate value:
%n: node ID
%a: node name
%v: number of visible nodes
%u: number of shared upstream nodes
%t: total number of nodes
See also: Failover validation.
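As an illustration, a command passing several of these placeholders to a hypothetical validation script might look like:

failover_validation_command='/usr/local/bin/failover-validation.sh --node-id=%n --visible-nodes=%v --total-nodes=%t'

How the script interprets these arguments is entirely up to the script itself.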
primary_visibility_consensus

If true, only continue with failover if no standbys (or the witness server, if present) have seen the primary node recently.

This option must be identically configured on all nodes.
always_promote

Default: false.

If true, promote the local node even if its repmgr metadata is not up-to-date.

Normally repmgr expects its metadata (stored in the repmgr.nodes table) to be up-to-date so repmgrd can take the correct action during a failover. However it's possible that updates made on the primary may not have propagated to the standby (promotion candidate). In this case repmgrd will default to not promoting the standby. This behaviour can be overridden by setting always_promote to true.
standby_disconnect_on_failover

In a failover situation, disconnect the local node's WAL receiver.

This option is available from PostgreSQL 9.5 and later. It must be identically configured on all nodes. Additionally the repmgr user must be a superuser for this option. repmgrd will refuse to start if this option is set but either of these prerequisites is not met.

See also: Standby disconnection on failover.
repmgrd_exit_on_inactive_node

This parameter is available in repmgr 5.3 and later.

If a node was marked as inactive but is running, and this option is set to true, repmgrd will abort on startup. By default, repmgrd_exit_on_inactive_node is set to false, in which case repmgrd will set the node record to active on startup. Setting this parameter to true causes repmgrd to behave in the same way it did in repmgr 5.2 and earlier.
The following options can be used to further fine-tune failover behaviour. In practice it's unlikely these will need to be changed from their default values, but they are available as configuration options should the need arise.
election_rerun_interval

If failover_validation_command is set, and the command returns an error, pause the specified number of seconds (default: 15) before rerunning the election.
sibling_nodes_disconnect_timeout

If standby_disconnect_on_failover is true, the maximum length of time (in seconds, default: 30) to wait for other standbys to confirm they have disconnected their WAL receivers.
For further details and a reference implementation, see the separate document Fencing a failed master node with repmgrd and PgBouncer.
If using automatic failover, repmgrd will currently need to execute repmgr standby follow to restart PostgreSQL on standbys so that they follow the new primary. To ensure this happens smoothly, it's essential to provide the system/service restart command appropriate to your operating system via service_restart_command in repmgr.conf. If you don't do this, repmgrd will default to using pg_ctl, which can result in unexpected problems, particularly on systemd-based systems. For more details, see service command settings.
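A sketch for a systemd-based system, assuming a PostgreSQL service unit named postgresql-12 (the actual unit name varies by distribution and PostgreSQL version):

service_restart_command='sudo systemctl restart postgresql-12'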
If you are intending to use the repmgr daemon start and repmgr daemon stop commands, the following parameters must be set in repmgr.conf:
repmgrd_service_start_command
repmgrd_service_stop_command
Example (for repmgr with PostgreSQL 12 on CentOS 7):
repmgrd_service_start_command='sudo systemctl start repmgr12'
repmgrd_service_stop_command='sudo systemctl stop repmgr12'
For more details see the reference page for each command.
To enable monitoring, set:
monitoring_history=yes
in repmgr.conf.
Monitoring data is written at the interval defined by the option monitor_interval_secs (see above).
For more details on monitoring, see Storing monitoring data. For information on monitoring standby disconnections, see Monitoring standby disconnections on the primary.
To apply configuration file changes to a running repmgrd daemon, execute the operating system's repmgrd service reload command (see Package details for examples), or for instances which were manually started, execute kill -HUP, e.g. kill -HUP `cat /tmp/repmgrd.pid`.
Check the repmgrd log to see what changes were applied, or if any issues were encountered when reloading the configuration.
Note that only the following subset of configuration file parameters can be changed on a running repmgrd daemon:
async_query_timeout
child_nodes_check_interval
child_nodes_connected_include_witness
child_nodes_connected_min_count
child_nodes_disconnect_command
child_nodes_disconnect_min_count
child_nodes_disconnect_timeout
connection_check_type
conninfo
degraded_monitoring_timeout
event_notification_command
event_notifications
failover_validation_command
failover
follow_command
log_facility
log_file
log_level
log_status_interval
monitor_interval_secs
monitoring_history
primary_notification_timeout
primary_visibility_consensus
always_promote
promote_command
reconnect_attempts
reconnect_interval
retry_promote_interval_secs
repmgrd_standby_startup_timeout
sibling_nodes_disconnect_timeout
standby_disconnect_on_failover
The following set of configuration file parameters must be updated via repmgr standby register --force, as they require changes to the repmgr.nodes table so they are visible to all nodes in the replication cluster:
node_id
node_name
data_directory
location
priority
After executing repmgr standby register --force, repmgrd must be restarted for the changes to take effect.
If installed from a package, repmgrd can be started via the operating system's service command, e.g. in systemd using systemctl.
See appendix Package details for details of service commands for different distributions.
The commands repmgr daemon start and repmgr daemon stop can be used as convenience wrappers to start and stop repmgrd on the local node.
repmgr daemon start and repmgr daemon stop require that the appropriate start/stop commands are configured as repmgrd_service_start_command and repmgrd_service_stop_command in repmgr.conf.
repmgrd can be started manually like this:
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
and stopped with kill `cat /tmp/repmgrd.pid`. Adjust paths as appropriate.
repmgrd will generate a PID file by default. This is a behaviour change from previous versions (earlier than 4.1), where the PID file had to be explicitly specified with the command line parameter --pid-file.
The PID file can be specified in repmgr.conf with the configuration parameter repmgrd_pid_file.

It can also be specified on the command line (as in previous versions) with the command line parameter --pid-file. Note this will override any value set in repmgr.conf with repmgrd_pid_file. --pid-file may be deprecated in future releases.
If a PID file location was specified by the package maintainer, repmgrd will use that. This only applies if repmgr was installed from a package and the package maintainer has specified the PID file location.
If none of the above apply, repmgrd will create a PID file in the operating system's temporary directory (as determined by the environment variable TMPDIR, or, if that is not set, /tmp).
To prevent a PID file being generated at all, provide the command line option --no-pid-file.
To see which PID file repmgrd would use, execute repmgrd with the option --show-pid-file. repmgrd will not start if this option is provided. Note that the value shown is the file repmgrd would use next time it starts, and is not necessarily the PID file currently in use.
If repmgr was installed from Debian/Ubuntu packages, additional configuration is required before repmgrd is started as a daemon.
This is done via the file /etc/default/repmgrd, which by default looks like this:
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no

# configuration file (required)
#REPMGRD_CONF="/path/to/repmgr.conf"

# additional options
REPMGRD_OPTS="--daemonize=false"

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
Set REPMGRD_ENABLED to yes, and REPMGRD_CONF to the repmgr.conf file you are using.
See Debian/Ubuntu packages for details of the Debian/Ubuntu packages and typical file locations (including repmgr.conf).
From repmgrd 4.1, ensure REPMGRD_OPTS includes --daemonize=false, as daemonization is handled by the service command.
If using systemd, you may need to execute systemctl daemon-reload. Also, if you previously attempted to start repmgrd with systemctl start repmgrd, you'll need to execute systemctl stop repmgrd before managing it via the configured service commands.
The command repmgr service status provides an overview of the repmgrd daemon status (including pause status) on all nodes in the cluster.
From repmgr 5.3, repmgr node check --repmgrd can be used to check the status of repmgrd (including pause status) on the local node.
In addition to the repmgr configuration settings, parameters in the conninfo string influence how repmgr makes a network connection to PostgreSQL. In particular, if another server in the replication cluster is unreachable at network level, system network settings will influence the length of time it takes to determine that the connection is not possible.
In particular, explicitly setting a parameter for connect_timeout should be considered; the effective minimum value of 2 (seconds) will ensure that a connection failure at network level is reported as soon as possible, otherwise depending on the system settings (e.g. tcp_syn_retries in Linux) a delay of a minute or more is possible.
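For example (matching the conninfo style used elsewhere in this documentation):

conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'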
For further details on conninfo network connection parameters, see the PostgreSQL documentation.
To ensure the current repmgrd logfile (specified in repmgr.conf with the parameter log_file) does not grow indefinitely, configure your system's logrotate to regularly rotate it.
Sample configuration to rotate logfiles weekly with retention for up to 52 weeks and rotation forced if a file grows beyond 100Mb:
/var/log/repmgr/repmgrd.log {
    missingok
    compress
    rotate 52
    maxsize 100M
    weekly
    create 0600 postgres postgres
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}
In normal operation, repmgrd monitors the state of the PostgreSQL node it is running on, and will take appropriate action if problems are detected, e.g. (if so configured) promote the node to primary, if the existing primary has been determined as failed.
However, repmgrd is unable to distinguish between planned outages (such as performing a switchover or installing PostgreSQL maintenance releases), and an actual server outage. In versions prior to repmgr 4.2 it was necessary to stop repmgrd on all nodes (or at least on all nodes where repmgrd is configured for automatic failover) to prevent repmgrd from making unintentional changes to the replication cluster.
From repmgr 4.2, repmgrd can now be "paused", i.e. instructed not to take any action such as performing a failover. This can be done from any node in the cluster, removing the need to stop/restart each repmgrd individually.
For major PostgreSQL upgrades, e.g. from PostgreSQL 11 to PostgreSQL 12, repmgrd should be shut down completely and only started up once the repmgr packages for the new PostgreSQL major version have been installed.
In order to be able to pause/unpause repmgrd, PostgreSQL on each node must be accessible from the node where the pause/unpause operation is executed, using the conninfo string shown by repmgr cluster show.
These conditions are required for normal repmgr operation in any case.
To pause repmgrd, execute repmgr service pause (repmgr 4.2 - 4.4: repmgr daemon pause), e.g.:
$ repmgr -f /etc/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused
The state of repmgrd on each node can be checked with repmgr service status (repmgr 4.2 - 4.4: repmgr daemon status), e.g.:
$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status  | repmgrd | PID  | Paused?
----+-------+---------+---------+---------+------+---------
 1  | node1 | primary | running | running | 7851 | yes
 2  | node2 | standby | running | running | 7889 | yes
 3  | node3 | standby | running | running | 7918 | yes
If executing a switchover with repmgr standby switchover, repmgr will automatically pause/unpause the repmgrd service as part of the switchover process.
If the primary (in this example, node1) is stopped, repmgrd running on one of the standbys (here: node2) will react like this:
[2019-08-28 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2019-08-28 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2019-08-28 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2019-08-28 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2019-08-28 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2019-08-28 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2019-08-28 12:22:25] [NOTICE] node is paused
[2019-08-28 12:22:33] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2019-08-28 12:22:33] [DETAIL] repmgrd paused by administrator
[2019-08-28 12:22:33] [HINT] execute "repmgr service unpause" to resume normal failover mode
If the primary becomes available again (e.g. following a software upgrade), repmgrd will automatically reconnect, e.g.:
[2019-08-28 12:25:41] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 8 seconds, resuming monitoring
To unpause the repmgrd service, execute repmgr service unpause (repmgr 4.2 - 4.4: repmgr daemon unpause), e.g.:
$ repmgr -f /etc/repmgr.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
If the previous primary is no longer accessible when repmgrd is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using repmgr standby promote, and any standbys attached to the new primary with repmgr standby follow.

This is to prevent execution of repmgr service unpause resulting in the automatic promotion of a new primary, which may be a problem particularly in larger clusters, where repmgrd could select a different promotion candidate to the one intended by the administrator.
The pause state of each node will be preserved across a PostgreSQL restart.
repmgr service pause and repmgr service unpause can be executed even if repmgrd is not running; in this case, repmgrd will start up in whichever pause state has been set.

repmgr service pause and repmgr service unpause do not start/stop repmgrd.
The commands repmgr daemon start and repmgr daemon stop (if correctly configured) can be used to start/stop repmgrd on individual nodes.
If WAL replay has been paused (using pg_wal_replay_pause(), on PostgreSQL 9.6 and earlier pg_xlog_replay_pause()), in a failover situation repmgrd will automatically resume WAL replay.
This is because if WAL replay is paused, but WAL is pending replay, PostgreSQL cannot be promoted until WAL replay is resumed.
repmgr standby promote will refuse to promote a node in this state, as the PostgreSQL promote command will not be acted on until WAL replay is resumed, leaving the cluster in a potentially unstable state. In this case it is up to the user to decide whether to resume WAL replay.
In certain circumstances, repmgrd is not able to fulfill its primary mission of monitoring the node's upstream server. In these cases it enters "degraded monitoring" mode, where repmgrd remains active but is waiting for the situation to be resolved.
A typical situation where this happens is when the node's upstream becomes unavailable but no failover action can be taken. Example output in a situation where there is only one standby with failover=manual, and the primary node is unavailable (but is later restarted):
[2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
(...)
[2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
[2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
[2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
[2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
[2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
[2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring
[2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
By default, repmgrd will continue in degraded monitoring mode indefinitely. However a timeout (in seconds) can be set with degraded_monitoring_timeout, after which repmgrd will terminate.
If repmgrd is monitoring a primary node which has been stopped and manually restarted as a standby attached to a new primary, it will automatically detect the status change and update the node record to reflect the node's new status as an active standby. It will then resume monitoring the node as a standby.
When repmgrd is running with the option monitoring_history=true, it will constantly write standby node status information to the monitoring_history table, providing a near-real time overview of replication status on all nodes in the cluster.
The view replication_status shows the most recent state for each node, e.g.:
repmgr=# select * from repmgr.replication_status;
-[ RECORD 1 ]-------------+------------------------------
primary_node_id           | 1
standby_node_id           | 2
standby_name              | node2
node_type                 | standby
active                    | t
last_monitor_time         | 2017-08-24 16:28:41.260478+09
last_wal_primary_location | 0/6D57A00
last_wal_standby_location | 0/5000000
replication_lag           | 29 MB
replication_time_lag      | 00:00:11.736163
apply_lag                 | 15 MB
communication_time_lag    | 00:00:01.365643
The interval at which monitoring history is written is controlled by the configuration parameter monitor_interval_secs; the default is 2.
As this can generate a large amount of monitoring data in the table repmgr.monitoring_history, it's advisable to regularly purge historical data using the repmgr cluster cleanup command; use the -k/--keep-history option to specify how many days' worth of data should be retained.
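For example, to purge all monitoring data older than 30 days:

$ repmgr -f /etc/repmgr.conf cluster cleanup --keep-history=30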
It's possible to use repmgrd in monitoring mode only (without automatic failover capability) for some or all nodes by setting failover=manual in the node's repmgr.conf file. In the event of the node's upstream failing, no failover action will be taken and the node will require manual intervention to be reattached to replication. If this occurs, an event notification standby_disconnect_manual will be created.
Note that when a standby node is not streaming directly from its upstream node, e.g. recovering WAL from an archive, apply_lag will always appear as 0 bytes.
If monitoring history is enabled, the contents of the repmgr.monitoring_history table will be replicated to attached standbys. This means there will be a small but constant stream of replication activity which may not be desirable. To prevent this, convert the table to an UNLOGGED one with:
ALTER TABLE repmgr.monitoring_history SET UNLOGGED;
This will however mean that monitoring history will not be available on another node following a failover, and the view repmgr.replication_status will not work on standbys.
repmgr primary register — initialise a repmgr installation and register the primary node
repmgr primary register registers a primary node in a streaming replication cluster, and configures it for use with repmgr, including installing the repmgr extension. This command needs to be executed before any standby nodes are registered.
repmgr will attempt to install the repmgr extension as part of this command; however this will fail if the repmgr user is not a superuser. It's possible to install the repmgr extension manually before executing repmgr primary register; in this case repmgr will detect the presence of the extension and skip that step.
Execute with the --dry-run option to check what would happen without actually registering the primary.
If providing the configuration file location with -f/--config-file, avoid using a relative path, as repmgr stores the configuration file location in the repmgr metadata for use when repmgr is executed remotely (e.g. during repmgr standby switchover). repmgr will attempt to convert a relative path into an absolute one, but this may not be the same as the path you would explicitly provide (e.g. ./repmgr.conf might be converted to /path/to/./repmgr.conf, whereas you'd normally write /path/to/repmgr.conf).
repmgr master register can be used as an alias for repmgr primary register.
The repmgr user must be a superuser in order for repmgr to be able to install the repmgr extension. If this is not the case, the repmgr extension can be installed manually before executing repmgr primary register.

A future repmgr release will enable the provision of a --superuser name for the installation of the extension.
--dry-run
Check prerequisites but don't actually register the primary.
-F, --force

Overwrite an existing node record.
The following event notifications will be generated:

cluster_created
primary_register
repmgr primary unregister — unregister an inactive primary node
repmgr primary unregister unregisters an inactive primary node from the repmgr metadata. This is typically used when the primary has failed and is being removed from the cluster after a new primary has been promoted.

repmgr primary unregister can be run on any active repmgr node, with the ID of the node to unregister passed as --node-id.
Execute with the --dry-run option to check what would happen without actually unregistering the node.
repmgr master unregister can be used as an alias for repmgr primary unregister.
--dry-run
Check prerequisites but don't actually unregister the primary.
--node-id
ID of the inactive primary to be unregistered.
--force
Forcibly unregister the node if it is registered as an active primary, as long as it has no registered standbys; or if it is registered as a primary but running as a standby.
A primary_unregister event notification will be generated.
repmgr standby clone — clone a PostgreSQL standby node from another PostgreSQL node
repmgr standby clone clones a PostgreSQL node from another PostgreSQL node, typically the primary, but optionally from any other node in the cluster or from Barman. It creates the replication configuration required to attach the cloned node to the primary node (or another standby, if cascading replication is in use).

repmgr standby clone does not start the standby, and after cloning a standby, the command repmgr standby register must be executed to notify repmgr of its existence.
Note that by default, all configuration files in the source node's data directory will be copied to the cloned node. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.
In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's configuration files are located outside of the data directory and will not be copied by default. repmgr can copy these files, either to the same location on the standby server (provided appropriate directory and file permissions are available), or into the standby's data directory. This requires passwordless SSH access to the primary server. Add the option --copy-external-config-files to the repmgr standby clone command; by default files will be copied to the same path as on the upstream server. Note that the user executing repmgr must have write access to those directories. To have the configuration files placed in the standby's data directory, specify --copy-external-config-files=pgdata, but note that any include directives in the copied files may need to be updated.
When executing repmgr standby clone with the --copy-external-config-files and --dry-run options, repmgr will check the SSH connection to the source node, but will not verify whether the files can actually be copied.
During the actual clone operation, a check will be made before the database itself is cloned to determine whether the files can actually be copied; if any problems are encountered, the clone operation will be aborted, enabling the user to fix any issues before retrying the clone operation.
For reliable configuration file management we recommend using a configuration management tool such as Ansible, Chef, Puppet or Salt.
By default, repmgr will create a minimal replication configuration containing the following parameters:

primary_conninfo
primary_slot_name (if replication slots in use)

For PostgreSQL 11 and earlier, these parameters will also be set:

standby_mode (always 'on')
recovery_target_timeline (always 'latest')
The following additional parameters can be specified in repmgr.conf for inclusion in the replication configuration:
restore_command
archive_cleanup_command
recovery_min_apply_delay
We recommend using Barman to manage WAL file archiving. For more details on combining repmgr and Barman, in particular using restore_command to configure Barman as a backup source of WAL files, see Cloning from Barman.
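As a simple illustration (the archive path is a placeholder; a Barman-based restore_command would look different, see Cloning from Barman):

restore_command='cp /path/to/archive/%f %p'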
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default pg_basebackup
method,
repmgr will set pg_basebackup
's --wal-method
parameter to stream
,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
replication connections to be available (repmgr will verify sufficient
connections are available before attempting to clone, and this can be checked
before performing the clone using the --dry-run
option).
To override this behaviour, in repmgr.conf
set
pg_basebackup
's --wal-method
parameter to fetch
:
pg_basebackup_options='--wal-method=fetch'
and ensure that wal_keep_segments
(PostgreSQL 13 and later:
wal_keep_size
) is set to an appropriately high value. Note
however that this is not a particularly reliable way of ensuring sufficient
WAL is retained and is not recommended.
See the
pg_basebackup documentation for details.
If using PostgreSQL 9.6 or earlier, replace --wal-method
with --xlog-method
.
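If you nevertheless choose the fetch approach, the relevant settings might look like this (the retention value is an illustrative assumption only); the first line goes in repmgr.conf, the second in postgresql.conf (PostgreSQL 13 and later):
pg_basebackup_options='--wal-method=fetch'
wal_keep_size = '4GB'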
To ensure that WAL files are placed in a directory outside of the main data
directory (e.g. to keep them on a separate disk for performance reasons),
specify the location with --waldir
(PostgreSQL 9.6 and earlier: --xlogdir
) in
the repmgr.conf
parameter pg_basebackup_options
,
e.g.:
pg_basebackup_options='--waldir=/path/to/wal-directory'
This setting will also be honored by repmgr when cloning from Barman (repmgr 5.2 and later).
repmgr supports standbys cloned by another method (e.g. using barman's
barman recover
command).
To integrate the standby as a repmgr node, once the standby has been cloned,
ensure the repmgr.conf
file is created for the node, and that it has been registered using
repmgr standby register
.
To register a standby which is not running, execute repmgr standby register --force and provide the connection details for the primary.
See Registering an inactive node for more details.
Then execute the command repmgr standby clone --replication-conf-only
.
This will create the recovery.conf
file needed to attach
the node to its upstream (in PostgreSQL 12 and later: append replication configuration
to postgresql.auto.conf
), and will also create a replication slot on the
upstream node if required.
The upstream node must be running so the correct replication configuration can be obtained.
If the standby is running, the replication configuration will not be written unless the
-F/--force
option is provided.
Execute repmgr standby clone --replication-conf-only --dry-run
to check the prerequisites for creating the recovery configuration,
and display the configuration changes which would be made without actually
making any changes.
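A sketch of the full sequence for a standby cloned outside repmgr (the hostname is an assumption): first register the inactive node, then generate the replication configuration, checking with --dry-run beforehand:
$ repmgr -f /etc/repmgr.conf standby register --force -h node1 -U repmgr -d repmgr
$ repmgr -f /etc/repmgr.conf standby clone --replication-conf-only --dry-run
$ repmgr -f /etc/repmgr.conf standby clone --replication-conf-only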
In PostgreSQL 13 and later, the PostgreSQL configuration must be reloaded for replication configuration changes to take effect.
In PostgreSQL 12 and earlier, the PostgreSQL instance must be restarted for replication configuration changes to take effect.
-d, --dbname=CONNINFO
Connection string of the upstream node to use for cloning.
--dry-run
Check prerequisites but don't actually clone the standby.
If --replication-conf-only
specified, the contents of
the generated recovery configuration will be displayed
but not written.
-c, --fast-checkpoint
Force fast checkpoint (not effective when cloning from Barman).
--copy-external-config-files[={samepath|pgdata}]
Copy configuration files located outside the data directory on the source node to the same path on the standby (default) or to the PostgreSQL data directory.
Note that to be able to use this option, the repmgr user must be a superuser or
member of the pg_read_all_settings
predefined role.
If this is not the case, provide a valid superuser with the
-S
/--superuser
option.
--no-upstream-connection
When using Barman, do not connect to upstream node.
--recovery-min-apply-delay
Set PostgreSQL configuration recovery_min_apply_delay
parameter
to the provided value.
This overrides any recovery_min_apply_delay
provided via
repmgr.conf
.
For more details on this parameter, see: recovery_min_apply_delay.
-R, --remote-user=USERNAME
Remote system username for SSH operations (default: current local system username).
--replication-conf-only
Create recovery configuration for a previously cloned instance.
In PostgreSQL 12 and later, the replication configuration will be
written to postgresql.auto.conf
.
In PostgreSQL 11 and earlier, the replication configuration will be
written to recovery.conf
.
--replication-user
User to make replication connections with (optional, not usually required).
-S
/--superuser
The name of a valid PostgreSQL superuser can be provided with this option.
This is only required if the --copy-external-config-files
option was provided
and the repmgr user is not a superuser or member of the pg_read_all_settings
predefined role.
--upstream-conninfo
primary_conninfo
value to include in the recovery configuration
when the intended upstream server does not yet exist.
Note that repmgr may modify the provided value, in particular to set the
correct application_name
.
--upstream-node-id
ID of the upstream node to replicate from (optional, defaults to primary node)
--verify-backup
Verify a cloned node using the pg_verifybackup utility (PostgreSQL 13 and later).
This option can currently only be used when cloning directly from an upstream node.
--without-barman
Do not use Barman even if configured.
A standby_clone
event notification will be generated.
See cloning standbys for details about various aspects of cloning.
repmgr standby register — add a standby's information to the repmgr metadata
repmgr standby register
adds a standby's information to
the repmgr metadata. This command needs to be executed to enable
promote/follow operations and to allow repmgrd to work with the node.
An existing standby can be registered using this command. Execute with the
--dry-run
option to check what would happen without actually registering the
standby.
If providing the configuration file location with -f/--config-file
,
avoid using a relative path, as repmgr stores the configuration file location
in the repmgr metadata for use when repmgr is executed remotely (e.g. during
repmgr standby switchover). repmgr will attempt to convert
a relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. ./repmgr.conf
might be converted
to /path/to/./repmgr.conf
, whereas you'd normally write
/path/to/repmgr.conf
).
By default, repmgr will wait 30 seconds for the standby to become available before
aborting with a connection error. This is useful when setting up a standby from a script,
as the standby may not have fully started up by the time repmgr standby register
is executed.
To change the timeout, pass the desired value with the --wait-start
option.
A value of 0
will disable the timeout.
The timeout will be ignored if -F/--force
was provided.
Depending on your environment and workload, it may take some time for the standby's node record to propagate from the primary to the standby. Some actions (such as starting repmgrd) require that the standby's node record is present and up-to-date to function correctly.
By providing the option --wait-sync
to the
repmgr standby register
command, repmgr will wait
until the record is synchronised before exiting. An optional timeout (in
seconds) can be added to this option (e.g. --wait-sync=60
).
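For example (the timeout value is an arbitrary illustration):
$ repmgr -f /etc/repmgr.conf standby register --wait-sync=60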
Under some circumstances you may wish to register a standby which is not yet running; this can be the case when using provisioning tools to create a complex replication cluster, or if the node was not cloned by repmgr.
In this case, by using the -F/--force
option and providing the connection parameters to the primary server,
the standby can be registered even if it has not yet been started.
Connection parameters can be provided either as a conninfo string
(e.g. -d 'host=node1 user=repmgr') or as individual connection parameters
(-h/--host, -d/--dbname, -U/--user, -p/--port etc.).
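For example, to register a standby which has not yet been started, providing the primary's connection details (hostname and port are assumptions):
$ repmgr -f /etc/repmgr.conf standby register --force -h node1 -p 5432 -U repmgr -d repmgr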
Similarly, with cascading replication it may be necessary to register
a standby whose upstream node has not yet been registered - in this case,
using -F/--force
will result in the creation of an inactive placeholder
record for the upstream node, which will however later need to be registered
with the -F/--force
option too.
When used with repmgr standby register
, care should be taken that use of the
-F/--force
option does not result in an incorrectly configured cluster.
If you've cloned a standby using another method (e.g. barman's
barman recover
command), register the node as detailed in section
Registering an inactive node then execute
repmgr standby clone --replication-conf-only
to generate the appropriate replication configuration.
--dry-run
Check prerequisites but don't actually register the standby.
-F
/--force
Overwrite an existing node record
--upstream-node-id
ID of the upstream node to replicate from (optional)
--wait-start
wait for the standby to start (timeout in seconds, default 30 seconds)
--wait-sync
wait for the node record to synchronise to the standby (optional timeout in seconds)
A standby_register
event notification
will be generated immediately after the node record is updated on the primary.
If the --wait-sync
option is provided, a standby_register_sync
event notification will be generated immediately after the node record has synchronised to the
standby.
If an event notification script is provided, repmgr will substitute the placeholder %p
with the node ID of the primary node, %c with its conninfo string,
and %a with its node name.
repmgr standby unregister — remove a standby's information from the repmgr metadata
Unregisters a standby with repmgr. This command does not affect the actual replication, just removes the standby's entry from the repmgr metadata.
To unregister a running standby, execute:
repmgr standby unregister -f /etc/repmgr.conf
This will remove the standby record from repmgr's internal metadata
table (repmgr.nodes
). A standby_unregister
event notification will be recorded in the repmgr.events
table.
If the standby is not running, the command can be executed on another
node by providing the id of the node to be unregistered using
the command line parameter --node-id
, e.g. executing the following
command on the primary server will unregister the standby with
id 3
:
repmgr standby unregister -f /etc/repmgr.conf --node-id=3
--node-id
node_id
of the node to unregister (optional)
A standby_unregister
event notification will be generated.
repmgr standby promote — promote a standby to a primary
Promotes a standby to a primary if the current primary has failed. This
command requires a valid repmgr.conf
file for the standby, either
specified explicitly with -f/--config-file
or located in a
default location; no additional arguments are required.
If repmgrd is active, you must execute
repmgr service pause
(repmgr 4.2 - 4.4: repmgr daemon pause)
to temporarily disable repmgrd while making any changes
to the replication cluster.
If the standby promotion succeeds, the server will not need to be restarted. However, any other standbys will need to follow the new primary, and will need to be restarted to do this.
Beginning with repmgr 4.4,
the option --siblings-follow
can be used to have
all other standbys (and a witness server, if in use)
follow the new primary.
If using repmgrd, when invoking
repmgr standby promote
(either directly via
the promote_command
, or in a script called
via promote_command
), --siblings-follow
must not be included as a
command line option for repmgr standby promote
.
In repmgr 4.3 and earlier,
repmgr standby follow
must be executed on each standby individually.
repmgr will wait for up to promote_check_timeout
seconds
(default: 60
) to verify that the standby has been promoted, and will
check the promotion every promote_check_interval
seconds (default: 1 second).
Both values can be defined in repmgr.conf
.
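For example, to wait up to two minutes, checking every two seconds (the values are illustrative only), set the following in repmgr.conf:
promote_check_timeout=120
promote_check_interval=2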
In PostgreSQL 12 and earlier, if WAL replay is paused on the standby, and not all WAL files on the standby have been replayed, repmgr will not attempt to promote it.
This is because if WAL replay is paused, PostgreSQL itself will not react to a promote command until WAL replay is resumed and all pending WAL has been replayed. This means attempting to promote PostgreSQL in this state will leave PostgreSQL in a condition where the promotion may occur at an unpredictable point in the future.
Note that if the standby is in archive recovery, repmgr will not be able to determine if more WAL is pending replay, and will abort the promotion attempt if WAL replay is paused.
This restriction does not apply to PostgreSQL 13 and later.
$ repmgr -f /etc/repmgr.conf standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' promote"
server promoting
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
pg_promote() (PostgreSQL 12 and later)
From PostgreSQL 12, repmgr will attempt to use the built-in pg_promote()
function to promote a standby to primary.
By default, execution of pg_promote()
is restricted to superusers.
If the repmgr
user does not have permission to execute
pg_promote()
, repmgr will fall back to using "pg_ctl promote
".
Execute repmgr standby promote
with the --dry-run option
to check whether the repmgr user has permission to execute pg_promote()
.
If the repmgr
user is not a superuser, execution permission for this
function can be granted with e.g.:
GRANT EXECUTE ON FUNCTION pg_catalog.pg_promote TO repmgr
Note that permissions are only effective for the database they are granted in, so this must be executed in the repmgr database to be effective.
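For example, assuming the repmgr database is named repmgr, the grant shown above could be applied with:
$ psql -d repmgr -c 'GRANT EXECUTE ON FUNCTION pg_catalog.pg_promote TO repmgr'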
For more details on pg_promote()
, see the
PostgreSQL documentation.
--dry-run
Check if this node can be promoted, but don't carry out the promotion.
--siblings-follow
Have all sibling nodes (nodes formerly attached to the same upstream node as the promotion candidate) follow this node after it has been promoted.
Note that a witness server, if in use, is also counted as a "sibling node" as it needs to be instructed to synchronise its metadata with the new primary.
Do not provide this option when configuring
repmgrd's promote_command
.
-F
--force
Ignore warnings and continue anyway.
This option is relevant in the following situations if --siblings-follow
was specified:
Note that if the -F
/--force
option is used when any of the above
situations is encountered, the onus is on the user to manually resolve any resulting issues.
The following parameters in repmgr.conf
are relevant to the
promote operation:
promote_check_interval
:
interval (in seconds, default: 1 second) to wait between each check
to determine whether the standby has been promoted.
promote_check_timeout
:
time (in seconds, default: 60 seconds) to wait to verify that the standby has been promoted
before exiting with ERR_PROMOTION_FAIL
.
service_promote_command
:
a command which will be executed instead of pg_ctl promote
or (in PostgreSQL 12 and later) pg_promote()
.
This is intended for systems which provide a package-level promote command, such as Debian's pg_ctlcluster, to promote the PostgreSQL instance from standby to primary.
The following exit codes can be emitted by repmgr standby promote
:
SUCCESS (0)
The standby was successfully promoted to primary.
ERR_DB_CONN (6)
repmgr was unable to connect to the local PostgreSQL node.
PostgreSQL must be running before the node can be promoted.
ERR_PROMOTION_FAIL (8)
The node could not be promoted to primary for one of the following reasons:
A standby_promote
event notification will be generated.
repmgr standby follow — attach a running standby to a new upstream node
Attaches the standby ("follow candidate") to a new upstream node ("follow target"). Typically this will be the primary, but this command can also be used to attach the standby to another standby.
This command requires a valid repmgr.conf
file for the standby,
either specified explicitly with -f/--config-file
or located in a
default location; no additional arguments are required.
The standby node ("follow candidate") must be running. If the new upstream ("follow target") is not the primary, the cluster primary must be running and accessible from the standby node.
To re-add an inactive node to the replication cluster, use repmgr node rejoin.
By default repmgr will attempt to attach the standby to the current primary.
If --upstream-node-id
is provided, repmgr will attempt
to attach the standby to the specified node, which can be another standby.
In PostgreSQL 12 and earlier, this command will force a restart of PostgreSQL on the standby node.
In PostgreSQL 13 and later, by default this command will signal PostgreSQL to reload its
configuration, which will cause PostgreSQL to follow the new upstream without
a restart. If this behaviour is not desired for whatever reason, the configuration
file parameter standby_follow_restart
can be set true
to always force a restart.
repmgr standby follow
will wait up to
standby_follow_timeout
seconds (default: 30
)
to verify the standby has actually connected to the new upstream node.
If recovery_min_apply_delay
is set for the standby, it
will not attach to the new upstream node until it has replayed available
WAL.
Conversely, if the standby is attached to an upstream standby
which has recovery_min_apply_delay
set, the upstream
standby's replay state may actually be behind that of its new downstream node.
$ repmgr -f /etc/repmgr.conf standby follow
INFO: setting node 3's primary to node 2
NOTICE: restarting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' restart"
waiting for server to shut down........ done
server stopped
waiting for server to start.... done
server started
NOTICE: STANDBY FOLLOW successful
DETAIL: node 3 is now attached to node 2
--dry-run
Check prerequisites but don't actually follow a new upstream node.
This will also verify whether the standby is capable of following the new upstream node.
If a standby was turned into a primary by removing recovery.conf
(PostgreSQL 12 and later: standby.signal
),
repmgr will not be able to determine whether that primary's timeline
has diverged from the timeline of the standby ("follow candidate").
We recommend always using repmgr standby promote
to promote a standby to primary, as this will ensure that the new primary
will perform a timeline switch (making it practical to check for timeline divergence)
and also that repmgr metadata is updated correctly.
--upstream-node-id
Node ID of the new upstream node ("follow target").
If not provided, repmgr will attempt to follow the current primary node.
Note that when using repmgrd, --upstream-node-id
should always be configured;
see Automatic failover configuration
for details.
-w
--wait
Wait for a primary to appear. repmgr will wait for up to
primary_follow_timeout
seconds
(default: 60 seconds) to verify that the standby is following the new primary.
This value can be defined in repmgr.conf
.
Execute with the --dry-run
option to test the follow operation as
far as possible, without actually changing the status of the node.
Note that repmgr will first attempt to determine whether the standby ("follow candidate") is capable of following the new upstream node ("follow target").
If the new upstream node has diverged from this node's timeline (for example, if the new upstream node was promoted to primary while this node was still attached to the original primary), it will not be possible to follow the new upstream node, and repmgr will emit an error message like this:
ERROR: this node cannot attach to follow target node "node3" (ID 3)
DETAIL: follow target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/6108880
In this case, it may be possible to have this node follow the new upstream
using repmgr node rejoin
with the --force-rewind option
to execute pg_rewind
.
This does mean that transactions which exist on this node, but not the new upstream,
will be lost.
One of the following exit codes will be emitted by repmgr standby follow
:
SUCCESS (0)
The follow operation succeeded; or if --dry-run
was provided,
no issues were detected which would prevent the follow operation.
ERR_BAD_CONFIG (1)
A configuration issue was detected which prevented repmgr from continuing with the follow operation.
ERR_NO_RESTART (4)
The node could not be restarted.
ERR_DB_CONN (6)
repmgr was unable to establish a database connection to one of the nodes.
ERR_FOLLOW_FAIL (23)
repmgr was unable to complete the follow command.
A standby_follow
event notification will be generated.
If provided, repmgr will substitute the placeholders %p
with the node ID of the node
being followed, %c
with its conninfo
string, and
%a
with its node name.
repmgr standby switchover — promote a standby to primary and demote the existing primary to a standby
Promotes a standby to primary and demotes the existing primary to a standby. This command must be run on the standby to be promoted, and requires a passwordless SSH connection to the current primary.
If other nodes are connected to the demotion candidate, repmgr can instruct
these to follow the new primary if the option --siblings-follow
is specified. This requires a passwordless SSH connection between the promotion
candidate (new primary) and the nodes attached to the demotion candidate
(existing primary). Note that a witness server, if in use, is also
counted as a "sibling node" as it needs to be instructed to
synchronise its metadata with the new primary.
Performing a switchover is a non-trivial operation. In particular it relies on the current primary being able to shut down cleanly and quickly. repmgr will attempt to check for potential issues but cannot guarantee a successful switchover.
repmgr will refuse to perform the switchover if an exclusive backup is running on the current primary, or if WAL replay is paused on the standby.
For more details on performing a switchover, including preparation and configuration, see section Performing a switchover with repmgr.
From repmgr 4.2, repmgr will instruct any running repmgrd instances to pause operations while the switchover is being carried out, to prevent repmgrd from unintentionally promoting a node. For more details, see pausing the repmgrd service.
Users of repmgr versions prior to 4.2 should ensure that repmgrd is not running on any nodes while a switchover is being executed.
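A typical invocation, executed on the standby to be promoted, might look like this (a sketch; the configuration file path is an assumption, and --dry-run should always be run first):
$ repmgr -f /etc/repmgr.conf standby switchover --siblings-follow --dry-run
$ repmgr -f /etc/repmgr.conf standby switchover --siblings-follow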
data_directory
repmgr needs to be able to determine the location of the data directory on the
demotion candidate. If the repmgr user is not a superuser or a member of the pg_read_all_settings
predefined role,
the name of a superuser should be provided with the -S
/--superuser
option.
CHECKPOINT
repmgr executes CHECKPOINT
on the demotion candidate as part of the shutdown
process to ensure it shuts down as smoothly as possible.
Note that CHECKPOINT
requires database superuser permissions to execute.
If the repmgr
user is not a superuser, the name of a superuser should be
provided with the -S
/--superuser
option. From PostgreSQL 15 the pg_checkpoint
predefined role removes the need for superuser permissions to execute the CHECKPOINT
command.
If repmgr is unable to execute the CHECKPOINT
command, the switchover
can still be carried out, albeit at a greater risk that the demotion candidate may not
be able to shut down as smoothly as might otherwise have been the case.
pg_promote() (PostgreSQL 12 and later)
From PostgreSQL 12, repmgr defaults to using the built-in pg_promote()
function to
promote a standby to primary.
Note that execution of pg_promote()
is restricted to superusers or to
any user who has been granted execution permission for this function. If the repmgr user
is not permitted to execute pg_promote()
, repmgr will fall back to using
"pg_ctl promote
". For more details see
repmgr standby promote.
--always-promote
Promote standby to primary, even if it is behind or has diverged from the original primary. The original primary will be shut down in any case, and will need to be manually reintegrated into the replication cluster.
--dry-run
Check prerequisites but don't actually execute a switchover.
Success of --dry-run
does not imply the switchover will
complete successfully, only that
the prerequisites for performing the operation are met.
-F
--force
Ignore warnings and continue anyway.
Specifically, if a problem is encountered when shutting down the current primary,
using -F/--force
will cause repmgr to continue by promoting
the standby to be the new primary, and if --siblings-follow
is
specified, attach any other standbys to the new primary.
--force-rewind[=/path/to/pg_rewind]
Use pg_rewind to reintegrate the old primary if necessary (and the prerequisites for using pg_rewind are met).
If using PostgreSQL 9.4, and the pg_rewind
binary is not installed in the PostgreSQL bin
directory,
provide its full path. For more details see also Switchover and pg_rewind
and Using pg_rewind.
-R
--remote-user
System username for remote SSH operations (defaults to local system user).
--repmgrd-no-pause
Don't pause repmgrd while executing a switchover.
This option should not be used unless you take steps by other means to ensure repmgrd is paused or not running on all nodes.
This option cannot be used together with --repmgrd-force-unpause
.
--repmgrd-force-unpause
Always unpause all repmgrd instances after executing a switchover. This will ensure that any repmgrd instances which were paused before the switchover will be unpaused.
This option cannot be used together with --repmgrd-no-pause
.
--siblings-follow
Have nodes attached to the old primary follow the new primary.
This will also ensure that a witness node, if in use, is updated with the new primary's data.
In a future repmgr release, --siblings-follow
will be applied
by default.
-S
/--superuser
Use the named superuser instead of the normal repmgr user to perform actions requiring superuser permissions.
The following parameters in repmgr.conf
are relevant to the
switchover operation:
replication_lag_critical
If replication lag (in seconds) on the standby exceeds this value, the
switchover will be aborted (unless the -F/--force
option
is provided)
shutdown_check_timeout
The maximum number of seconds to wait for the demotion candidate (current primary) to shut down, before aborting the switchover.
Note that this parameter is set on the node where repmgr standby switchover
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
have no effect.
In versions prior to repmgr 4.2, repmgr standby switchover
would
use the values defined in reconnect_attempts
and reconnect_interval
to determine the timeout for demotion candidate shutdown.
wal_receive_check_timeout
After the primary has shut down, the maximum number of seconds to wait for the walreceiver on the standby to flush WAL to disk before comparing WAL receive location with the primary's shut down location.
standby_reconnect_timeout
The maximum number of seconds to attempt to wait for the demotion candidate (former primary) to reconnect to the promoted primary (default: 60 seconds)
Note that this parameter is set on the node where repmgr standby switchover
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
have no effect.
node_rejoin_timeout
The maximum number of seconds to attempt to wait for the demotion candidate (former primary) to reconnect to the promoted primary (default: 60 seconds).
Note that this parameter is set on the demotion candidate (former primary);
setting it on the node where repmgr standby switchover
is
executed will have no effect.
However, this value must be less than standby_reconnect_timeout
on the
promotion candidate (the node where repmgr standby switchover
is executed).
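As an illustrative sketch only (the values are assumptions to be tuned for your environment), the relevant settings in the promotion candidate's repmgr.conf might look like this:
shutdown_check_timeout=60
wal_receive_check_timeout=30
standby_reconnect_timeout=90
replication_lag_critical=30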
Execute with the --dry-run
option to test the switchover as far as
possible without actually changing the status of either node.
External database connections, e.g. from an application, should not be permitted while the switchover is taking place. In particular, active transactions on the primary can potentially disrupt the shutdown process.
standby_switchover
and standby_promote
event notifications will be generated for the new primary,
and a node_rejoin
event notification for the former primary (new standby).
If using an event notification script, standby_switchover
will populate the placeholder parameter %p
with the node ID of
the former primary.
One of the following exit codes will be emitted by repmgr standby switchover
:
SUCCESS (0)
The switchover completed successfully; or if --dry-run
was provided,
no issues were detected which would prevent the switchover operation.
ERR_SWITCHOVER_FAIL (18)
The switchover could not be executed.
ERR_SWITCHOVER_INCOMPLETE (22)
The switchover was executed but a problem was encountered. Typically this means the former primary could not be reattached as a standby. Check preceding log messages for more information.
repmgr standby follow, repmgr node rejoin
For more details on performing a switchover operation, see the section Performing a switchover with repmgr.
repmgr witness register — add a witness node's information to the repmgr metadata
repmgr witness register
adds a witness server's node
record to the repmgr metadata, and if necessary initialises the witness
node by installing the repmgr extension and copying the repmgr metadata
to the witness server. This command needs to be executed to enable
use of the witness server with repmgrd.
When executing repmgr witness register
, database connection
information for the cluster primary server must also be provided.
In most cases it's only necessary to provide the primary's hostname with
the -h
/--host
option; repmgr will
automatically use the user
and dbname
values defined in the conninfo
string defined in the
witness node's repmgr.conf
, unless these are explicitly
provided as command line options.
The primary server must be registered with repmgr primary register
before the witness
server can be registered.
Execute with the --dry-run
option to check what would happen
without actually registering the witness server.
$ repmgr -f /etc/repmgr.conf witness register -h node1
INFO: connecting to witness node "node3" (ID: 3)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "node3" (ID: 3) successfully registered
--dry-run
Check prerequisites but don't actually register the witness
-F
/--force
Overwrite an existing node record
A witness_register
event notification will be generated.
repmgr witness unregister — remove a witness node's information from the repmgr metadata
repmgr witness unregister
removes a witness server's node
record from the repmgr metadata.
The witness node does not have to be running to be unregistered; in that case,
either provide connection information for the primary server, or
execute repmgr witness unregister on a running node and
provide the parameter --node-id with the node ID of the
witness server.
Execute with the --dry-run
option to check what would happen
without actually unregistering the witness server.
Unregistering a running witness node:
$ repmgr -f /etc/repmgr.conf witness unregister
INFO: connecting to witness node "node3" (ID: 3)
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with ID 3 successfully unregistered
Unregistering a non-running witness node:
$ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501 -F
INFO: connecting to node "node3" (ID: 3)
NOTICE: unable to connect to node "node3" (ID: 3), removing node record on cluster primary only
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with ID 3 successfully unregistered
This command will not make any changes to the witness node itself and will neither remove any data from the witness database nor stop the PostgreSQL instance.
A witness node which has been unregistered can be re-registered with repmgr witness register --force.
--dry-run
Check prerequisites but don't actually unregister the witness.
--node-id
Unregister witness server with the specified node ID.
A witness_unregister
event notification will be generated.
repmgr node status — show overview of a node's basic information and replication status
Displays an overview of a node's basic information and replication status. This command must be run on the local node.
$ repmgr -f /etc/repmgr.conf node status
Node "node1":
    PostgreSQL version: 10beta1
    Total data size: 30 MB
    Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
    Role: primary
    WAL archiving: off
    Archive command: (none)
    Replication connections: 2 (of maximal 10)
    Replication slots: 0 (of maximal 10)
    Replication lag: n/a
--csv
: generate output in CSV format
One of the following exit codes will be emitted by repmgr node status
:
SUCCESS (0)
No issues were detected.
ERR_NODE_STATUS (25)
One or more issues were detected.
See repmgr node check to diagnose issues and repmgr cluster show for an overview of all nodes in the cluster.
repmgr node check — performs some health checks on a node from a replication perspective
Performs some health checks on a node from a replication perspective. This command must be run on the local node.
Currently repmgr performs health checks on physical replication slots only, with the aim of warning about streaming replication standbys which have become detached and the associated risk of uncontrolled WAL file growth.
Execution on the primary server:
$ repmgr -f /etc/repmgr.conf node check
Node "node1":
    Server role: OK (node is primary)
    Replication lag: OK (N/A - node is primary)
    WAL archiving: OK (0 pending files)
    Upstream connection: OK (N/A - is primary)
    Downstream servers: OK (2 of 2 downstream nodes attached)
    Replication slots: OK (node has no physical replication slots)
    Missing replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")
Execution on a standby server:
$ repmgr -f /etc/repmgr.conf node check
Node "node2":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (0 pending archive ready files)
    Upstream connection: OK (node "node2" (ID: 2) is attached to expected upstream node "node1" (ID: 1))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")
Each check can be performed individually by supplying an additional command line parameter, e.g.:
$ repmgr node check --role
OK (node is primary)
Parameters for individual checks are as follows:
--role
: checks if the node has the expected role
--replication-lag
: checks if the node is lagging by more than
replication_lag_warning
or replication_lag_critical
--archive-ready
: checks for WAL files which have not yet been archived,
and returns WARNING
or CRITICAL
if the number
exceeds archive_ready_warning
or archive_ready_critical
respectively.
--downstream
: checks that the expected downstream nodes are attached
--upstream
: checks that the node is attached to its expected upstream
--slots
: checks there are no inactive physical replication slots
--missing-slots
: checks there are no missing physical replication slots
--data-directory-config
: checks the data directory configured in
repmgr.conf
matches the actual data directory.
This check is not directly related to replication, but is useful to verify repmgr
is correctly configured.
A separate check is available to verify whether repmgrd is running. This is not included in the general output, as it does not per se constitute a check of the node's replication status.
--repmgrd
: checks whether repmgrd is running.
If repmgrd is running but paused, status 1
(WARNING
) is returned.
Several checks are provided for diagnostic purposes and are not included in the general output:
--db-connection
: checks if repmgr can connect to the
database on the local node.
This option is particularly useful in combination with SSH
, as
it can be used to troubleshoot connection issues encountered when repmgr is
executed remotely (e.g. during a switchover operation).
--replication-config-owner
: checks if the file containing replication
configuration (PostgreSQL 12 and later: postgresql.auto.conf
;
PostgreSQL 11 and earlier: recovery.conf
) is
owned by the same user who owns the data directory.
Incorrect ownership of these files (e.g. if they are owned by root
)
will cause operations which need to update the replication configuration
(e.g. repmgr standby follow
or repmgr standby promote
)
to fail.
-S
/--superuser
: connect as the
named superuser instead of the repmgr user
--csv
: generate output in CSV format (not available
for individual checks)
--nagios
: generate output in a Nagios-compatible format
(for individual checks only)
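For example, a single check could be run with Nagios-compatible output like this (a sketch; the configuration file path is an assumption):
$ repmgr -f /etc/repmgr.conf node check --archive-ready --nagios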
When executing repmgr node check
with one of the individual
checks listed above, repmgr will emit one of the following Nagios-style exit codes
(even if --nagios
is not supplied):
0
: OK
1
: WARNING
2
: ERROR
3
: UNKNOWN
One of the following exit codes will be emitted by repmgr node check
if no individual check was specified.
SUCCESS (0)
No issues were detected.
ERR_NODE_STATUS (25)
One or more issues were detected.
repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster
Enables a dormant (stopped) node to be rejoined to the replication cluster.
This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.
Note that repmgr node rejoin
can only be used to attach
a standby to the current primary, not another standby.
If the node is running and needs to be attached to the current primary, use repmgr standby follow.
Note repmgr standby follow can only be used for standbys which have not diverged from the rest of the cluster.
repmgr node rejoin -d '$conninfo'
where $conninfo
is the PostgreSQL conninfo
string of the
current primary node (or that of any reachable node in the cluster, but
not the local node). This is so that repmgr can fetch up-to-date information
about the current state of the cluster.
repmgr.conf
for the stopped node *must* be supplied explicitly if not
otherwise available.
--dry-run
Check prerequisites but don't actually execute the rejoin.
--force-rewind
Execute pg_rewind.
See Using pg_rewind for more details on using pg_rewind.
--config-files
comma-separated list of configuration files to retain after executing pg_rewind.
Currently pg_rewind will overwrite the local node's configuration files with the files from the source node, so it's advisable to use this option to ensure they are kept.
--config-archive-dir
Directory to temporarily store configuration files specified with
--config-files
; default: /tmp
.
-W/--no-wait
Don't wait for the node to rejoin cluster.
If this option is supplied, repmgr will restart the node but not wait for it to connect to the primary.
node_rejoin_timeout
:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in standby_reconnect_timeout
,
60 seconds).
Note that standby_reconnect_timeout
must be
set to a value equal to or greater than
node_rejoin_timeout
.
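For example (the values are illustrative only), in repmgr.conf:
node_rejoin_timeout=90
standby_reconnect_timeout=90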
A node_rejoin
event notification will be generated.
One of the following exit codes will be emitted by repmgr node rejoin
:
SUCCESS (0)
The node rejoin succeeded; or if --dry-run
was provided,
no issues were detected which would prevent the node rejoin.
ERR_BAD_CONFIG (1)
A configuration issue was detected which prevented repmgr from continuing with the node rejoin.
ERR_NO_RESTART (4)
The node could not be restarted.
ERR_REJOIN_FAIL (24)
The node rejoin operation failed.
Currently repmgr node rejoin
can only be used to attach
a standby to the current primary, not another standby.
The node's PostgreSQL instance must have been shut down cleanly. If this was not the case, it will need to be started up until it has reached a consistent recovery point, then shut down cleanly.
In PostgreSQL 13 and later, this will be done automatically
if the --force-rewind
is provided (even if an actual rewind
is not necessary).
With PostgreSQL 12 and earlier, PostgreSQL will need to be started and shut down manually; see below for the best way to do this.
If PostgreSQL is started in single-user mode and
input is directed from /dev/null
, it will perform recovery
then immediately quit, and will then be in a state suitable for use by
pg_rewind.
rm -f /var/lib/pgsql/data/recovery.conf
postgres --single -D /var/lib/pgsql/data/ < /dev/null
Note that standby.signal
(PostgreSQL 11 and earlier:
recovery.conf
) must be removed
from the data directory for PostgreSQL to be able to start in single
user mode.
pg_rewind
repmgr node rejoin
can optionally use pg_rewind
to re-integrate a
node which has diverged from the rest of the cluster, typically a failed primary.
pg_rewind
requires that either
wal_log_hints
is enabled, or that
data checksums were enabled when the cluster was initialized. See the
pg_rewind
documentation for details.
Additionally, full_page_writes
must be enabled; this is the default and
normally should never be disabled.
We strongly recommend familiarizing yourself with pg_rewind
before attempting
to use it with repmgr, as while it is an extremely useful tool, it is not
a "magic bullet" which can resolve all problematic replication situations.
A typical use-case for pg_rewind
is when a scenario like the following
is encountered:
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
    --force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --dry-run
NOTICE: rejoin target is node "node3" (node ID: 3)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652184002263212600
ERROR: this node cannot attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
HINT: use --force-rewind to execute pg_rewind
Here, node3
was promoted to a primary while the local node was
still attached to the previous primary; this can potentially happen during e.g. a
network split. pg_rewind
can re-sync the local node with node3
,
removing the need for a full reclone.
To have repmgr node rejoin
use pg_rewind
,
pass the command line option --force-rewind
, which will tell repmgr
to execute pg_rewind
to ensure the node can be rejoined successfully.
pg_rewind
and configuration file retention
Be aware that if pg_rewind
is executed and actually performs a
rewind operation, any configuration files in the PostgreSQL data directory will be
overwritten with those from the source server.
To prevent this happening, provide a comma-separated list of files to retain
using the --config-files
command line option; the specified files
will be archived in a temporary directory (whose parent directory can be specified with
--config-archive-dir
, default: /tmp
)
and restored once the rewind operation is complete.
repmgr node rejoin
and pg_rewind
Example, first using --dry-run, then actually executing the node rejoin command.
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
    --config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind --dry-run
NOTICE: rejoin target is node "node3" (node ID: 3)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652460429293670710
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.local.conf"
INFO: file "postgresql.replication-setup.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.replication-setup.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is: pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'
INFO: prerequisites for executing NODE REJOIN are met
If --force-rewind
is used with the --dry-run
option,
this checks the prerequisites for using pg_rewind, but is
not an absolute guarantee that actually executing pg_rewind
will succeed. See also section Caveats below.
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
    --config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'"
NOTICE: 2 files copied to /var/lib/postgresql/data
NOTICE: setting node 2's upstream to node 3
NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 3
pg_rewind
and PostgreSQL 9.4
pg_rewind is available in PostgreSQL 9.5 and later as part of the core distribution.
Users of PostgreSQL 9.4 will need to manually install it; the source code is available here:
https://github.com/vmware/pg_rewind.
If the pg_rewind
binary is not installed in the PostgreSQL bin
directory, provide
its full path on the demotion candidate with --force-rewind
.
Note that building the 9.4 version of pg_rewind requires the PostgreSQL source code.
repmgr node rejoin
repmgr node rejoin
attempts to determine whether it will succeed by
comparing the timelines and relative WAL positions of the local node (rejoin candidate) and primary
(rejoin target). This is particularly important if planning to use pg_rewind,
which currently (as of PostgreSQL 12) may appear to succeed (or indicate there is no action
needed) but potentially allow an impossible action, such as trying to rejoin a standby to a
primary which is behind the standby. repmgr will prevent this situation from occurring.
Currently it is not possible to detect a situation where the rejoin target
is a standby which has been "promoted" by removing recovery.conf
(PostgreSQL 12 and later: standby.signal
) and restarting it.
In this case there will be no information about the point the rejoin target diverged
from the current standby; the rejoin operation will fail and
the current standby's PostgreSQL log will contain entries with the text
"record with incorrect prev-link
".
In PostgreSQL 9.5 and earlier, it is not possible to use pg_rewind to attach to a target node with a lower timeline than the local node.
We strongly recommend running repmgr node rejoin
with the
--dry-run
option first. Additionally it might be a good idea
to execute the pg_rewind command displayed by
repmgr with the pg_rewind --dry-run
option. Note that pg_rewind does not indicate that it
is running in --dry-run
mode.
In all PostgreSQL versions released before February 2021, pg_rewind contains a corner-case bug which affects standbys in a very specific situation.
This situation occurs when a standby was shut down before its primary node, and an attempt is made to attach this standby to another primary in the same cluster (following a "split brain" situation where the standby was connected to the wrong primary). In this case, repmgr will correctly determine that pg_rewind should be executed, however pg_rewind incorrectly decides that no action is necessary.
In this situation, repmgr will report something like:
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10
but when executed, pg_rewind will report:
pg_rewind: servers diverged at WAL location 0/7015540 on timeline 2
pg_rewind: no rewind required
and if an attempt is made to attach the standby to the new primary, PostgreSQL logs on the standby will contain errors like:
[2020-09-07 15:01:41 UTC] LOG: 00000: replication terminated by primary server
[2020-09-07 15:01:41 UTC] DETAIL: End of WAL reached on timeline 2 at 0/7015540.
[2020-09-07 15:01:41 UTC] LOG: 00000: new timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10
Currently it is not possible to resolve this situation using pg_rewind. A patch was submitted and is included in all PostgreSQL versions released in February 2021 or later.
As a workaround, start the primary server the standby was previously attached to,
and ensure the standby can be attached to it. If pg_rewind was actually executed,
it will have copied in the .history
file from the target primary server; this must
be removed. repmgr node rejoin
can then be used to attach the standby to the original
primary. Ensure any changes pending on the primary have propagated to the standby. Then shut down the primary
server first, before shutting down the standby. It should then be possible to
use repmgr node rejoin
to attach the standby to the new primary.
repmgr node service — show or execute the system service command to stop/start/restart/reload/promote a node
Shows or executes the system service command to stop/start/restart/reload a node.
This command is mainly meant for internal repmgr usage, but is useful for confirming the command configuration.
--dry-run
Log the steps which would be taken, including displaying the command which would be executed.
--action
The action to perform. One of start
, stop
,
restart
, reload
or promote
.
If the parameter --list-actions
is provided together with
--action
, the command which would be executed will be printed.
--list-actions
List all configured commands.
If the parameter --action
is provided together with
--list-actions
, the command which would be executed for that
particular action will be printed.
--checkpoint
Issue a CHECKPOINT
before stopping or restarting the node.
Note that a superuser connection is required to be able to execute the
CHECKPOINT
command. From PostgreSQL 15 the pg_checkpoint
predefined role removes the need for superuser permissions to execute the CHECKPOINT
command.
-S
/--superuser
Connect as the named superuser instead of the normal repmgr user.
One of the following exit codes will be emitted by repmgr node service
:
SUCCESS (0)
No issues were detected.
ERR_LOCAL_COMMAND (5)
Execution of the system service command failed.
See what action would be taken for a restart:
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --action=restart --checkpoint --dry-run
INFO: a CHECKPOINT would be issued here
INFO: would execute server command "sudo service postgresql-12 restart"
Restart the PostgreSQL instance:
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --action=restart --checkpoint
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "sudo service postgresql-12 restart"
Redirecting to /bin/systemctl restart postgresql-12.service
List all commands:
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --list-actions
Following commands would be executed for each action:
    start: "sudo service postgresql-12 start"
    stop: "sudo service postgresql-12 stop"
    restart: "sudo service postgresql-12 restart"
    reload: "sudo service postgresql-12 reload"
    promote: "/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote"
List a single command:
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --list-actions --action=promote
/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote
repmgr cluster show — display information about each registered node in the replication cluster
Displays information about each registered node in the replication cluster. This
command polls each registered server and shows its role (primary
/
standby
) and status. It polls each server
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
For PostgreSQL 9.6 and later, the output will also contain the node's current timeline ID.
Node availability is tested by connecting from the node where
repmgr cluster show
is executed, and does not necessarily imply the node
is down. See repmgr cluster matrix and repmgr cluster crosscheck to get
better overviews of connections between nodes.
This command requires either a valid repmgr.conf
file or a database
connection string to one of the registered nodes; no additional arguments are needed.
To show database connection errors when polling nodes, run the command in
--verbose
mode.
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=db_node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=db_node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=db_node3 dbname=repmgr user=repmgr
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | host=db_node4 dbname=repmgr user=repmgr
 5  | node5 | witness | * running | node1    | default  | 0        | n/a      | host=db_node5 dbname=repmgr user=repmgr
The column Role
shows the expected server role according to the
repmgr metadata.
Status
shows whether the server is running or unreachable.
If the node has an unexpected role not reflected in the repmgr metadata, e.g. a node was manually
promoted to primary, this will be highlighted with an exclamation mark.
If a connection to the node cannot be made, this will be highlighted with a question mark.
Note that the node will only be shown as ? unreachable
if a connection is not possible at network level; if the PostgreSQL instance on the
node is pingable but not accepting connections, it will be shown as ? running
.
In the following example, executed on node3
, node1
is not reachable
at network level and assumed to be down; node2
has been promoted to primary
(but node3
is not attached to it, and its metadata has not yet been updated);
node4
is running but rejecting connections (from node3
at least).
 ID | Name  | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+----------------------+----------+----------+----------+----------+------------------------------------------
 1  | node1 | primary | ? unreachable        |          | default  | 100      |          | host=db_node1 dbname=repmgr user=repmgr
 2  | node2 | standby | ! running as primary | ? node1  | default  | 100      | 2        | host=db_node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running            | ? node1  | default  | 100      | 1        | host=db_node3 dbname=repmgr user=repmgr
 4  | node4 | standby | ? running            | ? node1  | default  | 100      |          | host=db_node4 dbname=repmgr user=repmgr

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)
  - node "node1" (ID: 1) is registered as an active primary but is unreachable
  - node "node2" (ID: 2) is registered as standby but running as primary
  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)
  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)
  - unable to connect to node "node4" (ID: 4)

HINT: execute with --verbose option to see connection error messages
To diagnose connection issues, execute repmgr cluster show
with the --verbose
option; this will display the error message
for each failed connection attempt.
Use repmgr cluster matrix and repmgr cluster crosscheck to diagnose connection issues across the whole replication cluster.
--csv
repmgr cluster show
accepts an optional parameter --csv
, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts, e.g.:
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1
2,0,0
3,0,1
The columns have the following meanings:
node ID
availability (0 = available, -1 = unavailable)
recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
--compact
Suppress display of the conninfo column.
--terse
Suppress warnings about connection issues.
--verbose
Display the full text of any database connection error messages.
One of the following exit codes will be emitted by repmgr cluster show:
SUCCESS (0)
No issues were detected.
ERR_BAD_CONFIG (1)
An issue was encountered while attempting to retrieve repmgr metadata.
ERR_DB_CONN (6)
repmgr was unable to connect to the local PostgreSQL instance.
ERR_NODE_STATUS (25)
One or more issues were detected with the replication configuration, e.g. a node was not in its expected state.
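As an illustration, a monitoring wrapper might branch on these exit codes roughly as follows (a sketch only; the numeric values are those documented above):
repmgr -f /etc/repmgr.conf cluster show --terse
case $? in
    0)  echo "no issues detected" ;;
    1)  echo "unable to retrieve repmgr metadata" ;;
    6)  echo "unable to connect to the local PostgreSQL instance" ;;
    25) echo "node status issues detected" ;;
esac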
repmgr cluster matrix — runs repmgr cluster show on each node and summarizes output
repmgr cluster matrix runs repmgr cluster show on each node and arranges the results in a matrix, recording success or failure. repmgr cluster matrix requires a valid repmgr.conf file on each node. Additionally, passwordless ssh connections are required between all nodes.
Example 1 (all nodes up):
$ repmgr -f /etc/repmgr.conf cluster matrix
 Name  | Id |  1 |  2 |  3
-------+----+----+----+----
 node1 |  1 |  * |  * |  *
 node2 |  2 |  * |  * |  *
 node3 |  3 |  * |  * |  *
Example 2 (node1 and node2 up, node3 down):
$ repmgr -f /etc/repmgr.conf cluster matrix
 Name  | Id |  1 |  2 |  3
-------+----+----+----+----
 node1 |  1 |  * |  * |  x
 node2 |  2 |  * |  * |  x
 node3 |  3 |  ? |  ? |  ?
Each row corresponds to one server, and indicates the result of testing an outbound connection from that server. Since node3 is down, all the entries in its row are filled with ?, meaning that outbound connections from that node cannot be tested. The other two nodes are up; the corresponding rows have x in the column corresponding to node3, meaning that inbound connections to that node have failed, and * in the columns corresponding to node1 and node2, meaning that inbound connections to these nodes have succeeded.
Example 3 (all nodes up, firewall dropping packets originating from node1 and directed to port 5432 on node3) - running repmgr cluster matrix from node1 gives the following output:
$ repmgr -f /etc/repmgr.conf cluster matrix
 Name  | Id |  1 |  2 |  3
-------+----+----+----+----
 node1 |  1 |  * |  * |  x
 node2 |  2 |  * |  * |  *
 node3 |  3 |  ? |  ? |  ?
Note that this may take some time, depending on the connect_timeout setting in the node conninfo strings; the default is 1 minute, which means that without modification the above command would take around 2 minutes to run. See the comment elsewhere about setting connect_timeout.
The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3.
In this case, the repmgr cluster crosscheck command will produce a more useful result.
One of the following exit codes will be emitted by repmgr cluster matrix:
SUCCESS (0)
The check completed successfully and all nodes are reachable.
ERR_BAD_SSH (12)
One or more nodes could not be accessed via SSH.
ERR_NODE_STATUS (25)
PostgreSQL on one or more nodes could not be reached. This error code overrides ERR_BAD_SSH.
repmgr cluster crosscheck — cross-checks connections between each combination of nodes
repmgr cluster crosscheck is similar to repmgr cluster matrix, but cross-checks connections between each combination of nodes. In "Example 3" in repmgr cluster matrix we have no information about the state of node3. However, by running repmgr cluster crosscheck it's possible to get a better overview of the cluster situation:
$ repmgr -f /etc/repmgr.conf cluster crosscheck
 Name  | Id |  1 |  2 |  3
-------+----+----+----+----
 node1 |  1 |  * |  * |  x
 node2 |  2 |  * |  * |  *
 node3 |  3 |  * |  * |  *
What happened is that repmgr cluster crosscheck merged its own repmgr cluster matrix with the repmgr cluster matrix output from node2; the latter is able to connect to node3 and therefore determine the state of outbound connections from that node.
One of the following exit codes will be emitted by repmgr cluster crosscheck:
SUCCESS (0)
The check completed successfully and all nodes are reachable.
ERR_BAD_SSH (12)
One or more nodes could not be accessed via SSH.
This only applies to nodes unreachable from the node where this command is executed.
It's also possible that the crosscheck establishes that connections between PostgreSQL on all nodes are functioning, even if SSH access between some nodes is not possible.
ERR_NODE_STATUS (25)
PostgreSQL on one or more nodes could not be reached. This error code overrides ERR_BAD_SSH.
repmgr cluster event — output a formatted list of cluster events
Outputs a formatted list of cluster events, as stored in the repmgr.events table. Output is in reverse chronological order, and can be filtered with the following options:
--all: outputs all entries
--limit: set the maximum number of entries to output (default: 20)
--node-id: restrict entries to node with this ID
--node-name: restrict entries to node with this name
--event: filter specific event (see event notifications for a full list)
The "Details" column can be omitted by providing --compact.
--csv: generate output in CSV format. Note that the Details column will currently not be emitted in CSV format.
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
 Node ID | Name  | Event            | OK | Timestamp           | Details
---------+-------+------------------+----+---------------------+--------------------------------------------------------
 3       | node3 | standby_register | t  | 2019-04-16 10:59:59 | standby registration succeeded; upstream node ID is 1
 2       | node2 | standby_register | t  | 2019-04-16 10:59:57 | standby registration succeeded; upstream node ID is 1
repmgr cluster cleanup — purge monitoring history
Purges monitoring history from the repmgr.monitoring_history table to prevent excessive table growth. By default all data will be removed; use the -k/--keep-history option to specify the number of days of monitoring history to retain.
This command can be executed manually or as a cronjob.
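For example, a cron entry along the following lines (user, paths and retention period are illustrative and should be adapted to the local installation) would retain the most recent 30 days of monitoring history:
# illustrative /etc/cron.d entry: purge monitoring history daily at 03:00
0 3 * * * postgres /usr/bin/repmgr -f /etc/repmgr.conf cluster cleanup --keep-history=30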
This command requires a valid repmgr.conf file for the node on which it is executed; no additional arguments are required.
Monitoring history will only be written if repmgrd is active, and monitoring_history is set to true in repmgr.conf.
A cluster_cleanup event notification will be generated.
--node-id
Only delete monitoring records for the specified node.
For more details see the sections Storing monitoring data and repmgrd monitoring configuration.
repmgr service status — display information about the status of repmgrd on each node in the cluster
This command provides an overview of all active nodes in the cluster and the state of each node's repmgrd instance. It can be used to check the result of repmgr service pause and repmgr service unpause operations.
PostgreSQL should be accessible on all nodes (using the conninfo string shown by repmgr cluster show) from the node where repmgr service status is executed.
repmgr service status can be executed on any active node in the replication cluster. A valid repmgr.conf file is required.
If a node is not accessible, or PostgreSQL itself is not running on the node, repmgr will not be able to determine the status of that node's repmgrd instance, and "n/a" will be displayed in the node's repmgrd column.
After restarting PostgreSQL on any node, the repmgrd instance will take a second or two before it is able to update its status. Until then, repmgrd will be shown as not running.
repmgrd running normally on all nodes:
$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 96563 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 96572 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 96584 | no      | 0 second(s) ago
repmgrd paused on all nodes (using repmgr service pause):
$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 96563 | yes     | n/a
 2  | node2 | standby |   running | node1    | running | 96572 | yes     | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 96584 | yes     | 0 second(s) ago
repmgrd not running on one node:
$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd     | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+-------------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running     | 96563 | yes     | n/a
 2  | node2 | standby |   running | node1    | not running | n/a   | n/a     | n/a
 3  | node3 | standby |   running | node1    | running     | 96584 | yes     | 0 second(s) ago
--csv
repmgr service status accepts an optional parameter --csv, which outputs the replication cluster's status in a simple CSV format, suitable for parsing by scripts, e.g.:
$ repmgr -f /etc/repmgr.conf service status --csv
1,node1,primary,1,1,5722,1,100,-1,default
2,node2,standby,1,0,-1,1,100,1,default
3,node3,standby,1,1,5779,1,100,1,default
The columns have the following meanings:
--detail
Display additional information (location, priority) about the repmgr configuration.
--verbose
Display the full text of any database connection error messages.
repmgr service pause — Instruct all repmgrd instances in the replication cluster to pause failover operations
This command can be run on any active node in the replication cluster to instruct all running repmgrd instances to "pause" themselves, i.e. take no action (such as promoting themselves or following a new primary) if a failover event is detected.
This functionality is useful for performing maintenance operations, such as switchovers or upgrades, which might otherwise trigger a failover if repmgrd is running normally.
It's important to wait a few seconds after restarting PostgreSQL on any node before running repmgr service pause, as the repmgrd instance on the restarted node will take a second or two before it has updated its status.
repmgr service unpause will instruct all previously paused repmgrd instances to resume normal failover operation.
PostgreSQL must be accessible on all nodes (using the conninfo string shown by repmgr cluster show) from the node where repmgr service pause is executed.
repmgr service pause can be executed on any active node in the replication cluster. A valid repmgr.conf file is required.
It will have no effect on previously paused nodes.
$ repmgr -f /etc/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused
--dry-run
Check if nodes are reachable but don't pause repmgrd.
One of the following exit codes will be emitted by repmgr service pause:
SUCCESS (0)
repmgrd could be paused on all nodes.
ERR_REPMGRD_PAUSE (26)
repmgrd could not be paused on one or more nodes.
repmgr service unpause — Instruct all repmgrd instances in the replication cluster to resume failover operations
This command can be run on any active node in the replication cluster to instruct all running repmgrd instances to "unpause" (following a previous execution of repmgr service pause) and resume normal failover/monitoring operation.
It's important to wait a few seconds after restarting PostgreSQL on any node before running repmgr service unpause, as the repmgrd instance on the restarted node will take a second or two before it has updated its status.
PostgreSQL must be accessible on all nodes (using the conninfo string shown by repmgr cluster show) from the node where repmgr service unpause is executed.
repmgr service unpause can be executed on any active node in the replication cluster. A valid repmgr.conf file is required.
It will have no effect on nodes which are not already paused.
$ repmgr -f /etc/repmgr.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
--dry-run
Check if nodes are reachable but don't unpause repmgrd.
One of the following exit codes will be emitted by repmgr service unpause:
SUCCESS (0)
repmgrd could be unpaused on all nodes.
ERR_REPMGRD_PAUSE (26)
repmgrd could not be unpaused on one or more nodes.
repmgr daemon start — Start the repmgrd daemon on the local node
This command starts the repmgrd service on the local node.
By default, repmgr will wait for up to 15 seconds to confirm that repmgrd started. This behaviour can be overridden by specifying a different value using the --wait option, or disabled altogether with the --no-wait option.
The repmgr.conf parameter repmgrd_service_start_command must be set for repmgr daemon start to work; see section repmgr daemon start configuration for details.
--dry-run
Check prerequisites but don't actually attempt to start repmgrd.
This action will output the command which would be executed.
-w
--wait
Wait for the specified number of seconds to confirm that repmgrd started successfully. Note that providing --wait=0 is the equivalent of --no-wait.
--no-wait
Don't wait to confirm that repmgrd started successfully. This is equivalent to providing --wait=0.
The following parameter in repmgr.conf is relevant to repmgr daemon start:
repmgrd_service_start_command
repmgr daemon start will execute the command defined by the repmgrd_service_start_command parameter in repmgr.conf. This must be set to a shell command which will start repmgrd; if repmgr was installed from a package, this will be the service command defined by the package. For more details see Appendix: repmgr package details.
If repmgr was installed from a system package, and you do not configure repmgrd_service_start_command to an appropriate service command, this may result in the system becoming confused about the state of the repmgrd service; this is particularly the case with systemd.
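As an illustration, on a systemd-based installation this setting will typically resemble the following (the exact service name depends on the package in use; see the appendix referenced above):
repmgrd_service_start_command='sudo systemctl start repmgrd'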
One of the following exit codes will be emitted by repmgr daemon start:
SUCCESS (0)
The repmgrd start command (defined in repmgrd_service_start_command) was successfully executed. If the --wait option was provided, repmgr will confirm that repmgrd has actually started up.
ERR_BAD_CONFIG (1)
repmgrd_service_start_command is not defined in repmgr.conf.
ERR_DB_CONN (6)
repmgr was unable to connect to the local PostgreSQL node. PostgreSQL must be running before repmgrd can be started. Additionally, unless the --no-wait option was provided, repmgr needs to be able to connect to the local PostgreSQL node to determine the state of repmgrd.
ERR_REPMGRD_SERVICE (27)
The repmgrd start command (defined in repmgrd_service_start_command) was not successfully executed. This can also mean that repmgr was unable to confirm whether repmgrd successfully started (unless the --no-wait option was provided).
repmgr daemon stop — Stop the repmgrd daemon on the local node
This command stops the repmgrd daemon on the local node.
By default, repmgr will wait for up to 15 seconds to confirm that repmgrd stopped. This behaviour can be overridden by specifying a different value using the --wait option, or disabled altogether with the --no-wait option.
If PostgreSQL is not running on the local node, under some circumstances repmgr may not be able to confirm whether repmgrd has actually stopped.
The repmgr.conf parameter repmgrd_service_stop_command must be set for repmgr daemon stop to work; see section repmgr daemon stop configuration for details.
--dry-run
Check prerequisites but don't actually attempt to stop repmgrd.
This action will output the command which would be executed.
-w
--wait
Wait for the specified number of seconds to confirm that repmgrd stopped successfully. Note that providing --wait=0 is the equivalent of --no-wait.
--no-wait
Don't wait to confirm that repmgrd stopped successfully. This is equivalent to providing --wait=0.
The following parameter in repmgr.conf is relevant to repmgr daemon stop:
repmgrd_service_stop_command
repmgr daemon stop will execute the command defined by the repmgrd_service_stop_command parameter in repmgr.conf. This must be set to a shell command which will stop repmgrd; if repmgr was installed from a package, this will be the service command defined by the package. For more details see Appendix: repmgr package details.
If repmgr was installed from a system package, and you do not configure repmgrd_service_stop_command to an appropriate service command, this may result in the system becoming confused about the state of the repmgrd service; this is particularly the case with systemd.
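As with repmgr daemon start, on a systemd-based installation this will typically resemble the following (again, the exact service name depends on the package in use):
repmgrd_service_stop_command='sudo systemctl stop repmgrd'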
One of the following exit codes will be emitted by repmgr daemon stop:
SUCCESS (0)
repmgrd could be stopped.
ERR_BAD_CONFIG (1)
repmgrd_service_stop_command is not defined in repmgr.conf.
ERR_REPMGRD_SERVICE (27)
repmgrd could not be stopped.
Changes to each repmgr release are documented in the release notes. Please read the release notes for all versions between your current version and the version you plan to upgrade to before performing an upgrade, as there may be version-specific upgrade steps.
See also: Upgrading repmgr
Wed 20 November, 2024
repmgr 5.5.0 is a major release.
This release adds support for PostgreSQL 17.
Fixes warnings detected by the -Wshadow gcc flag added in PostgreSQL 16.
Tue 04 Jul, 2023
repmgr 5.4.1 is a minor release providing ...
repmgrd: ensure witness node metadata is updated if the primary node changed while the witness repmgrd was not running.
Thu 15 March, 2023
repmgr 5.4.0 is a major release.
This release provides support for cloning standbys from backups taken with Barman, using pg-backup-api.
Minor fixes to the documentation.
Mon 17 October, 2022
repmgr 5.3.3 is a minor release providing support for PostgreSQL 15 and a repmgrd bug fix.
If upgrading from an earlier repmgr version, any running repmgrd instances should be restarted.
If upgrading from repmgr 5.2.1 or earlier, a PostgreSQL restart is required.
repmgrd: ensure event notification script is called for event repmgrd_upstream_disconnect. GitHub #760.
Wed 25 May, 2022
repmgr 5.3.2 is a minor release.
Any running repmgrd instances should be restarted following this upgrade.
If upgrading from repmgr 5.2.1 or earlier, a PostgreSQL restart is required.
repmgr node status: fix output with the --downstream --nagios option combination. GitHub #749.
repmgr standby clone: don't treat inability to determine the cluster size as a fatal error.
The cluster size is displayed for informational purposes and is not essential for execution of the clone operation. As the repmgr user may not have permissions for all databases in the cluster, ignore the cluster size query if it fails.
repmgrd: ensure the witness node record on the primary is always marked as active if previously marked inactive. GitHub #754.
repmgrd: if standby_disconnect_on_failover is set, verify repmgr is a superuser before attempting to disable the WAL receiver.
If the repmgr user is a non-superuser, and a replication-only user exists, ensure redundant replication slots are dropped correctly even if the -S/--superuser option is not provided.
Tue 15 February, 2022
repmgr 5.3.1 is a minor release.
If repmgrd is in use, it should be restarted on all nodes where it is running.
If upgrading from repmgr 5.2.1 or earlier, a PostgreSQL restart is required.
Fix upgrade path from repmgr 4.2 and 4.3 to repmgr 5.3.
repmgrd: ensure potentially open connections are closed.
In some cases, when recovering from a degraded state in local node monitoring, a new connection was opened to the local node without closing the old one, resulting in a memory leak.
Tue 12 October, 2021
repmgr 5.3.0 is a major release.
This release provides support for PostgreSQL 14, released in September 2021.
Note that this release includes changes to the repmgr shared library module, meaning a PostgreSQL restart is required on all nodes where repmgr is installed.
repmgr standby switchover: improve handling of node rejoin failure on the demotion candidate.
Previously repmgr did not check whether repmgr node rejoin actually succeeded on the demotion candidate, and would always wait up to node_rejoin_timeout seconds for it to attach to the promotion candidate, even if this would never happen.
This makes it easier to identify unexpected events during a switchover operation, such as the demotion candidate being unexpectedly restarted by an external process.
Note that the output of the repmgr node rejoin operation on the demotion candidate will now be logged to a temporary file on that node; the location of the file will be reported in the error message, if one is emitted.
repmgrd: at startup, if the node record is marked as "inactive", attempt to set it to "active". This behaviour can be overridden by setting the configuration parameter repmgrd_exit_on_inactive_node to true.
repmgr node rejoin: emit rejoin target node information as NOTICE. This makes it clearer what repmgr is trying to do.
repmgr node check: option --repmgrd added to check repmgrd status.
Add %p event notification parameter providing the node ID of the former primary for the repmgrd_failover_promote event.
repmgr standby clone: if using --replication-conf-only on a node which was set up without replication slots, but the repmgr configuration was since changed to use_replication_slots=1, repmgr will now set slot_name in the node record, if it was previously empty.
repmgrd: rename internal shared library functions to minimize the risk of clashes with other shared libraries. This does not affect user-facing SQL functions.
repmgrd: ensure short option -s is accepted.
Mon 7 December, 2020
repmgr 5.2.1 is a minor release.
repmgr standby clone: option --recovery-min-apply-delay added, overriding any setting present in repmgr.conf.
Configuration: fix parsing of the replication_type configuration parameter. GitHub #672.
repmgr standby clone: handle the case where postgresql.auto.conf is absent on the source node.
repmgr standby clone: in PostgreSQL 11 and later, an existing data directory's permissions will not be changed to 0700 if they are already set to 0750.
repmgrd: prevent termination when the local node is not available and standby_disconnect_on_failover is set. GitHub #675.
repmgrd: ensure reconnect_interval is correctly handled. GitHub #673.
repmgr witness --help: fix witness unregister description. GitHub #676.
Thu 22 October, 2020
repmgr 5.2.0 is a major release.
This release provides support for PostgreSQL 13, released in September 2020.
This release removes support for PostgreSQL 9.3, which was designated EOL in November 2018.
Configuration: support include, include_dir and include_if_exists directives (see configuration file include directives).
repmgr standby switchover: improve sanity check failure log output from the demotion candidate.
If database connection configuration is not consistent across all nodes, it's possible remote repmgr invocations (e.g. during switchover, from the promotion candidate to the demotion candidate) will not be able to connect to the database. This will now be explicitly reported as a database connection failure, rather than as a failure of the respective sanity check.
repmgr cluster crosscheck / repmgr cluster matrix: improve text mode output format, in particular so that node identifiers of arbitrary length are displayed correctly.
repmgr primary unregister: the --force option can be provided to unregister an active primary node, provided it has no registered standby nodes.
repmgr standby clone: new option --verify-backup to run PostgreSQL's pg_verifybackup utility after cloning a standby to verify the integrity of the copied data (PostgreSQL 13 and later).
repmgr standby clone: when cloning from Barman, setting --waldir (PostgreSQL 9.6 and earlier: --xlogdir) in pg_basebackup_options will cause repmgr to create a WAL directory outside of the main data directory and symlink it from there, in the same way as would happen when cloning using pg_basebackup.
repmgr standby follow: In PostgreSQL 13 and later, a standby no longer requires a restart to follow a new upstream node.
The old behaviour (always restarting the standby to follow a new node) can be restored by setting the configuration file parameter standby_follow_restart to true.
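That is, the following repmgr.conf entry restores the old behaviour:
standby_follow_restart=true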
repmgr node rejoin: enable a node to attach to a target node even if the target node has a lower timeline (PostgreSQL 9.6 and later).
repmgr node rejoin: in PostgreSQL 13 and later, support pg_rewind's ability to automatically run crash recovery on a PostgreSQL instance which was not shut down cleanly.
repmgr node check: option --db-connection added to check if repmgr can connect to the database on the local node.
repmgr node check: report a database connection error if the --optformat option was provided.
Improve handling of pg_control read errors.
It is now possible to dump the contents of repmgr metadata tables with pg_dump.
The following additional parameters can be provided to failover_validation_command:
%n: node ID
%a: node name
%v: number of visible nodes
%u: number of shared upstream nodes
%t: total number of nodes
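For example, a repmgr.conf entry along these lines (the script path is purely illustrative) would pass the node ID and the visible and total node counts to an external validation script:
failover_validation_command='/path/to/failover-validate.sh %n %v %t'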
Configuration option always_promote (default: false) added, to control whether a node should be promoted if the repmgr metadata is not up-to-date on that node.
repmgr standby clone: fix issue with cloning from Barman where the tablespace mapping file was not flushed to disk before attempting to retrieve files from Barman. GitHub #650.
repmgr node rejoin: ensure that when verifying a standby node has attached to its upstream, the node has started streaming before confirming the success of the rejoin operation.
repmgrd: ensure primary connection is reset if same as upstream. GitHub #633.
Mon 13 April, 2020
repmgr 5.1.0 is a major release.
For details on how to upgrade an existing repmgr installation, see documentation section Upgrading a major version release.
If repmgrd is in use, a PostgreSQL restart is required; in that case we suggest combining this repmgr upgrade with the next PostgreSQL minor release, which will require a PostgreSQL restart in any case.
The repmgr standby clone --recovery-conf-only option has been renamed to --replication-conf-only. --recovery-conf-only will still be accepted as an alias.
The requirement that the repmgr user is a database superuser has been removed as far as possible.
In theory, repmgr can be operated with a normal database user for managing the repmgr database, and a separate replication user for managing replication connections (and replication slots, if these are in use).
Some operations will still require superuser permissions, e.g. for issuing a CHECKPOINT as part of a switchover operation; in this case a valid superuser should be provided with the -S/--superuser option.
repmgr standby clone: warn if neither data page checksums nor wal_log_hints are active, as this will preclude later usage of pg_rewind.
repmgr standby promote: when executed with --dry-run, the method which would be used to promote the node will be displayed.
repmgr standby follow: improve logging and checking of potential failure situations.
repmgr standby switchover: replication configuration files (PostgreSQL 11 and earlier: recovery.conf; PostgreSQL 12 and later: postgresql.auto.conf) will be checked to ensure they are owned by the same user who owns the PostgreSQL data directory.
repmgr standby switchover: provide additional information in --dry-run mode output.
repmgr standby switchover: check that the demotion candidate's registered repmgr.conf file can be found, to prevent confusing references to an incorrectly configured data directory. GitHub #615.
repmgr node check: accept option -S/--superuser. GitHub #621.
repmgr node check: add --upstream option to check whether the node is attached to the expected upstream node.
Ensure repmgr node rejoin checks for available replication slots on the rejoin target. repmgr standby follow and repmgr node rejoin will now return an error code if the operation fails because a replication slot is not available or cannot be created on the follow/rejoin target.
repmgr standby promote: in --dry-run mode, display the promote command which will be executed.
repmgr standby promote will check if the repmgr user has permission to execute pg_promote() and fall back to pg_ctl promote if necessary.
repmgr standby switchover: check for demotion candidate reattachment as late as possible to avoid spurious failure reports.
repmgrd: check for presence of promote_command and follow_command on receipt of SIGHUP. GitHub #614.
Fix situation where replication connections were not created correctly, which could lead to spurious replication connection failures in some situations, e.g. where password files are used.
Ensure postgresql.auto.conf is created with correct permissions (PostgreSQL 12 and later).
Tue 15 October, 2019
repmgr 5.0 is a major release.
For details on how to upgrade an existing repmgr installation, see documentation section Upgrading a major version release.
If repmgrd is in use, a PostgreSQL restart is required; in that case we suggest combining this repmgr upgrade with the next PostgreSQL minor release, which will require a PostgreSQL restart in any case.
repmgr now parses configuration files in the same way that PostgreSQL itself does, which means some files used with earlier repmgr versions may need slight modification before they can be used with repmgr 5 and later.
The main change is that string parameters should always be enclosed in single quotes.
For example, in repmgr 4.4 and earlier, the following repmgr.conf entry was valid:
conninfo=host=node1 user=repmgr dbname=repmgr connect_timeout=2
This must now be changed to:
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
Note that simple string identifiers (e.g. node_name=node1) may remain unquoted, though we recommend always enclosing strings in single quotes.
Additionally, leading/trailing white space between single quotes will no longer be trimmed; the entire string between single quotes will be taken literally.
Strings enclosed in double quotes (e.g. node_name="node1") will now be rejected; previously they were accepted, but the double quotes were interpreted as part of the string, which was a frequent cause of confusion. This syntax matches that used by PostgreSQL itself.
Some "repmgr daemon ..." commands have been renamed to "repmgr service ..." as they have a cluster-wide effect, and to avoid giving the impression they affect only the local repmgr daemon. The following commands are affected:
repmgr daemon pause (now repmgr service pause)
repmgr daemon unpause (now repmgr service unpause)
repmgr daemon status (now repmgr service status)
The "repmgr daemon ..." form will still be accepted for backwards compatibility.
The following command line options, which have been deprecated since repmgr 3.3 (and which no longer had any effect other than to generate a warning about their use) have been removed:
--data-dir
--no-conninfo-password
--recovery-min-apply-delay
Support for PostgreSQL 12 added.
Beginning with PostgreSQL 12, replication configuration has been integrated into the main PostgreSQL configuration system and the conventional recovery.conf file is no longer valid. repmgr has been modified to be compatible with this change.
repmgr additionally takes advantage of the new pg_promote() function, which enables a standby to be promoted to primary using an SQL command.
For an overview of general changes to replication configuration, see this blog entry: Replication configuration changes in PostgreSQL 12
The repmgr configuration file is now parsed using flex, meaning it will be parsed in the same way as PostgreSQL parses its own configuration files. This makes configuration file parsing more robust and consistent.
See item Configuration file parsing has been made stricter for details.
repmgr standby clone: checks for availability of the repmgr extension on the upstream node have been improved, as have the associated error messages.
When executing repmgr remotely, if the repmgr log level was explicitly provided (with -L/--log-level), that log level will be passed to the remote repmgr. This makes it possible to return log output when executing repmgr remotely at a different level to the one defined in the remote repmgr's repmgr.conf. This is particularly useful when DEBUG output is required.
Check role membership when trying to read pg_settings. Previously repmgr assumed only superusers could read pg_settings, but from PostgreSQL 10, all members of the roles pg_read_all_settings or pg_monitor are permitted to do this as well.
repmgrd: Fix handling of upstream node change check.
repmgrd has a check to see if the upstream node has unexpectedly changed, e.g. if the repmgrd service is paused and the PostgreSQL instance has been pointed to another node.
However this check was relying on the node record on the local node being up-to-date, which may not be the case immediately after a failover, when the node is still replaying records updated prior to the node's own record being updated. In this case it will mistakenly assume the node is following the original primary and attempt to restart monitoring, which will fail as the original primary is no longer available.
To prevent this, the node's record on the upstream node is checked to see if the reported upstream node_id matches the expected node_id. GitHub #587/#588.
Thu 27 June, 2019
repmgr 4.4 is a major release.
For details on how to upgrade an existing repmgr installation, see documentation section Upgrading a major version release.
If repmgrd is in use, a PostgreSQL restart is required; in that case we suggest combining this repmgr upgrade with the next PostgreSQL minor release, which will require a PostgreSQL restart in any case.
On Debian-based systems, including Ubuntu, if using repmgrd please ensure that in the file /etc/init.d/repmgrd, the parameter REPMGRD_OPTS contains "--daemonize=false", e.g.:
# additional options
REPMGRD_OPTS="--daemonize=false"
For further details, see repmgrd configuration on Debian/Ubuntu.
repmgr standby clone: prevent a standby from being cloned from a witness server (PostgreSQL 9.6 and later only).
repmgr witness register: prevent a witness server from being registered on the replication cluster primary server (PostgreSQL 9.6 and later only).
Registering a witness on the primary node would defeat the purpose of having a witness server, which is intended to remain running even if the cluster's primary goes down.
repmgr standby follow: note that an active, reachable cluster primary is required for this command; a more helpful error message is provided if no reachable primary could be found.
repmgr: when executing repmgr standby switchover, if --siblings-follow is not supplied, list all nodes which repmgr considers to be siblings (this will include the witness server, if in use), and which will remain attached to the old primary.
repmgr: when executing repmgr standby switchover, ignore nodes which are unreachable and marked as inactive. Previously it would abort if any node was unreachable, as that means it was unable to check if repmgrd is running. However if the node has been marked as inactive in the repmgr metadata, it's reasonable to assume the node is no longer part of the replication cluster and does not need to be checked.
repmgr standby switchover and repmgr standby promote: when executing with the --dry-run option, continue checks as far as possible even if errors are encountered.
repmgr standby promote: add --siblings-follow (similar to repmgr standby switchover).
If using repmgrd, when invoking repmgr standby promote (either directly via the promote_command, or in a script called via promote_command), --siblings-follow must not be included as a command line option for repmgr standby promote.
repmgr standby switchover: add --repmgrd-force-unpause to unpause all repmgrd instances after executing a switchover. This will ensure that any repmgrd instances which were paused before the switchover will be unpaused.
repmgr daemon status: make output similar to that of repmgr cluster show for consistency, and to make it easier to identify nodes not in the expected state.
repmgr cluster show: display each node's timeline ID (PostgreSQL 9.6 and later only).
repmgr cluster show and repmgr daemon status: show the upstream node name as reported by each individual node - this helps visualise situations where the cluster is in an unexpected state, and provides a better idea of the actual cluster state.
For example, if a cluster has divided somehow and a set of nodes are following a new primary, when running either of these commands, repmgr will now show the name of the primary those nodes are actually following, rather than the now outdated node name recorded on the other side of the "split". A warning will also be issued about the unexpected situation.
repmgr cluster show and repmgr daemon status: check if a node is attached to its advertised upstream node, and issue a warning if the node is not attached.
On the primary node, repmgrd is now able to monitor standby connections and, if the number of nodes connected falls below a certain (configurable) value, execute a custom script.
This provides an additional method for fencing an isolated primary node, and/or taking other action if one or more standbys become disconnected.
See section Monitoring standby disconnections on the primary node for more details.
In a failover situation, repmgrd nodes on the standbys of the failed primary are now able to confirm among themselves that none can still see the primary before continuing with the failover.
The repmgr.conf option primary_visibility_consensus must be set to true to enable this functionality. See section Primary visibility consensus for more details.
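In repmgr.conf this is enabled with the entry:
primary_visibility_consensus=true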
Ensure BDR2-specific functionality cannot be used on BDR3 and later.
The BDR support present in repmgr is for specific BDR2 use cases.
repmgr: when executing repmgr standby clone in --dry-run mode, ensure provision of the --force option does not result in an existing data directory being modified in any way.
repmgr: when executing repmgr primary register with the --force option, if another primary record exists but the associated node is unreachable (or running as a standby), set that node's record to inactive to enable the current node to be registered as a primary.
repmgr: when executing repmgr standby clone with the --upstream-conninfo option, ensure that application_name is set correctly in primary_conninfo.
repmgr: when executing repmgr standby switchover, don't abort if one or more nodes are not reachable and they are marked as inactive.
repmgr: canonicalize the data directory path when parsing the configuration file, so the provided path matches the path PostgreSQL reports as its data directory. Otherwise, if e.g. the data directory is configured with a trailing slash, repmgr node check --data-directory-config will return a spurious error.
repmgrd: fix memory leak which occurs while the monitored PostgreSQL node is not running.
The repmgr documentation has been converted to DocBook XML format, as currently used by the main PostgreSQL project. This means it can now be built against any PostgreSQL version from 9.5 (previously it was not possible to build the documentation against PostgreSQL 10 or later), and makes it easier to provide the documentation in other formats such as PDF.
For further details see: Building repmgr documentation
Tue April 2, 2019
repmgr 4.3 is a major release.
For details on how to upgrade an existing repmgr installation, see documentation section Upgrading a major version release.
If repmgrd is in use, a PostgreSQL restart is required; in that case we suggest combining this repmgr upgrade with the next PostgreSQL minor release, which will require a PostgreSQL restart in any case.
On Debian-based systems, including Ubuntu, if using repmgrd please ensure that in the file /etc/init.d/repmgrd, the parameter REPMGRD_OPTS contains "--daemonize=false", e.g.:
# additional options
REPMGRD_OPTS="--daemonize=false"
For further details, see repmgrd configuration on Debian/Ubuntu.
repmgr standby follow: option --upstream-node-id can now be used to specify another standby to follow.
repmgr standby follow: verify that it is actually possible to follow another node.
repmgr node rejoin: verify that it is actually possible to attach the node to the current primary.
New commands repmgr daemon start and repmgr daemon stop: these provide a standardized way of starting and stopping repmgrd. GitHub #528. These commands require the configuration file settings repmgrd_service_start_command and repmgrd_service_stop_command in repmgr.conf to be set.
repmgr daemon status additionally displays the node priority and the interval (in seconds) since the repmgrd instance last verified its upstream node was available.
Add --compact option to repmgr cluster show (GitHub #521). This makes it easier to copy the output into emails, chats etc. as a compact table.
repmgr cluster show: differentiate between unreachable nodes and nodes which are running but rejecting connections. This makes it possible to see whether a node is unreachable at network level, or if it is running but rejecting connections for some reason.
Add --dry-run to repmgr standby promote (GitHub #522).
repmgr --version-number outputs the "raw" repmgr version number (e.g. 40300). This is intended for use by scripts etc. requiring an easily parseable representation of the repmgr version.
repmgr node check --data-directory-config option added; this is to confirm repmgr is correctly configured. GitHub #523.
Add check to repmgr standby switchover to ensure the data directory on the demotion candidate is configured correctly in repmgr.conf. This is to ensure that repmgr, when remotely executed on the demotion candidate, can correctly verify that PostgreSQL on the demotion candidate was shut down cleanly. GitHub #523.
repmgrd will no longer consider nodes where repmgrd is not running as promotion candidates.
Previously, if repmgrd was not running on a node, but that node qualified as the promotion candidate, it would never be promoted due to the absence of a running repmgrd.
Add option connection_check_type to enable selection of the method repmgrd uses to determine whether the upstream node is available. Possible values are ping (default; uses PQping() to determine server availability), connection (attempts to make a new connection to the upstream node), and query (determines server availability by executing an SQL statement on the node via the existing connection).
New configuration option failover_validation_command to allow an external mechanism to validate the failover decision made by repmgrd.
New configuration option standby_disconnect_on_failover to force standbys to disconnect their WAL receivers before making a failover decision.
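As an illustrative repmgr.conf fragment combining these options (values are examples only; ping is in any case the default for connection_check_type):
connection_check_type=ping
standby_disconnect_on_failover=true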
In a failover situation, repmgrd will not attempt to promote a node if another primary has already appeared (e.g. by being promoted manually). GitHub #420.
repmgr cluster show: fix display of node IDs with multiple digits.
Ensure repmgr primary unregister behaves correctly when executed on a witness server. GitHub #548.
Ensure repmgr standby register fails when --upstream-node-id is the same as the local node ID.
repmgr: when executing repmgr standby clone, recheck primary/upstream connection(s) after the data copy operation is complete, as these may have gone away.
repmgr: when executing repmgr standby switchover, prevent escaping issues with connection URIs when executing repmgr node rejoin on the demotion candidate. GitHub #525.
repmgr: when executing repmgr standby switchover, verify the standby (promotion candidate) is currently attached to the primary (demotion candidate). GitHub #519.
repmgr: when executing repmgr standby switchover, avoid a potential race condition when comparing received WAL on the standby to the primary's shutdown location, as the standby's walreceiver may not have yet flushed all received WAL to disk. GitHub #518.
repmgr: when executing repmgr witness register, check that the node connected to is actually the primary (i.e. not the witness server). GitHub #528.
repmgr node check will only consider physical replication slots, as the purpose of slot checks is to warn about potential issues with streaming replication standbys which are no longer attached.
repmgrd: on a cascaded standby, don't fail over if failover=manual. GitHub #531.
Wed October 24, 2018
repmgr 4.2 is a major release, with the main new feature being the ability to pause repmgrd, e.g. during planned maintenance operations. Various other usability enhancements and a couple of bug fixes are also included; see notes below for details.
A restart of the PostgreSQL server is required for this release. For detailed upgrade instructions, see Upgrading a major version release.
On Debian-based systems, including Ubuntu, if using repmgrd please ensure that in the file /etc/init.d/repmgrd, the parameter REPMGRD_OPTS contains "--daemonize=false", e.g.:
# additional options
REPMGRD_OPTS="--daemonize=false"
For further details, see repmgrd daemon configuration on Debian/Ubuntu.
New parameter shutdown_check_timeout (default: 60 seconds) added; this provides an explicit timeout for repmgr standby switchover to check that the demotion candidate (current primary) has shut down. Previously, the parameters reconnect_attempts and reconnect_interval were used to calculate a timeout, but these are actually intended for primary failure detection. (GitHub #504).
New parameter repmgr_bindir added, to facilitate remote invocation of repmgr when the repmgr binary is located somewhere other than the PostgreSQL binary directory, as it cannot be assumed all package maintainers will install repmgr there. This parameter is optional; if not set (the default), repmgr will fall back to pg_bindir (if set). (GitHub #246).
repmgr cluster cleanup now accepts the --node-id option to delete records for only one node. (GitHub #493).
When running repmgr cluster matrix and repmgr cluster crosscheck, repmgr will report nodes unreachable via SSH, and emit return code ERR_BAD_SSH. (GitHub #246).
Users relying on repmgr cluster crosscheck to return a non-zero return code as a way of detecting connectivity errors should be aware that ERR_BAD_SSH will be returned if there is an SSH connection error from the node where the command is executed, even if the command is able to establish that PostgreSQL connectivity is fine. Therefore the exact return code should be checked to determine what kind of connectivity error has been detected.
repmgrd can now be "paused", i.e. instructed not to take any action such as a failover, even if the prerequisites for such an action are detected.
This removes the need to stop repmgrd on all nodes when performing a planned operation such as a switchover.
For further details, see Pausing repmgrd.
repmgr: fix "Missing replication slots" label in repmgr node check. (GitHub #507).
repmgrd: fix parsing of the -d/--daemonize option.
Wed September 5, 2018
repmgr 4.1.1 contains a number of usability enhancements and bug fixes.
We recommend upgrading to this version as soon as possible. This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.
repmgr standby switchover --dry-run no longer copies external configuration files to test they can be copied; this avoids making any changes to the target system. (GitHub #491).
repmgr cluster cleanup: add cluster_cleanup event. (GitHub #492).
repmgr standby switchover: improve detection of free walsenders. (GitHub #495).
Improve messages emitted during repmgr standby promote.
Always reopen the log file after receiving SIGHUP. Previously this only happened if a configuration file change was detected. (GitHub #485).
Report version number after logger initialisation. (GitHub #487).
Improve cascaded standby failover handling. (GitHub #480).
Improve reconnection handling after brief network outages; if monitoring data being collected, this could lead to orphaned sessions on the primary. (GitHub #480).
Check promote_command and follow_command are defined when reloading configuration. These were checked on startup but not on reload by repmgrd, which made it possible to provide repmgrd with invalid values. It's unlikely anyone would want to do this, but we should make it impossible anyway. (GitHub #486).
The text of any failed queries will now be logged as ERROR to assist logfile analysis at log levels higher than DEBUG. (GitHub #498).
repmgr node rejoin: remove the new upstream's replication slot if it still exists on the rejoined standby. (GitHub #499).
repmgrd: fix startup on witness node when local data is stale. (GitHub #488, #489).
Truncate version string reported by PostgreSQL if necessary; some distributions insert additional detail after the actual version. (GitHub #490).
Tue July 31, 2018
repmgr 4.1.0 introduces some changes to repmgrd behaviour and some additional configuration parameters.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.6. The following post-upgrade steps must be carried out:
Execute ALTER EXTENSION repmgr UPDATE on the primary server in the database where repmgr is installed.
repmgrd must be restarted on all nodes where it is running.
A restart of the PostgreSQL server is not required for this release (unless upgrading from repmgr 3.x).
See Upgrading repmgr 4.x and later for more details.
Configuration changes are backwards-compatible and no changes to repmgr.conf are required. However users should review the changes listed below.
Repository changes
Coinciding with this release, the 2ndQuadrant repository structure has changed. See section Installing from packages for details, particularly if you are using a RPM-based system.
The default for log_level is now INFO. This produces additional informative log output, without creating excessive additional log file volume, and matches the setting assumed for examples in the documentation. (GitHub #470).
recovery_min_apply_delay now accepts a minimum value of zero. (GitHub #448).
repmgr: always exit with an error if an unrecognised command line option is provided. This matches the behaviour of other PostgreSQL utilities such as psql. (GitHub #464).
repmgr: add -q/--quiet option to suppress non-error output. (GitHub #468).
repmgr cluster show, repmgr node check and repmgr node status return a non-zero exit code if node status issues are detected. (GitHub #456).
Add --csv output option for repmgr cluster event. (GitHub #471).
repmgr witness unregister can be run on any node, by providing the ID of the witness node with --node-id. (GitHub #472).
repmgr standby switchover will refuse to run if an exclusive backup is taking place on the current primary. (GitHub #476).
repmgrd: create a PID file by default (GitHub #457). For details, see repmgrd's PID file.
repmgrd: daemonize process by default. In case, for whatever reason, the user does not wish to daemonize the process, provide --daemonize=false. (GitHub #458).
repmgr standby register --wait-sync: fix behaviour when no timeout is provided.
repmgr cluster cleanup: add missing help options. (GitHub #461/#462).
Ensure witness node follows new primary after switchover. (GitHub #453).
repmgr node check and repmgr node status: fix witness node handling. (GitHub #451).
When using repmgr standby clone with --recovery-conf-only and replication slots, ensure primary_slot_name is set correctly. (GitHub #474).
Thu June 14, 2018
repmgr 4.0.6 contains a number of bug fixes and usability enhancements.
We recommend upgrading to this version as soon as possible. This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.5; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.
repmgr cluster crosscheck and repmgr cluster matrix: return a non-zero exit code if node connection issues are detected (GitHub #447).
repmgr standby clone: improve handling of external configuration file copying, including consideration in the --dry-run check (GitHub #443).
When using --dry-run, force log level to INFO to ensure output will always be displayed (GitHub #441).
repmgr standby clone: improve documentation of --recovery-conf-only mode (GitHub #438).
repmgr standby clone: don't require presence of the user parameter in the conninfo string (GitHub #437).
repmgr witness register: prevent registration of a witness server with the same name as an existing node.
repmgr standby follow: check the node has actually connected to the new primary before reporting success (GitHub #444).
repmgr node rejoin: fix bug when parsing the --config-files parameter (GitHub #442).
repmgrd: ensure local node is counted as quorum member (GitHub #439)
Wed May 2, 2018
repmgr 4.0.5 contains a number of usability enhancements related to pg_rewind usage, recovery.conf generation and (in repmgrd) handling of various corner-case situations, as well as a number of bug fixes.
Various documentation improvements, with particular emphasis on the importance of setting appropriate service commands instead of relying on pg_ctl.
Poll demoted primary after restart as a standby during a switchover operation (GitHub #408).
Add configuration parameter config_directory (GitHub #424).
Add sanity check if --upstream-node-id not supplied when executing repmgr standby register (GitHub #395).
Enable pg_rewind to be used with PostgreSQL 9.3/9.4 (GitHub #413).
When generating replication connection strings, set dbname=replication if appropriate (GitHub #421).
Enable provision of archive_cleanup_command in recovery.conf (GitHub #416).
Actively check for node to rejoin cluster (GitHub #415).
repmgrd: set connect_timeout=2 (if not explicitly set) when pinging a server.
Fix display of conninfo parsing error messages.
Fix minimum accepted value for degraded_monitoring_timeout (GitHub #411).
Fix superuser password handling (GitHub #400).
Fix parsing of the archive_ready_critical configuration file parameter (GitHub #426).
Fix repmgr cluster crosscheck output (GitHub #389).
Fix memory leaks in witness code (GitHub #402).
repmgrd: handle pg_ctl promote timeout (GitHub #425).
repmgrd: handle failover situation with only two nodes in the primary location, and at least one node in another location (GitHub #407).
repmgrd: prevent standby connection handle from going stale.
Fri Mar 9, 2018
repmgr 4.0.4 contains some bug fixes and a number of usability enhancements related to logging/diagnostics, event notifications and pre-action checks.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.3; repmgrd (if running) should be restarted. See Upgrading repmgr for more details.
It is not possible to perform a switchover where the demotion candidate is running repmgr 4.0.2 or lower; all nodes should be upgraded to the latest version (4.0.4). This is due to additional checks introduced in 4.0.3 which require the presence of 4.0.3 or later versions on all nodes.
add repmgr standby clone --recovery-conf-only option to enable integration of a standby cloned from another source into a repmgr cluster (GitHub #382)
remove restriction on using replication slots when cloning from a Barman server (GitHub #379)
make repmgr standby promote timeout values configurable (GitHub #387)
add missing options to main --help output (GitHub #391, #392)
ensure repmgr node rejoin honours the --dry-run option (GitHub #383)
improve replication slot warnings generated by repmgr node status (GitHub #385)
fix --superuser handling when cloning a standby (GitHub #380)
repmgrd: improve detection of status change from primary to standby
repmgrd: improve reconnection to the local node after a failover (previously a connection error due to the node starting up was being interpreted as the node being unavailable)
repmgrd: when running on a witness server, correctly connect to new primary after a failover
repmgrd: add event notification repmgrd_shutdown (GitHub #393)
Thu Feb 15, 2018
repmgr 4.0.3 contains some bug fixes and a number of usability enhancements related to logging/diagnostics, event notifications and pre-action checks.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.2; repmgrd (if running) should be restarted.
It is not possible to perform a switchover where the demotion candidate is running repmgr 4.0.2 or lower; all nodes should be upgraded to 4.0.3. This is due to additional checks introduced in 4.0.3 which require the presence of 4.0.3 or later versions on all nodes.
improve repmgr standby switchover behaviour when pg_ctl is used to control the server and logging output is not explicitly redirected
improve repmgr standby switchover log messages, and provide new exit code ERR_SWITCHOVER_INCOMPLETE when the old primary could not be shut down cleanly
add check to verify the demotion candidate can make a replication connection to the promotion candidate before executing a switchover (GitHub #370)
add check for sufficient walsenders and replication slots on the promotion candidate before executing repmgr standby switchover (GitHub #371)
add --dry-run mode to repmgr standby follow (GitHub #368)
provide information about the primary node for repmgr standby register and repmgr standby follow event notifications (GitHub #375)
add standby_register_sync event notification, which is fired when repmgr standby register is run with the --wait-sync option and the new or updated standby node record has synchronised to the standby (GitHub #374)
when running repmgr cluster show, if any node is unreachable, output the error message encountered in the list of warnings (GitHub #369)
ensure an inactive data directory can be overwritten when cloning a standby (GitHub #366)
fix repmgr node status upstream node display (GitHub #363)
repmgr primary unregister: clarify usage and fix --help output (GitHub #373)
fix parsing of pg_basebackup_options (GitHub #376)
ensure the pg_subtrans directory is created when cloning a standby in Barman mode
repmgr witness register: fix primary node check (GitHub #377).
Thu Jan 18, 2018
repmgr 4.0.2 contains some bug fixes and small usability enhancements.
This release can be installed as a simple package upgrade from repmgr 4.0.1 or 4.0; repmgrd (if running) should be restarted.
Recognize the -t/--terse option for repmgr cluster event to hide the Details column (GitHub #360)
Add "--wait-start" option for repmgr standby register (GitHub #356)
Add %p event notification parameter for repmgr standby switchover
Add missing -W option to getopt_long() invocation (GitHub #350)
Automatically create slot name if missing (GitHub #343)
Fixes to parsing output of remote repmgr invocations (GitHub #349)
When registering BDR nodes, automatically create missing connection replication set (GitHub #347)
Handle missing node record in repmgr node rejoin (GitHub #358)
The documentation can now be built as a single HTML file (GitHub pull request #353)
Wed Dec 13, 2017
repmgr 4.0.1 is a bugfix release.
ensure correct return codes are returned for repmgr node check --action= operations (GitHub #340)
Fix repmgr cluster show when the repmgr schema is not set in the search path (GitHub #341)
When using --force-rewind with repmgr node rejoin, delete any replication slots copied by pg_rewind (GitHub #334)
Only perform the sanity check on accessibility of configuration files outside the data directory when --copy-external-config-files is provided (GitHub #342)
Initialise "voting_term" table in application, not extension SQL (GitHub #344)
Tue Nov 21, 2017
repmgr 4.0 is an entirely new version of repmgr, implementing repmgr as a native PostgreSQL extension, adding new and improving existing features, and making repmgr more user-friendly and intuitive to use. The new code base will make it easier to add additional functionality for future releases.
With the new version, the opportunity has been taken to make some changes in the way repmgr is set up and configured. In particular, changes have been made to some configuration file settings for consistency and clarity. These changes are covered in detail below.
To standardise terminology, from this release primary is used to denote the read/write node in a streaming replication cluster. master is still accepted as an alias for repmgr commands (e.g. repmgr master register).
For detailed instructions on upgrading from repmgr 3.x, see Upgrading from repmgr 3.x.
improved switchover: the switchover process has been improved and streamlined, speeding up switchover operations; repmgr can also instruct other standbys to follow the new primary once the switchover has completed. See Performing a switchover with repmgr for more details.
"--dry-run" option: many repmgr commands now provide
a --dry-run
option which will execute the command as far
as possible without making any changes, which will enable possible issues
to be identified before the intended operation is actually carried out.
easier upgrades: repmgr is now implemented as a native PostgreSQL extension, which means future upgrades can be carried out by installing the upgraded package and issuing ALTER EXTENSION repmgr UPDATE (see the example following this list).
improved logging output: repmgr (and repmgrd) now provide more explicit logging output giving a better picture of what is going on. Where appropriate, DETAIL and HINT log lines provide additional detail and suggestions for resolving problems. Additionally, repmgrd now emits informational log lines at regular, configurable intervals to confirm that it's running correctly and which node(s) it's monitoring.
automatic configuration file location in packages: Many operating system packages place the repmgr configuration files in a version-specific subdirectory, e.g. /etc/repmgr/9.6/repmgr.conf; repmgr now makes it easy for package maintainers to provide a patch with the actual file location, meaning repmgr.conf does not need to be provided explicitly. This is currently the case for 2ndQuadrant-provided .deb and .rpm packages.
monitoring and status checks: New commands repmgr node check and repmgr node status provide information about a node's status and replication-related monitoring output.
node rejoin: New command repmgr node rejoin enables a failed primary to be rejoined to a replication cluster, optionally using pg_rewind to synchronise its data (note that pg_rewind may not be usable in some circumstances).
automatic failover: improved detection of node status; promotion decision based on a consensual model, with the promoted primary explicitly informing other standbys to follow it. The repmgrd daemon will continue functioning even if the monitored PostgreSQL instance is down, and resume monitoring if it reappears. Additionally, if the instance's role has changed (typically from a primary to a standby, e.g. following reintegration of a failed primary using repmgr node rejoin) repmgrd will automatically resume monitoring it as a standby.
new documentation: the existing documentation spread over multiple text files has been consolidated into DocBook format (as used by the main PostgreSQL project) and is now available online in HTML format.
The DocBook files can easily be used to create versions of the documentation in other formats such as PDF.
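As an illustration of the upgrade step mentioned in the list above, after installing an upgraded package the extension can be updated by connecting to the database where the repmgr extension is installed and executing:
ALTER EXTENSION repmgr UPDATE;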
--dry-run: repmgr will attempt to perform the action as far as possible without making any changes to the database.
--upstream-node-id: use to specify the upstream node the standby will later connect to and stream from, when cloning and registering a standby. This replaces the configuration file parameter upstream_node, as the upstream node is set when the standby is initially cloned but can change over the lifetime of an installation (due to failovers, switchovers etc.), so it's pointless/confusing to keep the original value around in repmgr.conf.
repmgr --replication-user has been deprecated; it has been replaced by the configuration file option replication_user. The value (which defaults to the user provided in the conninfo string) will be stored in the repmgr metadata for use by repmgr standby clone and repmgr standby follow.
--recovery-min-apply-delay is now a configuration file parameter, recovery_min_apply_delay, to ensure the setting does not get lost when a standby follows a new upstream.
--no-conninfo-password is deprecated; a password included in the environment variable PGPASSWORD will no longer be added to primary_conninfo by default; to force the inclusion of a password (not recommended), use the new configuration file parameter use_primary_conninfo_password. For details, see section Managing passwords.
repmgrd --monitoring-history is deprecated and is replaced by the configuration file option monitoring_history. This enables the setting to be changed without having to modify system service files.
Required settings
The following four parameters are mandatory in repmgr.conf: node_id, node_name, conninfo and data_directory.
Renamed settings
Some settings have been renamed for clarity and consistency:
node is now node_id
name is now node_name
barman_server is now barman_host
master_response_timeout is now async_query_timeout (renamed to better indicate its purpose)
The following configuration file parameters have been renamed for consistency with other parameters (and to conform to the pattern used by PostgreSQL itself, which uses the prefix log_ for logging parameters):
loglevel is now log_level
logfile is now log_file
logfacility is now log_facility
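Putting the required and renamed settings together, a minimal repmgr.conf using the new parameter names might look like the following sketch (the node name, conninfo string and paths are illustrative placeholders):
node_id=2
node_name='node2'
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/data'
log_level='INFO'
log_file='/var/log/repmgr/repmgr.log'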
Removed settings
cluster has been removed.
upstream_node has been removed - see note about --upstream-node-id above.
retry_promote_interval_secs has been removed; it is now redundant due to changes in the failover/promotion mechanism - the new equivalent is primary_notification_timeout.
Logging changes
The default log_level is now INFO rather than NOTICE.
A new parameter log_status_interval has been added, which causes repmgrd to emit a status log line at the specified interval.
The shared library has been renamed from repmgr_funcs to repmgr, meaning shared_preload_libraries in postgresql.conf needs to be updated to the new name:
shared_preload_libraries = 'repmgr'
The signing key ID used for repmgr source code bundles is 0x297F1DCC.
To download the repmgr source key to your computer:
curl -s https://repmgr.org/download/SOURCE-GPG-KEY-repmgr | gpg --import
gpg --fingerprint 0x297F1DCC
then verify that the fingerprint is the expected value:
085A BE38 6FD9 72CE 6365 340D 8365 683D 297F 1DCC
For checking tarballs, first download and import the repmgr source signing key as shown above. Then download both the source tarball and the detached key (e.g. repmgr-4.0beta1.tar.gz and repmgr-4.0beta1.tar.gz.asc) from https://repmgr.org/download/ and use gpg to verify the key, e.g.:
gpg --verify repmgr-4.0beta1.tar.gz.asc
What's the difference between the repmgr versions?
repmgr 4 is a complete rewrite of the previous repmgr code base and implements repmgr as a PostgreSQL extension. It supports all PostgreSQL versions from 9.3 (although some repmgr features are not available for PostgreSQL 9.3 and 9.4).
repmgr 5 is fundamentally the same code base as repmgr 4, but provides support for the revised replication configuration mechanism in PostgreSQL 12.
Support for PostgreSQL 9.3 is no longer available from repmgr 5.2.
repmgr 3.x builds on the improved replication facilities added in PostgreSQL 9.3, as well as improved automated failover support via repmgrd, and is not compatible with PostgreSQL 9.2 and earlier. We recommend upgrading to repmgr 4, as the repmgr 3.x series is no longer maintained.
repmgr 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible with PostgreSQL 9.3, we recommend using repmgr 4.x. repmgr 2.x is no longer maintained.
See also repmgr compatibility matrix and Should I upgrade repmgr?.
What's the advantage of using replication slots?
Replication slots, introduced in PostgreSQL 9.4, ensure that the primary server will retain WAL files until they have been consumed by all standby servers. This means standby servers should never fail due to not being able to retrieve required WAL files from the primary.
However this does mean that if a standby is no longer connected to the primary, the presence of the replication slot will cause WAL files to be retained indefinitely, and eventually lead to disk space exhaustion.
Our recommended configuration is to configure Barman as a fallback source of WAL files, rather than maintain replication slots for each standby. See also: Using Barman as a WAL file source.
How many replication slots should be defined in max_replication_slots?
Normally at least the same number as the number of standbys which will connect to the node. Note that changes to max_replication_slots require a server restart to take effect, and as there is no particular penalty for unused replication slots, setting a higher figure will make adding new nodes easier.
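For example, on a node expected to serve two standbys, a setting along these lines in postgresql.conf (the exact figure is illustrative) leaves headroom for adding further nodes later:
max_replication_slots = 4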
Before PostgreSQL 10, hash indexes were not WAL logged and are therefore not suitable for use in streaming replication in PostgreSQL 9.6 and earlier. See the PostgreSQL documentation for details.
From PostgreSQL 10, this restriction has been lifted and hash indexes can be used in a streaming replication cluster.
For minor version upgrades, e.g. from 9.6.7 to 9.6.8, a common approach is to upgrade a standby to the latest version, perform a switchover promoting it to a primary, then upgrade the former primary.
For major version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10), the traditional approach is to "reseed" a cluster by upgrading a single node with pg_upgrade and recloning standbys from this.
To minimize downtime during major upgrades from PostgreSQL 9.4 and later, pglogical can be used to set up a parallel cluster using the newer PostgreSQL version, which can be kept in sync with the existing production cluster until the new cluster is ready to be put into production.
What does ERROR: could not access file "$libdir/repmgr" mean?
It means the repmgr extension code is not installed in the PostgreSQL application directory. This typically happens when using PostgreSQL packages provided by a third-party vendor, which often have different filesystem layouts.
Either use PostgreSQL packages provided by the community or EnterpriseDB; if this is not possible, contact your vendor for assistance.
How can I obtain old versions of repmgr packages?
See appendix Installing old package versions for details.
Is repmgr required for streaming replication?
No.
repmgr (together with repmgrd) assists with managing replication. It does not actually perform replication, which is part of the core PostgreSQL functionality.
Will replication stop working if repmgr is uninstalled?
No. See preceding question.
Does it matter if different repmgr versions are present in the replication cluster?
Yes. If different "major" repmgr versions (e.g. 3.3.x and 4.1.x) are present, repmgr (in particular repmgrd) may not run, or may not run properly, or in the worst case (if different repmgrd versions are running and there are differences in the failover implementation) break your replication cluster.
If different "minor" repmgr versions (e.g. 4.1.1 and 4.1.6) are installed, repmgr will function, but we strongly recommend always running the same version to ensure there are no unexpected surprises, e.g. a newer version behaving slightly differently to the older version.
See also Should I upgrade repmgr?.
Should I upgrade repmgr?
Yes.
We don't release new versions for fun; upgrading may require a little effort, but running an older repmgr version with bugs which have since been fixed may end up costing you more effort. The same applies to PostgreSQL itself.
In some circumstances repmgr may need to access a PostgreSQL data directory while the PostgreSQL server is not running, e.g. to confirm it shut down cleanly during a switchover.
Additionally, this provides support when using repmgr on PostgreSQL 9.6 and earlier, where the repmgr user is not a superuser; in that case the repmgr user will not be able to access the data_directory configuration setting, access to which is restricted to superusers.
In PostgreSQL 10 and later, non-superusers can be added to the default role pg_read_all_settings (or the meta-role pg_monitor), which will enable them to read this setting.
Are repmgr packages compatible with $third_party_vendor's packages?
repmgr packages provided by EnterpriseDB are compatible with the community-provided PostgreSQL packages and specified software provided by EnterpriseDB.
A number of other vendors provide their own versions of PostgreSQL packages, often with different package naming schemes and/or file locations.
We cannot guarantee that repmgr packages will be compatible with these packages.
It may be possible to override package dependencies (e.g. rpm --nodeps for CentOS-based systems or dpkg --force-depends for Debian-based systems).
Can I register an existing PostgreSQL server with repmgr?
Yes, any existing PostgreSQL server which is part of the same replication cluster can be registered with repmgr. There's no requirement for a standby to have been cloned using repmgr.
For a standby which has been manually cloned or recovered from an external backup manager such as Barman, the command repmgr standby clone --replication-conf-only can be used to create the correct replication configuration file for use with repmgr (and will create a replication slot if required). Once this has been done, register the node as usual.
See section Customising replication configuration.
How can a failed primary be re-added as a standby?
This is a two-stage process. First, the failed primary's data directory must be re-synced with the current primary; secondly the failed primary needs to be re-registered as a standby.
It's possible to use pg_rewind to re-synchronise the existing data directory, which will usually be much faster than re-cloning the server. However pg_rewind can only be used if PostgreSQL either has wal_log_hints enabled, or data checksums were enabled when the cluster was initialized.
Note that pg_rewind is available as part of the core PostgreSQL distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4.
repmgr provides the command repmgr node rejoin, which can optionally execute pg_rewind; see the repmgr node rejoin documentation for details, in particular the section Using pg_rewind.
If pg_rewind cannot be used, then the data directory will need to be re-cloned from scratch.
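As an illustrative sketch, a rejoin attempt using pg_rewind might look like this (the conninfo string, which must point to a running node, typically the current primary, is a placeholder; --dry-run checks in advance whether the operation is possible):
repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 user=repmgr dbname=repmgr' --force-rewind --dry-run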
Is there an easy way to check my primary server is correctly configured for use with repmgr?
Execute repmgr standby clone with the --dry-run option; this will report any configuration problems which need to be rectified.
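For example (the connection parameters, which should point to the intended upstream node, are placeholders):
repmgr -f /etc/repmgr.conf -h node1 -U repmgr -d repmgr standby clone --dry-run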
When cloning a standby, how can I get repmgr to copy postgresql.conf and pg_hba.conf from the PostgreSQL configuration directory in /etc?
Use the command line option --copy-external-config-files. For more details see Copying configuration files.
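For example (connection parameters are placeholders; the optional pgdata value places the copied files in the standby's data directory):
repmgr -f /etc/repmgr.conf -h node1 -U repmgr -d repmgr standby clone --copy-external-config-files=pgdata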
Do I need to include shared_preload_libraries = 'repmgr' in postgresql.conf if I'm not using repmgrd?
No, the repmgr shared library is only needed when running repmgrd. If you later decide to run repmgrd, you just need to add shared_preload_libraries = 'repmgr' and restart PostgreSQL.
I've provided replication permission for the repmgr user in pg_hba.conf but repmgr/repmgrd complains it can't connect to the server... Why?
repmgr and repmgrd need to be able to connect to the repmgr database with a normal connection to query metadata. The replication connection permission is for PostgreSQL's streaming replication (and the replication user doesn't necessarily need to be the repmgr user).
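As an illustrative sketch, pg_hba.conf entries for the repmgr user might therefore look like this (the network range and authentication method are placeholders which must be adapted to your environment):
local   replication   repmgr                              trust
host    replication   repmgr      192.168.1.0/24          trust
local   repmgr        repmgr                              trust
host    repmgr        repmgr      192.168.1.0/24          trust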
Cloning a standby is a one-time action; the role of the server being cloned from could change, so fixing it in the configuration file would create confusion. If repmgr needs to establish a connection to the primary server, it can retrieve this from the repmgr.nodes table on the local node, and if necessary scan the replication cluster until it locates the active primary.
Provide the option --waldir (--xlogdir in PostgreSQL 9.6 and earlier) with the absolute path to the WAL directory in pg_basebackup_options.
For more details see pg_basebackup options when cloning a standby.
In repmgr 5.2 and later, this setting will also be honoured when cloning from Barman.
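For example, in repmgr.conf (the path is an illustrative placeholder):
pg_basebackup_options='--waldir=/var/lib/pgsql/11/wal'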
Why are there no foreign keys on the node_id column in the repmgr.events table?
Under some circumstances event notifications can be generated for servers which have not yet been registered; it's also useful to retain a record of events which includes servers removed from the replication cluster and which no longer have an entry in the repmgr.nodes table.
Why are some values in recovery.conf (PostgreSQL 11 and earlier) surrounded by pairs of single quotes?
This is to ensure that user-supplied values which are written as parameter values in recovery.conf are escaped correctly and do not cause errors when the file is parsed.
The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting only of digits and alphabetical characters as-is, but wraps everything else in pairs of single quotes, even if the string does not contain any characters which need escaping.
Beginning with repmgr 5.2, the metadata tables associated with the repmgr extension (repmgr.nodes, repmgr.events and repmgr.monitoring_history) have been marked as dumpable, as they contain configuration and user-generated data.
To exclude these from pg_dump output, add the flag --exclude-schema=repmgr.
To exclude individual repmgr metadata tables from pg_dump output, add a flag such as --exclude-table=repmgr.monitoring_history; this flag can be provided multiple times to exclude individual tables.
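For example (the database name and output file are placeholders):
pg_dump --exclude-schema=repmgr -d mydb -f mydb.sql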
How can I prevent a node from ever being promoted to primary?
In repmgr.conf, set its priority to a value of 0; apply the changed setting with repmgr standby register --force.
Additionally, if failover is set to manual, the node will never be considered as a promotion candidate.
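For example, in the node's repmgr.conf:
priority=0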
repmgrd can monitor delayed standbys - those set up with recovery_min_apply_delay set to a non-zero value in the replication configuration. However repmgrd does not currently consider this setting, and therefore may not be able to properly evaluate the node as a promotion candidate.
We recommend that delayed standbys are explicitly excluded from promotion by setting priority to 0 in repmgr.conf.
Note that after registering a delayed standby, repmgrd will only start once the metadata added in the primary node has been replicated.
How can I rotate repmgrd's log file?
Configure your system's logrotate service to do this; see Section 13.4.
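As a minimal sketch, assuming the log file location set in repmgr.conf is /var/log/repmgr/repmgrd.log, a logrotate configuration along these lines could be used (repmgrd is signalled with SIGHUP so that it reopens its log file after rotation):
/var/log/repmgr/repmgrd.log {
    missingok
    weekly
    rotate 4
    compress
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}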
Check you registered the standby after recloning. If unregistered, the standby cannot be considered as a promotion candidate even if failover is set to automatic, which is probably not what you want. repmgrd will start if failover is set to manual, so the node's replication status can still be monitored, if desired.
repmgrd doesn't apply pg_bindir when executing promote_command or follow_command
promote_command and follow_command can be user-defined scripts, so repmgr will not apply pg_bindir, even if these commands execute repmgr itself. Always provide the full path; see Section 13.1.1 for more details.
repmgrd complains that the "upstream node must be running before repmgrd can start"
repmgrd does this to avoid starting up on a replication cluster which is not in a healthy state. If the upstream is unavailable, repmgrd may initiate a failover immediately after starting up, which could have unintended side-effects, particularly if repmgrd is not running on other nodes.
In particular, it's possible that the node's local copy of the repmgr.nodes table is out-of-date, which may lead to incorrect failover behaviour.
The onus is therefore on the administrator to manually set the cluster to a stable, healthy state before starting repmgrd.
This section provides technical details about various repmgr binary packages, such as location of the installed binaries and configuration files.
Currently, repmgr RPM packages are provided for versions 6.x and 7.x of CentOS. These should also work on matching versions of Red Hat Enterprise Linux, Scientific Linux and Oracle Enterprise Linux; together with CentOS, these are the same RedHat-based distributions for which the main community project (PGDG) provides packages (see the PostgreSQL RPM Building Project page for details).
Note these repmgr RPM packages are not designed to work with SuSE/OpenSuSE.
repmgr packages are designed to be compatible with community-provided PostgreSQL packages. They may not work with vendor-specific packages such as those provided by RedHat for RHEL customers, as the filesystem layout may be different to the community RPMs. Please contact your support vendor for assistance.
repmgr packages are available from the public EDB repository, and also the PostgreSQL community repository. The EDB repository is updated immediately after each repmgr release.
Table D.1. EDB public repository
Repository URL: | https://dl.enterprisedb.com/ |
Repository documentation: | https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ |
Table D.2. PostgreSQL community repository (PGDG)
Repository URL: | https://yum.postgresql.org/repopackages.php |
Repository documentation: | https://yum.postgresql.org/ |
The two tables below list relevant information, paths, commands etc. for the repmgr packages on CentOS 7 (with systemd) and CentOS 6 (no systemd). Substitute the appropriate PostgreSQL major version number for your installation.
For PostgreSQL 9.6 and lower, the CentOS packages use a mixture of 9.6 and 96 in various places to designate the major version; e.g. the package name is repmgr96, but the data directory is /var/lib/pgsql/9.6/data.
From PostgreSQL 10, the first part of the version number (e.g. 10) is the major version, so there is more consistency in file/path/package naming (package repmgr10, data directory /var/lib/pgsql/10/data).
Table D.3. CentOS 7 packages
Package name example: | repmgr11-4.4.0-1.rhel7.x86_64 |
Metapackage: | (none) |
Installation command: | yum install repmgr11 |
Binary location: | /usr/pgsql-11/bin |
repmgr in default path: | NO |
Configuration file location: | /etc/repmgr/11/repmgr.conf |
Data directory: | /var/lib/pgsql/11/data |
repmgrd service command: | systemctl [start|stop|restart|reload] repmgr11 |
repmgrd service file location: | /usr/lib/systemd/system/repmgr11.service |
repmgrd log file location: | (not specified by package; set in repmgr.conf ) |
Table D.4. CentOS 6 packages
Package name example: | repmgr96-4.0.4-1.rhel6.x86_64 |
Metapackage: | (none) |
Installation command: | yum install repmgr96 |
Binary location: | /usr/pgsql-9.6/bin |
repmgr in default path: | NO |
Configuration file location: | /etc/repmgr/9.6/repmgr.conf |
Data directory: | /var/lib/pgsql/9.6/data |
repmgrd service command: | service [start|stop|restart|reload] repmgr-9.6 |
repmgrd service file location: | /etc/init.d/repmgr-9.6 |
repmgrd log file location: | /var/log/repmgr/repmgrd-9.6.log |
repmgr .deb packages are provided by EDB as well as the PostgreSQL Community APT repository, and are available for each community-supported PostgreSQL version, currently supported Debian releases, and currently supported Ubuntu LTS releases.
Table D.5. EDB public repository
Repository URL: | https://dl.enterprisedb.com/ |
Repository documentation: | https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN |
Table D.6. PostgreSQL Community APT repository (PGDG)
Repository URL: | https://apt.postgresql.org/ |
Repository documentation: | https://wiki.postgresql.org/wiki/Apt |
The table below lists relevant information, paths, commands etc. for the repmgr packages on Debian 9.x ("Stretch"). Substitute the appropriate PostgreSQL major version number for your installation.
See also Section 13.2.2 for some specifics related to configuring the repmgrd daemon.
Table D.7. Debian 9.x packages
Package name example: | postgresql-11-repmgr |
Metapackage: | repmgr-common |
Installation command: | apt-get install postgresql-11-repmgr |
Binary location: | /usr/lib/postgresql/11/bin |
repmgr in default path: | Yes (via wrapper script /usr/bin/repmgr ) |
Configuration file location: | (not set by package) |
Data directory: | /var/lib/postgresql/11/main |
PostgreSQL service command: | systemctl [start|stop|restart|reload] postgresql@11-main |
repmgrd service command: | systemctl [start|stop|restart|reload] repmgrd |
repmgrd service file location: | /etc/init.d/repmgrd (defaults in: /etc/defaults/repmgrd ) |
repmgrd log file location: | (not specified by package; set in repmgr.conf ) |
When using Debian packages, instead of using the systemd service command directly, it's recommended to execute pg_ctlcluster (as root, either directly or via sudo), e.g.:
pg_ctlcluster 11 main [start|stop|restart|reload]
For pre-systemd systems, pg_ctlcluster can be executed directly by the postgres user.
For testing new features and bug fixes, from time to time EDB provides so-called "snapshot packages" via its public repository. These packages are built from the repmgr source at a particular point in time, and are not formal releases.
We do not recommend installing these packages in a production environment unless specifically advised.
To install a snapshot package, it's necessary to install the EDB public snapshot repository, following the instructions here: https://dl.enterprisedb.com/default/release/site/ but replacing release with snapshot in the appropriate URL.
For example, to install the snapshot RPM repository for PostgreSQL 9.6, execute (as root):
curl https://dl.enterprisedb.com/default/snapshot/get/9.6/rpm | bash
or as a normal user with root sudo access:
curl https://dl.enterprisedb.com/default/snapshot/get/9.6/rpm | sudo bash
Alternatively you can browse the repository here: https://dl.enterprisedb.com/default/snapshot/browse/.
Once the repository is installed, installing or updating repmgr will result in the latest snapshot package being installed.
The package name will be formatted like this:
repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm
containing the snapshot build number (here: 320) and the hash of the git commit it was built from (here: g5113ab0).
Note that the next formal release (in the above example 4.1.1), once available, will install in place of any snapshot builds.
An archive of old packages (3.3.2 and later) for Debian/Ubuntu-based systems is available here: https://apt-archive.postgresql.org/
Old versions can be located with e.g.:
yum --showduplicates list repmgr96
(substitute the appropriate package name; see CentOS packages) and installed with:
yum install {package_name}-{version}
where {package_name} is the base package name (e.g. repmgr96) and {version} is the version listed by the yum --showduplicates list ... command, e.g. 4.0.6-1.rhel6.
For example:
yum install repmgr96-4.0.6-1.rhel6
We recommend patching the following parameters when building the package, to provide convenient built-in default values. These values can nevertheless be overridden by the user, if desired.
Configuration file location: the default configuration file location can be hard-coded by patching package_conf_file in configfile.c:
/* packagers: if feasible, patch configuration file path into "package_conf_file" */
char package_conf_file[MAXPGPATH] = "";
See also: configuration file
PID file location: the default repmgrd PID file location can be hard-coded by patching package_pid_file in repmgrd.c:
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";
See also: repmgrd's PID file
EDB provides 24x7 production support for repmgr and other PostgreSQL products, including configuration assistance, installation verification and training for running a robust replication cluster.
For further details see: Support Center
A mailing list/forum is provided via Google groups to discuss contributions or issues: https://groups.google.com/group/repmgr.
Please report bugs and other issues to: https://github.com/EnterpriseDB/repmgr.
Please read the following section before submitting questions or issue reports.
When asking questions or reporting issues, it is extremely helpful if the following information is included:
repmgr.conf files (suitably anonymized if necessary)
the contents of the repmgr.nodes table (suitably anonymized if necessary)
the recovery.conf file (suitably anonymized if necessary)
the postgresql.auto.conf file (suitably anonymized if necessary), and whether or not the PostgreSQL data directory contains the files standby.signal and/or recovery.signal.
If issues are encountered with a repmgr client command, please provide the output of that command executed with the options -LDEBUG --verbose, which will ensure repmgr emits the maximum level of logging output.
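For example (the command shown is an illustrative placeholder for the actual failing invocation):
repmgr -f /etc/repmgr.conf node check -LDEBUG --verbose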
If issues are encountered with repmgrd, please provide relevant extracts from the repmgr log files and if possible the PostgreSQL log itself. Please ensure these logs do not contain any confidential data.
In all cases it is extremely useful to receive as much detail as possible on how to reliably reproduce an issue.