Skip to content

A lightweight replication manager for PostgreSQL (Postgres)

License

Notifications You must be signed in to change notification settings

wangjie-star/repmgr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

repmgr: Replication Manager for PostgreSQL

repmgr is a suite of open-source tools to manage replication and failover within a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in replication capabilities with utilities to set up standby servers, monitor replication, and perform administrative tasks such as failover or switchover operations.

repmgr 4 is a complete rewrite of the existing repmgr codebase.

Supports PostgreSQL 9.5 and later; support for PostgreSQL 9.3 and 9.4 has been dropped. To use repmgr 4 with BDR 2.0, PostgreSQL 9.6 is required.

Building from source

Simply:

./configure && make install

Ensure pg_config for the target PostgreSQL version is in $PATH.

Reference

repmgr commands

The following commands are available:

repmgr primary register
repmgr primary unregister

repmgr standby clone
repmgr standby register
repmgr standby unregister
repmgr standby promote
repmgr standby follow

repmgr bdr register
repmgr bdr unregister

repmgr node status

repmgr cluster show
repmgr cluster event [--all] [--node-id] [--node-name] [--event] [--event-matching]
  • primary register

    Registers a primary in a streaming replication cluster, and configures it for use with repmgr. This command needs to be executed before any standby nodes are registered.

    master register can be used as an alias for primary register.

  • cluster show

    Displays information about each active node in the replication cluster. This command polls each registered server and shows its role (primary / standby / bdr) and status. It polls each server directly and can be run on any node in the cluster; this is also useful when analyzing connectivity from a particular node.

    This command requires either a valid repmgr.conf file or a database connection string to one of the registered nodes; no additional arguments are needed.

    Example:

      $ repmgr -f /etc/repmgr.conf cluster show
    
       ID | Name  | Role    | Status    | Upstream | Connection string
      ----+-------+---------+-----------+----------+-----------------------------------------
       1  | node1 | primary | * running |          | host=db_node1 dbname=repmgr user=repmgr
       2  | node2 | standby |   running | node1    | host=db_node2 dbname=repmgr user=repmgr
       3  | node3 | standby |   running | node1    | host=db_node3 dbname=repmgr user=repmgr
    

    To show database connection errors when polling nodes, run the command in --verbose mode.

    The cluster show command accepts an optional parameter --csv, which outputs the replication cluster's status in a simple CSV format, suitable for parsing by scripts:

      $ repmgr -f /etc/repmgr.conf cluster show --csv
      1,-1,-1
      2,0,0
      3,0,1
    

    The columns have following meanings:

      - node ID
      - availability (0 = available, -1 = unavailable)
      - recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
    

    Note that the availability is tested by connecting from the node where repmgr cluster show is executed, and does not necessarily imply the node is down.

  • cluster matrix and cluster crosscheck

    These commands display connection information for each pair of nodes in the replication cluster.

    • cluster matrix runs a cluster show on each node and arranges the results in a matrix, recording success or failure;

    • cluster crosscheck runs a cluster matrix on each node and combines the results in a single matrix, providing a full overview of connections between all databases in the cluster.

    These commands require a valid repmgr.conf file on each node. Additionally passwordless ssh connections are required between all nodes.

    Example 1 (all nodes up):

      $ repmgr -f /etc/repmgr.conf cluster matrix
    
      Name   | Id |  1 |  2 |  3
      -------+----+----+----+----
       node1 |  1 |  * |  * |  *
       node2 |  2 |  * |  * |  *
       node3 |  3 |  * |  * |  *
    

    Here cluster matrix is sufficient to establish the state of each possible connection.

    Example 2 (node1 and node2 up, node3 down):

      $ repmgr -f /etc/repmgr.conf cluster matrix
    
      Name   | Id |  1 |  2 |  3
      -------+----+----+----+----
       node1 |  1 |  * |  * |  x
       node2 |  2 |  * |  * |  x
       node3 |  3 |  ? |  ? |  ?
    

    Each row corresponds to one server, and indicates the result of testing an outbound connection from that server.

    Since node3 is down, all the entries in its row are filled with "?", meaning that there we cannot test outbound connections.

    The other two nodes are up; the corresponding rows have "x" in the column corresponding to node3, meaning that inbound connections to that node have failed, and "*" in the columns corresponding to node1 and node2, meaning that inbound connections to these nodes have succeeded.

    In this case, cluster crosscheck gives the same result as cluster matrix, because from any functioning node we can observe the same state: node1 and node2 are up, node3 is down.

    Example 3 (all nodes up, firewall dropping packets originating from node1 and directed to port 5432 on node3)

    Running cluster matrix from node1 gives the following output:

      $ repmgr -f /etc/repmgr.conf cluster matrix
    
      Name   | Id |  1 |  2 |  3
      -------+----+----+----+----
       node1 |  1 |  * |  * |  x
       node2 |  2 |  * |  * |  *
       node3 |  3 |  ? |  ? |  ?
    

    (Note this may take some time depending on the connect_timeout setting in the registered node conninfo strings; default is 1 minute which means without modification the above command would take around 2 minutes to run; see comment elsewhere about setting connect_timeout)

    The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3.

    In this case, the cluster crosscheck command is more informative:

      $ repmgr -f /etc/repmgr.conf cluster crosscheck
    
      Name   | Id |  1 |  2 |  3
      -------+----+----+----+----
       node1 |  1 |  * |  * |  x
       node2 |  2 |  * |  * |  *
       node3 |  3 |  * |  * |  *
    

    What happened is that cluster crosscheck merged its own cluster matrix with the cluster matrix output from node2; the latter is able to connect to node3 and therefore determine the state of outbound connections from that node.

Backwards compatibility

repmgr is now implemented as a PostgreSQL extension, and all database objects used by repmgr are stored in a dedicated repmgr schema, rather than repmgr_$cluster_name. Note there is no need to install the extension, this will be done automatically by repmgr primary register.

Metadata tables have been revised and are not backwards-compatible with repmgr 3.x. (however future DDL updates will be easier as they can be carried out via the ALTER EXTENSION mechanism.).

An extension upgrade script will be provided for pre-4.0 installations; note this will require the existing repmgr_$cluster_name schema to be renamed to repmgr beforehand.

Some configuration items have had their names changed for consistency and clarity e.g. node => node_id. repmgr will issue a warning about deprecated/altered options.

Some configuration items have been changed to command line options, and vice-versa, e.g. to avoid hard-coding items such as a a node's upstream ID, which might change over time.

See file doc/changes-in-repmgr4.md for more details.

Generating event notifications with repmgr/repmgrd

Each time repmgr or repmgrd perform a significant event, a record of that event is written into the repmgr.events table together with a timestamp, an indication of failure or success, and further details if appropriate. This is useful for gaining an overview of events affecting the replication cluster. However note that this table has advisory character and should be used in combination with the repmgr and PostgreSQL logs to obtain details of any events.

Example output after a primary was registered and a standby cloned and registered:

repmgr=# SELECT * from repmgr.events ;
 node_id |      event       | successful |        event_timestamp        |                                       details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
       1 | primary_register  | t          | 2016-01-08 15:04:39.781733+09 |
       2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
       2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
(3 rows)

Alternatively use repmgr cluster event to output a list of events.

Additionally, event notifications can be passed to a user-defined program or script which can take further action, e.g. send email notifications. This is done by setting the event_notification_command parameter in repmgr.conf.

This parameter accepts the following format placeholders:

%n - node ID
%e - event type
%s - success (1 or 0)
%t - timestamp
%d - details

The values provided for "%t" and "%d" will probably contain spaces, so should be quoted in the provided command configuration, e.g.:

event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'

Additionally the following format placeholders are available for the event type bdr_failover and optionally bdr_recovery:

%c - conninfo string of the next available node
%a - name of the next available node

These should always be quoted.

By default, all notification type will be passed to the designated script; the notification types can be filtered to explicitly named ones:

event_notifications=primary_register,standby_register

The following event types are available:

  • master_register
  • standby_register
  • standby_unregister
  • standby_clone
  • standby_promote
  • standby_follow
  • standby_disconnect_manual
  • repmgrd_start
  • repmgrd_shutdown
  • repmgrd_failover_promote
  • repmgrd_failover_follow
  • bdr_failover
  • bdr_reconnect
  • bdr_recovery
  • bdr_register
  • bdr_unregister

Note that under some circumstances (e.g. no replication cluster master could be located), it will not be possible to write an entry into the repmgr.events table, in which case executing a script via event_notification_command can serve as a fallback by generating some form of notification.

About

A lightweight replication manager for PostgreSQL (Postgres)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 98.0%
  • Lex 1.4%
  • Other 0.6%