DB Failover Installation
Overview
This installation guide describes the steps to enable database replication management and failover via a tool known as RepMgr; see RepMgr for more information.
It is expected that the DB Installation tasks have been completed.
This example runs through the installation of a PostgreSQL replication cluster consisting of 3 separate nodes:
- 1 Primary SMS node.
- 1 Secondary SMS node.
- 1 Application Services node.
In a production deployment it is expected that multiple PostgreSQL instances will be created for each ACD component. These steps will need to be repeated, and adapted accordingly, for each instance.
It is assumed that deployments will consist of a single primary node, a single secondary node, and N application service nodes.
Only the primary and secondary nodes will be set up with the ability to elect themselves as the primary in the replica set.
Overall Installation Steps
The high-level steps for installing and deploying PostgreSQL database instances are:
- Install the RepMgr packages.
- Configure RepMgr for each PostgreSQL instance.
- Deploy each PostgreSQL replica set via RepMgr.
- Enable PostgreSQL replica management and failover via RepMgr.
Setup PostgreSQL Path
By default the PostgreSQL binaries are not added to the path. To make using the PostgreSQL and RepMgr binaries easier, add the following to /etc/profile and restart the current SSH session.
export PATH=$PATH:/usr/pgsql-13/bin
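A quick way to confirm the new path entry is active in the current session (a sketch; the directory itself only exists once the PostgreSQL packages are installed):

```shell
# Same line as added to /etc/profile; takes effect in this shell immediately.
export PATH=$PATH:/usr/pgsql-13/bin

# Confirm the PostgreSQL 13 bin directory is now on the search path.
case ":$PATH:" in
  *":/usr/pgsql-13/bin:"*) echo "pgsql-13 binaries on PATH" ;;
esac
```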
Install RepMgr (All Nodes)
RepMgr is installed via the N-Squared Yum repository.
Add the N-Squared distribution repository alongside the existing N-Squared delivery repository:
nano /etc/yum.repos.d/nsquared.repo
Add the following, replacing the username and password with those provided by N-Squared:
[2ndquadrant-dl-default-release-pg13]
gpgcheck=1
enabled=1
name=2ndquadrant-dl-default-release-pg13 via N-Squared Software Distributions
baseurl=https://username:password@artefacts.nsquared.nz/yum/repo/distributions/el8/2ndquadrant-dl-default-release-pg13
Clean and update:
dnf clean all
dnf update
Install RepMgr:
dnf install repmgr13
Basic Node Configuration (All Nodes)
Setup /etc/hosts entries for each of the nodes that will be installed in the cluster, for example:
www.xxx.yyy.zzz sms-01
www.xxx.yyy.zzz sms-02
www.xxx.yyy.zzz as-01
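Once saved, each name should resolve locally; getent queries the same resolver order the services will use (hostnames here are the examples from this guide):

```shell
# Each cluster hostname should print the address configured in /etc/hosts.
getent hosts sms-01 sms-02 as-01
```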
Setup log rotation for the RepMgr service. Logs will be written to /var/log/repmgr/repmgrd.log
Create a new log rotate configuration:
nano /etc/logrotate.d/repmgrd
Adding:
/var/log/repmgr/repmgrd.log {
    missingok
    compress
    rotate 52
    maxsize 100M
    weekly
    create 0600 postgres postgres
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}
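The new configuration can be validated with logrotate's debug mode, which parses the file and simulates a rotation without touching any files:

```shell
# Dry-run: parse the config and report what a rotation would do.
logrotate -d /etc/logrotate.d/repmgrd
```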
Setup PostgreSQL Self Management (All Nodes)
In order for RepMgr to manage PostgreSQL instances, some sudo permissions will need to be granted. These are specifically restricted to stopping, starting, restarting and reloading the postgres user's own PostgreSQL DB instances.
Setup sudoers file access for PostgreSQL so that it can manage its own PostgreSQL instance.
nano /etc/sudoers
Defaults:postgres !requiretty
postgres ALL=(ALL) NOPASSWD: /usr/sbin/service postgresql-13 stop, \
/usr/sbin/service postgresql-13 start, \
/usr/sbin/service postgresql-13 reload, \
/usr/sbin/service postgresql-13 restart
Setup PostgreSQL DB Instance (Primary Node Only)
Note: In this example only a single DB instance, running under the standard port of 5432, will be created. For full installations these steps will need to be repeated for all additional PostgreSQL instances.
Initialize the base PostgreSQL database:
postgresql-13-setup initdb
systemctl enable postgresql-13
systemctl start postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Setup the PostgreSQL HBA configuration; this will contain the addresses of all nodes for when replication is enabled.
nano /var/lib/pgsql/13/data/pg_hba.conf
Adding:
host replication repmgr sms-01 trust
host repmgr repmgr sms-01 trust
host replication repmgr sms-02 trust
host repmgr repmgr sms-02 trust
host replication repmgr as-01 trust
host repmgr repmgr as-01 trust
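Note that pg_hba.conf changes only take effect once the configuration is re-read; since this instance is already running, reload it (a reload is sufficient, no restart is required):

```shell
# Re-read pg_hba.conf without interrupting existing connections.
systemctl reload postgresql-13
```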
Setup the replication user. RepMgr will create meta tables, used to store information on the current replication state, in the repmgr database.
su - postgres
createuser -s repmgr
createdb repmgr -O repmgr
Set correct search paths:
su - postgres
psql
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
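To confirm the new search path applies (it is picked up by new sessions for the repmgr role), run as the postgres user:

```shell
# A fresh session for the repmgr role should report: repmgr, "$user", public
psql -U repmgr -d repmgr -c 'SHOW search_path;'
```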
Configure the PostgreSQL instance, editing /var/lib/pgsql/13/data/postgresql.conf, enabling and altering the following settings:
| Field | Value | Notes |
|---|---|---|
| listen_addresses | localhost,sms-01,sms-02,as-01 | Contains all nodes that are in the replication set. This is deployed automatically to all nodes as RepMgr initializes the replica databases. |
| port | 5432 | The distinct listen port for this DB instance. |
| password_encryption | scram-sha-256 | Enables a secure password mechanism using salted SCRAM hashes rather than MD5 hashes. Note: This must be set before any users are created and must be set on duplicated instances, as the default is md5. |
| max_wal_senders | 10 | This value should be configured based on the number of connected replicas. General rule of thumb: ((Number of Nodes) * 2) + 2. |
| wal_level | replica | |
| hot_standby | on | |
| archive_mode | on | |
| archive_command | 'true' | |
| shared_preload_libraries | 'repmgr' | Loads the repmgr shared library, which gives the repmgrd daemon process the ability to access and manage the PostgreSQL instance. |
| wal_log_hints | on | |
| max_replication_slots | 4 | The number of active replication slots to allow. General rule of thumb: (Number of Nodes) + 1. |
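As a sanity check, the rules of thumb above can be evaluated for this three-node example:

```shell
# Three nodes in this example cluster (sms-01, sms-02, as-01).
NODES=3

# Rule of thumb: max_wal_senders >= ((Number of Nodes) * 2) + 2
echo "max_wal_senders floor:       $(( NODES * 2 + 2 ))"   # 8, so 10 leaves headroom

# Rule of thumb: max_replication_slots >= (Number of Nodes) + 1
echo "max_replication_slots floor: $(( NODES + 1 ))"       # 4, matching the setting
```

Note: because shared_preload_libraries is only read at startup, restart the instance (systemctl restart postgresql-13) after saving these changes; a reload is not sufficient.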
Create the RepMgr configuration:
nano /etc/repmgr/13/repmgr.conf
With the config:
# Base configuration.
node_id=1
node_name='sms-01'
conninfo='host=sms-01 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=100
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
The primary node now needs to be registered so that it can be managed by RepMgr. Note: All repmgr command executions must be performed as the postgres user and cannot be executed as root.
su - postgres
Register the instance as the cluster primary:
repmgr -f /etc/repmgr/13/repmgr.conf primary register
The cluster status can be viewed via:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
Setup PostgreSQL DB Instance (Secondary Node)
All PostgreSQL initialization tasks will be performed by RepMgr.
Prepare the data directories for syncing:
- Check that the target data directory exists.
- Check that the target data directory is owned by the postgres user.
- Check that the target data directory has permissions set to 0700.
For example:
[postgres@pg-rep-02 13]$ ls -larth
drwx------. 20 postgres postgres 4.0K Feb 12 00:00 data
If no directory exists it can be created with:
mkdir /var/lib/pgsql/13/data
chown -R postgres:postgres /var/lib/pgsql/13/data
chmod 700 /var/lib/pgsql/13/data
semanage fcontext -a -t postgresql_db_t "/var/lib/pgsql/13/data(/.*)?"
restorecon -R -v /var/lib/pgsql/13/data
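The directory checks above can be combined into a single command; the expected output for a correctly prepared directory is postgres:postgres 700:

```shell
# Print owner:group and octal permissions of the data directory.
stat -c '%U:%G %a' /var/lib/pgsql/13/data
```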
Ensure the service is enabled and correct permissions are granted.
Note: The service is not started at this stage. RepMgr will start it for us.
systemctl enable postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Create RepMgr configuration for the secondary node:
nano /etc/repmgr/13/repmgr.conf
With the contents:
# Base configuration.
node_id=2
node_name='sms-02'
conninfo='host=sms-02 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=100
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
Check connectivity to the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone --dry-run
Perform the database sync from the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone
Start the PostgreSQL instance on the secondary node:
service postgresql-13 start
Register the secondary node into the cluster:
repmgr -f /etc/repmgr/13/repmgr.conf standby register
When checking the cluster status, both the primary and secondary nodes should be visible:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
Setup PostgreSQL DB Instance (Application Nodes)
All PostgreSQL initialization tasks will be performed by RepMgr.
Prepare the data directories for syncing:
- Check that the target data directory exists.
- Check that the target data directory is owned by the postgres user.
- Check that the target data directory has permissions set to 0700.
For example:
[postgres@pg-rep-02 13]$ ls -larth
drwx------. 20 postgres postgres 4.0K Feb 12 00:00 data
If no directory exists it can be created with:
mkdir /var/lib/pgsql/13/data
chown -R postgres:postgres /var/lib/pgsql/13/data
chmod 700 /var/lib/pgsql/13/data
semanage fcontext -a -t postgresql_db_t "/var/lib/pgsql/13/data(/.*)?"
restorecon -R -v /var/lib/pgsql/13/data
Ensure the service is enabled and correct permissions are granted.
Note: The service is not started at this stage. RepMgr will start it for us.
systemctl enable postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Create RepMgr configuration for the application node:
nano /etc/repmgr/13/repmgr.conf
Note: All application nodes will use the following base configuration. The node_id, node_name and conninfo values will need to be updated accordingly to match each new node.
Note: The major difference between the application nodes and the secondary node is a priority value of 0; this ensures that the application nodes are never elected as the primary node in the event of a primary failure.
# Base configuration.
node_id=3
node_name='as-01'
conninfo='host=as-01 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=0
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
Check connectivity to the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone --dry-run
Perform the database sync from the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone
Start the PostgreSQL instance on the application node:
service postgresql-13 start
Register the application node into the cluster:
repmgr -f /etc/repmgr/13/repmgr.conf standby register
When checking the cluster status, all nodes should be visible:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
RepMgr Daemon Process (All Nodes)
RepMgr features a daemon process that is responsible for monitoring the cluster health and performing failover tasks.
Once the cluster is initialized, configure the RepMgr daemon process to start watching the new PostgreSQL instance.
Note: A service configuration will be required for each DB instance that is being managed; each of these will map to a different RepMgr config file defined above.
Enable and start the daemon process:
systemctl enable repmgr13
systemctl start repmgr13
The RepMgr service status can be checked with:
repmgr -f /etc/repmgr/13/repmgr.conf service status
Output logs for the daemon process can be found in /var/log/repmgr.