DB Failover Installation
Overview
This installation guide describes the steps to enable database replication management and failover via a tool known as RepMgr; see RepMgr for more information.
It is expected that the DB Installation tasks have been completed.
This example runs through the installation of a PostgreSQL replication cluster consisting of 3 separate nodes:
- 1 Primary SMS node.
- 1 Secondary SMS node.
- 1 Application Services node.
In a production deployment it is expected that multiple PostgreSQL instances will be created for each ACD component. These steps will need to be repeated, and adapted accordingly, for each instance.
It is assumed that deployments will consist of a single primary node, a single secondary node, and N application service nodes.
Only the primary and secondary nodes will be set up with the ability to elect themselves as the primary in the replica set.
Overall Installation Steps
The high-level steps for installing and deploying PostgreSQL database instances are:
- Install the RepMgr packages.
- Configure RepMgr for each PostgreSQL instance.
- Deploy each PostgreSQL replica set via RepMgr.
- Enable PostgreSQL replica management and failover via RepMgr.
Setup PostgreSQL Path
By default the PostgreSQL binaries are not added to the path. To make using the PostgreSQL and RepMgr binaries easier, add the following to /etc/profile and restart the current SSH session.
export PATH=$PATH:/usr/pgsql-13/bin
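A quick way to confirm the new path entry is active in the current session (a sketch; the directory itself only exists once the PostgreSQL packages are installed):

```shell
# Same line as added to /etc/profile; takes effect in this shell immediately.
export PATH=$PATH:/usr/pgsql-13/bin

# Confirm the PostgreSQL 13 bin directory is now on the search path.
case ":$PATH:" in
  *":/usr/pgsql-13/bin:"*) echo "pgsql-13 binaries on PATH" ;;
esac
```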
Install RepMgr (All Nodes)
RepMgr is installed via the N-Squared Yum repository.
Add the N-Squared distribution repository alongside the existing N-Squared delivery repository:
nano /etc/yum.repos.d/nsquared.repo
Add the following, replacing the username and password with those provided by N-Squared:
[2ndquadrant-dl-default-release-pg13]
gpgcheck=1
enabled=1
name=2ndquadrant-dl-default-release-pg13 via N-Squared Software Distributions
baseurl=https://username:password@artefacts.nsquared.nz/yum/repo/distributions/el8/2ndquadrant-dl-default-release-pg13
Clean and update:
dnf clean all
dnf update
Install RepMgr:
dnf install repmgr13
Basic Node Configuration (All Nodes)
Setup /etc/hosts entries for each of the nodes that will be installed in the cluster, for example:
www.xxx.yyy.zzz sms-01
www.xxx.yyy.zzz sms-02
www.xxx.yyy.zzz as-01
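Once saved, each name should resolve locally; getent queries the same resolver order the services will use (hostnames here are the examples from this guide):

```shell
# Each cluster hostname should print the address configured in /etc/hosts.
getent hosts sms-01 sms-02 as-01
```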
Setup log rotation for the RepMgr service. Logs will be written to /var/log/repmgr/repmgrd.log
Create a new log rotate configuration:
nano /etc/logrotate.d/repmgrd
Adding:
/var/log/repmgr/repmgrd.log {
    missingok
    compress
    rotate 52
    maxsize 100M
    weekly
    create 0600 postgres postgres
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}
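The new configuration can be validated with logrotate's debug mode, which parses the file and simulates a rotation without touching any files:

```shell
# Dry-run: parse the config and report what a rotation would do.
logrotate -d /etc/logrotate.d/repmgrd
```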
Setup PostgreSQL Self Management (All Nodes)
In order for RepMgr to manage PostgreSQL instances, some sudo permissions will need to be granted. These are specifically restricted to stopping, starting, restarting and reloading the postgres user's own PostgreSQL DB instances.
Setup sudoers file access for PostgreSQL so that it can manage its own PostgreSQL instance.
nano /etc/sudoers
Defaults:postgres !requiretty
postgres ALL=(ALL) NOPASSWD: /usr/sbin/service postgresql-13 stop, \
/usr/sbin/service postgresql-13 start, \
/usr/sbin/service postgresql-13 reload, \
/usr/sbin/service postgresql-13 restart
Setup PostgreSQL DB Instance (Primary Node Only)
Note: In this example only a single DB instance, running under the standard port of 5432, will be created. For full installations these steps will need to be repeated for all additional PostgreSQL instances.
Initialize the base PostgreSQL database:
postgresql-13-setup initdb
systemctl enable postgresql-13
systemctl start postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Setup the PostgreSQL HBA configuration; this will contain the addresses of all nodes for when replication is enabled.
nano /var/lib/pgsql/13/data/pg_hba.conf
Adding:
host replication repmgr sms-01 trust
host repmgr repmgr sms-01 trust
host replication repmgr sms-02 trust
host repmgr repmgr sms-02 trust
host replication repmgr as-01 trust
host repmgr repmgr as-01 trust
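Note that pg_hba.conf changes only take effect once the configuration is re-read; since this instance is already running, reload it (a reload is sufficient, no restart is required):

```shell
# Re-read pg_hba.conf without interrupting existing connections.
systemctl reload postgresql-13
```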
Setup the replication user. RepMgr will create meta tables, used to store information on the current replication state, in the repmgr database.
su - postgres
createuser -s repmgr
createdb repmgr -O repmgr
Set correct search paths:
su - postgres
psql
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
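To confirm the new search path applies (it is picked up by new sessions for the repmgr role), run as the postgres user:

```shell
# A fresh session for the repmgr role should report: repmgr, "$user", public
psql -U repmgr -d repmgr -c 'SHOW search_path;'
```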
Configure the PostgreSQL instance, editing /var/lib/pgsql/13/data/postgresql.conf, enabling and altering the following settings:
| Field | Value | Notes |
|---|---|---|
| listen_addresses | localhost,sms-01,sms-02,as-01 | Contains all nodes that are in the replication set. This is deployed automatically to all nodes as RepMgr initializes the replica databases. |
| port | 5432 | The distinct listen port for this DB instance. |
| password_encryption | scram-sha-256 | Enables a secure password mechanism using salted SCRAM hashes rather than MD5 hashes. Note: This must be set before any users are created and must be set on duplicated instances, as the default is md5. |
| max_wal_senders | 10 | This value should be configured based on the number of connected replicas. General rule of thumb: ((Number of Nodes) * 2) + 2. |
| wal_level | replica | |
| hot_standby | on | |
| archive_mode | on | |
| archive_command | 'true' | |
| shared_preload_libraries | 'repmgr' | Loads the repmgr shared library, which gives the repmgrd daemon process the ability to access and manage the PostgreSQL instance. |
| wal_log_hints | on | |
| max_replication_slots | 4 | The number of active replication slots to allow. General rule of thumb: (Number of Nodes) + 1. |
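As a sanity check, the rules of thumb above can be evaluated for this three-node example:

```shell
# Three nodes in this example cluster (sms-01, sms-02, as-01).
NODES=3

# Rule of thumb: max_wal_senders >= ((Number of Nodes) * 2) + 2
echo "max_wal_senders floor:       $(( NODES * 2 + 2 ))"   # 8, so 10 leaves headroom

# Rule of thumb: max_replication_slots >= (Number of Nodes) + 1
echo "max_replication_slots floor: $(( NODES + 1 ))"       # 4, matching the setting
```

Note: because shared_preload_libraries is only read at startup, restart the instance (systemctl restart postgresql-13) after saving these changes; a reload is not sufficient.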
Create the RepMgr configuration:
nano /etc/repmgr/13/repmgr.conf
With the config:
# Base configuration.
node_id=1
node_name='sms-01'
conninfo='host=sms-01 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=100
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
The primary node now needs to be registered so that it can be managed by RepMgr. Note: All repmgr command executions must be performed as the postgres user and cannot be executed as root.
su - postgres
Register the instance as the cluster primary:
repmgr -f /etc/repmgr/13/repmgr.conf primary register
The cluster status can be viewed via:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
Setup PostgreSQL DB Instance (Secondary Node)
All PostgreSQL initialization tasks will be performed by RepMgr.
Prepare the data directories for syncing:
- Check that the target data directory exists.
- Check that the target data directory is owned by the postgres user.
- Check that the target data directory has permissions set to 0700.
For example:
[postgres@pg-rep-02 13]$ ls -larth
drwx------. 20 postgres postgres 4.0K Feb 12 00:00 data
If no directory exists it can be created with:
mkdir /var/lib/pgsql/13/data
chown -R postgres:postgres /var/lib/pgsql/13/data
chmod 700 /var/lib/pgsql/13/data
semanage fcontext -a -t postgresql_db_t "/var/lib/pgsql/13/data(/.*)?"
restorecon -R -v /var/lib/pgsql/13/data
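The directory checks above can be combined into a single command; the expected output for a correctly prepared directory is postgres:postgres 700:

```shell
# Print owner:group and octal permissions of the data directory.
stat -c '%U:%G %a' /var/lib/pgsql/13/data
```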
Ensure the service is enabled and correct permissions are granted.
Note: The service is not started at this stage. RepMgr will start it for us.
systemctl enable postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Create RepMgr configuration for the secondary node:
nano /etc/repmgr/13/repmgr.conf
With the contents:
# Base configuration.
node_id=2
node_name='sms-02'
conninfo='host=sms-02 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=100
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
Check connectivity to the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone --dry-run
Perform the database sync from the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone
Start the PostgreSQL instance on the secondary node:
service postgresql-13 start
Register the secondary node into the cluster:
repmgr -f /etc/repmgr/13/repmgr.conf standby register
When checking the cluster status, both the primary and secondary nodes should be visible:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
Setup PostgreSQL DB Instance (Application Nodes)
All PostgreSQL initialization tasks will be performed by RepMgr.
Prepare the data directories for syncing:
- Check that the target data directory exists.
- Check that the target data directory is owned by the postgres user.
- Check that the target data directory has permissions set to 0700.
For example:
[postgres@pg-rep-02 13]$ ls -larth
drwx------. 20 postgres postgres 4.0K Feb 12 00:00 data
If no directory exists it can be created with:
mkdir /var/lib/pgsql/13/data
chown -R postgres:postgres /var/lib/pgsql/13/data
chmod 700 /var/lib/pgsql/13/data
semanage fcontext -a -t postgresql_db_t "/var/lib/pgsql/13/data(/.*)?"
restorecon -R -v /var/lib/pgsql/13/data
Ensure the service is enabled and correct permissions are granted.
Note: The service is not started at this stage. RepMgr will start it for us.
systemctl enable postgresql-13
sudo semanage port -a -t postgresql_port_t -p tcp 5432
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
Create RepMgr configuration for the application node:
nano /etc/repmgr/13/repmgr.conf
Note: All application nodes will use the following base configuration. The node_id, node_name and conninfo values will need to be updated accordingly to match each new node.
Note: The major difference between the application nodes and the secondary node is a priority value of 0; this ensures that the application nodes are never elected as the primary node in the event of a primary failure.
# Base configuration.
node_id=3
node_name='as-01'
conninfo='host=as-01 user=repmgr dbname=repmgr connect_timeout=5'
data_directory='/var/lib/pgsql/13/data'
log_file='/var/log/repmgr/repmgrd.log'
pg_bindir='/usr/pgsql-13/bin'
use_replication_slots=true
# Automatic failover configuration.
failover=automatic
promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /etc/repmgr/13/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /etc/repmgr/13/repmgr.conf --log-to-file --upstream-node-id=%n'
priority=0
standby_follow_restart=false
service_start_command = 'sudo service postgresql-13 start'
service_stop_command = 'sudo service postgresql-13 stop'
service_restart_command = 'sudo service postgresql-13 restart'
reconnect_attempts=10
reconnect_interval=20
Check connectivity to the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone --dry-run
Perform the database sync from the primary node:
repmgr -h sms-01 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone
Start the PostgreSQL instance on the application node:
service postgresql-13 start
Register the application node into the cluster:
repmgr -f /etc/repmgr/13/repmgr.conf standby register
When checking the cluster status, all nodes should be visible:
repmgr -f /etc/repmgr/13/repmgr.conf cluster show
RepMgr Daemon Process (All Nodes)
RepMgr features a daemon process that is responsible for monitoring the cluster health and performing failover tasks.
Once the cluster is initialized, configure the RepMgr daemon process to start watching the new PostgreSQL instance.
Note: A service configuration will be required for each DB instance that is being managed; each of these will map to a different RepMgr config file defined above.
Enable and start the daemon process:
systemctl enable repmgr13
systemctl start repmgr13
The RepMgr service status can be checked with:
repmgr -f /etc/repmgr/13/repmgr.conf service status
Output logs for the daemon process can be found in /var/log/repmgr.