How to Set Up a Highly Available PostgreSQL 15 Cluster on Cent OS8

PostgreSQL is a popular open-source relational database management system (RDBMS) used by many organizations to store and manage their data. To ensure high availability and prevent downtime, it’s essential to have a robust HA setup for PostgreSQL. Patroni, HAProxy, Keepalived, etcd, and watchdog can be used together to achieve this goal.

Patroni:

Patroni is an open-source tool used to manage PostgreSQL HA clusters. It’s a lightweight solution that uses the PostgreSQL replication mechanism to ensure that all nodes in the cluster have an up-to-date copy of the database. Patroni provides automatic failover and ensures that only one node is active at any given time.

HAProxy:

HAProxy is a load balancer that can be used to distribute incoming traffic across multiple PostgreSQL nodes in a cluster. It acts as a proxy between clients and the active node in the cluster and ensures that requests are routed to the appropriate node. HAProxy can also detect when a node becomes unavailable and automatically switch to a different node in the cluster.

Keepalived:

Keepalived is a Linux-based software that provides high availability for Linux systems. It can be used to monitor the health of the PostgreSQL nodes and automatically switch the virtual IP address to a healthy node in case of a failure. This ensures that client connections are redirected to the active node in the cluster.

etcd:

etcd is a distributed key-value store used to store configuration data for the PostgreSQL HA cluster. It’s used by Patroni to manage the state of the cluster, and by HAProxy and Keepalived to determine which node is currently active. etcd ensures that the configuration data is consistent across all nodes in the cluster.

Watchdog:

The watchdog process is responsible for monitoring the health of the PostgreSQL node and its associated processes. The watchdog process ensures that the PostgreSQL node is running correctly and that it’s able to serve client requests. If the watchdog detects an issue with the node or its processes, it can take action to recover the node or initiate failover to a standby node.

System Requirements 

To follow this tutorial along, you will need a minimum of 6 virtual machines installed with CentOS 8. The purpose is defined is below

HostnameIPAddressPurposeOptional
p1192.168.110.131etcd, HAProxy, keepalivedPostgreSQL
p2192.168.110.132etcd, HAProxy, keepalivedPostgreSQL
p3192.168.110.133etcd, HAProxy, keepalivedPostgreSQL
p4192.168.110.134PostgreSQL+Patroni 
p5192.168.110.135PostgreSQL+Patroni 
p5192.168.110.136PostgreSQL+Patroni 
ha-vip192.168.110.140HA-VIP 

In this lab set up,

  • I will use the words p1, p2 etc., to refer machine 1, 2 and so on. For example., execute “the command in p1″ means execute the command in “192.168.110.131” as mentioned in table above.
  • If I mention [all machines] indicate that commands have to be executed in all machines.
  • If I mention [… accordingly] it means you need to make necessary changes as mentioned in color red in each of the machines.

Create postgres user [all machines]

useradd postgres
passwd postgres

Set the hostname [all machines accordingly]

sudo hostnamectl set-hostname p1
reboot

You should see something like this

[root@p1 ~]#

For machine 2, it should be

sudo hostnamectl set-hostname p2 
reboot

and repeat for p3,p4,p5 and p6.,

Get IP Addresses of all machines and edit /etc/hosts file: [all machines]

vi /etc/hosts

add the below lines

192.168.110.131	 p1
192.168.110.132	 p2
192.168.110.133	 p3
192.168.110.134  p4
192.168.110.135  p5
192.168.110.136  p6
192.168.110.140  ha-vip

Here 192.168.110.140 can be any IP which is not being used and is in the network, I choose 192.168.110.140 randomly. we use his IP address in keepalived configuration below.

Disable Selinux [all machines]

Log in to p1 and edit /etc/selinux/config file with any of your favorite text editor:

vi /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled

SELINUX=disabled

Reboot to make the selinux changes effect:

reboot

Configure Firewalld

To set up Patroni for PostgreSQL with additional components like HAProxy, etcd, Keepalived, Pgbouncer, and web servers the following ports need to be opened:

  1. Patroni:
  • 8008: This port is used for the HTTP API of Patroni.
  1. HAProxy:
  • 5000: This port is used for HTTP connections to the backend (PostgreSQL service).
  • 5001: This port is used for HTTPS connections to the backend (PostgreSQL service).
  1. etcd:
  • 2379: This port is used for etcd client communication.
  • 2380: This port is used for etcd server-to-server communication.
  1. Keepalived:
  • 112: This port is used for the Virtual Router Redundancy Protocol (VRRP) communication between Keepalived instances.
  • 5405: This port is used for the multicast traffic between Keepalived instances.
  1. Pgbouncer:
  • 6432: This port is used for Pgbouncer client connections.
  1. PostgreSQL:
  • 5432: This port is used for PostgreSQL client connections.
  1. Web server:
  • 5000: This port is used for HTTP connections to the web server.
  • 5001: This port is used for HTTPS connections to the web server.
  • 7000: This port is used for the reverse proxy connection from the web server to HAProxy.

These ports are required to allow communication between the different components of the setup. However, the specific port configuration can vary depending on your specific setup and requirements. It’s important to adjust firewall rules and network configurations accordingly to ensure the proper functioning and security of the setup.

Run the below commands to open above ports [all machines]

firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --zone=public --add-port=6432/tcp --permanent
firewall-cmd --zone=public --add-port=8008/tcp --permanent
firewall-cmd --zone=public --add-port=2379/tcp --permanent
firewall-cmd --zone=public --add-port=2380/tcp --permanent
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --zone=public --add-port=5000/tcp --permanent
firewall-cmd --zone=public --add-port=5001/tcp --permanent
firewall-cmd --zone=public --add-port=7000/tcp --permanent
firewall-cmd --zone=public --add-port=112/tcp --permanent
firewall-cmd --zone=public --add-port=5405/tcp --permanent
firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
firewall-cmd --reload

Set up etcd for PostgreSQL 15 HA with Patroni

Install etcd [p1,p2,p3]

Create etcd.repo file like below:

vi /etc/yum.repos.d/etcd.repo

Add following:

[etcd]
name=PostgreSQL common RPMs for RHEL / Rocky $releasever - $basearch
baseurl=http://ftp.postgresql.org/pub/repos/yum/common/pgdg-rhel8-extras/redhat/rhel-$releasever-$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG
repo_gpgcheck = 1

Save and close the editor when you are finished.

Type following command to install etcd:

dnf -y install etcd

Configure etcd [p1,p2,p3 accordingly]

Edit the configuration file in p1

mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.orig
vi /etc/etcd/etcd.conf

Add following configuration(Give IP address of current machine in all places, ETCD_INITIAL_CLUSTER should get IP addresses of p1,p2 and p3):

Add the following configuration and save.

# This is an example configuration file for an etcd cluster with 3 nodes

#specify the name of an etcd member

ETCD_NAME=p1

# Configure the data directory
ETCD_DATA_DIR="/var/lib/etcd/p1"

# Configure the listen URLs
ETCD_LISTEN_PEER_URLS="http://192.168.110.131:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.110.131:2379"

# Configure the advertise URLs
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.110.131:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.110.131:2379"

# Configure the initial cluster
ETCD_INITIAL_CLUSTER="p1=http://192.168.110.131:2380,p2=http://192.168.110.132:2380,p3=http://192.168.110.133:2380"

# Configure the initial cluster state
ETCD_INITIAL_CLUSTER_STATE="new"

# Configure the initial cluster token
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"

# Configure v2 API
ETCD_ENABLE_V2="true"

Edit the configuration file in p2

mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.orig 
vi /etc/etcd/etcd.conf

Add the following configuration and save.

# This is an example configuration file for an etcd cluster with 3 nodes

#specify the name of an etcd member
ETCD_NAME=p2

# Configure the data directory
ETCD_DATA_DIR="/var/lib/etcd/p2"

# Configure the listen URLs
ETCD_LISTEN_PEER_URLS="http://192.168.110.132:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.110.132:2379"

# Configure the advertise URLs
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.110.132:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.110.132:2379"

# Configure the initial cluster
ETCD_INITIAL_CLUSTER="p1=http://192.168.110.131:2380,p2=http://192.168.110.132:2380,p3=http://192.168.110.133:2380"

# Configure the initial cluster state
ETCD_INITIAL_CLUSTER_STATE="new"

# Configure the initial cluster token
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"

# Configure v2 API
ETCD_ENABLE_V2="true"

Edit the configuration file in p3

mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.orig 
vi /etc/etcd/etcd.conf

Add the following configuration and save.

# This is an example configuration file for an etcd cluster with 3 nodes

#specify the name of an etcd member
ETCD_NAME=p3

# Configure the data directory
ETCD_DATA_DIR="/var/lib/etcd/p3"

# Configure the listen URLs
ETCD_LISTEN_PEER_URLS="http://192.168.110.133:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.110.133:2379"

# Configure the advertise URLs
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.110.133:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.110.133:2379"

# Configure the initial cluster
ETCD_INITIAL_CLUSTER="p1=http://192.168.110.131:2380,p2=http://192.168.110.132:2380,p3=http://192.168.110.133:2380"

# Configure the initial cluster state
ETCD_INITIAL_CLUSTER_STATE="new"

# Configure the initial cluster token
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"

# Configure v2 API
ETCD_ENABLE_V2="true"

Now start the etcd on all the three machines [p1,p2,p3]

systemctl start etcd
systemctl enable etcd
systemctl status etcd

add below lines in the bash profile [p1,p2,p3]

p1=192.168.110.131
p2=192.168.110.132
p3=192.168.110.133
ENDPOINTS=$p1:2379,$p2:2379,$p3:2379

Check if etcd has been successfully working with

. .bash_profile
etcdctl endpoint status --write-out=table --endpoints=$ENDPOINTS

The sample output should look like this

[root@p1 ~]# etcdctl endpoint status --write-out=table --endpoints=$ENDPOINTS
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|       ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.110.131:2379 | 341a3c460c1c993a |   3.5.7 |   20 kB |      true |      false |         2 |          9 |                  9 |        |
| 192.168.110.132:2379 | 4679fe0fcb37326d |   3.5.7 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
| 192.168.110.133:2379 | ab23bcc86cf3190b |   3.5.7 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@p1 ~]#

Set up keepalived for PostgreSQL 15 HA with Patroni

Install Keepalived [p1,p2,p3]

dnf -y install keepalived

Configure Keepalived [p1,p2,p3]

Edit the /etc/sysctl.conf file to allow the server to bind to the virtual IP address.

vi  /etc/sysctl.conf

Add the net.ipv4_ip_nonlocal_bind=1 directive, which allows the server to accept connections for IP addresses that are not bound to any of its interfaces, enabling the use of a floating, virtual IP:

net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1

Save and close the editor when you are finished.

Type the following command to reload settings from config file without rebooting:

sudo sysctl --system
sudo sysctl -p

Create /etc/keepalived/keepalived.conf and make necessary changes [p1,p2,p3]

Get the interface name of your network with ifconfig

[root@p1 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.110.131 netmask 255.255.255.0 broadcast 192.168.110.255

Now edit the /etc/keepalived/keepalived.conf

vi /etc/keepalived/keepalived.conf

Add following configuration:

vrrp_script check_haproxy {
script "pkill -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 51
priority 101
advert_int 1
virtual_ipaddress {
192.168.110.140
}
track_script {
check_haproxy
}
}

Repeat the same in p2

vi /etc/keepalived/keepalived.conf

Add following configuration:

p2 file should look like below

vrrp_script check_haproxy {
script "pkill -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100
advert_int 1
virtual_ipaddress {
192.168.110.140
}
track_script {
check_haproxy
}
}

Repeat the same in p3

vi /etc/keepalived/keepalived.conf

Add following configuration:

p3 file should look like below

vrrp_script check_haproxy {
script "pkill -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100
advert_int 1
virtual_ipaddress {
192.168.110.140
}
track_script {
check_haproxy
}
}

Start the keepalived service on all the machines.

systemctl start keepalived
systemctl enable keepalived
systemctl status keepalived

Check on your (p1) node to see if your (ens33) network interface has config

ured with an additional shared IP (192.168.110.140) with the command below

 ip addr show ens33

the output should look like this

[root@p1 ~]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:5c:75:6a brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    inet 192.168.110.131/24 brd 192.168.110.255 scope global dynamic noprefixroute ens33
       valid_lft 1107sec preferred_lft 1107sec
    inet 192.168.110.140/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe5c:756a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@p1 ~]#

and you cannot see 192.168.110.140 in p2 and p3, because it’s a floating IP and will be moved to p2 or p3 when p1 is unavailable.

[root@p2 ~]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:fc:55:9d brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    inet 192.168.110.132/24 brd 192.168.110.255 scope global dynamic noprefixroute ens33
       valid_lft 1110sec preferred_lft 1110sec
    inet6 fe80::20c:29ff:fefc:559d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@p2 ~]#
[root@p3 ~]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:7d:46:36 brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    inet 192.168.110.133/24 brd 192.168.110.255 scope global dynamic noprefixroute ens33
       valid_lft 1120sec preferred_lft 1120sec
    inet6 fe80::20c:29ff:fe7d:4636/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@p3 ~]#

Set up HAProxy for PostgreSQL 15 HA with Patroni

 

Install HAProxy [p1,p2,p3]

dnf -y install haproxy

Configure HAProxy [p1,p2,p3]

Edit haproxy.cfg file on your first node and repeat the same in p2 and p3 and make necessary changes in configuration files

mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
vi /etc/haproxy/haproxy.cfg

Add following configuration:

global
    maxconn     1000
	
defaults
    mode                    tcp
    log                     global
    option                  tcplog
    retries                 3 
    timeout queue           1m
    timeout connect         4s
    timeout client          60m
    timeout server          60m
    timeout check           5s
    maxconn                 900

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen primary
    bind 192.168.110.140:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server p4 192.168.110.134:5432 maxconn 100 check port 8008
    server p5 192.168.110.135:5432 maxconn 100 check port 8008
    server p6 192.168.110.136:5432 maxconn 100 check port 8008

listen standby
    bind 192.168.110.140:5001
    balance roundrobin
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server p4 192.168.110.134:5432 maxconn 100 check port 8008
    server p5 192.168.110.135:5432 maxconn 100 check port 8008
    server p6 192.168.110.136:5432 maxconn 100 check port 8008

copy the file to p2 and p3

[root@p1 ~]# scp /etc/haproxy/haproxy.cfg p2:/etc/haproxy/haproxy.cfg
root@p2's password:
haproxy.cfg                                                             100% 1218   676.0KB/s   00:00
[root@p1 ~]# scp /etc/haproxy/haproxy.cfg p3:/etc/haproxy/haproxy.cfg
root@p3's password:
haproxy.cfg                                                             100% 1218   491.5KB/s   00:00
[root@p1 ~]#

start haproxy service [p1, p2 and p3]

systemctl start haproxy
systemctl enable haproxy
systemctl status haproxy

Install PostgreSQL 15 with Patroni for PostgreSQL 15 HA with Patroni

Install PostgreSQL [p4, p5 and p6 ; optionally on p1]

Since I am going to have PostgreSQL client in p1 to test the configuration, I will install PostgreSQL in all the machines.

dnf install epel-release -y
dnf --enablerepo=powertools install moreutils -y
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf -qy module disable postgresql
dnf install -y postgresql15-server
dnf install -y postgresql15-contrib
dnf install -y postgresql15-devel

Install Patroni [p4,p5,p6]

yum install -y python3 python3-devel gcc
sudo ln -s /usr/local/bin/pip /bin/pip
pip3 install --upgrade pip
pip install psycopg2-binary
pip install python-etcd
python3 -m pip install patroni[etcd]

dnf -y install patroni 
dnf -y install patroni-etcd 
dnf -y install watchdog

sudo ln -s /usr/local/bin/patronictl /bin/patronictl
/usr/local/bin/patronictl –help

Configure Patroni [p4,p5,p6 accordingly]

Create a configuration file for Patroni on your first node (p4) in our case, like below:

mkdir -p /etc/patroni
vi /etc/patroni/patroni.yml

Edit the file with following content (p4)

scope: postgres
namespace: /db/
name: p4 

restapi:
    listen: 192.168.110.134:8008
    connect_address: 192.168.110.134:8008

etcd3:
    hosts: 192.168.110.131:2379,192.168.110.132:2379,192.168.110.133:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
        use_pg_rewind: true
    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.110.134/0 md5
    - host replication replicator 192.168.110.135/0 md5
    - host replication replicator 192.168.110.136/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: 192.168.110.134:5432 
    connect_address: 192.168.110.134:5432
    data_dir: /u01/pgsql/15
    bin_dir: /usr/pgsql-15/bin
    authentication:
        replication:
            username: replicator
            password: replicator
        superuser:
            username: postgres
            password: postgres
    parameters:
        unix_socket_directories: '.'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

In p5, edit the file with following content 

scope: postgres
namespace: /db/
name: p5 

restapi:
    listen: 192.168.110.135:8008
    connect_address: 192.168.110.135:8008

etcd3:
    hosts: 192.168.110.131:2379,192.168.110.132:2379,192.168.110.133:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
        use_pg_rewind: true
    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.110.134/0 md5
    - host replication replicator 192.168.110.135/0 md5
    - host replication replicator 192.168.110.136/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: 192.168.110.135:5432 
    connect_address: 192.168.110.135:5432
    data_dir: /u01/pgsql/15
    bin_dir: /usr/pgsql-15/bin
    authentication:
        replication:
            username: replicator
            password: replicator
        superuser:
            username: postgres
            password: postgres
    parameters:
        unix_socket_directories: '.'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

In p6, edit the file with following content 

scope: postgres
namespace: /db/
name: p6 

restapi:
    listen: 192.168.110.136:8008
    connect_address: 192.168.110.136:8008

etcd3:
    hosts: 192.168.110.131:2379,192.168.110.132:2379,192.168.110.133:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
        use_pg_rewind: true
    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.110.134/0 md5
    - host replication replicator 192.168.110.135/0 md5
    - host replication replicator 192.168.110.136/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: 192.168.110.136:5432 
    connect_address: 192.168.110.136:5432
    data_dir: /u01/pgsql/15
    bin_dir: /usr/pgsql-15/bin
    authentication:
        replication:
            username: replicator
            password: replicator
        superuser:
            username: postgres
            password: postgres
    parameters:
        unix_socket_directories: '.'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Enable Watchdog [p4,p5,p6]

Edit /etc/watchdog.conf to enable software watchdog:

vi /etc/watchdog.conf

Uncomment following line:

watchdog-device = /dev/watchdog

Execute following commands to activate watchdog:

mknod /dev/watchdog c 10 130
modprobe softdog
chown postgres /dev/watchdog

For example.,

[root@p3 ~]# mknod /dev/watchdog c 10 130
[root@p3 ~]# modprobe softdog
[root@p3 ~]# chown postgres /dev/watchdog
[root@p3 ~]#

Add the necessary permissions for postgres user to create data directory [p4,p5,p6]

mkdir -p /u01
chown -R postgres:postgres /u01

start patroni on p4

systemctl start patroni

and the log should look like this

May  7 09:52:48 p4 systemd[1]: Started Runners to orchestrate a high-availability PostgreSQL.
May  7 09:52:48 p4 patroni[8469]: 2023-05-07 09:52:48,925 INFO: Selected new etcd server http://192.168.110.133:2379
May  7 09:52:48 p4 patroni[8469]: 2023-05-07 09:52:48,944 INFO: No PostgreSQL configuration items changed, nothing to reload.
May  7 09:52:49 p4 patroni[8469]: 2023-05-07 09:52:49,046 INFO: Lock owner: None; I am p4
May  7 09:52:49 p4 patroni[8469]: 2023-05-07 09:52:49,146 INFO: trying to bootstrap a new cluster
May  7 09:52:49 p4 patroni[8482]: The files belonging to this database system will be owned by user "postgres".
May  7 09:52:49 p4 patroni[8482]: This user must also own the server process.
May  7 09:52:49 p4 patroni[8482]: The database cluster will be initialized with locale "en_US.UTF-8".
May  7 09:52:49 p4 patroni[8482]: The default text search configuration will be set to "english".
May  7 09:52:49 p4 patroni[8482]: Data page checksums are enabled.
May  7 09:52:49 p4 patroni[8482]: creating directory /u01/pgsql/15 ... ok
May  7 09:52:49 p4 patroni[8482]: creating subdirectories ... ok
May  7 09:52:49 p4 patroni[8482]: selecting dynamic shared memory implementation ... posix
May  7 09:52:49 p4 patroni[8482]: selecting default max_connections ... 100
May  7 09:52:49 p4 patroni[8482]: selecting default shared_buffers ... 128MB
May  7 09:52:49 p4 patroni[8482]: selecting default time zone ... Asia/Kolkata
May  7 09:52:49 p4 patroni[8482]: creating configuration files ... ok
May  7 09:52:49 p4 patroni[8482]: running bootstrap script ... ok
May  7 09:52:49 p4 patroni[8482]: performing post-bootstrap initialization ... ok
May  7 09:52:49 p4 patroni[8482]: syncing data to disk ... ok
May  7 09:52:49 p4 patroni[8482]: initdb: warning: enabling "trust" authentication for local connections
May  7 09:52:49 p4 patroni[8482]: initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
May  7 09:52:49 p4 patroni[8482]: Success. You can now start the database server using:
May  7 09:52:49 p4 patroni[8482]:    /usr/pgsql-15/bin/pg_ctl -D /u01/pgsql/15 -l logfile start
May  7 09:52:50 p4 patroni[8496]: 2023-05-07 09:52:50.238 IST [8496] LOG:  redirecting log output to logging collector process
May  7 09:52:50 p4 patroni[8496]: 2023-05-07 09:52:50.238 IST [8496] HINT:  Future log output will appear in directory "log".
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,253 INFO: postmaster pid=8496
May  7 09:52:50 p4 patroni[8504]: 192.168.110.134:5432 - accepting connections
May  7 09:52:50 p4 patroni[8506]: 192.168.110.134:5432 - accepting connections
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,344 INFO: establishing a new patroni connection to the postgres cluster
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,405 INFO: running post_bootstrap
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,440 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,639 INFO: initialized a new cluster
May  7 09:52:50 p4 patroni[8469]: 2023-05-07 09:52:50,737 INFO: no action. I am (p4), the leader with the lock

start patroni in p5 and p6

systemctl start patroni

on p5, you should see log like this

May  7 09:53:30 p5 systemd[1]: Started Runners to orchestrate a high-availability PostgreSQL.
May  7 09:53:30 p5 patroni[8964]: 2023-05-07 09:53:30,586 INFO: Selected new etcd server http://192.168.110.131:2379
May  7 09:53:30 p5 patroni[8964]: 2023-05-07 09:53:30,663 INFO: No PostgreSQL configuration items changed, nothing to reload.
May  7 09:53:30 p5 patroni[8964]: 2023-05-07 09:53:30,817 INFO: Lock owner: p4; I am p5
May  7 09:53:30 p5 patroni[8964]: 2023-05-07 09:53:30,868 INFO: trying to bootstrap from leader 'p4'
May  7 09:53:31 p5 patroni[8974]: WARNING:  skipping special file "./.s.PGSQL.5432"
May  7 09:53:31 p5 patroni[8974]: WARNING:  skipping special file "./.s.PGSQL.5432"
May  7 09:53:31 p5 patroni[8964]: 2023-05-07 09:53:31,628 INFO: replica has been created using basebackup
May  7 09:53:31 p5 patroni[8964]: 2023-05-07 09:53:31,630 INFO: bootstrapped from leader 'p4'
May  7 09:53:31 p5 patroni[8964]: 2023-05-07 09:53:31,951 INFO: postmaster pid=8982
May  7 09:53:31 p5 patroni[8982]: 2023-05-07 09:53:31.960 IST [8982] LOG:  redirecting log output to logging collector process
May  7 09:53:31 p5 patroni[8982]: 2023-05-07 09:53:31.960 IST [8982] HINT:  Future log output will appear in directory "log".
May  7 09:53:31 p5 patroni[8983]: 192.168.110.135:5432 - rejecting connections
May  7 09:53:31 p5 patroni[8989]: 192.168.110.135:5432 - rejecting connections
May  7 09:53:33 p5 patroni[8997]: 192.168.110.135:5432 - accepting connections
May  7 09:53:33 p5 patroni[8964]: 2023-05-07 09:53:33,070 INFO: Lock owner: p4; I am p5
May  7 09:53:33 p5 patroni[8964]: 2023-05-07 09:53:33,071 INFO: establishing a new patroni connection to the postgres cluster
May  7 09:53:33 p5 patroni[8964]: 2023-05-07 09:53:33,124 INFO: no action. I am (p5), a secondary, and following a leader (p4)
May  7 09:53:40 p5 patroni[8964]: 2023-05-07 09:53:40,822 INFO: no action. I am (p5), a secondary, and following a leader (p4)

Repeat the same in p6

Check patroni member nodes:

patronictl -c /etc/patroni/patroni.yml list

The output should look like this

[root@p4 pgsql]#  patronictl -c /etc/patroni/patroni.yml list
+ Cluster: postgres -------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| p4     | 192.168.110.134 | Leader  | running |  1 |           |
| p5     | 192.168.110.135 | Replica | running |  1 |         0 |
| p6     | 192.168.110.136 | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+
[root@p4 pgsql]#

Check if you can connect from psql client

psql -h 192.168.110.140 -p 5000

and you should get something like this

[postgres@p1 ~]$ psql -h 192.168.110.140 -p 5000
psql (15.2)
Type "help" for help.

postgres=#

Create a sample table and we will use this table for failure scenarios

create table emp(id int, sal int);

Test switchover

[root@p4 pgsql]#  patronictl -c /etc/patroni/patroni.yml switchover
Current cluster topology
+ Cluster: postgres -------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| p4     | 192.168.110.134 | Leader  | running |  1 |           |
| p5     | 192.168.110.135 | Replica | running |  1 |         0 |
| p6     | 192.168.110.136 | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+
Primary [p4]:         
Candidate ['p5', 'p6'] []: p5
When should the switchover take place (e.g. 2023-05-07T11:01 )  [now]:
Are you sure you want to switchover cluster postgres, demoting current leader p4? [y/N]: y
2023-05-07 10:02:03.39983 Successfully switched over to "p5"
+ Cluster: postgres -------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| p4     | 192.168.110.134 | Replica | stopped |    |   unknown |
| p5     | 192.168.110.135 | Leader  | running |  1 |           |
| p6     | 192.168.110.136 | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+
..
(after few seconds)
..
[root@p4 pgsql]# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: postgres -------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| p4     | 192.168.110.134 | Replica | running |  2 |         0 |
| p5     | 192.168.110.135 | Leader  | running |  2 |           |
| p6     | 192.168.110.136 | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+
[root@p4 pgsql]#

Created a script that continuously inserts data into table by creating a connection.

python3 HATest.py --port 5000

The log shows that my python script connected to 192.168.110.135 to execute insert statement and it experienced a connection lost when stopped the primary with systemctl stop patroni for few seconds and connected to new primary  192.168.110.136

We will cover the failure scenarios and more about HATest.py in detail in another post.

Connected to: 192.168.110.135
PostgreSQL server is not in recovery mode. Performing INSERT query...
Data inserted successfully
Count: 25
Press Ctrl+Z to terminating the job...

Connected to: 192.168.110.135
PostgreSQL server is not in recovery mode. Performing INSERT query...
Data inserted successfully
Count: 26
Press Ctrl+Z to terminating the job...

Connection attempt 1 failed. Retrying in 1 seconds...
Connection attempt 2 failed. Retrying in 1 seconds...
Connected to: 192.168.110.136
PostgreSQL server is not in recovery mode. Performing INSERT query...
Data inserted successfully
Count: 27
Press Ctrl+Z to terminating the job...

A simple ping test can also serve the purpose

Write test

while true; 
do
echo "select inet_server_addr(),now()::timestamp" | psql -Upostgres -h192.168.110.140 -p5000 -t;
sleep 1;
done

Read test

while true; 
do
echo "select inet_server_addr(),now()::timestamp" | psql -Upostgres -h192.168.110.140 -p5000 -t;
sleep 1;
done

Words from postgreshelp

Thank you for giving your valuable time to read the above information. I hope the content served your purpose in reaching out the blog.
Suggestions for improvement of the blog are highly appreciable. Please contact us for any information/suggestion/feedback.

If you want to be updated with all our articles

please follow us on Facebook Twitter
Please subscribe to our newsletter.