Tuesday, February 8, 2011

Postgresql failover with Pgpool II replication

Postgresql failover with Pgpool II replication can be configured for a 24x7 database system where no downtime is allowed. The following describes the steps we took to configure Pgpool II replication.
1. Environment

    * Two identical servers 10.0.0.36 as 'Node 1' and 10.0.0.34 as 'Node 2' with Centos 5

    * Existing ssh key exchange between both nodes for the postgres user (a minimal sketch follows this list).

    * Postgresql 8.3.8 has been installed in each server

    * Pgpool II 2.2.5 has been installed in 10.0.0.36

    * At Node 1, $PGDATA=/usr/local/pgsql/data

    * At Node 2, $PGDATA=/usr/local/pgsql/data
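A minimal sketch of that ssh key exchange, run as the postgres OS user on each node (hypothetical commands; adjust user names and IPs to your environment):

# On Node 1 (10.0.0.36):
ssh-keygen -t rsa                      # accept the defaults, empty passphrase
ssh-copy-id postgres@10.0.0.34
# On Node 2 (10.0.0.34):
ssh-keygen -t rsa
ssh-copy-id postgres@10.0.0.36
# Verify: these should log in without a password prompt
ssh postgres@10.0.0.34 hostname
ssh postgres@10.0.0.36 hostname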

1.1 WAL archiving is on at both nodes.

At Node_1:

archive_mode = on
archive_command = 'rsync %p postgres@10.0.0.34:/var/lib/pgsql/wal_archive_36/%f'

At Node_2:

archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/wal_archive_34/%f'
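The archive directories above have to exist and be writable by the postgres user before archiving will work, and archive_mode only takes effect after a restart. A minimal sketch, assuming the directory names shown above (both directories live on Node_2, 10.0.0.34):

# On Node_2: directory that receives Node_1's archived WAL (via rsync)
mkdir -p /var/lib/pgsql/wal_archive_36
# On Node_2: directory for its own archived WAL
mkdir -p /var/lib/pgsql/wal_archive_34
chown postgres:postgres /var/lib/pgsql/wal_archive_36 /var/lib/pgsql/wal_archive_34
# Restart postgresql on both nodes so archive_mode takes effect (adjust the pg_ctl path per node)
pg_ctl -D /usr/local/pgsql/data restart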
 1.2 Pgpool II installation
As the root user, perform the following:
cd /downloads/
tar -xzvf pgpool-II-2.2.5.tar.gz
cd pgpool-II-2.2.5
./configure --prefix=/opt/pgpoolII   --with-pgsql-libdir=/usr/local/pgsql/lib   --with-pgsql-includedir=/usr/local/pgsql/include
make && make install

 1.2.1 Configuration
cd /opt/pgpoolII/
cp etc/pcp.conf.sample etc/pcp.conf
cp etc/pgpool.conf.sample etc/pgpool.conf
/opt/pgpoolII/bin/pg_md5 -p
password:postgres
e8a48653851e28c69d0506508fb27fc5
vi etc/pcp.conf
...
postgres:e8a48653851e28c69d0506508fb27fc5
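If you prefer not to paste the hash by hand, pg_md5 also accepts the password as an argument, so the pcp.conf entry can be appended in one line (a sketch using the same postgres/postgres credentials as above; note the password ends up in your shell history):

echo "postgres:$(/opt/pgpoolII/bin/pg_md5 postgres)" >> /opt/pgpoolII/etc/pcp.conf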

 1.2.2 Modify parameters at pgpool.conf

Here we define the ports for pgpool and the pgpool communication manager (PCP), the listen addresses, the backend definitions, and a number of other parameters.

vi etc/pgpool.conf
...
listen_addresses ='*'
...
replication_mode = true

logdir = '/opt/pgpoolII/log'
.....
pid_file_name = '/var/run/pgpool/pgpool.pid'
...
health_check_period= 5
health_check_user = 'postgres'
...
failover_command = 'echo host:%h new master id:%m old master id:%M>/opt/pgpoolII/log/failover.log'
failback_command = 'echo host:%h new master id:%m old master id:%M>/opt/pgpoolII/log/failback.log'
...
backend_hostname0 = '10.0.0.36'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/usr/local/pgsql/data'
backend_hostname1 = '10.0.0.34'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/usr/local/pgsql/data'
..
 1.2.3 Make directory for pgpool pid file:

Log in as root and create a directory named pgpool in /var/run:
cd /var/run
mkdir pgpool
chown -R postgres:postgres pgpool
 1.2.4 Install the online recovery functions on both nodes:
cd /downloads/pgpool-II-2.2.5/sql/pgpool-recovery/
make install
psql -f pgpool-recovery.sql template1
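A quick way to confirm the recovery functions were registered (assuming they were loaded into template1 as above):

psql -c '\df pgpool_*' template1
# Expect pgpool_recovery and pgpool_remote_start in the output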
 1.2.5 pg_hba.conf configuration

As pgpool is running at 10.0.0.36 and the connections between pgpool and postgresql should be trusted, we configured as follows:
At Node_1:
host    all         all         10.0.0.36/32       trust
host    all         all         0.0.0.0/0          reject
At Node_2:
host    all         all         10.0.0.36/32       trust
host    all         all         0.0.0.0/0          reject
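pg_hba.conf changes only take effect after a reload, so on each node, as the postgres user (adjust the pg_ctl path on Node_2 if its installation differs):

pg_ctl -D /usr/local/pgsql/data reload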
pool_hba.conf configuration of pgpool:
In our test environment, we did not enable the pgpool HBA configuration. An important point is that Pgpool II does not support md5 authentication.

 2. Copy database from Node_1 to Node_2


2.1 At Node_1:

Assume postgresql is running.
psql -U postgres
> select pg_start_backup('initial_backup');
> \q
cd /usr/local/pgsql
rsync -avz data/* postgres@node_2:/usr/local/pgsql/data
psql -U postgres
> select pg_stop_backup();
> \q
2.2 At Node_2:
cd /usr/local/pgsql/data
rm postmaster.pid
Start postgresql at Node_2
pg_ctl start
Now the postgresql DB is running on both nodes. Next we will start Pgpool II.
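Before starting pgpool it is worth confirming that both backends accept TCP connections from the pgpool host (run from Node_1; this assumes postgresql's listen_addresses allows remote connections and the trust entries configured above):

psql -h 10.0.0.36 -p 5432 -U postgres -c "select 1"
psql -h 10.0.0.34 -p 5432 -U postgres -c "select 1"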

 3. Start Pgpool II

At this stage, we start pgpool in debug mode.

/opt/pgpoolII/bin/pgpool  -f /opt/pgpoolII/etc/pgpool.conf -F /opt/pgpoolII/etc/pcp.conf -a /opt/pgpoolII/etc/pool_hba.conf -d -n > /opt/pgpoolII/log/pgpool.log 2>&1 &
Check log files to see if pgpool is working.

tail /opt/pgpoolII/log/pgpool.log
...
2009-10-29 18:49:07 DEBUG: pid 12752: num_backends: 2 num_backends: 2 total_weight: 2.000000
2009-10-29 18:49:07 DEBUG: pid 12752: backend 0 weight: 1073741823.500000
2009-10-29 18:49:07 DEBUG: pid 12752: backend 1 weight: 1073741823.500000
2009-10-29 18:49:07 DEBUG: pid 12753: I am 12753
...
2009-10-29 18:49:07 DEBUG: pid 12784: I am 12784
2009-10-29 18:49:07 LOG:   pid 12752: pgpool successfully started
2009-10-29 18:49:07 DEBUG: pid 12785: I am PCP 12785
...
2009-10-29 18:51:53 DEBUG: pid 17586: starting health checking
2009-10-29 18:51:53 DEBUG: pid 17586: health_check: 0 th DB node status: 1
2009-10-29 18:51:53 DEBUG: pid 17586: health_check: 1 th DB node status: 1
...
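Once pgpool is up, its configuration and backend state can also be inspected through the pgpool port itself (SHOW pool_status is answered by pgpool, not by postgresql; the exact output varies by pgpool version):

psql -p 9999 -U postgres -c "show pool_status" template1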
 4. Check if replication is working 

    - postgres is running on all nodes

    - pgpool is running on port 9999 on node 1

    - shell session on node1 established

 4.1 create a test database and insert some data
psql -p 9999
If the above command is successful, we can confirm that Pgpool is working. Then do the following steps.
createdb -p 9999 bench_replication
pgbench -i -p 9999 bench_replication
psql -p 9999 bench_replication
bench_replication=# insert into history (tid, bid,aid,mtime,filler) (select 1,1,1,now(),i::text from
(select generate_series(1,1000000) as i) as q);
 4.2 Check data in each node

At Node_1:
psql -p 9999 bench_replication
select count(*) from history;
Count
--------
1000000
(1 row)
At Node_1:
psql -p 5432 bench_replication
select count(*) from history;
Count
--------
1000000
(1 row)
At Node_2:
psql -p 5432 bench_replication
select count(*) from history;
Count
--------
1000000
(1 row)
The results indicate that the replication is working.

 5. Enable online recovery


5.1 Configure pgpool.conf parameters:
...
recovery_user =  'postgres'
recovery_1st_stage_command =  'copy_base_backup'
recovery_2nd_stage_command = 'pgpool_recovery_pitr'
...
 Reload pgpool to reflect the changes.
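One way to reload (assuming pgpool was started with the configuration files as in section 3):

/opt/pgpoolII/bin/pgpool -f /opt/pgpoolII/etc/pgpool.conf -F /opt/pgpoolII/etc/pcp.conf reload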

5.2 Create scripts
I have written scripts for both nodes. If anyone wants to set up online recovery from one side only, they may use only the required scripts.
------------------------------------------------
 copy_base_backup at Node_1:
 ------------------------------------------------

#! /bin/sh
psql -c "select pg_start_backup('pgpool_recovery')" postgres
echo "restore_command = 'cp /var/lib/pgsql/wal_archives_36/%f %p'">/var/lib/pgsql/data/recovery.conf
tar -C /var/lib/pgsql/data -zcf pgsql.tar.gz base global pg_clog pg_multixact pg_subtrans pg_tblspc pg_twophase pg_xlog recovery.conf
psql -c "select pg_stop_backup()" postgres
scp pgsql.tar.gz 10.0.0.34:/var/lib/pgsql/data
 #Expand a database backup
ssh -T 10.0.0.34 'cd /var/lib/pgsql/data; tar zxf pgsql.tar.gz' 2>/dev/null 1>/dev/null
rm -f recovery.conf
-------------------------------------------------
copy_base_backup at Node_2:
-------------------------------------------------
#! /bin/sh
BINPATH=/opt/PostgreSQL/8.3/bin
$BINPATH/psql -c "select pg_start_backup('pgpool_recovery')" postgres
echo "restore_command = 'scp postgres@10.0.0.34:/var/lib/pgsql/wal_archive_34/%f %p'">/usr/local/pgsql/data/recovery.conf
tar -C /usr/local/pgsql/data -zcf pgsql.tar.gz base global pg_clog pg_multixact pg_subtrans pg_tblspc pg_twophase pg_xlog recovery.conf
$BINPATH/psql -c "select pg_stop_backup()" postgres
scp pgsql.tar.gz 10.0.0.36:/usr/local/pgsql/data
# Expand the database backup
ssh -T 10.0.0.36 'cd /usr/local/pgsql/data; tar zxf pgsql.tar.gz' 2>/dev/null 1>/dev/null
rm -f recovery.conf
-----------------------------------
pgpool_recovery_pitr:
-----------------------------------
#! /bin/sh
psql -c "select pg_switch_xlog()" postgres
--------------------------------------------------
pgpool_remote_start at Node_1:
--------------------------------------------------
#! /bin/sh
if [ $# -ne 2 ]
then
echo "pgpool_remote_start remote_host remote_datadir"
exit 1
fi
DEST=$1
DESTDIR=$2
PGCTL=/opt/PostgreSQL/8.3/bin/pg_ctl    #postgresql bin directory at Node_2
#Startup PostgreSQL server
ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null 
--------------------------------------------------
pgpool_remote_start at Node_2:
--------------------------------------------------
#! /bin/sh
if [ $# -ne 2 ]
then
echo "pgpool_remote_start remote_host remote_datadir"
exit 1
fi
DEST=$1
DESTDIR=$2
PGCTL=/usr/local/pgsql/bin/pg_ctl  #Path for postgresql bin directory at Node_1
#Startup PostgreSQL server
ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null 
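pgpool executes copy_base_backup and pgpool_recovery_pitr through the pgpool_recovery() function on the current master, while pgpool_remote_start is used to start the recovered node; these scripts are looked up inside the database cluster directory. So, after creating them, place them in $PGDATA on each node and make them executable (a sketch, assuming the paths described above):

# On each node, as the postgres user, from the directory holding the scripts:
cp copy_base_backup pgpool_recovery_pitr pgpool_remote_start /usr/local/pgsql/data/
chmod 755 /usr/local/pgsql/data/copy_base_backup /usr/local/pgsql/data/pgpool_recovery_pitr /usr/local/pgsql/data/pgpool_remote_start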


6. Test failover:
To verify that failover works, I killed the postgres processes on Node_2 while an update statement was running at Node_1 through port 9999.
At Node_1:
psql -p 9999 bench_replication
update history set tid=2;

At Node_2:
pkill postgres
or
pg_ctl stop -m immediate

Check that at Node_1 the update is still running.

tail pgpool.log
...
2009-11-12 13:22:23 DEBUG: pid 13719: detect_error: kind: E
2009-11-12 13:22:23 DEBUG: pid 13719: detect_stop_postmaster_error: receive admin shutdown error from a node.
2009-11-12 13:22:23 LOG:   pid 13719: notice_backend_error: 1 fail over request from pid 13719
2009-11-12 13:22:23 DEBUG: pid 13715: failover_handler called
2009-11-12 13:22:23 DEBUG: pid 13715: failover_handler: starting to select new master node
2009-11-12 13:22:23 LOG:   pid 13715: starting degeneration. shutdown host 10.0.0.54(5432)
2009-11-12 13:22:23 DEBUG: pid 13715: VN:: Master_node_id is changing from 0 to 0
2009-11-12 13:22:23 LOG:   pid 13715: failover_handler: do not restart pgpool. same master node 0 was selected
2009-11-12 13:22:23 LOG:   pid 13715: failover done. shutdown host 10.0.0.54(5432)
2009-11-12 13:22:23 LOG:   pid 13715: execute command: echo host:10.0.0.54 new master id:0 old master id:0>/opt/pgpoolII/log/failover.log
2009-11-12 13:22:23 DEBUG: pid 13715: reap_handler called
2009-11-12 13:22:23 DEBUG: pid 13715: reap_handler: call wait3
2009-11-12 13:22:23 DEBUG: pid 13715: reap_handler: normally exited
2009-11-12 13:22:28 DEBUG: pid 13715: starting health checking
2009-11-12 13:22:28 DEBUG: pid 13715: health_check: 0 th DB node status: 2
2009-11-12 13:22:28 DEBUG: pid 13715: health_check: 1 th DB node status: 3
...


7. Online Recovery and Re-attach node:

After failover, if we want to re-attach Node_2, we need to apply the changes that happened at Node_1 through Pgpool. As a result, recovery of Node_2 is required.
/opt/pgpoolII/bin/pcp_recovery_node  20 10.0.0.36 9898 postgres postgres 1
Parameters are:
20 - timeout in seconds
10.0.0.36 - host ip where pgpool is running
9898 - port at which pgpool communication manager listens
postgres, postgres - username and password as defined in pcp.conf
1 - ID of the node we want to attach (refers to the backend number in pgpool.conf)
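If a node was merely detached (for example after a short network glitch) and you are certain its data is still consistent with the surviving node, it can be re-attached without a full copy using pcp_attach_node, which takes the same style of arguments (use pcp_recovery_node as above whenever there is any doubt):

/opt/pgpoolII/bin/pcp_attach_node 20 10.0.0.36 9898 postgres postgres 1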

pgpool.log messages:
...
DEBUG: pid 4411: pcp_child: start online recovery
LOG: pid 4411: starting recovering node 1
DEBUG: pid 4411: exec_checkpoint: start checkpoint
DEBUG: pid 4411: exec_checkpoint: finish checkpoint
LOG: pid 4411: CHECKPOINT in the 1st stage done
LOG: pid 4411: starting recovery command: "SELECT pgpool_recovery('copy_base_backup', '10.0.0.34','/usr/local/pgsql/data')"
DEBUG: pid 4411: exec_recovery: start recovery
DEBUG: pid 29658: starting health checking
DEBUG: pid 29658: health_check: 0 the DB node status: 2
DEBUG: pid 29658: health_check: 1 the DB node status: 3
DEBUG: pid 4411: exec_recovery: finish recovery
LOG: pid 4411: 1st stage is done
LOG: pid 4411: starting 2nd stage
LOG: pid 4411: all connections from clients have been closed
DEBUG: pid 4411: exec_checkpoint: start checkpoint
DEBUG: pid 4411: exec_checkpoint: finish checkpoint
LOG: pid 4411: CHECKPOINT in the 2nd stage done
LOG: pid 4411: starting recovery command: "SELECT pgpool_recovery('pgpool_recovery_pitr', '10.0.0.34','/usr/local/pgsql/data')"
DEBUG: pid 4411: exec_recovery: start recovery
DEBUG: pid 4411: exec_recovery: finish recovery
DEBUG: pid 4411: exec_remote_start: start pgpool_remote_start
DEBUG: pid 29658: starting health checking
DEBUG: pid 4411: exec_remote_start: finish pgpool_remote_start
DEBUG: pid 29658: starting health checking
LOG: pid 4411: 1 node restarted
LOG: pid 4411: send_failback_request: fail back 1 th node request from pid 4411
LOG: pid 4411: recovery done
DEBUG: pid 29658: failover_handler called
DEBUG: pid 29658: failover_handler: starting to select new master node...


8. Database Downtime:
During production, if pgpool can communicate with at least one node, the system will not need any downtime on the database end. But during online recovery, before the 2nd stage, all client connections to pgpool must be closed. It is designed this way so that no transactions are missed during the recovery and re-attachment of a node.

16 Responses so far

  1. great manual, very clean and very understandable... I'm going to test it with gentoo, postgres 9.0.3, with 2 nodes and a director... and your post was very helpful, thanks a lot.

  2. Great article.. thanks for the efforts..

    I have some questions related to online recovery:

    1. After failover, the recovery.conf on Node2 was converted to recovery.done. Right? Now, what changes need to be done to bring back Node1 (the primary)? Who will try to start the primary?
    2. Let's say I have two servers running on the same host; then for online recovery, do I need to set up pcp_recovery_node?

  3. Let's say I want to restore the standby. Where will the recovery.conf, copy_base_backup.sh and pgpool_remote_start reside? Also, who will pass the arguments to pgpool_remote_start?

  4. If you consider Node 1 as master and Node 2 as standby then ->
    # Place recovery.conf at the master server. The tar that I create in copy_base_backup includes the file and is then transferred to the standby.
    # Place copy_base_backup and pgpool_remote_start at the master server. pgpool_remote_start opens an ssh session with the standby and then starts the database at the standby.

  5. Anonymous says:

    Have you tested this with Postgres 9.0 streaming replication?

  6. Anonymous says:

    I keep getting that error
    ERROR: kind mismatch among backends. Possible last query was: "update pgbench_history set tid=5;" kind details are: 0[C] 1[N: terminating connection because of crash of another server process]
    HINT: check data consistency among db nodes
    ERROR: kind mismatch among backends. Possible last query was: "update pgbench_history set tid=5;" kind details are: 0[C] 1[N: terminating connection because of crash of another server process]
    HINT: check data consistency among db nodes
    The connection to the server was lost. Attempting reset: Failed.
    so please advise

  7. AI Rumman says:

    Have you solved it or still facing the error?

  8. Saritha N says:
    This comment has been removed by the author.
  9. Saritha N says:

    Hi, when I call psql -p 9999 I get this error:
    psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    could not send startup packet: Broken pipe.
    Please help me in resolving this.

  10. AI Rumman says:

    Look at your log. Maybe your backend was down.

  11. Saritha N says:

    That issue got resolved. When I call the other postgresql port on my system, psql -h 192.168.1.22 -p 5433, I am getting an exception like
    server closed the connection unexpectedly
    TCP/IP connections are not opened. iptables has also been configured according to your post. Please reply back.

  12. AI Rumman says:

    So you are saying you can't connect to your Postgresql instance?

  13. Saritha N says:

    Hi, thanks for your reply. I have installed everything once again. Replication is working fine. I want to know how to check load balancing. Please reply.

  14. AI Rumman says:

    I haven't tried with Pgpool Load Balancing.

  15. Unknown says:

    Guys,

    has anyone fixed the problem "ERROR: kind mismatch among backends."? We are constantly facing this issue! Please help if you know how. We have synchronous streaming replication between the backends and load balancing activated.
