Tuesday, December 17, 2013

pgHA - Postgresql+Pgbouncer High Availability Tools

I was working on a project to setup easy failover system for Postgresql Master-Slave replication where all app servers were connected to the dbserver through pgbouncer. I had to configure such a infrastructure where we could easily promote Slave to master in case of failover.

To achieve this, I found a good tool named pgHA, developed by OpenSCG (https://bitbucket.org/openscg/pgha).

pgHA - working mechanism

pgHA is a perl based tool and it has a file called failoverd.pl which is actually the heart of the tool. Installation procedures described at README file.

pgHA works at the slave server. It reads some configuration parameters from pgha.conf and works as a background process in the server.
When we executes
perl failoverd.pl -c ../cfg/pgha.conf --start
it modifies the app servers pgbouncer config files with pgbouncer_failover_config and starts a background process. In order to modify the pgbouncer.ini files in the app servers, pgHA requires password less access to the app servers from slave db server.

When we executes
perl failoverd.pl -c pgha.conf --pushButton
it creates the trigger file in the slave which allows the slave to act as master and restarts the app servers' pgbouncer so that it reads pgbouncer.ini directing to new master.

pgHA also supports automatic failover.

It also reads some configuration parameters to kill all database connections through pgbouncer to the old master and then resume with the new master db server. These parameters are -

# Which databases should be part of failover
# pgBouncer command to pause / fence the db's
# pgBouncer command to reload the config file
# pgBouncer command to unlock the db's

Plug pgHA with Postgresql 8.4:

pgHA was designed to work with Postgresql Streaming Replication and pgbouncer infrastructure. In my end, we were running with Postgresql 8.4 Log Shipping Replication. As pgHA was executing queries in the Slave server to find out if the slave is alive or not, we were not able to implement it at our site. So I decided to modify the pgHA source a bit to work with Postgresql 8.4 Log Shipping.

In Postgresql 8.4, it was not possible to execute queries on slave as it had no hot_standby feature. So I added a function in the source of pgHA, which checked that if any service was running at the slave with the given port and if it would find yes, then it would read the pg_controldata and looked for 'in archive recoverymode'. With this function, I also added some parameters which were required for the execution of the function.

The function was:

# Sub: CheckPgDatabase
# Inputs:
#       dbport- Slave Postgres db port  
# Output:
#       isDBUp      - (Boolean) 1 if database is up, 0 if it is down
#       Check if Postgresql is running on that port using netstat command. The OS user that is running pgha should have the permission to
#       execute netstat command. If it founds the Postgresql is running on that port, to make sure it parses the value from pg_controldata and
#       check "Database cluster state". The value should be "in archive recovery" for a running Postgresql instance with pg_standby.
sub CheckSlaveStatusInArchiveRecoveryMode
my ( $dbport ) = @_;
  my $isDBUp=1;
  $logger->debug("Attempting connect: ".$connStr);  
  if ($isDBUp==1)
  #Checking if there is any service running on the dbport 
 my $cmd = 'netstat -a | grep '. $dbport;
 my $var = qx{$cmd};
 if ($var)
  #print "\n" . $cmd; 
   #print "\n\t Slave is running" ;
 else #if ($var)
  #print "\n\t Slave is NOT running" ; 
 } #if ($var)
  } #if ($isDBUp==1)
  if ($isDBUp==1)
  # if the above passed, then check what the pg_controldata says
 my $cmd = $v_slavePgengine . '/pg_controldata ' .  $v_slavePgdata . ' | grep "Database cluster state"';
 my $var = qx{$cmd};
 if ($var =~ /in archive recovery/)
   print "\n\t Database is in archive recovery mode\n";
  print "\n\t ". $cmd;
  print "\n\t Database in down\n";
  } #if ($isDBUp==1)
  return $isDBUp;
} #sub CheckSlaveStatusInArchiveRecoveryMode

The pgHA code was written so self-descriptive that I was able to push my desired function in the code. I communicated with the pgHA developer and he really liked the solution and ready to push it in his code. That was really inspirational for me.