Wednesday, January 30, 2013

PostgreSQL Query Parallelizing with dblink

I have three tables, each with almost 10 million records:

The "customer" table holds records for all kinds of customers; each customer is related to either an account (in the "account" table) or a helpdesk ticket (in the "tickets" table).

We need to generate a report of each customer and its related account or ticket.
The basic SQL looks like:
select *
from (
  select c.custid,
         case when a.accountid is not null then a.accountid
              when t.ticketid is not null then t.ticketid
         end as relatedid
  from customer as c
  left join account as a on c.custid = a.accountid and c.type = 'Accounts'
  left join tickets as t on c.custid = t.ticketid and c.type = 'HelpDesk'
  where c.deleted = 0
) as q
where relatedid is not null

I have all the required indexes, but the query was taking too much time.
One bottleneck I have always felt with PostgreSQL is its lack of query parallelism. The good news is that the developers are working on it.
However, I had to improve the query performance right now, so I made a plan to divide the query into two parts, execute each part asynchronously, and then collect the results.
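In plain SQL, the split looks like this: two independent inner joins whose union gives the report its rows (these are the same two halves the function later sends over dblink):

```sql
select custid, accountid as relatedid
from customer c inner join account a on c.custid = a.accountid
where c.deleted = 0
union all
select custid, ticketid as relatedid
from customer c inner join tickets t on c.custid = t.ticketid
where c.deleted = 0
```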

To achieve this, I wrote the function qry_parallel_fn. It creates two separate dblink connections, conn1 and conn2, and executes two different queries in async mode.
A while loop checks whether both connections have finished their work; once they have, the function returns the results.

CREATE OR REPLACE FUNCTION qry_parallel_fn() RETURNS SETOF record AS $$
DECLARE
   v_qry1 text;
   v_qry2 text;
   cur1 cursor is
   select *
   from dblink_get_result('conn1') as t1(custid int, relatedid int);
   cur2 cursor is
   select *
   from dblink_get_result('conn2') as t1(custid int, relatedid int);
   v_closed smallint;
   rec record;
BEGIN
   v_qry1 := 'select custid, accountid as relatedid from customer c inner join account a on c.custid = a.accountid where c.deleted = 0';
   RAISE NOTICE 'vqry1 = %', v_qry1;
   v_qry2 := 'select custid, ticketid as relatedid from customer c inner join tickets as t on c.custid = t.ticketid where c.deleted = 0';
   PERFORM dblink_connect('conn1','dbname=rumman');
   PERFORM dblink_connect('conn2','dbname=rumman');
   PERFORM dblink_send_query('conn1', v_qry1);
   PERFORM dblink_send_query('conn2', v_qry2);
   v_closed := 0;
   WHILE v_closed < 2 LOOP
      -- re-count on every pass; a PERFORM pg_sleep(0.1) here would soften the busy-wait
      v_closed := 0;
      IF check_conn_is_busy('conn1') = 0 THEN
         v_closed := v_closed + 1;
      END IF;
      IF check_conn_is_busy('conn2') = 0 THEN
         v_closed := v_closed + 1;
      END IF;
   END LOOP;
   FOR rec IN cur1 LOOP
      RETURN NEXT rec;
   END LOOP;
   FOR rec IN cur2 LOOP
      RETURN NEXT rec;
   END LOOP;
   PERFORM dblink_disconnect('conn1');
   PERFORM dblink_disconnect('conn2');
   RETURN;
END;
$$ LANGUAGE plpgsql;

--select * from qry_parallel_fn() as t1(c int, d int);
--select count(*) from qry_parallel_fn() as t1(c int, d int);

CREATE OR REPLACE FUNCTION check_conn_is_busy(conn text) RETURNS INT AS $$
DECLARE
   v int;
BEGIN
   SELECT dblink_is_busy(conn) INTO v;
   RETURN v;
END;
$$ LANGUAGE plpgsql;

While monitoring the server, I found that it was now using two CPU cores to get the result, which improved the query timing a bit.

Monday, January 21, 2013

Rsync with filter for specific subdirectory

I have to ensure a backup of our document folders. The structure is /DOC/company1/, /DOC/company2/, ..., /DOC/companyN/.

In each /DOC/companyN/ folder we had some files and folders, among them storage and user_privileges.

I have to ensure backups of the storage and user_privileges folders of each company.

And to achieve this, I have created the following rsync command (the backup destination here is a placeholder):
 rsync -az  --include='/*/'  --include='*/storage/***'  --include='*/user_privileges/***'  --exclude='*'   /DOC/  /path/to/backup/

 I used the following options :-

 a : archive mode (recursive; preserves permissions, times, symlinks, etc.)
 z : compress file data during transfer

Now, the filter rules are
 include='/*/' : includes every top-level directory under /DOC (the transfer root)
 include='*/storage/***' : includes each /DOC/companyN/storage directory and everything beneath it
 include='*/user_privileges/***' : includes each /DOC/companyN/user_privileges directory and everything beneath it
 exclude='*' : placed last to exclude all other files and directories

And it is working now. However, I found one problem with this command: it still creates a directory in the backup for any /DOC/companyN that has neither a storage nor a user_privileges directory (rsync's --prune-empty-dirs option can suppress this).
For now, we accepted it and released, since there is very little chance of a directory in /DOC having neither a storage nor a user_privileges directory.
Thursday, January 17, 2013

PostgreSQL function to check whether a value is a number

In PostgreSQL, I created a simple function to check whether a column value is a number or not.
CREATE OR REPLACE FUNCTION is_number(prm_str text) RETURNS BOOLEAN AS $$
DECLARE
  v_return BOOLEAN;
BEGIN
  -- optional sign, digits, optional fractional part
  v_return := prm_str ~ '^-?[[:digit:]]+(\.[[:digit:]]+)?$';
  RETURN v_return;
END;
$$ LANGUAGE plpgsql;

Now execute the function:
 SELECT *, is_number(col_val) as isNumber FROM table;
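The matching logic can be sanity-checked outside the database with grep -E. The pattern below is a cleaned-up ERE equivalent (one optional sign, digits, optional fractional part, with [0-9] standing in for [[:digit:]]), so it may differ slightly from the exact pattern in the function:

```shell
# Check the number pattern against a few sample strings.
pat='^-?[0-9]+(\.[0-9]+)?$'
for s in 5 -3.14 10.5 abc 1.2.3 --7; do
  if printf '%s\n' "$s" | grep -Eq "$pat"; then
    echo "$s : number"
  else
    echo "$s : not a number"
  fi
done
```

5, -3.14, and 10.5 match; abc, 1.2.3 (two dots), and --7 (two signs) do not.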
Wednesday, January 16, 2013

Steps to upgrade a PostgreSQL database using pg_upgrade

I had to upgrade PostgreSQL 9.0 to 9.2 with minimal downtime. Earlier I used to complete this kind of upgrade with pg_dump, but since I had very little downtime for a 60 GB database, I chose pg_upgrade, and I completed the whole task within an hour.
The steps I followed are below. Steps 1 to 9 cover the data migration using pg_upgrade; the remaining steps cover installing PostgreSQL 9.2 from the yum repo and removing the source build.
I also prepared a plan to revert to PostgreSQL 9.0 in case something went wrong.
Steps to upgrade database:
1. Install PostgreSQL 9.2 from source at /usr/pgsql-9.2 with the contrib modules
2. Create the PostgreSQL 9.2 cluster directory
/usr/pgsql-9.2/bin/initdb -D /var/lib/pgsql/9.2/data
3. Stop PostgreSQL 9.0
pg_ctl stop -mf
4. Verify the PostgreSQL 9.2 installation
/usr/pgsql-9.2/bin/pg_ctl start -D /var/lib/pgsql/9.2/data
5. Stop PostgreSQL 9.2
/usr/pgsql-9.2/bin/pg_ctl stop -mf -D /var/lib/pgsql/9.2/data
6. With both database servers stopped, upgrade the database:
export OLDCLUSTER=/usr/pgsql-9.0
export NEWCLUSTER=/usr/pgsql-9.2
/usr/pgsql-9.2/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin --new-bindir=$NEWCLUSTER/bin --old-datadir=/var/lib/pgsql/9.0/data --new-datadir=/var/lib/pgsql/9.2/data
7. Verify the upgrade.
/usr/pgsql-9.2/bin/pg_ctl start -D /var/lib/pgsql/9.2/data
Execute some SQL
8. Vacuum and analyze the databases.
9. Stop PostgreSQL 9.2
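Collected together, steps 2 to 9 look roughly like the sketch below. This is a sketch only: it assumes the 9.0 data directory is /var/lib/pgsql/9.0/data, a fresh backup exists, and it runs as the postgres user; adjust paths before use.

```shell
#!/bin/sh
# Sketch of steps 2-9; do not run as-is without checking every path.
set -e
OLDBIN=/usr/pgsql-9.0/bin
NEWBIN=/usr/pgsql-9.2/bin
OLDDATA=/var/lib/pgsql/9.0/data   # assumed location of the 9.0 cluster
NEWDATA=/var/lib/pgsql/9.2/data

$NEWBIN/initdb -D "$NEWDATA"                                          # step 2
$OLDBIN/pg_ctl stop -mf -D "$OLDDATA"                                 # step 3
$NEWBIN/pg_upgrade --old-bindir="$OLDBIN" --new-bindir="$NEWBIN" \
                   --old-datadir="$OLDDATA" --new-datadir="$NEWDATA"  # step 6
$NEWBIN/pg_ctl start -D "$NEWDATA"                                    # step 7
$NEWBIN/vacuumdb --all --analyze                                      # step 8
$NEWBIN/pg_ctl stop -mf -D "$NEWDATA"                                 # step 9
```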
Install PostgreSQL 9.2 from the yum repo:
10. Rename the PostgreSQL 9.2 home directory and install PostgreSQL 9.2 from the yum repo
mv /usr/pgsql-9.2 /usr/pgsql-9.2_from_source
11. Uninstall PostgreSQL 9.0
12. Set $PGDATA to /var/lib/pgsql/9.2/data
13. Verify the PostgreSQL 9.2 installation
14. Verify the PostgreSQL 9.2 startup script at /etc/init.d and enable it at system startup
15. Remove /usr/pgsql-9.2_from_source
If for some reason we have to revert to PostgreSQL 9.0:
  1. Stop PostgreSQL 9.2
  2. Uninstall PostgreSQL 9.2
  3. Install PostgreSQL 9.0
  4. Change $PGDATA to /var/lib/pgsql/9.0/data
  5. Start PostgreSQL 9.0
  6. Verify and release
 Important issues I faced during the upgrade:
  1. The 9.0 database was running with the uuid and pgxml modules, so I had to configure 9.2 with these options; the uuid-devel library was missing and I had to install it.
  2. One database had a view named 'slowest_queries' based on pg_stat_activity. As some column names had changed in pg_stat_activity, I was getting an error during the pg_upgrade operation. To avoid this, I saved the view definition elsewhere, dropped the view from 9.0, upgraded the database with pg_upgrade, adjusted the definition to the new column names, and recreated the view in 9.2.

Post-upgrade issues:
  1. Set standard_conforming_strings to off for backward compatibility
