Thursday, September 26, 2013

Postgresql Radius Based Search using Latitude and Longitude for a Zip Code

We have a table called "image_info" as follows. We have to find the images that are nearest to a given zip code.

\d image_info
                                    Table "public.image_info"
      Column       |       Type        |                         Modifiers                         
-------------------+-------------------+------------------------------------------------------------
 id                | bigint            | not null default nextval('image_info_id_seq'::regclass)
 content_type      | character varying |
 image_name        | character varying |
 zip_code          | character varying |
 Indexes:
    "image_info_pkey" PRIMARY KEY, btree (id)
    "zip_code_5digit_idx" btree (substr(zip_code::text, 1, 5))
    "zip_code_idx" btree (zip_code)
To achieve this, we used data from the commercial version of Maxmind's GeoIP software.
With it, we can narrow down the longitude/latitude of an IP address to within roughly 25 miles around 80% of the time. For more details, please visit
 http://stackoverflow.com/questions/4982125/matching-closest-longitude-latitude

In the crawledphoto database, we created a table named "zip_code_based_lng_lat":
\d zip_code_based_lng_lat
      Table "public.zip_code_based_lng_lat"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 zip    | character varying(50)  |
 state  | character varying(2)   |
 city   | character varying(100) |
 type   | character varying(2)   |
 lat    | double precision       |
 lng    | double precision       |
Indexes:
    "zip_code_based_lat_lng_gist_idx" gist (ll_to_earth(lat, lng))
    "zip_code_based_lng_lat_zip_idx" btree (zip)

Here,
ZIP — ZIP Codes identify specific geographic delivery areas. A ZIP code can represent an area within a state, one which
crosses state boundaries, or a single building.
State — The USPS® standard 2-letter state or territory abbreviation.
City — The city, community, station or other name by which a 5-digit ZIP area is known.
Type — Defines the ZIP Code for delivery purposes.
      P = A ZIP code used only for P.O. Boxes.
      U = A unique* ZIP code.
      M = Identifies an APO/FPO ZIP Code.
      Blank = A standard ZIP Code.

*These are assigned to specific organizations and not to a city as a whole. Unique ZIP's are assigned to organizations such as Reader's Digest, educational institutions and government facilities. Use the UNQ.DAT file to find the organization name.
County FIPS — A Federal Information Processing Standard number is assigned to each county in the United States. Use this number to find the county name in the CNTY.DAT file. FIPS numbers begin with the 2-digit state code and are followed by the 3-digit county code.

Latitude — The geographic coordinate of a point measured in degrees north or south of the equator.
Longitude — The geographic coordinate of a point measured in degrees east or west of the Greenwich meridian.

This table has two indexes:
zip_code_based_lng_lat_zip_idx - used to match the exact zip code
zip_code_based_lat_lng_gist_idx - a GiST index built with the "ll_to_earth" function from the "earthdistance" extension.

CREATE INDEX zip_code_based_lat_lng_gist_idx ON zip_code_based_lng_lat USING gist(ll_to_earth(lat::float, lng::float)) ;


Here, "ll_to_earth" returns the location of a point on the surface of the Earth given its latitude (argument 1) and longitude (argument 2) in degrees.

Example data:

  zip  | state |         city         | type |   lat   |   lng  
-------+-------+----------------------+------+---------+---------
 00501 | NY    | HOLTSVILLE           | U    | 40.8172 | 73.0451
 00501 | NY    | I R S SERVICE CENTER | U    | 40.8172 | 73.0451
 ...


Now, we created a function which returns the matching rows given two parameters: a zip code and a radius in miles.

CREATE OR REPLACE FUNCTION get_zip_code_based_records_fn(prm_zip_code varchar, prm_distance_in_miles int default 10)
RETURNS SETOF RECORD AS $$
DECLARE
  -- Cursor parameters shadow the variables below; they are bound when the cursor is opened.
  cur1 CURSOR (vlat float, vlng float, v_area_km float) IS
    SELECT g.*, z.state, z.city, z.type,
           earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float))::float/(1000*1.609) as distance_in_mile
    FROM image_info as g,
         zip_code_based_lng_lat as z
    WHERE substr(g.zip_code,1,5) = z.zip
      AND earth_box(ll_to_earth(vlat, vlng), v_area_km) @> ll_to_earth(lat::float, lng::float)
      AND earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float)) < v_area_km;
  vlat float;
  vlng float;
  v_area_km float;
BEGIN
  -- Look up the latitude/longitude of the given zip code
  SELECT lat, lng
  INTO vlat, vlng
  FROM zip_code_based_lng_lat
  WHERE zip = prm_zip_code
  LIMIT 1;
  -- Convert the radius from miles to meters (earth_distance and earth_box work in meters),
  -- despite the _km suffix in the variable name
  v_area_km := (prm_distance_in_miles*1000)::float*1.609;
  FOR rec IN cur1(vlat, vlng, v_area_km)
  LOOP
    RETURN NEXT rec;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;

The function takes two parameters:
  • prm_zip_code - the zip code around which we want to search
  • prm_distance_in_miles - an integer giving the search radius around "prm_zip_code" in miles; defaults to 10 miles
Inside the function, we use a cursor based on the query:

SELECT g.*, z.state, z.city, z.type,
       earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float))::float/(1000*1.609) as distance_in_mile
FROM image_info as g,
     zip_code_based_lng_lat as z
WHERE substr(g.zip_code,1,5) = z.zip
  AND earth_box(ll_to_earth(vlat, vlng), v_area_km) @> ll_to_earth(lat::float, lng::float)
  AND earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float)) < v_area_km

The cursor takes three parameters:
  • vlat - latitude of the zip code given in "prm_zip_code"
  • vlng - longitude of the zip code given in "prm_zip_code"
  • v_area_km - the search radius around "prm_zip_code"; the caller passes "prm_distance_in_miles" and the function converts it to meters before using it in the cursor

And it uses two earthdistance functions:
  • earth_box - Returns a box suitable for an indexed search using the cube @> operator for points within a given great circle distance of a location. Some points in this box are further than the specified great circle distance from the location, so a second check using earth_distance should be included in the query.
  • earth_distance - Returns the great circle distance between two points on the surface of the Earth.
For more information, please visit http://www.postgresql.org/docs/9.2/static/earthdistance.html
Now, to get the data within 10 miles of zip code 18938, we use:
SELECT *
FROM get_zip_code_based_records_fn('18938') AS t
(
  id bigint,
  content_type character varying,
  image_name character varying,
  zip_code character varying,
  state character varying,
  city character varying,
  type character varying,
  distance_in_mile float
)
ORDER BY distance_in_mile
LIMIT 50;

Note that the column definition list has to name every column the cursor query returns: the image_info columns, the state, city and type columns taken from zip_code_based_lng_lat, and the computed distance_in_mile.

It uses the indexes described above and returns results within a second.


Friday, August 16, 2013

Postgresql 9.2 Streaming Replication Setup

Node 1 : 10.0.1.46
Node 2: 10.0.1.56

i) Prepare Node 1:
 
1. Check that Postgresql is running with the following parameters set in postgresql.conf:

listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on

2. Modify pg_hba.conf for replication

host all replicator 10.0.1.56/24 trust
host replication replicator 10.0.1.56/24 trust


3. Create required user for replication

CREATE USER replicator WITH PASSWORD 'replicator' REPLICATION LOGIN;

4. Reload pg_hba

pg_ctl reload

 
ii) Prepare Node 2:

1. Install Postgresql 9.2 at Node2

2. Modify postgresql.conf for replication

listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on

3. Modify pg_hba.conf for replication

host all replicator 10.0.1.46/24 trust
host replication replicator 10.0.1.46/24 trust

4. Restart database

pg_ctl -mf restart

5. Create script to sync with other node and act as slave

vi sync_with_master.sh
#!/bin/sh
MASTER=10.0.1.46
MASTER_PGPORT=5432
MASTER_PGDATA=/var/lib/pgsql/9.2/data
MASTER_PGENGINE=/usr/pgsql-9.2/bin
SLAVE=10.0.1.56
SLAVE_PGDATA=$MASTER_PGDATA
SLAVE_PGPORT=$MASTER_PGPORT
SLAVE_PGENGINE=$MASTER_PGENGINE
REPL_USER=replicator
# Stop the local server, take a base backup from the master and restart as a standby
$SLAVE_PGENGINE/pg_ctl stop -D $SLAVE_PGDATA -mf
psql -h $MASTER -p $MASTER_PGPORT -U $REPL_USER -d postgres -c "select pg_start_backup('replication_backup')"
rsync -azv "$MASTER":"$MASTER_PGDATA"/* "$SLAVE_PGDATA"/ --exclude="postmaster.pid" --exclude="*.conf*"
psql -h $MASTER -p $MASTER_PGPORT -U $REPL_USER -d postgres -c "select pg_stop_backup()"
echo "standby_mode = 'on' # enables stand-by (readonly) mode" > $SLAVE_PGDATA/recovery.conf
echo "primary_conninfo = 'host= $MASTER port= $MASTER_PGPORT user= $REPL_USER'" >> $SLAVE_PGDATA/recovery.conf
echo "trigger_file = '/tmp/pg_failover_trigger'" >> $SLAVE_PGDATA/recovery.conf
$SLAVE_PGENGINE/pg_ctl start -D $SLAVE_PGDATA -w

iii) Ensure passwordless SSH access for the "postgres" user between the nodes (needed for rsync)

iv) Make Node 2 slave:

 sh sync_with_master.sh

v) Check replication:

At master:

SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset;

At Slave:

SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay;

If the numbers from the two queries match, replication is working.

Alternatively, we can run "CREATE DATABASE db1" on the master and check whether it shows up on the slave.
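The pg_stat_replication view on the master is another quick way to see connected standbys and how far they have replayed (not part of the original setup, just a sanity check):

-- Run on the master; one row per connected standby
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;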

vi) Make Node 2 Master:

touch /tmp/pg_failover_trigger

At this stage, Node 2 starts acting as master while Node 1 is still alive as a stand-alone server.
If data is then modified on Node 2, Node 1 can be turned into a slave of Node 2 by running the same sync script with the roles reversed.
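To confirm that Node 2 has actually left recovery after the trigger file is created, a quick check (run on Node 2) is:

-- Returns false once the standby has been promoted
SELECT pg_is_in_recovery();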


Friday, June 14, 2013

Nagios Script to find unused databases

I created a Nagios script to find unused databases in our data centers. It gives us an alert like:

    ***** Nagios *****

    CRITICAL: 4 unused database(s) found at 0.0.0.0:5432 that were last accessed 7 days ago :-

      1) Database db1 last accessed on the 2013-05-22 04:02:08 that is 22 days ago

      2) Database db2 last accessed on the 2013-05-29 04:02:08 that is 15 days ago

      3) Database db3 last accessed on the 2013-05-17 04:02:08 that is 27 days ago

      4) Database db4 last accessed on the 2013-06-05 04:02:07 that is 8 days ago

    Notification Type: PROBLEM

    Service: Check Postgresql Unused Database

    State: CRITICAL

    Alias: dbserver1

    Host: dbserver1

    Address: 0.0.0.0

    Date/Time: Thu Jun 13 23:00:09 EDT 2013

The script is available at check-postgres-unused-db.sh. It takes the following parameters:

    Check Postgresql Unused Database

    Description: Creates an alert based on the number of databases older than the given mode, time and units for a given Postgresql database server

    Author: www.rummandba.com

    # This script expects psql to be in the PATH.

    Usage: ./check_postgres_unused_db [-h] [-v] [ -H <host> ] [ -P <port> ] [ -U user ] [ -D dbname] [ -m <mode of time> ] [ -t <time value> ] [ -x <units> ] [-w <warn count>] [-c <critical count>]

    -h --help help

    -v --verbose verbose or debug mode

    -H --host host (default 127.0.0.1)

    -P --port port (default 5432)

    -U --user database user (default postgres)

    -S --password database user password

    -D --dbname dbname to connect with postgresql (default postgres)

    -m --mode mode of time to check against the current time to mark a database as unused (c = create, a = access, m = modified; default: a)

    -t --time amount of time after which a database should be marked as unused

    -x --units units of measurement to compare (s = seconds; m = minutes; h = hours; D = days; M = months; Y = year)

    -w --warning warning threshold; number of databases older than the given mode, time and units (default 1)

    -c --critical critical threshold; number of databases older than the given mode, time and units (default 3)


Based on the db connection parameters, it connects to the Postgresql server, creates a function "check_pg_unused_db_f", and removes the function again after completion.

The function uses the plpythonu language, so PL/PythonU must be available on the database server.
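On Postgresql 9.1 and later, the language can be added to the target database as an extension; roughly:

-- Needs the PL/Python server package installed on the host;
-- plpythonu is untrusted, so this must be run by a superuser.
CREATE EXTENSION plpythonu;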

    CREATE OR REPLACE FUNCTION check_pg_unused_db_f(stat_type varchar, dbid int)
      RETURNS text As \$BODY\$
       import sys
       import os
       import datetime
      
       # Paths are relative to the server's current working directory, which is the data directory
       afilename = 'base/' + str(dbid)
       (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(afilename)
       if stat_type == 'atime':
         return datetime.datetime.fromtimestamp(atime)
       elif stat_type == 'mtime':
         return datetime.datetime.fromtimestamp(mtime) 
       elif stat_type == 'ctime':
         return datetime.datetime.fromtimestamp(ctime)
       else:
         return 'UnknownVariable'   
      
      \$BODY\$ Language plpythonu;

The function takes stat_type ('atime' = access, 'mtime' = modified, 'ctime' = created) and finds that specific timestamp for the given dbid. The following query gives the desired result:

    SELECT OID, datname, replace(check_pg_unused_db_f('"$TIME_MODE"', OID::int ),' ','_') as check_time FROM pg_database WHERE datallowconn and NOT datistemplate and datname NOT  IN ('postgres')

Each database's check_time is then compared against the following three parameters:

    -m --mode mode of time to check against the current time to mark a database as unused (c = create, a = access, m = modified; default: a)

    -t --time amount of time after which a database should be marked as unused

    -x --units units of measurement to compare (s = seconds; m = minutes; h = hours; D = days; M = months; Y = year)

For example, to get an alert whenever any database on the server has not been accessed for the last 15 days, we may invoke the script like this:

check_postgres_unused_db.sh -H 10.0.0.1 -P 5432 -D postgres -S password -m a -t 15 -x D -w 1 -c 3

After collecting the data, the check_postgres_unused_db script compares the number of databases found against the warning and critical thresholds and raises the appropriate alert.

Sample Nagios Configuration:

    define command{
         command_name check_postgres_unused_db
         command_line $USER1$/check_postgres_unused_db.sh -H $HOSTADDRESS$ -P $ARG1$ -U $ARG2$ -D $ARG3$ -m $ARG4$ -t $ARG5$ -x $ARG6$ -w $ARG7$ -c $ARG8$
    }

    define service{
            use                             generic-service
            host_name                       dbserver
            service_description             Check Postgresql Unused Database
            check_command                   check_postgres_unused_db!5432!postgres!postgres!a!15!D!1!3
            contact_groups                  admins,dev-admins
            check_period                    non-workhours
            }

Monday, June 10, 2013

Nagios to Check Postgresql Least Vacuum Time

Nagios is one of the finest system-monitoring tools. We use it in our environment to monitor all of our systems: databases, application servers, services, and so on. check_postgres is one of the tools for monitoring different aspects of a Postgresql database service. It has a check named check_postgres_last_vacuum which works well for getting notified when the last vacuum date of a table in a database falls critically far behind.
But our requirement was a bit different. We run a manual vacuum each night during backup and needed a method that would notify us of the least recent vacuum date across the tables of every database on a server, without naming any specific database. That way, if the vacuum failed for one database for some reason, we would still be notified.
So we created a Nagios plugin named check_postgres_least_vacuum and it is now working fine.
The script is available at:
https://www.dropbox.com/sh/gw44f3c1tjm4k96/40gU8bgZsh/check_postgres_least_vacuum.sh
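The underlying idea is to look, in each database, at the vacuum timestamps in pg_stat_user_tables and report the oldest one. A rough sketch of that per-database query (not necessarily the plugin's exact SQL) is:

-- Oldest vacuum time across all user tables in the current database;
-- never-vacuumed tables are treated as 'epoch' so they surface as the oldest.
SELECT min(coalesce(greatest(last_vacuum, last_autovacuum), 'epoch'::timestamptz)) AS least_vacuum_time
FROM pg_stat_user_tables;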


Then we defined the command in command.cfg:
define command{
       command_name  check-postgres-least-vacuum
       command_line  $USER1$/check_postgres_least_vacuum -H $HOSTADDRESS$ -P $ARG1$ -U $ARG2$ -D $ARG3$ -x $ARG4$ -w $ARG5$ -c $ARG6$
       }

Then we defined the service in dbserver.cfg:

define service{
        use                             generic-service
        host_name                       DBSERVER1
        service_description             Check Postgresql DB Server Least Vacuum time
        check_command                   check-postgres-least-vacuum!5432!postgres!postgres!D!2!3
        contact_groups                  admins,dev-admins
        check_period                    non-workhours
        }






Wednesday, March 13, 2013

Insert records instead of delete to reclaim space of a large table

A few days back, I got a task to reclaim space on a database server by deleting a good number of records from a table. The server was running Postgresql 9.1.
Initially, the developers planned to delete the rows and execute a vacuum full on the database.

delete from usage where cacheid in (
select id from cache_tab
where distid in ('1','2','3')
AND createddatetime < '2011-05-01');

delete from cache_tab where distid in ('1','2','3') AND createddatetime < '2011-05-01';

Vacuum Full Analyze;

The table had 612613 records. The table description was:

search_engine2=# \d cache_tab
                                           Table "public.cache_tab"
         Column         |            Type             |                          Modifiers                          
------------------------+-----------------------------+--------------------------------------------------------------
 id                     | integer                     | not null default nextval('cache_tab_id_seq'::regclass)
 cachekey               | character varying(60)       | not null
 userid                 | bigint                      |
 col1                   | character varying(255)      |
 col2                   | bigint                      | not null
 query                  | character varying(5000)     | not null
 resultxml              | text                        | not null
 col4                   | character varying(50)       | not null
 createddatetime        | timestamp without time zone | not null default now()
 col6                   | character varying(255)      | not null
 distid                 | character varying(1000)     |
 col7                   | character varying(1000)     |
 col8                   | character varying(1000)     |
 col9                   | character varying(10)       |
 col10                  | timestamp without time zone | default now()
 col11                  | integer                     | not null default 0
 col12                  | bigint                      |
 col13                  | character varying(15)       |
 col14                  | timestamp without time zone |
 col15                  | boolean                     | default false
 col16                  | character varying(1000)     |
 col17                  | boolean                     | default false
Indexes:
    "cache_tab_pkey" PRIMARY KEY, btree (id)
    "cache_tab_ccol1_pkey" UNIQUE CONSTRAINT, btree (col1)
    "cache_tab_col2_index" btree (col2)
    "cache_tab_col3_index" btree (col3)
    "cache_tab_col4_index" btree (col4)
    "cache_tab_col5_index" btree (col5)
    "cache_tab_query_index" btree (query)
    "cache_tab_distid_index" btree (distid)

Referenced by:
    TABLE "usage" CONSTRAINT "usage_cacheid_fkey" FOREIGN KEY (cacheid) REFERENCES cache_tab(id)

The table size was 88 GB, while the whole database was 90 GB. Another table, usage, was small, but the tricky part was that it had a foreign key constraint referencing cache_tab.

 \d usage
                                          Table "public.usage"
        Column        |            Type             |                         Modifiers                        
----------------------+-----------------------------+-----------------------------------------------------------
 id                   | integer                     | not null default nextval('usage_id_seq'::regclass)
 hitid                | bigint                      | not null
 cacheid        | bigint                      | not null
 selectionref         | character varying(1000)     |
 createddatetime      | timestamp without time zone | not null default now()
 populateddatetime    | timestamp without time zone |
 populationstatuscode | character varying(20)       | default 'PENDING'::character varying
 lastmodifieddatetime | timestamp without time zone | default now()
Indexes:
    "usage_pkey" PRIMARY KEY, btree (id)
    "usage_hitid_index" btree (hitid)
    "usage_populationstatuscode_index" btree (populationstatuscode)
    "usage_cacheid_index" btree (cacheid)
Foreign-key constraints:
    "usage_hitid_fkey" FOREIGN KEY (hitid) REFERENCES sch_hit(id)
    "usage_cacheid_fkey" FOREIGN KEY (cacheid) REFERENCES cache_tab(id)

Since VACUUM FULL would lock the table and DELETE alone was not going to reclaim the space, I changed the plan to a different approach.

Instead of deleting records, I planned to insert the active records into a new table. My plan was:

1. Make a create table script for cache_tab using pg_dump
2. Disable the foreign key on usage
3. Rename existing cache_tab to cache_tab_old
4. Create cache_tab from the script
5. Identify indexes on cache_tab_old and drop those indexes
6. Create the missing indexes on cache_tab
7. Insert into cache_tab from cache_tab_old according to the filter, in small chunks
8. Delete data from usage that is no longer needed
9. Make a dump backup of cache_tab_old and drop the table to reclaim space
10. Enable the foreign key

Step 1: Make a create table script for cache_tab using pg_dump

 

pg_dump search_engine2 -t cache_tab -s > cache_tab.sql.schema

 

Step 2: Disable Foreign Key at usage


ALTER TABLE usage  drop constraint usage_cacheid_fkey;

Step 3: Rename existing cache_tab to cache_tab_old

ALTER TABLE cache_tab rename to cache_tab_old;

 

Step 4. Create cache_tab from the script


psql search_engine2 < cache_tab.sql.schema

This created the table without indexes. It gave errors that the indexes already existed; I was aware of these errors.
But I created the table at this point to keep the downtime minimal. At that moment the application was ready to go, and as the new table was still very small, running without indexes would not hurt performance.

Step 5: Identify indexes from cache_tab_old and drop those indexes


Created a drop index script:
SELECT 'DROP INDEX '|| c2.relname || ';' as index_name, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) as cmd
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
  LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND conindid = i.indexrelid AND contype IN ('p','u','x'))
WHERE c.relname = 'cache_tab_old' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
ORDER BY i.indisprimary DESC, i.indisunique DESC, c2.relname;

Executed the generated DROP INDEX commands and kept the cmd column output for the next step.

Step 6. Create missing indexes


Changed cache_tab_old to cache_tab in the CREATE INDEX commands and executed them to create the indexes.

Step 7. Insert into cache_tab from cache_tab_old according to filter in small chunk


At this point, I started to transfer data from cache_tab_old to cache_tab. I modified the WHERE condition given by the developers and also made a function to do this in small chunks (a sketch of that chunked approach follows the list of time windows below).
insert into cache_tab
select * from cache_tab_old
where (createddatetime >= :START_TIME and  createddatetime < :END_TIME )
and distid not in ('1','2','3') 

I used time frames of two months, like:
2011-05-01  - 2011-07-01
2011-07-01  - 2011-09-01
...
...
2013-03-01 - 2013-05-01
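A sketch (not the exact function I used) of how those windows can be driven from a single anonymous block, using the same filter as above:

DO $$
DECLARE
    v_start timestamp := '2011-05-01';
    v_end   timestamp;
BEGIN
    WHILE v_start < '2013-05-01' LOOP
        v_end := v_start + interval '2 months';
        INSERT INTO cache_tab
        SELECT *
        FROM cache_tab_old
        WHERE createddatetime >= v_start
          AND createddatetime <  v_end
          AND distid NOT IN ('1','2','3');
        RAISE NOTICE 'copied window % to %', v_start, v_end;
        v_start := v_end;
    END LOOP;
END;
$$;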

Before executing the insert statement, I created an index on cache_tab_old on (createddatetime,distid).
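Something along these lines (the index name is illustrative):

CREATE INDEX cache_tab_old_created_distid_idx
    ON cache_tab_old (createddatetime, distid);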

It took almost an hour to insert all 444279 records into the table.

Step 8. Delete data from usage that is no longer needed


I created a backup copy of usage as usage_bak and removed the rows from usage that referenced the discarded cache records.
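That step amounts to something like the following (a sketch, not the exact commands; the backup table name is taken from the text above):

CREATE TABLE usage_bak AS SELECT * FROM usage;

-- Drop the usage rows whose cache entries were not carried over to the new cache_tab
DELETE FROM usage u
WHERE NOT EXISTS (SELECT 1 FROM cache_tab c WHERE c.id = u.cacheid);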

Step 9. Make a dump backup of cache_tab_old and drop the table to reclaim space


Before creating a dump backup of cache_tab_old for archiving, I wanted to archive only the records that were not carried over to cache_tab.
Again, instead of deleting the carried-over rows, I created another table holding just the discarded ones.
CREATE TABLE cache_tab_deleted_data as
select * from cache_tab_old where distid in ('1','2','3') AND createddatetime < '2011-05-01';
DROP TABLE cache_tab_old;

pg_dump search_engine2 -t cache_tab_deleted_data -Fc > cache_tab_deleted_data.sql.mar122013
pg_dump search_engine2 -t usage_bak -Fc > usage_bak.sql.mar122013

Step 10. Enable Foreign Key


Re-created the foreign key that I dropped in step 2.
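Re-creating it with the same definition it had before (taken from the \d output above):

ALTER TABLE usage
    ADD CONSTRAINT usage_cacheid_fkey
    FOREIGN KEY (cacheid) REFERENCES cache_tab(id);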

Now the cache_tab table was 26 GB, down from its earlier 88 GB.
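The sizes can be verified with the built-in size functions, for example:

SELECT pg_size_pretty(pg_total_relation_size('cache_tab')) AS table_size,
       pg_size_pretty(pg_database_size(current_database())) AS db_size;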

In this way, I reclaimed 62 GB of space without any downtime or locking.








Thursday, March 7, 2013

Postgresql dynamic crosstab function

Postgresql 9.0 has a contrib module named tablefunc which provides crosstab functionality.
http://www.postgresql.org/docs/9.1/static/tablefunc.html
It has two types of crosstab functions :-
  • crosstab(text sql)
  • crosstab(text source_sql, text category_sql)
We generally use the second one where we provide source and category sql.
source_sql is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column
category_sql is a SQL statement that produces the set of categories. This statement must return only one column. It must produce at least one row, or an error will be generated. Also, it must not produce duplicate values, or an error will be generated.
The biggest problem with this crosstab function is that you have to write out every returned column in the SQL query that generates the report.
 
SELECT * FROM crosstab('...', '...')
    AS ct(row_name text, cat1 text, cat2 text, cat3 text, cat4 text);
One must know in advance how many category columns will be returned (cat1 text, cat2 text, cat3 text, cat4 text). So it did not fit our goal of generating the report dynamically.
To overcome this, I created a function called crosstab_dynamic_sql_gen_fn. It takes the following parameters as input:
  • source_sql text - described above
  • category_sql text - described above
  • v_matrix_col_type varchar(100)  - data type for category values
  • v_matrix_rows_name_and_type varchar(100) - row name with data type like 'username text'
  • debug bool default false - to get debug output
And it returns the generated sql.
 
DROP FUNCTION crosstab_dynamic_sql_gen_fn (source_sql text, category_sql text, v_matrix_col_type varchar(100), v_matrix_rows_name_and_type varchar(100), debug bool);
CREATE OR REPLACE FUNCTION crosstab_dynamic_sql_gen_fn (source_sql text, category_sql text, v_matrix_col_type varchar(100), v_matrix_rows_name_and_type varchar(100), debug bool default false)
RETURNS text AS $$
DECLARE
  v_sql text;
  curs1 refcursor;
  v_val text;
BEGIN
  v_sql := v_matrix_rows_name_and_type;
  OPEN curs1 FOR EXECUTE category_sql;
  LOOP
    FETCH curs1 INTO v_val;
    EXIT WHEN v_val IS NULL;
    v_sql := v_sql || ' , "' || v_val || '" ' || v_matrix_col_type;
    IF debug THEN
      RAISE NOTICE 'v_val = %', v_val;
    END IF;
  END LOOP;
  CLOSE curs1;
  v_sql := 'SELECT * from crosstab(' || chr(10) || E' \''||source_sql || E'\','||chr(10) || E' \''||category_sql || E'\'' || chr(10)|| ' ) AS (' || v_sql ||')';
  IF debug THEN
    RAISE NOTICE 'v_sql = %', v_sql;
  END IF;
  RETURN v_sql;
END;
$$ LANGUAGE plpgsql;
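The example below runs against a simple sales table. A hypothetical definition matching the columns used in the call (the real table can have any shape that yields one row-name column, one category column and one value column) would be:

CREATE TABLE sales (
    year  text,
    month int,
    qty   int
);
INSERT INTO sales VALUES ('2012', 1, 10), ('2012', 5, 7), ('2013', 2, 4);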

Now, when I use the function, I get the SQL to generate the crosstab report:
select crosstab_dynamic_sql_gen_fn('select year, month, qty from sales order by 1','select distinct month from sales','int','year text');
crosstab_dynamic_sql_gen_fn
---------------------------------------------------------------------------------
SELECT * from crosstab( +
'select year, month, qty from sales order by 1', +
'select distinct month from sales' +
) AS (year text , "1" int , "5" int , "11" int , "12" int , "2" int , "7" int)
(1 row)
As long as you have the generated SQL, you can do whatever you like with it: create a function, a view, and so on.
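For example, the generated SQL can be fed straight into EXECUTE to materialise it as a view (the view name here is just illustrative, and the tablefunc module must be installed for crosstab to exist):

DO $$
BEGIN
    EXECUTE 'CREATE OR REPLACE VIEW sales_matrix_v AS '
         || crosstab_dynamic_sql_gen_fn(
                'select year, month, qty from sales order by 1',
                'select distinct month from sales',
                'int',
                'year text');
END;
$$;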
 
Up to this point, we can use this function for all of our dynamic crosstab operations. The next step is to use it for our specific purpose.
For this, I created another function, populate_matrix_rep_fn, which takes two parameters:
  • v_pkid int  - subsetid
  • v_outfile varchar(100) - output csv file path
It creates the csv file in the given path.
DROP FUNCTION populate_matrix_rep_fn(v_pkid int, v_outfile varchar(100));
CREATE OR REPLACE FUNCTION populate_matrix_rep_fn(v_pkid int, v_outfile varchar(100))
RETURNS void AS $$
DECLARE
  v_source_sql text;
  v_category_sql text;
  v_sql text;
BEGIN
  v_source_sql := 'SELECT rowname, ii.category, value '||
                  'FROM tab1 r, tab2 ii where ii.id=r.itemid '||
                  'and pkid=' || v_pkid;

  v_category_sql := 'SELECT distinct ii.category '||
                    'FROM tab1 r, tab2 ii where ii.id=r.itemid '||
                    'and pkid= ' || v_pkid || ' order by 1 ';

  -- "user" is a reserved word, so the row-name column has to be double-quoted in the column definition
  v_sql := crosstab_dynamic_sql_gen_fn(v_source_sql, v_category_sql, 'text', '"user" text', false);
  v_sql := E'COPY ( ' || v_sql || E' ) TO \'' || v_outfile || E'\' (FORMAT \'csv\', NULL \'0\' , HEADER)';
  --RAISE NOTICE 'v_sql = %',v_sql;
  EXECUTE v_sql;
END;
$$ LANGUAGE plpgsql
SET work_mem = 1048576;

Now we can simply call the function to generate crosstab reports for different subsetids.
 
select populate_matrix_rep_fn(522219,'/tmp/rumman/out.csv');




Friday, February 8, 2013

Postgresql type casting information using pg_cast

Query to get Postgresql type casting information:

SELECT ct.*,
source_t.typname as source_type
,target_t.typname as target_type
, proc.proname
FROM
pg_cast as ct
, pg_type as source_t
, pg_type as target_t
,pg_proc as proc
WHERE
ct.castsource = source_t.oid
and ct.casttarget = target_t.oid
and  ct.castfunc =  proc.oid


  source_type  |  target_type  |   proname   | castcontext | castmethod
---------------+---------------+-------------+-------------+------------
 int8          | int2          | int2        | a           | f
 int8          | int4          | int4        | a           | f
 int8          | float4        | float4      | i           | f
 int8          | float8        | float8      | i           | f
 int8          | numeric       | numeric     | i           | f
 int2          | int8          | int8        | i           | f
 int2          | int4          | int4        | i           | f
 int2          | float4        | float4      | i           | f
 int2          | float8        | float8      | i           | f
 int2          | numeric       | numeric     | i           | f
 int4          | int8          | int8        | i           | f

...
(143 rows)

The details of pg_cast are available at
http://www.postgresql.org/docs/9.2/static/catalog-pg-cast.html
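Note that the inner join on ct.castfunc = proc.oid silently hides casts that need no function at all (pg_cast stores castfunc = 0 for binary-coercible casts). A variant with a LEFT JOIN that also lists those would be roughly:

SELECT source_t.typname AS source_type,
       target_t.typname AS target_type,
       proc.proname     AS cast_function,   -- NULL for binary-coercible casts
       ct.castcontext,
       ct.castmethod
FROM pg_cast AS ct
JOIN pg_type AS source_t ON ct.castsource = source_t.oid
JOIN pg_type AS target_t ON ct.casttarget = target_t.oid
LEFT JOIN pg_proc AS proc ON ct.castfunc = proc.oid;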

