Friday, June 14, 2013

Nagios Script to find unused databases

I created to nagios script to find out unused databases in our data centers. It gives us an alert like :

    ***** Nagios *****

    CRITICAL: 4 unused database(s) found at that were last accessed 7 days ago :-

      1) Database db1 last accessed on the 2013-05-22 04:02:08 that is 22 days ago

      2) Database db2 last accessed on the 2013-05-29 04:02:08 that is 15 days ago

      3) Database db3 last accessed on the 2013-05-17 04:02:08 that is 27 days ago

      4) Database db4 last accessed on the 2013-06-05 04:02:07 that is 8 days ago

    Notification Type: PROBLEM

    Service: Check Postgresql Unused Database

    State: CRITICAL

    Alias: dbserver1

    Host: dbserver1


    Date/Time: Thu Jun 13 23:00:09 EDT 2013

The script is available at It takes some parameters :-

    Check Postgresql Unused Database

    Description: Create alert base on the number of databases older than given mode,time and units for a given Postgrsql database server


    # This script expects psql to be in the PATH.

    Usage: ./check_postgres_least_vacuum [-h] [-v][ -H <host> ] [ -P <port> ] [ -U user ] [ -D dbname] [ -m <mode of time> ] [ -t <time value> ] [ -x <units> ] [-w <warn count>] [-c <critical count>]

    -h --help help

    -v --verbose verbose or debug mode

    -H --host host (default

    -P --port port (default 5432)

    -U --user database user (default postgres)

    -S --password database user password

    -D --dbname dbname to connect with postgresql (default postgres)

    -m --mode mode of time to check with current time to mark as unused (c = create, a = access, m = modified; default : a)

    -t --time number of times after that database should be marked as unused

    -x --units units of measurement to compare (s = seconds; m = minutes; h = hours; D = days; M = months; Y = year)

    -w --warning warning threshold; number of databases older than given mode,time and units (default 1 )

    -c --critical critical threshold;number of databases older than given mode,time and units (default 3 )

Based on the db connection parameters, it connects to the Postgresql server and creates a function "check_pg_unused_db_f" and and after completion it removes the function.

The function uses the plpythonu language. So we must have Postgresql plpythonu language available in the dbserver.

    CREATE OR REPLACE FUNCTION check_pg_unused_db_f(stat_type varchar, dbid int)
      RETURNS text As \$BODY\$
       import sys
       import os
       import datetime
       afilename = 'base/' + str(dbid)
       (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(afilename)
       if stat_type == 'atime':
         return datetime.datetime.fromtimestamp(atime)
       elif stat_type == 'mtime':
         return datetime.datetime.fromtimestamp(mtime) 
       elif stat_type == 'ctime':
         return datetime.datetime.fromtimestamp(ctime)
         return 'UnknownVariable'   
      \$BODY\$ Language plpythonu;

The function takes stat_type ( a = access, m = modified , c = created) and find out that specific time for the given dbid. The query gives the desired result:

    SELECT OID, datname, replace(check_pg_unused_db_f('"$TIME_MODE"', OID::int ),' ','_') as check_time FROM pg_database WHERE datallowconn and NOT datistemplate and datname NOT  IN ('postgres')

Each database's check_time compares with the following three parameters :-

    -m --mode mode of time to check with current time to mark as unused (c = create, a = access, m = modified; default : a)

    -t --time number of times after that database should be marked as unused

    -x --units units of measurement to compare (s = seconds; m = minutes; h = hours; D = days; M = months; Y = year)

For example, if we want to find those databases that have not been accessed for last 15 days and if there exists any in the server we should get an alert, we may configure the script like :- -H -P 5432 -D postgres -S password -m a -t 15 -x D -w 1 -c 3

After collecting all the data, check_postgres_unused_db script compare the total number of databases found and the number given as warning and critical threshold and provides the required alert.

Sample Nagios Configuration:

    define command{

         command_name check_postgres_unused_db

          command_line $USER1$/   -H $HOSTADDRESS$ -P $ARG1$ -U $ARG2$ -D $ARG3$ -m $ARG4$ -t $ARG5$ -x $ARG6$ -w $ARG7$ -c $ARG8$


    define service{

            use                                    generic-service

            host_name                         dbserver

            service_description            Check Postgresql Unused Database

            check_command               check_postgres_unused_db!5432!postgres!postgres!a!15!D!1!3

            contact_groups                  admins,dev-admins

            check_period                   non-workhours


Continue reading →
Monday, June 10, 2013

Nagios to Check Postgresql Least Vacuum Time

Nagios is one of the finest tools to monitor system. We use it in our environment to monitor all our systems like databases, application servers, services etc. check_postgres is one of the tools to monitor different aspect of Postgresql database service. It has a function named check_postrges_last_vacuum which works well to get notified if there is any critical distance between last vacuum date of a table of a database.
But our requirement was a bit different. We configured manual vacuum each night during backup and need a method which would notify us the least vacuum date for a table in any database in a server where we were not able to give any specific database name. The reason was such that if for some reason vacuum failed for one database we should be notified.
So we created a nagios plugin named check_postgres_least_vacuum and now its working fine.
The script is available at:

Then we defined  command in command.cfg:
define command{
       command_name  check-postgres-least-vacuum
       command_line  $USER1$/check_postgres_least_vacuum -H $HOSTADDRESS$ -P $ARG1$ -U $ARG2$ -D $ARG3$ -x $ARG4$ -w $ARG5$ -c $ARG6$

Then we defined service in dbserver.cfg:

define service{
        use                             generic-service
        host_name                       DBSERVER1
        service_description             Check Postgresql DB Server Least Vacuum time
        check_command                   check-postgres-least-vacuum!5432!postgres!postgres!D!2!3
        contact_groups                  admins,dev-admins
        check_period                    non-workhours

Continue reading →