Thursday, December 27, 2012

Python Virtual Environment Setup


I had to install Python 2.7 for an application without changing the default Python 2.4 configuration. That's why I chose to use virtualenv.
virtualenv is a virtual Python environment builder. pypi.python.org/pypi/virtualenv has detailed information about it.

I configured my environment using the following steps:

Download virtualenv


wget http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.8.4.tar.gz

Extract it

tar -zxvf virtualenv-1.8.4.tar.gz

Create virtual environment

I kept my default Python 2.4 intact, installed Python 2.7 from source in /extra/python2.7, and used that interpreter to create the virtual environment.

cd /root/virtualenv-1.8.4
/extra/python2.7/bin/python virtualenv.py /extra/app_python
cd /extra/app_python
ls -l
total 12
drwxr-xr-x 2 root root 4096 Dec 27 15:22 bin
drwxr-xr-x 2 root root 4096 Dec 27 15:22 include
drwxr-xr-x 3 root root 4096 Dec 27 15:22 lib

 

Activate this environment

[root@rumman app_python]# source bin/activate
(app_python)[root@rumman app_python]#

Now when I run python, it uses Python 2.7:
(app_python)[root@rumman app_python]# python
Python 2.7.3 (default, Dec 27 2012, 12:28:28)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

 

Return to normal mode

(app_python)[root@rumman app_python]# deactivate
[root@rumman erp_python]#
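To confirm from inside Python which interpreter is actually running (useful right after activating or deactivating the environment), a small check like this works in any Python version:

```python
import sys

# sys.executable is the absolute path of the running interpreter;
# inside an activated virtualenv it points into the environment's
# bin (or Scripts) directory, so it is a quick sanity check.
print(sys.executable)
print(sys.version_info[:3])
```

If the printed path is under /extra/app_python, the virtual environment's interpreter is the one in use.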
 

Wednesday, December 26, 2012

Rank Functions

RANK(), DENSE_RANK() and ROW_NUMBER() are three important reporting functions in PostgreSQL. They are also available in Oracle.

RANK
Returns a unique serial number for each row, except for rows with duplicate values, which receive the same ranking; a gap then appears in the sequence after each set of duplicates.

DENSE_RANK
Returns a unique serial number for each row, except for rows with duplicate values, which receive the same ranking; no gaps appear in the sequence.

ROW_NUMBER
Returns a unique serial number for every row. Rows with duplicate values receive arbitrary (not deterministic) numbers.

Suppose we have a table as follows:

SELECT * FROM t1;

 id | points
----+--------
  1 | 19
  2 | 86
  3 | 24
  4 |  5
  5 | 56
  6 | 48
  7 | 55
  8 | 74
  9 | 69
 10 | 34
 11 | 96
 12 | 32
 13 | 35
 14 | 92
 15 | 18
 16 | 85
 17 | 25
 18 | 88
 19 | 59
 20 | 78
(20 rows)

There is no duplicate value in the "points" column, so the following query returns the same result for rank(), dense_rank() and row_number().

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

 id | points | rnk | drank | rnum
----+--------+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  8 | 74 |   7 |     7 |    7
  9 | 69 |   8 |     8 |    8
 19 | 59 |   9 |     9 |    9
  5 | 56 |  10 |    10 |   10
  7 | 55 |  11 |    11 |   11
  6 | 48 |  12 |    12 |   12
 13 | 35 |  13 |    13 |   13
 10 | 34 |  14 |    14 |   14
 12 | 32 |  15 |    15 |   15
 17 | 25 |  16 |    16 |   16
  3 | 24 |  17 |    17 |   17
  1 | 19 |  18 |    18 |   18
 15 | 18 |  19 |    19 |   19
  4 |  5 |  20 |    20 |   20

Now, delete a row from the table and execute the query again to see whether the result changes:

BEGIN;
DELETE from t1 where id=8;

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

 id | points | rnk | drank | rnum
----+--------+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  9 | 69 |   7 |     7 |    7
 19 | 59 |   8 |     8 |    8
  5 | 56 |   9 |     9 |    9
  7 | 55 |  10 |    10 |   10
  6 | 48 |  11 |    11 |   11
 13 | 35 |  12 |    12 |   12
 10 | 34 |  13 |    13 |   13
 12 | 32 |  14 |    14 |   14
 17 | 25 |  15 |    15 |   15
  3 | 24 |  16 |    16 |   16
  1 | 19 |  17 |    17 |   17
 15 | 18 |  18 |    18 |   18
  4 |  5 |  19 |    19 |   19
(19 rows)

ROLLBACK;

Again, there is no difference among the results of rank(), dense_rank() and row_number().

Now, I inserted a new record into the table to create a duplicate value in the "points" column.

INSERT INTO t1 VALUES (21, 78);
Now we have two records with 78 points. Running the same query gives this result:

 id | points | rnk | drank | rnum
----+--------+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 21 | 78 |   6 |     6 |    6
 20 | 78 |   6 |     6 |    7
  8 | 74 |   8 |     7 |    8
  9 | 69 |   9 |     8 |    9
 19 | 59 |  10 |     9 |   10
  5 | 56 |  11 |    10 |   11
  7 | 55 |  12 |    11 |   12
  6 | 48 |  13 |    12 |   13
 13 | 35 |  14 |    13 |   14
 10 | 34 |  15 |    14 |   15
 12 | 32 |  16 |    15 |   16
 17 | 25 |  17 |    16 |   17
  3 | 24 |  18 |    17 |   18
  1 | 19 |  19 |    18 |   19
 15 | 18 |  20 |    19 |   20
  4 |  5 |  21 |    20 |   21
(21 rows)

The difference is now obvious.
For the rank() function in the "rnk" column, both ID 21 and ID 20 get rank 6, the next rank is 8, and the last rank is 21; rank 7 is skipped because rank 6 is shared by two records.
For the dense_rank() function in the "drank" column, both ID 21 and ID 20 get rank 6, the next rank is 7, and the last rank is 20; no gap is created.
For the row_number() function in the "rnum" column, the two records with the same "points" get different numbers, since it simply numbers the rows instead of ranking them.
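The behaviour of the three functions can be sketched in plain Python. This is only an illustration of the semantics, not how PostgreSQL implements them:

```python
def rank_functions(points):
    """Return (value, rank, dense_rank, row_number) for each value,
    ordered descending - mimicking the SQL window functions
    RANK(), DENSE_RANK() and ROW_NUMBER() with ORDER BY ... DESC."""
    result = []
    rank = dense = 0
    prev = object()  # sentinel that compares unequal to every real value
    for row_number, value in enumerate(sorted(points, reverse=True), 1):
        if value != prev:
            rank = row_number  # RANK reuses the row number, so ties leave a gap
            dense += 1         # DENSE_RANK just increments, so no gaps
            prev = value
        result.append((value, rank, dense, row_number))
    return result

# Two rows share 78 points: RANK gives 2, 2 and then jumps to 4;
# DENSE_RANK gives 2, 2 and then 3; ROW_NUMBER numbers every row.
print(rank_functions([96, 78, 78, 74]))
```

Note that, just as in SQL, the order of the tied rows themselves is arbitrary; only the assigned numbers follow the rules above.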

Tuesday, December 18, 2012

Remove log files using Python in Windows Server


I was working on Windows Server 2003, where one of our application services created a large log file every day; when a log file grew past 2 GB, the application rotated it to a new file. I had to create a script to remove the old, unnecessary log files.

I created the following script with Python 2.7 and added it as a scheduled task. It has been working fine since.

#!/usr/bin/python -tt

import os

def main():
    filelist = []
    path = r'C:\test\log'
    if os.path.exists(path):
        os.chdir(path)
        for f in os.listdir(os.getcwd()):
            if f.endswith('.log'):  # match only log files by extension
                lastmod_date = os.path.getmtime(f)
                filelist.append((lastmod_date, f))
    if len(filelist) > 1:  # remove only when there is more than one log file
        filelist.sort()    # oldest first, by last-modified time
        filelist.pop()     # keep the newest file - it is the current log
        for lastmod_date, fname in filelist:
            os.remove(os.path.join(path, fname))

if __name__ == '__main__':
    main()
 
Friday, December 14, 2012

pg_restore error with invalid input syntax for integer: "U" and out of memory

I was trying to restore a 6 GB database on my development machine, which runs CentOS 5.6 with 1 GB of RAM.
The postgresql.conf was like :

shared_buffers = 128MB
maintenance_work_mem = 300MB
checkpoint_segments = 10  # as the disk space is limited
fsync=off
autocommit=off

During restoration I got errors as follows:

LOG:  checkpoints are occurring too frequently (22 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
pg_restore: out of memory
pg_restore: finished item 8570 TABLE DATA entity
pg_restore: [archiver] worker process failed: exit code 1
ERROR:  invalid input syntax for integer: "U"
CONTEXT:  COPY entity, line 2120568, column version: "U"
STATEMENT:  COPY entity (crmid, smcreatorid, smownerid, modifiedby, setype, description, createdtime, modifiedtime, viewedtime, status, version, presence, deleted, owner_type) FROM stdin;

LOG:  could not send data to client: Broken pipe
STATEMENT:  COPY entity (crmid, smcreatorid, smownerid, modifiedby, setype, description, createdtime, modifiedtime, viewedtime, status, version, presence, deleted, owner_type) FROM stdin;

During error my OS status:
free -m
             total       used       free     shared    buffers     cached
Mem:          1024        975         48          0          3        857
-/+ buffers/cache:        114        909
Swap:         1027          0       1027


I posted in forums, discussed with experts, and lowered shared_buffers and maintenance_work_mem. But I still got the error.

Then I discarded this backup copy, made a new one, and used it instead. This time I was able to restore the database successfully.

Then I came to know that this development server had had an I/O error a few days earlier during a restore from this same backup, and I think that damaged the backup copy.
