Thursday, December 27, 2012

Python Virtual Environment Setup

0 comments

I had to install Python 2.7 for an application without changing the default Python 2.4 configuration. That's why I chose to use virtualenv.
virtualenv is a virtual Python environment builder. pypi.python.org/pypi/virtualenv has detailed information about it.

I configured my environment using the following steps:-

Download virtualenv


wget http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.8.4.tar.gz

Extract  it

tar -zxvf virtualenv-1.8.4.tar.gz

Create virtual environment

I kept my default Python 2.4 intact, installed Python 2.7 from source in /extra/python2.7, and used that interpreter to create the virtual environment.

cd /root/virtualenv-1.8.4
/extra/python2.7/bin/python  virtualenv.py /extra/app_python
cd /extra/app_python
ls -l
total 12
drwxr-xr-x 2 root root 4096 Dec 27 15:22 bin
drwxr-xr-x 2 root root 4096 Dec 27 15:22 include
drwxr-xr-x 3 root root 4096 Dec 27 15:22 lib

 

Activate this environment

[root@rumman app_python]# source bin/activate
(app_python)[root@rumman app_python]#

Now when I run python, it uses Python 2.7.
(app_python)[root@rumman app_python]# python
Python 2.7.3 (default, Dec 27 2012, 12:28:28)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
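
A quick way to confirm that the activated environment is really the one in use is to ask the interpreter itself. This is just a sanity check, not part of the setup:

import sys
# both of these should point inside the virtual environment, i.e. /extra/app_python
print sys.executable   # the python binary actually running
print sys.prefix       # the root of the active environment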

 

Return to normal mode

(app_python)[root@rumman app_python]# deactivate
[root@rumman app_python]#
 

Continue reading →
Wednesday, December 26, 2012

Rank Functions

0 comments
RANK(), DENSE_RANK() and ROW_NUMBER() are three important reporting (window) functions in Postgresql. They are also available in Oracle.

RANK
Returns a unique serial number for each row, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

DENSE_RANK
Returns a unique serial number for each row, except for rows that have duplicate values, in which case the same ranking is assigned and no gap appears in the sequence.

ROW_NUMBER
Returns a unique serial number for every row. For rows that have duplicate values, the numbers are assigned arbitrarily.

Suppose we have a table as follows:-

SELECT * FROM t1;

 id  | points 
----+----
  1 | 19
  2 | 86
  3 | 24
  4 |  5
  5 | 56
  6 | 48
  7 | 55
  8 | 74
  9 | 69
 10 | 34
 11 | 96
 12 | 32
 13 | 35
 14 | 92
 15 | 18
 16 | 85
 17 | 25
 18 | 88
 19 | 59
 20 | 78
(20 rows)

There are no duplicate values in the "points" column,
so the following query returns the same result for rank(), dense_rank() and row_number().

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

  id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  8 | 74 |   7 |     7 |    7
  9 | 69 |   8 |     8 |    8
 19 | 59 |   9 |     9 |    9
  5 | 56 |  10 |    10 |   10
  7 | 55 |  11 |    11 |   11
  6 | 48 |  12 |    12 |   12
 13 | 35 |  13 |    13 |   13
 10 | 34 |  14 |    14 |   14
 12 | 32 |  15 |    15 |   15
 17 | 25 |  16 |    16 |   16
  3 | 24 |  17 |    17 |   17
  1 | 19 |  18 |    18 |   18
 15 | 18 |  19 |    19 |   19
  4 |  5 |  20 |    20 |   20

Now, let us remove a row from the table and execute the query again to see whether the result changes:

BEGIN;
DELETE from t1 where id=8;

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

 id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  9 | 69 |   7 |     7 |    7
 19 | 59 |   8 |     8 |    8
  5 | 56 |   9 |     9 |    9
  7 | 55 |  10 |    10 |   10
  6 | 48 |  11 |    11 |   11
 13 | 35 |  12 |    12 |   12
 10 | 34 |  13 |    13 |   13
 12 | 32 |  14 |    14 |   14
 17 | 25 |  15 |    15 |   15
  3 | 24 |  16 |    16 |   16
  1 | 19 |  17 |    17 |   17
 15 | 18 |  18 |    18 |   18
  4 |  5 |  19 |    19 |   19
(19 rows)

ROLLBACK;

Again, there is no difference among the results of rank(), dense_rank() and row_number().

Now, I inserted a new record into the table to create a duplicate value in the "points" column.

INSERT INTO t1 VALUES (21, 78);
Now we have two records with 78 points. Running the query again gives:-

 id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 21 | 78 |   6 |     6 |    6
 20 | 78 |   6 |     6 |    7
  8 | 74 |   8 |     7 |    8
  9 | 69 |   9 |     8 |    9
 19 | 59 |  10 |     9 |   10
  5 | 56 |  11 |    10 |   11
  7 | 55 |  12 |    11 |   12
  6 | 48 |  13 |    12 |   13
 13 | 35 |  14 |    13 |   14
 10 | 34 |  15 |    14 |   15
 12 | 32 |  16 |    15 |   16
 17 | 25 |  17 |    16 |   17
  3 | 24 |  18 |    17 |   18
  1 | 19 |  19 |    18 |   19
 15 | 18 |  20 |    19 |   20
  4 |  5 |  21 |    20 |   21
(21 rows)

The difference is obvious.
For the rank() function in the "rnk" column, we get rank 6 for both IDs 21 and 20, the next rank is 8, and the last rank is 21. Rank 7 is skipped because rank 6 covers two records.
For the dense_rank() function in the "drank" column, we get rank 6 for both IDs 21 and 20, the next rank is 7, and the last rank is 20. No gap is created.
For the row_number() function in the "rnum" column, we get two different values for the same "points", as it simply numbers the rows instead of ranking them.
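
To make the three numbering rules concrete outside the database, here is a small pure-Python sketch; the values are a shortened, hypothetical subset of the table above:

points = [96, 92, 88, 86, 78, 78, 74]   # already sorted descending

prev = None
rank = dense_rank = 0
for row_number, p in enumerate(points, start=1):
    if p != prev:
        rank = row_number   # RANK jumps to the current row number, leaving a gap after ties
        dense_rank += 1     # DENSE_RANK just moves to the next integer, no gaps
        prev = p
    print p, rank, dense_rank, row_number

For the two rows with 78 points this prints rank 5 twice and dense_rank 5 twice with row_numbers 5 and 6, and the next value (74) gets rank 7 but dense_rank 6 -- the same pattern as the rnk and drank columns above.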

Continue reading →
Tuesday, December 18, 2012

Remove log files using Python in Windows Server

0 comments

I was working on Windows Server 2003, where one of our application services was creating large log files every day; when a log file grew beyond 2 GB, the application rotated it to a new log file. I had to create a script to remove the old, unnecessary log files.

I created a script using the Python 2.7 environment and added it to a scheduled task. It has been working fine since then.

#!/usr/bin/python -tt

import sys
import os

def main():
  filelist = []
  path = r'C:\test\log'
  if os.path.exists(path) == True:
    os.chdir(path)
    for f in os.listdir(os.getcwd()):
       if f.endswith('log'):
         lastmod_date = os.path.getmtime(f)
         f_tuple = lastmod_date , f
         filelist.append(f_tuple)
  if len(filelist) > 1: # only remove old files when there is more than one log file
    filelist.sort() # sort according to the lastmod_date
    filelist.pop() # remove the last item from the list as it is the current one
    for i in range(0,len(filelist)):
      os.remove(path + '\\'+ filelist[i][1])
  return
 
if __name__ == '__main__':
 main()
 
Continue reading →
Friday, December 14, 2012

pg_restore error with invalid input syntax for integer: "U" and out of memory

0 comments
I was trying to restore a 6 GB database on my development machine, which runs CentOS 5.6 with 1 GB of memory.
The postgresql.conf was set like this:

shared_buffers = 128MB
maintenance_work_mem = 300MB
checkpoint_segments = 10  # as the disk space is limited
fsync=off
autocommit=off

During restoration I got errors as follows:

LOG:  checkpoints are occurring too frequently (22 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
pg_restore: out of memory
pg_restore: finished item 8570 TABLE DATA entity
pg_restore: [archiver] worker process failed: exit code 1
ERROR:  invalid input syntax for integer: "U"
CONTEXT:  COPY entity, line 2120568, column version: "U"
STATEMENT:  COPY entity (crmid, smcreatorid, smownerid, modifiedby, setype, description, createdtime, modifiedtime, viewedtime, status, version, presence, deleted, owner_type) FROM stdin;

LOG:  could not send data to client: Broken pipe
STATEMENT:  COPY entity (crmid, smcreatorid, smownerid, modifiedby, setype, description, createdtime, modifiedtime, viewedtime, status, version, presence, deleted, owner_type) FROM stdin;

At the time of the error, the OS memory status was:
free -m
             total       used       free     shared    buffers     cached
Mem:          1024        975         48          0          3        857
-/+ buffers/cache:        114        909
Swap:         1027          0       1027


I posted in forums, discussed with experts and lowered shared_buffers and maintenance_work_mem, but I was still getting the error.

Then I discarded this backup copy, made a new one and used that instead. This time I was able to restore the database successfully.

Later I came to know that this development server had had an I/O error a few days earlier during a restoration from this backup, and I believe that is what left the backup copy damaged.
Continue reading →
Monday, November 12, 2012

Collect Mysql table statistics for millions of tables in a Wordpress site

0 comments
I was administering the database for a portal with millions of blogs. The site was built on Wordpress, so we had a Mysql database with the Wordpress HyperDB setup. As each blog consists of 9 separate tables, we had a huge number of tables in the database. This huge number of tables created great problems in database maintenance; for example, we could not run a single query against INFORMATION_SCHEMA to get table status. However, we found that we could use SHOW TABLE STATUS in each database. As we had almost 150 databases, it was not practical to run SHOW TABLE STATUS in each one by hand and collect the data, so we planned to create a script for the task.
The script would traverse all the databases, run SHOW TABLE STATUS in each, and store the results in a table named table_stats.table_stat.
Here we created a new schema, table_stats, to store all the statistics-related information.
CREATE TABLE `tmp_table_stat` (
`TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
`ENGINE` varchar(64) DEFAULT NULL,
`VERSION` bigint(21) unsigned DEFAULT NULL,
`ROW_FORMAT` varchar(10) DEFAULT NULL,
`TABLE_ROWS` bigint(21) unsigned DEFAULT NULL,
`AVG_ROW_LENGTH` bigint(21) unsigned DEFAULT NULL,
`DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,
`MAX_DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,
`INDEX_LENGTH` bigint(21) unsigned DEFAULT NULL,
`DATA_FREE` bigint(21) unsigned DEFAULT NULL,
`AUTO_INCREMENT` bigint(21) unsigned DEFAULT NULL,
`CREATE_TIME` datetime DEFAULT NULL,
`UPDATE_TIME` datetime DEFAULT NULL,
`CHECK_TIME` datetime DEFAULT NULL,
`TABLE_COLLATION` varchar(32) DEFAULT NULL,
`CHECKSUM` bigint(21) unsigned DEFAULT NULL,
`CREATE_OPTIONS` varchar(255) DEFAULT NULL,
`TABLE_COMMENT` varchar(80) NOT NULL DEFAULT ''
);


CREATE TABLE table_stats.`table_stat` (
`TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
`STAT_COLLECTION_TIME` datetime DEFAULT NULL,
`TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
`ENGINE` varchar(64) DEFAULT NULL,
`VERSION` bigint(21) unsigned DEFAULT NULL,
`ROW_FORMAT` varchar(10) DEFAULT NULL,
`TABLE_ROWS` bigint(21) unsigned DEFAULT NULL,
`AVG_ROW_LENGTH` bigint(21) unsigned DEFAULT NULL,
`DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,
`MAX_DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,
`INDEX_LENGTH` bigint(21) unsigned DEFAULT NULL,
`DATA_FREE` bigint(21) unsigned DEFAULT NULL,
`AUTO_INCREMENT` bigint(21) unsigned DEFAULT NULL,
`CREATE_TIME` datetime DEFAULT NULL,
`UPDATE_TIME` datetime DEFAULT NULL,
`CHECK_TIME` datetime DEFAULT NULL,
`TABLE_COLLATION` varchar(32) DEFAULT NULL,
`CHECKSUM` bigint(21) unsigned DEFAULT NULL,
`CREATE_OPTIONS` varchar(255) DEFAULT NULL,
`TABLE_COMMENT` varchar(80) NOT NULL DEFAULT ''
);

The first table, `tmp_table_stat`, holds the statistics temporarily. The script then pushes the results into `table_stat`, adding TABLE_SCHEMA and STAT_COLLECTION_TIME.
All these tasks were done by a Python script:

#!/usr/bin/python -tt
import sys
import commands
import os
import MySQLdb as mdb

db_server='localhost'
db_port='*****'
db_user='*****'
db_password='******'
db_name='table_stats'

def save_table_stat():
    db_list = []
    v_file = "/tmp/status.data"

    mysql_con = mdb.connect(host=db_server, user=db_user, passwd=db_password, db="table_stats", port=int(db_port))
    cur = mysql_con.cursor()
    cur.execute("SHOW DATABASES LIKE 'test'")
    rows = cur.fetchall()

    for r in rows:

        # start with an empty staging table for this schema
        qry = "TRUNCATE TABLE table_stats.tmp_table_stat"
        cur.execute(qry)
        db_name = str(r[0])
        #print 'Dumping status for ' + db_name

        qry = "INSERT INTO command_log (execute_time, command, object) SELECT now() as execute_time, 'Dumping stats' as command, 'Stats DB Name:: " + db_name + "'"
        print qry
        cur.execute(qry)

        # dump SHOW TABLE STATUS for this schema into a flat file
        cmd = 'mysql -h ' + db_server + ' -P' + db_port + ' -u' + db_user + ' ' + db_name + ' -e "SHOW TABLE STATUS WHERE COMMENT NOT LIKE \'view%\' " -B -N > ' + v_file
        #print cmd
        os.system(cmd)
        cmd = 'chmod 777 ' + v_file
        os.system(cmd)
        print "Done"

        qry = "INSERT INTO command_log (execute_time, command, object) SELECT now() as execute_time, 'Loading data' as command, 'Stats DB Name:: " + db_name + "'"
        cur.execute(qry)
        qry = "LOAD DATA INFILE '" + v_file + "' INTO TABLE table_stats.tmp_table_stat FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        #print qry
        cur.execute(qry)

        qry = "INSERT INTO command_log (execute_time, command, object) SELECT now() as execute_time, 'Permanent store' as command, 'Stats DB Name:: " + db_name + "'"
        cur.execute(qry)
        # copy the staged rows into the permanent table, tagged with schema name and collection time
        qry = "INSERT INTO table_stats.table_stat SELECT '" + db_name + "' as TABLE_SCHEMA, now() as stat_collection_time, t.* FROM table_stats.tmp_table_stat as t"
        #print qry
        cur.execute(qry)

        qry = "INSERT INTO command_log (execute_time, command, object) SELECT now() as execute_time, 'Completed', 'Stats DB Name:: " + db_name + "'"
        cur.execute(qry)

    cur.close()

def main():
    save_table_stat()

if __name__ == '__main__':
    main()


The script first collects the database names in a tuple, then connects to each schema and executes SHOW TABLE STATUS. The result is written to a file, loaded from that file into the tmp table, and finally copied into the permanent statistics table.

Our plan was to collect all the statistics data and then analyze it to identify unused blogs and tables for removal.




Continue reading →
Tuesday, October 16, 2012

Getting used to Postgresql Timestamp Conversion

0 comments

I needed to get familiar with Postgresql timezone conversion. The developers needed a query where they store a timestamp without time zone,
but when they match this timestamp against user-supplied values, the database should convert both timestamp values to one timezone.
Timezone conversion in Postgresql is described clearly in the Postgresql documentation, section 9.9.3:
http://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT

I went through it and tested it as follows.

I have a table as follows:

create table tmptab(id int, t timestamp);

insert into tmptab
select c, '2012-10-15 16:29:00'::timestamp without time zone as t
from generate_series(1,100) as c;

The system timezone is US/Eastern, which I got from the timezone system variable.

select id, t,
(t::timestamp with time zone at time zone 'BDT') as from_est_to_bdt ,
(t::timestamp without time zone at time zone 'BDT') as from_bdt_to_est
from tmptab 

In the above query there are two columns - from_est_to_bdt and from_bdt_to_est

from_est_to_bdt - it follows the doc Table 9-28
Expression:  timestamp with time zone AT TIME ZONE zone
Return type: timestamp without time zone
Description: Convert given time stamp with time zone to the new time zone, with no time zone designation
Here, the value of t is treated as being in EST (our default timezone) and then converted to BDT.

from_bdt_to_est - it follows the doc Table 9-28
Expression:  timestamp without time zone AT TIME ZONE zone
Return type: timestamp with time zone 
Description: Treat given time stamp without time zone as located in the specified time zone
Here, the value of t is treated as being in BDT and then converted to EST, our default timezone.

So, for our application, where a user gives us a date from a different location and we need to match that date against timestamp without time zone values,
I used the following query:-
select *
from tmptab
where (t::timestamp with time zone at time zone 'BDT')::date  = '2012-10-17' 
And it worked for me.
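
From the application side, the same comparison can be run with the user's date passed in as a parameter. Below is a minimal sketch using psycopg2; the connection details are placeholders, and the user's zone is assumed to be BDT as in the example:

import psycopg2

conn = psycopg2.connect("dbname=testdb user=postgres")   # placeholder connection details
cur = conn.cursor()

user_date = '2012-10-17'   # date supplied by the user, interpreted in BDT
cur.execute(
    "SELECT id, t FROM tmptab "
    "WHERE (t::timestamp with time zone AT TIME ZONE 'BDT')::date = %s",
    (user_date,))
for row in cur.fetchall():
    print row

cur.close()
conn.close()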

BTW, I got the timezone names from pg_timezone_names.
select * from pg_timezone_names

Continue reading →
Wednesday, July 25, 2012

Mysql function to check a value is numeric or not

0 comments
In Mysql 5.1, I needed a function that would return true if a column value was numeric and false if it was not.
I searched the internet and found this link -
http://stackoverflow.com/questions/2756736/how-to-only-select-numeric-data-from-mysql

It gave me a good idea of how to do it. Then I wrote my own as follows :-

CREATE FUNCTION is_numeric (sIn text) RETURNS tinyint
DETERMINISTIC
NO SQL
   RETURN sIn REGEXP '^-*[[:digit:]]+(\\.[[:digit:]]+)?$' ;
 
Description -   



  • ^ = match at the beginning of the string
  • $ = match at the end of the string
  • [:digit:] = digit class
  • ^[[:digit:]]+$ = only digit characters from the beginning to the end
  • -* = 0 or more occurrences of - (minus sign)
  • (\\.[[:digit:]]+)? = if a . (dot) is used, there must be 1 or more digits after it; wrapping the group in (...)? allows 0 or 1 occurrence of it
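
The same pattern can be tested quickly in Python before relying on the MySQL function; a small sketch (the re module's \d plays the role of the [[:digit:]] class):

import re

is_num = re.compile(r'^-*\d+(\.\d+)?$')   # same idea as the REGEXP above

for s in ('42', '-3.14', '12a34', '5.'):
    print s, bool(is_num.match(s))
# prints: 42 True, -3.14 True, 12a34 False, 5. False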




  
Continue reading →
Tuesday, July 24, 2012

Mysql Query tune for a wordpress site

0 comments


Yesterday one of the developers on my team came to me with the following query, which summarizes user ratings on the home page of our wordpress site.
The query was taking 22 seconds, which is not tolerable for a home page on every page load.

The basic query was as follows -

SELECT
userids.user_id,
u.user_login,
metaFirst.meta_value AS first_name,
metaPrimarySchool.meta_value AS primary_school_blog_id,
b.alias AS primary_school_blog_alias,
SUM(cp.measures) AS measures
FROM users AS u
INNER JOIN (
SELECT DISTINCT user_id FROM usermeta WHERE meta_key LIKE '%_capabilities'
AND
(
meta_value LIKE '%abc%'
)
) AS userids ON (u.ID = userids.user_id AND u.ID<>1)
INNER JOIN usermeta metaFirst ON (metaFirst.user_id=userids.user_id AND metaFirst.meta_key = 'first_name')
INNER JOIN usermeta metaPrimarySchool ON (metaPrimarySchool.user_id=userids.user_id AND metaPrimarySchool.meta_key = 'primary_school_blog_id')
INNER JOIN blogs b ON (b.blog_id = metaPrimarySchool.meta_value)
INNER JOIN cp cp ON (cp.uid=userids.user_id AND cp.timestamp>=1341100800)
GROUP BY cp.uid
ORDER BY measures DESC
limit 5;

We had to tune the query.

We took the following steps :-

Created an index on the timestamp column of table cp.
create index cp_timestamp on cp(timestamp);

Created a 2 GB hot_cache and cached the most frequently used tables such as blogs and usermeta.

set global hot_cache.key_buffer_size=2197815296;
cache index blogs, usermeta in hot_cache;

Rewrote the query by replacing the derived-table join with EXISTS, as we don't need any column from the derived "userids" in the SELECT clause.



SELECT
cp.uid userid,
u.user_login,
metaFirst.meta_value AS first_name,
metaPrimarySchool.meta_value AS primary_school_blog_id,
b.alias AS primary_school_blog_alias,
b.blog_name AS blog_name,
SUM(cp.measures) AS measures
FROM users AS u
INNER JOIN usermeta metaFirst ON (metaFirst.user_id=u.id AND metaFirst.meta_key = 'first_name')
INNER JOIN usermeta metaPrimarySchool ON (metaPrimarySchool.user_id=u.id AND metaPrimarySchool.meta_key = 'primary_school_blog_id')
INNER JOIN blogs b ON (b.blog_id = metaPrimarySchool.meta_value)
INNER JOIN cp cp ON (cp.uid=u.id AND cp.timestamp>=cast(unix_timestamp(concat(year(curdate()),'-', lpad(month(curdate()),2,0),'-', '01')) as unsigned))
WHERE EXISTS
(
SELECT 1
FROM usermeta as userids
where meta_key LIKE '%_capabilities'
AND
(
meta_value LIKE '%abc%'
)
AND userids.user_id = u.id
)
GROUP BY cp.uid
ORDER BY measures DESC
LIMIT 100;



Now the query returns its result in under a second.

Continue reading →
Wednesday, June 20, 2012

Character Set Conversion for Wordpress MySql

0 comments
In my wordpress database, some tables were using the latin1 character set while others were using utf8.
As our application issues SET NAMES UTF8 before every query, we were not facing any trouble on the client side. But having the database on multiple character sets for no reason was not a good standard, so I decided to convert all tables to utf8.

To change the character set of the tables, we had to look for any char, varchar or text columns in those tables. If any such column existed, we had to change it to BLOB, convert the table's character set, and then change the column back to its original data type. This way we can ensure that no characters are lost.

As primary and unique indexes do not allow the data type of an indexed column to be modified, we had to drop the indexes from those columns to achieve the above.

Our wordpress database has been running with a HyperDB configuration, so we have 'db' as the common database and then 'dbN' for every 1000 blogs.

I followed the below steps:-

Step 1: Stop connections


Change the database user password to stop new connections.


Step 2: Create Drop indexes Script

vi 2_drop_indexes.sql
alter table db.table1 drop  index index1;
select 1;
alter table db.table2 drop  index index1;
select 2;
...
...
alter table db.table3 drop  primary key;
select 16;



Step 3: Create Script to Modify Columns to BLOB


select
concat('alter table `', table_schema, '`.`', table_name,'` modify ',column_name,' BLOB;') as cmd_before
from information_schema.columns as c
where table_schema = 'db'
and data_type in ('char', 'varchar','text')
and exists
(
select 1
from information_schema.tables as t
where table_schema = 'db'
and t.table_type = 'BASE TABLE'
and t.table_collation <> 'utf8_general_ci'
and t.table_name = c.table_name
and t.table_schema = c.table_schema
)
INTO OUTFILE '3_col_text_to_blob.sql' ;


Step 4: Create Script to Change table character set


#ALTER TABLE table_name CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];
select concat('ALTER TABLE `',table_schema, '`.`', table_Name,'` CONVERT TO CHARACTER SET utf8;') as cmd
from information_schema.tables
where table_schema = 'db'
and table_type = 'BASE TABLE'
and table_collation <> 'utf8_general_ci'
INTO OUTFILE '4_table_latin_to_utf8.sql' ;


Step 5: Create Script to Modify wp_N_term_relationships



select concat('ALTER TABLE `',table_schema, '`.`', table_Name,'` CONVERT TO CHARACTER SET utf8;') as cmd
from information_schema.tables
where table_name like 'wp_%_term_relationships'
and table_type = 'BASE TABLE'
and table_collation <> 'utf8_general_ci'
INTO OUTFILE '5_table_tems_relationships_to_utf8.sql' ;


Step 6: Create Script to Revert column character set


select concat('alter table `',table_schema, '`.`', table_name,'` modify ',column_name,' ', column_type
,';') as cmd_after
from information_schema.columns as c
where table_schema = 'db'
and data_type in ('char', 'varchar','text')
and exists
(
select 1
from information_schema.tables as t
where table_schema = 'db'
and t.table_type = 'BASE TABLE'
and t.table_collation <> 'utf8_general_ci'
and t.table_name = c.table_name
and t.table_schema = c.table_schema
)
INTO OUTFILE '6_col_blob_to_text.sql' ;


Step 7: Create Script to Create index


vi 7_create_index.sql
alter table table1 add unique key index1(col1, col2);
select 1;
alter table db.table2 add unique index index1 (index1);
select 2;
...
...
alter table db.table3 add primary key (col1);
select 16;


Step 8: Create Script to Analyze Table


vi 8_analyze_tables.sql
analyze table table1;
select 1;
analyze table db.table2;
select 2;
...
...
analyze table db.table3 ;
select 16;


Step 9: Execute Scripts


mysql db -s -N < 2_drop_indexes.sql
mysql db -s -N < 3_col_text_to_blob.sql
mysql db -s -N < 4_table_latin_to_utf8.sql
mysql db -s -N < 5_table_tems_relationships_to_utf8.sql
mysql db -s -N < 6_col_blob_to_text.sql
mysql db -s -N < 7_create_index.sql
mysql db -s -N < 8_analyze_tables.sql


Step 10: Modify database parameters


vi my.cnf
...
[mysqld]
...
default_character_set=utf8
...
[mysql]
...
default_character_set=utf8

mysql restart

Continue reading →
Sunday, June 17, 2012

Convert MS Sql script to MySql

0 comments
Yesterday I got a job converting an MS SQL CREATE TABLE script to MySql. There was some other stuff, like DROP PROCEDURE and ALTER TABLE, in the script, but my client wanted me to convert only the CREATE TABLE parts. So rather than converting it manually, I made a python script to achieve the task.
The source script looked as follows:-

exec sp_dboption N'testdb', N'autoclose', N'true'
GO
exec sp_dboption N'testdb', N'bulkcopy', N'false'
GO
/****** Object:  Stored Procedure dbo.testproc    Script Date: 11/01/2003 15:25:29 ******/
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[testproc]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[testproc]
GO
CREATE TABLE [dbo].[table1] (
[OrgCode] [varchar] (5) NOT NULL ,
[OrgName] [varchar] (60) NOT NULL ,
[Logo] [image] NULL ,
[Col1] [bit] NOT NULL ,
[Col2] [varchar] (25) NOT NULL ,
[col3] [timestamp] NULL 
) ON [PRIMARY] 
GO

/****** Object:  Table [dbo].[table2]    Script Date: 11/01/2003 15:27:18 ******/
CREATE TABLE [dbo].[table2] (
[ORG_CODE] [varchar] (5) NOT NULL ,
[col1] [varchar] (10) NOT NULL ,
[col2] [timestamp] NULL 
) ON [PRIMARY]
GO

I found that every statement ended with the GO command, so I selected it as the delimiter for the script. First, I read the whole file into a buffer. Then I extracted only the CREATE TABLE ... GO parts; there were 233 tables. Then I modified the CREATE TABLE parts by adding ` around column and table names, removing [ and ], and substituting the longblob and decimal(19,4) data types for the image and money data types.

The convert.py as follows:

#!/usr/bin/python -tt

import sys
import commands
import os
import re


def make_list_create_table(script_text):
 
  list = script_text.split('\nGO')
  list_create_table = []
  for i in range(len(list)):
    match = re.search('CREATE TABLE',list[i])
    if match:
      list_create_table.append(list[i])

  return list_create_table
def convert_create_table_for_mysql(list_create_table):
  for i in range(len(list_create_table)):
    s = list_create_table[i]
 
    m = re.search('create table(.*)\([\r\n a-zA-Z0-9,\t_\)\( \[\]]+\)',s,flags=re.DOTALL|re.IGNORECASE)
    if m:
      s = m.group()
      #replace [] from column name and add `
      s = re.sub(r'''([\(,]+[\r\n\t ]*)\[([ a-zA-Z_0-9]+)\]''', r'''\1`\2`''', s, flags=re.DOTALL|re.IGNORECASE)
      #replace all other []
      s = s.replace('[','')
      s = s.replace(']','')
      #replace dbo. and add ` for table name
      s = re.sub(r'''dbo.([a-zA-Z0-9_]+) ''', r'''`\1`''',s, flags=re.DOTALL|re.IGNORECASE)
      s = s + ' ENGINE=InnoDB DEFAULT CHARSET=latin1;\n\n'
      list_create_table[i] = s
   
    else:
      print "!!!not match!!!"
 
  return   list_create_table


def write_file(file_path,str):
  f = open(file_path,'w')
  f.writelines(str)
  f.close()
  return
def data_type_conversion(list_create_table):
  for i in range(len(list_create_table)):
    s = list_create_table[i]
    s = s.replace(' image ',' longblob ');
    s = s.replace(' money ',' decimal (19,4) ');
    #s = s.replace(' numeric ',' decimal ');
    list_create_table[i] = s
  return list_create_table
 
def main():
  file_path = 'E:\mssql_scriptsql'
  f = open(file_path,'r')
  script_text = f.read()
  f.close()

  list_create_table = make_list_create_table(script_text)
  list_create_table = convert_create_table_for_mysql(list_create_table)
  list_create_table = data_type_conversion(list_create_table)
  write_file('e:\out.sql',list_create_table)
  return
if __name__ == '__main__':
 main()


Continue reading →
Sunday, June 10, 2012

Baseball Statistics Data Model

4 comments
A few months ago I was working on a sports portal project. As a database developer, my first task was designing the data model for baseball. I live in Bangladesh, where I have never played the game; I had not even watched it on television, let alone learned its rules and statistics. But I had to solve the task.
So I read the following two links :-
Baseball Rules
Baseball Statistics

These two links helped me gather knowledge about the game. But theoretical knowledge was not enough for me, and as I have been a sports lover since childhood, I got the game BASEBALL 2005 by EA Sports. Playing it got me interested in baseball and gave me the knowledge I needed.
Finally, I designed the data model for the development team, which I have added below:-



I designed the data model in 3 layers-
1. Common Layer : 
This layer was designed to support player profile entry, player roster management, teams and stadiums. It was named Common as we planned to re-use this part for other games as well. The tables are:
- Players
- Team_Rosters
- Season
- Roster_details
- Team
- Stadium
- Position
- Game
- Game_Roster
2. Game Layer:
This layer was designed specifically to support the baseball game. The tables are:
- Half_innings
- Batter_at_bat
- Pitcher
- Plate_outcome
- Plate_outcome_types
- Picher_statistics_at_half_innigs
- Batter_statistics_at_half_innigs
3. Statistics Layer:
This layer was designed to summarize the game layer's data for displaying the statistics. The tables are:
-Batter_stats_in_game
-Picther_stat_in_game

The rest of the logic was written in the application layer.





Continue reading →
Friday, June 8, 2012

Postgresql dblink connect with random port

0 comments

I was working on a project where I had to build a function that uses dblink to connect to other Postgresql databases. It would have been easy, except that my client wanted the port number to be picked up dynamically inside the function. More precisely, he was going to connect to other databases using dblink_connect, and the critical part was that in his environment different Postgresql clusters were running on different ports. He was the project owner, and he told me the developers had no idea which ports the Postgresql instances were running on. So he wanted me to write a function that would find out the port of the Postgresql instance from which dblink_connect was being called. The developer would pass the database name, user name and password, and my function would use these to connect to that Postgresql instance.

So I built the function as below :-

CREATE OR REPLACE FUNCTION dblink_connect_with_dynamic_port(prm_dbname text, prm_user_name text, prm_password text) RETURNS text AS $$
DECLARE
      str VARCHAR(100);
      v_port int;
BEGIN
       execute 'show port' into v_port ;
       str := '''dbname = '||prm_dbname||' user= '||prm_user_name||' password = '||prm_password||'  port=' ||v_port || '''';
      str :=  'select dblink_connect(''new_conn'','||str||')';
       execute str;
       return str;
END;
$$ LANGUAGE plpgsql;
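
A calling application can then simply select the function to open the link. Here is a minimal psycopg2 sketch; the database names and credentials are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=sourcedb user=postgres")   # database where the function is installed
cur = conn.cursor()

# Pass the target database, user and password; the function finds the port by itself.
cur.execute("SELECT dblink_connect_with_dynamic_port(%s, %s, %s)",
            ('targetdb', 'appuser', 'secret'))
print cur.fetchone()[0]   # the dblink_connect command that was built and executed

cur.close()
conn.close()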

Continue reading →
Friday, June 1, 2012

Sql Server Index Rebuild

2 comments

I had to tune our application database, which was running on Sql Server 2005. The first step I chose was to rebuild the indexes. I used the sys.dm_db_index_physical_stats view to get size and fragmentation information for the data and indexes of the specified tables or views.


As the data in this view is reset on server restart, I created a table to hold this information.
CREATE TABLE testdb.[dbo].[index_stat](
    [id] [int]  IDENTITY(1,1),
    [objectid] [int] NULL,
    [indexid] [int] NULL,
    [partitionnum] [int] NULL,
    [frag] [float] NULL,
    [start_time] [datetime] NULL,
    [completed_time] [datetime] NULL
) ON [PRIMARY]
Save the data from the above view into the table where avg_fragmentation_in_percent is greater than 10:
use testdb
insert into testdb.dbo.index_stat([objectid],[indexid],[partitionnum],[frag])
SELECT
    object_id AS objectid,
    index_id AS indexid,
    partition_number AS partitionnum,
    avg_fragmentation_in_percent AS frag
FROM sys.dm_db_index_physical_stats (5, NULL, NULL , NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;


Now I created the index_rebuild_proc to rebuild the indexes where fragmentation is above 30.0 and reorganize where fragmentation is between 10.0 and 30.0.


-- Ensure a USE statement has been executed first.
CREATE PROCEDURE index_rebuild_proc
AS
DECLARE @id int;
DECLARE @objectid int;
DECLARE @indexid int;
DECLARE @partitioncount bigint;
DECLARE @schemaname nvarchar(130);
DECLARE @objectname nvarchar(130);
DECLARE @indexname nvarchar(130);
DECLARE @partitionnum bigint;
DECLARE @partitions bigint;
DECLARE @frag float;
DECLARE @command nvarchar(4000);
-- Conditionally select tables and indexes from the sys.dm_db_index_physical_stats function
-- and convert object and index IDs to names.


-- Declare the cursor for the list of partitions to be processed.
DECLARE indexes_cur CURSOR FOR SELECT id, objectid, indexid, partitionnum, frag FROM testdb.dbo.index_stat where completed_time is NULL ;
-- Open the cursor.
OPEN indexes_cur;
-- Loop through the partitions.
WHILE (1=1)
    BEGIN;
        FETCH NEXT
           FROM indexes_cur
           INTO @id, @objectid, @indexid, @partitionnum, @frag;
        IF @@FETCH_STATUS < 0 BREAK;
        SELECT @objectname = QUOTENAME(o.name), @schemaname = QUOTENAME(s.name)
        FROM sys.objects AS o
        JOIN sys.schemas as s ON s.schema_id = o.schema_id
        WHERE o.object_id = @objectid;
        SELECT @indexname = QUOTENAME(name)
        FROM sys.indexes
        WHERE  object_id = @objectid AND index_id = @indexid;
        SELECT @partitioncount = count (*)
        FROM sys.partitions
        WHERE object_id = @objectid AND index_id = @indexid;
-- 30 is an arbitrary decision point at which to switch between reorganizing and rebuilding.
        IF @frag < 30.0
            SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REORGANIZE';
        IF @frag >= 30.0
            SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD';
        IF @partitioncount > 1
            SET @command = @command + N' PARTITION=' + CAST(@partitionnum AS nvarchar(10));
        if getdate() < cast(CONVERT(VARCHAR(20),GETDATE(),101) + ' 05:00:00' as datetime)
        BEGIN
         
            update testdb.dbo.index_stat set start_time = getdate() where id = @id;
            --EXEC (@command);
            PRINT N'Executed: ' + @command;
            update testdb.dbo.index_stat set completed_time = getdate() where id = @id;
        END
    END;
-- Close and deallocate the cursor.
CLOSE indexes_cur;
DEALLOCATE indexes_cur;
GO


This procedure updates the start_time column of testdb.dbo.index_stat as soon as it starts rebuilding an index, and similarly updates the completed_time column after the rebuild completes. This helps me monitor the progress of the procedure. Also, as I was going to add this procedure to the job scheduler, I added a condition so that the process stops as soon as the server time passes 05:00 AM.


Now, execute the procedure.
exec index_rebuild_proc
Continue reading →
Thursday, April 12, 2012

Python Script for Remote Server Connection through SSH2

100 comments
To make an SSH connection to a remote server and copy files from that server through a Python script, I used the python package "paramiko". This is a library for making SSH2 connections (client or server). Emphasis is on using SSH2 as an alternative to SSL for making secure connections between python scripts. All major ciphers and hash methods are supported. SFTP client and server modes are both supported too.
 I installed paramiko in my python environment using the following steps -
wget http://pypi.python.org/packages/source/p/paramiko/paramiko-1.7.7.1.zip
unzip  paramiko-1.7.7.1.zip
cd paramiko-1.7.7.1
python setup.py build
python setup.py install
Now check whether the installation was successful or not -
python
>>> import paramiko

I added the script below, with functions to open an SSH connection, find the required backup directory on the remote machine and close the SSH connection. The connection settings at the top are placeholders to be filled in with the real server details.
#!/usr/bin/python -tt
import sys
import commands
import paramiko
import os
import re
import shutil

# Connection settings used by the functions below; placeholder values,
# to be replaced with the real source server details.
source_server = 'source.example.com'
source_os_user = 'root'
source_os_password = 'secret'
source_dbbackupdir = '/BACKUP'

def open_ssh_connection():
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=source_server, username=source_os_user, password=source_os_password)
    return ssh

def find_backup_dir(ssh):   
  #Change directory to remote server and list the directories
  stdin, stdout, stderr = ssh.exec_command('cd /BACKUP; ls')
  cmd_out = stdout.readlines()
  #Make tuple to string
  cmd_out= ','.join(cmd_out)
  #Find backup directories like backup_020612, here backup_dirs is a list
  backup_dirs = re.findall(r'backup_\w+', cmd_out)
  backup_dirs.sort()
  #Select the last backup directory
  backup_dir = backup_dirs[len(backup_dirs)-1]
  backup_dir=source_dbbackupdir + '/' + backup_dir
  return backup_dir

def close_ssh_connection(ssh):
  ssh.close()

def main():
  print ""
  print "Opening connection to source server ..."
  ssh = open_ssh_connection()
  print "Done"
  print ""
  print "Searching for latest backup directory in source server ..."
  source_dbbackupdir = find_backup_dir(ssh)
  print "Lastest backup directory =", source_dbbackupdir
  print "Done"
  print ""
  print "Closing connection with the source server..."
  close_ssh_connection(ssh)
  print "Done"
  print ""
  return

if __name__ == '__main__':
 main()
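
The post also mentions copying files from the remote server; paramiko's SFTP client covers that part. A minimal sketch that reuses the connection and backup directory from above (the file name is just a placeholder):

def fetch_backup_file(ssh, backup_dir, filename, local_dir='/tmp'):
  # Open an SFTP session over the existing SSH connection and pull one file.
  sftp = ssh.open_sftp()
  remote_path = backup_dir + '/' + filename
  local_path = os.path.join(local_dir, filename)
  sftp.get(remote_path, local_path)
  sftp.close()
  return local_path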
Continue reading →
Thursday, April 5, 2012

Install Mysql-Python

3 comments
In order to connect to Mysql from Python, I used the MySQLdb library. It is an interface to the MySQL database server for Python. For more information, see pypi.python.org/pypi/MySQL-python.

I installed MySQL-python in my environment following the steps below. 

Install setuptools:

setuptools is a python package that is used to download, build, install, upgrade, and uninstall Python packages -- easily!  For more information -  http://pypi.python.org/pypi/setuptools

As setuptools was not installed on my machine, I had to set it up first.

wget http://pypi.python.org/packages/2.7/s/setuptools  \
/setuptools-0.6c11-py2.7.egg#md5=fe1f997bc722265116870bc7919059ea
sh setuptools-0.6c11-py2.7.egg

Install MySQL-python:

wget http://pypi.python.org/packages/source/M/MySQL-python/MySQL-python-1.2.3.tar.gz
tar -zxvf  MySQL-python-1.2.3.tar.gz
cd  MySQL-python-1.2.3
python setup.py build 
python setup.py install

Check Installation:

# python
Python 2.7.1 (r271:86832, Jun  8 2011, 12:30:16)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

First Script:

vi mysql_test.py

#!/usr/bin/python -tt

import MySQLdb as mdb
import sys
con = None
def main():

    con = mdb.connect(host='localhost', user='testuser', passwd='test_passwd', db='testdb', port=3306)
    cur = con.cursor()
    cur.execute("SELECT VERSION()")
    data = cur.fetchone()
    print "Database version : %s " % data
    return
   
if __name__ == '__main__':
 main()



Execute Script:

python mysql_test.py
Database version : 5.1.48-community-log 


For more basics of Mysql programming with Python, you may visit
http://zetcode.com/databases/mysqlpythontutorial
Continue reading →

The beginning of Python

0 comments
For the last few days, I had been thinking of learning something new for development work. From several options I chose to learn Python. In fact, I was encouraged by Google's python tutorial by Nick Parlante. It really helped me to start python programming quickly. You may get it at -



I also got support materials at
http://code.google.com/edu/languages/google-python-class.

Another very good python tutorial site for the beginners is http://www.learnpython.org/.
I worked through these sites and picked up some basic skills. I installed Python 2.7 on my machine and wrote my first script, hello.py.

hello.py
#!/usr/bin/python2.7 -tt

def main():
 print "Started learning python"

#boilerplate function
if __name__ == '__main__':
 main()

Executed the script

python hello.py

I got output -
Started learning python

I was happy, continued my study, and completed my first real script, which imports wordpress blogs to a staging machine from a production backup. I'll upload that script later.
In the script I had to make SSH connections to different servers, scp files from a remote server to the local server, connect to mysql and execute queries, and restore a mysql dump file into a new mysql instance. All these tasks were pretty easy with Python: lots of libraries are available, and I used them to get the work done.
Continue reading →
Monday, February 27, 2012

Query to find unused blogs in Wordpress

0 comments

I have been administering a mysql database running Wordpress. The system uses a HyperDB implementation where every 1000 blogs are separated into a different database.
For maintenance, I had to find the blogs which have not been used for more than 1 year, and to achieve this I used the following query:
SELECT
SUBSTR(TABLE_NAME,LOCATE('_',TABLE_NAME,2)+1, LOCATE('_',TABLE_NAME,2)-1) AS blog_id,
max(CREATE_TIME) as CREATE_TIME,
max(update_time) as last_update_time,
datediff(now() ,UPDATE_TIME) as days_passed_since_last_update
FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_SCHEMA = 'db0'
AND TABLE_TYPE = 'BASE TABLE'
AND datediff(now() ,UPDATE_TIME) > 365
GROUP BY SUBSTR(TABLE_NAME,LOCATE('_',TABLE_NAME,2)+1, LOCATE('_',TABLE_NAME,2)-1)
Output:

blog_id CREATE_TIME last_update_time days_passed_since_last_update
10 2011-01-26 02:37:54 2011-01-26 02:37:54 397
11 2011-01-26 02:37:56 2011-01-26 02:37:56 397
12 2011-01-26 02:37:57 2011-01-26 02:37:57 397
13 2011-01-26 02:37:58 2011-01-26 02:37:58 397
14 2011-01-26 02:38:00 2011-01-26 02:38:00 397
15 2011-01-26 02:38:01 2011-01-26 02:38:01 397
16 2011-01-26 02:38:03 2011-01-26 02:38:03 397
17 2011-01-26 02:38:04 2011-01-26 02:38:04 397
18 2011-01-26 02:38:05 2011-01-26 02:38:05 397
...
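
Since HyperDB spreads the blogs across many dbN schemas, the same check can be repeated for each schema from Python. A minimal sketch using MySQLdb; the connection details and the db% name pattern are placeholders:

import MySQLdb as mdb

con = mdb.connect(host='localhost', user='testuser', passwd='test_passwd')   # placeholder credentials
cur = con.cursor()

cur.execute("SHOW DATABASES LIKE 'db%'")   # collect the HyperDB schemas: db0, db1, ...
for (schema,) in cur.fetchall():
    cur.execute("""
        SELECT SUBSTR(TABLE_NAME, LOCATE('_', TABLE_NAME, 2) + 1,
                      LOCATE('_', TABLE_NAME, 2) - 1) AS blog_id,
               MAX(UPDATE_TIME) AS last_update_time
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA = %s
          AND TABLE_TYPE = 'BASE TABLE'
          AND DATEDIFF(NOW(), UPDATE_TIME) > 365
        GROUP BY blog_id""", (schema,))
    for blog_id, last_update in cur.fetchall():
        print schema, blog_id, last_update

cur.close()
con.close()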
Continue reading →
Saturday, February 18, 2012

Postgresql dbi-link to access Sql Server Database

3 comments

Postgresql database can be configured to fetch data from Sql Server tables using dbi-link.
DBI-Link is a partial implementation of the SQL/MED portion of the SQL:2008 specification written in PL/Perl. It communicates with Sql Server through DBD-Sybase perl module from CPAN.
Before that however, we must first compile the freetds library. FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases.
The DBD::Sybase module is built on top of the Sybase Open Client Client Library API.
In this test, I used Postgresql 9.1.2 and Sql Server 2005.
The following steps describe how to configure a Postgresql database to fetch data from Sql Server:
STEP 1: Freetds Install

wget http://ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz

tar -zxvf freetds-stable.tgz

cd freetds-0.91
./configure --prefix=/usr/local/freetds --with-tdsver=7.0
make
make install

vi freetds.conf
...
[testserver]
host=10.0.0.1
port=1433
tds version = 8.0
...
:wq


Check if the installation is successful:
# /usr/local/freetds/bin/tsql -S testserver -U testuser
Password:
locale is "en_US.UTF-8"
locale charset is "UTF-8"
using default charset "UTF-8"
1>

STEP 2: Compiling DBD::Sybase

wget http://search.cpan.org/CPAN/authors/id/M/ME/MEWP/DBD-Sybase-1.14.tar.gz

tar -zxvf DBD-Sybase-1.14.tar.gz

cd DBD-Sybase-1.14

vi CONFIG

..

SYBASE=/usr/local/freetds

...


perl Makefile.PL

make

make install

Check if the installation is successful:
Create a perl script named first.pl.
#!/usr/bin/perl

use DBI;

$dsn = 'DBI:Sybase:server=testserver';

my $dbh = DBI->connect($dsn, "testuser", 'std123');
die "unable to connect to server $DBI::errstr" unless $dbh;

$dbh->do("use testdb");

$query = "SELECT * FROM testdb.dbo.school_test";
$sth = $dbh->prepare ($query) or die "prepare failed\n";
$sth->execute( ) or die "unable to execute query $query error $DBI::errstr";

$rows = $sth->rows ;
print "$row rows returned by query\n";

while ( @first = $sth->fetchrow_array ) {
foreach $field (@first) {
print "field: $field\n";
}
}

#perl /tmp/first.pl
rows returned by query
field: 1
field: A school
field: 2
field: B High
field: 3
field: C Academy
field: 4
field: D Academy


STEP 3: Download dbi-link

http://pgfoundry.org/projects/dbi-link/

Download dbi-link from

wget http://pgfoundry.org/frs/download.php/1235/dbi-link-2.0.0.tar.bz2


tar -jxvf dbi-link-2.0.0.tar.bz2
Step 4: Configure dbi-link

psql>
create extension plperlu;


\i /usr/local/src/dbi-link-2.0.0/dbi_link.sql


UPDATE
pg_catalog.pg_settings
SET
setting = CASE WHEN 'dbi_link' = ANY(string_to_array(setting, ','))
          THEN setting
          ELSE 'dbi_link,' || setting
          END
WHERE
name = 'search_path'
;

SELECT dbi_link.make_accessor_functions(
'dbi:Sybase:testserver'::dbi_link.data_source,
'testuser'::text,
'std123'::text,
'---
AutoCommit: 1
RaiseError: 1
'::dbi_link.yaml,
NULL::dbi_link.yaml,
NULL::text,
NULL::text,
'testserver_mssql'::text
);
Check if the installation is successful:
select * from dbi_link.dbi_connection;
 data_source_id |  data_source  | user_name |  auth  | dbh_attributes | remote_schema | remote_catalog | local_schema
----------------+---------------+-----------+--------+----------------+---------------+----------------+--------------
              1 | dbi:Sybase:testserver | testuser | std123 | ---           +|               |                | testserver_mssql
                |               |           |        | AutoCommit: 1 +|               |                |
                |               |           |        | RaiseError: 1 +|               |                |
                |               |           |        |                |               |                |

data_source_id = 1

SELECT * from dbi_link.remote_select(
1,
'select school_name from testdb.dbo.school_test'::text
) as c(school_name text)

school_name                
-------------------------------------------
  A School
  B High
  C Academy
  D Academy
(4 rows)
Continue reading →
