Wednesday, January 5, 2011

RAID level selection for databases

RAID, an acronym for Redundant Array of Independent Disks, is a technology that provides increased storage performance and reliability through redundancy by combining multiple disk-drive components into a single logical unit in which all drives in the array are interdependent.

Different levels of RAID:
  • RAID 0 - Block-level striping without mirroring
  • RAID 1 - Mirroring without striping
  • RAID 2 - Bit-level striping with dedicated Hamming-code parity
  • RAID 3 - Byte-level striping with dedicated parity drive
  • RAID 4 - Block-level striping with dedicated parity
  • RAID 5 - Block-level striping with distributed parity
  • RAID 6 - Block-level striping with double distributed parity
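As a rough illustration of how the parity-based levels above protect data (not tied to any particular controller), the sketch below computes a RAID 4/5-style XOR parity chunk for one stripe and rebuilds a lost chunk from the survivors. The chunk contents and the three-data-disk layout are made up for the example:

```python
from functools import reduce

def xor_parity(chunks):
    # The parity chunk is the byte-wise XOR of all chunks in the stripe.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

# One stripe spread over 3 data disks (contents are illustrative only).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)  # kept on a dedicated disk in RAID 4,
                           # rotated across all disks in RAID 5

# Disk 1 fails: XOR of the surviving chunks plus parity rebuilds its data.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == b"BBBB"
```

The same XOR property is why a single-parity array survives exactly one drive failure: any one missing chunk, data or parity, is the XOR of everything that remains.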
Comparison of RAID Levels:

RAID 0
  Pros: Fastest I/O; no overhead for parity; simple design, easily implemented.
  Cons: Not really RAID; one drive failure destroys all data; not for mission-critical deployment.

RAID 1
  Pros: All drives usable for data reads; can be implemented with 2 drives.
  Cons: Greatest storage overhead (100%); highest cost/capacity ratio.

RAID 2
  Pros: High transfer rates; degraded mode still fast.
  Cons: Requires spindle synchronization; can't do overlapped I/O.

RAID 3/4
  Pros: High read transfer rates; efficient use of capacity.
  Cons: Poor write rates; parity drive can be a bottleneck.

RAID 5
  Pros: Very high read rate; efficient use of capacity.
  Cons: Slower write rates; slow rebuild times.

RAID 6
  Pros: Allows failure of multiple drives.
  Cons: Very poor write performance; mostly proprietary solutions, rare.

RAID 7
  Pros: Supposed to be fastest.
  Cons: Proprietary, very expensive.

RAID 1+0
  Pros: Very high reads and writes.
  Cons: Most expensive.

RAID for databases:

There are a number of factors to consider when choosing the appropriate RAID level for a specific application’s data storage.
RAID 5 is better suited for mostly read-oriented applications, while RAID 0+1 is better suited for write-intensive applications. In addition, the striping component (RAID 0) of RAID 0+1 offers the same read performance as RAID 5.

In our case, for a CRM application's database where we had to ensure good read-write speed with data protection, we chose RAID 1 across 2 drives (500 GB SATA) in a Dell PowerEdge 6850 server. For a sports portal we used RAID 10 in a Dell PowerEdge M600 server, and for a bank database we used RAID 10 with remote mirroring on an IBM Power System.


Tuesday, January 4, 2011

Scheduled VACUUM instead of AutoVacuum

Once, on PostgreSQL 9.0.1, I found that one of the report queries hung on the DB server while it ran perfectly on the development server, a virtual machine with only 2 GB of RAM. On proper investigation, I found that some of the tables and indexes joined in the report query were bloated. I executed VACUUM ANALYZE manually and the system started to perform better.

Then I came to the conclusion that autovacuum was not vacuuming all the tables properly, and that turned out to be the fact.
Current status of some tables:

select last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables  where relname ='crm';
 -[ RECORD 1 ]----+------------------------------
last_vacuum      | 2011-01-04 12:39:57.586728+06
last_autovacuum  | 2011-01-04 16:13:30.254226+06
last_analyze     | 2011-01-04 12:40:17.680571+06
last_autoanalyze | 2011-01-04 16:14:17.402896+06

The picture is the same for the other tables.

The statistics show that the tables were not being vacuumed by the autovacuum process as often as needed, and the reason lies in the autovacuum parameters.

At present, our autovacuum configuration is as follows:
autovacuum_naptime = 5min
autovacuum_vacuum_threshold = 150
autovacuum_analyze_threshold = 150
autovacuum_vacuum_scale_factor = 0.2  
autovacuum_analyze_scale_factor = 0.1 

Threshold for the crm table:
crm = 150 + 0.2 * 1446016 = 289353.2

So, according to the PostgreSQL docs, the crm table becomes a candidate for autovacuum only after 289353.2 rows have been modified since the last autovacuum, and that is why the table had not yet been vacuumed by the autovacuum process.
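The trigger point can be double-checked with a small sketch of the formula from the PostgreSQL documentation; the row count 1446016 is the crm table's size from the calculation above:

```python
def autovacuum_vacuum_trigger(reltuples, threshold=150, scale_factor=0.2):
    # PostgreSQL fires autovacuum on a table once the dead-tuple count
    # exceeds: autovacuum_vacuum_threshold
    #          + autovacuum_vacuum_scale_factor * pg_class.reltuples
    return threshold + scale_factor * reltuples

trigger = autovacuum_vacuum_trigger(1446016)
print(round(trigger, 1))  # 289353.2, matching the hand calculation above
```

With a 0.2 scale factor, the trigger grows linearly with table size, which is exactly why large, steadily updated tables can go a long time between autovacuum runs.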

So I prefer a scheduled vacuum for the following reasons:
  • We have no online users during the night, so we can easily schedule a routine vacuum. This removes the overhead of the autovacuum process from the DB server during peak hours of operation.
  • It eliminates the overhead of tuning the autovacuum parameters, though we could also ensure autovacuum for these tables through table-level storage parameter settings.
  • Autovacuum does not work for temporary tables.
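With those reasons in mind, the nightly scheduled vacuum can be as simple as a cron entry on the DB server; the 02:00 slot and the database name crm here are assumptions for illustration:

```
# system crontab fragment (schedule and database name are illustrative):
# run VACUUM ANALYZE on the crm database at 02:00 every night,
# when no online users are connected
0 2 * * *  postgres  vacuumdb --analyze --quiet crm
```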
