Sunday, October 31, 2010

DB performance degradation and action steps in PostgreSQL

Once I had to improve the performance of a CRM application running on PostgreSQL 8.1. Customers were complaining that the system got slow after 1 pm.
At first, I ensured that no background process was running at that time. Then I studied a lot and communicated with writers on the pgsql mailing-list archives. I planned some action steps and sent a mail to top management. The mail was as follows:
The prime suggestion is to upgrade the DB, as only a year of support remains for 8.1.
As that is not possible right now, we may go for the following:

1. nohup psql crm -c "vacuum verbose analyze" > /tmp/vacuum_crm.log
- to get a vacuum log
- restart not required

2. Defragmenting the DB will increase performance: take a dump and then restore. This reduces DB bloat, and the DB size should shrink from 12 GB to about 3 GB.
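The dump-and-restore cycle can be sketched as below; the backup path and the name of the fresh database are placeholders for illustration:

```shell
# Sketch of a dump/restore to rewrite tables compactly (assumed paths/names).
pg_dump -Fc crm > /backup/crm.dump        # custom-format dump of the crm DB
createdb crm_fresh                        # restore into a freshly created DB
pg_restore -d crm_fresh /backup/crm.dump
# After verification, the fresh DB can replace the bloated one.
```

This rewrites every table and index from scratch, which is what reclaims the bloated space.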

Database-wide:

3. default_statistics_target = 200
- increasing this value improves the quality of the planner's estimates
- restart not required
- set only for the crm database
- current value: 100
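Assuming the per-database scope mentioned above, the setting can be applied with ALTER DATABASE (it takes effect for new sessions), and a fresh ANALYZE then rebuilds the statistics at the new target:

```shell
# Illustrative commands; "crm" is the database named in the mail above.
psql -c "ALTER DATABASE crm SET default_statistics_target = 200;"
psql crm -c "ANALYZE VERBOSE;"   # recollect statistics at the new target
```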

4. effective_cache_size = 17 GB, then 20 GB incrementally
- increasing this value will encourage more index scans
- restart not required
- current value: 1835010 pages (~14 GB)
- statistics suggest we regularly have free memory
- this may cause paging on the server; we need to check it out
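On 8.1 effective_cache_size is expressed in 8 kB pages rather than in size units (the current value 1835010 above is a page count), so the proposed 17 GB and 20 GB targets translate roughly as:

```shell
# Convert GB targets to 8 kB pages (1 GB = 131072 pages of 8 kB).
echo $((17 * 131072))   # 17 GB -> 2228224 pages
echo $((20 * 131072))   # 20 GB -> 2621440 pages
```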

System-wide:

5. log_min_error_statement = error
- to find out which statements are causing errors in the DB
- restart not required
- current value: panic

6. log_min_messages = notice
- to get more information about DB activity
- restart not required
- current value: error

8. checkpoint_warning = 3600
- to log checkpoint frequency during peak time, in order to find good values for checkpoint_segments and checkpoint_timeout
- I was advised to set checkpoint_segments to 8-12, but I first want to confirm that checkpoint_segments is actually a performance issue in our case
- restart not required
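With checkpoint_warning raised, the server logs a warning whenever checkpoints come closer together than the threshold; scanning the log during the slow period shows whether checkpoint_segments is really the bottleneck (the log path here is an assumption):

```shell
# Look for checkpoint-frequency warnings during peak hours.
grep -i "checkpoints are occurring too frequently" /var/log/postgresql/postgresql.log
```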

9. max_fsm_pages = 2546960
- to reduce table bloat
- the new value comes from the vacuum verbose output
- restart required
- we get a message in the DB log every night during vacuum asking us to increase it
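The figure for max_fsm_pages comes from the summary lines at the end of the vacuum verbose log produced in step #1, which report free-space-map usage:

```shell
# The last lines of the vacuum log summarize free-space-map usage, e.g.
#   INFO:  free space map contains ... pages in ... relations
#   DETAIL:  A total of ... page slots are in use ...
tail -n 20 /tmp/vacuum_crm.log
```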

10. shared_buffers = 2 GB
- the main shared memory for the DB
- the docs suggest increasing it incrementally, up to 25% of total RAM
- current value: 958 MB
- restart required
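Like effective_cache_size, shared_buffers on 8.1 is set as a count of 8 kB pages (size units such as '2GB' only arrived in later releases), so 2 GB works out to:

```shell
echo $((2 * 131072))   # 2 GB -> 262144 pages of 8 kB
```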

I have completed task #1 to find a good value for step #9.
I suggest implementing these changes incrementally so that we can tell whether we are on the right track.
The riskier steps here are #3, #4, #9 and #10; we may start with #9.
Steps #5, #6 and #8 only change DB logging, so we should apply them as soon as possible.
Query tuning is an ongoing process and will be continued.

So far, I have modified two values, random_page_cost and work_mem, and I have already seen benefits.
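A low-risk way to evaluate settings like work_mem before changing them globally is to override them for a single session and compare query plans; a sketch (on 8.1 the value is in kB, and the query here is just a placeholder):

```shell
# Try a larger sort/hash memory for one session only, then inspect the plan.
psql crm -c "SET work_mem = 32768; EXPLAIN ANALYZE SELECT count(*) FROM pg_class;"
```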

I think this mail may help people find the root cause of a performance decrease.
