Tuesday, December 3, 2013

PostgreSQL online space reclaim using pg_repack

Working with PostgreSQL databases, we often hear the terms table bloat and index bloat. PostgreSQL uses MVCC, so deleted or updated rows are not removed from the table right away; the dead row versions left behind create table and index bloat. Earlier, we used to run VACUUM FULL or CLUSTER to reclaim the unused space of a table, but both of them lock the table so that users cannot make any changes to it while they run.
So in a 24x7 production system it was difficult to execute those operations. Searching a bit, I found the PostgreSQL extension project pg_repack.
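
Before reaching for pg_repack, it is worth confirming that a table is actually bloated. A minimal sketch using the standard pg_stat_user_tables statistics view (plain PostgreSQL, not part of pg_repack) to list the tables carrying the most dead tuples:

SELECT relname,
       n_live_tup,
       n_dead_tup,          -- only an estimate maintained by the statistics collector
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;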

What is pg_repack:


According to https://github.com/reorg/pg_repack:
pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.
pg_repack 1.1.8 was released as a drop-in replacement for pg_reorg, addressing some of the shortcomings of the last pg_reorg version (such as support for PostgreSQL 9.2 and EXTENSION packaging) and known bugs.


Install pg_repack:

Download pg_repack from http://pgxn.org/dist/pg_repack/.
Installation of pg_repack is quite easy and simple. The build looks for pg_config, so make sure pg_config is available in your PATH.

cp -R pg_repack /source/postgresql-9.2.4/contrib/pg_repack
cd /source/postgresql-9.2.4/contrib/pg_repack
make
make install

Now, at the psql client:

psql# select * from pg_available_extensions where name = 'pg_repack';
name              | pg_repack
default_version   | 1.1.8
installed_version | 1.1.8
comment           | Reorganize tables in PostgreSQL databases with minimal locks

CREATE EXTENSION pg_repack;
pg_repack creates a schema named "repack" in the database.
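
As a quick sanity check, one option (a hedged example, based on the function list shown below) is to call the version function the extension installs in the repack schema:

SELECT repack.version();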

Views in repack:


\dv repack.*
            List of relations
 Schema |     Name     | Type |  Owner
--------+--------------+------+----------
 repack | primary_keys | view | postgres
 repack | tables       | view | postgres
(2 rows)

Functions in repack:

\df repack.*
                  List of functions
 Schema |            Name            |  Type
--------+----------------------------+---------
 repack | array_accum                | agg
 repack | conflicted_triggers        | normal
 repack | disable_autovacuum         | normal
 repack | get_assign                 | normal
 repack | get_columns_for_create_as  | normal
 repack | get_compare_pkey           | normal
 repack | get_create_index_type      | normal
 repack | get_create_trigger         | normal
 repack | get_drop_columns           | normal
 repack | get_enable_trigger         | normal
 repack | get_index_columns          | normal
 repack | get_index_keys             | normal
 repack | oid2text                   | normal
 repack | repack_apply               | normal
 repack | repack_drop                | normal
 repack | repack_indexdef            | normal
 repack | repack_swap                | normal
 repack | repack_trigger             | trigger
 repack | version                    | normal
 repack | version_sql                | normal
(20 rows)

Example with a dummy table:

Create a table:

CREATE TABLE t1 ( i INT PRIMARY KEY, nam CHAR(20));  

Populate the table:

INSERT INTO t1 SELECT c , 'a'||c FROM generate_series(1,10000000) as c;
INSERT 0 10000000

Check table size: 

SELECT pg_size_pretty(pg_table_size('t1'));
 pg_size_pretty
----------------
 539 MB
(1 row)
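
Since index bloat is just as common as table bloat, it can also help to look at the index and total sizes. A small sketch with the standard size functions (not specific to pg_repack):

SELECT pg_size_pretty(pg_table_size('t1'))          AS table_size,
       pg_size_pretty(pg_indexes_size('t1'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('t1')) AS total_size;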

Delete records from table:

DELETE FROM t1 WHERE i < 100000;
DELETE 99999
SELECT pg_size_pretty(pg_table_size('t1'));
 pg_size_pretty
----------------
 539 MB
(1 row)
DELETE FROM t1 WHERE i < 900000;
DELETE 800000
SELECT pg_size_pretty(pg_table_size('t1'));
 pg_size_pretty
----------------
 539 MB
(1 row)

Update some records:
   
UPDATE t1 SET nam = 'c'||i WHERE i < 100000;

Check table size: 

SELECT pg_size_pretty(pg_table_size('t1'));
 pg_size_pretty
----------------
 828 MB
(1 row)
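
The growth comes from the dead row versions that MVCC leaves behind. To inspect them directly, one option is the contrib pgstattuple extension (an assumption here: it ships with PostgreSQL contrib and must be installed separately, it is not part of pg_repack):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT dead_tuple_count,
       pg_size_pretty(dead_tuple_len) AS dead_space,
       free_percent
FROM pgstattuple('t1');   -- scans the whole table, so use with care on big tables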

Execute pg_repack command:

pg_repack -d postgres -t t1 --no-order

Online update during repack:

update t1 set nam = 'c'||i where i = 1000000;
UPDATE 1

Check table size after repack:

SELECT pg_size_pretty(pg_table_size('t1'));
 pg_size_pretty
----------------
 490 MB
(1 row)


Monitoring locks:

Using the following query, we monitored the locks held during the repack operation:

 SELECT locktype, mode, relation, c.relname
 FROM pg_locks as l
 INNER JOIN pg_class as c
   ON l.relation = c.oid;

 locktype |       mode       | relation |          relname
----------+------------------+----------+----------------------------
 relation | AccessShareLock  |    33608 | t1_pkey
 relation | AccessShareLock  |    33605 | t1
 relation | RowExclusiveLock |    33641 | log_33605_pkey
 relation | RowExclusiveLock |    33634 | log_33605
 relation | AccessShareLock  |    11069 | pg_locks
 relation | AccessShareLock  |     2663 | pg_class_relname_nsp_index
 relation | AccessShareLock  |     2662 | pg_class_oid_index
 relation | AccessShareLock  |     1259 | pg_class
 relation | AccessShareLock  |    11069 | pg_locks
 relation | SIReadLock       |    33605 | t1
 relation | SIReadLock       |    33634 | log_33605
(11 rows)

After the repack operation, the locks were:
SELECT locktype, mode, relation, c.relname
FROM pg_locks as l
INNER JOIN pg_class as c
  ON l.relation = c.oid;

 locktype |      mode       | relation |          relname
----------+-----------------+----------+----------------------------
 relation | AccessShareLock |     2663 | pg_class_relname_nsp_index
 relation | AccessShareLock |     2662 | pg_class_oid_index
 relation | AccessShareLock |     1259 | pg_class
 relation | AccessShareLock |    11069 | pg_locks
(4 rows)
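
pg_repack does take a short exclusive lock on the table while it swaps in the rebuilt relation, so it is also useful to watch for sessions that end up waiting on locks. A minimal sketch against pg_locks (standard catalog columns only):

SELECT locktype, mode, relation::regclass AS relname, pid
FROM pg_locks
WHERE NOT granted;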

Internal procedures:

pg_repack follows these steps to reclaim the unused space with minimal locking:

1. Set up workspaces and a trigger on the table.
2. Copy tuples into a temp table.
3. Create indexes on the temp table.
4. Apply the log to the temp table until no tuples are left in the log and all of the old transactions have finished.
5. Swap the original table with the new temp table.
6. Drop the temporary tables, triggers and functions.
7. Analyze the new table.
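
While a repack is running, the intermediate objects (such as the log table log_33605 visible in the lock output above) live in the repack schema. A small sketch to list them from another session:

SELECT c.relname, c.relkind
FROM pg_class AS c
JOIN pg_namespace AS n ON c.relnamespace = n.oid
WHERE n.nspname = 'repack';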

Triggers on table:

The pg_repack operation creates a trigger on the specified table:

\d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | not null
 nam    | bytea   |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (i)
Triggers firing always:
    z_repack_trigger BEFORE INSERT OR DELETE OR UPDATE ON t1 FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_33648(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.i)::repack.pk_33648) END, $2)')

References:


http://reorg.github.io/pg_repack/
