Monday, February 10, 2014

Understanding XID Wrap around

0 comments

A few days back, I faced a disaster with one of the databases with xid wrap around issue. I started to dig into the details about the actual case for the failure. There are some very good articles about xid mechanism. I added those as references below.

However, I added my test result here.

Created a brand new cluster:

initdb
   
postgres=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 template1 |         1798
 template0 |         1798
 postgres  |         1798
(3 rows)
   

postgres=# select txid_current();
 txid_current
--------------
         1809
(1 row)

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres92".
testdb=# create table t ( i int);
CREATE TABLE
testdb=# insert into t values (1);
INSERT 0 1
testdb=# select txid_current();
 txid_current
--------------
         1812
(1 row)


testdb=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 template1 |         1798
 template0 |         1798
 postgres  |         1798
 testdb    |         1798
(4 rows)


testdb=# select txid_current();
 txid_current
--------------
         1854
(1 row)


Identified in the source:


In postgresql-9.2.4/src/backend/access/varsup.c,
I found functions:
 TransactionId GetNewTransactionId(bool isSubXact)
 TransactionId ReadNewTransactionId(void)
 void7 SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)

After every start of the database and after every execution of autovacuum process, it calls "SetTransactionIdLimit" to set the transaction xid limit.
It follows the following formula:

xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
xidWarnLimit = xidStopLimit - 10000000; // (10M)
xidStopLimit = xidWrapLimit - 1000000; //(1M)
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);

When db xid crosses xidVacLimit, it startes autovacuum process.
When db xid crosses xidWarnLimit, it warns us.
When db xid crosses xidStopLimit, it stops to generate any new xid with alerts.



In our case:


oldest_datfrozenxid = 1798
MaxTransactionId = 4294967295

xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
MaxTransactionId >> 1 = 2147483647
xidWrapLimit = 2147485445

xidStopLimit = xidWrapLimit - 1000000;  (1M)
xidStopLimit = 2146485445


// We'll start complaining loudly when we get within 10M transactions of the stop point.
xidWarnLimit = xidStopLimit - 10000000; (10M)
xidWarnLimit = 2136485445

//We'll start trying to force autovacuums when oldest_datfrozenxid gets to be more than autovacuum_freeze_max_age transactions old.
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
xidVacLimit = 1798 + 100000000 (100M)


So, I modfieid the values as following to get the alerts:
printf("\nRumman: SetTransactionId: assign\n")
xidVacLimit = 1900;
xidWarnLimit = 1910;
xidStopLimit = 2000;
xidWrapLimit=2500;

So when I started the db, I get the followings in log:   
  
Rumman: SetTransactionId: assign

Rumman: SetTransactionID() : xidVacLimit = 1900
Rumman: SetTransactionID() : xidWarnLimit = 1910
Rumman: SetTransactionID() : xidStopLimit = 2000
Rumman: SetTransactionID() : xidWrapLimit = 2500
Rumman: SetTransactionId: curXid = 1903
Rumman: SetTransactionID: passed xidVacLimit

Then I increased the txid using txid_current() and crossed the value 1910 and got:
Rumman: getTransactionId: passed vacuum limit
WARNING:  database "testdb" must be vacuumed within 590 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
    You might also need to commit or roll back old prepared transactions.
WARNING:  database "testdb" must be vacuumed within 589 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
    You might also need to commit or roll back old prepared transactions.
WARNING:  database "postgres" must be vacuumed within 588 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
    You might also need to commit or roll back old prepared transactions.


No, I modfied the values for xidStopLimit and xidWrapLimit to maximum and run the db:
Rumman: getTransactionId: passed vacuum limit

Rumman: SetTransactionId: assign

Rumman: SetTransactionID() : xidVacLimit = 1900
Rumman: SetTransactionID() : xidWarnLimit = 1910

Rumman: SetTransactionId: curXid = 1912
Rumman: SetTransactionID: passed xidVacLimit

Rumman: Set TransactionID (): passed xidWarnLimit
In psql console:

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres92".
testdb=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 template1 |         1798
 template0 |         1798
 postgres  |         1798
 testdb    |         1798
(4 rows)


No exeucted Vacuum Full and the xid changed:
VACUUM FULL;

testdb=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 template1 |         1798
 template0 |         1798
 postgres  |         1798
 testdb    |         1905
(4 rows)

testdb=# insert into t values (1);
ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and use a standalone backend to vacuum that database.
You might also need to commit or roll back old prepared transactions.

Then, modified the source of varsup.c with its original and run the db and it worked fine.

So, I got into the result:
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age
We should always take care of 
"datfrozenxid + autovacuum_freeze_max_age" 
in order to prevent xid wrap around issue;
Here,
datfrozenxid - is the column of a database's pg_database row is a lower bound on the normal XIDs appearing in that database — it is just the minimum of the per-table relfrozenxid values within the database.

And relfrozenxid is the column of a table's pg_class row contains the freeze cutoff XID that was used by the last whole-table VACUUM for that table.  All normal XIDs older than this cutoff XID are guaranteed to have been replaced by FrozenXID within the table.

References:

http://www.postgresql.org/docs/9.0/static/routine-vacuuming.html



http://www.databasesoup.com/2012/10/freezing-your-tuples-off-part-2.html


Leave a Reply

Labels