
Atomic operations on pg_clog, and pg_subtrans (subtransactions)

2015-10-14 20:41




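Without subtransactions it is easy to keep pg_clog updates atomic. Once subtransactions exist and have been assigned XIDs, and some of those XIDs fall on a different CLOG page from the parent transaction's XID, keeping the transaction consistent requires an atomic write across CLOG pages.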
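
Whether a commit touches more than one CLOG page can be checked with simple arithmetic. The following is a minimal sketch, assuming the default 8 kB block size and 2 status bits per transaction; the helper names clog_page_of and all_on_parent_page are hypothetical and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumed defaults: 8 kB pages, 2 status bits (4 transactions per byte). */
#define BLCKSZ 8192
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

/* CLOG page that holds the status bits of xid (hypothetical helper). */
static uint32_t clog_page_of(TransactionId xid)
{
    return xid / (TransactionId) CLOG_XACTS_PER_PAGE;
}

/* True when every subxid lives on the same CLOG page as the parent xid. */
static bool all_on_parent_page(TransactionId xid,
                               const TransactionId *subxids, int nsubxids)
{
    assert(nsubxids >= 0);
    for (int i = 0; i < nsubxids; i++)
    {
        if (clog_page_of(subxids[i]) != clog_page_of(xid))
            return false;
    }
    return true;
}
```

When all_on_parent_page returns true, a single page lock is enough; otherwise the multi-step procedure is needed.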

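PostgreSQL achieves the atomic CLOG write with a two-phase procedure, similar in spirit to two-phase commit (2PC):

1. First, set the subtransactions that are on CLOG pages other than the main transaction's page to sub-committed.

2. Then, on the CLOG page that holds the main transaction, set the subtransactions to sub-committed, set the main transaction to committed, and set the subtransactions on that page to committed.

3. Finally, set the subtransactions on the other CLOG pages to committed.

The comment in the source describes this: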

src/backend/access/transam/clog.c

/*
 * TransactionIdSetTreeStatus
 *
 * Record the final state of transaction entries in the commit log for
 * a transaction and its subtransaction tree. Take care to ensure this is
 * efficient, and as atomic as possible.
 *
 * xid is a single xid to set status for. This will typically be
 * the top level transactionid for a top level commit or abort. It can
 * also be a subtransaction when we record transaction aborts.
 *
 * subxids is an array of xids of length nsubxids, representing subtransactions
 * in the tree of xid. In various cases nsubxids may be zero.
 *
 * lsn must be the WAL location of the commit record when recording an async
 * commit.  For a synchronous commit it can be InvalidXLogRecPtr, since the
 * caller guarantees the commit record is already flushed in that case.  It
 * should be InvalidXLogRecPtr for abort cases, too.
 *
 * In the commit case, atomicity is limited by whether all the subxids are in
 * the same CLOG page as xid.  If they all are, then the lock will be grabbed
 * only once, and the status will be set to committed directly.  Otherwise
 * we must
 *       1. set sub-committed all subxids that are not on the same page as the
 *              main xid
 *       2. atomically set committed the main xid and the subxids on the same page
 *       3. go over the first bunch again and set them committed
 * Note that as far as concurrent checkers are concerned, main transaction
 * commit as a whole is still atomic.
 *
 * Example:
 *              TransactionId t commits and has subxids t1, t2, t3, t4
 *              t is on page p1, t1 is also on p1, t2 and t3 are on p2, t4 is on p3
 *              1. update pages2-3:
 *                                      page2: set t2,t3 as sub-committed
 *                                      page3: set t4 as sub-committed
 *              2. update page1:
 *                                      set t1 as sub-committed,
 *                                      then set t as committed,
 *                                      then set t1 as committed
 *              3. update pages2-3:
 *                                      page2: set t2,t3 as committed
 *                                      page3: set t4 as committed
 *
 * NB: this is a low-level routine and is NOT the preferred entry point
 * for most uses; functions in transam.c are the intended callers.
 *
 * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
 * but aren't yet in cache, as well as hinting pages not to fall out of
 * cache yet.
 */
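
The example in the comment can be replayed on a toy commit log. This is only an illustrative sketch, not the code in clog.c: the page size of 4 transactions, the one-byte-per-xid array and the names toy_clog, set_subxids and commit_step1..3 are all made up for the demonstration, and no locking is modelled. The four status values are the ones CLOG uses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Status codes stored in the commit log (two bits per transaction). */
enum {
    STATUS_IN_PROGRESS   = 0,
    STATUS_COMMITTED     = 1,
    STATUS_ABORTED       = 2,
    STATUS_SUB_COMMITTED = 3
};

#define TOY_XACTS_PER_PAGE 4    /* hypothetical tiny page, for illustration */
#define TOY_NXIDS          16   /* size of the toy commit log */

static uint8_t toy_clog[TOY_NXIDS];   /* one byte per xid instead of two bits */

static uint32_t toy_page_of(TransactionId xid)
{
    return xid / TOY_XACTS_PER_PAGE;
}

/* Set status on the subxids that are on the main xid's page (same_page = 1)
 * or on the subxids that are on any other page (same_page = 0). */
static void set_subxids(TransactionId xid, const TransactionId *subxids,
                        int nsubxids, int same_page, uint8_t status)
{
    for (int i = 0; i < nsubxids; i++)
    {
        int on_main = (toy_page_of(subxids[i]) == toy_page_of(xid));

        if (on_main == same_page)
        {
            assert(subxids[i] < TOY_NXIDS);
            toy_clog[subxids[i]] = status;
        }
    }
}

/* Step 1: subxids on other pages become sub-committed. */
static void commit_step1(TransactionId xid, const TransactionId *subxids, int n)
{
    set_subxids(xid, subxids, n, 0, STATUS_SUB_COMMITTED);
}

/* Step 2: the main page; the real code does all of this under one page lock. */
static void commit_step2(TransactionId xid, const TransactionId *subxids, int n)
{
    set_subxids(xid, subxids, n, 1, STATUS_SUB_COMMITTED);
    toy_clog[xid] = STATUS_COMMITTED;
    set_subxids(xid, subxids, n, 1, STATUS_COMMITTED);
}

/* Step 3: subxids on other pages become committed. */
static void commit_step3(TransactionId xid, const TransactionId *subxids, int n)
{
    set_subxids(xid, subxids, n, 0, STATUS_COMMITTED);
}
```

With t = 1, t1 = 2, t2 = 4, t3 = 5 and t4 = 8, the toy pages match the comment (p1 = page 0, p2 = page 1, p3 = page 2). Between step 1 and step 2 the subxids on other pages are only sub-committed, and a reader that sees sub-committed has to consult the parent's status, so nothing looks committed until t itself is marked committed.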

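The entry points that callers actually use are in transam.c; clog.c (like subtrans.c) only provides the low-level interfaces.

So what is a subtransaction?

Using a savepoint creates a subtransaction. Like its parent, a subtransaction may consume an XID. Once a subtransaction has been assigned an XID, the CLOG update has to be atomic, because the CLOG entries of the parent transaction and of all its subtransactions must be consistent with one another.

A subtransaction that does not consume an XID is identified by its SubTransactionId.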

src/backend/access/transam/README

Transaction and Subtransaction Numbering
----------------------------------------

Both transactions and subtransactions can have XIDs. Like a top-level transaction, a subtransaction is assigned an XID only when it really needs one, so a transaction that has subtransactions may consume several XIDs. Note also that before a subtransaction can be assigned an XID, its parent must be assigned one first, which guarantees that the subtransaction's XID is allocated after its parent's.

Transactions and subtransactions are assigned permanent XIDs only when/if
they first do something that requires one --- typically, insert/update/delete
a tuple, though there are a few other places that need an XID assigned.
If a subtransaction requires an XID, we always first assign one to its
parent.  This maintains the invariant that child transactions have XIDs later
than their parents, which is assumed in a number of places.

The subsidiary actions of obtaining a lock on the XID and entering it into
pg_subtrans and PG_PROC are done at the time it is assigned.

A transaction that has no XID still needs to be identified for various
purposes, notably holding locks.  For this purpose we assign a "virtual
transaction ID" or VXID to each top-level transaction.  VXIDs are formed from
two fields, the backendID and a backend-local counter; this arrangement allows
assignment of a new VXID at transaction start without any contention for
shared memory.  To ensure that a VXID isn't re-used too soon after backend
exit, we store the last local counter value into shared memory at backend
exit, and initialize it from the previous value for the same backendID slot
at backend start.  All these counters go back to zero at shared memory
re-initialization, but that's OK because VXIDs never appear anywhere on-disk.

How are subtransactions told apart when they have no XID? This is what the SubTransactionId data type is for: the top-level transaction has SubTransactionId 1, and each later subtransaction gets the next value. SubTransactionId is a uint32.

Internally, a backend needs a way to identify subtransactions whether or not
they have XIDs; but this need only lasts as long as the parent top transaction
endures.  Therefore, we have SubTransactionId, which is somewhat like
CommandId in that it's generated from a counter that we reset at the start of
each top transaction.  The top-level transaction itself has SubTransactionId 1,
and subtransactions have IDs 2 and up.  (Zero is reserved for
InvalidSubTransactionId.)  Note that subtransactions do not have their
own VXIDs; they use the parent top transaction's VXID.
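
The numbering rule for SubTransactionId can be sketched as a small counter. This is an illustration under assumptions, not the code in xact.c: the SubXactCounter struct and the two functions are hypothetical; only the type and the two constants mirror definitions in c.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SubTransactionId;

#define InvalidSubTransactionId ((SubTransactionId) 0)
#define TopSubTransactionId     ((SubTransactionId) 1)

/* Hypothetical per-backend state, for illustration only. */
typedef struct
{
    SubTransactionId current;   /* highest id handed out so far */
} SubXactCounter;

/* Starting a top-level transaction resets the counter to 1. */
static SubTransactionId start_top_transaction(SubXactCounter *c)
{
    c->current = TopSubTransactionId;
    return c->current;
}

/* Each subtransaction (e.g. a SAVEPOINT) takes the next value: 2, 3, ... */
static SubTransactionId start_subtransaction(SubXactCounter *c)
{
    /* a subtransaction only makes sense inside a top-level transaction */
    assert(c->current != InvalidSubTransactionId);
    c->current += 1;
    return c->current;
}
```

Because the counter is local to the backend and is reset for every top-level transaction, handing out a SubTransactionId costs nothing in shared memory, unlike assigning an XID.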

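Each XID takes 4 bytes in pg_subtrans (the slot stores the XID of its parent), and the XID of a top-level transaction occupies a slot as well. CLOG, by comparison, uses only 2 bits per transaction, so for the same range of XIDs pg_subtrans produces far more files than CLOG.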
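
The difference in size can be worked out directly. The sketch below assumes the default 8 kB block size and 32 pages per SLRU segment file; segments_needed is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults. */
#define BLCKSZ 8192
#define SLRU_PAGES_PER_SEGMENT 32

/* CLOG: 2 bits per transaction, i.e. 4 transactions per byte. */
#define CLOG_XACTS_PER_PAGE     (BLCKSZ * 4)
/* pg_subtrans: one 4-byte parent XID per transaction. */
#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(uint32_t))

/* Number of segment files needed to cover nxids consecutive XIDs
 * (hypothetical helper, rounds up). */
static uint64_t segments_needed(uint64_t nxids, uint32_t xacts_per_page)
{
    assert(xacts_per_page > 0);
    uint64_t per_segment = (uint64_t) xacts_per_page * SLRU_PAGES_PER_SEGMENT;

    return (nxids + per_segment - 1) / per_segment;
}
```

Under these assumptions one CLOG segment covers 32768 * 32 = 1048576 XIDs, while one pg_subtrans segment covers only 2048 * 32 = 65536, a factor of 16 (32 bits against 2 bits per XID).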

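Note also that a subtransaction is not necessarily assigned an XID, and a subtransaction without an XID has no entry in CLOG. pg_subtrans, like CLOG, is indexed by XID: space is reserved in it for every XID that is allocated (the trace later in this article shows ExtendSUBTRANS being called for each new XID), and the parent is recorded there when a subtransaction is assigned its XID.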

src/backend/access/transam/subtrans.c

/*
 * Defines for SubTrans page sizes.  A page is the same BLCKSZ as is used
 * everywhere else in Postgres.
 *
 * Note: because TransactionIds are 32 bits and wrap around at 0xFFFFFFFF,
 * SubTrans page numbering also wraps around at
 * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
 * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT.  We need take no
 * explicit notice of that fact in this module, except when comparing segment
 * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes).
 */

/* We need four bytes per xact */
#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))

#define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
#define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
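
To see what these macros compute, here is a small self-contained sketch that locates the pg_subtrans slot of an XID. The macros are copied from the excerpt above; the 8 kB BLCKSZ, the value 32 for SLRU_PAGES_PER_SEGMENT, the SubTransLocation struct and the subtrans_locate function are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumed defaults. */
#define BLCKSZ 8192
#define SLRU_PAGES_PER_SEGMENT 32

/* We need four bytes per xact */
#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))

#define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
#define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)

/* Hypothetical result type: where the parent XID of xid is stored. */
typedef struct
{
    uint32_t segno;     /* segment file number */
    uint32_t pageno;    /* page number within pg_subtrans */
    uint32_t entryno;   /* 4-byte slot within that page */
} SubTransLocation;

static SubTransLocation subtrans_locate(TransactionId xid)
{
    SubTransLocation loc;

    loc.pageno = TransactionIdToPage(xid);
    loc.entryno = TransactionIdToEntry(xid);
    loc.segno = loc.pageno / SLRU_PAGES_PER_SEGMENT;
    assert(loc.entryno < SUBTRANS_XACTS_PER_PAGE);
    return loc;
}
```

For example, XID 607466850 falls on page 296614, entry 1378, in segment 9269 under these assumptions.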

Verification:

postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
           5749
(1 row)

Tracing:

[root@digoal ~]# cat trc.stp
global f_start[999999]

probe process("/opt/pgsql/bin/postgres").function("*@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c").call {
  f_start[execname(), pid(), tid(), cpu()] = gettimeofday_ms()
  printf("%s -> time:%d, pp:%s, par:%s\n", thread_indent(-1), gettimeofday_ms(), pp(), $$parms$$)
  # printf("%s -> time:%d, pp:%s\n", thread_indent(1), f_start[execname(), pid(), tid(), cpu()], pp() )
}

probe process("/opt/pgsql/bin/postgres").function("*@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c").return {
  t=gettimeofday_ms()
  a=execname()
  b=cpu()
  c=pid()
  d=pp()
  e=tid()
  if (f_start[a,c,e,b]) {
    printf("%s <- time:%d, pp:%s, par:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d, $$locals$$)
    # printf("%s <- time:%d, pp:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d)
  }
}

Run the following SQL:

postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# begin;  -- the main transaction starts; no XID is assigned yet
BEGIN
postgres=# select txid_current();  -- txid_current() forces an XID to be assigned to the main transaction
 txid_current 
--------------
    607466850
(1 row)
postgres=# savepoint a;  -- start a subtransaction; no XID is assigned; parent XID is 607466850
SAVEPOINT
postgres=# \dt
        List of relations
 Schema | Name | Type  |  Owner   
--------+------+-------+----------
 public | t    | table | postgres
 public | test | table | postgres
(2 rows)
postgres=# delete from t;  -- DML in the subtransaction; XID 607466851 is assigned
DELETE 2
postgres=# rollback to a;  -- roll back the subtransaction and start a new one; no XID is assigned; parent XID is 607466850
ROLLBACK
postgres=# delete from t;  -- DML in the subtransaction; XID 607466852 is assigned
DELETE 2
postgres=# rollback to a;  -- roll back the subtransaction and start a new one; no XID is assigned; parent XID is 607466850
ROLLBACK
postgres=# delete from t;  -- DML in the subtransaction; XID 607466853 is assigned
DELETE 2
postgres=# rollback to a;  -- roll back the subtransaction and start a new one; no XID is assigned; parent XID is 607466850
ROLLBACK
postgres=# insert into t values (1);  -- DML in the subtransaction; XID 607466854 is assigned
INSERT 0 1
postgres=# insert into t values (1);
INSERT 0 1
postgres=# insert into t values (1);
INSERT 0 1
postgres=# savepoint b;  -- start a subtransaction; no XID is assigned; parent XID is 607466854
SAVEPOINT
postgres=# insert into t values (1);  -- DML in the subtransaction; XID 607466855 is assigned
INSERT 0 1
postgres=# insert into t values (1);
INSERT 0 1
postgres=# savepoint c;  -- start a subtransaction; no XID is assigned; parent XID is 607466855
SAVEPOINT
postgres=# insert into t values (1);  -- DML in the subtransaction; XID 607466856 is assigned
INSERT 0 1
postgres=# savepoint d;  -- start a subtransaction; no XID is assigned; parent XID is 607466856
SAVEPOINT
postgres=# insert into t values (1);  -- DML in the subtransaction; XID 607466857 is assigned
INSERT 0 1
postgres=# rollback to a;  -- roll back the subtransaction and start a new one; no XID is assigned; parent XID is 607466850
ROLLBACK
postgres=# insert into t values (1);  -- DML in the subtransaction; XID 607466858 is assigned
INSERT 0 1
postgres=# select txid_current();  -- check the XID of the main transaction
 txid_current 
--------------
    607466850
(1 row)

Trace output:

[root@digoal ~]# stap -vp 5 -DMAXSKIPPED=9999999 -DSTP_NO_OVERLOAD -DMAXTRYLOCK=100 ./trc.stp -x 5749
Pass 1: parsed user script and 112 library script(s) using 209284virt/36876res/3172shr/34504data kb, in 110usr/90sys/192real ms.
Pass 2: analyzed script: 36 probe(s), 33 function(s), 4 embed(s), 27 global(s) using 223660virt/51416res/4248shr/48880data kb, in 0usr/130sys/134real ms.
Pass 3: using cached /root/.systemtap/cache/28/stap_282339931bbfe754a24af75ea3476930_35559.c
Pass 4: using cached /root/.systemtap/cache/28/stap_282339931bbfe754a24af75ea3476930_35559.ko
Pass 5: starting run.
     0 postgres(5749): -> time:1441519748850, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466848
    22 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
20726607 postgres(5749): -> time:1441519769576, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466849
20726671 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
69692931 postgres(5749): -> time:1441519818543, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466850
69692991 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
85924642 postgres(5749): -> time:1441519834774, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466851
85924720 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
85924766 postgres(5749): -> time:1441519834774, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466851 parent=607466850 overwriteOK='\000'
85924838 postgres(5749): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466851 ptr=0
102973659 postgres(5749): -> time:1441519851823, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466852
102973718 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
102973746 postgres(5749): -> time:1441519851823, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466852 parent=607466850 overwriteOK='\000'
102973782 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466852 ptr=0
112206905 postgres(5749): -> time:1441519861057, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466853
112206964 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
112206992 postgres(5749): -> time:1441519861057, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466853 parent=607466850 overwriteOK='\000'
112207028 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466853 ptr=0
152610154 postgres(5749): -> time:1441519901460, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466854
152610212 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
152610238 postgres(5749): -> time:1441519901460, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466854 parent=607466850 overwriteOK='\000'
152610275 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466854 ptr=0
167139858 postgres(5749): -> time:1441519915990, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466855
167139929 postgres(5749): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
167139958 postgres(5749): -> time:1441519915990, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466855 parent=607466854 overwriteOK='\000'
167139995 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466855 ptr=0
184727823 postgres(5749): -> time:1441519933578, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466856
184727849 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
184727859 postgres(5749): -> time:1441519933578, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466856 parent=607466855 overwriteOK='\000'
184727872 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466856 ptr=0
228240429 postgres(5749): -> time:1441519977090, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466857
228240493 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
228240520 postgres(5749): -> time:1441519977090, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466857 parent=607466856 overwriteOK='\000'
228240557 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466857 ptr=0
316079437 postgres(5749): -> time:1441520064929, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").call, par:newestXact=607466858
316079496 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("ExtendSUBTRANS@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:307").return, par:pageno=?
316079524 postgres(5749): -> time:1441520064929, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").call, par:xid=607466858 parent=607466850 overwriteOK='\000'
316079560 postgres(5749): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SubTransSetParent@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/subtrans.c:75").return, par:pageno=? entryno=? slotno=607466858 ptr=0
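
The xid/parent pairs passed to SubTransSetParent in the trace form a tree that can be walked back to the top-level transaction. The sketch below keeps the pairs in a small in-memory table; the ParentLink type, the links array and the functions parent_of and topmost_of are hypothetical stand-ins for the lookups that subtrans.c performs against pg_subtrans, written only to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* One recorded SubTransSetParent(xid, parent) call. */
typedef struct
{
    TransactionId xid;
    TransactionId parent;
} ParentLink;

/* xid/parent pairs copied from the SubTransSetParent calls in the trace. */
static const ParentLink links[] = {
    {607466851, 607466850}, {607466852, 607466850},
    {607466853, 607466850}, {607466854, 607466850},
    {607466855, 607466854}, {607466856, 607466855},
    {607466857, 607466856}, {607466858, 607466850},
};

/* Parent of xid, or InvalidTransactionId if none was recorded. */
static TransactionId parent_of(TransactionId xid)
{
    for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); i++)
    {
        if (links[i].xid == xid)
            return links[i].parent;
    }
    return InvalidTransactionId;
}

/* Follow parent links until reaching an xid that has no recorded parent. */
static TransactionId topmost_of(TransactionId xid)
{
    TransactionId parent;

    assert(xid != InvalidTransactionId);
    while ((parent = parent_of(xid)) != InvalidTransactionId)
        xid = parent;
    return xid;
}
```

For instance, 607466857 (assigned after savepoint d) resolves through 607466856, 607466855 and 607466854 to the main transaction 607466850, which matches the nesting of savepoints a, b, c and d in the SQL session.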

Open a new session and you will see that the subtransactions consumed XIDs as well: newly allocated XIDs now start at 607466859.

postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# select txid_current();
 txid_current 
--------------
    607466859
(1 row)

[References]

src/backend/access/transam/clog.c

src/backend/access/transam/subtrans.c

src/backend/access/transam/transam.c

src/backend/access/transam/README

src/include/c.h:typedef uint32 SubTransactionId;