您的位置:首页 > 其它

处理琐表——这是日常工作的10%的工作

2005-09-20 08:05 309 查看
业务人员反映说业务拥堵的利害,于是看看等待事件,发现大量的顽固enqueue,分别在RAC的两个节点:
INST_ID SID EVENT P1 P2
---------- ---------- ---------------------------------------------------------------- ---------- ----------
2 1 pmon timer 300 0
2 167 enqueue 1415053318 720913
2 486 enqueue 1415053318 720913
2 1168 enqueue 1415053318 720913
2 1228 enqueue 1415053318 720913
2 1044 enqueue 1415053318 720913
2 836 enqueue 1415053318 720913
2 1259 enqueue 1415053318 720913
2 1301 enqueue 1415053318 720913
2 2 async disk IO 4294967295 0
2 4 ges remote message 32 0
2 5 gcs remote message 64 0
2 7 gcs remote message 64 0
2 806 db file sequential read 20 350884
2 33 global cache cr request 12 132456
2 19 smon timer 300 0
2 1043 row cache lock 13 0
2 368 PX Deq: reap credit 0 0
1 1 pmon timer 300 0
1 218 enqueue 1415053318 720913
1 1245 enqueue 1415053318 720913
1 1085 enqueue 1415053318 720913
1 1077 enqueue 1415053318 720913
1 906 enqueue 1415053318 720913
1 620 enqueue 1415053318 720913
1 496 enqueue 1415053318 720913
1 469 enqueue 1415053318 720913
1 2 async disk IO 4294967295 0
1 4 ges remote message 32 0
1 3 CGS wait for IPC msg 0 0
1 5 gcs remote message 64 0
1 7 gcs remote message 64 0
1 488 buffer busy waits 2 25
1 124 global cache cr request 12 132547
1 872 global cache cr request 12 51407
1 19 smon timer 300 0
1 43 PX Deq: reap credit 0 0
1 634 PX Deq: Execute Reply 200 1
。。。
这些信息还很难判断是哪个节点的那个SESSION阻塞了其他节点的其他会话,于是看所得情况:
然后我们再看看GV$LOCK表:
INST_ID ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------- ---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
2 C000000175493968 C000000175493AE0 1043 TX 720913 4688389 6 0 53380 1
Elapsed: 00:00:00.29
这里我们找到了,原来是节点2的会话1043阻塞了其他人,他以TX模式持有锁。
其他相关的锁信息:
INST_ID ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------- ---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
2 C00000015F124490 C00000015F1244B0 1301 TX 720913 4688389 0 6 22277 0
2 C00000015F1233E8 C00000015F123408 1259 TX 720913 4688389 0 6 9764 0
2 C00000015F123E18 C00000015F123E38 1228 TX 720913 4688389 0 6 22258 0
2 C00000015F1240C0 C00000015F1240E0 1168 TX 720913 4688389 0 6 22233 0
2 C00000015F124518 C00000015F124538 1044 TX 720913 4688389 0 6 22271 0
2 C000000175493968 C000000175493AE0 1043 TX 720913 4688389 6 0 53418 1
2 C00000015F1242F8 C00000015F124318 836 TX 720913 4688389 0 6 22282 0
2 C00000015F10D980 C00000015F10D9A0 486 TX 720913 4688389 0 6 22295 0
2 C00000015F123470 C00000015F123490 167 TX 720913 4688389 0 6 9579 0
1 C0000001601237A0 C0000001601237C0 1245 TX 720913 4688389 0 6 22241 0
1 C000000160123470 C000000160123490 1085 TX 720913 4688389 0 6 22295 0
1 C00000016010D980 C00000016010D9A0 1077 TX 720913 4688389 0 6 22328 0
1 C0000001601239D8 C0000001601239F8 906 TX 720913 4688389 0 6 7106 0
1 C0000001601232D8 C0000001601232F8 620 TX 720913 4688389 0 6 22310 0
1 C000000160123828 C000000160123848 496 TX 720913 4688389 0 6 6353 0
1 C000000160123608 C000000160123628 469 TX 720913 4688389 0 6 22274 0
1 C00000016010D8F8 C00000016010D918 218 TX 720913 4688389 0 6 20691 0

记录下来阻塞会话的所有信息和正在执行的sql之后,尝试在SQLPLUS中kill这个会话:
alter system kill session '1043,57526';
但是几分钟过去了,这个会话一直hang着。
于是看了一下是否有活动事务:
SQL> select * from v$transaction;
ADDR XIDUSN XIDSLOT XIDSQN UBAFIL UBABLK UBASQN UBAREC STATUS START_TIME
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- --------------------
START_SCNB START_SCNW START_UEXT START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR FLAG SPA REC NOU PTX
---------- ---------- ---------- ------------ ------------ ------------ ------------ ---------------- ---------- --- --- --- ---
NAME
------------------------------------------------------------------------------------------------------------------------------------
PRV_XIDUSN PRV_XIDSLT PRV_XIDSQN PTX_XIDUSN PTX_XIDSLT PTX_XIDSQN DSCN-B DSCN-W USED_UBLK USED_UREC LOG_IO PHY_IO
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_GET CR_CHANGE
---------- ----------
C000000175493968 11 17 4688389 71 18612 6281 1 ACTIVE 09/16/05 01:17:24
1649596707 261 105 401 11839 6203 40 C00000015DDE5F18 67116675 NO NO NO NO
0 0 0 0 0 0 0 0 54664 3640075 45491178 66836
1200506898 2

Elapsed: 00:00:00.07
SQL>

好家伙,有个150M左右的大事务正在处理。至此,我基本猜到原因了,由于这个事务需要实行的时间太长,而同时又是一个需要并发操作的业务,因此,造成了拥堵。
业务还是在拥堵,于是尝试利用前面记录下来的信息,在OS系统上直接kill进程:
kill 20166
之后,这个SESSION是立刻不见了:
SQL> select sid,status from v$session where sid=1043;
no rows selected
Elapsed: 00:00:00.01
SQL>
业务拥堵也缓解了很多,可是我们看到等待事件里面的enqueue基本都变成了transaction,在等待和不断的查看业务一段时间后,也切恢复正常了。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: