Case Analysis: Instance Crash Caused by ORA-04031

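2015-11-03 15:34
Today I ran into an Oracle database crash. Below I analyze and interpret its cause, and along the way record the background and outcome of the case to build up some experience and practice analyzing and solving problems.

Environment:
Operating system: Oracle Linux Server release 5.7 64bit
Database version: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production

Analysis:
When I received the alert and checked the database, the instance was already down. The alert log contained the following errors: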

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov  2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov  2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
Mon Nov  2 11:43:08 2015
RECO: terminating instance due to error 4031
Mon Nov  2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate  bytes of shared memory ("","","","")
Instance terminated by RECO, pid = 6569




The alert log shows that the ORA-00604 and ORA-04031 errors led to this crash (RECO: terminating instance due to error 4031):
$ oerr ora 4031
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared
//          pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".
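An ORA-04031 error generally has one of two causes:
1. Shared memory is heavily fragmented, so an allocation cannot find a contiguous chunk that is large enough. This usually has to be tackled on the development side, for example by using bind variables to reduce hard parsing.
2. There is simply not enough memory, and it needs to be enlarged.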
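To judge how much the first cause applies, the library cache can be inspected for statements that differ only in their literals. The queries below are a minimal sketch, assuming a release where v$sql exposes force_matching_signature (10gR2 or later); the threshold of 20 copies is an arbitrary value chosen for illustration, not a recommendation from this case.

```sql
-- Statements that would collapse into one cursor if literals were
-- replaced by bind variables. The threshold 20 is an assumed cut-off.
SELECT force_matching_signature,
       COUNT(*)      AS copies,
       MIN(sql_text) AS sample_sql
FROM   v$sql
WHERE  force_matching_signature <> 0
GROUP  BY force_matching_signature
HAVING COUNT(*) > 20
ORDER  BY copies DESC;

-- Cumulative parse counters since instance startup; a large share of
-- hard parses suggests missing bind variables.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');
```

Signatures with many copies are candidates for rewriting with bind variables.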
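This machine has 8 GB of physical memory, yet when I checked, the SGA had been given only 1168 MB, not even 2 GB. I was speechless. The ASH Report was used to look at the Buffer Cache and Shared Pool sizes before and after the crash.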
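The SGA settings themselves can be confirmed from the dynamic views on a running instance. This is a sketch rather than the exact check used in this case; it assumes automatic SGA management is in use, so that v$sga_dynamic_components reflects the current distribution.

```sql
-- Configured SGA-related parameters (values are reported in bytes)
SELECT name, value
FROM   v$parameter
WHERE  name IN ('sga_max_size', 'sga_target',
                'shared_pool_size', 'db_cache_size');

-- Current, minimum and maximum size of each dynamic SGA component, in MB
SELECT component,
       current_size / 1024 / 1024 AS current_mb,
       min_size     / 1024 / 1024 AS min_mb,
       max_size     / 1024 / 1024 AS max_mb
FROM   v$sga_dynamic_components
ORDER  BY current_size DESC;
```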








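Looking at the trace file, we can see the wait event 'SGA: allocation forcing component growth'. This confirms that the failure came from the SGA being unable to grow; in other words, the SGA was exhausted. Together with the ASH Report, we can see that the Shared Pool had already reached close to 69.6% of the SGA at that point.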

SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
(session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0
=0, =0, =0
Dumping Session Wait History
for 'SGA: allocation forcing component growth' count=1 wait_time=10714
=0, =0, =0
for 'SGA: allocation forcing component growth' count=1 wait_time=10512
=0, =0, =0
for 'latch: shared pool' count=1 wait_time=892
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=28
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=51
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=114
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=120
address=600e7320, number=d6, tries=0
for 'latch: library cache' count=1 wait_time=33
address=a3fa46e8, number=d7, tries=1
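The trace above is a post-mortem view. On a running instance, shared pool pressure of this kind can be watched before it becomes fatal. The queries below are only a sketch of such a check and were not part of the original investigation; the counters in v$shared_pool_reserved are reset at instance startup.

```sql
-- Free memory currently left in the shared pool, in MB
SELECT pool,
       name,
       ROUND(bytes / 1024 / 1024, 1) AS mb
FROM   v$sgastat
WHERE  pool = 'shared pool'
AND    name = 'free memory';

-- Allocation misses and failures in the reserved area; a growing
-- request_failures count means ORA-04031 errors are already occurring.
SELECT free_space,
       request_misses,
       request_failures,
       last_failure_size
FROM   v$shared_pool_reserved;
```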




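Taken together, the analysis above shows that the unreasonable SGA setting allowed the shared pool memory to be completely exhausted. Adjusting the SGA parameters is therefore the right fix. Since this database had also been running normally for quite a long time, I went through the AWR and ADDM reports as well and found that hard parsing on the system was quite severe. I also watched the shared pool for a while with the script below and found that it shrank and grew fairly frequently.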

SELECT start_time,
       component,
       oper_type,
       oper_mode,
       initial_size / 1024 / 1024 "INITIAL",
       final_size / 1024 / 1024 "FINAL",
       end_time
FROM   v$sga_resize_ops
WHERE  component IN ('DEFAULT buffer cache', 'shared pool')
       AND status = 'COMPLETE'
ORDER  BY start_time,
          component;

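This can be addressed by setting the SHARED_POOL_SIZE parameter, which guarantees that the shared pool will not shrink below that size when memory is tight. In addition, the interval between SGA resize operations can be set, where n is the interval in seconds:

ALTER SYSTEM SET "_memory_broker_stat_interval"=n SCOPE=SPFILE;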
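The SGA adjustment and the shared pool floor described above might look like the following. The post does not state the values that were actually applied, so every size here is a hypothetical placeholder for an 8 GB host and should be derived from the real workload. Because sga_max_size is a static parameter, the changes are written to the spfile and take effect after an instance restart.

```sql
-- Hypothetical sizes for illustration only; not the values used in this case.
ALTER SYSTEM SET sga_max_size = 4G SCOPE = SPFILE;
ALTER SYSTEM SET sga_target = 4G SCOPE = SPFILE;

-- With sga_target set, shared_pool_size acts as a lower bound
-- for the automatically tuned shared pool.
ALTER SYSTEM SET shared_pool_size = 800M SCOPE = SPFILE;
```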

The problem is solved, but what really needs reflection is why SGA_MAX_SIZE was set to 1168M in the first place, and why routine inspections never caught it.