
ora-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [

2012-02-21 16:19
author:skate

time:2012/02/22


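Around midnight last night I received a monitoring SMS alert (dataguard is down), so I logged in to the system to find out why.

I started with the standby database's alert log. Every entry from the last half hour looked like this: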

........

Tue Feb 21 00:02:03 2012

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Feb 21 00:02:03 2012

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Feb 21 00:02:05 2012

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Feb 21 00:02:05 2012

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Feb 21 00:02:06 2012

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

.......

Looking further back in the alert log:

.....

Mon Feb 20 09:35:59 2012

Archived Log entry 10127 added for thread 1 sequence 11099 ID 0x263e89b dest 1:

Mon Feb 20 10:01:06 2012

RFS[6]: Selected log 13 for thread 1 sequence 11101 dbid 40093083 branch 760555291

Mon Feb 20 10:01:06 2012

Media Recovery Waiting for thread 1 sequence 11101 (in transit)

Recovery of Online Redo Log: Thread 1 Group 13 Seq 11101 Reading mem 0

Mem# 0: /oracle/oradata/skate01/standbyredo13.log

Mon Feb 20 10:01:14 2012

Archived Log entry 10128 added for thread 1 sequence 11100 ID 0x263e89b dest 1:

Mon Feb 20 10:03:58 2012

Errors in file /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_ora_17783.trc (incident=264961):

ora-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264961/skate01_ora_17783_i264961.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Feb 20 10:04:27 2012

Errors in file /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_mmon_1590.trc (incident=264121):

ora-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264121/skate01_mmon_1590_i264121.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Feb 20 10:04:29 2012

Restarting dead background process MMON

Mon Feb 20 10:04:29 2012

MMON started with pid=15, OS id=17808

Mon Feb 20 10:04:29 2012

Dumping diagnostic data in directory=[cdmp_20120220100429], requested by (instance=1, osid=1590 (MMON)), summary=[incident=264121].

Errors in file /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_mmon_17808.trc (incident=264122):

ora-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264122/skate01_mmon_17808_i264122.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_mmon_17808.trc (incident=264123):

ora-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264123/skate01_mmon_17808_i264123.trc

Dumping diagnostic data in directory=[cdmp_20120220100432], requested by (instance=1, osid=17808 (MMON)), summary=[incident=264122].

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Feb 20 10:04:52 2012

.......
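To pin down when an error first appeared in a long alert log like the excerpt above, a small filter can help. The following is only a sketch: the function name `first_error_time` is hypothetical, and it assumes the plain-text alert log layout shown here, where each entry is preceded by a timestamp line such as `Mon Feb 20 10:03:58 2012`.

```shell
#!/bin/sh
# first_error_time CODE: read an alert log on stdin and print the timestamp
# line that precedes the first line mentioning CODE (hypothetical helper).
first_error_time() {
    code="$1"
    awk -v code="$code" '
        # Remember the most recent timestamp line, e.g. "Mon Feb 20 10:03:58 2012".
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] +[0-9]+ [0-9:]+ [0-9]+$/ { ts = $0; next }
        # Compare case-insensitively, because this log prints "ora-00600" in lower case.
        index(toupper($0), toupper(code)) { print ts; exit }
    '
}
```

For example, `first_error_time ORA-00600 < alert_skate01.log` would print the timestamp of the first ORA-00600 entry. The file name `alert_skate01.log` is an assumption based on the usual `alert_<SID>.log` naming.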

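The ORA-600 errors had already started at "Mon Feb 20 10:03:58 2012". First, check what state the database is in now and whether it looks healthy.

1. $ ps -ef |grep $ORACLE_SID   # check that the Oracle processes are running

2. $ netstat -an | grep 1588| wc -l   # check whether Oracle has any connections

3. Check the OS status: vmstat, top, iostat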
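Note that `grep 1588` in step 2 matches the string 1588 anywhere in a line, for example in a remote port or an address. A slightly stricter count, sketched below, looks only at established TCP connections whose local port matches. The function name `count_established` is hypothetical, the column positions assume Linux `netstat -an` output, and treating 1588 as the listener port is an assumption drawn from the command above.

```shell
#!/bin/sh
# count_established PORT: read `netstat -an` output on stdin and count
# ESTABLISHED TCP connections whose local address ends in :PORT.
count_established() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            n = split($4, parts, ":")   # column 4 is the local address
            if (parts[n] == port) count++
        }
        END { print count + 0 }         # print 0 when nothing matched
    '
}
```

It could be used as `netstat -an | count_established 1588`.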



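None of these checks showed anything abnormal. I remembered that a project had been migrated onto this active standby on the 20th, which might be related, so I tried to log in to the database to investigate further. The login failed with the following errors: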

[root@skate01 ~]# su - oracle

[oracle@skate01 ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 11.2.0.2.0 Production on Tue Feb 21 00:26:12 2012

Copyright (c) 1982, 2010, Oracle. All rights reserved.

ERROR:

ORA-01075: you are currently logged on

Enter user-name:

ERROR:

ORA-01017: invalid username/password; logon denied

SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus

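I tried twice and got the same errors both times, so logging in was impossible. The database service appeared to have hung, and a restart looked like the only option. ORA-01075 is usually caused by a full disk or by auditing, but I checked and neither applied in my environment, so I killed the processes at the OS level with the following two commands:



1. $ ps -ef |grep $ORACLE_SID|grep -v grep|awk '{print $2}' | xargs kill -9   # kill the processes

2. $ ipcs -m | grep oracle | awk '{print $2}' | xargs ipcrm shm   # remove Oracle's shared memory segments

First, list the processes that will be killed: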

[oracle@skate01 ~]$ ps -ef |grep $ORACLE_SID|grep -v grep |grep -v avahi

Kill the processes:

[oracle@skate01 ~]$ ps -ef |grep $ORACLE_SID|grep -v grep |grep -v avahi |awk '{print $2}' | xargs kill -9

Killing only the Oracle processes is not enough; you still cannot log in to Oracle.
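Because `grep $ORACLE_SID` matches any line containing the SID (which is why `avahi` had to be filtered out above), it may be safer to select processes by name. The sketch below is an alternative under stated assumptions, not what was run here: `oracle_pids` is a hypothetical helper that relies on the usual Unix naming of Oracle processes (`ora_<name>_<SID>` for background processes, `oracle<SID>` for server processes) and on the `ps -ef` column layout.

```shell
#!/bin/sh
# oracle_pids SID: read `ps -ef` output on stdin and print the PIDs of
# processes whose command name follows Oracle's naming for that SID.
oracle_pids() {
    sid="$1"
    awk -v sid="$sid" '
        # Column 8 is the first word of CMD; column 2 is the PID.
        $8 ~ ("^ora_[a-z0-9]+_" sid "$") || $8 == ("oracle" sid) { print $2 }
    '
}
```

Running `ps -ef | oracle_pids "$ORACLE_SID"` first lets you review the list before piping it to `xargs kill -9`.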



List the shared memory segments that will be removed:

[oracle@skate01 ~]$ ipcs -m | grep oracle | awk '{print $2}'

Remove the shared memory segments:

[oracle@skate01 ~]$ ipcs -m | grep oracle | awk '{print $2}' | xargs ipcrm shm

resource(s) deleted

[oracle@skate01 ~]$

[oracle@skate01 ~]$ ipcs -m | grep oracle | awk '{print $2}'
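The `grep oracle` filter used above matches the word anywhere in the `ipcs -m` output. A stricter variant, sketched here as an assumption rather than the commands actually run, compares the owner column exactly. The helper name `shm_ids_for_owner` is hypothetical, and the column order (key, shmid, owner, ...) assumes the Linux `ipcs -m` layout.

```shell
#!/bin/sh
# shm_ids_for_owner OWNER: read `ipcs -m` output on stdin and print the
# shmid of every segment whose owner column equals OWNER.
shm_ids_for_owner() {
    owner="$1"
    # Column 2 is shmid, column 3 is owner; the numeric check skips header lines.
    awk -v owner="$owner" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

With GNU xargs this could be combined as `ipcs -m | shm_ids_for_owner oracle | xargs -r -n 1 ipcrm -m`, which removes one segment per call and does nothing when the list is empty.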

Try logging in to Oracle:

[oracle@skate01 ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 11.2.0.2.0 Production on Tue Feb 21 00:47:36 2012

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup nomount;

ORACLE instance started.

Total System Global Area 3.5275E+10 bytes

Fixed Size 2233656 bytes

Variable Size 3623881416 bytes

Database Buffers 3.1541E+10 bytes

Redo Buffers 108003328 bytes

SQL> alter database mount standby database;

Database altered.

SQL> alter database open read only;

Database altered.

SQL> alter database recover managed standby database disconnect using current logfile;

Database altered.

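I then checked the alert log for anything abnormal and it all looked fine. After confirming that the OS level was healthy too, I logged in to the database to check whether Data Guard was healthy.

1. Time lag between the standby and the primary (run on the standby):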

select 'Last applied : ' Logs,

to_char(next_time, 'DD-MON-YY:HH24:MI:SS') Time

from v$archived_log

where sequence# =

(select max(sequence#) from v$archived_log where applied = 'YES')

union

select 'Last received : ' Logs,

to_char(next_time, 'DD-MON-YY:HH24:MI:SS') Time

from v$archived_log

where sequence# = (select max(sequence#) from v$archived_log);



2. Check the activity status of the processes (run on the standby):

select process, status, thread#, sequence#, block#, blocks

from v$managed_standby;

3. Check the log recovery speed:

select * from v$dataguard_status;

select * from v$recovery_progress;
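For routine monitoring, the "last received" versus "last applied" comparison in query 1 can be reduced to a single number. The sketch below is an assumption, not part of the original checks: the hypothetical helper `apply_gap` reads lines of the form `SEQUENCE# APPLIED` (for example the output of `select sequence#, applied from v$archived_log;`) and prints how many sequences separate the highest received log from the highest applied one.

```shell
#!/bin/sh
# apply_gap: read "SEQUENCE# APPLIED" pairs on stdin and print
# (highest sequence seen) - (highest sequence with APPLIED = YES).
apply_gap() {
    awk '
        $1 ~ /^[0-9]+$/ {
            seq = $1 + 0
            if (seq > received) received = seq
            if ($2 == "YES" && seq > applied) applied = seq
        }
        END { print received - applied }
    '
}
```

One way to feed it would be `sqlplus -s "/ as sysdba"` with `set heading off feedback off pagesize 0` before the query, so that only the data rows reach the pipe. A result of 0 means the standby has applied everything it has received.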



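Having confirmed that the database was healthy again, I went back to find out why it had gone down and why it had reported ORA-600.

Looking at the trace file: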

[root@skate01 ~]# more /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264961/skate01_ora_17783_i264961.trc

Dump file /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264961/skate01_ora_17783_i264961.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/app/product/11.2.0/db_1

System name: Linux

Node name: skate01

Release: 2.6.18-194.el5

Version: #1 SMP Fri Apr 2 14:58:14 EDT 2010

Machine: x86_64

Instance name: skate01

Redo thread mounted by this instance: 1

Oracle process number: 120

Unix process pid: 17783, image: oracle@skate01

*** 2012-02-20 10:03:58.215

*** SESSION ID:(17.5) 2012-02-20 10:03:58.215

*** CLIENT ID:() 2012-02-20 10:03:58.215

*** SERVICE NAME:(SYS$USERS) 2012-02-20 10:03:58.215

*** MODULE NAME:(JDBC Thin Client) 2012-02-20 10:03:58.215

*** ACTION NAME:() 2012-02-20 10:03:58.215



Dump continued from file: /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_ora_17783.trc

ORA-00600: internal error code, arguments: [KGHLKREM1], [0x838000020], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 264961 (ORA 600 [KGHLKREM1]) ========

----- Beginning of Customized Incident Dump(s) -----

***** Internal heap ERROR KGHLKREM1 addr=0x838000020 ds=0x60001188 *****

***** Dump of memory around addr 0x838000020:

837FFF020 00000000 00000000 00000000 00000000 [................]

Repeat 511 times

Recovery state: ds=0x60001188 rtn=(nil) *rtn=(nil) szo=0 u4o=0 hdo=0 off=0

Szo:

UB4o:

Hdo:

Off:

Hla: 0

******************************************************

HEAP DUMP heap name="sga heap" desc=0x60001188

extent sz=0x9800 alt=248 het=32767 rec=9 flg=-126 opc=4

parent=(nil) owner=(nil) nex=(nil) xsz=0x0 heap=(nil)

fl2=0x60, nex=(nil)

ds for latch 1: 0x600551d8 0x60056a30 0x60058288 0x60059ae0

ds for latch 2: 0x6005eaa0 0x600602f8 0x60061b50 0x600633a8

reserved granule count 12 (granule size 134217728)

----- End of Customized Incident Dump(s) -----

*** 2012-02-20 10:03:58.341

dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)

----- Current SQL Statement for this session (sql_id=40p7rprfbt1as) -----

select 'a' from dual

----- Call Stack Trace -----

calling call entry argument values in hex

location type point (? means dubious value)

-------------------- -------- -------------------- ----------------------------

skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?

7FFF0B5CCD88 ? 000000001 ?

000000001 ? 000000002 ?

ksedst1()+98 call skdstdst() 000000000 ? 000000000 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

ksedst()+34 call ksedst1() 000000000 ? 000000001 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

dbkedDefDump()+2741 call ksedst() 000000000 ? 000000001 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

ksedmp()+36 call dbkedDefDump() 000000003 ? 000000002 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

ksfdmp()+64 call ksedmp() 000000003 ? 000000002 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ?

7FFF0B5CCD88 ? 000000001 ?

000000000 ? 000000002 ?

dbgexExplicitEndInc call dbgexPhaseII() 2B1892AF1710 ? 2B1892EA06A8 ?

()+750 7FFF0B5D88C0 ? 000000001 ?

000000000 ? 000000002 ?

dbgeEndDDEInvocatio call dbgexExplicitEndInc 2B1892AF1710 ? 2B1892EA06A8 ?

nImpl()+767 () 7FFF0B5D88C0 ? 000000001 ?

000000000 ? 000000002 ?

dbgeEndDDEInvocatio call dbgeEndDDEInvocatio 2B1892AF1710 ? 2B1892EA06A8 ?

n()+47 nImpl() 7FFF0B5D88C0 ? 000000001 ?

000000000 ? 000000002 ?

kghnerror()+394 call dbgeEndDDEInvocatio 2B1892AF1710 ? 2B1892EA06A8 ?

n() 7FFF0B5D88C0 ? 000000001 ?

000000000 ? 000000002 ?

kghadd_reserved_ext call kghnerror() 00B7CCEA0 ? 060001188 ?

ent()+945 00A0EF0C0 ? 838000020 ?

100000000 ? 000000002 ?

kghget_reserved_ext call kghadd_reserved_ext 00B7CCEA0 ? 060001188 ?

ent()+526 ent() 060059AE0 ? 060059B28 ?

000000000 ? 000000000 ?

kghgex()+1455 call kghget_reserved_ext 00B7CCEA0 ? 060004CD8 ?

ent() 060059AE0 ? 060059B28 ?

000000000 ? 000000000 ?

kghfnd()+734 call kghgex() 00B7CCEA0 ? 060004CD8 ?

060059AE0 ? 000001058 ?

000000000 ? 000000000 ?

kghalo()+536 call kghfnd() 00B7CCEA0 ? 000000000 ?

060004CD8 ? 000000000 ?

060059AE0 ? 7FFF0B5C95C0 ?

kghgex()+437 call kghalo() 00B7CCEA0 ? 060059AE0 ?

000001000 ? 000001000 ?

060059AE0 ? 060004CD8 ?

kghalf()+395 call kghgex() 00B7CCEA0 ? 000000000 ?

85DD58D08 ? 000000FD0 ?

060059AE0 ? 060004CD8 ?

kksLoadChild()+2785 call kghalf() 00B7CCEA0 ? 85DD58D08 ?

000000001 ? 060004CD8 ?

000000000 ? 009A98F10 ?

kxsGetRuntimeLock() call kksLoadChild() 00B7CCEA0 ? 88FD354E0 ?

+2061 7FFF0B5DB5B0 ? 2B1892F59070 ?

85DD586F8 ? 000000000 ?

kksfbc()+14522 call kxsGetRuntimeLock() 00B7CCEA0 ? 2B1892F59070 ?

7FFF0B5DB5B0 ? 2B1892F59070 ?

85DD586F8 ? 88FD354E0 ?

kkspsc0()+2020 call kksfbc() 2B1892F59070 ? 000000003 ?

000000108 ? 7FFF0B5DD6F8 ?

000000015 ? 000000000 ?

kksParseCursor()+13 call kkspsc0() 2B1892F41BB8 ? 7FFF0B5DD6F8 ?

9 000000015 ? 000000003 ?

000000006 ? 0000000A4 ?

opiosq0()+2022 call kksParseCursor() 7FFF0B5DC0D0 ? 7FFF0B5DD6F8 ?

000000015 ? 000000003 ?

000000006 ? 0000000A4 ?

kpooprx()+269 call opiosq0() 000000003 ? 00000000E ?

7FFF0B5DC2A0 ? 0000000A4 ?

000000000 ? 7FFF0B5DBFB0 ?

kpoal8()+795 call kpooprx() 7FFF0B5DF694 ? 7FFF0B5DD6F8 ?

000000014 ? 000000001 ?

000000000 ? 7FFF0B5DBFB0 ?

opiodr()+910 call kpoal8() 00000005E ? 00000001C ?

7FFF0B5DF690 ? 000000001 ?

000000000 ? 000000001 ?

ttcpip()+2289 call opiodr() 00000005E ? 00000001C ?

7FFF0B5DF690 ? 000000000 ?

0098A1530 ? 000000001 ?

opitsk()+1665 call ttcpip() 00B7E2B10 ? 00923BB90 ?

7FFF0B5DF690 ? 000000000 ?

7FFF0B5DF0F0 ? 7FFF0B5DF888 ?

opiino()+961 call opitsk() 00B7E2B10 ? 000000001 ?

7FFF0B5DF690 ? 000000000 ?

7FFF0B5DF0F0 ? 7FFF0B5DF888 ?

opiodr()+910 call opiino() 00000003C ? 000000004 ?

7FFF0B5E0E18 ? 000000000 ?

7FFF0B5DF0F0 ? 7FFF0B5DF888 ?

opidrv()+565 call opiodr() 00000003C ? 000000004 ?

7FFF0B5E0E18 ? 000000000 ?

0098A0FE0 ? 7FFF0B5DF888 ?

sou2o()+98 call opidrv() 00000003C ? 000000004 ?

7FFF0B5E0E18 ? 000000000 ?

0098A0FE0 ? 7FFF0B5DF888 ?

opimai_real()+128 call sou2o() 7FFF0B5E0DF0 ? 00000003C ?

000000004 ? 7FFF0B5E0E18 ?

0098A0FE0 ? 7FFF0B5DF888 ?

ssthrdmain()+252 call opimai_real() 000000002 ? 7FFF0B5E0FE0 ?

000000004 ? 7FFF0B5E0E18 ?

0098A0FE0 ? 7FFF0B5DF888 ?

main()+196 call ssthrdmain() 000000002 ? 7FFF0B5E0FE0 ?

000000001 ? 000000000 ?

0098A0FE0 ? 7FFF0B5DF888 ?

__libc_start_main() call main() 000000002 ? 7FFF0B5E1188 ?

+244 000000001 ? 000000000 ?

0098A0FE0 ? 7FFF0B5DF888 ?

_start()+36 call __libc_start_main() 000A07368 ? 000000002 ?

7FFF0B5E1178 ? 000000000 ?

0098A0FE0 ? 000000002 ?



--------------------- Binary Stack Dump ---------------------

Looking even further back in the alert log, I found that ORA-07445 had been reported as well:

Tue Jan 17 08:42:12 2012

Archived Log entry 7472 added for thread 1 sequence 8444 ID 0x263e89b dest 1:

Tue Jan 17 09:00:14 2012

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x8] [PC:0xB0997A, ksmdscan_internal()+82] [flags: 0x0, count: 1]

Errors in file /oracle/app/diag/rdbms/skate01/skate01/trace/skate01_ora_25574.trc (incident=264155):

ora-07445: exception encountered: core dump [ksmdscan_internal()+82] [SIGSEGV] [ADDR:0x8] [PC:0xB0997A] [Address not mapped to object] []

Incident details in: /oracle/app/diag/rdbms/skate01/skate01/incident/incdir_264155/skate01_ora_25574_i264155.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Jan 17 09:00:21 2012

Dumping diagnostic data in directory=[cdmp_20120117090021], requested by (instance=1, osid=25574), summary=[incident=264155].

Tue Jan 17 09:00:22 2012

Sweep [inc][264155]: completed

Sweep [inc2][264155]: completed

Tue Jan 17 09:06:08 2012

Media Recovery Waiting for thread 1 sequence 8446

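I then consulted Oracle document "ID 1070812.1" and found that the problem is related to my having HugePages enabled.

When vm.drop_caches is set to a value greater than 0 and HugePages is enabled, the two conflict: drop_caches tries to free memory, while HugePages holds on to it.

Reference: /article/2349643.html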
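One symptom the Oracle note describes is that the memory dump around the offending address is zeroed out, which matches the `Dump of memory around addr` section of the incident trace shown earlier. A rough way to check a trace for this is sketched below. The helper name `dump_is_zeroed` is hypothetical, and the sketch assumes the dump line format seen in this trace (an address followed by four 32-bit words); it only inspects the first dump line.

```shell
#!/bin/sh
# dump_is_zeroed: read an incident trace on stdin. Exit 0 if the first
# non-blank line after "Dump of memory around addr" holds four zero words,
# exit 1 otherwise (including when no such dump is found).
dump_is_zeroed() {
    awk '
        found && NF {
            # Fields 2..5 are the words, e.g. "837FFF020 00000000 00000000 ...".
            for (i = 2; i <= 5; i++) if ($i != "00000000") exit 1
            zero = 1
            exit
        }
        /Dump of memory around addr/ { found = 1 }
        END { exit !zero }
    '
}
```

For example, `dump_is_zeroed < skate01_ora_17783_i264961.trc && echo "zeroed dump"` would flag the incident trace from this case.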



Solution

1. If HugePages is enabled, set vm.drop_caches=0

[root@localhost ~]# more /proc/sys/vm/drop_caches

3

[root@localhost ~]# sysctl -a | grep drop_caches

vm.drop_caches = 3

[root@localhost ~]# vi /etc/sysctl.conf

##skate add

vm.drop_caches=0

Apply the change immediately:

[root@localhost ~]# sysctl -p



Verify that it took effect:

[root@localhost ~]# sysctl -a | grep drop_caches

vm.drop_caches = 0

or

2. Upgrade the Linux kernel to version 2.6.18-194.0.0.0.4.EL5
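The risky combination described above (HugePages configured and a non-zero drop_caches) can be detected with a short check. This is a minimal sketch under assumptions: the helper names `hugepages_total` and `at_risk` are hypothetical, and it relies on the `HugePages_Total:` line of `/proc/meminfo`.

```shell
#!/bin/sh
# hugepages_total: print the HugePages_Total value from meminfo-style
# input on stdin, or 0 if the line is missing.
hugepages_total() {
    awk '$1 == "HugePages_Total:" { print $2; found = 1 }
         END { if (!found) print 0 }'
}

# at_risk TOTAL DROP: succeed when huge pages are configured (TOTAL > 0)
# and drop_caches is non-zero (DROP > 0).
at_risk() {
    [ "$1" -gt 0 ] && [ "$2" -gt 0 ]
}
```

On a host like the one in this post, `at_risk "$(hugepages_total < /proc/meminfo)" "$(cat /proc/sys/vm/drop_caches)" && echo "drop_caches is set while HugePages is in use"` would print the warning. Reading `/proc/sys/vm/drop_caches` works on the 2.6.18 kernel shown here, but may require root or be disallowed on other kernels.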





The official document is attached below:



ORA-600 [KGHLKREM1] On Linux Using Parameter drop_cache On hugepages Configuration [ID 1070812.1]



Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 and later [Release: 10.2 and later ]

Generic Linux

Symptoms

You are running an Oracle Database, single-instance or RAC. You have the SGA backed by hugepages.



You are getting the error

ORA-00600: internal error code, arguments: [KGHLKREM1], [0x06BC00020]

with stack trace similar to: kghnerror kghadd_reserved_ext kghgex

or also

ORA-07445: exception encountered: core dump

[kglhdal()+1105][SIGSEGV] [Address not mapped to object] [0x000000008] [] []

ORA-07445: exception encountered: core dump [kghfnd()+2328] [SIGSEGV]

[Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []


and the SGA heap Dump of memory around the offending addr (in this particular example: 0x6bc00020)

it's showing zeroed out :

asm1_lmd0_8600.trc

~~~~~~~~~~~~~~~~~~

*** 2010-02-08 15:57:38.274

***** Internal heap ERROR KGHLKREM1 addr=0x6c400020 ds=0x60000058 *****

***** Dump of memory around addr 0x6c400020:

06C3FF020 00000000 00000000 00000000 00000000 [................]

Repeat 511 times




Changes

1. On your system you are running with vm.drop_caches=1 (or 3), that is, drop_caches has been set to a value greater than zero, or you are executing

echo 3 > /proc/sys/vm/drop_caches




/proc/sys/vm/drop_caches (since Linux 2.6.16)

Writing to this file causes the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.

To free pagecache:

* echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

* echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

* echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation, and dirty objects are not freeable, the user should run "sync" first in order to make sure all cached objects are freed.

2. You have setup the Hugepages

Cause

This is a Linux Kernel issue.

When the Linux kernel "drop_caches" parameter is used on a system with hugepages configured, memory corruption can occur.

Per internal Bug 9461825, executing vm.drop_caches corrupts Oracle Database SGA hugepages;

it is fixed in Linux Kernel version 2.6.18-194.0.0.0.4.EL5

Solution

1. As a workaround when hugepages are set avoid any vm.drop_cache settings.

OR

2. Upgrade to Linux Kernel version 2.6.18-194.0.0.0.4.EL5

----------end-----------