Oracle CRS fails to start; the log reports: "has a disk HB, but no network HB, DHB has rcfg..."
2014-08-06 16:11
Symptoms:
-- Check the CRS status
#/u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@ntrac1 ~]# /u01/app/oracle/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE STARTING
ora.cssdmonitor
1 ONLINE ONLINE ntrac1
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE ntrac1
ora.gpnpd
1 ONLINE ONLINE ntrac1
ora.mdnsd
1 ONLINE ONLINE ntrac1
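The listing above is easier to read once it is reduced to the resources whose TARGET is ONLINE but whose STATE is not. The following is a minimal sketch of such a filter, assuming the two-line layout shown above (a resource name line starting with `ora.`, followed by a status row). The function name `offline_resources` is hypothetical and is not part of any Oracle tool.

```python
def offline_resources(lines):
    """Return (name, state, details) for resources with TARGET ONLINE but STATE not ONLINE.

    Assumes the `crsctl stat res -t -init` layout quoted above: a line holding the
    resource name, then a row of the form
    <instance number> <TARGET> <STATE> [SERVER] [STATE_DETAILS].
    """
    problems = []
    name = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # Remember the resource name; its status row follows.
            name = fields[0]
            continue
        if name and fields[0].isdigit() and len(fields) >= 3:
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                # Whatever follows the state (server and/or details) is kept as free text.
                problems.append((name, state, " ".join(fields[3:])))
    return problems


# A few rows copied from the listing above.
sample = [
    "ora.cssd",
    "1 ONLINE OFFLINE STARTING",
    "ora.diskmon",
    "1 OFFLINE OFFLINE",
    "ora.gipcd",
    "1 ONLINE ONLINE ntrac1",
]
for name, state, details in offline_resources(sample):
    print(name, state, details)
```

In this case the filter would single out ora.cssd stuck in STARTING, which is why the next step is to look at the alert log and the ocssd log.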
-- Check the Grid Infrastructure alert log
#tail -100f $GRID_HOME/log/ntrac1/alertntrac1.log
2014-08-06 12:29:59.627:
[/u01/app/oracle/grid/bin/cssdagent(32145)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/oracle/grid/log/ntrac1/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-08-06 12:29:59.628:
[cssd(32230)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log
2014-08-06 12:29:59.628:
[cssd(32230)]CRS-1603:CSSD on node ntrac1 shutdown by user.
2014-08-06 12:30:04.791:
[ohasd(20111)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ntrac1'.
2014-08-06 12:30:06.569:
[cssd(36385)]CRS-1713:CSSD daemon is started in clustered mode
2014-08-06 12:30:08.191:
[ohasd(20111)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-08-06 12:30:22.814:
[cssd(36385)]CRS-1707:Lease acquisition for node ntrac1 number 1 completed
2014-08-06 12:30:24.103:
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR02; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.
2014-08-06 12:30:24.111:
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR03; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.
2014-08-06 12:30:24.122:
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR01; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.
-- Check the ocssd log
#tail -100f $GRID_HOME/log/ntrac1/cssd/ocssd.log
2014-08-06 14:45:04.140: [ CSSD][483813120]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2014-08-06 14:45:04.623: [ CSSD][488544000]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-06 14:45:04.968: [ CSSD][502802176]clssnmvDHBValidateNcopy: node 2, ntrac2, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3361203, LATS 5085774, lastSeqNo 3361200, uniqueness 1406193376, timestamp 1407307504/1113874084
2014-08-06 14:45:04.968: [ CSSD][502802176]clssnmvDHBValidateNcopy: node 3, ntrac3, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3360733, LATS 5085774, lastSeqNo 3360730, uniqueness 1406193385, timestamp 1407307504/1113864924
2014-08-06 14:45:05.105: [ CSSD][498054912]clssnmvDHBValidateNcopy: node 2, ntrac2, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3361205, LATS 5085904, lastSeqNo 3361202, uniqueness 1406193376, timestamp 1407307504/1113874544
2014-08-06 14:45:05.105: [ CSSD][498054912]clssnmvDHBValidateNcopy: node 3, ntrac3, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3360735, LATS 5085904, lastSeqNo 3360732, uniqueness 1406193385, timestamp 1407307504/1113864974
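The repeated clssnmvDHBValidateNcopy lines are the key signal here: the local CSSD can see the other nodes' disk heartbeats in the voting files but receives no network heartbeat from them. A minimal sketch for counting these messages per remote node is shown below. The regular expression is inferred only from the sample lines above, and the function name `count_missing_network_hb` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted above, not from any Oracle specification.
# Matching is case-insensitive because the function name appears with varying case.
PATTERN = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), has a disk HB, but no network HB",
    re.IGNORECASE,
)


def count_missing_network_hb(lines):
    """Count 'disk HB, but no network HB' messages per (node number, node name)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Two lines shortened from the ocssd.log excerpt above.
sample = [
    "2014-08-06 14:45:04.968: [ CSSD][502802176]clssnmvDHBValidateNcopy: "
    "node 2, ntrac2, has a disk HB, but no network HB, DHB has rcfg 301688209",
    "2014-08-06 14:45:04.968: [ CSSD][502802176]clssnmvDHBValidateNcopy: "
    "node 3, ntrac3, has a disk HB, but no network HB, DHB has rcfg 301688209",
]
for (number, name), count in sorted(count_missing_network_hb(sample).items()):
    print(f"node {number} ({name}): {count} message(s)")
```

To run it against a real file, pass an open file object for ocssd.log instead of the sample list. If every remote node shows up, as in this case, the private interconnect of the local node is the first thing to check.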
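Solution:
-- Check the network connectivity to locate the problem
Pinging the failed node's private IP address from the other nodes got no reply.
This pointed to a network problem, possibly a faulty NIC. It turned out that a network cable on one of the failed node's private interfaces was not plugged in. After plugging it in and restarting CRS, the problem was gone.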
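The ping check described above can be scripted so that every private address is tested in one pass. This is a minimal sketch, assuming Linux `ping` (where `-c` sets the packet count and `-W` the reply timeout in seconds). The addresses in the usage comment are placeholders from the documentation range, not the real interconnect IPs of this cluster, and both function names are hypothetical.

```python
import subprocess


def build_ping_command(address, count=3, timeout_s=2):
    """Build a Linux ping command line: -c packet count, -W reply timeout in seconds."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), address]


def unreachable_addresses(addresses, run=subprocess.run):
    """Return the addresses for which ping exits with a non-zero status.

    `run` defaults to subprocess.run and can be swapped out for a stub in tests.
    """
    failed = []
    for address in addresses:
        result = run(
            build_ping_command(address),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(address)
    return failed


# Example usage with placeholder private IPs (replace with the real interconnect addresses):
# print(unreachable_addresses(["192.0.2.11", "192.0.2.12", "192.0.2.13"]))
```

Running it from each of the healthy nodes against the failed node's private IP would reproduce the finding above. A failure on all private addresses of one node suggests checking the cable and the NIC on that node first.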
References:
http://sqlsewer.blogspot.com/2013/07/oracle-crs-is-not-starting-has-disk-hb.html
http://t.askmaclean.com/thread-3709-1-1.html