A RAC Troubleshooting Case
2014-05-27 17:52
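Last Saturday at midnight, just as I was about to go to bed, the phone rang. A call at that hour is never good news. I did not recognize the number, and only after answering did I learn it was an HP engineer we had contracted, who was handling a failure at a customer site. The customer runs a two-node RAC on two HP minicomputers. Because of something done on the customer side, the second node could no longer enter multi-user mode. Most likely someone had been careless on the system and deleted some operating system files, leaving the machine able to boot only into maintenance mode, so the second node had to be reinstalled. The HP engineer cloned the system from the other node onto the second node and then changed the IP address, host name and so on. After HP Serviceguard was configured, HA came up, but starting CRS on the second node reported the following error: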
Attempting to start CRS stack
Failure at scls_scr_create with code 1
Internal Error Information:
Category: 1234
Operation: scls_scr_create
Location: mkdir
Other: Unable to make user dir
Dep: 2
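After struggling for quite a while with no progress, I thought about rebooting so that the system would bring CRS up by itself. The HP engineer told me, however, that on this host CRS has to be started manually after boot, so a reboot would be pointless. On Unix and Linux the CRS start/stop scripts live in the init.d directory. I am not very familiar with HP-UX and had to ask: on HP-UX this directory is /sbin/init.d, not /etc/init.d. From that directory the ./init.crs script starts CRS. Usage: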
# ./init.crs xxx <-- pass any argument to make it print the usage
Usage: ./init.crs {stop|start|enable|disable}
# ./init.crs start
This time the error message was actually useful:
/sbin/init.d/init.cssd[537]: /var/opt/oracle/scls_scr/rqtmsdb2/root/cssrun: Cannot create the specified file.
Startup will be queued to init within 30 seconds.
The error shows that the cssrun file cannot be created.
Checking:
# cd /var/opt/oracle/scls_scr/rqtmsdb2/root/
sh: /var/opt/oracle/scls_scr/rqtmsdb2/root/: not found.
Huh, the directory does not exist!
# cd /var/opt/oracle/scls_scr/
One look at ls -l made it clear:
# ls -l
total 0
drwxr-xr-x 4 root sys 96 Dec 31 2010 rqtmsdb1
Because this system was cloned from the first node, the directory that should be named rqtmsdb2 is still named rqtmsdb1. No wonder!
Fix it:
# mv rqtmsdb1 rqtmsdb2
# ls -l
total 0
drwxr-xr-x 4 root sys 96 Dec 31 2010 rqtmsdb2
# cd rq*
# ls -l
total 16
drwxr-xr-x 2 orarac sys 96 Dec 31 2010 orarac
drwxr-xr-x 2 root sys 8192 Nov 17 09:55 root
# cd root
# ls -l
total 48
-rw-rw-rw- 1 root root 8 Nov 17 15:33 crsdboot
-rw-r--r-- 1 root sys 7 Dec 31 2010 crsstart
-rw-rw-rw- 1 root sys 6 Nov 17 15:33 cssrun
-rw-r--r-- 1 root sys 0 Nov 17 15:33 noclsmon
-rw-rw-rw- 1 root root 0 Nov 17 15:33 nooprocd
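In hindsight, a small check like the one below would have pointed at the stale directory immediately on a cloned node. It is only a sketch: the function name check_scls_scr is made up for illustration, and it assumes the subdirectory under scls_scr must be named exactly like the node name CRS uses, as the rqtmsdb1/rqtmsdb2 listing above suggests.

```shell
#!/bin/sh
# Sketch (hypothetical helper): verify that the scls_scr directory has a
# subdirectory named after the given node, and show what exists otherwise.
check_scls_scr() {
    base_dir=$1     # e.g. /var/opt/oracle/scls_scr (path taken from the post)
    node_name=$2    # e.g. the output of `hostname`
    if [ -d "$base_dir/$node_name" ]; then
        echo "OK: $base_dir/$node_name exists"
        return 0
    fi
    echo "MISSING: $base_dir/$node_name"
    # List the directories that are there; a stale name from the
    # source node of the clone shows up here.
    for d in "$base_dir"/*; do
        [ -d "$d" ] && echo "found instead: ${d##*/}"
    done
    return 1
}
```

It could be called as check_scls_scr /var/opt/oracle/scls_scr "$(hostname)", assuming hostname prints the short node name rather than a fully qualified one.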
Start CRS again:
# cd /sbin/init.d
#
# ./init.crs start
Startup will be queued to init within 30 seconds.
# ps -ef|grep d.bin
root 18734 22410 1 02:22:49 pts/ta 0:00 grep d.bin
# ps -ef|grep d.bin
root 2059 1 0 22:03:36 ? 0:00 /ora_soft/oracle/product/crs/bin/crsd.bin reboot
orarac 18782 2057 0 02:23:09 ? 0:00 /ora_soft/oracle/product/crs/bin/evmd.bin
orarac 19013 19012 0 02:23:14 ? 0:00 /ora_soft/oracle/product/crs/bin/ocssd.bin
# /ora_soft/oracle/product/crs/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
# /ora_soft/oracle/product/crs/bin/crlctl stop crs
sh: /ora_soft/oracle/product/crs/bin/crlctl: not found.
# /ora_soft/oracle/product/crs/bin/crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
# ps -ef|grep d.bin
root 21987 22410 0 02:24:53 pts/ta 0:00 grep d.bin
# /ora_soft/oracle/product/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
# ps -ef|grep d.bin
root 23992 22410 0 02:32:59 pts/ta 0:00 grep d.bin
# ps -ef|grep d.bin
root 23995 22410 0 02:33:05 pts/ta 0:00 grep d.bin
# ps -ef|grep d.bin
root 21829 1 0 02:24:44 ? 0:00 /ora_soft/oracle/product/crs/bin/crsd.bin reboot
orarac 24152 21817 0 02:33:18 ? 0:00 /ora_soft/oracle/product/crs/bin/evmd.bin
orarac 24299 24298 0 02:33:21 ? 0:00 /ora_soft/oracle/product/crs/bin/ocssd.bin
root 24577 22410 0 02:33:31 pts/ta 0:00 grep d.bin
# /ora_soft/oracle/product/crs/bin/crsctl status
Unknown parameter: status
# /ora_soft/oracle/product/crs/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
#
This time it started normally!
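Since init.crs and crsctl start crs only queue the startup, I ended up typing ps -ef|grep d.bin over and over. A polling loop along the following lines could replace that. This is a minimal sketch, not something used that night: wait_until_ok is a hypothetical helper that simply reruns whatever command it is given until the command succeeds or the attempts run out.

```shell
#!/bin/sh
# Sketch (hypothetical helper): rerun a check command until it succeeds.
# Usage: wait_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max_tries" ]; do
        # The check's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            echo "ok after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $max_tries attempt(s)"
    return 1
}
```

For example, wait_until_ok 12 10 sh -c '/ora_soft/oracle/product/crs/bin/crsctl check crs | grep "CRS appears healthy"' would retry for about two minutes. Matching on the "CRS appears healthy" line is an assumption based on the output shown above; I have not verified what exit status crsctl itself returns while the stack is down.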
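Then I went back to check the first node. The HP engineer had told me that nothing on this node had been touched, and I believed him. Cloning a system should not require any change on the source node, after all. Reality, however, was harsh!
I ran the commands: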
# cd /sbin/init.d
#
# ./init.crs start
Startup will be queued to init within 30 seconds.
No d.bin processes ever appeared and nothing happened, so I went back to the operating system log:
Nov 18 03:26:00 rqtmsdb1 syslog: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.2104.
Nov 18 03:26:00 rqtmsdb1 syslog: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.2116.
Nov 18 03:26:00 rqtmsdb1 syslog: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.2154.
Nov 18 03:34:16 rqtmsdb1 syslog: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.2154.
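Each of these syslog lines names a diagnostics file, and the same file can show up several times. The sketch below pulls out the unique file names so they can be read one by one. The function name crs_diag_files is made up, and the pattern is based only on the message format shown in the lines above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read syslog text on stdin and print the
# unique /tmp/crsctl.NNNN diagnostics files named in the
# "waiting on dependencies" messages.
crs_diag_files() {
    grep 'Cluster Ready Services waiting on dependencies' |
        sed -n 's#.*Diagnostics in \(/tmp/crsctl\.[0-9][0-9]*\).*#\1#p' |
        sort -u
}
```

On HP-UX the input would typically come from the system log, for example crs_diag_files < /var/adm/syslog/syslog.log. That log path is my assumption about a default HP-UX setup and was not stated that night.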
So there were error messages after all. One of the files:
# cat /tmp/crsctl.2104
Failed 3 to bind listening endpoint:(ADDRESS=(PROTOCOL=tcp)(HOST=rqtmsdb1-priv))
#
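It could not bind the listening endpoint to the private IP. I checked /etc/hosts and found no private IP entry for this node, only the second node's private IP. After comparing with /etc/hosts on the second node, I added the first node's private IP: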
192.168.0.1 rqtmsdb1-priv
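Not checking /etc/hosts at the very start was a real mistake! Always verify for yourself what you are told! Once again /etc/hosts tripped me up in a RAC environment!!! At a previous customer site where I was configuring RAC, an engineer had removed the system default localhost entry, and it took me a whole day to figure out that the missing localhost was the cause!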
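Having been bitten twice, I would now script this check rather than trust anyone's word. The following is a rough sketch: missing_hosts is a hypothetical helper that reports which of the given names have no entry in a hosts file, matching whole fields so that rqtmsdb1 does not accidentally match rqtmsdb1-priv.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the names that have no entry in a
# hosts file. Usage: missing_hosts HOSTS_FILE NAME [NAME...]
missing_hosts() {
    hosts_file=$1
    shift
    rc=0
    for name in "$@"; do
        # Drop comments, then look for the name as a whole field after the
        # address column. awk exits 0 only when the name was found.
        if ! sed 's/#.*//' "$hosts_file" |
            awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                              END { exit !found }'; then
            echo "missing: $name"
            rc=1
        fi
    done
    return $rc
}
```

On each node one might run missing_hosts /etc/hosts localhost rqtmsdb1 rqtmsdb2 rqtmsdb1-priv, plus whatever private and VIP names the second node uses. Only rqtmsdb1-priv and the two node names appear in this post; the other names would have to be taken from the real configuration.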
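I started CRS again and this time it came up normally! I thought everything was fine and I could go to bed, but the VIP turned out to have a problem as well.
Starting the cluster with crs_start -all reported failures: the VIP would not come up, and everything after it failed too. This error is easy to deal with, since I had solved it before. First enable debugging for the VIP resource: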
# /ora_soft/oracle/product/crs/bin/crsctl debug log res "ora.rqtmsdb1.vip:5"
Then start the VIP resource on its own (through nodeapps):
# /ora_soft/oracle/product/crs/bin/srvctl start nodeapps -n rqtmsdb1
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:29 EAT 2012 [ 25193 ] Checking interface existance
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:29 EAT 2012 [ 25193 ] Calling getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:29 EAT 2012 [ 25193 ] getifbyip: started for 172.16.7.22
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:29 EAT 2012 [ 25193 ] Completed getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:29 EAT 2012 [ 25193 ] switched to standby : start/check operation
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] Completed with initial interface test
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] Broadcast = 172.16.7.255
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] Interface tests
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] checkIf: start for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] checkIf: get default gw
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] defaultgw: started
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] defaultgw: completed with
rqtmsdb1:ora.rqtmsdb1.vip:checkIf: Default gateway is not defined (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Interface lan0 checked failed (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] checkIf: end for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25193 ] DEBUG: FAIL_WHEN_ALL_LINK_DOWN = 1 and IF_USING =
rqtmsdb1:ora.rqtmsdb1.vip:Invalid parameters, or failed to bring up VIP (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25341 ] Checking interface existance
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25341 ] Calling getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25341 ] getifbyip: started for 172.16.7.22
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25341 ] Completed getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:33 EAT 2012 [ 25341 ] switched to standby : start/check operation
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] Completed with initial interface test
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] Broadcast = 172.16.7.255
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] Performing CRS_STAT testing
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] Completed CRS_STAT testing
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] Interface tests
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] checkIf: start for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] checkIf: get default gw
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] defaultgw: started
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] defaultgw: completed with
rqtmsdb1:ora.rqtmsdb1.vip:checkIf: Default gateway is not defined (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Interface lan0 checked failed (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] checkIf: end for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:37 EAT 2012 [ 25341 ] DEBUG: FAIL_WHEN_ALL_LINK_DOWN = 1 and IF_USING =
rqtmsdb1:ora.rqtmsdb1.vip:Invalid parameters, or failed to bring up VIP (host=rqtmsdb1)
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.rqtmsdb1.vip'.
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:48 EAT 2012 [ 25801 ] Checking interface existance
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:48 EAT 2012 [ 25801 ] Calling getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:48 EAT 2012 [ 25801 ] getifbyip: started for 172.16.7.22
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:48 EAT 2012 [ 25801 ] Completed getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:48 EAT 2012 [ 25801 ] switched to standby : start/check operation
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] Completed with initial interface test
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] Broadcast = 172.16.7.255
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] Interface tests
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] checkIf: start for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] checkIf: get default gw
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] defaultgw: started
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] defaultgw: completed with
rqtmsdb1:ora.rqtmsdb1.vip:checkIf: Default gateway is not defined (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Interface lan0 checked failed (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] checkIf: end for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25801 ] DEBUG: FAIL_WHEN_ALL_LINK_DOWN = 1 and IF_USING =
rqtmsdb1:ora.rqtmsdb1.vip:Invalid parameters, or failed to bring up VIP (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25949 ] Checking interface existance
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25949 ] Calling getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25949 ] getifbyip: started for 172.16.7.22
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25949 ] Completed getifbyip
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:52 EAT 2012 [ 25949 ] switched to standby : start/check operation
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] Completed with initial interface test
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] Broadcast = 172.16.7.255
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] Performing CRS_STAT testing
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] Completed CRS_STAT testing
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] Interface tests
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] checkIf: start for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] checkIf: get default gw
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] defaultgw: started
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] defaultgw: completed with
rqtmsdb1:ora.rqtmsdb1.vip:checkIf: Default gateway is not defined (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Interface lan0 checked failed (host=rqtmsdb1)
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] checkIf: end for if=lan0
rqtmsdb1:ora.rqtmsdb1.vip:Sun Nov 18 04:19:56 EAT 2012 [ 25949 ] DEBUG: FAIL_WHEN_ALL_LINK_DOWN = 1 and IF_USING =
rqtmsdb1:ora.rqtmsdb1.vip:Invalid parameters, or failed to bring up VIP (host=rqtmsdb1)
CRS-0215: Could not start resource 'ora.rqtmsdb1.LISTENER_RQTMSDB1.lsnr'.
#
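No default gateway was configured. Checking the IP address configuration, I also found that the address was actually configured on lan2, while the VIP check was still looking at lan0. Only when I asked did I learn that lan0 had been failing frequently, so this time they had switched to lan2. Why not say so earlier, for crying out loud!!
When the VIP starts it pings the default gateway, and if the gateway is unreachable the VIP will not come up. After the HP engineer configured the default gateway, I moved the VIP from lan0 to lan2:
First delete the existing interface definitions: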
su - oracle
oifcfg delif -global
Then configure them again:
$ oifcfg setif -global lan2/172.16.7.0:public
$ oifcfg setif -global lan3/192.168.0.0:cluster_interconnect
# /ora_soft/oracle/product/crs/bin/srvctl modify nodeapps -n rqtmsdb2 -A 172.16.7.23/255.255.255.0/lan2
# /ora_soft/oracle/product/crs/bin/srvctl modify nodeapps -n rqtmsdb1 -A 172.16.7.22/255.255.255.0/lan2
With the changes in place I ran crs_start -all again and RAC started successfully. Done for the night, off to bed!
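Since the VIP refuses to start without a default gateway, it is worth confirming that one exists before starting nodeapps. The sketch below is only an illustration: default_gateway is a hypothetical helper, and it assumes the routing table is printed in the usual netstat -rn layout, where the first column is the destination (default, or 0.0.0.0 on some systems) and the second is the gateway.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read `netstat -rn` style output on stdin,
# print the gateway of the first default route, and return non-zero when
# no default route is present.
default_gateway() {
    awk '$1 == "default" || $1 == "0.0.0.0" { print $2; found = 1; exit }
         END { exit !found }'
}
```

A pre-check could then look like netstat -rn | default_gateway || echo "no default gateway defined". Whether the printed gateway actually answers a ping is a separate question that this sketch does not cover.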
http://blog.chinaunix.net/uid-26896647-id-3417998.html