
11gR2 RAC: Handling a Node Crash Caused by Enabling iptables

2013-08-20 01:48
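When installing a database, most guides require SELinux and iptables to be turned off first. In telecom carrier systems, however, security requirements often mean that iptables has to be enabled on production database hosts.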

Care is needed when enabling iptables. Suppose, for example, that the hosts file of a RAC cluster is configured as follows:

192.168.142.115 subsdb1

192.168.142.117 subsdb1-vip

10.0.0.115 subsdb1-priv

192.168.142.116 subsdb2

192.168.142.118 subsdb2-vip

10.0.0.116 subsdb2-priv

192.168.142.32 db-scan

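Naturally, all of the IPs above have to be allowed through the firewall. In practice, though, even after all of them had been allowed, one of the database instances went down.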
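For reference, the step of allowing every address in the hosts file could be generated along these lines. This is a minimal sketch, not the rules actually used on the system described here: the INPUT chain, matching on source address, and the helper names `parse_hosts` and `accept_rules` are all assumptions made for illustration.

```python
# Hosts entries copied from the configuration shown above.
HOSTS = """\
192.168.142.115 subsdb1
192.168.142.117 subsdb1-vip
10.0.0.115 subsdb1-priv
192.168.142.116 subsdb2
192.168.142.118 subsdb2-vip
10.0.0.116 subsdb2-priv
192.168.142.32 db-scan
"""


def parse_hosts(text):
    """Return (ip, name) pairs, skipping blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, name = line.split()[:2]
        entries.append((ip, name))
    return entries


def accept_rules(entries, chain="INPUT"):
    """Build one ACCEPT command per address (chain name is an assumption)."""
    return [f"iptables -A {chain} -s {ip} -j ACCEPT" for ip, _ in entries]


if __name__ == "__main__":
    for rule in accept_rules(parse_hosts(HOSTS)):
        print(rule)
```

None of the generated rules covers a 169.254.x.x address, because no such address appears in the hosts file.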

Here is the database alert log:

Tue Aug 20 00:29:40 2013

IPC Send timeout detected. Sender: ospid 8284 [oracle@subsdb2 (LMD0)]

Receiver: inst 1 binc 1740332689 ospid 15851

IPC Send timeout to 1.0 inc 10 for msg type 65521 from opid 12

Tue Aug 20 00:29:48 2013

IPC Send timeout detected. Sender: ospid 8276 [oracle@subsdb2 (PING)]

Receiver: inst 2 binc 1801834534 ospid 8276

Tue Aug 20 00:29:52 2013

Detected an inconsistent instance membership by instance 2

Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc (incident=784092):

ORA-29740: evicted by instance number 2, group incarnation 12

Incident details in: /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/incident/incdir_784092/GDORDB2_lmon_8282_i784092.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc:

ORA-29740: evicted by instance number 2, group incarnation 12

LMON (ospid: 8282): terminating the instance due to error 29740

Tue Aug 20 00:29:54 2013

ORA-1092 : opitsk aborting process

Tue Aug 20 00:29:54 2013

License high water mark = 29

Tue Aug 20 00:29:57 2013

System state dump requested by (instance=2, osid=8282 (LMON)), summary=[abnormal instance termination].

System State dumped to trace file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_diag_8272.trc

Instance terminated by LMON, pid = 8282

USER (ospid: 31106): terminating the instance

Instance terminated by USER, pid = 31106

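From this log alone, a first conclusion is that the cluster's internal (interconnect) communication is failing. But how can it be fixed?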
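The symptoms that point to an interconnect problem can be picked out of an alert log mechanically. The sketch below counts the three message types seen in the excerpt above. The pattern set, the `scan_alert_log` name, and the choice of these three messages as indicators are assumptions for illustration, not an Oracle-provided diagnostic.

```python
import re

# Message patterns taken from the alert log excerpt above.
SYMPTOMS = {
    "ipc_send_timeout": re.compile(r"IPC Send timeout detected"),
    "membership": re.compile(r"inconsistent instance membership"),
    "eviction": re.compile(r"ORA-29740"),
}

SAMPLE = """\
IPC Send timeout detected. Sender: ospid 8284 [oracle@subsdb2 (LMD0)]
IPC Send timeout detected. Sender: ospid 8276 [oracle@subsdb2 (PING)]
Detected an inconsistent instance membership by instance 2
ORA-29740: evicted by instance number 2, group incarnation 12
LMON (ospid: 8282): terminating the instance due to error 29740
"""


def scan_alert_log(lines):
    """Count how many lines match each interconnect-related symptom."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in scan_alert_log(SAMPLE.splitlines()).items():
        print(f"{name}: {count}")
```

To scan a real file, pass an open file object instead of `SAMPLE.splitlines()`.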

However, both the database alert log and the ASM instance alert log also contain entries like these:

Private Interface 'bond2:1' configured from GPnP for use as a private interconnect.

[name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]

Public Interface 'bond0' configured from GPnP for use as a public interface.

[name='bond0', type=1, ip=192.168.142.116, mac=00-25-b5-00-01-cb, net=192.168.142.0/24, mask=255.255.255.0, use=public/1]

Picked latch-free SCN scheme 3

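These entries show that RAC's internal communication also uses addresses in net=169.254.0.0/16 (the interface is marked use=haip:cluster_interconnect), and MOS Doc ID 1383737.1 describes the same behavior. Finally, ifconfig showed that the two RAC nodes were using the following 169.254.x.x addresses: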

169.254.122.59

169.254.148.209

After these two IPs were allowed in iptables, the cluster returned to normal.
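The lookup described above can be sketched in code: pull the private interconnect address out of the GPnP lines in the alert log, add the peer node's address found with ifconfig, and print matching ACCEPT rules. The regular expression, the function names, the INPUT chain, and matching on source address are assumptions for illustration; the exact rules used in this case are not shown in the post.

```python
import ipaddress
import re

# Matches the bracketed interface description written to the alert log.
GPNP_LINE = re.compile(
    r"\[name='(?P<name>[^']+)'.*?"
    r"ip=(?P<ip>[0-9.]+).*?"
    r"net=(?P<net>[0-9./]+).*?"
    r"use=(?P<use>[^\]]+)\]"
)

SAMPLE = """\
[name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
[name='bond0', type=1, ip=192.168.142.116, mac=00-25-b5-00-01-cb, net=192.168.142.0/24, mask=255.255.255.0, use=public/1]
"""


def interconnect_interfaces(lines):
    """Return the parsed fields of every interface used as cluster interconnect."""
    found = []
    for line in lines:
        match = GPNP_LINE.search(line)
        if match and "cluster_interconnect" in match.group("use"):
            found.append(match.groupdict())
    return found


def haip_accept_rules(interfaces, peers=(), chain="INPUT"):
    """Build ACCEPT commands for local and peer link-local (169.254/16) addresses."""
    rules = []
    for ip in [i["ip"] for i in interfaces] + list(peers):
        if ipaddress.ip_address(ip).is_link_local:
            rules.append(f"iptables -A {chain} -s {ip} -j ACCEPT")
    return rules


if __name__ == "__main__":
    local = interconnect_interfaces(SAMPLE.splitlines())
    # Peer address as reported by ifconfig on the other node.
    for rule in haip_accept_rules(local, peers=["169.254.122.59"]):
        print(rule)
```

Allowing the whole `net=` value (169.254.0.0/16) rather than individual addresses is a possible alternative, but it is not something the post itself tested.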