Trials of Installing RAC on AIX 5L

2004-09-03 15:02

Verify the OS patches (on both nodes):
[m80a]/> instfix -ik IY28766
All filesets for IY28766 were found.
[m80a]/> instfix -ik IY28949
There was no data for IY28949 in the fix database.
[m80a]/> instfix -ik IY29965
All filesets for IY29965 were found.
[m80a]/> instfix -ik IY30150
All filesets for IY30150 were found.
[m80a]/> instfix -ik IY22854
All filesets for IY22854 were found.
[m80a]/> instfix -ik IY26778
All filesets for IY26778 were found.
[m80a]/> instfix -ik IY28111
All filesets for IY28111 were found.
[m80a]/> instfix -ik IY21047
All filesets for IY21047 were found.
[m80a]/>
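
The one-by-one checks above can be rolled into a single pass. This is only a sketch: the APAR list is copied from the transcript, `report_missing_apars` is a hypothetical helper name, and it assumes an installed fix always prints the "All filesets for ... were found." line shown above. Anything else (such as the "no data" reply for IY28949) is reported as missing.

```shell
#!/bin/sh
# APAR list copied from the transcript above.
APARS="IY28766 IY28949 IY29965 IY30150 IY22854 IY26778 IY28111 IY21047"

# report_missing_apars (hypothetical helper): reads instfix output on
# stdin and prints every APAR in $APARS that has no exact
# "All filesets for <APAR> were found." line.
report_missing_apars() {
    out=$(cat)
    for apar in $APARS; do
        if ! printf '%s\n' "$out" | grep -Fxq "All filesets for $apar were found."; then
            echo "$apar"
        fi
    done
}

# instfix exists only on AIX, so the live check is guarded.
if command -v instfix >/dev/null 2>&1; then
    for apar in $APARS; do
        instfix -ik "$apar" 2>&1
    done | report_missing_apars
fi
```

Run it on both nodes; an empty result means every APAR in the list was found.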

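Install the patches required for the JDK and JRE:
If you use HTTP Server: IY30886 (JRE 1.1.8)
If you do not use HTTP Server: IY31033 (JDK 1.3.1)

Alternatively, download the dedicated 64-bit JDK 1.3.1 package (JDK131_64bit) from the IBM website.
It installs into its own directory; point the Oracle installer at this JDK: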
[m80b]/usr/java13_64> pwd
/usr/java13_64
[m80b]/usr/java13_64>

[m80a]/> lslpp -l |grep rsct
rsct.basic.hacmp 2.2.1.20 APPLIED RSCT Basic Function (HACMP/ES
rsct.basic.rte 2.2.1.20 APPLIED RSCT Basic Function
rsct.clients.rte 99.99.999.999 COMMITTED Supersede Entry - Not really
rsct.compat.basic.hacmp 2.2.1.20 APPLIED RSCT Event Management Basic
rsct.compat.basic.rte 2.2.1.20 APPLIED RSCT Event Management Basic
rsct.compat.clients.hacmp
rsct.compat.clients.rte 2.2.1.20 APPLIED RSCT Event Management Client
rsct.core.auditrm 2.2.1.20 APPLIED RSCT Audit Log Resource
rsct.core.errm 2.2.1.20 APPLIED RSCT Event Response Resource
rsct.core.fsrm 2.2.1.20 APPLIED RSCT File System Resource
rsct.core.gui 2.2.1.20 APPLIED RSCT Graphical User Interface
rsct.core.hostrm 2.2.1.20 APPLIED RSCT Host Resource Manager
rsct.core.rmc 2.2.1.20 APPLIED RSCT Resource Monitoring and
rsct.core.sec 2.2.1.20 APPLIED RSCT Security
rsct.core.sensorrm 2.2.1.20 APPLIED RSCT Sensor Resource Manager
rsct.core.sr 2.2.1.20 APPLIED RSCT Registry
rsct.core.utils 2.2.1.20 APPLIED RSCT Utilities
rsct.msg.EN_US.core.auditrm
rsct.msg.EN_US.core.errm 2.2.0.0 COMMITTED RSCT Event Response RM Msgs -
rsct.msg.EN_US.core.fsrm 2.2.0.0 COMMITTED RSCT File System RM Msgs -
rsct.msg.EN_US.core.gui 2.2.0.0 COMMITTED RSCT GUI Msgs - U.S. English
rsct.msg.EN_US.core.hostrm
rsct.msg.EN_US.core.rmc 2.2.0.0 COMMITTED RSCT RMC Msgs - U.S. English
rsct.msg.EN_US.core.sec 2.2.0.0 COMMITTED RSCT Security Msgs - U.S.
rsct.msg.EN_US.core.sensorrm
rsct.msg.EN_US.core.sr 2.2.0.0 COMMITTED RSCT Registry Msgs - U.S.
rsct.msg.EN_US.core.utils 2.2.0.0 COMMITTED RSCT Utilities Msgs - U.S.
rsct.msg.en_US.core.auditrm
rsct.msg.en_US.core.errm 2.2.0.0 COMMITTED RSCT Event Response RM Msgs -
rsct.msg.en_US.core.fsrm 2.2.0.0 COMMITTED RSCT File System RM Msgs -
rsct.msg.en_US.core.gui 2.2.0.0 COMMITTED RSCT GUI Msgs - U.S. English
rsct.msg.en_US.core.hostrm
rsct.msg.en_US.core.rmc 2.2.0.0 COMMITTED RSCT RMC Msgs - U.S. English
rsct.msg.en_US.core.sec 2.2.0.0 COMMITTED RSCT Security Msgs - U.S.
rsct.msg.en_US.core.sensorrm
rsct.msg.en_US.core.sr 2.2.0.0 COMMITTED RSCT Registry Msgs - U.S.
rsct.msg.en_US.core.utils 2.2.0.0 COMMITTED RSCT Utilities Msgs - U.S.
rsct.basic.rte 2.2.1.20 APPLIED RSCT Basic Function
rsct.compat.basic.rte 2.2.1.0 COMMITTED RSCT Event Management Basic
rsct.core.rmc 2.2.1.20 APPLIED RSCT Resource Monitoring and
rsct.core.sec 2.2.1.20 APPLIED RSCT Security
rsct.core.sr 2.2.1.20 APPLIED RSCT Registry
rsct.core.utils 2.2.1.20 APPLIED RSCT Utilities
[m80a]/>

[m80a]/> lslpp -l |grep -i hacmp
cluster.doc.en_US.html 4.4.1.4 APPLIED HACMP Web-based HTML
cluster.doc.en_US.pdf 4.4.1.3 APPLIED HACMP PDF Documentation - U.S.
cluster.doc.en_US.ps 4.4.1.0 COMMITTED HACMP Postscript Documentation
rsct.basic.hacmp 2.2.1.20 APPLIED RSCT Basic Function (HACMP/ES
rsct.compat.basic.hacmp 2.2.1.20 APPLIED RSCT Event Management Basic
Function (HACMP/ES Support)
rsct.compat.clients.hacmp
Function (HACMP/ES Support)
[m80a]/>


===============Verify the dba and hagsuser groups==========================
On node 1:
[m80a]/oracle> tail /etc/group
...
dba:!:300:oracle
hagsuser:!:301:oracle,root
...
[m80a]/oracle>

On node 2:
[m80b]/oracle> tail /etc/group
...
dba:!:300:oracle
hagsuser:!:301:root,oracle
...
[m80b]/oracle>
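
Instead of eyeballing `tail /etc/group`, the membership can be tested directly. A minimal sketch, assuming the standard colon-separated /etc/group layout with the member list in the fourth field; `group_has_member` is a hypothetical helper name, and the three group/user pairs are the ones shown in the transcript.

```shell
#!/bin/sh
# group_has_member FILE GROUP USER   (hypothetical helper)
# Succeeds when USER appears in the member list (4th field) of GROUP
# in an /etc/group-format FILE.
group_has_member() {
    awk -F: -v grp="$2" -v usr="$3" '
        $1 == grp {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == usr) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Check the memberships shown in the transcript against the local node.
for pair in dba:oracle hagsuser:oracle hagsuser:root; do
    grp=${pair%%:*}
    usr=${pair#*:}
    if group_has_member /etc/group "$grp" "$usr"; then
        echo "ok:      $usr is in $grp"
    else
        echo "MISSING: $usr is not in $grp"
    fi
done
```

Note that this looks only at the member list; a user whose primary group is dba in /etc/passwd but who is not listed in /etc/group would be reported as missing.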

===============Verify the oracle user==========================
On node 1:
[m80a]/oracle> cat /etc/passwd| grep oracle
oracle:!:300:300::/oracle:/usr/bin/ksh
[m80a]/oracle>

On node 2:
[m80b]/oracle> cat /etc/passwd| grep oracle
oracle:!:300:300::/oracle:/usr/bin/ksh
[m80b]/oracle>

==================Configure the network=======================
[m80a]/> tail /etc/hosts
127.0.0.1 loopback localhost # loopback (lo0) name/address

192.168.2.215 m80a m80a
10.1.1.1 m80a_stb m80a
192.168.2.216 m80b m80b
10.1.1.2 m80b_stb m80b
192.168.2.205 bepone
192.168.2.207 fepone
192.168.2.208 feptwo
192.168.2.209 ibm170
[m80a]/>

===================Verify communication as both root and oracle:===================
1. rlogin
On node 1:
[m80a]/> su - oracle
[m80a]/oracle> id
uid=300(oracle) gid=300(dba) groups=1(staff),301(hagsuser)
[m80a]/oracle> rlogin m80b
*******************************************************************************
* *
* *
* Welcome to AIX Version 5.1! *
* *
* *
* Please see the README file in /usr/lpp/bos for information pertinent to *
* this release of the AIX Operating System. *
* *
* *
*******************************************************************************
Last unsuccessful login: Fri Dec 20 09:28:05 BEIST 2002 on /dev/dtlogin/_0
Last login: Fri Dec 20 17:00:24 BEIST 2002 on 192_168_2_89_0 from 192.168.2.89:0

[m80b]/oracle> exit
Connection closed.
[m80a]/oracle>

On node 2:
[m80b]/> su - oracle
[m80b]/oracle> id
uid=300(oracle) gid=300(dba) groups=1(staff),301(hagsuser)
[m80b]/oracle> rlogin m80a
*******************************************************************************
* *
* *
* Welcome to AIX Version 5.1! *
* *
* *
* Please see the README file in /usr/lpp/bos for information pertinent to *
* this release of the AIX Operating System. *
* *
* *
*******************************************************************************
Last unsuccessful login: Thu Dec 19 17:50:11 BEIST 2002 on /dev/dtlogin/_0
Last login: Fri Dec 20 17:01:26 BEIST 2002 on 192_168_2_89_0 from 192.168.2.89:0

[m80a]/oracle> exit
Connection closed.
[m80b]/oracle>

2. rcp and rsh
On node 1:
[m80a]/oracle> id
uid=300(oracle) gid=300(dba) groups=1(staff),301(hagsuser)
[m80a]/oracle> rcp root.sh m80b:/tmp
[m80a]/oracle> rsh m80b ls -l /tmp/root.sh
-rwxr-xr-x 1 oracle dba 6386 Dec 20 17:17 /tmp/root.sh
[m80a]/oracle>

On node 2:
[m80b]/oracle> id
uid=300(oracle) gid=300(dba) groups=1(staff),301(hagsuser)
[m80b]/oracle> rcp root.sh m80a:/tmp/root.sh
[m80b]/oracle> rsh m80a ls -l /tmp/root.sh
-rwxr-xr-x 1 oracle dba 6386 Dec 20 17:18 /tmp/root.sh
[m80b]/oracle>
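
The rsh test is easy to repeat for every node in one go. The sketch below is an illustration rather than part of the original procedure: `check_remote_shell` is a hypothetical helper, and the `RSH` variable exists only so that the command can be swapped out for a dry run. Classic rsh generally does not hand back the remote command's exit status, so this only shows that the connection is accepted without a password prompt.

```shell
#!/bin/sh
# Command used to reach the other node; rsh is what the transcript uses.
RSH=${RSH:-rsh}

# check_remote_shell NODE...   (hypothetical helper)
# Runs "true" on each node through $RSH and prints one status line per
# node. Returns non-zero if any node could not be reached.
check_remote_shell() {
    rc=0
    for node in "$@"; do
        if $RSH "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            rc=1
        fi
    done
    return $rc
}
```

Call it as `check_remote_shell m80a m80b`, once as root and once as oracle, on each node.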


============Note: when configuring HA, the node relationship must be concurrent:=============
Cluster Configuration
Cluster Resources
Show Cluster Resources
Show Resource Information by Resource Group
x Select Resource Group Name x
x x
x Move cursor to desired item and press Enter. x
x x
x rac x
x rsg_m80a x
x rsg_m80b x

Select the resource group (mine is called rac)
and check its information:
Node Relationship concurrent

=======================Enable AIO:===============
[m80a]/> lsdev -Cc aio
aio0 Available Asynchronous I/O
[m80a]/>
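
A script can make the same check by looking for the Available state in the `lsdev` output. A small sketch, assuming the output layout shown above (device name in the first column, state in the second); `aio_is_available` is a hypothetical helper name.

```shell
#!/bin/sh
# aio_is_available (hypothetical helper): reads `lsdev -Cc aio` output
# on stdin and succeeds only if aio0 is listed as Available.
aio_is_available() {
    awk '$1 == "aio0" && $2 == "Available" { ok = 1 } END { exit(ok ? 0 : 1) }'
}

# lsdev with these flags is an AIX command, so the live check is guarded.
if command -v lsdev >/dev/null 2>&1; then
    if lsdev -Cc aio 2>/dev/null | aio_is_available; then
        echo "aio0 is Available"
    else
        echo "aio0 is not Available"
    fi
fi
```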

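===============Start HA:===================
Start HA on both nodes.
Startup steps:
1. With HA stopped, first run the two synchronizations: one for Topology and one for Resources.
2. Start HA (setting only "Startup Cluster Information Daemon?" to true is enough).
(The worst thing you can do with HA is operate on both nodes at the same time, especially
during synchronization. Finish one node completely before touching the other; otherwise HA easily ends up in a confused state.)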

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[Entry Fields]
* Start now, on system restart or both now +

BROADCAST message at startup? false +
Startup Cluster Lock Services? false +
Startup Cluster Information Daemon? true +
Reacquire resources after forced down ? false +

===================Verify HA:===================
[m80a]/> tail -f /tmp/hacmp.out

End time: Fri Dec 20 17:12:26 2002

Action: Resource: Script Name:
----------------------------------------------------------------------------
Resource group online: rac node_up_local_complete
Search on: Fri.Dec.20.17:12:24.BEIST.2002.node_up_local_complete.rac.ref
Resource group online: rsg_m80a node_up_local_complete
Search on: Fri.Dec.20.17:12:25.BEIST.2002.node_up_local_complete.rsg_m80a.ref
----------------------------------------------------------------------------
[m80a]/>

===================Or:===================
[m80a]/> lsvg -o
vga
oravg
rootvg
[m80a]/>

===================Or:===================
[m80a]/> lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 770276 active
clsmuxpdES cluster 794858 active
clinfoES cluster 762110 active
[m80a]/>

===================Or:===================
[m80a]/> lssrc -a |egrep 'svcs|ES'
topsvcs topsvcs 385190 active
grpsvcs grpsvcs 680170 active
grpglsm grpsvcs 786632 active
emsvcs emsvcs 720966 active
clstrmgrES cluster 770276 active
clsmuxpdES cluster 794858 active
clinfoES cluster 762110 active
emaixos emsvcs inoperative
[m80a]/>
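
To spot a stopped subsystem without reading the whole listing, the `lssrc` output can be filtered for anything whose status is not "active". A minimal sketch under the assumption that the status is always the last column, as in the listings above; `inactive_subsystems` is a hypothetical helper name.

```shell
#!/bin/sh
# inactive_subsystems (hypothetical helper): reads lssrc output on
# stdin, ignores the "Subsystem ..." header and blank lines, and prints
# the name of every subsystem whose last column is not "active".
inactive_subsystems() {
    awk 'NF > 0 && $1 != "Subsystem" && $NF != "active" { print $1 }'
}

# lssrc belongs to the AIX System Resource Controller, so guard it.
if command -v lssrc >/dev/null 2>&1; then
    down=$(lssrc -g cluster | inactive_subsystems)
    if [ -n "$down" ]; then
        echo "not active: $down"
    else
        echo "all cluster subsystems are active"
    fi
fi
```

Fed the `lssrc -a | egrep 'svcs|ES'` listing above, the filter prints only emaixos, the one inoperative entry.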

===============Verify the cluster users=====================
[m80a]/> lssrc -ls grpsvcs
Subsystem Group PID Status
grpsvcs grpsvcs 704520 active
4 locally-connected clients. Their PIDs:
803028(hagsglsmd) 778476(haemd) 745688(clstrmgr) 827604(gsclvmd)
HA Group Services domain information:
Domain established by node 2
Number of groups known locally: 6
Number of Number of local
Group name providers providers/subscribers
s001IGKU0009G000000U6SUEB74 2 1 0
hagsglsm_cfg 2 1 0
ha_em_peers 2 1 0
CLRESMGRD_60 2 1 0
CLSTRMGR_60 2 1 0
d001IGKU0009G000000U6SUEB74 2 1 0
[m80a]/>


==========Before running runInstaller.sh, as root:=============
[m80a]/cdrom> ./rootpre.sh

Configuring Asynchronous I/O...
Asynchronous I/O is already defined
Adding r/w perms for group hagsuser to /var/ha/soc/grpsvcsdsocket.TEST...success
[m80a]/cdrom>
If this fails, do it by hand following the "Configuring HAGS" part below.

===================Create the installation directory==========================
When prompted, run /tmp/orainstRoot.sh as root:
[m80a]/tmp> ./orainstRoot.sh
Creating Oracle Inventory pointer file (/etc/oraInst.loc)
Changing groupname of /oracle/oraInventory to dba.
[m80a]/tmp>
If this fails, do it by hand, or use the pre-database checks below as a reference.

===============Before creating the database, confirm the following (on both nodes):=================
[m80a]/oracle> ls -l /dev/rsrvconfig
crw-rw-r-- 1 oracle dba 45, 21 Dec 17 20:59 /dev/rsrvconfig
[m80a]/oracle> ls -l /etc/srvConfig.loc
-rw-r--r-- 1 oracle dba 30 Dec 20 14:35 /etc/srvConfig.loc
[m80a]/oracle> cat /etc/srvConfig.loc
srvconfig_loc=/dev/rsrvconfig
[m80a]/oracle>

[m80a]/oracle> ls -l /etc/oratab
-rw-rw-r-- 1 oracle dba 688 Dec 20 15:25 /etc/oratab
[m80a]/oracle> tail /etc/oratab
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
*:/oracle:N
[m80a]/oracle> ls -l /etc/oraInst.loc
-rw-r--r-- 1 oracle dba 50 Dec 20 14:35 /etc/oraInst.loc
[m80a]/oracle> cat /etc/oraInst.loc
inventory_loc=/oracle/oraInventory
inst_group=dba
[m80a]/oracle>

[m80a]/oracle> ls -l /var/opt/oracle/srvConfig.loc
-rw-rw-r-- 1 oracle dba 30 Dec 20 14:12 /var/opt/oracle/srvConfig.loc
[m80a]/oracle> cat /var/opt/oracle/srvConfig.loc
srvconfig_loc=/dev/rsrvconfig
[m80a]/oracle>
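
The srvConfig.loc checks above boil down to "the file names a device, and that device exists as a character device". Here is a rough sketch of that check. Both helper names (`loc_value`, `check_srvconfig`) are hypothetical, and the two file paths are simply the ones used in this write-up.

```shell
#!/bin/sh
# loc_value KEY FILE   (hypothetical helper)
# Prints the value of KEY from a simple KEY=VALUE file.
loc_value() {
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }' "$2"
}

# check_srvconfig FILE   (hypothetical helper)
# Succeeds when FILE has a srvconfig_loc entry that points at an
# existing character device, as /dev/rsrvconfig does above.
check_srvconfig() {
    dev=$(loc_value srvconfig_loc "$1")
    if [ -z "$dev" ]; then
        echo "$1: no srvconfig_loc entry"
        return 1
    fi
    if [ ! -c "$dev" ]; then
        echo "$1: $dev is not a character device"
        return 1
    fi
    echo "$1: srvconfig_loc=$dev"
}

# Check both locations used in this write-up, if they exist here.
for f in /etc/srvConfig.loc /var/opt/oracle/srvConfig.loc; do
    if [ -f "$f" ]; then
        check_srvconfig "$f"
    fi
done
```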

If in doubt, run the following commands by hand (on both nodes):
(Configuring HAGS; see MetaLink note 2064876.102)
chmod a+x /usr/sbin/cluster/utilities/cldomain
chgrp hagsuser /var/ha/soc/grpsvcsdsocket.`/usr/sbin/cluster/utilities/cldomain`
chmod g+w /var/ha/soc/grpsvcsdsocket.`/usr/sbin/cluster/utilities/cldomain`

Test HAGS:
[m80a]/> /usr/sbin/cluster/utilities/cllsif
Adapter Type Network Net Type Attribute Node IP Address Hardware Address Interface Name Global Name Netmask

m80a service ether_net ether public m80a 192.168.2.215 en0 255.255.255.0
m80a_stb standby ether_net ether public m80a 10.1.1.1 en1 255.255.255.0
m80a_tty0 service rs232_net rs232 serial m80a /dev/tty0
m80b service ether_net ether public m80b 192.168.2.216 en0 255.255.255.0
m80b_stb standby ether_net ether public m80b 10.1.1.2 en1 255.255.255.0
m80b_tty0 service rs232_net rs232 serial m80b /dev/tty0
[m80a]/>

===========After the Oracle code is installed (on both nodes), run /oracle/root.sh==============
[m80b]/oracle> ./root.sh
Running Oracle9 root.sh script...

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /oracle

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
: y
Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Adding entry to /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
[m80b]/oracle>
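If it fails, doing the steps by hand also works.


Once everything above is sorted out, the software installation itself should go fairly smoothly. If problems do occur, the most likely ones are:
1. PRKR-1023: fix on both nodes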
$ ln -sf /etc/srvConfig.loc /oracle/srvm/config/srvConfig.loc

2. PRKR-1064: fix on both nodes
$ su
root's Password:
# mkdir /var/opt/oracle
# chown oracle:dba /var/opt/oracle
# chmod -R 777 /var/opt/oracle
# ln -sf /etc/srvConfig.loc /var/opt/oracle/srvConfig.loc
# ls -l /var/opt/oracle/srvConfig.loc
lrwxrwxrwx 1 root system 18 Nov 27 15:18 /var/opt/oracle/srvConfig.loc -> /etc/srvConfig.loc
#

# more /var/opt/oracle/srvConfig.loc
srvconfig_loc=/dev/rsrvconfig
# ls -l /dev/rsrvconfig
crw-rw-r-- 1 oracle dba 45, 21 Nov 25 14:52 /dev/rsrvconfig
#

# dd if=/dev/rsrvconfig of=/dev/null bs=8192
16384+0 records in
16384+0 records out
#
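
The dd run shows that the raw device can be read from start to finish, and its record count also gives the size that was read: 16384 blocks of 8192 bytes is 134217728 bytes, or 128 MiB. The arithmetic as a small sketch (`dd_bytes` is a hypothetical helper; whether 128 MiB is the size your device should have is for you to check):

```shell
#!/bin/sh
# dd_bytes BS   (hypothetical helper)
# Reads dd's summary on stdin, takes the full-block count from the
# "N+M records out" line, and prints N * BS in bytes.
dd_bytes() {
    awk -v bs="$1" '/records out/ { split($1, c, "+"); printf "%d\n", c[1] * bs }'
}

# The summary from the transcript above, with bs=8192:
printf '16384+0 records in\n16384+0 records out\n' | dd_bytes 8192
# prints 134217728
```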

3. JRE error:
[m80a]/oracle> srvconfig -init -f
jre: 2520-014 Provider token 0 does not exist.
jre: 2520-014 Provider token 0 does not exist.
[m80a]/oracle>
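Fix:
Install the patches required for the JDK and JRE:
If you use HTTP Server: IY30886 (JRE 1.1.8)
If you do not use HTTP Server: IY31033 (JDK 1.3.1)

Error while creating the database (at 5%, when the instance is being started):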
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

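Tune AIO. The AIX 5L default range is 1 to 100; when tuning, raise the value in steps of 10.
Notes:
1. Both nodes must use the same values.
2. After changing AIO you must reboot and then start HA; repeat until the error no longer appears.
(Informix is also installed on my machine. Before it was installed, MINIMUM number of servers=10 was enough; now it seems to need 20.)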

Change / Show Characteristics of Asynchronous I/O

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[Entry Fields]
MINIMUM number of servers [20] #
MAXIMUM number of servers [100] #
Maximum number of REQUESTS [8192] #
Server PRIORITY [39] #
STATE to be configured at system restart available +
State of fast path enable +
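
Since both nodes must carry the same AIO values, it is worth comparing them mechanically rather than by eye. The sketch below assumes you have saved the output of `lsattr -El aio0` from each node to a file, and that the output has the attribute name in the first column and its value in the second. Both helper names are hypothetical.

```shell
#!/bin/sh
# aio_settings (hypothetical helper): reduces lsattr-style output on
# stdin to sorted NAME=VALUE lines (first and second columns only).
aio_settings() {
    awk 'NF >= 2 { print $1 "=" $2 }' | sort
}

# same_aio_settings FILE_A FILE_B   (hypothetical helper)
# Succeeds when the two saved listings carry identical attribute values.
same_aio_settings() {
    [ "$(aio_settings < "$1")" = "$(aio_settings < "$2")" ]
}
```

For example, save `lsattr -El aio0` to /tmp/aio.m80a on the first node and /tmp/aio.m80b on the second, copy one file across with rcp, and run `same_aio_settings /tmp/aio.m80a /tmp/aio.m80b && echo consistent`.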