
Oracle 11g R2 RAC: Removing and Re-adding a Node After an OS Rebuild (Process and Troubleshooting)

2016-07-05 23:19
Symptom:
http://www.santongit.com/thread-12327-1-1.html 
  A two-node RAC database on RedHat 6.3 x86_64. Because of a business issue, the operating system on the node 2 server was reinstalled.

  Node 2 now needs to be rebuilt.
Rebuilding the node
  Part One: remove node 2's information from the cluster

      Because the OS on the node 2 server has already been reinstalled, the local cleanup steps that would normally be run on the node being removed are no longer needed. Remove node 2's

    information from the cluster directly on node 1.

    (1):

        [root@racdb1 ~]# olsnodes -t -s                   ##### list the nodes in the cluster

        [root@racdb1 ~]# crsctl unpin css -n racdb2       ##### run on every remaining node
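
        Note on (1): olsnodes -t -s lists each node's name, Active/Inactive state, and Pinned/Unpinned state; the unpin step only matters if racdb2 is reported as Pinned. A minimal sketch, run as root (the Grid home path below is the one shown in this article's logs and may differ on your system):

        [root@racdb1 ~]# /u01/app/11.2.0/grid/bin/olsnodes -t -s               # check whether racdb2 is Pinned
        [root@racdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl unpin css -n racdb2   # skip if already Unpinned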

        

    (2): Delete node 2's database instance using dbca

     [oracle@racdb1 ~]$ dbca        # graphical interface
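
     Since node 2 is unreachable, the GUI route may complain along the way; dbca also has a silent deleteInstance mode that can be driven entirely from racdb1. A sketch, assuming the database is orcl (see the srvctl config output below) and the node 2 instance was named orcl2 (substitute your actual instance name):

     [oracle@racdb1 ~]$ dbca -silent -deleteInstance -nodeList racdb2 \
                             -gdbName orcl -instanceName orcl2 \
                             -sysDBAUserName sys -sysDBAPassword "***"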

     Verify that the racdb2 instance has been deleted.

        Check the active instances:

        [oracle@racdb1 ~]$ sqlplus / as sysdba

        SQL> select thread#,status,instance from v$thread;

       

       Note: this step may report errors, because node 2's OS has been reinstalled and DBCA cannot find the corresponding files on node 2 while deleting the instance. It is enough to make sure

           that the racdb2 instance no longer shows up in the database.

       

          Check the database configuration:

           [root@racdb1 ~]# srvctl config database -d orcl

           

     (3): From node racdb1, stop the nodeapps of node racdb2

           

          [oracle@racdb1 bin]$ srvctl stop nodeapps -n racdb2 -f
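
          A quick follow-up check that the nodeapps on racdb2 are indeed stopped (srvctl status nodeapps accepts a node name):

          [oracle@racdb1 bin]$ srvctl status nodeapps -n racdb2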

                           

     (4): On the remaining node(s), update the inventory node list as the oracle user

            Run on every remaining node (since this cluster has only two nodes, running it on racdb1 alone is sufficient):

            [root@racdb1 ~]# su - oracle

            [oracle@racdb1 ~]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME \
            "CLUSTER_NODES={racdb1}"
       Note: this command reports an error, because the installer also tries to run the corresponding update remotely on racdb2. The error is:

          SEVERE: oracle.sysman.oii.oiip.oiipg.OiipgRemoteOpsException: Error occured while trying to run Unix command /u01/app/11.2.0/grid/oui/bin/../bin/runInstaller  -paramFile /u01/app/11.2.0/grid/oui/bin/../clusterparam.ini  -silent -ignoreSysPrereqs -updateNodeList
-noClusterEnabled ORACLE_HOME=/u01/app/11.2.0/grid CLUSTER_NODES=racdb1,racdb2 CRS=true  "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=racdb2 -remoteInvocation -invokingNodeName racdb1 -logFilePath "/u01/app/oraInventory/logs" -timestamp 2014-12-03_11-23-57PM
on nodes racdb2. [PRKC-1044 : Failed to check remote command execution setup for node racdb2 using shells /usr/bin/ssh and /usr/bin/rsh 

  File "/usr/bin/rsh" does not exist on node "racdb2"

  No RSA host key is known for racdb2 and you have requested strict checking.Host key verification failed.]

  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmdOnUnix(OiipgClusterRunCmd.java:276)

  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runAnyCmdOnNodes(OiipgClusterRunCmd.java:369)

  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmd(OiipgClusterRunCmd.java:314)

  at oracle.sysman.oii.oiic.OiicBaseInventoryApp.runRemoteInvOpCmd(OiicBaseInventoryApp.java:281)

  at oracle.sysman.oii.oiic.OiicUpdateNodeList.clsCmdUpdateNodeList(OiicUpdateNodeList.java:296)

  at oracle.sysman.oii.oiic.OiicUpdateNodeList.doOperation(OiicUpdateNodeList.java:240)

  at oracle.sysman.oii.oiic.OiicBaseInventoryApp.main_helper(OiicBaseInventoryApp.java:890)

  at oracle.sysman.oii.oiic.OiicUpdateNodeList.main(OiicUpdateNodeList.java:401)

  Caused by: oracle.ops.mgmt.cluster.ClusterException: PRKC-1044 : Failed to check remote command execution setup for node racdb2 using shells /usr/bin/ssh and /usr/bin/rsh 

  File "/usr/bin/rsh" does not exist on node "racdb2"

  No RSA host key is known for racdb2 and you have requested strict checking.Host key verification failed.

  at oracle.ops.mgmt.cluster.ClusterCmd.runCmd(ClusterCmd.java:2149)

  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmdOnUnix(OiipgClusterRunCmd.java:270)

  ... 7 more

  SEVERE: Remote 'UpdateNodeList' failed on nodes: 'racdb2'. Refer to '/u01/app/oraInventory/logs/UpdateNodeList2014-12-03_11-23-57PM.log' for details.

  It is recommended that the following command needs to be manually run on the failed nodes: 

   /u01/app/11.2.0/grid/oui/bin/runInstaller -updateNodeList -noClusterEnabled ORACLE_HOME=/u01/app/11.2.0/grid CLUSTER_NODES=racdb1,racdb2 CRS=true  "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=<node on which command is to be run>. 

  Please refer 'UpdateNodeList' logs under central inventory of remote nodes where failure occurred for more details.

  

        Because racdb2's OS has been reinstalled and the corresponding files no longer exist, the remote part of the command cannot succeed. At this point, open inventory.xml (the inventory file that records the RAC node and CRS configuration) and

        remove the racdb2 node entry by hand.
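
        A minimal sketch of that manual edit (the inventory path comes from the INVENTORY_LOCATION shown in the error output above; the exact <HOME> names vary per installation, so match them against your own file):

        [oracle@racdb1 ~]$ cp /u01/app/oraInventory/ContentsXML/inventory.xml /u01/app/oraInventory/ContentsXML/inventory.xml.bak   # keep a backup
        [oracle@racdb1 ~]$ vi /u01/app/oraInventory/ContentsXML/inventory.xml
        # Inside the <HOME .../> entry for the database home (and, in step (7), the Grid home),
        # delete the line   <NODE NAME="racdb2"/>   from its <NODE_LIST>,
        # leaving only      <NODE NAME="racdb1"/>.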

        

     (5): Remove the VIP of node racdb2

        [root@racdb1 ~]# crs_stat -t

        If the racdb2 VIP resource still exists, run:

        [root@racdb1 ~]# srvctl stop vip -i ora.racdb2.vip -f

        [root@racdb1 ~]# srvctl remove vip -i ora.racdb2.vip -f

        [root@racdb1 ~]# crsctl delete resource ora.racdb2.vip -f
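
        A quick check that the VIP resource is really gone:

        [root@racdb1 ~]# crsctl status resource -t | grep racdb2.vip     # should print nothing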

       (6): On any remaining node (racdb1), delete node racdb2

        [root@racdb1 ~]# crsctl delete node -n racdb2

        [root@racdb1 ~]# olsnodes -t -s

       (7): On the remaining node (racdb1), update the inventory node list as the grid user

        Run on every remaining node:

        [grid@racdb1 ~]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={racdb1}" CRS=true

        
        Note: this command fails here as well, because the installer again tries to run the update remotely on racdb2:

          (The installer raises the same PRKC-1044 "Failed to check remote command execution setup for node racdb2" error and 'UpdateNodeList' failure stack trace shown in step (4): remote command execution on racdb2 cannot be set up over ssh/rsh, and the remote 'UpdateNodeList' fails on node racdb2.)

         

          Because racdb2 has been reinstalled and the corresponding files are missing, the remote update cannot succeed. Open inventory.xml on racdb1 again and manually remove racdb2's CRS (Grid home) entry, as in step (4).
        (8): Verify that node racdb2 has been removed

        On any remaining node:

        [grid@racdb1 ~]$ cluvfy stage -post nodedel -n racdb2

        [grid@racdb1 ~]$ crsctl status resource -t

        Also verify at the database level. Check the active instances:

        [oracle@racdb1 ~]$ sqlplus / as sysdba

        SQL> select thread#,status,instance from v$thread;

    

At this point, node 2's information has been completely removed from the cluster.

Note: because racdb2's OS was reinstalled, updating the node list and cluster inventory during the removal reports errors. These can be fixed by editing the inventory by hand, but the subsequent install may still

complain that racdb2 was not cleaned up completely.

    

Part Two: re-add racdb2

    

  (1): Create the required OS users and groups on the new server (a combined sketch for steps (1) through (4) follows step (4)).

              

  (2): Configure the /etc/hosts file; the entries on the new node must be identical to those on the existing node.
  (3): Configure the kernel parameters and user resource limits to match the existing node.

  

  (4): Create the required directories.
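
  A combined sketch of steps (1) through (4) on the freshly installed racdb2. The group names, host entries, and directory paths below are illustrative placeholders (partly taken from this article's layout) and must match what racdb1 actually uses:

      # Run as root on the new racdb2. UIDs/GIDs must match racdb1 exactly,
      # so check them there first (id oracle; id grid) and pass -u/-g accordingly.
      groupadd oinstall; groupadd dba; groupadd asmadmin; groupadd asmdba
      useradd -g oinstall -G dba,asmdba oracle
      useradd -g oinstall -G asmadmin,asmdba,dba grid

      # (2) hosts file: copy /etc/hosts from racdb1 so the public, private (racdb2-priv)
      #     and VIP (racdb2-vip) entries are identical on both nodes.
      scp racdb1:/etc/hosts /etc/hosts

      # (3) kernel parameters and limits: copy the settings from racdb1.
      scp racdb1:/etc/sysctl.conf /etc/sysctl.conf && sysctl -p
      scp racdb1:/etc/security/limits.conf /etc/security/limits.conf

      # (4) directories: recreate the Grid and Oracle homes with the same paths and
      #     ownership as on racdb1 (paths below follow this article's layout).
      mkdir -p /u01/app/11.2.0/grid /u01/app/oraInventory /app/oracle/product/11.2.0/db_1
      chown -R grid:oinstall /u01/app
      chown -R oracle:oinstall /app/oracle
      chmod -R 775 /u01/app /app/oracle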

  

  (5): Check that racdb2 meets the RAC installation prerequisites (run from an existing node as the grid user)

   [root@racdb1 ~]# su - grid

   [grid@racdb1 ~]$ cluvfy stage -pre nodeadd -n racdb2 -fixup -verbose

   [grid@racdb1 ~]$ cluvfy stage -post hwos -n racdb2
  (6): Add the clusterware software to the new node

   Run the following from an existing node to extend the Grid Infrastructure software to the new node (as the grid user):

  [root@racdb1 ~]# su - grid

  [grid@racdb1 ~]$ /u01/app/grid/product/11.2.0/grid/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={racdb2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={racdb2-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={racdb2-priv}"    # run as the grid user

   
Note: because node 2's OS had been reinstalled, the removal steps could not be executed on node 2 itself; node 2's information was cleaned up by hand on node 1.

        After the manual cleanup the verification checks come back clean, but addNode.sh then fails with the following error:

          Performing tests to see whether nodes racdb2,racdb2 are available
          ............................................................... 100% Done.
          Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.

          SEVERE:Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.

                  The fix found on the Oracle support site is:

                   [grid@racdb1 bin]$ ./detachHome.sh

                   Starting Oracle Universal Installer...
                   Checking swap space: must be greater than 500 MB.   Actual 2986 MB    Passed
                   The inventory pointer is located at /etc/oraInst.loc
                   The inventory is located at /u01/app/oraInventory
                   'DetachHome' was successful.

                   [grid@racdb1 bin]$ ./attachHome.sh

                   Starting Oracle Universal Installer...
                   Checking swap space: must be greater than 500 MB.   Actual 2986 MB    Passed
                   Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-06-01_08-53-48PM. Please wait ...
                   The inventory pointer is located at /etc/oraInst.loc
                   The inventory is located at /u01/app/oraInventory
                   'AttachHome' was successful.

                 These two steps simply rebuild inventory.xml from the current cluster configuration.
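
                 For orientation, a sketch of where those scripts are run from, assuming they sit under the Grid home's oui/bin directory (which matches the "bin" prompt in the output above; the Grid home path is the one reported in the earlier error output, so verify it against your own installation):

                 [grid@racdb1 ~]$ cd /u01/app/11.2.0/grid/oui/bin
                 [grid@racdb1 bin]$ ./detachHome.sh     # drops the Grid home entry from /u01/app/oraInventory
                 [grid@racdb1 bin]$ ./attachHome.sh     # re-registers it, regenerating inventory.xml with the current node list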

                 Some sources suggest running the script below instead. In my view it must not be run: doing so would remove racdb1's clusterware information from the inventory, which would be fatal for the cluster!!

                  [grid@racdb1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$oracle_home "CLUSTER_NODES={racdb1}" -local

                   Starting Oracle Universal Installer...
                   Checking swap space: must be greater than 500 MB.   Actual 2671 MB    Passed
                   The inventory pointer is located at /etc/oraInst.loc
                   The inventory is located at /u01/app/oraInventory

  (7): Run the root scripts that the installer prompts for

   /u01/app/oraInventory/orainstRoot.sh                    # run as root on the new node racdb2

   /u01/app/grid/product/11.2.0/grid/root.sh               # run as root on the new node racdb2

  

  (8): Verify that the clusterware software was added successfully

  [grid@racdb1 bin]$ cluvfy stage -post nodeadd -n racdb2 -verbose 

   (9): Add the database software to the new node

  Install the database software on the new node (run from an existing node as the oracle user):

  [root@racdb1 ~]# su - oracle

   [oracle@racdb1 ~]$ /app/oracle/product/11.2.0/db_1/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={racdb2}"

  Run the root.sh script that the installer prompts for:

      # run as root on the new node racdb2

     /app/oracle/product/11.2.0/db_1/root.sh

              Note: when adding the database software, the copy to racdb2 may fail to take place even though no error is reported. In that case the Oracle database software can simply be copied from racdb1 to racdb2 by hand;
                    if it was copied manually, root.sh does not need to be run (a sketch of the manual copy follows).
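
              A rough sketch of that manual copy, assuming passwordless ssh between the nodes as the oracle user and the database home path used in this article:

              [oracle@racdb1 ~]$ tar -czf /tmp/db_1.tar.gz -C /app/oracle/product/11.2.0 db_1
              [oracle@racdb1 ~]$ scp /tmp/db_1.tar.gz racdb2:/tmp/
              [oracle@racdb1 ~]$ ssh racdb2 "tar -xzf /tmp/db_1.tar.gz -C /app/oracle/product/11.2.0"
              # afterwards, check ownership and apply the permission fix described under step (10)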
         (10): Add the instance

  [oracle@racdb1 ~]$ dbca

  Or add the instance directly from the command line (run from an existing node as the oracle user):

  [oracle@racdb1 ~]$ dbca -silent -addInstance -nodeList racdb2 -gdbName orcl -instanceName orcldb2 -sysDBAUserName sys -sysDBAPassword "***"    # run as the oracle user
              Note: after the instance has been added, if the database software was copied over manually, the new instance may fail to start with errors like the following:

                    

                    ORA-01078: failure in processing system parameters

                    ORA-01565: error in identifying file '+DATA1/orcl/spfileorcl.ora'

                    ORA-17503: ksfdopn:2 Failed to open file +DATA1/orcl/spfileorcl.ora

  These errors occur because, when the database software is copied by hand, the permissions on the two oracle binaries end up wrong. Run the following on racdb2 to fix them:

       cd $GRID_HOME/bin
       chmod 6751 oracle

       cd $ORACLE_HOME/bin
       chmod 6751 oracle
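
  To confirm the fix, the binaries should show the setuid/setgid bits (mode 6751 corresponds to -rwsr-s--x); a quick check on racdb2:

       [grid@racdb2 ~]$ ls -l $GRID_HOME/bin/oracle        # expect -rwsr-s--x
       [oracle@racdb2 ~]$ ls -l $ORACLE_HOME/bin/oracle    # expect -rwsr-s--x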

  (11): Verify that the instance has been added

   Check the active instances:

  [oracle@racdb1 ~]$ sqlplus / as sysdba

   SQL> select thread#,status,instance from gv$thread; 
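
   As an additional check, gv$instance should now report one open instance per node:

   SQL> select inst_id, instance_name, status from gv$instance;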