How to clean up a cluster
2010-03-15 14:33
Recently I encountered a situation in an Oracle RAC cluster whereby
files were accidentally deleted within the CRS_HOME on one of the nodes
resulting in a node failure. I discovered that the conventional method
of node removal and addition from the cluster didn't work.
This document describes how to clean up a cluster in such a situation.
1. During or after the conventional Oracle method of removing a node
(as documented in Note:269320.1, Removing a Node from a 10g RAC Cluster),
various errors might be encountered, such as:
[oracle@<working node name> bin]$ ./srvctl stop nodeapps -n <broken node name>
CRS-0216: Could not stop resource 'ora.<broken node name>.ons'.
CRS-0216: Could not stop resource 'ora.<broken node name>.vip'.
CRS-0216: Could not stop resource 'ora.<broken node name>.gsd'.
[oracle@<working node name> bin]$
[root@<working node name> bin]# ./srvctl remove nodeapps -n <broken node name>
Please confirm that you intend to remove the node-level applications on node <broken node name> (y/[n]) y
PRKO-2112 : Some or all node applications are not removed successfully on node: <broken node name>
2. However, according to the OCR, all information for the broken node has been removed:
[oracle@<working node name> bin]$ ./crs_stat -u
NAME=ora.<working node name>.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
NAME=ora.cmastage.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
NAME=ora.<working node name>.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=OFFLINE
NAME=ora.<working node name>.LISTENER_<WORKING_NODE_NAME>.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
NAME=ora.<working node name>.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
NAME=ora.<working node name>.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on <working node name>
NAME=ora.<working node name>.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
3. If there still appear to be resources in the OCR for the broken node,
they can be removed as follows:
$CRS_HOME/bin/crs_unregister <resource name>
(where resource name is acquired from the output of the crs_stat command as above)
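As a rough sketch of step 3, the resource names that belong to the broken node can be picked out of the crs_stat output with awk. The helper name list_node_resources and the node name rac2 are hypothetical, and the filter only catches resources whose name contains the node name; anything named differently still has to be found by reading the crs_stat output by hand. The usage example only prints the crs_unregister commands for review instead of running them.

```shell
#!/bin/sh
# list_node_resources NODE
# Hypothetical helper (not an Oracle tool): reads crs_stat-style output on
# stdin and prints every NAME= value that contains ".NODE.".
list_node_resources() {
    awk -F= -v node="$1" '
        $1 == "NAME" && index($2, "." node ".") > 0 { print $2 }
    '
}

# Example usage (assumes CRS_HOME is set and rac2 is the broken node):
#   "$CRS_HOME/bin/crs_stat" | list_node_resources rac2 |
#       while read -r res; do
#           echo "$CRS_HOME/bin/crs_unregister $res"
#       done
```

Printing the commands first makes it possible to check the list before anything is removed from the OCR.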
4. At this point the procedure may appear to have completed, and one might expect that the broken node
can be added back into the cluster using the standard add node procedure. If further unexplained errors
are encountered from here on, this indicates that the OCR might have become corrupted and will need to be
re-initialised. This requires an outage to the cluster and is detailed below.
5. Shut down the Oracle Clusterware stack on all the nodes by running crsctl stop crs as the root user.
6. Execute the following on all nodes:
<CRS_HOME>/install/rootdelete.sh
7. Execute the following on the node which is supposed to be the first node:
<CRS_HOME>/install/rootdeinstall.sh
8. The following commands should return nothing:
ps -e | grep -i 'ocs[s]d'
ps -e | grep -i 'cr[s]d.bin'
ps -e | grep -i 'ev[m]d.bin'
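The three checks in step 8 can be folded into one count, sketched below. The function name count_crs_daemons is hypothetical; it reads a process listing on stdin so that it can be tried against saved output as well as a live ps -e.

```shell
#!/bin/sh
# count_crs_daemons
# Hypothetical helper: reads a process listing on stdin and prints how many
# lines mention one of the three Clusterware daemons checked in step 8.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 in that (desired) case.
count_crs_daemons() {
    grep -E -i -c 'ocssd|crsd\.bin|evmd\.bin' || true
}

# Example usage on each node:
#   n=$(ps -e | count_crs_daemons)
#   if [ "$n" -eq 0 ]; then
#       echo "Clusterware daemons are down"
#   else
#       echo "$n Clusterware daemon process(es) still running"
#   fi
```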
9. Execute <CRS_HOME>/root.sh on the first node.
10. After root.sh has completed successfully on the first node, execute root.sh on the remaining nodes of the cluster.
11. The nodeapps might need to be added manually using the srvctl command as follows (as root user for each node):
[root@<working node name> bin]# ./srvctl add nodeapps -n <working node name> -o /u01/app/oracle/product/10.2/db_1 -A <working node name vip>/<netmask>/<device name>
(where <working node name vip> = hosts file entry for vip, or IP address, and <device name> = device name such as eth0)
12. Add the database to the OCR using the appropriate srvctl add database command as the user who owns the database.
Ensure that this is not run as the root user.
13. Add the ASM, database, instance and service resources using the appropriate srvctl add commands.
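For steps 12 and 13, the following is only a sketch of what the srvctl add commands might look like on 10g. The option letters shown (-d, -i, -n, -o) are the ones I believe 10g srvctl uses, but they should be checked against srvctl add database -h, srvctl add asm -h and srvctl add instance -h on the system in question. The helper name print_srvctl_adds and the instance name cmastage1 are hypothetical. The helper prints the commands rather than running them.

```shell
#!/bin/sh
# print_srvctl_adds DB INSTANCE NODE ASM_INSTANCE ORACLE_HOME
# Hypothetical helper: prints (does not run) one srvctl add command each for
# the database, the ASM instance and the database instance.
# The option letters are an assumption; verify them with "srvctl add <noun> -h".
print_srvctl_adds() {
    db=$1
    inst=$2
    node=$3
    asm=$4
    home=$5
    echo "srvctl add database -d $db -o $home"
    echo "srvctl add asm -n $node -i $asm -o $home"
    echo "srvctl add instance -d $db -i $inst -n $node"
}

# Example usage, run as the database owner and not as root:
#   print_srvctl_adds cmastage cmastage1 "$(hostname)" ASM1 \
#       /u01/app/oracle/product/10.2/db_1
```

Services can be added afterwards in the same way with srvctl add service, once the database and instance resources exist.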
14. Add the listener using netca. This may give errors if listener.ora already contains the entries.
If this is the case, move listener.ora to /tmp from $ORACLE_HOME/network/admin, or from the
$TNS_ADMIN directory if the TNS_ADMIN environment variable is defined, and then run netca.
Add all the listeners that existed before.
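The choice between the two listener.ora locations in step 14 can be sketched with a shell default expansion. The function name listener_ora_path and the .bak suffix are hypothetical choices for illustration.

```shell
#!/bin/sh
# listener_ora_path
# Hypothetical helper: prints where listener.ora is expected to live.
# TNS_ADMIN wins when it is set and non-empty; otherwise the default
# $ORACLE_HOME/network/admin directory is used.
listener_ora_path() {
    echo "${TNS_ADMIN:-$ORACLE_HOME/network/admin}/listener.ora"
}

# Example usage before running netca:
#   src=$(listener_ora_path)
#   if [ -f "$src" ]; then
#       mv "$src" /tmp/listener.ora.bak
#   fi
```

Keeping the moved file in /tmp gives a copy of the old listener definitions to compare against after netca has recreated them.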
References:
Removing a Node from a 10g RAC Cluster. Note:269320.1 (Oracle Metalink)
Re-initialising the OCR. Note:399482.1 (Oracle Metalink)