RAC Instance Failure Resolution Report

2014-11-21 10:38
Engineering Log
Report date: 2014/11/20

[Objective]

1. Resolve the problem that the instance on one RAC node can only start to NOMOUNT.

[Environment]

Operating system           Linux
Hostname                   zgcrac1
Database version           Oracle 11.2.0
Character set
Production instance name   PROD
Listener                   LISTENER/1521

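[Procedure]

1. Analyze the cause of the error

1.1 Check the listener and analyze:

According to Li Xin, the listener could not be started. Check the listener status: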

[grid@zgcrac1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-11月-2014 09:55:30

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                21-10月-2014 20:22:13
Uptime                    28 days 13 hr. 33 min. 16 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11ggrid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/11ggrid/app/11.2.0/grid/log/diag/tnslsnr/zgcrac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.88.11)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.88.12)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "PROD" has 1 instance(s).
  Instance "PROD1", status BLOCKED, has 1 handler(s) for this service...

Resource status, as the grid user:

Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATADG.dg  ora....up.type ONLINE    ONLINE    zgcrac1
ora.FDG.dg     ora....up.type ONLINE    ONLINE    zgcrac2
ora....ER.lsnr ora....er.type ONLINE    ONLINE    zgcrac2
ora....N1.lsnr ora....er.type ONLINE    ONLINE    zgcrac1
ora....EMDG.dg ora....up.type ONLINE    ONLINE    zgcrac1
ora.asm        ora.asm.type   ONLINE    ONLINE    zgcrac1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    zgcrac1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    zgcrac1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    zgcrac1
ora.ons        ora.ons.type   ONLINE    ONLINE    zgcrac1
ora.prod.db    ora....se.type ONLINE    ONLINE    zgcrac2
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    zgcrac1
ora....SM1.asm application    ONLINE    ONLINE    zgcrac1
ora....C1.lsnr application    OFFLINE   OFFLINE
ora....ac1.gsd application    OFFLINE   OFFLINE
ora....ac1.ons application    ONLINE    ONLINE    zgcrac1
ora....ac1.vip ora....t1.type ONLINE    ONLINE    zgcrac1
ora....SM2.asm application    ONLINE    ONLINE    zgcrac2
ora....C2.lsnr application    ONLINE    ONLINE    zgcrac2
ora....ac2.gsd application    OFFLINE   OFFLINE
ora....ac2.ons application    ONLINE    ONLINE    zgcrac2
ora....ac2.vip ora....t1.type ONLINE    ONLINE    zgcrac2

The output above shows the problem: the listener reports instance PROD1 as BLOCKED, and the ora....C1.lsnr resource is OFFLINE.

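1.2 Check the listeners:

Checked the listener on server .12; no problem found.

Checked the listener on server .22; no problem found either.

Next, determine whether this is a network or IP problem.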

1.3 hosts file:

127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
::1            localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.88.11   zgcrac1 zgcrac1.com
172.16.88.12   zgcrac1-vip
172.16.88.21   zgcrac2 zgcrac2.com
172.16.88.22   zgcrac2-vip
172.16.88.10   zgcrac-cluster zgcrac-cluster-scan
10.10.1.1      zgcrac1-priv
10.10.1.2      zgcrac2-priv

The IP addresses of both servers are fine.

1.4 Check the database instance status:

select instance_name,host_name,status from v$instance
*
ERROR at line 1:
ORA-01034: ORACLE not available

Status of all instances:

INSTANCE_NAME    HOST_NAME            STATUS
---------------- -------------------- ------------
+ASM1            zgcrac1              STARTED
+ASM2            zgcrac2              STARTED

INSTANCE_NAME    HOST_NAME            STATUS
---------------- -------------------- ------------
PROD2            zgcrac2              OPEN
PROD1            zgcrac1              STARTED

Instance PROD1 is stuck in STARTED (NOMOUNT) and cannot be opened.

1.5 Check the RAC resource status:

[grid@zgcrac1 admin]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATADG.dg
               ONLINE  ONLINE       zgcrac1
               ONLINE  ONLINE       zgcrac2
ora.FDG.dg
               ONLINE  OFFLINE      zgcrac1
               ONLINE  ONLINE       zgcrac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       zgcrac1
               ONLINE  ONLINE       zgcrac2
ora.SYSTEMDG.dg
               ONLINE  ONLINE       zgcrac1
               ONLINE  ONLINE       zgcrac2
ora.asm
               ONLINE  ONLINE       zgcrac1                  Started
               ONLINE  ONLINE       zgcrac2                  Started
ora.gsd
               OFFLINE OFFLINE      zgcrac1
               OFFLINE OFFLINE      zgcrac2
ora.net1.network
               ONLINE  ONLINE       zgcrac1
               ONLINE  ONLINE       zgcrac2
ora.ons
               ONLINE  ONLINE       zgcrac1
               ONLINE  ONLINE       zgcrac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       zgcrac1
ora.cvu
      1        ONLINE  ONLINE       zgcrac1
ora.oc4j
      1        ONLINE  ONLINE       zgcrac1
ora.prod.db
      1        ONLINE  OFFLINE                               Instance Shutdown
      2        ONLINE  ONLINE       zgcrac2                  Open
ora.scan1.vip
      1        ONLINE  ONLINE       zgcrac1
ora.zgcrac1.vip
      1        ONLINE  ONLINE       zgcrac1
ora.zgcrac2.vip
      1        ONLINE  ONLINE       zgcrac2

The ora.FDG.dg resource is OFFLINE on node zgcrac1.
This suggests that the problem lies with the ASM disks.
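
In a long listing, a mismatch between TARGET and STATE is easy to overlook. The following is a minimal shell sketch, assuming the `crsctl stat res -t` layout shown above (a resource name line followed by one indented line per server or instance). The helper name `find_offline_resources` is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print
# every entry whose TARGET is ONLINE but whose STATE is OFFLINE.
find_offline_resources() {
  awk '
    /^ora\./ { res = $1; next }            # resource name line, e.g. ora.FDG.dg
    {
      # Cluster resources start with an instance number; local ones do not.
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1
      if ($i == "ONLINE" && $(i + 1) == "OFFLINE") {
        $1 = $1                            # collapse runs of whitespace
        print res ": " $0
      }
    }
  '
}
```

It could be used as `crsctl stat res -t | find_offline_resources`. For the listing above it would be expected to flag ora.FDG.dg on zgcrac1 and instance 1 of ora.prod.db.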

1.6 Check the ASM disk group status:

Node zgcrac1

SQL> select name,state from v$asm_diskgroup;

NAME
--------------------------------------------------------------------------------
STATE
---------------------------------
DATADG
MOUNTED

SYSTEMDG
MOUNTED

Node zgcrac2

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATADG                         CONNECTED
FDG                            CONNECTED
SYSTEMDG                       MOUNTED

Comparing the two nodes shows that the instance on zgcrac1 cannot start because ASM disk group FDG is not recognized on that node.
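
The comparison between the two nodes can also be done mechanically. This is a small sketch under stated assumptions: each node's disk group names have already been saved to a plain text file, one name per line, and `missing_diskgroups` is a hypothetical helper name.

```shell
# Hypothetical helper: print the disk group names listed in the second file
# that do not appear in the first file.
#   $1: names seen on the failing node, one per line
#   $2: names seen on the healthy node, one per line
missing_diskgroups() {
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
  grep -Fxv -f "$1" "$2"
}
```

With the lists from the two queries above (DATADG and SYSTEMDG on zgcrac1; DATADG, FDG and SYSTEMDG on zgcrac2), the only name reported would be FDG.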

SQL> alter diskgroup FDG check all;
alter diskgroup FDG check all
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "FDG" does not exist or is not mounted

SQL> alter diskgroup FDG mount;
alter diskgroup FDG mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "FDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FDG"
ORA-15080: synchronous I/O operation to a disk failed
ORA-15080: synchronous I/O operation to a disk failed
ORA-15080: synchronous I/O operation to a disk failed
ORA-15080: synchronous I/O operation to a disk failed

This confirms that the problem is in ASM disk group FDG. It also accounts for the missing control file error reported earlier when the instance was started.

1.7 Check the earliest alert log entries:

SQL> CREATE DISKGROUP FDG EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathop1' SIZE 446462M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
ERROR: failed to update diskgroup resource ora.FDG.dg
WARNING: failed to online diskgroup resource ora.FDG.dg (unable to communicate with CRSD/OHASD)
ORA-15032: not all alterations performed

The disk group already had problems when it was created. Based on past experience, this is probably a disk permission problem.

1.8 Check the disk permissions:

[root@zgcrac1 mapper]# ll mpa*
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathb -> ../dm-5
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathbp1 -> ../dm-6
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathc -> ../dm-20
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathcp1 -> ../dm-22
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathd -> ../dm-19
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathdp1 -> ../dm-21
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathe -> ../dm-9
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathep1 -> ../dm-11
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathf -> ../dm-7
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathfp1 -> ../dm-8
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathg -> ../dm-3
lrwxrwxrwx 1 grid asmadmin 7 Oct 21 11:02 mpathgp1 -> ../dm-4
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathh -> ../dm-23
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathhp1 -> ../dm-24
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathi -> ../dm-27
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathip1 -> ../dm-29
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathj -> ../dm-28
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:02 mpathjp1 -> ../dm-30
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathk -> ../dm-25
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathkp1 -> ../dm-26
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathl -> ../dm-14
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathlp1 -> ../dm-16
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathm -> ../dm-17
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathmp1 -> ../dm-18
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathn -> ../dm-10
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathnp1 -> ../dm-13
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpatho -> ../dm-12
lrwxrwxrwx 1 grid asmadmin 8 Oct 21 11:34 mpathop1 -> ../dm-15

Every entry shows grid:asmadmin, so nothing looks wrong here. Note that these entries are symbolic links, so the listing shows the ownership of the links themselves, not of the dm-N devices they point to.

Check the disk information with kfod:

[root@zgcrac1 mapper]# kfod disk=all
WARNING: Using brute force method to determine the size of /dev/raw/rawctl.
There will be performance issues. Please check configuration to determine the cause for the failure of ioctl
--------------------------------------------------------------------------------
 Disk          Size Path                      User     Group
================================================================================
   1:     524288 Mb /dev/mapper/mpathb        grid     asmadmin
   2:     524285 Mb /dev/mapper/mpathbp1      grid     asmadmin
   3:     524288 Mb /dev/mapper/mpathc        grid     asmadmin
   4:     524285 Mb /dev/mapper/mpathcp1      grid     asmadmin
   5:     524288 Mb /dev/mapper/mpathd        grid     asmadmin
   6:     524285 Mb /dev/mapper/mpathdp1      grid     asmadmin
   7:      10240 Mb /dev/mapper/mpathe        grid     asmadmin
   8:      10236 Mb /dev/mapper/mpathep1      grid     asmadmin
   9:      10240 Mb /dev/mapper/mpathf        grid     asmadmin
  10:      10236 Mb /dev/mapper/mpathfp1      grid     asmadmin
  11:      10240 Mb /dev/mapper/mpathg        grid     asmadmin
  12:      10236 Mb /dev/mapper/mpathgp1      grid     asmadmin
  13:     524288 Mb /dev/mapper/mpathh        grid     asmadmin
  14:     524285 Mb /dev/mapper/mpathhp1      grid     asmadmin
  15:     524288 Mb /dev/mapper/mpathi        grid     asmadmin
  16:     524285 Mb /dev/mapper/mpathip1      grid     asmadmin
  17:     524288 Mb /dev/mapper/mpathj        grid     asmadmin
  18:     524285 Mb /dev/mapper/mpathjp1      grid     asmadmin
  19:     524288 Mb /dev/mapper/mpathk        root     disk
  20:     524285 Mb /dev/mapper/mpathkp1      root     disk
  21:     524288 Mb /dev/mapper/mpathl        root     disk
  22:     524285 Mb /dev/mapper/mpathlp1      root     disk
  23:     524288 Mb /dev/mapper/mpathm        root     disk
  24:     524285 Mb /dev/mapper/mpathmp1      root     disk
  25:     524288 Mb /dev/mapper/mpathn        root     disk
  26:     524285 Mb /dev/mapper/mpathnp1      root     disk
  27:     446464 Mb /dev/mapper/mpatho        root     disk
  28:     446462 Mb /dev/mapper/mpathop1      root     disk
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
     +ASM2 /g01/11ggrid/app/11.2.0/grid
     +ASM1 /g01/11ggrid/app/11.2.0/grid

Some of the disks (19 to 28) are owned by root:disk rather than grid:asmadmin. This is why the instance cannot recognize the disk group.

Changing the owner fixes it:

[root@zgcrac1 dev]# chown -R grid:asmadmin dm-10
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-12
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-13
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-14
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-15
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-16
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-17
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-18
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-25
[root@zgcrac1 dev]# chown -R grid:asmadmin dm-26
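
The ten dm-N names above were matched by hand against the symlink listing and the kfod output. The following is a hedged sketch of how that mapping could be derived instead, assuming the kfod column layout shown earlier (Disk, Size, unit, Path, User, Group). Both function names are hypothetical, and the second one only prints the chown commands for review rather than running them. Ownership set with chown on device nodes is typically not kept across a reboot on udev-managed systems, so a persistent rule would usually be needed as well; the original log does not cover that.

```shell
# Sketch only: these function names are hypothetical, not Oracle tools.

# Read `kfod disk=all` output on stdin and print the path of every disk
# whose User/Group columns differ from the expected owner
# (default grid:asmadmin).
wrong_owner_disks() {
  awk -v user="${1:-grid}" -v group="${2:-asmadmin}" '
    NF >= 6 && $1 ~ /^[0-9]+:$/ && ($(NF - 1) != user || $NF != group) {
      print $(NF - 2)                      # the Path column
    }
  '
}

# Read device paths on stdin, resolve each symlink to the real device node,
# and print (not execute) the chown command that would fix its ownership.
print_chown_cmds() {
  while IFS= read -r path; do
    target=$(readlink -f "$path") || continue
    printf 'chown grid:asmadmin %s\n' "$target"
  done
}
```

A possible use is `kfod disk=all | wrong_owner_disks | print_chown_cmds`, checking the printed commands before running any of them as root.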

2. RAC status after the change

[root@zgcrac1 dev]# kfod disk=all
WARNING: Using brute force method to determine the size of /dev/raw/rawctl.
There will be performance issues. Please check configuration to determine the cause for the failure of ioctl
--------------------------------------------------------------------------------
 Disk          Size Path                      User     Group
================================================================================
   1:     524288 Mb /dev/mapper/mpathb        grid     asmadmin
   2:     524285 Mb /dev/mapper/mpathbp1      grid     asmadmin
   3:     524288 Mb /dev/mapper/mpathc        grid     asmadmin
   4:     524285 Mb /dev/mapper/mpathcp1      grid     asmadmin
   5:     524288 Mb /dev/mapper/mpathd        grid     asmadmin
   6:     524285 Mb /dev/mapper/mpathdp1      grid     asmadmin
   7:      10240 Mb /dev/mapper/mpathe        grid     asmadmin
   8:      10236 Mb /dev/mapper/mpathep1      grid     asmadmin
   9:      10240 Mb /dev/mapper/mpathf        grid     asmadmin
  10:      10236 Mb /dev/mapper/mpathfp1      grid     asmadmin
  11:      10240 Mb /dev/mapper/mpathg        grid     asmadmin
  12:      10236 Mb /dev/mapper/mpathgp1      grid     asmadmin
  13:     524288 Mb /dev/mapper/mpathh        grid     asmadmin
  14:     524285 Mb /dev/mapper/mpathhp1      grid     asmadmin
  15:     524288 Mb /dev/mapper/mpathi        grid     asmadmin
  16:     524285 Mb /dev/mapper/mpathip1      grid     asmadmin
  17:     524288 Mb /dev/mapper/mpathj        grid     asmadmin
  18:     524285 Mb /dev/mapper/mpathjp1      grid     asmadmin
  19:     524288 Mb /dev/mapper/mpathk        grid     asmadmin
  20:     524285 Mb /dev/mapper/mpathkp1      grid     asmadmin
  21:     524288 Mb /dev/mapper/mpathl        grid     asmadmin
  22:     524285 Mb /dev/mapper/mpathlp1      grid     asmadmin
  23:     524288 Mb /dev/mapper/mpathm        grid     asmadmin
  24:     524285 Mb /dev/mapper/mpathmp1      grid     asmadmin
  25:     524288 Mb /dev/mapper/mpathn        grid     asmadmin
  26:     524285 Mb /dev/mapper/mpathnp1      grid     asmadmin
  27:     446464 Mb /dev/mapper/mpatho        grid     asmadmin
  28:     446462 Mb /dev/mapper/mpathop1      grid     asmadmin

RAC resource status

[root@zgcrac1 dev]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATADG.dg  ora....up.type ONLINE    ONLINE    zgcrac1
ora.FDG.dg     ora....up.type ONLINE    ONLINE    zgcrac1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    zgcrac1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    zgcrac2
ora....EMDG.dg ora....up.type ONLINE    ONLINE    zgcrac1
ora.asm        ora.asm.type   ONLINE    ONLINE    zgcrac1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    zgcrac2
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    zgcrac1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    zgcrac2
ora.ons        ora.ons.type   ONLINE    ONLINE    zgcrac1
ora.prod.db    ora....se.type ONLINE    ONLINE    zgcrac1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    zgcrac2
ora....SM1.asm application    ONLINE    ONLINE    zgcrac1
ora....C1.lsnr application    ONLINE    ONLINE    zgcrac1
ora....ac1.gsd application    OFFLINE   OFFLINE
ora....ac1.ons application    ONLINE    ONLINE    zgcrac1
ora....ac1.vip ora....t1.type ONLINE    ONLINE    zgcrac1
ora....SM2.asm application    ONLINE    ONLINE    zgcrac2
ora....C2.lsnr application    ONLINE    ONLINE    zgcrac2
ora....ac2.gsd application    OFFLINE   OFFLINE
ora....ac2.ons application    ONLINE    ONLINE    zgcrac2
ora....ac2.vip ora....t1.type ONLINE    ONLINE    zgcrac2

Listener

[grid@zgcrac1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 20-11月-2014 17:18:52

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                21-10月-2014 20:22:13
Uptime                    29 days 20 hr. 56 min. 38 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11ggrid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/11ggrid/app/11.2.0/grid/log/diag/tnslsnr/zgcrac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.88.12)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.88.11)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "PROD" has 1 instance(s).
  Instance "PROD1", status READY, has 1 handler(s) for this service...
The command completed successfully

Instance status

SQL> select instance_name,host_name,status from gv$instance;

INSTANCE_NAME
----------------
HOST_NAME                                                        STATUS
---------------------------------------------------------------- ------------
PROD1
zgcrac1                                                          OPEN

PROD2
zgcrac2                                                          OPEN

Disk group status

SQL> select group_number,name,state from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE
------------ ------------------------------ -----------
           1 DATADG                         CONNECTED
           2 FDG                            CONNECTED
           3 SYSTEMDG                       MOUNTED