ORA-19501 Error During RMAN Backup: Locating the Problem
2013-08-05 20:56
One of our databases failed with ORA-19501 during a backup. Below is a brief walk-through of how I analysed it.
Environment: Linux + Oracle 10.1.0.4.2
The error output was as follows:
    RMAN> run {
    2> backup database format '/XXX/flash_recovery_area/prod/backupset/%U.dbf';
    3> }

    Starting backup at 01-AUG-13
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: sid=296 devtype=DISK
    channel ORA_DISK_1: starting full datafile backupset
    channel ORA_DISK_1: specifying datafile(s) in backupset
    input datafile fno=00046 name=/XXX/oradata/prod/datafile/o1_mf_esbigtbl_1q6k0sp9_.dbf
    input datafile fno=00002 name=/XXX/oradata/prod/datafile/o1_mf_undotbs1_1q6jqcko_.dbf
    input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
    ......
    ......
    channel ORA_DISK_1: starting piece 1 at 01-AUG-13
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 15:48:46
    ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
    ORA-27072: File I/O error
    Linux Error: 2: No such file or directory
    Additional information: 15232
Starting from the error message above:
1. I checked the data file and found that it does physically exist.
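Since the error stack says "No such file or directory" even though the file is there, it may help to record exactly what was checked. The following is a minimal sketch, not the command used in the original investigation: `check_datafile` is a hypothetical helper name, and the default block size of 8192 is taken from the error message.

```shell
# Hypothetical helper: report whether a data file exists, is readable,
# and how many blocks of the given size it holds.
check_datafile() {
    file=$1
    blocksize=${2:-8192}   # block size from the ORA-19501 message
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 1
    fi
    # wc -c reports the size in bytes; tr strips padding some platforms add
    size=$(wc -c < "$file" | tr -d ' ')
    echo "ok: $file size=$size blocks=$((size / blocksize))"
}
```

It would be called as `check_datafile /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf`. A passing result only shows that the directory entry and metadata are intact; it says nothing about whether every block can actually be read.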
2. I looked up the Oracle error numbers to get more detail.
    [oracle@infra bin]$ oerr ora 19501
    19501, 00000, "read error on file \"%s\", blockno %s (blocksize=%s)"
    // *Cause: read error on input file
    // *Action: check the file
    [oracle@infra bin]$ oerr ora 27072
    27072, 00000, "File I/O error"
    // *Cause: read/write/readv/writev system call returned error, additional
    //         information indicates starting block number of I/O
    // *Action: check errno
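Analysis: this is a file read error, so there may be a corrupt block. But is the corruption physical or logical?
3. I checked the alert log. It contained no errors and gave no useful information.
So the next step was to start from the bad block itself.
4. Find the tablespace and object that own the bad block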
    SQL> r
      1  SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
      2  FROM DBA_EXTENTS A
      3  WHERE FILE_ID = &FILE_ID
      4* AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
    Enter value for file_id: 63
    old   3: WHERE FILE_ID = &FILE_ID
    new   3: WHERE FILE_ID = 63
    Enter value for block_id: 15233
    old   4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
    new   4: AND 15233 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

    OWNER                SEGMENT_NAME         SEGMENT_TYPE       TABLESPACE_NAME      PARTITION_NAME
    -------------------- -------------------- ------------------ -------------------- ------------------------------
    CONTENT              DR$IFS_TEXT$I        TABLE              CONTENT_IFS_CTX_K

    SQL> select count(*) from CONTENT.DR$IFS_TEXT$I;

      COUNT(*)
    ----------
       3257212
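Analysis: the table that owns the bad block can still be queried. There are two possible situations:
(1) all of the table's data is in memory, so the query is served entirely by logical reads; in that case it is impossible to tell whether the table has logical or physical corruption;
(2) the table's data is partly in memory and partly on disk, so the query performs some physical reads.
Working hypothesis: since it is unclear whether the corruption is physical or logical, assume first that it is logical.
To test for logical corruption, I did the following.
5. Use the dbv tool to check for logical corruption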
    [oracle@infra bin]$ dbv file=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf blocksize=8192

    DBVERIFY: Release 10.1.0.4.2 - Production on Thu Aug 1 16:26:07 2013

    Copyright (c) 1982, 2005, Oracle. All rights reserved.

    DBVERIFY - Verification starting : FILE = /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf

    DBVERIFY - Verification complete

    Total Pages Examined         : 15258
    Total Pages Processed (Data) : 13427
    Total Pages Failing   (Data) : 0
    Total Pages Processed (Index): 20
    Total Pages Failing   (Index): 0
    Total Pages Processed (Other): 1707
    Total Pages Processed (Seg)  : 0
    Total Pages Failing   (Seg)  : 0
    Total Pages Empty            : 104
    Total Pages Marked Corrupt   : 0
    Total Pages Influx           : 0
    Highest block SCN            : 1978186924 (0.1978186924)

    RMAN> run {
    2> backup validate datafile 63 format '/XXX/flash_recovery_area/prod/backupset/%U.dbf';
    3> }

    Starting backup at 01-AUG-13
    using target database controlfile instead of recovery catalog
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: sid=367 devtype=DISK
    channel ORA_DISK_1: starting full datafile backupset
    channel ORA_DISK_1: specifying datafile(s) in backupset
    input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 16:32:35
    ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
    ORA-27072: File I/O error
    Additional information: 15232

    RMAN> backup check logical validate datafile 63;

    Starting backup at 01-AUG-13
    using channel ORA_DISK_1
    channel ORA_DISK_1: starting full datafile backupset
    channel ORA_DISK_1: specifying datafile(s) in backupset
    input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 17:42:10
    ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
    ORA-27072: File I/O error
    Additional information: 15232
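Analysis: dbv found no logical corruption, so why does validation under RMAN still fail?
I could see only one answer: the corruption is not logical but physical.
To test for physical corruption, I did the following.
6. Use the cp command to check for physical corruption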
    [oracle@infra datafile]$ cp /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf /tmp/1.dbf
    cp: reading `/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf': Input/output error
    [oracle@infra datafile]$ cp /XXX/oradata/prod/datafile/rman01.dbf /tmp/2.dbf    -- no error
    [oracle@infra datafile]$ cp /XXX/oradata/prod/datafile/o1_mf_ovfmetri_1q6jw7hm_.dbf /tmp/3.dbf    -- no error
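The cp test shows that some part of the file cannot be read, but not which part. A narrower check is to read only the block named in the ORA-19501 message. The sketch below was not part of the original investigation. `read_block` is a hypothetical helper, and it assumes that Oracle block N starts at byte offset N * blocksize in the OS file (block 0 being the OS-level header block), which should be verified for your platform before relying on the result.

```shell
# Hypothetical helper: read exactly one database block from a file with dd.
# Assumption: Oracle block N starts at byte offset N * blocksize.
read_block() {
    file=$1
    blockno=$2
    blocksize=${3:-8192}
    echo "block $blockno starts at byte offset $((blockno * blocksize))"
    # Discard the data; only the exit status of dd matters here.
    dd if="$file" of=/dev/null bs="$blocksize" skip="$blockno" count=1 2>/dev/null
}
```

For the block in the error message, `read_block /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf 15233 8192` would target byte offset 124788736. A non-zero exit status from dd would point at an OS-level read failure in that region rather than at anything inside Oracle. The read may be served from the page cache, so a success is weaker evidence than a failure.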
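Analysis: the test results above made me suspect that the disk itself was failing. To check that suspicion, I did the following.
7. Check whether the disk is healthy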
    [oracle@infra oracle]$ dmesg
    0 0 0 0 0 0 00
     17 00F 0F  1    1    0   1   0    1    1    A9
    IO APIC #9......
    .... register #00: 09000000
    ....... : physical APIC id: 09
    ....... : Delivery Type: 0
    ....... : LTS : 0
    .... register #01: 00178020
    ....... : max redirection entries: 0017
    ....... : PRQ implemented: 1
    ....... : IO APIC version: 0020
    .... register #03: 00000001
    ....... : Boot DT : 1
    .... IRQ redirection table:
     NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
     00 000 00  1    0    0   0   0    0    0    00
     01 000 00  1    0    0   0   0    0    0    00
     02 00F 0F  1    1    0   1   0    1    1    B1
     03 000 00  1    0    0   0   0    0    0    00
     04 000 00  1    0    0   0   0    0    0    00
     05 000 00  1    0    0   0   0    0    0    00
     06 000 00  1    0    0   0   0    0    0    00
     07 000 00  1    0    0   0   0    0    0    00
     08 000 00  1    0    0   0   0    0    0    00
     09 000 00  1    0    0   0   0    0    0    00
     0a 000 00  1    0    0   0   0    0    0    00
     0b 000 00  1    0    0   0   0    0    0    00
     0c 000 00  1    0    0   0   0    0    0    00
     0d 000 00  1    0    0   0   0    0    0    00
     0e 000 00  1    0    0   0   0    0    0    00
     0f 000 00  1    0    0   0   0    0    0    00
     10 000 00  1    0    0   0   0    0    0    00
     11 000 00  1    0    0   0   0    0    0    00
     12 000 00  1    0    0   0   0    0    0    00
     13 000 00  1    0    0   0   0    0    0    00
     14 000 00  1    0    0   0   0    0    0    00
     15 000 00  1    0    0   0   0    0    0    00
     16 000 00  1    0    0   0   0    0    0    00
     17 000 00  1    0    0   0   0    0    0    00
    IRQ to pin mappings:
    IRQ0 -> 0:2
    IRQ1 -> 0:1
    IRQ4 -> 0:4
    IRQ5 -> 0:5
    IRQ6 -> 0:6
    IRQ8 -> 0:8
    IRQ10 -> 0:10
    IRQ12 -> 0:12
    IRQ13 -> 0:13
    IRQ14 -> 0:14
    IRQ15 -> 0:15
    IRQ16 -> 0:16
    IRQ17 -> 0:17
    IRQ18 -> 0:18
    IRQ19 -> 0:19
    IRQ23 -> 0:23
    IRQ26 -> 1:2
    .................................... done.
    Using local APIC timer interrupts.
    calibrating APIC timer ...
    ..... CPU clock speed is 2792.9879 MHz.
    ..... host bus clock speed is 199.4990 MHz.
    cpu: 0, clocks: 1994990, slice: 398998
    CPU0<T0:1994976,T1:1595968,D:10,S:398998,C:1994990>
    ...
audit subsystem ver 0.1 initialized mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790976 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790984 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790992 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791000 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791008 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791016 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791024 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791032 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791040 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791048 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791056 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791064 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791072 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791080 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791088 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 
18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 application bug: sqlplus(2014) has SIGCHLD set to SIG_IGN but calls wait(). (see the NOTES section of 'man 2 wait'). Workaround activated. 
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791048 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791056 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791064 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791072 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791080 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791088 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096 application bug: sqlplus(3841) has SIGCHLD set to SIG_IGN but calls wait(). (see the NOTES section of 'man 2 wait'). Workaround activated. application bug: sqlplus(3841) has SIGCHLD set to SIG_IGN but calls wait(). (see the NOTES section of 'man 2 wait'). Workaround activated. application bug: sqlplus(5580) has SIGCHLD set to SIG_IGN but calls wait(). (see the NOTES section of 'man 2 wait'). Workaround activated.
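The kernel log above is long and repetitive, so it can be easier to read as a per-device summary. The following is a rough sketch rather than the author's method: `summarize_io_errors` is a hypothetical helper, and it assumes the kernel prints errors in the `I/O error: dev MAJOR:MINOR, sector N` form seen in this log (newer kernels word the message differently).

```shell
# Hypothetical helper: summarise "I/O error: dev X:Y, sector N" messages
# read from standard input, one output line per device.
summarize_io_errors() {
    # grep -o emits each match on its own line, so this also works when
    # several messages have been joined onto one physical line.
    grep -o 'I/O error: dev [0-9a-fA-F:]*, sector [0-9]*' |
    awk '{
        dev = $4; sub(/,$/, "", dev)      # "08:05," -> "08:05"
        sec = $6 + 0
        n[dev]++
        if (!(dev in min) || sec < min[dev]) min[dev] = sec
        if (!(dev in max) || sec > max[dev]) max[dev] = sec
    }
    END {
        for (d in n)
            printf "%s errors=%d first_sector=%d last_sector=%d\n", d, n[d], min[d], max[d]
    }'
}
```

Typical use would be `dmesg | summarize_io_errors`. In this log every error is on `dev 08:05`; on Linux, major number 8 with minor number 5 conventionally corresponds to `/dev/sda5`, and the affected sectors fall in the narrow range 18790976 to 18791096.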
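Conclusion: the results above confirm my inference. The disk is failing and has developed bad sectors, so the backup fails when it physically reads this data file.
This raises a new question, though: if the disk is bad, the logical structure of the data file on those bad sectors should be damaged too, that is, logical block corruption should appear, yet it did not. Moreover, after a later database restart the database ran normally and did not report any related errors.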