Using ADDM to Sniff Out a Storage Disk Failure
2012-07-17 09:31
During today's routine ADDM check, a new finding appeared: "The throughput of the I/O subsystem was significantly lower than expected."
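This finding had never appeared before, and it immediately put me on alert. When I expanded the related items, I found I/O anomaly warnings on several raw devices at the same time. For our business system this period was clearly not yet the daily peak, and the problem had never shown up before, not even during the pre-holiday peaks. I immediately asked the system engineer to confirm whether the storage subsystem had a problem. The reply was that "the remote management port is not connected."

After work that day, a strong hunch told me the storage might be in trouble, so I decided to go to the IDC machine room and inspect the storage system in person. Once there, I discovered that in my hurry I had not even brought the key to the storage cabinet. Looking through a small hole in the cabinet door, I could see one disk lit with a yellow warning light. I reported the fault to the system engineer right away. Of course, because our storage uses a RAID plus hot-spare configuration, no data would be lost even if two disks failed.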
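My final analysis is that this failed disk most likely caused the temporary I/O anomaly. A reminder to everyone: when ADDM shows I/O anomalies on many raw devices or file systems at once, be sure to check whether any of the disks are in an abnormal state.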
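The rule of thumb above (many devices slowing down at the same time points to the disks underneath rather than to any single file) can be sketched as a small check. This is a minimal Python sketch, not an ADDM feature: the per-file single-block read times and the 19 ms overall average are copied from the ADDM finding quoted later in this post, while the 1.5x slowness factor and the three-device threshold are arbitrary assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileLatency:
    """Average single-block read time for one device, as reported by ADDM."""
    path: str
    avg_read_ms: float


def flag_slow_files(files, overall_avg_ms, factor=1.5):
    """Return the files whose average read time exceeds factor * overall average.

    The default factor of 1.5 is an illustrative assumption, not an Oracle setting.
    """
    threshold = factor * overall_avg_ms
    return [f for f in files if f.avg_read_ms > threshold]


def suspect_disk_fault(files, overall_avg_ms, factor=1.5, min_files=3):
    """Heuristic: several slow devices at once suggests a shared storage problem.

    min_files=3 is likewise a hypothetical threshold chosen for illustration.
    Returns a tuple (suspect, slow_files).
    """
    slow = flag_slow_files(files, overall_avg_ms, factor)
    return len(slow) >= min_files, slow


if __name__ == "__main__":
    # Figures copied from the ADDM finding in this post (milliseconds).
    overall_avg_ms = 19
    report = [
        FileLatency("/dev/rgaza_disk", 112),
        FileLatency("/dev/rsystem_disk", 206),
        FileLatency("/dev/rdata35_disk", 527),
        FileLatency("/dev/rtemp1_disk", 34),
    ]
    suspect, slow = suspect_disk_fault(report, overall_avg_ms)
    for f in slow:
        ratio = f.avg_read_ms / overall_avg_ms
        print(f"{f.path}: {f.avg_read_ms:.0f} ms ({ratio:.1f}x the average)")
    if suspect:
        print("Several devices are slow at once: check the array for a failed disk.")
```

With these numbers all four listed devices exceed the assumed threshold of 28.5 ms, which matches the pattern that prompted the trip to the machine room. In practice the thresholds would need tuning for each system, and the figures would come from ADDM or AWR rather than being typed in by hand.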
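Follow-up improvement: connect the storage system's remote management port so it can be checked remotely. For our machine room, the taxi fare is 28 yuan and the trip takes at least half an hour; with a remote management port, that time and money could clearly be saved.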
Full text of the ADDM finding:

Finding: The throughput of the I/O subsystem was significantly lower than expected.
Impact (minutes): 32.2
Impact (%): 27.5

Recommendations (category: Host Configuration):

- Benefit 27.5%. Action: Consider increasing the throughput of the I/O subsystem. Oracle's recommended solution is to stripe all data files using the SAME methodology. You might also need to increase the number of disks for better performance. Alternatively, consider using Oracle's Automatic Storage Management solution. Rationale: During the analysis period, the average data files' I/O throughput was 898 K per second for reads and 40 K per second for writes. The average response time for single block reads was 19 milliseconds.
- Benefit 24.2%. Action: The performance of file /dev/rgaza_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks. Rationale: The average response time for single block reads for this file was 112 milliseconds.
- Benefit 1%. Action: The performance of file /dev/rsystem_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks. Rationale: The average response time for single block reads for this file was 206 milliseconds.
- Benefit 0.8%. Action: The performance of file /dev/rdata35_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks. Rationale: The average response time for single block reads for this file was 527 milliseconds.
- Benefit 0.6%. Action: The performance of file /dev/rtemp1_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks. Rationale: The average response time for single block reads for this file was 34 milliseconds.