Backup Failure Caused by a Misconfigured LAD (Log Archive Dest)
2015-06-26 18:41
I. Cause of the Problem
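On 2014/10/14 a customer reported that the crontab job that backs up the database had failed. Remote analysis showed what had happened. After the disaster-recovery drill on 2014/09/13, the Data Guard parameters had not been restored correctly, so archived logs were no longer being purged. The growing backlog of archived logs then made the backup fail for lack of space. The details follow.
II. Log Analysis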
1. After logging in, the backup log showed that the datafile backup had succeeded but the archivelog backup had failed:
including current SPFILE in backup set
channel c1: starting piece 1 at 13-OCT-14
channel c1: finished piece 1 at 13-OCT-14
piece handle=/backup/addrrman/full_ADDRPROD_20141013_14004_1 tag=TAG20141013T220005 comment=NONE
channel c1: backup set complete, elapsed time: 00:00:01
channel c2: finished piece 1 at 13-OCT-14
piece handle=/backup/addrrman/full_ADDRPROD_20141013_14001_1 tag=TAG20141013T220005 comment=NONE
channel c2: backup set complete, elapsed time: 01:45:12
channel c3: finished piece 1 at 13-OCT-14
piece handle=/backup/addrrman/full_ADDRPROD_20141013_14002_1 tag=TAG20141013T220005 comment=NONE
channel c3: backup set complete, elapsed time: 01:46:01
Finished backup at 13-OCT-14
sql statement: alter system archive log current
.... (skipped) ....
released channel: c1
released channel: c2
released channel: c3
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c3 channel at 10/14/2014 00:30:34
ORA-19502: write error on file "/backup/addrrman/arch_ADDRPROD_20141014_14093_1", block number 442369 (block size=512)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
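A long nightly RMAN log is easier to triage if you first list the ORA-/RMAN- codes in the order they appear. Below is a minimal sketch of such a helper. The names `error_codes` and `is_out_of_space` are hypothetical and not part of any Oracle tool. The code only runs a regular expression over log text like the excerpt above.

```python
import re

# ORA-/RMAN- message codes carry five digits, e.g. ORA-19502, RMAN-03009
CODE_RE = re.compile(r"\b(?:ORA|RMAN)-\d{5}\b")


def error_codes(log_text: str) -> list[str]:
    """Return the distinct ORA-/RMAN- codes in the order they first appear."""
    seen: list[str] = []
    for code in CODE_RE.findall(log_text):
        if code not in seen:
            seen.append(code)
    return seen


def is_out_of_space(log_text: str) -> bool:
    """True if the OS error text in the log reports a full file system."""
    return "No space left on device" in log_text


SAMPLE = """RMAN-03009: failure of backup command on c3 channel at 10/14/2014 00:30:34
ORA-19502: write error on file "/backup/addrrman/arch_ADDRPROD_20141014_14093_1", block number 442369 (block size=512)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device"""

print(error_codes(SAMPLE))      # ['RMAN-03009', 'ORA-19502', 'ORA-27063']
print(is_out_of_space(SAMPLE))  # True
```

In this excerpt the piece that hit the full file system is the `arch_...` (archivelog) piece, not one of the `full_...` datafile pieces.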
2. Checking the sizes of the datafile backup sets showed that the data volume had not grown sharply:
oracle@p740a:/backup/addrrman[addr11g1]$ls -ltr
total 143197088
-rw------- 1 oracle oinstall 98 Aug 21 18:53 nohup.out
-rw-r--r-- 1 oracle oinstall 7702 Oct 13 22:00 analyze.lst
-rw-r----- 1 oracle asmadmin 23931797504 Oct 13 23:44 full_ADDRPROD_20141013_14000_1
-rw-r----- 1 oracle asmadmin 7847936 Oct 13 23:44 full_ADDRPROD_20141013_14003_1
-rw-r----- 1 oracle asmadmin 98304 Oct 13 23:44 full_ADDRPROD_20141013_14004_1
-rw-r----- 1 oracle asmadmin 23550468096 Oct 13 23:45 full_ADDRPROD_20141013_14001_1
-rw-r----- 1 oracle asmadmin 25820962816 Oct 13 23:46 full_ADDRPROD_20141013_14002_1
-rw-r--r-- 1 oracle oinstall 2659758 Oct 14 00:34 rman_delete.log
-rw-r--r-- 1 oracle oinstall 803655 Oct 14 00:37 delete_local_std_arch.log
-rw-r--r-- 1 oracle oinstall 1210456 Oct 14 00:38 rman_bk.log
-rw-r--r-- 1 oracle oinstall 527 Oct 14 00:38 delete_cd_std_arch.log
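This step amounts to comparing tonight's datafile backup volume with earlier runs. Below is a minimal sketch that totals the `full_*` pieces from an `ls -l` listing. `backup_piece_bytes` is a hypothetical name. The parsing assumes the standard nine-column `ls -l` layout and file names without spaces.

```python
def backup_piece_bytes(ls_output: str, prefix: str = "full_") -> int:
    """Sum the size column of `ls -l` lines whose file name starts with `prefix`."""
    total = 0
    for line in ls_output.splitlines():
        fields = line.split()
        # A regular `ls -l` line has 9 fields: mode, links, owner, group,
        # size, month, day, time/year, name ("total N" lines are skipped)
        if len(fields) >= 9 and fields[8].startswith(prefix):
            total += int(fields[4])
    return total


LISTING = """total 143197088
-rw-r----- 1 oracle asmadmin 23931797504 Oct 13 23:44 full_ADDRPROD_20141013_14000_1
-rw-r----- 1 oracle asmadmin 7847936 Oct 13 23:44 full_ADDRPROD_20141013_14003_1
-rw-r----- 1 oracle asmadmin 98304 Oct 13 23:44 full_ADDRPROD_20141013_14004_1
-rw-r----- 1 oracle asmadmin 23550468096 Oct 13 23:45 full_ADDRPROD_20141013_14001_1
-rw-r----- 1 oracle asmadmin 25820962816 Oct 13 23:46 full_ADDRPROD_20141013_14002_1
-rw-r--r-- 1 oracle oinstall 1210456 Oct 14 00:38 rman_bk.log"""

total = backup_piece_bytes(LISTING)
print(total, f"{total / 2**30:.1f} GiB")  # 73311174656 68.3 GiB
```

Logging this figure after each run would make a sudden jump in datafile volume easy to spot, and would separate it from archivelog growth.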
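3. The archivelog deletion log showed that the archived logs from 9/13 had not been deleted because they had not yet been applied on all standby databases: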
RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699 thread=1 sequence=13079
RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499 thread=1 sequence=13080
RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367 thread=1 sequence=13081
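To see how far back the backlog goes, you can extract the retained logs from `rman_delete.log`. Below is a minimal sketch that assumes the two-line warning format shown above. `retained_logs` is a hypothetical helper. The day-directory check also assumes the OMF path layout `+DG/<db>/archivelog/<YYYY_MM_DD>/...`. If the oldest retained log falls on the drill date, the change made that day becomes the prime suspect.

```python
import re

# RMAN prints the warning, then the name/thread/sequence of the log it kept
RETAINED_RE = re.compile(
    r"RMAN-08120: WARNING: archived log not deleted, not yet applied by standby\s+"
    r"archived log file name=(\S+) thread=(\d+) sequence=(\d+)"
)


def retained_logs(delete_log: str) -> list[tuple[int, int, str]]:
    """Return (thread, sequence, file name) for every log RMAN refused to delete."""
    return [(int(t), int(s), name) for name, t, s in RETAINED_RE.findall(delete_log)]


DELETE_LOG = """RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699 thread=1 sequence=13079
RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499 thread=1 sequence=13080
RMAN-08120: WARNING: archived log not deleted, not yet applied by standby
archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367 thread=1 sequence=13081
"""

logs = retained_logs(DELETE_LOG)
oldest = min(logs)  # tuples compare by thread first, then sequence
# Assumed OMF layout: path component 3 is the YYYY_MM_DD day directory
days = sorted({name.split("/")[3] for _, _, name in logs})
print(oldest[:2], days)  # (1, 13079) ['2014_09_13']
```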
4. This matches the archivelog deletion policy in the archivelog deletion script:
rman target / nocatalog log /backup/addrrman/rman_delete.log<<EOF
allocate channel for maintenance type disk connect 'sys/xxxx@addr11g1';
allocate channel for maintenance type disk connect 'sys/xxxx@addr11g2';
CONFIGURE RETENTION POLICY TO REDUNDANCY 1;
# Archived logs may be deleted only after they have been applied on ALL standby databases
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
crosscheck backup;
crosscheck archivelog all;
delete noprompt archivelog until time 'sysdate-7';
delete noprompt obsolete;
delete noprompt expired backup;
exit
EOF
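Below is a toy model of why this script kept the 9/13 logs. A log is removed only if it is older than the 7-day window and every standby in the set has applied it. This is not RMAN's implementation. `ArchivedLog` and `deletable` are hypothetical names. The single timestamp per log simplifies how `until time` is really evaluated. Treating the deferred ADDRCD destination as a standby that must still apply the logs is an assumption drawn from the behaviour observed in this case.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ArchivedLog:
    sequence: int
    completed: datetime  # simplification: a single timestamp per log
    applied_on: set[str] = field(default_factory=set)  # db_unique_names that applied it


def deletable(logs: list[ArchivedLog], standbys: set[str], now: datetime,
              keep_days: int = 7) -> list[int]:
    """Sequences that pass both the age filter and 'applied on all standby'."""
    cutoff = now - timedelta(days=keep_days)
    return [log.sequence for log in logs
            if log.completed < cutoff and standbys <= log.applied_on]


now = datetime(2014, 10, 14)
logs = [ArchivedLog(13079, datetime(2014, 9, 13, 12), {"ADDRPROD_STD"})]

# ADDRCD (dest_3) still counts as a standby but never applied the log
print(deletable(logs, {"ADDRCD", "ADDRPROD_STD"}, now))  # []
# Once dest_3 is cleared, only ADDRPROD_STD has to have applied it
print(deletable(logs, {"ADDRPROD_STD"}, now))  # [13079]
```

The subset test `standbys <= log.applied_on` captures the "ALL" in the policy. A single standby that never applies anything is enough to pin every log indefinitely.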
5. Checking log_archive_dest and log_archive_dest_state revealed a LAD in DEFER state:
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest string
log_archive_dest_1 string LOCATION=+ARCHDG VALID_FOR=(AL
L_LOGFILES,ALL_ROLES) DB_UNIQU
E_NAME=addrprod
log_archive_dest_3 string service=ADDRCD arch async vali
d_for=(ONLINE_LOGFILES,PRIMARY
_ROLE) reopen=60 db_unique_nam
e=ADDRCD
log_archive_dest_4 string service=ADDRPROD_STD arch asyn
c valid_for=(ONLINE_LOGFILES,P
RIMARY_ROLE) reopen=60 db_uniq
ue_name=ADDRPROD_STD
log_archive_dest_state_1 string ENABLE
log_archive_dest_state_3 string defer
log_archive_dest_state_4 string enable
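Because `show parameter` wraps long values, a destination that is still configured but deferred is easy to miss by eye. Below is a minimal sketch that re-joins the wrapped VALUE column and flags remote (`service=`) destinations whose state is `defer`. The helper names are hypothetical. The parser assumes the SQL*Plus layout shown above, where wrapped lines break mid-word. It also treats an unset LOG_ARCHIVE_DEST_STATE_n as ENABLE, which is that parameter's default.

```python
import re

# A parameter row: name, TYPE column ("string"), then the start of the VALUE column
PARAM_RE = re.compile(r"^(log_archive_dest\w*)\s+string\s*(.*)$")


def parse_show_parameter(text: str) -> dict[str, str]:
    """Rebuild name -> value from `show parameter` output, re-joining wrapped lines."""
    params: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = PARAM_RE.match(line)
        if m:
            current = m.group(1)
            params[current] = m.group(2)
        elif current and line:
            params[current] += line  # SQL*Plus wraps the VALUE column mid-word
    return params


def deferred_remote_dests(params: dict[str, str]) -> list[int]:
    """Destination numbers that still have SERVICE= set but whose state is DEFER."""
    result = []
    for name, value in params.items():
        m = re.fullmatch(r"log_archive_dest_(\d+)", name)
        if not m or "service=" not in value.lower():
            continue
        state = params.get(f"log_archive_dest_state_{m.group(1)}", "enable")
        if state.strip().lower() == "defer":
            result.append(int(m.group(1)))
    return sorted(result)


SHOW_PARAMETER = """NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest                     string
log_archive_dest_1                   string      LOCATION=+ARCHDG VALID_FOR=(AL
                                                 L_LOGFILES,ALL_ROLES) DB_UNIQU
                                                 E_NAME=addrprod
log_archive_dest_3                   string      service=ADDRCD arch async vali
                                                 d_for=(ONLINE_LOGFILES,PRIMARY
                                                 _ROLE) reopen=60 db_unique_nam
                                                 e=ADDRCD
log_archive_dest_4                   string      service=ADDRPROD_STD arch asyn
                                                 c valid_for=(ONLINE_LOGFILES,P
                                                 RIMARY_ROLE) reopen=60 db_uniq
                                                 ue_name=ADDRPROD_STD
log_archive_dest_state_1             string      ENABLE
log_archive_dest_state_3             string      defer
log_archive_dest_state_4             string      enable"""

params = parse_show_parameter(SHOW_PARAMETER)
print(deferred_remote_dests(params))  # [3]
```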
III. Resolution
After log_archive_dest_3 was cleared, the manual archivelog deletion succeeded:
SQL> show parameter log_archive_dest_3;
NAME TYPE VALUE
------------------------------------ ---------- ------------------------------
log_archive_dest_3 string service=ADDRCD arch async vali
d_for=(ONLINE_LOGFILES,PRIMARY
_ROLE) reopen=60 db_unique_nam
e=ADDRCD
log_archive_dest_30 string
log_archive_dest_31 string
SQL> alter system set log_archive_dest_3='' scope=both sid='*';
System altered.
SQL> show parameter log_archive_dest_3;
NAME TYPE VALUE
------------------------------------ ---------- ------------------------------
log_archive_dest_3 string
log_archive_dest_30 string
log_archive_dest_31 string
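If several environments went through the same drill, the cleanup statement can be generated instead of typed by hand. Below is a hypothetical sketch. `reset_statements` emits the same `alter system` form used above for each deferred destination number. Here `scope=both` changes memory and spfile, and `sid='*'` covers every RAC instance. As a safety guard, the function skips anything that is not a remote `service=` destination. Review the generated statements before running them, since whether a destination should be cleared or re-enabled depends on why it was deferred.

```python
def reset_statements(dest_values: dict[int, str], deferred: set[int]) -> list[str]:
    """Build 'clear destination' statements for deferred remote destinations only."""
    statements = []
    for n in sorted(deferred):
        # Never clear the local LOCATION= destination, only SERVICE= (standby) ones
        if "service=" in dest_values.get(n, "").lower():
            statements.append(
                f"alter system set log_archive_dest_{n}='' scope=both sid='*';"
            )
    return statements


dest_values = {
    1: "LOCATION=+ARCHDG VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=addrprod",
    3: "service=ADDRCD arch async valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) reopen=60 db_unique_name=ADDRCD",
}
for stmt in reset_statements(dest_values, {1, 3}):
    print(stmt)  # alter system set log_archive_dest_3='' scope=both sid='*';
```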
This time the archivelog deletion reported no errors:
RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY; delete noprompt archivelog until time 'sysdate-7';
using target database control file instead of recovery catalog
old RMAN configuration parameters:
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
new RMAN configuration parameters:
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
new RMAN configuration parameters are successfully stored
RMAN> allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=963 instance=addr11g1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=1717 instance=addr11g1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=1908 instance=addr11g1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=2189 instance=addr11g1 device type=DISK
List of Archived Log Copies for database with db_unique_name ADDRPROD
=====================================================================
Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
168624  1    13079   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699
168643  1    13080   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499
168646  1    13081   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367
168648  1    13082   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13082.1620.858182411
168656  1    13083   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13083.1625.858182901
168658  1    13084   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13084.1624.858182903
168662  1    13085   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13085.1627.858182967
168666  1    13086   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13086.1629.858184767
168670  1    13087   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13087.1631.858186569
168674  1    13088   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13088.1633.858188367
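To confirm that the deletion picked up exactly the backlog the RMAN-08120 warnings had been protecting, you can parse the "List of Archived Log Copies". Below is a hypothetical sketch. It assumes the column order shown (Key, Thrd, Seq, S, Low Time, then `Name:`) and matches only rows with status `A`. It works whether the listing is laid out one field per line or flattened onto a single line.

```python
import re

# Key, Thrd, Seq, S(tatus = A), Low Time, then the "Name:" field
ROW_RE = re.compile(r"(\d+)\s+(\d+)\s+(\d+)\s+A\s+(\d{2}-[A-Z]{3}-\d{2})\s+Name:\s+(\S+)")


def listed_logs(listing: str) -> list[tuple[int, int]]:
    """(thread, sequence) of every available archived log copy in the listing."""
    return [(int(thrd), int(seq)) for _key, thrd, seq, _low, _name in ROW_RE.findall(listing)]


def is_contiguous(seqs: list[int]) -> bool:
    """True if the sequence numbers have no gaps."""
    return all(b == a + 1 for a, b in zip(seqs, seqs[1:]))


LISTING = """Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
168624  1    13079   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699
168643  1    13080   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499
168646  1    13081   A 13-SEP-14
        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367"""

rows = listed_logs(LISTING)
print(rows)  # [(1, 13079), (1, 13080), (1, 13081)]
print(is_contiguous([seq for _, seq in rows]))  # True
```

Here the first listed sequence, 13079, is the same one RMAN-08120 had reported as retained.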
IV. Summary
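Problems caused by temporary operations that are not fully wound back afterwards are probably not rare. This one did not cause a major outage, though that does not mean the next one won't. In daily work we therefore need to keep the system healthy from several angles, for example:
1). Know the system environment well enough to be clear about how to undo each temporary operation;
2). Admittedly, point 1 alone is wishful thinking. As the saying goes, the palest ink beats the best memory, so it is better to have a standardized O&M procedure;
3). After any temporary operation is finished, run a complete check of the system.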