案例:归档自动清理脚本失效及连带影响
前些天给同事准备一套模拟环境用于测试一个OGG问题: 环境架构:Oracle 11.2.0.4 RAC + 单实例11.2.0.4 ADG(同时作为OGG源端,OGG版本19.1.0.0.4) + 单实例19.3多租户(其中1个PDB作为OGG目标端,OGG版本19.1.0.0.4) 现象概述:发现OGG进程abended,原因是主库归档满,但是实际已配置归档自动清理脚本(归档空间使用大于90%时清理),进一步查看发现根源是归档清理失效,报错RMAN-08137,导致的影响有很多,首先主库无法进行测试数据写入,其次ADG备库产生延迟,然后OGG源端抽取进程因超时报错OGG-02149导致abended..
- 1.故障现象:归档清理报错RMAN-08137
- 2.解决方案:设置"_deferred_log_dest_is_valid"参数
- 3.恢复故障:归档清理恢复正常,ADG同步正常,OGG进程启动正常
1.故障现象:归档清理报错RMAN-08137
自动归档清理失效,报错RMAN-08137,手工清理现象一样:
RMAN> DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-1/24';
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+FRA/crmdb/archivelog/2020_07_08/thread_1_seq_422.426.1045198149 thread=1 sequence=422
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+FRA/crmdb/archivelog/2020_07_08/thread_1_seq_423.424.1045198157 thread=1 sequence=423
...
使用oerr查看RMAN-08137的描述:
[oracle@jystdrac1 logs]$ oerr rman 8137
8137, 3, "WARNING: archived log not deleted, needed for standby or upstream capture process"
// *Cause: An archived log that should have been deleted was not as it was
// required by upstream capture process or Data Guard.
// The next message identifies the archived log.
// *Action: This is an informational message. The archived log can be
// deleted after it is no longer needed. See the
// documentation for Data Guard to alter the set of active
// Data Guard destinations. See the documentation for
// Streams to alter the set of active streams.
查看rman的设置,未发现特殊设置:
RMAN> show all;
using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name CRMDB are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DATA/crmdb/snapcf_crmdb.f';
2.解决方案:设置"_deferred_log_dest_is_valid"参数
进一步搜索查询,匹配到MOS:
- RMAN-08137 on Primary Database although Archive Destination to Standby is deferred (Doc ID 1380368.1)
给出的原因和解决方案引用如下:
CAUSE If we defer an Archive Destination to a Standby Database, the Primary Database will still consider the Standby Database as existing but temporary unavailable eg. for Maintenance. This can happen if you stop Log Transport Services from the Data Guard Broker or manually defer the State for the Archive Destination. SOLUTION As long as the Archive Destination (log_archive_dest_n) is still set, we consider the Standby Database as still existing and preserve the ArchiveLogs on the Primary Database to perform Gap Resolution when the Archive Destination is valid again. There are Situations when this is not wanted, eg. the Standby Database was activated or removed but you still keep the Archive Destination because you want to rebuild the Standby Database later again. In this Case you can set the hidden Parameter "_deferred_log_dest_is_valid" to FALSE (default TRUE) which will consider deferred Archive Destinations as completely unavailable and will not preserve ArchiveLogs for those Destinations any more. It is a dynamic Parameter and can be set this Way: SQL> alter system set "_deferred_log_dest_is_valid" = FALSE scope=both; NOTE: This Parameter has been introduced with Oracle Database 11.2.0.x. In earlier Versions you have to unset the log_archive_dest_n-Parameter pointing to the remote Standby Database to make the Primary Database considering it as completely unavailable. There also exists a Patch on Top of 11.1.0.7 for some Platforms to include this Parameter in 11.1.0.7, too. This is Patch Number 8468117.
上面的描述很清楚了,实际结合当前环境,发现确实是有其他log_archive_dest_n设置为defer,而这些我们实际不再用了,要么彻底清除,要么按照MOS设置隐藏参数,我们先查下这个隐藏参数的当前默认设置,发现是Ture:
NAME DESCRIPTION VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_deferred_log_dest_is_valid consider deferred log dest as valid for log deletion (TRUE/FALSE) TRUE
这是一个动态的隐藏参数,可以直接修改为FALSE:
alter system set "_deferred_log_dest_is_valid" = FALSE;
--再次查询,已经修改成功:
NAME DESCRIPTION VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_deferred_log_dest_is_valid consider deferred log dest as valid for log deletion (TRUE/FALSE) FALSE
3.恢复故障:归档清理恢复正常,ADG同步正常,OGG进程启动正常
3.1 归档清理恢复正常 上面设置隐藏参数之后,就可以正常删除归档了:
RMAN> DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-1/24';
...
archived log file name=+FRA/crmdb/archivelog/2020_07_09/thread_2_seq_399.462.1045343023 RECID=1655 STAMP=1045343047
deleted archived log
archived log file name=+FRA/crmdb/archivelog/2020_07_09/thread_2_seq_400.396.1045343069 RECID=1657 STAMP=1045343099
deleted archived log
archived log file name=+FRA/crmdb/archivelog/2020_07_09/thread_2_seq_401.404.1045343111 RECID=1661 STAMP=1045343151
Deleted 87 objects
3.2 ADG同步恢复正常 主库切下日志,再看ADG同步状态,最终恢复正常:
SQL> @dg
NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------
transport lag +06 03:04:01 day(2) to second(0) interval 07/16/2020 00:10:08 07/16/2020 00:10:08
apply lag +06 03:04:01 day(2) to second(0) interval 07/16/2020 00:10:08 07/16/2020 00:10:08
apply finish time +00 00:00:04.358 day(2) to second(3) interval 07/16/2020 00:10:08
estimated startup time 65 second 07/16/2020 00:10:08
SQL> /
NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:00:00 day(2) to second(0) interval 07/16/2020 00:10:16 07/16/2020 00:10:15
apply lag +00 00:00:00 day(2) to second(0) interval 07/16/2020 00:10:16 07/16/2020 00:10:15
apply finish time +00 00:00:00.000 day(2) to second(3) interval 07/16/2020 00:10:16
estimated startup time 65 second 07/16/2020 00:10:16
3.3 OGG进程启动正常 再将OGG的进程手工启动,恢复正常:
GGSCI (test03) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPAUDDG 00:00:00 00:00:10
EXTRACT ABENDED EXTAUDDG 00:00:00 146:59:54
GGSCI (test03) 2> view report EXTAUDDG
...
2020-07-09 21:11:37 ERROR OGG-02149 Standby database has made no progress for more than 30,000 seconds.
GGSCI (test03) 3> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPAUDDG 00:00:00 00:00:02
EXTRACT ABENDED EXTAUDDG 00:00:00 147:01:07
GGSCI (test03) 4> start EXTAUDDG
Sending START request to MANAGER ...
EXTRACT EXTAUDDG starting
GGSCI (test03) 5> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPAUDDG 00:00:00 00:00:00
EXTRACT RUNNING EXTAUDDG 00:00:00 00:00:00
--target ogg ok!
GGSCI (db19) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REPAUD1A 00:00:00 00:00:05
至此,整个架构涉及到的所有环境均恢复正常。
- JavaScript 教程
- JavaScript 编辑工具
- JavaScript 与HTML
- JavaScript 与Java
- JavaScript 数据结构
- JavaScript 基本数据类型
- JavaScript 特殊数据类型
- JavaScript 运算符
- JavaScript typeof 运算符
- JavaScript 表达式
- JavaScript 类型转换
- JavaScript 基本语法
- JavaScript 注释
- Javascript 基本处理流程
- Javascript 选择结构
- Javascript if 语句
- Javascript if 语句的嵌套
- Javascript switch 语句
- Javascript 循环结构
- Javascript 循环结构实例
- Javascript 跳转语句
- Javascript 控制语句总结
- Javascript 函数介绍
- Javascript 函数的定义
- Javascript 函数调用
- Javascript 几种特殊的函数
- JavaScript 内置函数简介
- Javascript eval() 函数
- Javascript isFinite() 函数
- Javascript isNaN() 函数
- parseInt() 与 parseFloat()
- escape() 与 unescape()
- Javascript 字符串介绍
- Javascript length属性
- javascript 字符串函数
- Javascript 日期对象简介
- Javascript 日期对象用途
- Date 对象属性和方法
- Javascript 数组是什么
- Javascript 创建数组
- Javascript 数组赋值与取值
- Javascript 数组属性和方法
- hadoop2.6.0-HA-QJM
- hadoop常用维护命令
- 自定义Chrome等浏览器搜索引擎
- hiveServer2服务端安装
- hive数据加载
- Python 技术篇-使用opencv读取图片实例演示,python安装opencv库
- 关于页更改并加入一些在线服务
- Hadoop-2.6.0为基础的Hive安装
- Python 技术篇-opencv读取中文路径图片报错及解决办法
- Javaweb鼠标事件案例分析—鼠标移入移出表格颜色变化
- docker registry V2私有仓库搭建
- Python 路径问题:cv2.error: OpenCV(4.1.0)...size.width>0 && size.height>0 in function 'cv::imshow'. 原因与解决
- 算法案例分析—字符串模式匹配算法
- Docker-软件工程集装箱技术
- PyQt5 技术篇-获取电脑屏幕桌面的宽、高和分辨率