Building an 11g standby database and falling into a pit I dug myself (r7 notes, day 63)

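Some time ago, in 《一半技术一半生活》, I shared a design in which, to improve processing efficiency, we split databases and tables along business lines, similar to the diagram below.

[Diagram: databases and tables split by business line]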

The development team was very helpful: they arranged good machines for us, added memory, synchronized the table structures into the new business-2 environment, and extracted part of the data. Business 2 then went into intensive testing, and over the last few days the system's performance gradually stabilized. With that out of the way, I turned to building the standby database.

I have built quite a few Data Guard environments. A common way to check whether a Data Guard setup succeeded is to verify it with DG Broker: if a simple show configuration command reports SUCCESS, the standby has basically been built successfully. So I made few changes on the newly provisioned machine and assumed everything was ready to use. This environment is slightly unusual in that the primary uses ASM storage and the standby uses an ordinary file system, so the main work was setting the two convert parameters (db_file_name_convert and log_file_name_convert). DG Broker is a great convenience, which is why I rely on it more and more. To build the standby I used the classic active duplicate method.

    > rman target sys@testbi auxiliary sys@stestbi nocatalog
    > duplicate target database for standby from active database nofilenamecheck;

The copy finished quickly, and I then set up the DG Broker configuration.

    create configuration dg_testbi as primary database is testbi connect identifier is testbi;
    add database stestbi as connect identifier is stestbi maintained as physical;

With that done, I checked by hand and show configuration reported SUCCESS.

    DGMGRL> enable configuration;
    Enabled.
    DGMGRL> show configuration;
    Configuration - dg_testbi
      Protection Mode: MaxPerformance
      Databases:
        testbi - Primary database
        stestbi - Physical standby database
    Fast-Start Failover: DISABLED
    Configuration Status:
    SUCCESS

Everything seemed to be going to plan. I was about to wrap up when I noticed something odd: the standby is 11gR2, yet it could not be opened. A manual attempt failed immediately.

    SQL> alter database open;
    alter database open
    *
    ERROR at line 1:
    ORA-10458: standby database requires recovery
    ORA-01152: file 1 was not restored from a sufficiently old backup
    ORA-01110: data file 1: '/home/U01/app/oracle/oradata/testbi/datafile/system.407.899224793'

At first glance this looked like a problem with the backup set. Checking the DG Broker status again now showed an error as well, ORA-16724: cannot resolve gap for one or more standby databases.

    DGMGRL> show configuration;
    Configuration - dg_testbi
      Protection Mode: MaxPerformance
      Databases:
        testbi - Primary database
          Error: ORA-16724: cannot resolve gap for one or more standby databases
        stestbi - Physical standby database
    Fast-Start Failover: DISABLED
    Configuration Status:
    ERROR

So the standby still had a problem. I checked the fal_client and fal_server settings and found nothing wrong. On the off chance, I recreated the configuration, and it reported success again.

    DGMGRL> remove configuration;
    Removed configuration
    DGMGRL> create configuration dg_testbi as primary database is testbi connect identifier is testbi;
    Configuration "dg_testbi" created with primary database "testbi"
    DGMGRL> add database stestbi as connect identifier is stestbi maintained as physical;
    Database "stestbi" added
    DGMGRL> enable configuration;
    Enabled.
    DGMGRL> show configuration;
    Configuration - dg_testbi
      Protection Mode: MaxPerformance
      Databases:
        testbi - Primary database
        stestbi - Physical standby database
    Fast-Start Failover: DISABLED
    Configuration Status:
    SUCCESS

I tried to open the standby again and the problem was still there. This is an 11gR2 database, and ADG does not ask for much, yet the broker soon reported the same error:

    Error: ORA-16724: cannot resolve gap for one or more standby databases

Although the configuration itself showed SUCCESS, a closer look at the standby revealed that it was already lagging by nearly four and a half hours.

    DGMGRL> show database stestbi;
    Database - stestbi
      Role: PHYSICAL STANDBY
      Intended State: APPLY-ON
      Transport Lag: (unknown)
      Apply Lag: 4 hours 29 minutes 48 seconds
      Real Time Query: OFF
      Instance(s):
        testbi
    Database Status:
    SUCCESS
    DGMGRL>
    DGMGRL> exit

The received redo simply was not being applied. The background log showed the same thing: only RFS was working, and MRP had not raised any error. The problem looked strange enough that it needed repeated verification. I cancelled log apply and then tried to open the standby read only; 11gR2 by default puts it back into real-time apply, which the log also shows. The standby's alert log contained the following:

    Managed Standby Recovery starting Real Time Apply
    Media Recovery Waiting for thread 1 sequence 101
    Wed Dec 30 23:00:34 2015
    Standby crash recovery need archive log for thread 1 sequence 101 to continue.
    Please verify that primary database is transporting redo logs to the standby database.
    Wait timeout: thread 1 sequence 101
    Standby crash recovery aborted due to error 16016.
    Errors in file /home/U01/app/oracle/diag/rdbms/stestbi/testbi/trace/testbi_ora_3241.trc:
    ORA-16016: archived log for thread 1 sequence# 101 unavailable
    Recovery interrupted!
    Completed standby crash recovery.
    Signalling error 1152 for datafile 1!
    Errors in file /home/U01/app/oracle/diag/rdbms/stestbi/testbi/trace/testbi_ora_3241.trc:
    ORA-10458: standby database requires recovery
    ORA-01152: file 1 was not restored from a sufficiently old backup
    ORA-01110: data file 1: '/home/U01/app/oracle/oradata/testbi/datafile/system.407.899224793'
    ORA-10458 signalled during: alter database open...
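So the standby could no longer obtain the archived log with sequence 101. Checking on the standby confirmed that only archives from 102 onward were present. Where had 101 gone? Looking back at the primary, I found a crontab job quietly running there, and at a fairly high frequency.

    0,15,30,45 * * * * $HOME/dbadmin/scripts/rm_archive.sh

Reading the script was disheartening. It has problems of its own: it simply deletes archived logs older than two hours (sysdate-1/12) without checking whether they have been applied on the standby.

    #!/bin/bash
    . ~oracle/.bash_profile
    rman target / <<EOF
    CONFIGURE ARCHIVELOG DELETION POLICY TO none;
    crosscheck archivelog all;
    delete noprompt expired archivelog all;
    delete noprompt archivelog until time "sysdate-1/12";
    exit
    EOF

It clearly needs changing. At the very least, archived logs must be applied on the standby before they are removed.

    CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
    crosscheck archivelog all;
    delete noprompt expired archivelog all;
    delete noprompt archivelog until time "sysdate-1";

So I really did dig this pit for myself and then jumped into it without hesitation. Looking back, it was yet another round of wasted effort. Luckily this database is not very large; if a reporting database of several TB, or tens of TB, had to be rebuilt, it would certainly wear down anyone's will.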
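In hindsight, the sequence the standby was stuck on could have been read straight out of the alert log instead of being spotted by eye. The following is only a minimal sketch: the function name waiting_sequences is my own, and it assumes the message format "Media Recovery Waiting for thread N sequence M" exactly as it appears in the alert log excerpt in this post.

```shell
# Print the distinct "thread sequence" pairs that media recovery is waiting for.
# Reads alert log text on stdin. The function name is hypothetical; the pattern
# is copied from the alert log excerpt in this post.
waiting_sequences() {
  sed -n 's/^.*Media Recovery Waiting for thread \([0-9][0-9]*\) sequence \([0-9][0-9]*\).*$/\1 \2/p' \
    | sort -u
}
```

A possible usage, with the alert log file name being an assumption, is `waiting_sequences < alert_testbi.log`. Each output line is a thread number followed by the sequence to look for on the primary.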
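To make it harder to forget the deletion policy next time, the RMAN commands from the revised purge job could be generated by one small function and reviewed before anything is deleted. This is a sketch under assumptions, not the script actually deployed: build_purge_commands and its retention-in-hours argument are hypothetical names, and the RMAN statements are the ones from the revised script in this post with the retention written as hours/24.

```shell
# Emit the RMAN commands for the archived-log purge on stdout.
# $1 = retention in hours (hypothetical parameter); defaults to 24, i.e. sysdate-1.
build_purge_commands() {
  local hours=${1:-24}
  case $hours in
    *[!0-9]*)
      echo "retention must be a whole number of hours" >&2
      return 1
      ;;
  esac
  cat <<EOF
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
crosscheck archivelog all;
delete noprompt expired archivelog all;
delete noprompt archivelog until time "sysdate-${hours}/24";
exit
EOF
}
```

Running `build_purge_commands 24` alone prints the commands for review; piping the output into `rman target /` would execute them, in the same way the original script feeds RMAN through a here-document.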