[English]
作者:
fuyuncat
来源:
www.HelloDBA.com
这个数据拯救的测试中,环境更加复杂:旧数据库异常关闭;旧数据库仅剩下三个数据文件,其中两个属于系统表空间且文件号不连续,其它文件,诸如控制文件、在线日志文件、UNDO表空间文件、临时表空间及其它数据文件都已丢失。因此在数据恢复过程中也遇到更多的问题。
同样,需要先获得数据库版本、数据库名称和原始的数据文件位置及大小等信息,设置SID和密码。以下为参数文件内容:
SQL代码
- _allow_resetlogs_corruption = true
- control_files = (/opt/oracle/oradata/ORA10R2/control/ora_control1,
- /opt/oracle/oradata/ORA10R2/control/ora_control2,
- /opt/oracle/oradata/ORA10R2/control/ora_control3)
- db_name = ORA10R2
- db_domain = ""
- db_block_size = 8192
- undo_management = manual
- undo_tablespace = ''
- UNDO_RETENTION = 900
- nls_language = "AMERICAN"
- nls_territory = "AMERICA"
- user_dump_dest = /opt/oracle/admin/ORA10R2/udump
- background_dump_dest = /opt/oracle/admin/ORA10R2/bdump
- core_dump_dest = /opt/oracle/admin/ORA10R2/cdump
- sga_max_size = 200M
- sga_target = 200M
然后创建新库:
SQL代码
- SQL> CREATE DATABASE ORA10R2
- USER SYS IDENTIFIED BY oracle
- USER SYSTEM IDENTIFIED BY oracle
- LOGFILE GROUP 1 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log') SIZE 20M,
- GROUP 2 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log') SIZE 20M,
- GROUP 3 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log') SIZE 20M
- MAXLOGFILES 5
- MAXLOGMEMBERS 5
- MAXLOGHISTORY 1
- MAXDATAFILES 100
- MAXINSTANCES 1
- CHARACTER SET US7ASCII
- NATIONAL CHARACTER SET AL16UTF16
- DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M REUSE
- --, '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M REUSE
- EXTENT MANAGEMENT LOCAL
- SYSAUX DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' SIZE 100M
- DEFAULT TEMPORARY TABLESPACE tempts1
- TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp01' SIZE 20M
- --UNDO TABLESPACE undotbs1
- -- DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'
- -- SIZE 500M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED
- ;
- 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
- Database created.
因为我们旧的系统表空间中有两个不连续的数据文件,因此我们需要想办法创建出相应文件号的数据文件。
首先,我们用BBED可以看到第二个数据文件的文件号:
SQL代码
- BBED> p kcvfh
- ...
- ub2 kccfhfno @52 0x0008
- ...
为了让新的系统文件号为8,我们需要创建几个临时的数据文件提高文件号,然后再添加新的系统文件:
SQL代码
- SQL> create tablespace test datafile '/opt/oracle/oradata/ORA10R2/datafile/test3' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test4' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test5' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test6' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test7' size 10M;
- alter tablespace system add datafile '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' size 200M;
- DROP TABLESPACE test INCLUDING CONTENTS AND DATAFILES;
- Tablespace created.
- SQL>
- Tablespace altered.
- SQL>
- Tablespace dropped.
如果这里我们按照之前11g的过程继续操作,将会遇到大量错误,诸如ORA-00600 [4000], ORA-01555, ORA-01111, ORA-01173, ORA-00600 [4049]等等。经过深入分析,发现导致这些错误的最终原因是:一致性读。
因为新库创建后,其Checkpoint SCN非常小,因此在Bootstrap过程中读取系统数据字典数据块时,与数据块上的ITL中scn比较,发现需要进行一致性回滚,因而会读取回滚段数据,但这些回滚段通常都是不存在,因此会导致以上诸多错误的出现。
当然,这也产生了一个疑问:为什么在11g的恢复过程没有发生这些问题?通过对比发现,在11g的恢复过程中,NID操作重新设置了系统文件中的Checkpoint SCN,使其恢复为其原来的SCN。因为我们已经将其文件头替换为新文件的文件头,因此我猜测NID过程中读取了数据字典中SCN信息,并重写的文件头。从nid过程抛出的ORA-00600错误也可以看到,它调用了系统的内部包完成ID修改过程。
但在10g的NID过程,并没有修改SCN,因此需要我们手动提高SCN。这里,我使用了隐含参数_minimum_giga_scn。设置该参数,我们需要观察alert log,因为数值太小不会成功,日志中出现以下记录
SQL代码
- Current SCN is not changed: _minimum_giga_scn (scn 274877906944) is too small
我们这里设置_minimum_giga_scn = 1024,重启数据库,alert日志提示设置成功
SQL代码
- Advancing SCN to 1099511627776 according to _minimum_giga_scn
接下来重命名新系统文件,拷贝旧数据文件到当前数据文件目录:
SQL代码
- [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_200m datafile/ORA10R2_system_200m_new
- [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_02 datafile/ORA10R2_system_02_new
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_200m ./datafile/
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_02 ./datafile/
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_example_50m ./datafile/
然后执行nid修改数据库ID(不过我认为这一步在这里已经不重要了):
SQL代码
- [oracle@server1 ORA10R2]$ nid target=sys/oracle
- DBNEWID: Release 10.2.0.1.0 - Production on Tue Mar 19 00:44:33 2013
- ... ...
- Database ID for database ORA10R2 changed to 4153675250.
- All previous backups and archived redo logs for this database are unusable.
- Database has been shutdown, open database with RESETLOGS option.
- Succesfully changed database ID.
- DBNEWID - Completed succesfully.
重建控制文件,并恢复、OPEN RESETLOGS:
SQL代码
- SQL> CREATE CONTROLFILE REUSE DATABASE ORA10R2 RESETLOGS NOARCHIVELOG
- MAXLOGFILES 5
- MAXLOGMEMBERS 5
- MAXDATAFILES 100
- MAXINSTANCES 1
- MAXLOGHISTORY 292
- LOGFILE
- GROUP 1 '/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log' SIZE 20M,
- GROUP 2 '/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log' SIZE 20M,
- GROUP 3 '/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log' SIZE 20M
- DATAFILE
- '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M
- , '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M
- --, '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' size 100M
- --, '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'
- CHARACTER SET US7ASCII
- ;
- 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
- Control file created.
- SQL> recover database using backup controlfile until cancel;
- ORA-00279: change 1099511627897 generated at 03/19/2013 00:43:31 needed for
- thread 1
- ORA-00289: suggestion : /opt/oracle/product/10.2.0/db/dbs/arch1_2_810434521.dbf
- ORA-00280: change 1099511627897 for thread 1 is in sequence #2
- Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
- cancel
- Media recovery cancelled.
- SQL> alter database open resetlogs;
- alter database open resetlogs
- *
- ERROR at line 1:
- ORA-01092: ORACLE instance terminated. Disconnection forced
数据库打开失败,从alert日志看到错误内容为:
SQL代码
- ORA-01177: data file does not match dictionary - probably old incarnation
- ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'
再次执行恢复和打开过程,并启动10046事件跟踪,从跟踪内容中看到引起该错误的语句及等待事件:
SQL代码
- PARSING IN CURSOR #2 len=122 dep=1 uid=0 oct=3 lid=0 tim=1331710783410619 hv=1330125001 ad='2c3341c8'
- select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1
- END OF STMT
- PARSE #2:c=0,e=300,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783410615
- BINDS #2:
- kkscoacd
- Bind#0
- oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
- oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
- kxsbbbfp=b71e3928 bln=22 avl=02 flg=05
- value=1
- EXEC #2:c=1000,e=574,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783411304
- WAIT #2: nam='db file sequential read' ela= 14 file#=1 block#=258 blocks=1 obj#=-1 tim=1331710783411361
- WAIT #2: nam='db file sequential read' ela= 10 file#=1 block#=114 blocks=1 obj#=-1 tim=1331710783411419
- ...
- ORA-01177: data file does not match dictionary - probably old incarnation
- ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'
显然,启动进程读取了数据字典file$的数据(第114块数据块),再与数据文件头中的信息对比,存在不匹配数据,则抛出1777错误。
处理1777错误前,我们启动致mount状态,并做文件恢复
SQL代码
- SQL> startup mount
- SQL> recover datafile 1
- Media recovery complete.
- SQL> recover datafile 8
- Media recovery complete.
用bbed打开114号数据块,即file$的数据块,查看第一条记录,即文件1的数据
SQL代码
- BBED> set file 1 block 114
- FILE# 1
- BLOCK# 114
- BBED> p *kdbr[0]
- BBED> x /rnnnnnnnnnnnnnnnnnnnnnnnnnnn
- rowdata[678] @8153
- ------------
- flag@8153: 0x2c (KDRHFL, KDRHFF, KDRHFH)
- lock@8154: 0x00
- cols@8155: 11
- col 0[2] @8156: 1 -- FILE#
- col 1[2] @8159: 2 -- STATUS$
- col 2[4] @8162: 23040 -- BLOCKS
- col 3[1] @8167: 0 -- TS#
- col 4[2] @8169: 1 -- RELFILE#
- col 5[1] @8172: 0 -- MAXEXTEND
- col 6[1] @8174: 0 -- INC
- col 7[1] @8176: 0 -- CRSCNWRP
- col 8[2] @8178: 8 -- CRSCNBAS
- col 9[0] @8181: *NULL* -- OWNERINSTANCE
- col 10[5] @8182: 4194306 -- SPARE1
再打开文件头,可以发现create scn(CRSCNWRP, CRSCNBAS)不匹配
SQL代码
- BBED> set file 1 block 1
- BBED> map
- BBED> p kcvfh
- struct kcvfh, 676 bytes @0
- struct kcvfhbfh, 20 bytes @0
- ub1 type_kcbh @0 0x0b
- ub1 frmt_kcbh @1 0xa2
- ub1 spare1_kcbh @2 0x00
- ...
- struct kcvfhcrs, 8 bytes @100
- ub4 kscnbas @100 0x00000006 -- CRSCNBAS
- ub2 kscnwrp @104 0x0000 -- CRSCNWRP
- ...
- struct kcvfhckp, 36 bytes @484
- struct kcvcpscn, 8 bytes @484
- ub4 kscnbas @484 0x00006fa4 -- CKPCNBAS
- ub2 kscnwrp @488 0x0100 -- CKPCNWRP
- ...
要规避这个问题,就需要时数据字典中的数据与文件头中的信息一致。这里我修改了数据字典当中的数据
用dump函数看我们需要写入什么数据:
SQL代码
- SQL> select dump(6,16) from dual;
- DUMP(8,16)
- -----------------
- Typ=2 Len=2: c1,7
然后用BBED修改该记录:
SQL代码
- BBED> set file 1 block 114
- BBED> dump offset 8178 count 4
- BBED> modify /x 02c107ff
- BBED> sum apply
接下来避免8号文件出现同样错误,同样修改其记录。
8号文件头数据:
SQL代码
- struct kcvfhcrs, 8 bytes @100
- ub4 kscnbas @100 0x00001c64
- ub2 kscnwrp @104 0x0000
数据字典数据:
SQL代码
- col 0[2] @7478: 8
- col 1[2] @7481: 2
- col 2[3] @7484: 25600
- col 3[1] @7488: 0
- col 4[2] @7490: 8
- col 5[1] @7493: 0
- col 6[1] @7495: 0
- col 7[1] @7497: 0
- col 8[4] @7499: 140113
- col 9[0] @7504: *NULL*
- col 10[5] @7505: 33554434
Dump结果:
SQL代码
- SQL> select dump(7268,16) from dual;
- DUMP(7268,16)
- ---------------------
- Typ=2 Len=3: c2,49,45
- SQL> select dump(140113,16) from dual;
- DUMP(140113,16)
- ---------------------
- Typ=2 Len=4: c3,f,2,e
再用BBED修改
SQL代码
- BBED> dump offset 7499 count 6
- BBED> modify /x 04c24945
- BBED> sum apply
修改完成后,开启数据库成功,并读取出旧数据库中数据字典内容:
SQL代码
- SQL> alter database open;
- Database altered.
- SQL> select file_id, tablespace_name from dba_data_files where tablespace_name like '%EXAMPLE%';
- FILE_ID TABLESPACE_NAME
- ---------- ------------------------------
- 4 EXAMPLE
- SQL> select file#, name from v$datafile where file#=4;
- FILE# NAME
- ---------- --------------------------------------------------------------------------------
- 4 /opt/oracle/product/10.2.0/db/dbs/MISSING00004
重命名数据文件:
SQL代码
- SQL> ALTER DATABASE RENAME FILE '/opt/oracle/product/10.2.0/db/dbs/MISSING00004' TO '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m';
- Database altered.
重新恢复数据库,使数据文件ONLINE:
SQL代码
- SQL> shutdown
- ... ...
- SQL> recover database until cancel;
- Media recovery complete.
- SQL> ALTER DATABASE DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m' ONLINE;
- Database altered.
- SQL> alter database open resetlogs;
- alter database open resetlogs
- *
- ERROR at line 1:
- ORA-01122: database file 4 failed verification check
- ORA-01110: data file 4: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m'
- ORA-01206: file is not part of this database - wrong database id
在打开数据库时,抛错,指示加入的数据文件的数据库ID不匹配。
通过其他文件可以看到当前数据库的DBID:
SQL代码
- BBED> set file 1 block 1
- FILE# 1
- BLOCK# 1
- BBED> p kcvfh
- ...
- ub4 kccfhdbi @28 0xf7940df2
- ...
- BBED> dump offset 28 count 4
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m (1)
- Block: 1 Offsets: 28 to 31 Dba:0x00400001
- ------------------------------------------------------------------------
- f20d94f7
通过BBED修改ID:
SQL代码
- BBED> set file 4 block 1
- FILE# 4
- BLOCK# 1
- BBED> p kcvfh
- ...
- ub4 kccfhdbi @28 0xb9888c45
- ...
- BBED> dump offset 30 count 2
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 30 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- 88b9
- BBED> modify /x 94f7
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 30 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- 94f7
- BBED> dump offset 28 count 4
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 28 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- f29d94f7
- BBED> modify /x f20d
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 28 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- f20d94f7
- BBED> sum apply
- Check value for File 4, Block 1:
- current = 0x4460, required = 0x4460
再次启动,成功!
SQL代码
- SQL> alter database open resetlogs;
- Database altered.
创建临时表空间,修改用户:
SQL代码
- SQL> CREATE TEMPORARY TABLESPACE temp2 TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp02' size 100M;
- Tablespace created.
- SQL> select 'ALTER USER '||username||' TEMPORARY TABLESPACE TEMP2;' from dba_users;
- 'ALTERUSER'||USERNAME||'TEMPORARYTABLESPACETEMP2;'
- ---------------------------------------------------------------------
- ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;
- ... ...
- SQL> ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;
- ... ...
- User altered.
用EXP导出数据:
SQL代码
- [oracle@server1 ORA10R2]$ exp system/oracle tablespaces=EXAMPLE file=resecue.dat log=resecue.log
- ... ...
- . . exporting table USR_MV 11 rows exported
- . exporting referential integrity constraints
- . exporting triggers
- Export terminated successfully without warnings.
数据已经成功拯救出来。
不过,这两个测试都是在实验环境当中的,实际情况可能更为复杂。这个恢复方法只能视为一根救命稻草,真正能保证数据安全的还是完善的备份计划和容灾方案。
--- Fuyuncat ---