[中文]
Author:
fuyuncat
Source:
www.HelloDBA.com
Date:
2013-03-20 05:02:15
In this case, the situation is more complicated. It's an Oracle 10.2.0.1 database running in Linux. The database was shutdown abnormally; There are only 3 data files can be used, 2 of them are system files, whose file IDs are not in sequence; the other one is the data file contain the data to be rescued. The control files, redo logs, undo files, temp files and other data files are lost. During this process, I encoutered more problems, but I work it out finally.
Of course, we also need the RDBMS version, DB name, file position/size and other information, and then set SID and password. The content of the parameter is shown as below,
SQL代码
- _allow_resetlogs_corruption = true
- control_files = (/opt/oracle/oradata/ORA10R2/control/ora_control1,
- /opt/oracle/oradata/ORA10R2/control/ora_control2,
- /opt/oracle/oradata/ORA10R2/control/ora_control3)
- db_name = ORA10R2
- db_domain = ""
- db_block_size = 8192
- undo_management = manual
- undo_tablespace = ''
- UNDO_RETENTION = 900
- nls_language = "AMERICAN"
- nls_territory = "AMERICA"
- user_dump_dest = /opt/oracle/admin/ORA10R2/udump
- background_dump_dest = /opt/oracle/admin/ORA10R2/bdump
- core_dump_dest = /opt/oracle/admin/ORA10R2/cdump
- sga_max_size = 200M
- sga_target = 200M
All of these prepare, I start to create a new database:
SQL代码
- SQL> CREATE DATABASE ORA10R2
- USER SYS IDENTIFIED BY oracle
- USER SYSTEM IDENTIFIED BY oracle
- LOGFILE GROUP 1 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log') SIZE 20M,
- GROUP 2 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log') SIZE 20M,
- GROUP 3 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log') SIZE 20M
- MAXLOGFILES 5
- MAXLOGMEMBERS 5
- MAXLOGHISTORY 1
- MAXDATAFILES 100
- MAXINSTANCES 1
- CHARACTER SET US7ASCII
- NATIONAL CHARACTER SET AL16UTF16
- DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M REUSE
- --, '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M REUSE
- EXTENT MANAGEMENT LOCAL
- SYSAUX DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' SIZE 100M
- DEFAULT TEMPORARY TABLESPACE tempts1
- TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp01' SIZE 20M
- --UNDO TABLESPACE undotbs1
- -- DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'
- -- SIZE 500M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED
- ;
- 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
- Database created.
Because the file IDs of the system data files are not in sequence, we need make sure the new file id align with the old one. And we can use BBED to get the file id of the 2nd system file.
SQL代码
- BBED> p kcvfh
- ...
- ub2 kccfhfno @52 0x0008
- ...
To assign file id as 8, we need create some transient data files to advance the file id, and then add the 2nd system file,
SQL代码
- SQL> create tablespace test datafile '/opt/oracle/oradata/ORA10R2/datafile/test3' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test4' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test5' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test6' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test7' size 10M;
- alter tablespace system add datafile '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' size 200M;
- DROP TABLESPACE test INCLUDING CONTENTS AND DATAFILES;
- Tablespace created.
- SQL>
- Tablespace altered.
- SQL>
- Tablespace dropped.
If I continue the process as same as the previous one in 11g, I will encounter a huge number errors, such as ORA-00600 [4000], ORA-01555, ORA-01111, ORA-01173, ORA-00600 [4049], etc. With further analysing, I found the root cause of these erros is "consistent reads".
During the database open phase, oracle need read the system dictionaries to complete bootstrap. However, the Checkpoint SCN of the new database is quite small, when the process reading the data blocks of the dictionaries, it found the SCN in the ITL is larger than the transaction SCN, and then it read the rollback segments to keep the data consistency. While the undo blocks of rollback segments may not exist or over written, consequently, it raised such errors.
But the truth also brings a question: why did I not encounter the similar problem in the previous test. After analyse the data files, I found it is because the NID process modified the SCN information in file header of the system file. However, I have replaced the file header with the new one. So, I guess the NID process might read the data from dictionary and wrote them to the file header. And we can also get tips from the ORA-00600 error raised in previous case, which indicated the NID process call an internal package to change the DBID.
However, the nid process of 10g did not modify the SCN. Hence, I have to advance the SCN manually. To achive this objective, I used the hidden parameter _minimum_giga_scn. One thing we need note is that it will fail if the value is too small. Correspondingly, we can find below entries from alert log,
SQL代码
- Current SCN is not changed: _minimum_giga_scn (scn 274877906944) is too small
I set _minimum_giga_scn = 1024, and rebound the database, the alert log indicate that the SCN is advanced successfully.
SQL代码
- Advancing SCN to 1099511627776 according to _minimum_giga_scn
Then, rename the new system files and copy the old data files to current data file folder,
SQL代码
- [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_200m datafile/ORA10R2_system_200m_new
- [oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_02 datafile/ORA10R2_system_02_new
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_200m ./datafile/
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_02 ./datafile/
- [oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_example_50m ./datafile/
And execute NID to modify the DBID (But I don't think this step is necessary now)
SQL代码
- [oracle@server1 ORA10R2]$ nid target=sys/oracle
- DBNEWID: Release 10.2.0.1.0 - Production on Tue Mar 19 00:44:33 2013
- ... ...
- Database ID for database ORA10R2 changed to 4153675250.
- All previous backups and archived redo logs for this database are unusable.
- Database has been shutdown, open database with RESETLOGS option.
- Succesfully changed database ID.
- DBNEWID - Completed succesfully.
Re-create the control files, open database RESETLOGS:
SQL代码
- SQL> CREATE CONTROLFILE REUSE DATABASE ORA10R2 RESETLOGS NOARCHIVELOG
- MAXLOGFILES 5
- MAXLOGMEMBERS 5
- MAXDATAFILES 100
- MAXINSTANCES 1
- MAXLOGHISTORY 292
- LOGFILE
- GROUP 1 '/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log' SIZE 20M,
- GROUP 2 '/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log' SIZE 20M,
- GROUP 3 '/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log' SIZE 20M
- DATAFILE
- '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M
- , '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M
- --, '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' size 100M
- --, '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'
- CHARACTER SET US7ASCII
- ;
- 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
- Control file created.
- SQL> recover database using backup controlfile until cancel;
- ORA-00279: change 1099511627897 generated at 03/19/2013 00:43:31 needed for
- thread 1
- ORA-00289: suggestion : /opt/oracle/product/10.2.0/db/dbs/arch1_2_810434521.dbf
- ORA-00280: change 1099511627897 for thread 1 is in sequence #2
- Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
- cancel
- Media recovery cancelled.
- SQL> alter database open resetlogs;
- alter database open resetlogs
- *
- ERROR at line 1:
- ORA-01092: ORACLE instance terminated. Disconnection forced
Opening database failed. The errors from alert log are as below,
SQL代码
- ORA-01177: data file does not match dictionary - probably old incarnation
- ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'
I did it again with 10046 event enabled, and I found the SQL caused the error and the waits.
SQL代码
- PARSING IN CURSOR #2 len=122 dep=1 uid=0 oct=3 lid=0 tim=1331710783410619 hv=1330125001 ad='2c3341c8'
- select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1
- END OF STMT
- PARSE #2:c=0,e=300,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783410615
- BINDS #2:
- kkscoacd
- Bind#0
- oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
- oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
- kxsbbbfp=b71e3928 bln=22 avl=02 flg=05
- value=1
- EXEC #2:c=1000,e=574,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783411304
- WAIT #2: nam='db file sequential read' ela= 14 file#=1 block#=258 blocks=1 obj#=-1 tim=1331710783411361
- WAIT #2: nam='db file sequential read' ela= 10 file#=1 block#=114 blocks=1 obj#=-1 tim=1331710783411419
- ...
- ORA-01177: data file does not match dictionary - probably old incarnation
- ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'
Obviousely, the process read data from file$, and compare them with the information in file header, and found something unmatched, so, it raised the errors.
Before to solve the problem, I mounted the database and recovered the data files,
SQL代码
- SQL> startup mount
- SQL> recover datafile 1
- Media recovery complete.
- SQL> recover datafile 8
- Media recovery complete.
Using BBED to open #114 block who belogns to file$, get the 1st record, which is the meta data of the 1st data file.
SQL代码
- BBED> set file 1 block 114
- FILE# 1
- BLOCK# 114
- BBED> p *kdbr[0]
- BBED> x /rnnnnnnnnnnnnnnnnnnnnnnnnnnn
- rowdata[678] @8153
- ------------
- flag@8153: 0x2c (KDRHFL, KDRHFF, KDRHFH)
- lock@8154: 0x00
- cols@8155: 11
- col 0[2] @8156: 1 -- FILE#
- col 1[2] @8159: 2 -- STATUS$
- col 2[4] @8162: 23040 -- BLOCKS
- col 3[1] @8167: 0 -- TS#
- col 4[2] @8169: 1 -- RELFILE#
- col 5[1] @8172: 0 -- MAXEXTEND
- col 6[1] @8174: 0 -- INC
- col 7[1] @8176: 0 -- CRSCNWRP
- col 8[2] @8178: 8 -- CRSCNBAS
- col 9[0] @8181: *NULL* -- OWNERINSTANCE
- col 10[5] @8182: 4194306 -- SPARE1
Open the file header, I found the creation SCN not matched.
SQL代码
- BBED> set file 1 block 1
- BBED> map
- BBED> p kcvfh
- struct kcvfh, 676 bytes @0
- struct kcvfhbfh, 20 bytes @0
- ub1 type_kcbh @0 0x0b
- ub1 frmt_kcbh @1 0xa2
- ub1 spare1_kcbh @2 0x00
- ...
- struct kcvfhcrs, 8 bytes @100
- ub4 kscnbas @100 0x00000006 -- CRSCNBAS
- ub2 kscnwrp @104 0x0000 -- CRSCNWRP
- ...
- struct kcvfhckp, 36 bytes @484
- struct kcvcpscn, 8 bytes @484
- ub4 kscnbas @484 0x00006fa4 -- CKPCNBAS
- ub2 kscnwrp @488 0x0100 -- CKPCNWRP
- ...
To avoid this problem, we need make sure the data in both sides matched. Therefore, I modified the data in the dictionary.
Using DUMP see what data I need input
SQL代码
- SQL> select dump(6,16) from dual;
- DUMP(8,16)
- -----------------
- Typ=2 Len=2: c1,7
And using BBED modify the record
SQL代码
- BBED> set file 1 block 114
- BBED> dump offset 8178 count 4
- BBED> modify /x 02c107ff
- BBED> sum apply
And then modify the record of the 8th file
File header of the 8th data file
SQL代码
- struct kcvfhcrs, 8 bytes @100
- ub4 kscnbas @100 0x00001c64
- ub2 kscnwrp @104 0x0000
Dictionary data:
SQL代码
- col 0[2] @7478: 8
- col 1[2] @7481: 2
- col 2[3] @7484: 25600
- col 3[1] @7488: 0
- col 4[2] @7490: 8
- col 5[1] @7493: 0
- col 6[1] @7495: 0
- col 7[1] @7497: 0
- col 8[4] @7499: 140113
- col 9[0] @7504: *NULL*
- col 10[5] @7505: 33554434
Result of dump:
XML/HTML代码
- SQL> select dump(7268,16) from dual;
- DUMP(7268,16)
- ---------------------
- Typ=2 Len=3: c2,49,45
- SQL> select dump(140113,16) from dual;
- DUMP(140113,16)
- ---------------------
- Typ=2 Len=4: c3,f,2,e
Modify the data using BBED,
SQL代码
- BBED> dump offset 7499 count 6
- BBED> modify /x 04c24945
- BBED> sum apply
After all of these done, the database can be opened successfully, data form old dictionaries can be read now,
SQL代码
- SQL> alter database open;
- Database altered.
- SQL> select file_id, tablespace_name from dba_data_files where tablespace_name like '%EXAMPLE%';
- FILE_ID TABLESPACE_NAME
- ---------- ------------------------------
- 4 EXAMPLE
- SQL> select file#, name from v$datafile where file#=4;
- FILE# NAME
- ---------- --------------------------------------------------------------------------------
- 4 /opt/oracle/product/10.2.0/db/dbs/MISSING00004
Rename the datafile, recover the database and alter the data file online
SQL代码
- SQL> ALTER DATABASE RENAME FILE '/opt/oracle/product/10.2.0/db/dbs/MISSING00004' TO '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m';
- Database altered.
- SQL> shutdown
- ... ...
- SQL> recover database until cancel;
- Media recovery complete.
- SQL> ALTER DATABASE DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m' ONLINE;
- Database altered.
- SQL> alter database open resetlogs;
- alter database open resetlogs
- *
- ERROR at line 1:
- ORA-01122: database file 4 failed verification check
- ORA-01110: data file 4: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m'
- ORA-01206: file is not part of this database - wrong database id
Here I encountered another error, which indicate the DB ID of the data file does not match to the database. I get the DBID from file header of other data file.
SQL代码
- BBED> set file 1 block 1
- FILE# 1
- BLOCK# 1
- BBED> p kcvfh
- ...
- ub4 kccfhdbi @28 0xf7940df2
- ...
- BBED> dump offset 28 count 4
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m (1)
- Block: 1 Offsets: 28 to 31 Dba:0x00400001
- ------------------------------------------------------------------------
- f20d94f7
And use BBED to modify the data file to be online
SQL代码
- BBED> set file 4 block 1
- FILE# 4
- BLOCK# 1
- BBED> p kcvfh
- ...
- ub4 kccfhdbi @28 0xb9888c45
- ...
- BBED> dump offset 30 count 2
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 30 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- 88b9
- BBED> modify /x 94f7
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 30 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- 94f7
- BBED> dump offset 28 count 4
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 28 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- f29d94f7
- BBED> modify /x f20d
- File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)
- Block: 1 Offsets: 28 to 31 Dba:0x01000001
- ------------------------------------------------------------------------
- f20d94f7
- BBED> sum apply
- Check value for File 4, Block 1:
- current = 0x4460, required = 0x4460
Then the database was opened successfully.
SQL代码
- SQL> alter database open resetlogs;
- Database altered.
Create the temporary tablespace and modify the users,
SQL代码
- SQL> CREATE TEMPORARY TABLESPACE temp2 TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp02' size 100M;
- Tablespace created.
- SQL> select 'ALTER USER '||username||' TEMPORARY TABLESPACE TEMP2;' from dba_users;
- 'ALTERUSER'||USERNAME||'TEMPORARYTABLESPACETEMP2;'
- ---------------------------------------------------------------------
- ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;
- ... ...
- SQL> ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;
- ... ...
- User altered.
Export the data using EXP.
SQL代码
- [oracle@server1 ORA10R2]$ exp system/oracle tablespaces=EXAMPLE file=resecue.dat log=resecue.log
- ... ...
- . . exporting table USR_MV 11 rows exported
- . exporting referential integrity constraints
- . exporting triggers
- Export terminated successfully without warnings.
Now, the data is rescued succesfully.
Bear in mind, both data rescuing processes were completed in an testing environment, while the real case may be more complicated. Anyway, this method could be the last choice if you are trapped in such situation. To guarantee the saftety of the data, backup plan and disaster recover solution are always the first choice.
--- Wei Huang ---