Rescue data in the worst situation (2)

Date: 2013-03-20 05:02:15

Share to

In this case, the situation is more complicated. It's an Oracle 10.2.0.1 database running in Linux. The database was shutdown abnormally; There are only 3 data files can be used, 2 of them are system files, whose file IDs are not in sequence; the other one is the data file contain the data to be rescued. The control files, redo logs, undo files, temp files and other data files are lost. During this process, I encoutered more problems, but I work it out finally.

Of course, we also need the RDBMS version, DB name, file position/size and other information, and then set SID and password. The content of the parameter is shown as below,

SQL代码

_allow_resetlogs_corruption = true

control_files              = (/opt/oracle/oradata/ORA10R2/control/ora_control1,

                              /opt/oracle/oradata/ORA10R2/control/ora_control2,

                              /opt/oracle/oradata/ORA10R2/control/ora_control3)

db_name                    = ORA10R2

db_domain                  = ""

db_block_size              = 8192

undo_management            = manual

undo_tablespace            = ''

UNDO_RETENTION             = 900

nls_language               = "AMERICAN"

nls_territory              = "AMERICA"

user_dump_dest             = /opt/oracle/admin/ORA10R2/udump

background_dump_dest       = /opt/oracle/admin/ORA10R2/bdump

core_dump_dest             = /opt/oracle/admin/ORA10R2/cdump

sga_max_size               = 200M

sga_target                 = 200M

All of these prepare, I start to create a new database:

SQL代码

SQL> CREATE DATABASE ORA10R2

   USER SYS IDENTIFIED BY oracle

   USER SYSTEM IDENTIFIED BY oracle

   LOGFILE GROUP 1 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log') SIZE 20M,

           GROUP 2 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log') SIZE 20M,

           GROUP 3 ('/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log') SIZE 20M

   MAXLOGFILES 5

   MAXLOGMEMBERS 5

   MAXLOGHISTORY 1

   MAXDATAFILES 100

   MAXINSTANCES 1

   CHARACTER SET US7ASCII

   NATIONAL CHARACTER SET AL16UTF16

   DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M REUSE

   --, '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M REUSE

   EXTENT MANAGEMENT LOCAL

   SYSAUX DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' SIZE 100M

   DEFAULT TEMPORARY TABLESPACE tempts1

      TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp01' SIZE 20M

   --UNDO TABLESPACE undotbs1

   --   DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'

   --   SIZE 500M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED

   ;



  2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17  18   19   20   21   22   23

Database created.

Because the file IDs of the system data files are not in sequence, we need make sure the new file id align with the old one. And we can use BBED to get the file id of the 2nd system file.

SQL代码

BBED> p kcvfh

...

      ub2 kccfhfno                          @52       0x0008

...

To assign file id as 8, we need create some transient data files to advance the file id, and then add the 2nd system file,

SQL代码

SQL> create tablespace test datafile '/opt/oracle/oradata/ORA10R2/datafile/test3' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test4' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test5' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test6' size 10M, '/opt/oracle/oradata/ORA10R2/datafile/test7' size 10M;

alter tablespace system add datafile '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' size 200M;

DROP TABLESPACE test INCLUDING CONTENTS AND DATAFILES;



Tablespace created.



SQL>

Tablespace altered.



SQL>

Tablespace dropped.

If I continue the process as same as the previous one in 11g, I will encounter a huge number errors, such as ORA-00600 [4000], ORA-01555, ORA-01111, ORA-01173, ORA-00600 [4049], etc. With further analysing, I found the root cause of these erros is "consistent reads".

During the database open phase, oracle need read the system dictionaries to complete bootstrap. However, the Checkpoint SCN of the new database is quite small, when the process reading the data blocks of the dictionaries, it found the SCN in the ITL is larger than the transaction SCN, and then it read the rollback segments to keep the data consistency. While the undo blocks of rollback segments may not exist or over written, consequently, it raised such errors.

But the truth also brings a question: why did I not encounter the similar problem in the previous test. After analyse the data files, I found it is because the NID process modified the SCN information in file header of the system file. However, I have replaced the file header with the new one. So, I guess the NID process might read the data from dictionary and wrote them to the file header. And we can also get tips from the ORA-00600 error raised in previous case, which indicated the NID process call an internal package to change the DBID.

However, the nid process of 10g did not modify the SCN. Hence, I have to advance the SCN manually. To achive this objective, I used the hidden parameter _minimum_giga_scn. One thing we need note is that it will fail if the value is too small. Correspondingly, we can find below entries from alert log,

SQL代码

Current SCN is not changed: _minimum_giga_scn (scn 274877906944) is too small

I set _minimum_giga_scn = 1024, and rebound the database, the alert log indicate that the SCN is advanced successfully.

SQL代码

Advancing SCN to 1099511627776 according to _minimum_giga_scn

Then, rename the new system files and copy the old data files to current data file folder,

SQL代码

[oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_200m datafile/ORA10R2_system_200m_new

[oracle@server1 ORA10R2]$ mv datafile/ORA10R2_system_02 datafile/ORA10R2_system_02_new

[oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_200m ./datafile/

[oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_system_02 ./datafile/

[oracle@server1 ORA10R2]$ cp ../ORA10R2_COPY/datafile/ORA10R2_example_50m ./datafile/

And execute NID to modify the DBID (But I don't think this step is necessary now)

SQL代码

[oracle@server1 ORA10R2]$ nid target=sys/oracle



DBNEWID: Release 10.2.0.1.0 - Production on Tue Mar 19 00:44:33 2013

... ...

Database ID for database ORA10R2 changed to 4153675250.

All previous backups and archived redo logs for this database are unusable.

Database has been shutdown, open database with RESETLOGS option.

Succesfully changed database ID.

DBNEWID - Completed succesfully.

Re-create the control files, open database RESETLOGS:

SQL代码

SQL> CREATE CONTROLFILE REUSE DATABASE ORA10R2 RESETLOGS NOARCHIVELOG

    MAXLOGFILES 5

    MAXLOGMEMBERS 5

    MAXDATAFILES 100

    MAXINSTANCES 1

    MAXLOGHISTORY 292

LOGFILE

  GROUP 1 '/opt/oracle/oradata/ORA10R2/onlinelog/redo01.log' SIZE 20M,

  GROUP 2 '/opt/oracle/oradata/ORA10R2/onlinelog/redo02.log' SIZE 20M,

  GROUP 3 '/opt/oracle/oradata/ORA10R2/onlinelog/redo03.log' SIZE 20M

DATAFILE

  '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m' SIZE 180M

  , '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_02' SIZE 200M

  --, '/opt/oracle/oradata/ORA10R2/datafile/sysaux01' size 100M

  --, '/opt/oracle/oradata/ORA10R2/datafile/undotbs01'

CHARACTER SET US7ASCII

;

  2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17

Control file created.



SQL> recover database using backup controlfile until cancel;

ORA-00279: change 1099511627897 generated at 03/19/2013 00:43:31 needed for

thread 1

ORA-00289: suggestion : /opt/oracle/product/10.2.0/db/dbs/arch1_2_810434521.dbf

ORA-00280: change 1099511627897 for thread 1 is in sequence #2





Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

cancel

Media recovery cancelled.

SQL> alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

Opening database failed. The errors from alert log are as below,

SQL代码

ORA-01177: data file does not match dictionary - probably old incarnation

ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'

I did it again with 10046 event enabled, and I found the SQL caused the error and the waits.

SQL代码

PARSING IN CURSOR #2 len=122 dep=1 uid=0 oct=3 lid=0 tim=1331710783410619 hv=1330125001 ad='2c3341c8'

select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1

END OF STMT

PARSE #2:c=0,e=300,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783410615

BINDS #2:

kkscoacd

Bind#0

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

  kxsbbbfp=b71e3928  bln=22  avl=02  flg=05

  value=1

EXEC #2:c=1000,e=574,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1331710783411304

WAIT #2: nam='db file sequential read' ela= 14 file#=1 block#=258 blocks=1 obj#=-1 tim=1331710783411361

WAIT #2: nam='db file sequential read' ela= 10 file#=1 block#=114 blocks=1 obj#=-1 tim=1331710783411419

...

ORA-01177: data file does not match dictionary - probably old incarnation

ORA-01110: data file 1: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m'

Obviousely, the process read data from file$, and compare them with the information in file header, and found something unmatched, so, it raised the errors.

Before to solve the problem, I mounted the database and recovered the data files,

SQL代码

SQL> startup mount

SQL> recover datafile 1

Media recovery complete.

SQL> recover datafile 8

Media recovery complete.

Using BBED to open #114 block who belogns to file$, get the 1st record, which is the meta data of the 1st data file.

SQL代码

BBED> set file 1 block 114

        FILE#           1

        BLOCK#          114

BBED> p *kdbr[0]

BBED> x /rnnnnnnnnnnnnnnnnnnnnnnnnnnn

rowdata[678]                                @8153

------------

flag@8153: 0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@8154: 0x00

cols@8155:   11



col    0[2] @8156: 1 -- FILE#

col    1[2] @8159: 2 -- STATUS$

col    2[4] @8162: 23040 -- BLOCKS

col    3[1] @8167: 0 -- TS#

col    4[2] @8169: 1 -- RELFILE#

col    5[1] @8172: 0 -- MAXEXTEND

col    6[1] @8174: 0 -- INC

col    7[1] @8176: 0 -- CRSCNWRP

col    8[2] @8178: 8 -- CRSCNBAS

col    9[0] @8181: *NULL* -- OWNERINSTANCE

col   10[5] @8182: 4194306 -- SPARE1

Open the file header, I found the creation SCN not matched.

SQL代码

BBED> set file 1 block 1

BBED> map

BBED> p kcvfh

struct kcvfh, 676 bytes                     @0

   struct kcvfhbfh, 20 bytes                @0

      ub1 type_kcbh                         @0        0x0b

      ub1 frmt_kcbh                         @1        0xa2

      ub1 spare1_kcbh                       @2        0x00

...

   struct kcvfhcrs, 8 bytes                 @100

      ub4 kscnbas                           @100      0x00000006  -- CRSCNBAS

      ub2 kscnwrp                           @104      0x0000      -- CRSCNWRP

...

   struct kcvfhckp, 36 bytes                @484

      struct kcvcpscn, 8 bytes              @484

         ub4 kscnbas                        @484      0x00006fa4  -- CKPCNBAS

         ub2 kscnwrp                        @488      0x0100      -- CKPCNWRP

...

To avoid this problem, we need make sure the data in both sides matched. Therefore, I modified the data in the dictionary.

Using DUMP see what data I need input

SQL代码

SQL> select dump(6,16) from dual;



DUMP(8,16)

-----------------

Typ=2 Len=2: c1,7

And using BBED modify the record

SQL代码

BBED> set file 1 block 114

BBED> dump offset 8178 count 4

BBED> modify /x 02c107ff

BBED> sum apply

And then modify the record of the 8th file

File header of the 8th data file

SQL代码

struct kcvfhcrs, 8 bytes                 @100

  ub4 kscnbas                           @100      0x00001c64

  ub2 kscnwrp                           @104      0x0000

Dictionary data:

SQL代码

col    0[2] @7478: 8

col    1[2] @7481: 2

col    2[3] @7484: 25600

col    3[1] @7488: 0

col    4[2] @7490: 8

col    5[1] @7493: 0

col    6[1] @7495: 0

col    7[1] @7497: 0

col    8[4] @7499: 140113

col    9[0] @7504: *NULL*

col   10[5] @7505: 33554434

Result of dump:

XML/HTML代码

SQL> select dump(7268,16) from dual;



DUMP(7268,16)

---------------------

Typ=2 Len=3: c2,49,45



SQL> select dump(140113,16) from dual;



DUMP(140113,16)

---------------------

Typ=2 Len=4: c3,f,2,e

Modify the data using BBED,

SQL代码

BBED> dump offset 7499 count 6

BBED> modify /x 04c24945

BBED> sum apply

After all of these done, the database can be opened successfully, data form old dictionaries can be read now,

SQL代码

SQL> alter database open;



Database altered.



SQL> select file_id, tablespace_name from dba_data_files where tablespace_name like '%EXAMPLE%';



   FILE_ID TABLESPACE_NAME

---------- ------------------------------

         4 EXAMPLE



SQL> select file#, name from v$datafile where file#=4;



     FILE# NAME

---------- --------------------------------------------------------------------------------

         4 /opt/oracle/product/10.2.0/db/dbs/MISSING00004

Rename the datafile, recover the database and alter the data file online

SQL代码

SQL> ALTER DATABASE RENAME FILE '/opt/oracle/product/10.2.0/db/dbs/MISSING00004' TO '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m';



Database altered.



SQL> shutdown

... ...

SQL> recover database until cancel;

Media recovery complete.

SQL> ALTER DATABASE DATAFILE '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m' ONLINE;



Database altered.



SQL> alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01122: database file 4 failed verification check

ORA-01110: data file 4: '/opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m'

ORA-01206: file is not part of this database - wrong database id

Here I encountered another error, which indicate the DB ID of the data file does not match to the database. I get the DBID from file header of other data file.

SQL代码

BBED> set file 1 block 1

        FILE#           1

        BLOCK#          1



BBED> p kcvfh

...

      ub4 kccfhdbi                          @28       0xf7940df2

...

BBED> dump offset 28 count 4

File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_system_200m (1)

Block: 1                Offsets:   28 to   31           Dba:0x00400001

------------------------------------------------------------------------

f20d94f7

And use BBED to modify the data file to be online

SQL代码

BBED> set file 4 block 1

        FILE#           4

        BLOCK#          1



BBED> p kcvfh

...

      ub4 kccfhdbi                          @28       0xb9888c45

...

BBED> dump offset 30 count 2

File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)

Block: 1                Offsets:   30 to   31           Dba:0x01000001

------------------------------------------------------------------------

88b9

BBED> modify /x 94f7

File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)

Block: 1                Offsets:   30 to   31           Dba:0x01000001

------------------------------------------------------------------------

94f7

BBED> dump offset 28 count 4

File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)

Block: 1                Offsets:   28 to   31           Dba:0x01000001

------------------------------------------------------------------------

f29d94f7

BBED> modify /x f20d

File: /opt/oracle/oradata/ORA10R2/datafile/ORA10R2_example_50m (4)

Block: 1                Offsets:   28 to   31           Dba:0x01000001

------------------------------------------------------------------------

f20d94f7



BBED> sum apply

Check value for File 4, Block 1:

current = 0x4460, required = 0x4460

Then the database was opened successfully.

SQL代码

SQL> alter database open resetlogs;



Database altered.

Create the temporary tablespace and modify the users,

SQL代码

SQL> CREATE TEMPORARY TABLESPACE temp2 TEMPFILE '/opt/oracle/oradata/ORA10R2/datafile/temp02' size 100M;



Tablespace created.



SQL> select 'ALTER USER '||username||' TEMPORARY TABLESPACE TEMP2;' from dba_users;



'ALTERUSER'||USERNAME||'TEMPORARYTABLESPACETEMP2;'

---------------------------------------------------------------------

ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;

... ...



SQL> ALTER USER STMADMIN TEMPORARY TABLESPACE TEMP2;

... ...

User altered.

Export the data using EXP.

SQL代码

[oracle@server1 ORA10R2]$ exp system/oracle tablespaces=EXAMPLE file=resecue.dat log=resecue.log

... ...

. . exporting table                         USR_MV         11 rows exported

. exporting referential integrity constraints

. exporting triggers

Export terminated successfully without warnings.

Now, the data is rescued succesfully.

Bear in mind, both data rescuing processes were completed in an testing environment, while the real case may be more complicated. Anyway, this method could be the last choice if you are trapped in such situation. To guarantee the saftety of the data, backup plan and disaster recover solution are always the first choice.

--- Wei Huang ---

Previous：Rescue data in the worst situation (1)

Next：Oracle 12c TOP－N Frequency Histogram

Return To this Category