[中文]
Author:
fuyuncat
Source:
www.HelloDBA.com
Date:
2012-08-01 09:22:02
Oracle version: 10.2.0.3
Background of the case: We have a Java program to read and execute below SQL from a configuration file to detect if there is any record be found, then process following logic code base on the result.
The original statement is:SQL代码
- Select count(1) as rowcount
- from B2B_BIZ_KEY B
- WHERE EXISTS
- (SELECT ID FROM B2B_TRANSACTION T
- WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')));
And the original simple judgement is "if rowcount > 0 then do these things; else do others".
Table B2B_BIZ_KEY is child table who has a foreign key & index on (TRANSACTION_ID) referring to table B2B_TRANSACTION's primary key (ID). For the sake of large volume in both tables, although there is an index on B2B_TRANSACTION(MSG_REQ_ID, ID), performance of this SQL is quite poor. Here is the performance statistics data of the original one:SQL代码
- Elapsed: 00:02:12.48
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165253 Card=1 Bytes=71)
- 1 0 SORT (AGGREGATE)
- 2 1 HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)
- 3 2 INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX(UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)
- 4 2 INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX (UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)
- Statistics
- ----------------------------------------------------------
- 943 recursive calls
- 0 db block gets
- 363128 consistent gets
- 416008 physical reads
- 0 redo size
- 353 bytes sent via SQL*Net to client
- 435 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
With logic analysis, obviously, the exact ROWCOUNT of the query is not necessary. Hence, we can add a ROWNUM<=1 criteria to the SQL to avoid scanning unnecessary rows.
Here is the updated one:SQL代码
- Select count(1) as rowcount
- from B2B_BIZ_KEY B
- WHERE EXISTS
- (SELECT ID FROM B2B_TRANSACTION T
- WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))
- AND ROWNUM <= 1;
However, although the statistics data reduced significant comparing to the original one, the performance of enhanced SQL is not so satifying as we expected. Here is the performance data.
SQL代码
- Elapsed: 00:00:19.76
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165253 Card=1 Bytes=71)
- 1 0 SORT (AGGREGATE)
- 2 1 COUNT (STOPKEY)
- 3 2 HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)
- 4 3 INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX (UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)
- 5 3 INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX (UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)
- Statistics
- ----------------------------------------------------------
- 249 recursive calls
- 0 db block gets
- 66098 consistent gets
- 13694 physical reads
- 0 redo size
- 350 bytes sent via SQL*Net to client
- 435 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
Added the "ROWNUM <= 1", we want to improve the SQL avoid to scan redundant rows of the tables, and Opimtizer can shoose a best plan based on the goal, which is known as "First K rows" optimziation.
However, because of COUNT(1), Optimizer knows that server will just response client once, and therefore believes it's meaningess to optimize execution plan with "First K rows". If we just simply remove the COUNT, the result will like this:SQL代码
- HELLODBA.COM> Select 1 as rowcount
- 2 from B2B_BIZ_KEY B
- 3 WHERE EXISTS
- 4 (SELECT ID FROM B2B_TRANSACTION T
- 5 WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))
- 6 AND ROWNUM<=1;
- Elapsed: 00:00:00.43
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=1 Bytes=71)
- 1 0 COUNT (STOPKEY)
- 2 1 NESTED LOOPS (SEMI) (Cost=8 Card=3 Bytes=213)
- 3 2 TABLE ACCESS (FULL) OF 'B2B_BIZ_KEY' (TABLE) (Cost=2 Card=22996310 Bytes=551911440)
- 4 2 INDEX (RANGE SCAN) OF 'B2B_TRANSACTION_IX2' (INDEX) (Cost=2 Card=5456419 Bytes=256451693)
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 7 consistent gets
- 0 physical reads
- 0 redo size
- 350 bytes sent via SQL*Net to client
- 435 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
That's right. That's what we are expecting. However, if implement the SQL like this, there is will be a defect --- we did not handle the NO_DATA_FOUND exception in the Java program, we we dont want to change the Java code and redeploy the program to production. What can we do?
I was thinking that if the query result could be write into a temp table and then count rows in the temp, the problem could be resolve. With this in mind, I tried to re-write the SQL as below,SQL代码
- HELLODBA.COM> with V as
- 2 (Select /*+ MATERIALIZE qb_name(wv)*/1
- 3 from B2B_BIZ_KEY B
- 4 WHERE EXISTS
- 5 (SELECT ID FROM B2B_TRANSACTION T
- 6 WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))
- 7 AND ROWNUM<=1)
- 8 select count(1) from v;
- Elapsed: 00:00:15.84
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165255 Card=1)
- 1 0 TEMP TABLE TRANSFORMATION
- 2 1 LOAD AS SELECT
- 3 2 COUNT (STOPKEY)
- 4 3 HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)
- 5 4 INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX (UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)
- 6 4 INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX(UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)
- 7 1 SORT (AGGREGATE)
- 8 7 VIEW (Cost=2 Card=1)
- 9 8 TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9FC8C0_51579C6B'(TABLE (TEMP)) (Cost=2 Card=1 Bytes=13)
- Statistics
- ----------------------------------------------------------
- 520 recursive calls
- 10 db block gets
- 66112 consistent gets
- 11595 physical reads
- 1460 redo size
- 350 bytes sent via SQL*Net to client
- 435 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
Just look at the execution plan, it quite similar to what we want. SQL engine will query the subquery and write the result to a system temporary table, then count the content of the temp table. However, the process is not separated, and the COUNT still exists, as a result, the cost of execution plan is close to the previous one.
OK, let's use some tricks to get rid off the COUNT.SQL代码
- HELLODBA.COM> with V as
- 2 (Select 1
- 3 from B2B_BIZ_KEY B
- 4 WHERE EXISTS
- 5 (SELECT ID FROM B2B_TRANSACTION T
- 6 WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))
- 7 AND ROWNUM<=1)
- 8 select nvl2((select 1 from v),1,0) as aaa from dual;
- Elapsed: 00:00:00.21
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1)
- 1 0 VIEW (Cost=8 Card=1)
- 2 1 COUNT (STOPKEY)
- 3 2 NESTED LOOPS (SEMI) (Cost=8 Card=3 Bytes=213)
- 4 3 TABLE ACCESS (FULL) OF 'B2B_BIZ_KEY' (TABLE) (Cost=2Card=22996310 Bytes=551911440)
- 5 3 INDEX (RANGE SCAN) OF 'B2B_TRANSACTION_IX2' (INDEX)(Cost=2 Card=5456419 Bytes=256451693)
- 6 0 FAST DUAL (Cost=2 Card=1)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 7 consistent gets
- 0 physical reads
- 0 redo size
- 345 bytes sent via SQL*Net to client
- 435 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
Ok. Performance is acceptable and there is no NO_DATA_FOUND exception to be raised.
--- Fuyuncat ---