A SQL Tuning case to improve performance of SQL with ROWNUM

Date: 2012-08-01 09:22:02

Share to

Oracle version: 10.2.0.3

Background of the case: We have a Java program to read and execute below SQL from a configuration file to detect if there is any record be found, then process following logic code base on the result.
The original statement is:

SQL代码

Select count(1) as rowcount

from B2B_BIZ_KEY B

WHERE EXISTS

(SELECT ID FROM B2B_TRANSACTION T

  WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')));

And the original simple judgement is "if rowcount > 0 then do these things; else do others".

Table B2B_BIZ_KEY is child table who has a foreign key & index on (TRANSACTION_ID) referring to table B2B_TRANSACTION's primary key (ID). For the sake of large volume in both tables, although there is an index on B2B_TRANSACTION(MSG_REQ_ID, ID), performance of this SQL is quite poor. Here is the performance statistics data of the original one:

SQL代码

Elapsed: 00:02:12.48



Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165253 Card=1 Bytes=71)

   1    0   SORT (AGGREGATE)

   2    1     HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)

   3    2       INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX(UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)

   4    2       INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX (UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)



Statistics

----------------------------------------------------------

        943  recursive calls

          0  db block gets

     363128  consistent gets

     416008  physical reads

          0  redo size

        353  bytes sent via SQL*Net to client

        435  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

With logic analysis, obviously, the exact ROWCOUNT of the query is not necessary. Hence, we can add a ROWNUM<=1 criteria to the SQL to avoid scanning unnecessary rows.

Here is the updated one:

SQL代码

Select count(1) as rowcount

from B2B_BIZ_KEY B

WHERE EXISTS

(SELECT ID FROM B2B_TRANSACTION T

  WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))

AND ROWNUM <= 1;

However, although the statistics data reduced significant comparing to the original one, the performance of enhanced SQL is not so satifying as we expected. Here is the performance data.

SQL代码

Elapsed: 00:00:19.76



Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165253 Card=1 Bytes=71)

   1    0   SORT (AGGREGATE)

   2    1     COUNT (STOPKEY)

   3    2       HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)

   4    3         INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX (UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)

   5    3         INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX (UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)



Statistics

----------------------------------------------------------

        249  recursive calls

          0  db block gets

      66098  consistent gets

      13694  physical reads

          0  redo size

        350  bytes sent via SQL*Net to client

        435  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

Added the "ROWNUM <= 1", we want to improve the SQL avoid to scan redundant rows of the tables, and Opimtizer can shoose a best plan based on the goal, which is known as "First K rows" optimziation.

However, because of COUNT(1), Optimizer knows that server will just response client once, and therefore believes it's meaningess to optimize execution plan with "First K rows". If we just simply remove the COUNT, the result will like this:

SQL代码

HELLODBA.COM> Select 1 as rowcount

  2  from B2B_BIZ_KEY B

  3  WHERE EXISTS

  4  (SELECT ID FROM B2B_TRANSACTION T

  5    WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))

  6    AND ROWNUM<=1;



Elapsed: 00:00:00.43



Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=1 Bytes=71)

   1    0   COUNT (STOPKEY)

   2    1     NESTED LOOPS (SEMI) (Cost=8 Card=3 Bytes=213)

   3    2       TABLE ACCESS (FULL) OF 'B2B_BIZ_KEY' (TABLE) (Cost=2 Card=22996310 Bytes=551911440)

   4    2       INDEX (RANGE SCAN) OF 'B2B_TRANSACTION_IX2' (INDEX) (Cost=2 Card=5456419 Bytes=256451693)



Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

          7  consistent gets

          0  physical reads

          0  redo size

        350  bytes sent via SQL*Net to client

        435  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

That's right. That's what we are expecting. However, if implement the SQL like this, there is will be a defect --- we did not handle the NO_DATA_FOUND exception in the Java program, we we dont want to change the Java code and redeploy the program to production. What can we do?

I was thinking that if the query result could be write into a temp table and then count rows in the temp, the problem could be resolve. With this in mind, I tried to re-write the SQL as below,

SQL代码

HELLODBA.COM> with V as

  2  (Select /*+ MATERIALIZE qb_name(wv)*/1

  3  from B2B_BIZ_KEY B

  4  WHERE EXISTS

  5  (SELECT ID FROM B2B_TRANSACTION T

  6    WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))

  7    AND ROWNUM<=1)

  8  select count(1) from v;



Elapsed: 00:00:15.84



Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=165255 Card=1)

   1    0   TEMP TABLE TRANSFORMATION

   2    1     LOAD AS SELECT

   3    2       COUNT (STOPKEY)

   4    3         HASH JOIN (RIGHT SEMI) (Cost=165253 Card=35644057 Bytes=2530728047)

   5    4           INDEX (FAST FULL SCAN) OF 'B2B_TRANSACTION_UN' (INDEX (UNIQUE)) (Cost=15035 Card=8184629 Bytes=384677563)

   6    4           INDEX (FAST FULL SCAN) OF 'B2B_BIZ_KEY_UN' (INDEX(UNIQUE)) (Cost=65567 Card=36056379 Bytes=865353096)

   7    1     SORT (AGGREGATE)

   8    7       VIEW (Cost=2 Card=1)

   9    8         TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9FC8C0_51579C6B'(TABLE (TEMP)) (Cost=2 Card=1 Bytes=13)



Statistics

----------------------------------------------------------

        520  recursive calls

         10  db block gets

      66112  consistent gets

      11595  physical reads

       1460  redo size

        350  bytes sent via SQL*Net to client

        435  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

Just look at the execution plan, it quite similar to what we want. SQL engine will query the subquery and write the result to a system temporary table, then count the content of the temp table. However, the process is not separated, and the COUNT still exists, as a result, the cost of execution plan is close to the previous one.

OK, let's use some tricks to get rid off the COUNT.

SQL代码

HELLODBA.COM> with V as

  2  (Select 1

  3  from B2B_BIZ_KEY B

  4  WHERE EXISTS

  5  (SELECT ID FROM B2B_TRANSACTION T

  6    WHERE T.ID=B.TRANSACTION_ID AND T.MSG_REQ_ID <= RPAD('EDI', 11, TO_CHAR(TO_TIMESTAMP('2012-07-31 00:00:00','YYYY-MM-DD HH24:MI:SS.FF'), 'YYYYMMDD')))

  7    AND ROWNUM<=1)

  8  select nvl2((select 1 from v),1,0) as aaa from dual;



Elapsed: 00:00:00.21



Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1)

   1    0   VIEW (Cost=8 Card=1)

   2    1     COUNT (STOPKEY)

   3    2       NESTED LOOPS (SEMI) (Cost=8 Card=3 Bytes=213)

   4    3         TABLE ACCESS (FULL) OF 'B2B_BIZ_KEY' (TABLE) (Cost=2Card=22996310 Bytes=551911440)

   5    3         INDEX (RANGE SCAN) OF 'B2B_TRANSACTION_IX2' (INDEX)(Cost=2 Card=5456419 Bytes=256451693)

   6    0   FAST DUAL (Cost=2 Card=1)



Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          7  consistent gets

          0  physical reads

          0  redo size

        345  bytes sent via SQL*Net to client

        435  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

Ok. Performance is acceptable and there is no NO_DATA_FOUND exception to be raised.

--- Fuyuncat ---

Previous： Execution plan problem caused by unbalanced partition and bind variable peeking

Next：SQL Tuning case --- EXP tuning

Return To this Category