[English]
作者:
fuyuncat
来源:
www.HelloDBA.com
环境:
OS:linux
CPU:8个
Oracle Database:10.2.0.3.0
接到开发组的一个调优请求,任务是对一张海量表CS2_CT_MVMT(近2亿多记录,表大小48G)进行数据update,而更新数据来自于另外一张海量的日志表CS2_TXN_LOG(同样近2亿,表大小42G),数据处理的语句如下:
UPDATE CS2_CT_MVMT CTSET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DTFROM CS2_TXN_LOG TLWHERE TL.MSG_ID > 9000000000000000000AND CT.MSG_ID = TL.MSG_IDAND TL.MSG_GMT_DT IS NOT NULL)WHERE EXISTS (SELECT 1FROM CS2_TXN_LOG TLWHERE TL.MSG_ID > 9000000000000000000AND CT.MSG_ID = TL.MSG_IDAND TL.MSG_GMT_DT IS NOT NULL);
根据开发人员对数据的估算,在此逻辑下,CS2_CT_MVMT将会有近一半数据(近1亿条)要被更新,相对应的,数据来自于CS2_TXN_LOG中的近1一亿条数据。开发人员曾经尝试过运行语句,但是3天都没执行完。
第一步,对语句进行调优
首先,先看下语句的查询计划:
SQL> set line 300SQL> set pages 50SQL> explain plan for2 UPDATE CS2_CT_MVMT CT3 SET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DT4 FROM CS2_TXN_LOG TL5 WHERE TL.MSG_ID > 90000000000000000006 AND CT.MSG_ID = TL.MSG_ID7 AND TL.MSG_GMT_DT IS NOT NULL)8 WHERE EXISTS (SELECT 19 FROM CS2_TXN_LOG TL10 WHERE TL.MSG_ID > 900000000000000000011 AND CT.MSG_ID = TL.MSG_ID12 AND TL.MSG_GMT_DT IS NOT NULL);Explained.SQL> select * from table(dbms_xplan.display());PLAN_TABLE_OUTPUT---------------------------------------------------------------------------Plan hash value: 3604468536---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | UPDATE STATEMENT | | 11M| 432M| | 3580K (1)| 11:56:06 || 1 | UPDATE | CS2_CT_MVMT | | | | | ||* 2 | HASH JOIN SEMI | | 11M| 432M| 352M| 3580K (1)| 11:56:06 ||* 3 | TABLE ACCESS FULL | CS2_CT_MVMT | 11M| 216M| | 1690K (1)| 05:38:12 ||* 4 | TABLE ACCESS FULL | CS2_TXN_LOG | 214M| 3888M| | 1552K (1)| 05:10:25 ||* 5 | FILTER | | | | | | ||* 6 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | | 4 (0)| 00:00:01 ||* 7 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | | 3 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - access("CT"."MSG_ID"="TL"."MSG_ID")3 - filter("CT"."MSG_ID">9000000000000000000)4 - filter("TL"."MSG_ID">9000000000000000000 AND "TL"."MSG_GMT_DT" IS NOT NULL)5 - filter(9000000000000000000<:B1)PLAN_TABLE_OUTPUT---------------------------------------------------------------------------6 - filter("TL"."MSG_GMT_DT" IS NOT NULL)7 - access("TL"."MSG_ID"=:B1)filter("TL"."MSG_ID">9000000000000000000)25 rows selected.
从查询计划看,两表之间存在一个hash join。再看看两表之间的关系:
SQL> select c.table_name as child_table, c.constraint_name as FK, p.table_name as parent_table, p.constraint_name as PK2 from all_constraints p, all_constraints c3 where c.table_name in ('CS2_CT_MVMT','CS2_TXN_LOG')4 and c.r_constraint_name = p.constraint_name5 and c.constraint_type = 'R'6 and p.constraint_type='P';CHILD_TABLE FK PARENT_TABLE PK------------------------------ ------------------------------ -------------CS2_CT_MVMT CS2_CT_MVMT_FK1 CS2_TXN_LOG CS2_TXN_LOG_PKSQL> select table_name, column_name from dba_ind_columns2 where index_name = 'CS2_TXN_LOG_PK';TABLE_NAME COLUMN_NAME------------------------------ ------------------------------CS2_TXN_LOG MSG_ID
可见,CS2_CT_MVMT是依赖于CS2_TXN_LOG的子表,其依赖字段MSG_ID正是CS2_TXN_LOG的主键唯一字段。我们可以对这条语句的逻辑描述如下:
CS2_CT_MVMT依据主外键关系从父表CS2_TXN_LOG中更新相应数据,且只从CS2_TXN_LOG获取满足(MSG_ID > 9000000000000000000 AND MSG_GMT_DT IS NOT NULL)。我们可以这个逻辑关系对语句进行调整以减少查询计划中的访问路径:
根据业务分析,CS2_CT_MVMT中将近一半的数据满足更新条件,也就是由hash join只能过滤一半的数据,而在SET子句中的子查询已经保证了两表数据的完整性关系已经对父表的数据过滤条件。这样的话,这个join的代价就太大了,它应该从语句中拿掉:
UPDATE CS2_CT_MVMT CTSET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DTFROM CS2_TXN_LOG TLWHERE TL.MSG_ID > 9000000000000000000AND CT.MSG_ID = TL.MSG_IDAND TL.MSG_GMT_DT IS NOT NULL);
但这样会产生一个问题:子查询的记录数可能为1或0(1:1),也就是将近1半为NULL(即对应的CS2_CT_MVMT记录无需更新)。很简单,我们用NVL函数来处理:
UPDATE CS2_CT_MVMT CTSET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DTFROM CS2_TXN_LOG TLWHERE TL.MSG_ID > 9000000000000000000AND CT.MSG_ID = TL.MSG_IDAND TL.MSG_GMT_DT IS NOT NULL), CT.LAST_MOD_DT);从查询计划中看到,COST将近一半:
---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | UPDATE STATEMENT | | 119M| 2160M| 1699K (2)| 05:39:57 || 1 | UPDATE | CS2_CT_MVMT | | | | || 2 | TABLE ACCESS FULL | CS2_CT_MVMT | 119M| 2160M| 1699K (2)| 05:39:57 ||* 3 | FILTER | | | | | ||* 4 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | 4 (0)| 00:00:01 ||* 5 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | 3 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------3 - filter(9000000000000000000<:B1)4 - filter("TL"."MSG_GMT_DT" IS NOT NULL)5 - access("TL"."MSG_ID"=:B1)filter("TL"."MSG_ID">9000000000000000000)
再进一步,由于存在等价关系CT.MSG_ID = TL.MSG_ID,我们可以将TL.MSG_ID > 9000000000000000000转换为CT.MSG_ID > 9000000000000000000,作为对CS2_CT_MVMT扫描后的filter,从而大大减少UPDATE操作,也大大减少子查询的运行次数。
UPDATE CS2_CT_MVMT CTSET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DTFROM CS2_TXN_LOG TLWHERE CT.MSG_ID = TL.MSG_IDAND TL.MSG_GMT_DT IS NOT NULL), CT.LAST_MOD_DT)WHERE CT.MSG_ID > 9000000000000000000;---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |-----------------------------------------------------------------------------------------------| 0 | UPDATE STATEMENT | | 11M| 216M| 1690K (1)| 05:38:12 || 1 | UPDATE | CS2_CT_MVMT | | | | ||* 2 | TABLE ACCESS FULL | CS2_CT_MVMT | 11M| 216M| 1690K (1)| 05:38:12 ||* 3 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | 4 (0)| 00:00:01 ||* 4 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | 3 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - filter("CT"."MSG_ID">9000000000000000000)3 - filter("TL"."MSG_GMT_DT" IS NOT NULL)4 - access("TL"."MSG_ID"=:B1)
最后再看子查询部分:对于每条满足条件的CS2_CT_MVMT中的记录,都将执行一次子查询,由CS2_TXN_LOG的主键获取