[English]
作者:
fuyuncat
来源:
www.HelloDBA.com
第三章 诊断结果分析
我们从上面的建议结果看到了,ADDM Report的结果与Statspack Report的结果大不相同。Statspack Report的结果给出的都是统计数据、各种事件,然后由DBA根据这些数据给出优化建议,而ADDM Report的结果包含就已经是给出的优化建议了(汗!DBA以后要失业了!)
第一部分:
Analysis Period: 23-NOV-2005 from 15:02:27 to 16:06:42
Database ID/Instance: 1712582900/1
Database/Instance Names: EDGAR/edgar
Host Name: HUANGED
Database Version: 10.2.0.1.0
Snapshot Range: from 65 to 66
Database Time: 1463 seconds
Average Database Load: .4 active sessions
这一部分包括一些基础信息,分析时间段、DB和instance ID&名字、主机名字、Oracle版本、快照范围、数据库消耗时间、多少个活动会话。
第二部分:
下面就是ADDM发现的问题,并给出的相应建议。在我们这个例子中总共发现4个问题,下面一一解释一下。第一个问题:
FINDING 1: 100% impact (1463 seconds)
-------------------------------------
Significant virtual memory paging was detected on the host operating system.
RECOMMENDATION 1: Host Configuration, 100% benefit (1463 seconds)
ACTION: Host operating system was experiencing significant paging but no
particular root cause could be detected. Investigate processes that
do not belong to this instance running on the host that are consuming
significant amount of virtual memory. Also consider adding more
physical memory to the host.
先看第一行:100% impact (1463 seconds),这是这个问题所持续的实践及其对系统的影响,它的时间是1463秒,和分析期间的数据库消耗时间(在第一部分中)是一样(1463秒),所以对系统的影响是1463/1463*100=100%的。
再看第二行:Significant virtual memory paging was detected on the host operating system.,这是ADDM发现的这个问题的具体描述:在操作系统中发现有显著的虚拟内存页入页出的问题。
然后看ADDM给出的建议及其作用:Host Configuration, 100% benefit (1463 seconds)——更改主机配置,100%有效。
最后是具体该如何操作:略——在主机的操作系统上发现了明显的页入页出,但是没有发现明显导致内存频繁换如换出的根本原因。需要仔细检查那些消耗大量虚拟内存的进程(除Oracle实例外)。此外,还可以考虑增大主机的物理内存。说明一下:我的这个实例是跑在我自己的PC机上,Oracle运行的同时有大量的其他消耗内存的程序(word等)在运行,所以肯定有大量的内存交换存在。
再看第二个问题:
FINDING 2: 100% impact (1463 seconds)
-------------------------------------
SQL statements consuming significant database time were found.
RECOMMENDATION 1: SQL Tuning, 68% benefit (998 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "064wqx7c5b81z". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 064wqx7c5b81z
DECLARE
v_var number;
BEGIN
FOR n IN 1..10000
LOOP
select count(*) into v_var from bigtab b, smalltab a;
END LOOP;
END;
ADDM发现有SQL语句在消耗大量数据库时间,它的影响是100%的。给出的建议是优化SQL,能取得68%的效果。
具体操作是优化ADDM找到的PL/SQL块,它的SQL_ID是"064wqx7c5b81z"(可以通过select sql_text from v$sql where sql_id=’064wqx7c5b81z’;查到)。至于如何优化SQL语句,可以参考Oracle文档PL/SQL User's Guide and Reference中的Tuning PL/SQL Applications章节。下面的内容便是我们用来插入数据的测试语句。
下面是ADDM发现的其他问题语句:
FINDING 3: 69% impact (1002 seconds)
------------------------------------
Time spent on the CPU by the instance was responsible for a substantial part
of database time.
RECOMMENDATION 1: SQL Tuning, 67% benefit (986 seconds)
ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
"fvqfghq71cqns".
RELEVANT OBJECT: SQL statement with SQL_ID fvqfghq71cqns and
PLAN_HASH 3281046854
SELECT COUNT(*) FROM BIGTAB B, SMALLTAB A
RATIONALE: SQL statement with SQL_ID "fvqfghq71cqns" was executed 6
times and had an average elapsed time of 166 seconds.
RATIONALE: Average CPU used per execution was 162 seconds.
RECOMMENDATION 2: SQL Tuning, 2.1% benefit (30 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "2b064ybzkwf1y". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 2b064ybzkwf1y
BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;
RATIONALE: SQL statement with SQL_ID "2b064ybzkwf1y" was executed 125
times and had an average elapsed time of 0.26 seconds.
RATIONALE: Average CPU used per execution was 0.24 seconds.
这个问题的描述是,实例消耗的CPU事件占据了大量的数据库运行时间。由于发现了两条问题语句,所以这里有两个建议。
第一个建议就是优化我们的测试语句。并且说明了这个问题的根本原因:这条语句总共执行过6次,平均每次消耗了166秒。平均这个问题消耗的CPU时间是162秒。
第二个建议实际上是针对一个系统过程,这个过程是用来读取队列信息的,消耗的资源比较小,我们这里就不需要关心了。
再看最后一个问题:
FINDING 4: 2.2% impact (33 seconds)
-----------------------------------
PL/SQL execution consumed significant database time.
RECOMMENDATION 1: SQL Tuning, 2.2% benefit (33 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "2b064ybzkwf1y". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 2b064ybzkwf1y
BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;
RATIONALE: SQL statement with SQL_ID "2b064ybzkwf1y" was executed 125
times and had an average elapsed time of 0.26 seconds.
RATIONALE: Average time spent in PL/SQL execution was 0.26 seconds.
从内容上看,这个问题就是上一个问题中的第二个建议。但是,它导致的结果是不一样的。看这个问题的描述:PL/SQL的执行次数消耗了大量的数据库时间。它的根本原因是因为执行次数太多(125次)。可见ADDM的问题检查相当全面。
第三部分:
这一部分的内容是关于此次优化建议的一些附加信息:
ADDITIONAL INFORMATION
----------------------
Wait class "Application" was not consuming significant database time.
Wait class "Commit" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
Wait class "Network" was not consuming significant database time.
Wait class "User I/O" was not consuming significant database time.
Session connect and disconnect calls were not consuming significant database
time.Hard parsing of SQL statements was not consuming significant database time.
The analysis of I/O performance is based on the default assumption that the
average read time for one database block is 10000 micro-seconds.这是关于这次优化诊断对各类事件(在Oracle10g,新增了很多新的事件,主要是将原先一些较含糊的事件细化了,同时将所有事件进行了归类。可以查看视图V$SYSTEM_WAIT_CLASS)的一些总结:Application、Commit、Concurrency、Configuration、Network、User I/O类等待事件没有显著消耗数据库时间;会话连接、断连请求没有消耗大量数据库时间;对SQL语句的硬解析没有消耗大量数据库时间;对IO性能的分析是基于默认假设每次读一个数据块的时间是10000微秒的。
第四部分:
这部分是对诊断报告中用到的术语的解释:
TERMINOLOGY
-----------
DATABASE TIME: This is the ADDM's measurement of throughput. From the user's
point of view: this is the total amount of time spent by users waiting for
a response from the database after issuing a call (not including
networking). From the database instance point of view: this is the total
time spent by forground processes waiting for a database resource (e.g.,
read I/O), running on the CPU and waiting for a free CPU (run-queue). The
target of ADDM analysis is to reduce this metric as much as possible,
thereby reducing the instance's response time.
AVERAGE DATABASE LOAD: At any given time we can count how many users (also
called 'Active Sessions') are waiting for an answer from the instance. This
is the ADDM's measurement for instance load. The 'Average Database Load' is
the average of the the load measurement taken over the entire analysis
period. We get this number by dividing the 'Database Time' by the analysis
period. For example, if the analysis period is 30 minutes and the 'Database
Time' is 90 minutes, we have an average of 3 users waiting for a response.
IMPACT: Each finding has an 'Impact' associated with it. The impact is the
portion of the 'Database Time' the finding deals with. If we assume that
the problem described by the finding is completely solved, then the
'Database Time' will be reduced by the amount of the 'Impact'.
BENEFIT: Each recommendation has a 'benefit' associated with it. The ADDM
analysis estimates that the 'Database Time' can be reduced by the 'benefit'
amount if all the actions of the recommendation are performed.
DATABASE TIME:是ADDM的度量数据。从用户角度看:这是从向数据库请求开始,消耗在用户等待响应上的全部时间(不包括网络响应时间);从数据库实例角度看:前台进程消耗在等待一种数据库资源(例如,IO读)、CPU运行和等待CPU释放(队列等待)的总共时间。ADDM分析的目标就尽量降低这个数字,也就是减少实例响应时间。
AVERAGE DATABASE LOAD:所有能统计到的有多少用户(也称为“活动会话”)等待实例响应。这是实例负荷的度量指标。平均数据库负荷是由整个分析计算出来的平均负荷。通过“Database Time”除以分析周期时间得到。例如,分析周期时30分钟,而数据库运行消耗时间是90分钟,那就说明平均有3个用户在等待响应。
IMPACT:每一个找到的问题都有“影响”这一项。“影响”是数据库消耗时间用于处理这个问题的时间不分。假定我们所找到的这个问题完全解决,那么数据库消耗时间就会相应减少“影响”时间。
BENEFIT:每一个找到的问题都“受益”这一项。如果所有建议操作得到实施,ADDM分析估计数据库消耗时间能减少“受益”的全部时间。
第四章 使用STA来优化语句
ADDM得出了诊断结果,并给出了优化建议。通常90%的性能问题都是由于应用引起的,而应用问题肯定离不开问题语句。那么如何优化这些语句呢,以前靠的是DBA的经验,现在就可以使用STA了。
备注:关于STA的使用,请参考我的另一篇文章《利用SQL优化器(STA)优化语句》
STA的使用非常简单,从ADDM诊断报告中,发现有一个与句很影响系统性能,我们就对这条语句进行优化:
第一步:创建优化任务并执行
SQL> DECLARE
2 my_task_name VARCHAR2(30);
3 my_sqltext CLOB;
4 BEGIN
5 my_sqltext := 'select a.table_name, b.object_id from bigtab b, smalltab a';
6
7 my_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(
8 sql_text => my_sqltext,
9 user_name => 'DEMO',
10 scope => 'COMPREHENSIVE',
11 time_limit => 60,
12 task_name => 'TEST_sql_tuning_task',
13 description => 'Task to tune a query on a specified PRODUCT');
14
15 dbms_sqltune.Execute_tuning_task (task_name => 'TEST_sql_tuning_task');
16 END;
17 /
PL/SQL procedure successfully completed
DBMS_SQLTUNE.CREATE_TUNING_TASK就是用来创建优化任务的函数。其中,sql_text是需要优化的语句,user_name是该语句通过哪个用户执行,scope是优化范围(limited或comprehensive),time_limit优化过程的时间限制,task_name优化任务名称,description优化任务描述。
dbms_sqltune.Execute_tuning_task是执行优化的函数。
第二步:查看优化建议结果
SQL> set long 10000
SQL> set longchunksize 1000
Cannot SET LONGCHUNKSIZE
SQL> set linesize 100
SQL> SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('TEST_sql_tuning_task') FROM DUAL;
DBMS_SQLTUNE.REPORT_TUNING_TAS
--------------------------------------------------------------------------------
GENERAL INFORMATION SECTION
-------------------------------------------------------------------------------
Tuning Task Name : TEST_sql_tuning_task
Tuning Task Owner : DEMO
Scope : COMPREHENSIVE
Time Limit(seconds) : 60
Completion Status : COMPLETED
Started at : 11/24/2005 14:46:23
Completed at : 11/24/2005 14:46:24
Number of SQL Restructure Findings: 1
-------------------------------------------------------------------------------
Schema Name: DEMO
SQL ID : f5k29d73zdpsd
SQL Text : select a.table_name, b.object_id from bigtab b, smalltab a
-------------------------------------------------------------------------------
FINDINGS SECTION (1 finding)
-------------------------------------------------------------------------------
1- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
An expensive cartesian product operation was found at line ID 1 of the
execution plan.
Recommendation
--------------
- Consider removing the disconnected table or view from this statement or
add a join condition which refers to it.
Rationale
---------
A cartesian product should be avoided whenever possible because it is an
expensive operation and might produce a large amount of data.
-------------------------------------------------------------------------------
EXPLAIN PLANS SECTION
-------------------------------------------------------------------------------
1- Original
-----------
Plan hash value: 3479921507
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1474M| 32G| 4316K (2)| 14:23:21
| 1 | MERGE JOIN CARTESIAN| | 1474M| 32G| 4316K (2)| 14:23:21
| 2 | TABLE ACCESS FULL | SMALLTAB | 1223 | 23237 | 11 (0)| 00:00:01
| 3 | BUFFER SORT | | 1205K| 5887K| 4316K (2)| 14:23:21
| 4 | TABLE ACCESS FULL | BIGTAB | 1205K| 5887K| 3530 (2)| 00:00:43
--------------------------------------------------------------------------------
-------------------------------------------------------------------------------
优化建议结果分为三部分,下面简单介绍一下每个部分。
第一部分:
和所有Oracle的报告一样,第一部分总是基础信息J。
GENERAL INFORMATION SECTION
-------------------------------------------------------------------------------
Tuning Task Name : TEST_sql_tuning_task
Tuning Task Owner : DEMO
Scope : COMPREHENSIVE
Time Limit(seconds) : 60
Completion Status : COMPLETED
Started at : 11/24/2005 14:46:23
Completed at : 11/24/2005 14:46:24
Number of SQL Restructure Findings: 1
-------------------------------------------------------------------------------
Schema Name: DEMO
SQL ID : f5k29d73zdpsd
SQL Text : select a.table_name, b.object_id from bigtab b, smalltab a
这部分信息包括:任务名称、任务所有者、任务范围、任务执行时间限制(前面几个信息实际上就是我们创建任务时指定的参数)、任务状态、任务开始时间、完成时间、发现的需要重新构造的问题语句数量。接下来就是schema名称、SQL ID和SQL内容。
第二部分:
这部分是优化器发现的需要优化的语句,在我们的例子中,肯定只有一条:
-------------------------------------------------------------------------------
FINDINGS SECTION (1 finding)
-------------------------------------------------------------------------------
1- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
An expensive cartesian product operation was found at line ID 1 of the
execution plan.
Recommendation
--------------
- Consider removing the disconnected table or view from this statement or
add a join condition which refers to it.
Rationale
---------
A cartesian product should be avoided whenever possible because it is an
expensive operation and might produce a large amount of data.
首先是在查询计划中发现的问题:在语句1的查询计划中发现有很大的笛卡尔积操作(显然,我们的语句是一个外连接操作,做的就是迪卡尔积操作);
其次是针对问题给出的建议:考虑将不需要做连接查询的表或视图去掉,或者增加一个连接条件;
最后导致问题的根本原因:由于迪卡尔积是一个会产生很大数据量的消耗资源的操作,所以需要尽量避免迪卡尔积操作。
第三部分:
这一部分是SQL语句的查询计划。
-------------------------------------------------------------------------------
EXPLAIN PLANS SECTION
-------------------------------------------------------------------------------
1- Original
-----------
Plan hash value: 3479921507
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1474M| 32G| 4316K (2)| 14:23:21
| 1 | MERGE JOIN CARTESIAN| | 1474M| 32G| 4316K (2)| 14:23:21
| 2 | TABLE ACCESS FULL | SMALLTAB | 1223 | 23237 | 11 (0)| 00:00:01
| 3 | BUFFER SORT | | 1205K| 5887K| 4316K (2)| 14:23:21
| 4 | TABLE ACCESS FULL | BIGTAB | 1205K| 5887K| 3530 (2)| 00:00:43
--------------------------------------------------------------------------------
-------------------------------------------------------------------------------
这里给出的是语句优化前的原始查询计划。如果优化器认为这条语句需要重写,它还会给出重写后的查询计划。
第五章 总结
哎,不多说了,总之一个字:简单!如果Oracle在这样下去,这个世界上谁都是DBA,谁都不需要DBA了。