[English]
作者:
fuyuncat
来源:
www.HelloDBA.com
1、并行化跟踪设置
在OLAP或DSS系统中,或者在OLTP中大批量处理数据时,我们经常会借助于并行化处理来最大化利用硬件资源使作业在更短的时间内返回结果。由于影响并行化的因素比较多,如cpu数量、系统参数设置、分区等等,在并行化过程中,又可能会遇到一些意想不到的问题,例如,语句不能并行化或者并行度(DOP Degree Of Parallel)和预期的不一致。这时我们就可以通过Parallel Trace对并行过程进行分析,找出问题所在。
在9i之前,可以通过设置10384~10399 event对并行会话进行trace,在9i以后,可以通过在会话中设置“_px_trace”来实现:
alter session set "_px_trace"=[[Verbosity,]area],[[Verbosity,]area],..,[time];例如:
alter session set "_px_trace"=high,execution,medium,execution,time;其中参数可以设置的值:
Verbosity:
Ø high
Ø medium
Ø low
Area:
Ø scheduling - ( 与某些10384和10390事件相同)
Ø execution - (与某些10390事件相同)
Ø granule - (与某些10390和10391事件相同)
Ø messaging - (与某些10392和10393事件相同)
Ø buffer - (和10399事件相同)
Ø compilation
Ø all - all of the above
Ø none - none of the above.
Timing
Ø time
2、跟踪文件解析
下面我们产生一个trace文件,并分别解释Trace文件中各个部分的含义(蓝色字体部分为解释):
SQL> ALTER SESSION FORCE PARALLEL DML;Session altered.SQL> alter session set "_px_trace"= "all";Session altered.SQL> UPDATE /*+ PARALLEL(CT,10) */ CS2_CT_MVMT CT2 SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT3 FROM CS2_TXN_LOG TL4 WHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT)5 WHERE CT.MSG_ID > 9000000000000000000;ERROR at line 1:ORA-01013: user requested cancel of current operationSQL> alter session set "_px_trace"=”none”;Session altered.· 文件头,关于Oracle DB Server的基本信息:
/home/oracle/admin/cs2prd/udump/cs2prd1_ora_27073.trcOracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit ProductionWith the Partitioning, Real Application Clusters, OLAP and Data Mining optionsORACLE_HOME = /home/oracle/RAC/10.2.0System name: LinuxNode name: pmrac01.cargosmart.comRelease: 2.6.9-55.ELsmpVersion: #1 SMP Fri Apr 20 16:36:54 EDT 2007Machine: x86_64Instance name: cs2prd1Redo thread mounted by this instance: 1Oracle process number: 153Unix process pid: 27073, image: oracle@pmrac01.cargosmart.com (TNS V1-V3)· 以下为并行度信息:
*** ACTION NAME:() 2009-03-16 17:31:25.122*** MODULE NAME:(SQL*Plus) 2009-03-16 17:31:25.122*** SERVICE NAME:(SYS$USERS) 2009-03-16 17:31:25.122*** SESSION ID:(1802.17915) 2009-03-16 17:31:25.122Kkfdapdml --- 函数名称pgadep:0 pdml mode:2 PQ allowed DML allowed not autonomous => allowed--> 从这段信息看,Parallel DML已经激活了The table : 65109 ---> 相关对象的object_idkkfdPaPrmDOP = 10 (computed from hint/dictionary) --->并行度(及并行度的来源,这里是从HINT或者数据字典中——即对象属性设置中来)use hint DOP on table ---> 最终使用HINT中的DOP设置kkfdPaForcePrm force DML d = 32767 ---> 当强制使用PDML时,DOP可能的最大值为 32767The table : 64796 ---> 下一个相关对象的object_id… …kxfrDefaultDOP --> 计算默认DOP的函数DOP Trace -- compute default DOP# CPU = 8 ---> CPU数量Threads/CPU = 2 ("parallel_threads_per_cpu") ---> parallel_threads_per_cpu的设置值default DOP = 16 (# CPU * Threads/CPU) --> 单个实例的默认DOP=CPU数量*parallel_threads_per_cpudefault DOP = 32 (DOP * # instance) --> RAC的默认DOP=单个实例的默认DOP*RAC实例数… …The table : 65109kkfdPaPrmDOP = 10 (computed from hint/dictionary)use hint DOP on tablekkfdtipserial - DOP = 1 (from kkfdPaPrm()) or index not part'd ---> 对象未分区The table : 65109… …kkfdPiAloDOP = 4 (kkfdpideg)Part. No. = 0 (kkfdpiPnum)No Pruning InfoPararllel Operation Type = 6 (kkfdpiOpType)Flags = 48 (kkfdpiflg)min. tranx free list = 65535 (kkfdpimgi) --> Min Transaction Freelists, 限制了DOP的最大值… …· 以下开始请求和分配并行进程:
kxfraloDOP trace -- requested thread from best ref obj = 10 (from kxfrIsBestRef()) ---> 根据最佳对象的DOP来请求线程数.kxfralothreads requested = 10 (from kxfrComputeThread()) ---> 请求分配线程数kxfraloadjusted no. threads = 10 (from kxfrAdjustDOP()) ---> 根据相关算法调整后的线程数kxfraloabout to allocate 10 slaves --> 最终准备分配的线程数kxfrAllocSlavesDOP trace -- call kxfpgsg to get 10 slaves ---> 调用函数kxfpgsg来获取slave进程kxfpgsgnum server requested = 10kxfplistGetting instance info for open groupkxfpiinfoinst[cpus:mxslv] ---每个实例的CPU数量和parallel_max_servers的设置,我们这里是2个实例的RAC环境1[8:160] 2[8:0] -- 第一个实例中有8颗CPU,parallel_max_servers为160,第二个实例中有8颗CPU,parallel_max_servers为0kxfpclinfoinst(load:user:pct:fact)aff --->各个实例的负载信息,Oracle会根据这些负载在并行化时进行负载均衡。1(6:0:100:133) 2(7:0:100:114)---> 实例1的负载为6,并行用户数为0,能100%的利用系统资源(即没有资源限制),负载因子为133 ---> 实例2的负载为7,并行用户数为0,能100%的利用系统资源(即没有资源限制),负载因子为114---> 这里最关键的数据为负载因子,它是由前3个数据计算得出,表明该机器还能接受多大量的负载,数字越大,说明能接受的负载越多 kxfpAdaptDOPRequested=10 Granted=10 Target=128 Load=13 Default=32 users=0 sets=2---> 请求10个进程, 并同意了这10个请求 ---> 根据整个RAC的资源状况,128个负载是一个最佳的负载状态.如果这个值小于请求数,那么可能会抛出ORA错误或者自动减少进程数(如PARALLEL_AUTOMATIC_TUNING或者PARALLEL_ADAPTIVE_MULTI_USER被设置为true时) ---> 当前负载为13(6+7).----> 默认DOP是32---> 当前并行用户数为0.---> 请求的进程集数为2.kxfpgsgload adapt num servers requested to = 10 (from kxfpAdaptDOP())kxfpsoriSorted: 1(6:133) 2(7:114) --->根据实例的负载情况排序将有限从排序更前的实例中分配进程kxfrDefaultDOPDOP Trace -- compute default DOP# CPU = 8Threads/CPU = 2 ("parallel_threads_per_cpu")default DOP = 16 (# CPU * Threads/CPU)default DOP = 16 (DOP * # instance)kxfpldblcompute default DOP = 64 (from kxfrDefaultDOP()) for loadbalancingkxfpunitvar=1 limit=64 use_aff=0 aff_num=0 unit=5 ---> 每次以5个为单位发出进程分配请求.kxfpgsggetting 2 sets of 10 threads, client parallel query execution flg=0x30 ---> coordinator请求分配2个进程集(可参加语句的查询计划)Height=0, Affinity List Size=0, inst_total=2, coord=1Insts 1 2 ---> 每个进程集准备从各个实例请求分配的进程数Threads 10 0 ---> 从实例1请求分配10个,从实例2请求分配0个--- 开始为第一个进程集分配进程
kxfpg1srvtrying to get slave P000 on instance 1 –--> 试图从实例1中获取P000 slave进程kxfpg1sgGot It. 1 so far. ---> 成功获取到进程,到目前为止已分配了1个进程……kxfpg1sgGot It. 10 so far. ---> 成功为第一个进程集分配到了10个进程kxfpg1sggot 10 servers (sync), returning...--- 开始为第二个进程集分配进程
kxfpg1srvtrying to get slave P000 on instance 1kxfpg1srvtrying to get slave P001 on instance 1kxfpg1srvtrying to get slave P002 on instance 1… …kxfpg1sgGot It. 10 so far. ---> 成功为第二个进程集分配到了10个进程kxfpg1sggot 10 servers (sync), returning...GROUP GETAcquired 20 slaves in 2 sets q serial 3454977 --- 总共为2个进程集分配了20个slave进程P000 inst 1 spid 27075P001 inst 1 spid 27077P002 inst 1 spid 27084P003 inst 1 spid 27086P004 inst 1 spid 27088P005 inst 1 spid 27090P006 inst 1 spid 27092P007 inst 1 spid 27094P008 inst 1 spid 27096P009 inst 1 spid 27098P010 inst 1 spid 27100P011 inst 1 spid 27102P012 inst 1 spid 27108P013 inst 1 spid 27110P014 inst 1 spid 27112P015 inst 1 spid 27114P016 inst 1 spid 27116P017 inst 1 spid 27118P018 inst 1 spid 27120P019 inst 1 spid 27122kxfpgsggot 10 threads on 1 instance(s), avg height=10 ---> 每个进程集从单个实例中分配了10个进程,平均高度为10.这里的高度是指每个进程集在一个实例上的进程数量。这里只从一个实例分配了10个进程,因此为10.假如一个进程集分别从2个实例上分配了5个进程,那么高度就为5.当然,这数量是不能低于前面的单元数的。Insts 1 2Svrs 20 0 ---> 最终,2个进程集总共从实例1分配了20个进程,从实例2分配了0个进程(因为我们的实例的参数parallel_max_servers为0,因此不可能从实例2上分配并行进程)
· 以下部分显示了优化器对并行化查询的解析过程:
--- 进程集1中的解析过程
kxfxcPutSession --> 这一部分只在10g和以上版本中才有,显示了在CBO影响并行化执行的优化器参数.Send Compilation environment:optimizer_mode_hinted = falseoptimizer_features_hinted = 0.0.0parallel_execution_enabled = trueparallel_query_forced_dop = 0parallel_dml_forced_dop = defaultparallel_ddl_forced_degree = 0… …--- 以下为解析过程
flashback session env:scn: 0xffff.ffffffffkxfxpndi: 0, cnt: 10kxfxpndcalling kxfxmdmpenqueueing:MSG( -->, KXFXOparse, STREAM_HINT )kxfxmh at addr: 0x16368a760kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSXkxfxmh->kxfxmhmsz: 1912kxfxmh->kxfxmhflg: 0x5kxfxmh->kxfxmhotyp: , KXFXOparsekxfxmp->kxfxmppxid: 1kxfxmp->kxfxmptqidb: 3583324kxfxmp->kxfxmpssno: 1kxfxmp->kxfxmpsno: 0kxfxmp->kxfxmpnslv: 10kxfxmp->kxfxmpntqd: 2kxfxmp->kxfxmpnpqd: 10kxfxmp->kxfxmpnss: 0kxfxmp->kxfxmpndfo: 2kxfxmp->kxfxmpsqllen: 251kxfxmp->kxfxmpgvsqllen: 0kxfxmp->kxfxmpoct: 6kxfxmp->kxfxmpflg: TOP/NO PEEK/TRANS/CVM OFF/PDML ON/KKOISRECUR OFFkxfxmp->kxfxmpqcxid: xid: 0x004a.002.00087961kxfxmp->kxfxmpbuf: 0x16368a858… …kxfxcp1Sending parse to slave set 1: ---> 进程集1中解析的语句User sqllen sent from QC = 251UPDATE /*+ PARALLEL(CT,10) */ CS2_CT_MVMT CTSET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DTFROM CS2_TXN_LOG TLWHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT)WHERE CT.MSG_ID > 9000000000000000000……接下来是进程集2中的解析过程
kxfxcPutSessionSend Compilation environment:optimizer_mode_hinted = falseoptimizer_features_hinted = 0.0.0parallel_execution_enabled = trueparallel_query_forced_dop = 0parallel_dml_forced_dop = defaultparallel_ddl_forced_degree = 0parallel_ddl_forced_instances = 0_query_rewrite_fudge = 90optimizer_features_enable = 10.2.0.3_optimizer_search_limit = 5……
3、案例分析
在我们对一次海量数据更新的调优过程中(见文章《让语句横着走————对海量数据更新的并行优化》),曾经出现一件令人费解的事情:有时从会话信息中找不到任何并行会话,看起来似乎并行化“消失”了。于是,在测试脚本中,我加入了以下语句打开并行跟踪。
alter session set "_px_trace"= "all";终于再一次的发生这种情况,并成功获取到了trace文件。打开文件看看到底怎么回事?
首先看并行度部分:
kkfdapdmlpgadep:0 pdml mode:2 PQ allowed DML allowed not autonomous => allowedThe table : 65109kkfdPaPrmDOP = 6 (computed from hint/dictionary)use hint DOP on table……kxfraloDOP trace -- requested thread from best ref obj = 6 (from kxfrIsBestRef())kxfralothreads requested = 6 (from kxfrComputeThread())kxfraloadjusted no. threads = 6 (from kxfrAdjustDOP())可以看到PDML已经并激活,并行度计算也正确。接下来,就发现问题了:
kxfpiinfoinst[cpus:mxslv]1[8:160] 2[8:160]我们的环境是RAC,有两个节点,parallel_max_servers都被设置为0,为了测试,我在测试脚本中临时将parallel_max_servers改为160,并且只在实例1上进行了测试。但是这里却发现2个节点都被设置为了160。
继续看下面的跟踪信息:
kxfpclinfoinst(load:user:pct:fact)aff1(10:0:100:80) 2(5:0:100:160)这里说明实例1的负载容量远远小于负载2,接下来的信息也证明了我的猜想——所有并行进程都被分配到实例2上面去了!
kxfpgsggetting 2 sets of 6 threads, client parallel query execution flg=0x30Height=0, Affinity List Size=0, inst_total=2, coord=1Insts 2 1Threads 6 0… …GROUP GETAcquired 12 slaves in 2 sets q serial 3618817P000 inst 2 spid 27299P001 inst 2 spid 27305P002 inst 2 spid 27307P003 inst 2 spid 27309P004 inst 2 spid 27311P005 inst 2 spid 27396P006 inst 2 spid 27398P007 inst 2 spid 27400P008 inst 2 spid 27402P009 inst 2 spid 27404P010 inst 2 spid 29654P011 inst 2 spid 29656kxfpgsggot 6 threads on 1 instance(s), avg height=6Insts 2 1Svrs 12 0通过Trace文件,我们轻松的知道了为什么在实例1上看不到并行进程了——在实例1 的负载高时,并行进程被负载均衡分配到了实例2上去了。
这个问题的根本原因是实例2上的parallel_max_servers被修改了。但是,这个参数怎么会被修改呢?看下测试脚本:
Test.sh:
… …sqlplus -s /nolog << EOFconnect cs2_party_owner/cs2_party_owner@test@test.sqlEOFTest.sql:
ALTER SYSTEM SET parallel_max_servers=160 scope=memory;…..UPDATE /*+ parallel(CT, 6) */ ……ALTER SYSTEM SET parallel_max_servers=0 scope=memory;我们在脚本中临时修改了parallel_max_servers的值,脚本完成后再改回。而我们是通过tnsname——test来连接数据库,并且我设置了test的连接串连接到实例1上,会不会test的内容被修改过了呢?查看test的内容,果然被人修改过了…
test =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = rac01)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = rac02)(PORT = 1521))(LOAD_BALANCE = yes)(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = prd))被人修改成了负载均衡方式。因此在rac01负载高时,脚本被放到rac02上去执行了,再加上测试过程中会出现脚本错误或者被人工Ctrl+c终止脚本,导致最后修改回0的语句并不能执行到,因此rac02的参数就这样被修改了。)