All about oracle sorting

Date: 2010-03-18 03:17:15

Share to

sort arithmetic

    During sorting, data will first be input into a private memory area named sort area, and be sorted in memory. If the data to be sorted is too large to be handle once, the sorted data will be directly written (no caching) into temporary tablespace as a data set. Once all of data be written into disk, oracle will merge those data sets recursively, till to all data be merged in one data set.

Initial Runs

    The process that sort data in-memory is called initial run. Up to 80~90% of the sort_area_size may be used for read buffers during an initial run, and the rest is used for write buffers. If we know how many data (considering output records, average length and block size, it could be estimated) need be sorted, we can estimate the number of initial runs by below formula.
    Initial Runs = CEIL(SORT_DATA_BLOKCS/ROUND(0.8*SORT_AREA_SIZE/BLOCK_SIZE))

Merges

    2 or more data sets could be merged once. The number data sets be merged once is called merge width. Merging is also be processed in sort area. To merge the data set, need first directly read them from disk. Just like MBRC, mutliple blocks could be read in one io, which is controlled by SMRC (sort_multiblock_read_count, who become a hidden parameter from 9i). However, the value of this parameter is not the actual SMRC, it actually is to limit the MAX SMRC. To merge data sets, at least 2 data set must be selected. And correspond to asynchronous io (disk_asynch_io=TRUE), 2 read buffers required. So, the actual SMRC could be calculated as below.
    SMRC = MIN(ROUND(SORT_AREA_SIZE/(2*2*BLOCK_SIZE))-1, _sort_multiblock_read_count)

    Then, the merge width could be calculated base on the SMRC,
    merge width = ROUND(SORT_AREA_SIZE/(2*SMRC*BLOCK_SIZE))-1

    It will keep merging, untill all data are merged in one data set. The process will be recursive. For the last data sets, if the number is less than merge width, they will be put into next round. We have below function to calculate the number of merges.

    Note: All of the merges except the last one are named Intermediate runs.

SQL代码

HELLODBA.COM>create or replace function sort_merges(init_runs number, width number)

  2  return number

  3  as

  4   i number:=init_runs;

  5   n number:=0;

  6  begin

  7    while i>width loop

  8      n := n+floor(i/width);

  9      i := floor(i/width)+ mod(i,width);

10    end loop;

11    if i > 0 then

12      n := n+1;

13    end if;

14    return n;

15  end;

16  /



Function created.

    Look into an example. Here we have 65536b sort area, 721 blocks estimated to be sorted with 8k block size. We can get the sorting statistics data by below query.

SQL代码

HELLODBA.COM>select init_runs,SMRC,merge_width,sort_merges(init_runs,merge_width) merges from

  2  (select init_runs,

  3         SMRC,

  4         round(sort_area_size / (2 * SMRC * block_size), 0) - 1 as merge_width

  5    from (select sort_blocks, sort_area_size, block_size, ceil(sort_blocks / round(0.8 * sort_area_size / block_size, 0)) init_runs,

  6                 round(sort_area_size / (2 * 2 * block_size), 0) - 1 SMRC

  7            from (select 721 as sort_blocks, 65536 as sort_area_size, 8192 as block_size from dual)));



INIT_RUNS       SMRC MERGE_WIDTH     MERGES

---------- ---------- ----------- ----------

       121          1           3         60



1 row selected.

new sort algorithm

    From 10gR2, new sort algorithm is introduced, which controled by hidden parameter "_new_sort", default is true. Here is its improvement.

*In-memory sort.
    Less momory required for those small data set sorting in memory.
*Merge Sort
    Will save keys for those initial data sets, and will choose the data sets to be merged by comparing the keys. And you will see a new statistics entry from the trace file:
       Comparisons while searching for key in-memory 120
    The temporary blocks will be more possible to be re-used when mergeing, save disk blocks.

Trace

    10031, 10032, 10033 event traces will help us to understand the sorting. Below is the trace contents of previous example.

SQL代码

Recording run at 401ca9 for 6 blocks

Recording run at 401cb0 for 6 blocks

Recording run at 401cb6 for 6 blocks

Recording run at 401d3c for 6 blocks

Recording run at 401d42 for 6 blocks

Recording run at 401d48 for 6 blocks

...

---- Sort Parameters ------------------------------

sort_area_size                    65536

sort_area_retained_size           65536

sort_multiblock_read_count        1

max intermediate merge width      3

Merging run at 401d1a for 1 blocks

Merging run at 401c3a for 6 blocks

Total number of blocks to read: 7 blocks

Recording run at 401d1b for 7 blocks

Merging run at 401c40 for 6 blocks

Merging run at 401c46 for 6 blocks

Merging run at 401c5c for 6 blocks

Total number of blocks to read: 18 blocks

Recording run at 401d22 for 17 blocks

...

---- Sort Statistics ------------------------------

Initial runs                              121

Intermediate runs                         59

Number of merges                          60

Input records                             47582

Output records                            47582

Disk blocks 1st pass                      721

Total disk blocks used                    896

Total number of comparisons performed     651813

  Comparisons performed by in-memory sort 320660

  Comparisons performed during merge      331033

  Comparisons while searching for key in-memory 120

Temp segments allocated                   1

Extents allocated                         56

Uses version 2 sort

Uses asynchronous IO

    ---- Run Directory Statistics ----

Run directory block reads (buffer cache)  240

Block pins (for run directory)            1

Block repins (for run directory)          239

    ---- Direct Write Statistics -----

Write slot size                           8192

Write slots used during in-memory sort    2

Number of direct writes                   2884

Num blocks written (with direct write)    2884

Block pins (for sort records)             2884

Waits for async writes                    1731

    ---- Direct Read Statistics ------

Size of read slots for merge phase        8192

Number of read slots for merge phase      6

Size of read slots for output             8192

Number of read slots for output           8

Number of direct sync reads               1498

Number of blocks read synchronously       1498

Number of direct async reads              1386

Number of blocks read asynchronously      1386

Waits for async reads                     347

---- End of Sort Statistics -----------------------

   --- Fuyuncat ---

Previous：TIPS: Build OUTLINE agilely

Next：Bloom Filter

Return To this Category