
Oracle Sorting Algorithm


Author:  fuyuncat

Source:  www.HelloDBA.com

Date:  2011-11-16 08:30:02


When a SQL query executes, sorting is performed in a dedicated memory region, the sort work area. Data is unsorted when it first enters this area, and Oracle must choose an appropriate sorting algorithm to order it.

Before 10g, the sort area had a fixed size determined by sort_area_size. Every session used this size unless the setting was modified at session level, so the sort area could not be too large. Under this constraint, space became the highest-priority factor in Oracle's choice of algorithm, which needed to be a stable algorithm with the smallest space complexity. Among the well-known comparison-based sorting algorithms, Insertion Sort, Selection Sort and Heap Sort have the smallest space complexity, O(1). However, Selection Sort and Heap Sort are unstable algorithms. Therefore, Insertion Sort became the final choice.

The implementation of Insertion Sort is simple, with the following steps:
    1. Take items from the unsorted data one by one and put each into its correct position within the sorted data;
    2. The sorted data is organized in a particular data structure, such as an array or a list; in Oracle, it is a balanced binary tree.
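The steps above can be sketched in Python. (The article's own demo is in Perl; this illustrative sketch uses a plain list for the sorted data rather than Oracle's balanced binary tree.)

```python
def insertion_sort(data):
    """Insertion sort: move items one by one from the unsorted
    input into their correct position in the sorted output."""
    sorted_data = []
    for item in data:
        # Scan the already-sorted list for the insertion point.
        pos = 0
        while pos < len(sorted_data) and sorted_data[pos] <= item:
            pos += 1
        sorted_data.insert(pos, item)  # O(n) shift -> O(n^2) overall
    return sorted_data

print(insertion_sort([50, 30, 33, 12, 1]))  # -> [1, 12, 30, 33, 50]
```

Using a balanced binary tree instead of a list turns the O(n) insertion-point scan into O(log n), but each item must still be inserted one at a time.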

But Insertion Sort is an inefficient algorithm, with computational complexity O(n^2), which means it costs more CPU when sorting. This is known as the version 1 sort.

Memory management was redesigned in 10g. The PGA became scalable and work area allocation became adaptive. For example, by default, a work area within a 2GB PGA can range from 204KB up to 40MB. With such flexible memory space, Oracle could choose a more efficient algorithm.

As we know, Quick Sort is the most efficient comparison-based sorting algorithm, with computational complexity O(n log n). However, it is also an unstable method. To improve it, many researchers have combined it with other algorithms to gain stability as well as performance.

Oracle also introduced an enhanced quick sort in 10g. In fact, the new algorithm is a hybrid of Quick Sort and Most Significant Digit (MSD) Radix Sort.

Radix Sort is a non-comparison sorting approach that evolved from Bucket Sort. For this algorithm to work, the data must meet the following requirements:
    1. The sorting keys should have a fixed length. Otherwise, shorter keys must be padded with 0 on the left to align with the longest key value;
    2. To be stable, the data range of the sorting key must be known, so that the bucket range can be decided.

The MSD Radix Sort is implemented as follows:
    1. Start processing the keys from the most significant digit;
    2. Based on the selected digit, put each element into the bucket associated with its value range;
    3. Move to the next digit and sort the elements within each bucket;
    4. Apply the method recursively until the least significant digit is reached;
    5. Concatenate the buckets together.
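These steps can be sketched as follows, assuming fixed-length byte-string keys so that each digit is one byte with 256 possible values. (This is an illustrative sketch, not Oracle's code.)

```python
def msd_radix_sort(keys, digit=0):
    """MSD radix sort over equal-length byte strings: bucket the
    keys on the most significant byte first, then recursively
    sort each bucket on the next byte."""
    if len(keys) <= 1 or digit >= len(keys[0]):
        return list(keys)
    buckets = [[] for _ in range(256)]  # one bucket per byte value
    for k in keys:
        buckets[k[digit]].append(k)     # distribute on the current digit
    result = []
    for b in buckets:                   # concatenate in byte order
        if b:
            result.extend(msd_radix_sort(b, digit + 1))
    return result

print(msd_radix_sort([b"ba", b"ab", b"aa", b"bb"]))
# -> [b'aa', b'ab', b'ba', b'bb']
```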

This sort is a stable algorithm with computational complexity O(n*k), where k is the number of digits in the sorting key and is the key factor affecting efficiency. For example, compared with Quick Sort, it is more efficient when k < log(n).
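As a quick sanity check of that threshold (the figures below are illustrative, not from the article):

```python
import math

n = 1_000_000  # rows to sort (illustrative)
k = 8          # bytes per sorting key (illustrative)

radix_cost = n * k               # ~n*k bucket operations
quick_cost = n * math.log2(n)    # ~n*log2(n) comparisons

# k = 8 is well below log2(1,000,000) ~ 19.9, so radix wins here.
print(radix_cost < quick_cost)   # -> True
```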

Consequently, Oracle combined these two methods so that each complements the other's weaknesses:
    * Radix Sort enhances stability;
    * Quick Sort reduces the number of key digits to be sorted.

Before examining these points, we should first review Quick Sort.
    1. Find a pivot, then partition the data into 3 partitions: elements smaller than the pivot go to the left, equal elements go in the middle, and larger ones go to the right (some implementations split into 2 partitions, putting equal elements on the left or right);
    2. Recursively sort the partitions until each contains only 1 element.
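The three-way partitioning scheme described above can be sketched like this (an illustrative sketch using list comprehensions, not an in-place implementation):

```python
def quick_sort_3way(data):
    """Quick Sort with three-way partitioning: elements smaller
    than the pivot go left, equal ones stay in the middle, and
    larger ones go right; each side is sorted recursively."""
    if len(data) <= 1:
        return list(data)
    pivot = data[len(data) // 2]
    smaller = [x for x in data if x < pivot]
    equal   = [x for x in data if x == pivot]
    larger  = [x for x in data if x > pivot]
    return quick_sort_3way(smaller) + equal + quick_sort_3way(larger)

print(quick_sort_3way([30, 7, 30, 1, 12]))  # -> [1, 7, 12, 30, 30]
```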

We can see that the instability stems from the choice of pivot: if the smallest or largest element is always chosen, the computational complexity degrades to O(n^2). To reduce this instability, Oracle first samples the data and chooses the median value as the pivot. Another important approach to reducing instability is that it calls the Radix Sort function instead of the Quick Sort function to sort the partitions. During the initial run of Quick Sort, all elements are scanned, which means the conditions for a stable MSD Radix Sort can be fulfilled during this round. In particular, sorting keys are represented as bytes, so the range of each digit is 0 to 255; that is, only 256 buckets need to be allocated when performing the Radix Sort.

Besides this, Oracle accomplished another crucial optimization: it finds the common prefix bytes of each partition, so those bytes can be skipped during the Radix Sort. Similarly, when sorting the elements within each bucket, Oracle calls the Quick Sort function rather than the Radix Sort function, in order to find the common prefix bytes again.
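A rough sketch of this mutual recursion might look like the following. It assumes equal-length byte-string keys; the helpers `quick_step` and `radix_step` are hypothetical names, not Oracle's actual functions.

```python
import random

def radix_step(keys, digit):
    """Radix step: distribute keys over 256 buckets by the byte at
    `digit`, then hand each bucket back to the quick step for the
    next byte."""
    buckets = [[] for _ in range(256)]
    for k in keys:
        buckets[k[digit]].append(k)
    out = []
    for b in buckets:  # concatenate buckets in byte order
        if b:
            out.extend(quick_step(b, digit + 1))
    return out

def quick_step(keys, digit):
    """Quick step: skip the bytes shared by every key in this
    partition, sample a pivot byte, partition three ways, then let
    the radix step spread each partition over its byte values."""
    if len(keys) <= 1:
        return list(keys)
    # Skip the common prefix of this partition.
    while digit < len(keys[0]) and all(k[digit] == keys[0][digit] for k in keys):
        digit += 1
    if digit >= len(keys[0]):
        return list(keys)  # keys are identical from here on
    # Sample the data and take the middle value as the pivot byte.
    sample = sorted(random.sample(keys, min(3, len(keys))))
    pivot = sample[len(sample) // 2][digit]
    smaller = [k for k in keys if k[digit] < pivot]
    equal   = [k for k in keys if k[digit] == pivot]
    larger  = [k for k in keys if k[digit] > pivot]
    out = []
    for part in (smaller, equal, larger):
        if part:
            out.extend(radix_step(part, digit))
    return out

print(quick_step([b"db", b"aa", b"ac", b"ab", b"da"], 0))
# -> [b'aa', b'ab', b'ac', b'da', b'db']
```

Note how the digit index only advances, so the recursion always terminates once the least significant byte is processed.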

In this way, the two methods call each other recursively, and both performance and stability are improved. This is called the version 2 sort.

I have implemented this new sorting algorithm in Perl. Below is the output of a run:

  D:\Documents\HelloDBA.COM\scripts>perl OracleNewSort.pl
  unsorted data:
  [[      50      5]
  [       30      7]
  [       33      15]
  [       12      28]
  [       30      53]
  [       1       321]
  [       123     32]
  [       32      1]
  [       212     3]
  [       1       32]
  ]
  sorted data:
  [[      1       32]
  [       1       321]
  [       12      28]
  [       30      7]
  [       30      53]
  [       32      1]
  [       33      15]
  [       50      5]
  [       123     32]
  [       212     3]
  ]

--- Fuyuncat ---


Copyright ©2005, HelloDBA.Com. All rights reserved.
