Author: fuyuncat
Source: www.HelloDBA.com
Date: 2010-01-13 03:06:07
3. Converting to endpoint value
If we plot the histogram, the endpoint value is the x-axis and the endpoint number is the y-axis. For columns that have a histogram, Oracle converts their values to the data type of the endpoint value, which is NUMBER. This section discusses how the endpoint number and the endpoint value are stored.
Endpoint Number
I summarized the endpoint number in the previous article, "Frequency or Height Balanced".
- In a frequency histogram, the endpoint number is the cumulative row count of the values in sorted order. For example, if 2 rows have the value "1", 1 row has the value "2", 5 rows have the value "3", ..., then the endpoint numbers of the corresponding buckets are 2, 3 (2+1), 8 (3+5);
- In a height-balanced histogram, the endpoint number is the sequence number of the equal-height bucket. For example, values 1, 2 and 3 fall into the 1st bucket; value 4 is a popular value that occupies 2 buckets, which are compressed into 1 stored bucket; value 5 falls into the 3rd stored bucket, ..., then the endpoint numbers of these 3 buckets are 1, 3 (2+1), 4 (3+1).
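The two rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Oracle's implementation; the function names `frequency_endpoint_numbers` and `compress_height_balanced` are hypothetical, and the input lists are the example figures from the bullets above.

```python
from itertools import accumulate


def frequency_endpoint_numbers(counts):
    """Frequency histogram: cumulative row counts, one per distinct value."""
    return list(accumulate(counts))


def compress_height_balanced(bucket_endpoints):
    """Height-balanced histogram: merge consecutive buckets sharing an endpoint.

    bucket_endpoints[i] is the endpoint value of equal-height bucket i + 1.
    Returns (endpoint_number, endpoint_value) pairs after compression.
    """
    stored = []
    for number, value in enumerate(bucket_endpoints, start=1):
        if stored and stored[-1][1] == value:
            # Popular value: keep one entry carrying the last bucket number.
            stored[-1] = (number, value)
        else:
            stored.append((number, value))
    return stored


# 2 rows of "1", 1 row of "2", 5 rows of "3"
freq = frequency_endpoint_numbers([2, 1, 5])
# Buckets end at 3, 4, 4, 5: value 4 is popular and spans two buckets
hb = compress_height_balanced([3, 4, 4, 5])

print(freq)  # [2, 3, 8]
print(hb)    # [(1, 3), (3, 4), (4, 5)]
```

A jump of more than 1 between consecutive endpoint numbers in the compressed list is what marks a value as popular.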
Endpoint Value
We can find the data type of the endpoint value in the dictionary table histgrm$. It is NUMBER, which means that values of every data type are eventually converted to a number.
SQL code
- HELLODBA.COM>desc histgrm$
- Name Null? Type
- ------------- -------- -------------------------------------
- OBJ# NOT NULL NUMBER
- COL# NOT NULL NUMBER
- ROW# NUMBER
- BUCKET NOT NULL NUMBER
- ENDPOINT NOT NULL NUMBER
- INTCOL# NOT NULL NUMBER
- EPVALUE VARCHAR2(1000)
- SPARE1 NUMBER
- SPARE2 NUMBER
Let's look at the conversion one data type at a time.
Note: The precision of the data analyzed to build the histogram is not the same as the precision of the stored endpoint value. This is also discussed below.
Let's prepare the test data first.
SQL code
- HELLODBA.COM>create table demo.htc3 (a number, b date, c raw(100), d varchar2(100), e clob, f rowid);
- Table created.
- HELLODBA.COM>insert into demo.htc3 values (1, to_date('2010-12-07 00:00:01', 'YYYY-MM-DD HH24:MI:SS'), '01', 'A', 'A', 'AAAxdYAAFAAAPJUAAA');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (2, to_date('2010-12-07 00:00:02', 'YYYY-MM-DD HH24:MI:SS'), '02', 'BB', 'B','AAAxdYAAFAAAPJUAAB');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (3, to_date('2010-12-07 00:00:03', 'YYYY-MM-DD HH24:MI:SS'), '03', 'CCC', 'C','AAAxdYAAFAAAPJUAAC');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (4, to_date('2010-12-07 00:00:04', 'YYYY-MM-DD HH24:MI:SS'), '04', 'DDDDD', 'D','AAAxdYAAFAAAPJUAAD');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (5, to_date('2010-12-07 00:00:05', 'YYYY-MM-DD HH24:MI:SS'), '05', 'EEEEEE', 'E','AAAxdYAAFAAAPJUAAE');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (6, to_date('2010-12-07 12:50:01', 'YYYY-MM-DD HH24:MI:SS'), '06', 'FFFFFFF', 'F','AAAxdYAAFAAAPJUAAF');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (7.654321, to_date('2010-12-07 12:50:02', 'YYYY-MM-DD HH24:MI:SS')+1, '07', 'FFFFFF1', 'G','AAAxdYAAFAAAPJUAAG');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (8.7654321, to_date('2010-12-07 12:50:03', 'YYYY-MM-DD HH24:MI:SS')+2, '08', 'FFFFFF2', 'H','AAAxdYAAFAAAPJUAAH');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (9.87654321, to_date('2010-12-07 12:50:04', 'YYYY-MM-DD HH24:MI:SS')+3, '09', 'FFFFFF3', 'I','AAAxdYAAFAAAPJUAAI');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (10.987654321, to_date('2010-12-07 12:50:05', 'YYYY-MM-DD HH24:MI:SS'), '0A', 'FFFFFFF', 'J','AAAxdYAAFAAAPJUAAJ');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (2.123456789123123456789123456789123456789E33, to_date('2010-12-07 12:50:01', 'YYYY-MM-DD HH24:MI:SS')+100,'AC1265231212CDAC1265231212CDAC1265231212CDAC1265231212CDAC1265231212CD', lpad('A',35,'C'), lpad('A',35,'C'),'AAAxdYAAFAAAPJUAAK');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (2.123456789123123456789123456789123456790E33, to_date('2010-12-07 12:50:01', 'YYYY-MM-DD HH24:MI:SS')+101,'AC1265231212CDAC1265231212CDAC1265231212CDAC1265231212CDAC1265231212AB', lpad('A',35,'C')||'1', lpad('A',35,'C')||'1','AAAxdYAAFAAAPJUAAL');
- 1 row created.
- HELLODBA.COM>insert into demo.htc3 values (2.123456789123123456789123456789123456789E35, to_date('2010-12-07 12:50:59', 'YYYY-MM-DD HH24:MI:SS'), 'AC1265231212CDAC1265231212CDAC1265231212CDAC1265231212CDAC1265231212EF', lpad('A',35,'C')||'2', lpad('A',35,'C')||'2','AAAxdYAAFAAAPJUAAM');
- 1 row created.
- HELLODBA.COM>commit;
- Commit complete.
- HELLODBA.COM>set serveroutput on
- HELLODBA.COM>begin
- 2 dbms_output.enable(1000000);
- 3 dbms_stats.set_param('TRACE',16383);
- 4 dbms_stats.gather_table_stats('DEMO','HTC3',NULL,0,FALSE,'FOR ALL COLUMNS');
- 5 end;
- 6 /
- ...
Number (NUMBER and the subtypes)
From the traced query, we can see that Oracle groups the original, unrounded data into buckets when it builds the histogram.
SQL code
- select substrb(dump(val, 16, 0, 32), 1, 120) ep, cnt
- from (select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */
- "A" val, count(*) cnt
- from "DEMO"."HTC3" t
- where "A" is not null
- group by "A")
- order by val
To store a number as an endpoint value, Oracle rounds it to 15 significant digits, counted from the leftmost digit.
SQL code
- HELLODBA.COM>set numw 50
- HELLODBA.COM>select endpoint_number,
- 2 endpoint_value
- 3 from dba_histograms
- 4 where owner = 'DEMO'
- 5 and table_name = 'HTC3'
- 6 and column_name = 'A';
- ENDPOINT_NUMBER ENDPOINT_VALUE
- ---------------------------------------- ----------------------------------------
- 1 1
- 2 2
- 3 3
- 4 4
- 5 5
- 6 6
- 7 7.654321
- 8 8.7654321
- 9 9.87654321
- 10 10.987654321
- 11 2123456789123120000000000000000000
- 12 2123456789123120000000000000000000
- 13 212345678912312000000000000000000000
- 13 rows selected.
We can calculate the endpoint value from the actual data value by rounding.
SQL code
- HELLODBA.COM>select a, round(a,15-length(trunc(a))) from demo.htc3;
- A ROUND(A,15-LENGTH(TRUNC(A)))
- ---------------------------------------- ----------------------------------------
- 1 1
- 2 2
- 3 3
- 4 4
- 5 5
- 6 6
- 7.654321 7.654321
- 8.7654321 8.7654321
- 9.87654321 9.87654321
- 10.987654321 10.987654321
- 2123456789123123456789123456789123.45679 2123456789123120000000000000000000
- 2123456789123123456789123456789123.45679 2123456789123120000000000000000000
- 212345678912312345678912345678912345.679 212345678912312000000000000000000000
- 13 rows selected.
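The same rounding can be reproduced outside the database. The sketch below uses Python's `decimal` module to round to 15 significant digits; the function name `endpoint_value` is hypothetical, and the half-up rounding mode is an assumption that agrees with the values shown in this article rather than a documented Oracle rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def endpoint_value(text, digits=15):
    """Round a numeric literal to `digits` significant digits."""
    d = Decimal(text)
    if d == 0:
        return d
    # adjusted() is the exponent of the most significant digit,
    # e.g. 33 for 2.1E33 and 0 for 7.654321.
    quantum = Decimal(f"1e{d.adjusted() - digits + 1}")
    return d.quantize(quantum, rounding=ROUND_HALF_UP)


# Two source values that differ only beyond the 15th significant digit
a = endpoint_value("2.123456789123123456789123456789123456789E33")
b = endpoint_value("2.123456789123123456789123456789123456790E33")

print(a == b)                              # True
print(endpoint_value("10.987654321"))      # value unchanged, padded with zeros
```

Values with fewer than 15 significant digits pass through unchanged, while distinct values that only differ after the 15th digit collapse into one endpoint value.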
Because the data used to build the histogram is more precise than the stored endpoint value, several buckets may end up with the same endpoint value. This can mislead the optimizer when it estimates row counts and cost. Consider the following case.
SQL code
- HELLODBA.COM>create table demo.htc5 (a number);
- Table created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456789);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456789.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456799.123456799);
- 1 row created.
- HELLODBA.COM>insert into demo.htc5 values(123456799.123456799);
- 1 row created.
- HELLODBA.COM>commit;
- Commit complete.
- HELLODBA.COM>begin
- 2 dbms_output.enable(1000000);
- 3 dbms_stats.set_param('TRACE',16383);
- 4 dbms_stats.gather_table_stats('DEMO','HTC5',NULL,0,FALSE,'FOR ALL COLUMNS');
- 5 end;
- 6 /
- PL/SQL procedure successfully completed.
- HELLODBA.COM>select endpoint_number,
- 2 endpoint_value
- 3 from dba_histograms
- 4 where owner = 'DEMO'
- 5 and table_name = 'HTC5'
- 6 and column_name = 'A';
- ENDPOINT_NUMBER ENDPOINT_VALUE
- ---------------------------------------- ----------------------------------------
- 1 123456789.123457
- 6 123456789.123457
- 8 123456799.123457
The 1st and 2nd buckets have the same endpoint value. Let's look at the execution plans of queries that use an equality predicate on each of these values.
SQL code
- HELLODBA.COM>set autot trace exp
- HELLODBA.COM>select * from demo.htc5 where a=123456789.123456789;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4195264197
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 5 | 60 | 3 (0)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| HTC5 | 5 | 60 | 3 (0)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("A"=123456789.123456789)
- HELLODBA.COM>select * from demo.htc5 where a=123456789.123456799;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4195264197
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 5 | 60 | 3 (0)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| HTC5 | 5 | 60 | 3 (0)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("A"=123456789.123456799)
- HELLODBA.COM>select * from demo.htc5 where a=123456799.123456799;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4195264197
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 2 | 24 | 3 (0)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| HTC5 | 2 | 24 | 3 (0)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("A"=123456799.123456799)
When estimating the number of rows for the value in the 1st bucket, which actually has 1 row, the optimizer used the row count of the 2nd bucket, which is 5.
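A toy model makes the effect easier to see. The sketch below is not Oracle's estimation algorithm; it simply assumes that a lookup is keyed by the stored endpoint value and that a later bucket with the same endpoint value replaces an earlier one. Under that assumption it reproduces the Rows figures in the three plans above. The function name `frequency_estimates` is hypothetical.

```python
def frequency_estimates(buckets):
    """Map each stored endpoint value to a row estimate.

    buckets: (endpoint_number, endpoint_value) pairs in histogram order.
    The estimate for a bucket is its endpoint number minus the previous one.
    Assumption: a later bucket with the same endpoint value overwrites
    the earlier entry.
    """
    estimates = {}
    previous = 0
    for number, value in buckets:
        estimates[value] = number - previous
        previous = number
    return estimates


# Histogram of DEMO.HTC5.A as stored in dba_histograms
buckets = [
    (1, "123456789.123457"),
    (6, "123456789.123457"),
    (8, "123456799.123457"),
]
est = frequency_estimates(buckets)

print(est["123456789.123457"])  # 5, for both ...789 and ...799 literals
print(est["123456799.123457"])  # 2
```

Both literals 123456789.123456789 and 123456789.123456799 round to the same key, so in this model they share the estimate of 5 rows even though the first value occurs only once.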
Date (DATE, TIMESTAMP and subtypes)
Oracle also uses the original data to build the histogram:
SQL code
- select substrb(dump(val, 16, 0, 32), 1, 120) ep, cnt
- from (select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */
- "B" val, count(*) cnt
- from "DEMO"."HTC3" t
- where "B" is not null
- group by "B")
- order by val
To store the data as an endpoint value, Oracle must convert the date to a number. The rule is to convert the whole value, both the date part and the time part, to days. The date part becomes a Julian day number; the time part is converted to seconds and divided by 86400 (24*60*60), the number of seconds in one day. The result is then also rounded to 15 significant digits before it is stored.
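The conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not Oracle's code: the function name `date_endpoint_value` is hypothetical, the constant 1721425 is the offset between Python's proleptic Gregorian ordinal and the Julian day number (2000-01-01 is ordinal 730120 and Julian day 2451545), and half-up rounding is assumed.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# date.toordinal() + JULIAN_OFFSET = Julian day number
JULIAN_OFFSET = 1721425
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def date_endpoint_value(dt, digits=15):
    """Convert a datetime to fractional Julian days, 15 significant digits."""
    days = Decimal(dt.toordinal() + JULIAN_OFFSET)
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    value = days + Decimal(seconds) / Decimal(SECONDS_PER_DAY)
    # Round to `digits` significant digits, counted from the leftmost digit.
    quantum = Decimal(f"1e{value.adjusted() - digits + 1}")
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


first = date_endpoint_value(datetime(2010, 12, 7, 0, 0, 1))
print(first)  # 2455538.00001157
```

With 7 digits taken by the day number, only 8 decimal places remain for the time part, so the stored value resolves time to roughly one millisecond (1e-8 days is about 0.86 ms).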
SQL code
- HELLODBA.COM>select endpoint_number,
- 2 endpoint_value
- 3 from dba_histograms
- 4 where owner = 'DEMO'
- 5 and table_name = 'HTC3'
- 6 and column_name = 'B';
- ENDPOINT_NUMBER ENDPOINT_VALUE
- ---------------------------------------- ----------------------------------------
- 1 2455538.00001157
- 2 2455538.00002315
- 3 2455538.00003472
- 4 2455538.0000463