Striving for Optimal Performance
  • Home
  • Blog
    • Archive
    • Categories
  • Troubleshooting Oracle Performance
    • Description
    • Structure
    • Table of Contents
    • Forewords
    • Reviews
    • Downloadable Files
    • Addenda and Errata
  • Publications
  • Public Appearances
    • Past Public Appearances
  • Contact
  • Search
  • About

Posts in category Object Statistics

Challenges and Chances of the 11g Query Optimizer

Dec12
2011
2 Comments Written by Christian Antognini

Challenges and Chances of the 11g Query Optimizer is the name of a presentation I gave at several events (e.g. Trivadis Performance Days, Oracle OpenWorld, DOAG Konferenz, UKOUG Conference) throughout 2011. Its abstract is the following:

With every new release, the query optimizer is enhanced. Oracle Database 11g Release 1 and Release 2 are no exception to the rule. Specifically, they introduce key improvements in the following areas: indexing, optimization techniques, object statistics and plan stability. The aim of this presentation is to review the new features from a practical point of view as well as to point out challenges related to them. In other words, to let you know what you can expect from the query optimizer when you upgrade to Oracle Database 11g.

The aim of this short post is to point out that I made available the current version of the slides and all the scripts that go with them here.

The structure of the presentation (incl. a reference to the available scripts) is the following:

  • Observations
    • Number of Query Optimizer Parameters by Release
    • Number of Query Optimizer Bugs Fixed by Patchset
  • Indexing
    • Invisible Indexes (ex_invisible_index.sql)
    • Index Support for Linguistic LIKE (ex_linguistic_like.sql)
    • INDEX REBUILD and Statistics History (ex_index_rebuild.sql)
  • Optimization Techniques
    • Full Outer Join (ex_full_outer_join.sql)
    • Join-Filter Pruning (ex_join_filter_pruning.sql)
    • Table Expansion (ex_table_expansion.sql)
    • Join Factorization (ex_join_factorization.sql)
    • OR Expansion (ex_or_expansion.sql)
    • Join Elimination (ex_join_elimination.sql)
    • Subquery Unnesting (ex_subquery_unnesting.sql)
  • System and Object Statistics (DBMS_STATS)
    • Workload System Statistics
    • Object Statistics – Default Preferences
    • Object Statistics – Auto Sample Size
    • Object Statistics – Pending Statistics (ex_pending_object_statistics.sql)
    • Object Statistics – Incremental Statistics (ex_incremental_stats.sql)
    • Object Statistics – Extended Statistics on Expressions (ex_extended_statistics1.sql)
    • Object Statistics – Extended Statistics on Column Groups (ex_extended_statistics2.sql)
    • Object Statistics – Seeding Column Groups
    • Object Statistics – Comparing Statistics (ex_comparing_statistics.sql)
    • Object Statistics – Locks not Exported
    • JOB_QUEUE_PROCESSES
  • Plan Stability
    • CURSOR_SHARING
    • SQL Plan Baselines (ex_execution_plan_stability.sql, ex_execution_plan_stability_10g.sql, ex_execution_plan_stability_11g.sql)
    • Stored Outlines
    • Adaptive Cursor Sharing (ex_bind_peeking.sql, ex_bind_peeking_bind_aware.sql)
    • Cardinality Feedback (ex_cardinality_feedback.sql)
Posted in 11gR1, 11gR2, Bug, Indexes, Query Optimizer, Speaking, System Statistics

Does CREATE INDEX Gather Global Statistics?

Dec17
2009
4 Comments Written by Christian Antognini

You can add the COMPUTE STATISTICS clause to the CREATE INDEX statement. It instructs the SQL statement to gather and store index statistics in the data dictionary, while creating the index. This is useful because the overhead associated with the gathering of statistics while executing this SQL statement is negligible. In Oracle9i, the gathering of statistics is performed only when this clause is specified. As of Oracle Database 10g, whenever statistics are not locked, their gathering is done by default, which means the COMPUTE STATISTICS clause is deprecated and available for backward compatibility only.

Unfortunately, CREATE INDEX does not gather global statistics. As a result, whenever you are creating partitioned indexes, the global statistics might be inaccurate. Let me show you an example:

  • Create partitioned table, insert data (notice that the number of distinct values is equal to the number of rows) and create a local index
SQL> CREATE TABLE t (n1 number, n2 number)
  2  PARTITION BY RANGE (n1) (
  3    PARTITION p1 VALUES LESS THAN (11),
  4    PARTITION p2 VALUES LESS THAN (21)
  5  );

Table created.

SQL> INSERT INTO t
  2  SELECT rownum, rownum
  3  FROM dual
  4  CONNECT BY level <= 20;

20 rows created.

SQL> CREATE INDEX i ON t (n2) LOCAL;

Index created.
  • The CREATE INDEX statement gathered the statistics for the index; let’s check them…
SQL> SELECT partition_name, global_stats, distinct_keys
  2  FROM user_ind_statistics
  3  WHERE index_name = 'I';

PARTITION_NAME GLOBAL_STATS DISTINCT_KEYS
-------------- ------------ -------------
               NO                      10
P1             NO                      10
P2             NO                      10

As you can see 1) the number of distinct keys at the global level is wrong; it should be 20! 2) the GLOBAL_STATS column at the index level is set to NO. As a result, when you create a partitioned index, you should manually gather the global index statistics straight after. In other words, you should do the following:

  • Manually gather global level index statistics
SQL> execute dbms_stats.gather_index_stats(ownname=>user, indname=>'i', granularity=>'global')

PL/SQL procedure successfully completed.
  • Check whether the index statistics are accurate
SQL> SELECT partition_name, global_stats, distinct_keys
  2  FROM user_ind_statistics
  3  WHERE index_name = 'I';

PARTITION_NAME GLOBAL_STATS DISTINCT_KEYS
-------------- ------------ -------------
               YES                     20
P1             NO                      10
P2             NO                      10

There are situations, however, where it is not necessary to manually gather the global index statistics. For example, when the index is prefixed. But, as a general rule, I would not rely on the automatically gathered statistics for partitioned indexes.

Posted in 10gR1, 10gR2, 11gR1, 11gR2, 9iR2, Indexes, Partitioning

Granularity 'APPROX_GLOBAL AND PARTITION'

Oct03
2008
2 Comments Written by Christian Antognini

The patchset 11.1.0.7 introduces in the package DBMS_STATS a new value for the parameter GRANULARITY.

The description provided by the development team in $ORACLE_HOME/rdbms/admin/dbmsstat.sql is the following:

'APPROX_GLOBAL AND PARTITION' - This option is similar to
   'GLOBAL AND PARTITION'. But the global statistics are aggregated
   from partition level statistics. It will aggregate all statistics except number of
   distinct values for columns and number of distinct keys of indexes.
   The existing histograms of the columns at the table level
   are also aggregated.The global statistics are  gathered
   (i.e., going back to GLOBAL AND PARTITION behaviour)
   if partname argument is null or if the aggregation cannot be done
   e.g., statistics of one of the partitions is missing.
   This option is useful when you collect statistics for a new partition added
   into a range partitioned table (for example, a table  partitioned by month).
   The new data in the partition makes the global statistics stale (especially
   the min/max values of the partitioning column). This stale global statistics
   may cause suboptimal plans.  In this scenario, users can collect statistics
   for the newly added partition with 'APPROX_GLOBAL AND PARTITION'
   option so that the global statistics will reflect the newly added range.
   This option will take less time than 'GLOBAL AND PARTITION' option since the
   global statistics are aggregated from underlying partition level statistics.
   Note that, if you are using APPROX_GLOBAL AND PARTITION,
   you still  need to collect global statistics (with granularity = 'GLOBAL' option)
   when there is substantial amount of change at the table level.
   For example you added 10% more data to the table.  This is needed to get the
   correct number of distinct values/keys statistic at table level.

To illustrate how it works, let’s have a look to an example…

  • Create a range-partitioned table with a local index, load some data and gather object statistics.
SQL> CREATE TABLE t (
  2    id NUMBER,
  3    n NUMBER,
  4    d DATE,
  5    pad VARCHAR2(4000)
  6  )
  7  PARTITION BY RANGE (d) (
  8    PARTITION t_jan_2008 VALUES LESS THAN (to_date('2008-02-01','yyyy-mm-dd'))
  9  )
 10  NOLOGGING;

SQL> CREATE INDEX i ON t(n) LOCAL;

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 1000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,11) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-01-01','yyyy-mm-dd')+mod(rownum,31),
  9         rpad('*',100,'*')
 10  FROM t1000, t1000
 11  WHERE rownum <= 500000;

500000 rows created.

SQL> COMMIT;

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      estimate_percent => 100,
  6      granularity => 'global and partition',
  7      method_opt => 'for olumns size 1 id, d, pad, n size 254',
  8      cascade => TRUE
  9    );
 10  END;
 11  /
  • Add a new partition and load data into it.
SQL> ALTER TABLE t
  2  ADD PARTITION t_feb_2008 VALUES LESS THAN (to_date('2008-03-01','yyyy-mm-dd'));

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 2000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,12) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-02-01','yyyy-mm-dd')+mod(rownum,29),
  9         rpad('*',200,'*')
 10  FROM t1000, t1000;

1000000 rows created.

SQL> COMMIT;
  • Gather object statistics (notice that the new value for the parameter GRANULARITY is used).
SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      partname => 't_feb_2008',
  6      estimate_percent => 100,
  7      granularity => 'approx_global and partition',
  8      method_opt => 'for columns size 1 id, d, pad, n size 254',
  9      cascade => TRUE
 10    );
 11  END;
 12  /
  • Compare the current object statistics with the previous ones.
SQL> SELECT *
  2  FROM table(dbms_stats.diff_table_stats_in_history(
  3               ownname => user,
  4               tabname => 't',
  5               time1 => localtimestamp,
  6               time2 => localtimestamp-to_dsinterval('0 00:00:15'),
  7               pctthreshold => 0
  8             ));

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : T
OWNER         : OPS$CHA
SOURCE A      : Statistics as of 21-SEP-08 06.39.37.597595 PM +02:00
SOURCE B      : Statistics as of 21-SEP-08 06.39.22.597595 PM +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

T                           T   A   1500000    40091      183        500000
                                B   500000     8614       116        500000
T_FEB_2008                  P   A   1000000    31477      216        1000000
                                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

D               A   31      .032258064 NO   0       8    786C0 786C0 1500000
                B   31      .032258064 NO   0       8    786C0 786C0 500000
ID              A   500000  .000002    NO   0       6    C4020 C404  1500000
                B   500000  .000002    NO   0       6    C4020 C4023 500000
N               A   11      .000001998 YES  749500  2    80    C10C  750500
                B   11      .000001998 YES  249750  2    80    C10B  250250
PAD             A   1       1          NO   0       168  2A2A2 2A2A2 1500000
                B   1       1          NO   0       101  2A2A2 2A2A2 500000

                              PARTITION: T_FEB_2008
                              .....................

D               A   29      .034482758 NO   0       8    786C0 786C0 1000000
                B   NO_STATS
ID              A   1000000 .000001    NO   0       6    C4030 C404  1000000
                B   NO_STATS
N               A   12      .000000999 YES  499750  2    80    C10C  500250
                B   NO_STATS
PAD             A   1       1          NO   0       201  2A2A2 2A2A2 1000000
                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME      TYP SRC ROWS    LEAFBLK DISTKEY LF/KY DB/KY CLF     LVL SAMPSIZ
...............................................................................

                                    INDEX: I
                                    ........

I               I   A   750500  1221    12      101   3332  39991   2   750500
                    B   250250  407     11      37    778   8565    1   250250
T_FEB_2008      P   A   500250  814     12      67    2618  31426   2   500250
                    B   NO_STATS

A few remarks about the number of distinct keys…

As documented, the number of distinct values at table level has not been updated. I do understand that aggregating them is not possible. However, IMHO, it should be quite easy to do a kind of sanity check and to set the number of distinct values at the table level according to the highest number found at partition level. For example, in the previous case, the new partition contains 1M distinct values for the column ID. Hence, the number of distinct values for that column at the table level should be at least 1M.

Interestingly, the number of distinct values at the index level is correct. The reason is quite simple… they do gather them as usual (I checked that with SQL trace). In other words, the new value for the parameter GRANULARITY is only valid for tables. This is also the case when the procedure GATHER_INDEX_STATS is used. In other words, the new value for the parameter granularity is accepted but ignored!

What really puzzle me is why such a feature has been implemented in first place. In fact, as of Oracle Database 11g, with an incremental gathering is possible to have better statistics than with this feature.

Any suggestion/thought/opinion is welcome…

Posted in 11gR1, Partitioning

EvoLve theme by Theme4Press  •  Powered by WordPress Striving for Optimal Performance