Managing statistics

In addition to collect appropriate statistics, it is equally important to provide a comprehensive framework for managing them. Oracle offers a number of methods to do this including the ability to restore statistics to a previous version, the option to transfer statistic from one system to another, or even manually setting the statistics values yourself. These options are extremely useful in specific cases, but are not recommended to replace standard statistics gathering methods using the DBMS_STATS package.

Restoring Statistics

From Oracle Database 10g onwards, when you gather statistics using DBMS_STATS, the original statistics are automatically kept as a backup in dictionary tables, and can be easily restored by running DBMS_STATS.RESTORE_TABLE_STATS if the newly gathered statistics lead to any kind of problem. The dictionary view DBA_TAB_STATS_HISTORY contains a list of timestamps when statistics were saved for each table.

The example below restores the statistics for the table SALES to what they were yesterday, and automatically invalidates all of the cursors referencing the SALES table in the SHARED_POOL. We want to invalidate all of the cursors; because we are restoring yesterday’s statistics and want them to impact any cursor instantaneously. The value of the NO_INVALIDATE parameter determines if the cursors referencing the table will be invalidated or not.

BEGIN
DBMS_STATS.RESTORE_TABLE_STATS(ownname => ‘SH’,
tabname => ‘SALES’,
as_of_timestamp => SYSTIMESTAMP-1
force => FALSE,
no_invalidate => FALSE);
END;
/

Pending Statistics

By default when statistics are gathered, they are published (written) immediately to the appropriate dictionary tables and begin to be used by the Optimizer. In Oracle Database 11g, it is possible to gather Optimizer statistics but not have them published immediately; and instead store them in an unpublished, ‘pending’ state. Instead of going into the usual dictionary tables, the statistics are stored in pending tables so that they can be tested before they are published. These pending statistics can be enabled for individual sessions, in a controlled fashion, which allows you to validate the statistics before they are published. To activate pending statistics collection, you need to use one of the DBMS_STATS.SET_*_PREFS procedures to change value of the parameter PUBLISH from TRUE (default) to FALSE for the object(s) you wish to create pending statistics for.

BEGIN
DBMS_STATS.SET_TABLE_PREFS(‘SH’,’SALES’,‘PUBLISH’,’FALSE’);
END;
/
Gather statistics on the object(s) as normal.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(‘SH’,’SALES’);
END;
/

The statistics gathered for these objects can be displayed using the dictionary views called USER_*_PENDING_STATS. You can tell the Optimizer to use pending statistics by issuing an alter session command to set the initialization parameter OPTIMIZER_USE_PENDING_STATS to TRUE and running a SQL workload. For tables accessed in the workload that do not have pending statistics the Optimizer will use the current statistics in the standard data dictionary tables. Once you have validated the pending statistics, you can publish them using the procedure DBMS_STATS.PUBLISH_PENDING_STATS.

BEGIN
DBMS_STATS.PUBLISH_PENDING_STATS(‘SH’,’SALES’);
END;
/

Exporting / Importing Statistics

One of the most important aspects of rolling out a new application or a new part of an existing application is testing it at scale. Ideally, you want the test system to be identical to production in terms of hardware and data size. This is not always possible, most commonly due to the size of the production environments. By copying the Optimizer statistics from a production database to any other system running the same Oracle version, e.g., a scaled-down test database, you can emulate the Optimizer behavior of a production environment. The production statistics can be copied to the test database using the DBMS_STATS.EXPORT_*_STATS and DBMS_STATS.IMPORT_*_STATS procedures.

Before exporting statistics, you need to create a table to store the statistics using DBMS_STATS.CREATE_STAT_TABLE. After the table has been created, you can export statistics from the data dictionary using the DBMS_STATS.EXPORT_*_STATS procedures. Once the statistics have been packed into the statistics table, you can then use datadump to extract the statistics table from the production database, and import it into the test database. Once the statistics table is successfully imported into the test system, you can import the statistics into the data dictionary using the DBMS_STATS.IMPORT_*_STATS procedures. The following example creates a statistics table called TAB1 and exports the statistics from the SH schema into the MYSTATS statistics table.

Figure 23. Exporting the Optimizer statistics for the SH schema

Copying Partition Statistics

When dealing with partitioned tables the Optimizer relies on both the statistics for the entire table (global statistics) as well as the statistics for the individual partitions (partition statistics) to select a good execution plan for a SQL statement. If the query needs to access only a single partition, the Optimizer uses only the statistics of the accessed partition. If the query access more than one partition, it uses a combination of global and partition statistics.

It is very common with range partitioned tables to have a new partition added to an existing table, and rows inserted into just that partition. If end-users start to query the newly inserted data before statistics have been gathered, it is possible to get a suboptimal execution plan due to stale statistics. One of the most common cases occurs when the value supplied in a where clause predicate is outside the domain of values represented by the [minimum, maximum] column statistics. This is known as an ‘out-of-range’ error. In this case, the Optimizer prorates the selectivity based on the distance between the predicate value, and the maximum value (assuming the value is higher than the max), that is, the farther the value is from the maximum or minimum value, the lower the selectivity will be.

The "Out of Range" condition can be prevented by using the DBMS_STATS.COPY_TABLE_STATS procedure (available from Oracle Database 10.2.0.4 onwards). This procedure copies the statistics of a representative source [sub] partition to the newly created and empty destination [sub] partition. It also copies the statistics of the dependent objects: columns, local (partitioned) indexes, etc. The minimum and maximum values of the partitioning column are adjusted as follows;

  • If the partitioning type is HASH the minimum and maximum values of the destination partition are same as that of the source partition.
  • If the partitioning type is LIST and the destination partition is a NOT DEFAULT partition then the minimum value of the destination partition is set to the minimum value of the value list that describes the destination partition. The maximum value of the destination partition is set to the maximum value of the value list that describes the destination partition
  • If the partitioning type is LIST and the destination partition is a DEFAULT partition, then the minimum value of the destination partition is set to the minimum value of the source partition. The maximum value of the destination partition is set to the maximum value of the source partition
  • If the partitioning type is RANGE then the minimum value of the destination partition is set to the high bound of previous partition and the maximum value of the destination partition is set to the high bound of the destination partition unless the high bound of the destination partition is MAXVALUE, in which case the maximum value of the destination partition is set to the high bound of the previous partition

It can also scale the statistics (such as the number of blocks, or number of rows) based on the given scale_factor. The following command copies the statistics from SALES_Q3_2011 range partition to the SALES_Q4_2011 partition of the SALES table and scales the basic statistics by a factor of 2.

BEGIN
DBMS_STATS.COPY_TABLE_STATS('SH','SALES','SALES_Q3_2002','SALES_Q4_2002', 2);
END;
/

Index statistics are only copied if the index partition names are the same as the table partition names (this is the default). Global or table level statistics are not updated by default. The only time global level statistics would be impacted by the DBMS_STATS.COPY_TABLE_STATS procedure would be if no statistics existed at the global level and global statistics were being generated via aggregation.

Comparing Statistics

One of the key reasons an execution plan can differ from one system to another is because the Optimizer statistics on each system are different, for example when data on a test system is not 100% in sync with real production system. To identify differences in statistics, the DBMS_STATS.DIFF_TABLE_STATS_* functions can be used to compare statistics for a table from two different sources. The statistic sources can be:

  • A user statistics table and the current statistics in the data dictionary
  • A single user statistics table containing two sets of statistics that can be identified using statids
  • Two different user statistics tables
  • Two points in history
  • Current statistics and a point in history
  • Pending Statistics with the current statistics in the dictionary
  • Pending Statistics with a user statistics table

The function also compares the statistics of the dependent objects (indexes, columns, partitions), and displays all the statistics for the object(s) from both sources if the difference between the statistics exceeds a specified threshold. The threshold can be specified as an argument to the function; the default value is 10%. The statistics corresponding to the first source will be used as the basis for computing the differential percentage.

In the example below, we compare the current dictionary statistics for the EMP table with the statistics for EMP in the statistics table TAB1; the SQL statement will generate a report as shown in Figure 24.

SELECT report, maxdiffpct
FROM table(DBMS_STATS.DIFF_TABLE_STATS_IN_STATTAB(‘SCOTT’,’EMP’,’TAB1’ ));

Figure 24. Report output after comparing the statistics for table SCOTT.EMP in the statistics table TAB1 and the current statistics in the dictionary.

Locking Statistics

In some cases, you may want to prevent any new statistics from being gathered on a table or schema by locking the statistics. Once statistics are locked, no modifications can be made to those statistics until the statistics have been unlocked or unless the FORCE parameter of the GATHER_*_STATS procedures has been set to TRUE.

Figure 25 Locking and unlocking table statistics

In Oracle Database 11g the DBMS_STATS package was expanded to allow statistics to be locked and unlocked at the partition level. These additional procedures allow for a finer granularity of control.

BEGIN
DBMS_STATS.LOCK_PARTITION_STATS(‘SH’,’SALES’, 'SALES_Q3_2000');
END;

You should note there is a hierarchy with locked statistics. For example, if you lock the statistic on a partitioned table, and then unlocked statistics on just one partition in order to re-gather statistics on that one partition it will fail with an error ORA-20005. The error occurs because the table level lock will still be honored even though the partition has been unlocked. The statistics gather for the partition will only be successfully if the FORCE parameter is set to TRUE.

Figure 26. Hierarchy with locked statistics; table level lock trumps partition level unlock

Manually setting Statistics

Under rare circumstances it may be beneficial to manually set the Optimizer statistics in the data dictionary. One such example could be a highly volatile global temporary table (note that while manually setting statistics is discussed in this paper, it is not generally recommended, because inaccurate or inconsistent statistics can lead to poor

<<Improving the efficiency of Gathering Statistics

Other Types of Statistics >>