Gathering Statistics

For database objects that are constantly changing, statistics must be regularly gathered so that they accurately describe the database object. The PL/SQL package, DBMS_STATS, is Oracle’s preferred method for gathering statistics, and replaces the now obsolete ANALYZE2 command for collecting statistics. The DBMS_STATS package contains over 50 different procedures for gathering and managing statistics but most important of these procedures are the GATHER_*_STATS procedures. These procedures can be used to gather table, column, and index statistics. You will need to be the owner of the object or have the ANALYZE ANY system privilege or the DBA role to run these procedures. The parameters used by these procedures are nearly identical, so this paper will focus on the GATHER_TABLE_STATS procedure.

GATHER_TABLE_STATS

The DBMS_STATS.GATHER_TABLE_STATS procedure allows you to gather table, partition, index, and column statistics. Although it takes 15 different parameters, only the first two or three parameters need to be specified to run the procedure, and are sufficient for most customers;

  • The name of the schema containing the table
  • The name of the table
  • A specific partition name if it’s a partitioned table and you only want to collect statistics for a specific partition (optional)
Gathering Statistics

Figure 14. Using the DBMS_STATS.GATHER_TABLE_STATS procedure

The remaining parameters can be left at their default values in most cases. Out of the remaining 12 parameters, the following are often changed from their default and warrant some explanation here.

ESTIMATE_PERCENT parameter

The ESTIMATE_PERCENT parameter determines the percentage of rows used to calculate the statistics. The most accurate statistics are gathered when all rows in the table are processed (i.e., 100% sample), often referred to as computed statistics. Oracle Database 11g introduced a new sampling algorithm that is hash based and provides deterministic statistics. This new approach has the accuracy close to a

100% sample but with the cost of, at most, a 10% sample. The new algorithm is used when ESTIMATE_PERCENT is set to AUTO_SAMPLE_SIZE (the default) in any of the DBMS_STATS.GATHER_*_STATS procedures. Historically, customers have set the ESTIMATE_PRECENT parameter to a low value to ensure that the statistics will be gathered quickly. However, without detailed testing, it is difficult to know which sample size to use to get accurate statistics. It is highly recommended that from Oracle Database 11g onward you let ESTIMATE_PRECENT default (i.e., not set explicitly).

METHOD_OPT parameter

The METHOD_OPT parameter controls the creation of histograms during statistics collection. Histograms are a special type of column statistic created when the data in a table column has a non-uniform distribution, as discussed in the previous section of this paper. With the default value of FOR ALL COLUMNS SIZE AUTO, Oracle automatically determines which columns require histograms and the number of buckets that will be used based on the column usage information (DBMS_STATS.REPORT_COL_USAGE) and the number of distinct values in the column. The column usage information reflects an analysis of all the SQL operations the database has processed for a given object. Column usage tracking is enabled by default.

A column is a candidate for a histogram if it has been seen in a where clause predicate, e.g., an equality, range, LIKE, etc. Oracle also verifies if the column data is skewed before creating a histogram, for example a unique column will not have a histogram created on it if it is only seen in equality predicates. It is strongly recommended you let the METHOD_OPT parameter default in the GATHER_*_STATS procedures.

DEGREE parameter

The DEGREE parameter controls the number of parallel server processes that will be used to gather the statistics. By default Oracle uses the same number of parallel server processes specified as an attribute of the table in the data dictionary (Degree of Parallelism). By default, all tables in an Oracle database have this attribute set to 1, so it may be useful to set this parameter if statistics are being gathered on a large table to speed up statistics collection. By setting the parameter DEGREE to AUTO_DEGREE, Oracle will automatically determine the appropriate number of parallel server processes that should be used to gather statistics, based on the size of an object. The value can be between 1 (serial execution) for small objects to DEFAULT_DEGREE (PARALLEL_THREADS_PER_CPU X CPU_COUNT) for larger objects.

GRANULARITY parameter

The GRANULARITY parameter dictates the levels at which statistics are gathered on a partitioned table. The possible levels are table (global), partition, or sub-partition. By default Oracle will determine which levels are necessary based on the table’s partitioning strategy. Statistics are always gathered on the first level of partitioning regardless of the partitioning type used. Sub-partition statistics are gathered when the subpartitioning type is LIST or RANGE. This parameter is ignored if the table is not partitioned.

CASCADE parameter

The CASCADE parameter determines whether or not statistics are gathered for the indexes on a table. By default, AUTO_CASCADE, Oracle will only re-gather statistics for indexes whose table statistics are stale. Cascade is often set to false when a large direct path data load is done and the indexes are disabled. After the load has been completed, the indexes are rebuilt and statistics will be automatically created for them, negating the need to gather index statistics when the table statistics are gathered.

NO_INVALIDATE parameter

The NO_INVALIDATE parameter determines if dependent cursors (cursors that access the table whose statistics are being re-gathered) will be invalidated immediately after statistics are gathered or not. With the default setting of DBMS_STATS.AUTO_INVALIDATE, cursors (statements that have already been parsed) will not be invalidated immediately. They will continue to use the plan built using the previous statistics until Oracle decides to invalidate the dependent cursors based on internal heuristics. The invalidations will happen gradually over time to ensure there is no performance impact on the shared pool or spike in CPU usage as there could be if you have a large number of dependent cursors and all of them were hard parsed at once.

Changing the default value for the parameters in DBMS_STATS.GATHER_*_STATS

You can specify a particular non-default parameter value for an individual DBMS_STATS.GATHER_*_STATS command, or override the default value for your database. You can override the default parameter values for DBMS_STATS.GATHER_*_STATS procedures using the DBMS_STATS.SET_*_PREFS procedures. The list of parameters that can be changed are as follows:

AUTOSTATS_TARGET (SET_GLOBAL_PREFS only as it relates to the auto stats job)
CONCURRENT (SET_GLOBAL_PREFS only)
CASCADE 
DEGREE 
ESTIMATE_PERCENT 
METHOD_OPT 
NO_INVALIDATE 
GRANULARITY 
PUBLISH 
INCREMENTAL 
STALE_PERCENT

You can override the default settings for each parameter at a table, schema, database, or global level using one of the following DBMS_STATS.SET_*_PREFS procedures, with the exception of AUTOSTATS_TARGET and CONCURRENT which can only be modified at the global level.

SET_TABLE_PREFS
SET_SCHEMA_PREFS
SET_DATABASE_PREFS
SET_GLOBAL_PREFS

The SET_TABLE_PREFS procedure allows you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for the specified table only.

The SET_SCHEMA_PREFS procedure allows you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for all of the existing tables in the specified schema. This procedure actually calls the SET_TABLE_PREFS procedure for each of the tables in the specified schema. Since it uses SET_TABLE_PREFS, calling this procedure will not affect any new objects created after it has been run. New objects will pick up the GLOBAL preference values for all parameters. The SET_DATABASE_PREFS procedure allows you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for all of the user-defined schemas in the database. This procedure actually calls the SET_TABLE_PREFS procedure for each table in each user-defined schema. Since it uses SET_TABLE_PREFS this procedure will not affect any new objects created after it has been run. New objects will pick up the GLOBAL preference values for all parameters. It is also possible to include the Oracle owned schemas (sys, system, etc) by setting the ADD_SYS parameter to TRUE. The SET_GLOBAL_PREFS procedure allows you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for any object in the database that does not have an existing table preference. All parameters default to the global setting unless there is a table preference set, or the parameter is explicitly set in the GATHER_*_STATS command. Changes made by this procedure will affect any new objects created after it has been run. New objects will pick up the GLOBAL_PREFS values for all parameters.

With SET_GLOBAL_PREFS it is also possible to set a default value for two additional parameters, AUTOSTAT_TARGET and CONCURRENT. AUTOSTAT_TARGET controls what objects the automatic statistic gathering job (that runs in the nightly maintenance window) will look after. The possible values for this parameter are ALL, ORACLE, and AUTO. The default value is AUTO. A more in-depth discussion about the automatic statistics collection can be found in the statistics management section of this paper.

The CONCURRENT parameter controls whether or not statistics will be gathered on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. It is a Boolean parameter, and is set to FALSE by default. The value of the CONCURRENT parameter does not impact the automatic statistics gathering job, which always does one object at a time. A more in-depth discussion about concurrent statistics gathering can be found in the Improving the efficiency of Gathering Statistics section of this paper. The DBMS_STATS.GATHER_*_STATS procedures and the automatic statistics gathering job obeys the following hierarchy for parameter values; parameter values explicitly set in the command overrule everything else. If the parameter has not been set in the command, we check for a table level preference. If there is no table preference set, we use the GLOBAL preference.

Gathering Statistics

Figure 15. DBMS_STATS.GATHER_*_STATS hierarchy for parameter values

If you are unsure of what preferences have been set, you can use the DBMS_STATS.GET_PREFS function to check. The function takes three arguments; the name of the parameter, the schema name, and the table name. In the example below (figure 16), we first check the value of STALE_PRECENT on the SH.SALES table. Then we set a table level preference, and check that it took affect using DBMS_STATS.GET_PREFS.

Gathering Statistics

Figure 16. Using DBMS_STATS.SET_PREFS procedure to change the parameter stale_percent for the sales table

Automatic Statistics Gathering Job

Oracle will automatically collect statistics for all database objects, which are missing statistics or have stale statistics by running an Oracle AutoTask task during a predefined maintenance window (10pm to 2am weekdays and 6am to 2am at the weekends).

This AutoTask gathers Optimizer statistics by calling the internal procedure DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC. This procedure operates in a very similar

fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The primary difference is that Oracle internally prioritizes the database objects that require statistics, so that those objects, which most need updated statistics, are processed first. You can verify that the automatic statistics gathering job exists by querying the DBA_AUTOTASK_CLIENT_JOB view or through Enterprise Manager (Figure 17). You can also change the maintenance window that the job will run in through Enterprise Manager.

Gathering Statistics

Figure 17. Checking that the automatic statistics gathering job is enabled

Statistics on a table are considered stale when more than STALE_PERCENT (default 10%) of the rows are changed (total number of inserts, deletes, updates) in the table. Oracle monitors the DML activity for all tables and records it in the SGA. The monitoring information is periodically flushed to disk, and is exposed in the *_TAB_MODIFICATIONS view.

Gathering Statistics

Figure 18. Querying USER_TAB_MODIFICATIONS view to check DML activity on the PRODUCTS2 table

It is possible to manually flush this data by calling the procedure DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO if you want to get up-to-date information at query time (internally the monitoring data is flushed before all statistics collection operations). You can then see which tables have stale statistics by querying the STALE_STATS column in the USER_TAB_STATISTICS view.

Gathering Statistics

Figure 19. Querying USER_TAB_STATISTICS to see if any tables have stale statistics

Tables where STALE_STATS is set to NO, have up to date statistics. Tables where STALE_STATS is set to YES, have stale statistics. Tables where STALE_STATS is not set are missing statistics altogether.

If you already have a well-established statistics gathering procedure or if for some other reason you want to disable automatic statistics gathering for your main application schema, consider leaving it on for the dictionary tables. You can do this by changing the value of AUTOSTATS_TARGET to ORACLE instead of AUTO using DBMS_STATS.SET_GLOBAL_PREFS procedure.

BEGIN
DBMS_STATS.SET_GLOBAL_PREFS(‘AUTOSTATS_TARGET’,’ ORACLE’);
END;
/

To disable the task altogether:

BEGIN
DBMS_AUTO_TASK_ADMIN.DISABLE(
client_name => 'auto optimizer stats collection',
operation => NULL,
window_name => NULL);

<<What are Optimizer Statistics

Improving the efficiency of Gathering Statistics >>