Improving the efficiency of Gathering Statistics

Once you define the statistics you are interested in, you want to ensure to collect these statistics in a timely manner. Traditionally people have sped up statistics gathering by using parallel execution as discussed above. However, what if all of the objects a schema were small and didn’t warrant parallel execution, how could you speed up gathering statistics on that schema?

Concurrent Statistic gathering

In Oracle Database 11g Release 2 (11.2.0.2), a concurrent statistics gathering mode was introduced to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.

Concurrent statistics gathering is controlled by the global preference, CONCURRENT, which is set to either TRUE or FALSE. By default it is set to FALSE. When CONCURRENT is set to TRUE, Oracle employs Oracle Job Scheduler and Advanced Queuing components to create and manage multiple statistics gathering jobs concurrently.

Calling DBMS_STATS.GATHER_TABLE_STATS on a partitioned table when CONCURRENT is set to TRUE, causes Oracle to create a separate statistics gathering job for each (sub)partition in the table. How many of these jobs will execute concurrently, and how many will be queued is based on the number of available job queue processes (JOB_QUEUE_PROCESSES initialization parameter, per node on a RAC environment) and the available system resources. As the currently running jobs complete, more jobs will be dequeued and executed until all of the (sub)partitions have had their statistics gathered.

If you gather statistics using DBMS_STATS.GATHER_DATABASE_STATS, DBMS_STATS.GATHER_SCHEMA_STATS, or DBMS_STATS.GATHER_DICTIONARY_STATS, then Oracle will create a separate statistics gathering job for each non-partitioned table, and each (sub)partition for the partitioned tables. Each partitioned table will also have a coordinator job that manages its (sub)partition jobs. The database will then run as many concurrent jobs as possible, and queue the remaining jobs until the executing jobs complete. However, to prevent possible deadlock scenarios multiple partitioned tables cannot be processed simultaneously. Hence, if there are some jobs running for a partitioned table, other partitioned tables in a schema (or database or dictionary) will be queued until the current one completes. There is no such restriction for non-partitioned tables.

The following figure illustrates the creation of jobs at different levels, when a DBMS_STATS.GATHER_SCHEMA_STATS command has been issued on the SH schema. Oracle will create a statistics gathering job (Level 1 in Figure 20) for each of the non-partitioned tables;

CHANNELS,
COUNTRIES,
CUSTOMERS,
PRODUCTS,
PROMOTIONS,
TIMES

And, a coordinator job for each partitioned table, i.e., SALES and COSTS, it in turn creates a statistics gathering job for each of partition in SALES and COSTS tables, respectively (Level 2 in Figure 20).

Gathering Statistics

Figure 20. List of the statistics gathering job created when Concurrent Statistics Gathering occurs on the SH schema

Let’s assume that the parameter JOB_QUEUE_PROCESSES is set to 32, the Oracle Job Scheduler would allow 32 statistics gathering jobs to start, and would queue the rest (assuming that there are sufficient system resources for 32 jobs). Suppose that the first 29 jobs (one for each partition plus the coordinator job) for the COSTS table get started, then three non-partitioned table statistics gathering jobs would also be started. The statistics gathering jobs for the SALES table will be automatically queued, because only one partitioned table is processed at any one time. As each job finishes, another job will be dequeued and started, until all 64 jobs (6 level 1 jobs and 58 level 2 jobs) have been completed. Each of the individual statistics gathering job can also take advantage of parallel execution as describes above under the parameter DEGREE.

Configuration and Settings

In Oracle Database 11.2.0.2, the concurrency setting for statistics gathering is turned off by default. It can be turned on using the following command.

BEGIN
DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','TRUE');
END;
/

You will also need some additional privileges above and beyond the regular privileges required to gather statistics. The user must have the following Job Scheduler and AQ privileges:

CREATE JOB
MANAGE SCHEDULER
MANAGE ANY QUEUE

The SYSAUX tablespace should be online, as the Job Scheduler stores its internal tables and views in SYSAUX tablespace. Finally the JOB_QUEUE_PROCESSES parameter should be set to fully utilize all of the system resources available (or allocated) for the statistics gathering process. If you don't plan to use parallel execution you should set the JOB_QUEUE_PROCESSES to 2 X total number of CPU cores (this is a per node parameter in a RAC environment). Please make sure that you set this parameter system-wise (ALTER SYSTEM ... or in init.ora file) rather than at the session level (ALTER SESSION).

If you are going to use parallel execution as part of concurrent statistics gathering you should disable the PARALLEL_ADAPTIVE_MULTI_USER initialization parameter. That is; ALTER SYSTEM SET parallel_adaptive_multi_user=false; It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.

Gathering Statistics

Figure 21. Steps required to setup Resource Manager and parallel statement queuing for concurrent statistics gathering executed in parallel

You should note that the automatic statistics gathering job does not currently take advantage of concurrency. Setting CONCURRENT to TRUE will have no impact on the automatic statistics gathering job.

Gathering Statistics on Partitioned tables

Gathering statistics on partitioned tables consists of gathering statistics at both the table level and partition level. Prior to Oracle Database 11g, adding a new partition or modifying data in a few partitions required scanning the entire table to refresh table-level statistics. If you skipped gathering the global level statistics, the Optimizer would extrapolate the global level statistics based on the existing

partition level statistics. This approach is accurate for simple table statistics such as number of rows – by aggregating the individual rowcount of all partitions - but other statistics cannot be determined accurately: for example, it is not possible to accurately determine the number of distinct values for a column (one of the most critical statistics used by the Optimizer) based on the individual statistics of all partitions.

Oracle Database 11g enhances the statistics collection for partitioned tables with the introduction of incremental global statistics. If the INCREMENTAL preference for a partitioned table is set to TRUE, the DBMS_STATS.GATHER_*_STATS parameter GRANULARITY includes GLOBAL, and ESTIMATE_PERCENT is set to AUTO_SAMPLE_SIZE, Oracle will gather statistics on the new partition, and accurately update all global level statistics by scanning only those partitions that have been added or modified, and not the entire table.

Incremental global statistics works by storing a synopsis for each partition in the table. A synopsis is statistical metadata for that partition and the columns in the partition. Each synopsis is stored in the SYSAUX tablespace. Global statistics are then generated by aggregating the partition level statistics and the synopses from each partition, thus eliminating the need to scan the entire table to gather table level statistics (see Figure 22). When a new partition is added to the table, you only need to gather statistics for the new partition. The global statistics will be automatically and accurately updated using the new partition synopsis and the existing partitions’ synopses.

Gathering Statistics

Figure 22. Incremental Statistics gathering on a range partitioned table

Below are the steps necessary to use incremental global statistics.

Begin by switching on incremental statistics at either the table or the global level.

BEGIN
DBMS_STATS.SET_TABLE_PREFS(‘SH’,’SALES’,‘INCREMENTAL’,’TRUE’);
END;
/

Gather statistics on the object(s) as normal, letting the ESTIMATE_PERCENT and GRANULARITY parameters default.

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(‘SH’,’SALES’);
END;
/

To check the current setting of INCREMENTAL for a given table, use DBMS_STATS.GET_PREFS.

SELECT DBMS_STATS.GET_PREFS(‘INCREMENTAL’,‘SH’,’SALES’)
FROM dual;
     

<<Gathering Statistics

Managing statistics >>