Other Types of Statistics

In addition to basic table, column, and index statistics, the Optimizer uses additional information to determine the execution plan of a statement. This additional information can come in the form of dynamic sampling and system statistics.

Dynamic Sampling

Dynamic sampling was introduced in Oracle Database 9i Release 2 to collect additional statement-specific object statistics during the optimization of a SQL statement. The most common misconception is that dynamic sampling can be used as a substitute for Optimizer statistics. The goal of dynamic sampling is to augment the existing statistics; it is used when regular statistics are not sufficient to get good quality cardinality estimates. So, how and when will dynamic sampling be used? During the compilation of a SQL statement, the Optimizer decides whether to use dynamic sampling or not by considering whether the available statistics are sufficient to generate a good execution plan. If the available statistics are not enough, dynamic sampling will be used. It is typically used to compensate for missing or insufficient statistics that would otherwise lead to a very bad plan. For the case where one or more of the tables in the query does not have statistics, dynamic sampling is used by the Optimizer to gather basic statistics on these tables before optimizing the statement. The statistics gathered in this case are not as high a quality or as complete as the statistics gathered using the DBMS_STATS package. This trade off is made to limit the impact on the compile time of the statement. The second scenario where dynamic sampling is used is when the statement contains a complex predicate expression, and extended statistics are not available, or cannot be used. For example, if you had a query that has non-equality where clause predicates on two correlated columns, standard statistics would not be sufficient in this case, and extended statistics could not be used. In this simple query against the SALES table, the Optimizer assumes that each of the where clause predicates will reduce the number of rows returned by the query, and based on the standard statistics, determines the cardinality to be 20,197, when in fact, the number of rows returned is ten times higher at 210,420.

SELECT count(*)
FROM sh.Sales
WHERE cust_id < 2222
AND prod_id > 5;

Figure 27. Execution plan for complex predicates without dynamic sampling

With standard statistics the Optimizer is not aware of the correlation between the CUST_ID and PROD_ID in the SALES table. By setting OPTIMIZER_DYNAMIC_SAMPLING to level 6, the Optimizer will use dynamic sampling to gather additional information about the complex predicate expression. The additional information provided by dynamic sampling allows the Optimizer to generate a more accurate cardinality estimate, and therefore a better performing execution plan.

Figure 28. Execution plan for complex predicates with dynamic sampling level 6

As seen in this example, dynamic sampling is controlled by the parameter OPTIMIZER_DYNAMIC_SAMPLING, which can be set to different levels (0-10). These levels control two different things; when dynamic sampling kicks in, and how large a sample size will be used to gather the statistics. The greater the sample size, the bigger impact dynamic sampling has on the compilation time of a query.

From Oracle Database 11g Release 2 onwards, the Optimizer will automatically decide if dynamic sampling will be useful, and what dynamic sampling level will be used for SQL statements executed in parallel. This decision is based on the size of the tables in the statement, and the complexity of the predicates. However, if the OPTIMIZER_DYNAMIC_SAMPLING parameter is explicitly set to a non-default value, then that user-specified value will be honored. You can tell if dynamic sampling kicks in by looking at the ‘note’ section of the execution plan. For example, if the parallel execution was enabled for the SALES table, and the following query was issued, the Optimizer would automatically enable dynamic sampling level 4.

Figure 29. Execution plan for a SQL statement with complex predicates executed in parallel

For serial SQL statements, the dynamic sampling level will depend on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter, and will not be triggered automatically by the Optimizer. The reason for this is that serial statements are typically short running, and any overhead at compile time could have a huge impact on their performance. Whereas parallel statements are expected to be more resource intensive, so the additional overhead at compile time is worth it to ensure the best possible execution plan.

System statistics

In Oracle Database 9i, system statistics were introduced to enable the Optimizer to more accurately cost each operation in an execution plan by using information about the actual system hardware executing the statement, such as CPU speed and IO performance.

System statistics are enabled by default, and are automatically initialized with default values; these values do are representative for most system. When system statistics are gathered they will override these initial values. To gather system statistics you can use DBMS_STATS.GATHER_SYSTEM_STATS during a representative workload time window, ideally at peak workload times.

System statistics need to be gathered only once. System statistics are not automatically collected as part of the automatic statistics gathering job. You must have GATHER_SYSTEM_STATISTICS or the DBA role to update system statistics.

Statistics on Dictionary Tables

Since the Cost Based Optimizer is now the only supported optimizer, all tables in the database need to have statistics, including all of the dictionary tables (tables owned by ‘SYS’,SYSTEM, etc, and residing in the system and SYSAUX tablespace). Statistics on the dictionary tables are maintained via the automatic statistics gathering job run during the nightly maintenance window. If you choose to switch off the automatic statistics gathering job for your main application schema consider leaving it on for the dictionary tables. You can do this by changing the value of AUTOSTATS_TARGET to ORACLE instead of AUTO using the procedure DBMS_STATS.SET_GLOBAL_PREFS.

BEGIN
DBMS_STATS.SET_GLOBAL_PREFS('AUTOSTATS_TARGET','ORACLE');
END;
/

Statistics can be manually gathered on the dictionary tables using the DBMS_STATS.GATHER_DICTIONARY_STATS procedure. You must have both the ANALYZE ANY DICTIONARY, and ANALYZE ANY system privilege, or the DBA role to update dictionary statistics. It is recommended that dictionary table statistics are maintained on a regular basis in a similar manner to user schemas.

Statistics on Fixed Objects

You will also need to gather statistics on dynamic performance tables and their indexes (fixed objects). These are the X$ tables on which the V$ views (V$SQL etc.) are built. Since V$ views can appear in SQL statements like any other user table or views, it is important to gather optimizer statistics on these tables to help the optimizer generate good execution plans. However, unlike other database tables, dynamic sampling is not automatically use for SQL statement involving X$ tables when optimizer statistics are missing. The Optimizer uses predefined default values for the statistics if they are missing. These defaults may not be representative and could potentially lead to a suboptimal execution plan, which could cause severe performance problems in your system. It is for this reason that we strong recommend you gather fixed objects statistics.

Fixed object statistics are not gathered or maintained by the automatic statistics gathering job. You can collect statistics on fixed objects using DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure.

BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/

The DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure gathers the same statistics as DBMS_STATS.GATHER_TABLE_STATS except for the number of blocks. Blocks is always set to 0 since the x$ tables are in memory structures only and are not stored on disk. Because of the transient

nature of the x$ tables it is import that you gather fixed object statistics when there is a representative workload on the system. You must have the ANALYZE ANY DICTIONARY system privilege or the DBA role to update fixed object statistics. It is recommend that you re-gather fixed object statistics if you do a major database or application upgrade.

<<What are Optimizer Statistics