Subprogram inlining in 11g

The release of Oracle 10g brought with it the first optimising compiler for PL/SQL. As discussed in this oracle-developer.net article, Oracle added two levels of optimisation to provide some impressive performance gains without changing a line of code.

Oracle 11g has taken compiler optimisation further, in particular with the concept of subprogram inlining. With subprogram inlining, Oracle will replace a call to a subroutine (such as a function) with the subroutine code itself during compilation. One of the benefits of this is that we can continue to write well-structured, modular code without any performance penalties. For SQL-intensive PL/SQL programs, the gains from inlining might be marginal, but for procedural code, inlining might provide some greater optimisation benefits, as we will see in this article.

a simple example

We will begin with an extremely simple example of subprogram inlining. In the following example, we request that Oracle inlines function F. We do this with the new PRAGMA INLINE syntax as follows.

SQL> DECLARE
  2
  3     n PLS_INTEGER;
  4
  5     FUNCTION f RETURN PLS_INTEGER IS
  6     BEGIN
  7        RETURN 10;
  8     END f;
  9
 10  BEGIN
 11
 12     PRAGMA INLINE(f, 'YES');
 13     n := f();
 14     DBMS_OUTPUT.PUT_LINE(n);
 15
 16  END;
 17  /
10

PL/SQL procedure successfully completed.

The PRAGMA INLINE syntax for a subprogram can take either a 'YES' or 'NO' value and precedes the first call to the subprogram. It will be effective for subsequent calls to the same subprogram within a statement (unless a subsequent 'NO' pragma overrides it). In the PL/SQL Language Reference, Oracle states:

When the INLINE pragma immediately precedes one of the following statements, the pragma affects every call to the specified subprogram in that statement:

  • Assignment
  • Call
  • Conditional
  • CASE
  • CONTINUE-WHEN
  • EXECUTE IMMEDIATE
  • EXIT-WHEN
  • LOOP
  • RETURN

There is a wide range of permutations for the behaviour of the PRAGMA INLINE directive which can be found in the documentation (a link is provided at the end of this article).

To reiterate, with subprogram inlining, Oracle will re-write our code to replace a call or calls to a subprogram with the executable code of the subprogram itself. We can imagine that our example above would be re-ordered during compilation to resemble the following pseudo-code (or rather, what would be compiled if we wrote the following "manually-inlined" code).

SQL> DECLARE
  2     n PLS_INTEGER;
  3  BEGIN
  4     n := 10;                 −−<−−this might be moved to line 2
  5     DBMS_OUTPUT.PUT_LINE(n); −−<−−this might include the constant 10
  6  END;
  7  ⁄
  

In the "optimised" version of our original program, we can see several further opportunities for optimisation and it is possible that Oracle will remove the variable "n" entirely. Of course, Oracle doesn't re-order or re-write the PL/SQL itself, just the compiled representation of it (either C or "p-code" depending on our compilation method). The PL/SQL we issue remains unchanged in the USER/ALL/DBA_SOURCE views.

investigating subprogram inlining

So far, we have seen a single example of the PRAGMA INLINE syntax and have stated that the PL/SQL compiler will re-write our code during compilation to include the subprogram's logic inline. But how can we investigate this without access to the re-written (and compiled) PL/SQL? Fortunately, there is a new application available in 11g named the PL/SQL Hierarchical Profiler. This new profiler (invoked via a new built-in package, DBMS_HPROF) records and reports the execution statistics for SQL and PL/SQL organised by subprograms (i.e. a hierarchy of function and procedure calls).

For the remainder of this article, we will use the Hierarchical Profiler to investigate when our subprograms are "optimised out" by the new PL/SQL compiler. We will begin by repeating our original example and demonstrating that the F function was inlined. To do this, we must include a couple of calls to DBMS_HPROF (to start and stop the trace collection) as follows.

SQL> DECLARE
  2
  3     n PLS_INTEGER;
  4
  5     FUNCTION f RETURN PLS_INTEGER IS
  6     BEGIN
  7        RETURN 10;
  8     END f;
  9
 10  BEGIN
 11
 12     DBMS_HPROF.START_PROFILING( location => 'LOG_DIR',
 13                                 filename => 'inline.trc' );
 14
 15     PRAGMA INLINE(f, 'YES');
 16     n := f();
 17     DBMS_OUTPUT.PUT_LINE(n);
 18
 19     DBMS_HPROF.STOP_PROFILING;
 20
 21  END;
 22  /
10

PL/SQL procedure successfully completed.

We start the trace by specifying a directory and trace file to write to (LOG_DIR is a directory created for a previous article). We stop profiling when the code we wish to trace is complete. The trace file itself is reasonably legible, but we can analyse it using the DBMS_HPROF.ANALYZE API. This will write the profiler data to a set of tables (to create these run ?/rdbms/admin/dbmshptab.sql in the schema that owns the code to be profiled). We analyse the trace file as follows.

SQL> DECLARE
  2     n NUMBER;
  3  BEGIN
  4     n := DBMS_HPROF.ANALYZE('LOG_DIR','inline.trc');
  5     DBMS_OUTPUT.PUT_LINE('Runid = ' || n);
  6  END;
  7  /
Runid = 56

PL/SQL procedure successfully completed.

The RUNID gives us the key to accessing the profiler data. There are three tables, prefixed DBMSHP_, with a variety of information, some of which helps us to see the effects of subprogram inlining, as follows.

SQL> SELECT function
  2  ,      line#
  3  ,      calls
  4  ,      subtree_elapsed_time  AS sub_ela
  5  ,      function_elapsed_time AS func_ela
  6  FROM   dbmshp_function_info
  7  WHERE  runid = 56;

FUNCTION              LINE#      CALLS    SUB_ELA   FUNC_ELA
---------------- ---------- ---------- ---------- ----------
STOP_PROFILING           53          1          0          0
NEW_LINE                117          1          5          5
PUT                      77          1         47         31
PUT_INIT                 67          1         16         16
PUT_LINE                109          1         64         12

5 rows selected.

We can see a number of subprogram calls (including the DBMS_OUTPUT module calls) but no "F" function. This is because it was optimised out (i.e. inlined) by the compiler. If we run the same example, but without the PRAGMA INLINE directive (or with it set to 'NO'), we see the F function appear in the trace, as follows.

SQL> SELECT function
  2  ,      line#
  3  ,      calls
  4  ,      subtree_elapsed_time  AS sub_ela
  5  ,      function_elapsed_time AS func_ela
  6  FROM   dbmshp_function_info
  7  WHERE  runid = 57;

FUNCTION                  LINE#      CALLS    SUB_ELA   FUNC_ELA
-------------------- ---------- ---------- ---------- ----------
__anonymous_block.F           5          1          4          4
STOP_PROFILING               53          1          0          0
NEW_LINE                    117          1          4          4
PUT                          77          1         30         16
PUT_INIT                     67          1         14         14
PUT_LINE                    109          1         43          9

6 rows selected.

As a tool to help us understand the execution profile of our code in terms of subprograms, the new Hierarchical Profiler is extremely useful. It tells us the time spent inside a module and also the time spent in subprogram calls (i.e. the module we are tracing might not actually be the problem: it might be one of the subprogram calls). It also clearly demonstrates the effects of subprogram inlining and we will use this method for the remaining examples in this article. We will not discuss the Profiler itself in any further detail, however, as the online documentation gives a good overview of the reporting capabilities of the tool (see Further Reading below).

inlining multiple subprograms

When we supply the PRAGMA INLINE instruction for a given subprogram, this does not cascade to the further subprogram calls contained within that module. In the following example, we specify function F3 is to be inlined. This function makes calls to functions F1 and F2, which will not be inlined. First we profile the code, as follows.

SQL> DECLARE
  2
  3     n1 PLS_INTEGER := 0;
  4
  5     FUNCTION f1 RETURN PLS_INTEGER IS
  6     BEGIN
  7        RETURN DBMS_RANDOM.VALUE(1,100);
  8     END f1;
  9
 10     FUNCTION f2 RETURN PLS_INTEGER IS
 11     BEGIN
 12        RETURN DBMS_RANDOM.VALUE(10,50);
 13     END f2;
 14
 15     FUNCTION f3 (p IN PLS_INTEGER) RETURN PLS_INTEGER IS
 16     BEGIN
 17        RETURN p + f1() + f2();
 18     END f3;
 19
 20  BEGIN
 21
 22     DBMS_HPROF.START_PROFILING('LOG_DIR', 'inline.trc');
 23
 24     FOR i IN 1 .. 10000 LOOP
 25        PRAGMA INLINE(f3,'YES');
 26        n1 := n1 + f3(i);
 27     END LOOP;
 28
 29     DBMS_HPROF.STOP_PROFILING;
 30
 31     DBMS_OUTPUT.PUT_LINE('Value of n1 is ' || n1);
 32
 33  END;
 34  /
Value of n1 is 50810766

PL/SQL procedure successfully completed.

The profiler report below shows that F3 was inlined but F1 and F2 were not as they did not have their own PRAGMA INLINE instructions.

FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ---------------------- ---------- ---------- ---------- ---------- __anonymous_block.F1 5 10000 131431 47645 __anonymous_block.F2 10 10000 132004 48282 STOP_PROFILING 53 1 0 0 VALUE 69 20000 68686 68686 VALUE 103 20000 167508 98822 5 rows selected.

There is a mechanism available to automatically inline as many of our subprograms as possible, however, as we will see below.

plsql_optimize_level

Subprogram inlining is not enabled by default. In our earlier examples, we asked Oracle to inline specific modules, but there is a way to enable subprogram inlining globally. Most readers will be aware of the PLSQL_OPTIMIZE_LEVEL parameter (it is discussed in the background article referenced earlier). This parameter controls the depth of the optimisations that the PL/SQL compiler will undertake. Oracle has extended the scope of this parameter in 11g, so it now has the following settings.

  • Level 0: no compiler optimisations (PL/SQL compiled as is);
  • Level 1: high-level optimisations (such as moving constants out of loops);
  • Level 2: default level. Aggressive optimisations (such as rewriting cursor-for-loops as array fetches) and in 11g, also inlining any subprograms that we request with PRAGMA INLINE;
  • Level 3: most aggressive level: New in 11g, this will inline all subprograms where possible (excluding those contained in built-in packages).

Our examples so far have been at the default level of 2, which is why we needed to request inlining for our subprograms using the PRAGMA INLINE syntax. At level 3, however, we do not need this syntax as the compiler will automatically attempt to inline all subprograms where possible. We can see this below, where we will remove the pragma and run an earlier example at optimisation levels 2 and 3. We will begin with level 2 (default).

SQL> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

SQL> DECLARE
  2
  3     v_int PLS_INTEGER;
  4     c_dir CONSTANT VARCHAR2(10) := 'LOG_DIR';
  5     c_trc CONSTANT VARCHAR2(10) := 'inline.trc';
  6
  7     FUNCTION f RETURN PLS_INTEGER IS
  8     BEGIN
  9        RETURN 10;
 10     END f;
 11
 12  BEGIN
 13
 14     DBMS_HPROF.START_PROFILING(c_dir, c_trc);
 15     v_int := f();
 16     DBMS_HPROF.STOP_PROFILING;
 17
 18     DBMS_OUTPUT.PUT_LINE(
 19        'Runid = ' || DBMS_HPROF.ANALYZE(c_dir, c_trc)
 20        );
 21
 22  END;
 23  /
Runid = 61

PL/SQL procedure successfully completed.

Having analyzed the trace file above, the Hierarchical Profiler report shows that (unsurprisingly) the function F is not inlined at level 2 without the PRAGMA INLINE instruction.

SQL> SELECT function
  2  ,      line#
  3  ,      calls
  4  ,      subtree_elapsed_time  AS sub_ela
  5  ,      function_elapsed_time AS func_ela
  6  FROM   dbmshp_function_info
  7  WHERE  runid = 61;

FUNCTION                   LINE#      CALLS    SUB_ELA   FUNC_ELA
--------------------- ---------- ---------- ---------- ----------
__anonymous_block.F            7          1          4          4
STOP_PROFILING                53          1          0          0

2 rows selected.

We will now run the same code under level 3 optimisation, as follows.

SQL> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 3;

Session altered.

SQL> DECLARE
  2
  3     v_int PLS_INTEGER;
  
 <<...snip...>>

 18     DBMS_OUTPUT.PUT_LINE(
 19        'Runid = ' || DBMS_HPROF.ANALYZE(c_dir, c_trc)
 20        );
 21
 22  END;
 23  /
Runid = 62

PL/SQL procedure successfully completed.

SQL> SELECT function
  2  ,      line#
  3  ,      calls
  4  ,      subtree_elapsed_time  AS sub_ela
  5  ,      function_elapsed_time AS func_ela
  6  FROM   dbmshp_function_info
  7  WHERE  runid = 62;

FUNCTION                 LINE#      CALLS    SUB_ELA   FUNC_ELA
------------------- ---------- ---------- ---------- ----------
STOP_PROFILING              53          1          0          0

1 row selected.

At optimisation level 3, our subprogram is inlined automatically. To avoid this behaviour for a particular subprogram (should we wish to), we need to use the PRAGMA INLINE syntax with an instruction of 'NO'. Oracle states that it would be extremely rare for inlining to cause a performance degradation, however, so we might never have the cause to disable it.

performance benefits

So far we have concentrated on the mechanics of subprogram inlining and as stated earlier, the documentation has a wide range of permutations and rules regarding the PRAGMA INLINE syntax and behaviour. We will complete this article with a look at the performance improvements we might expect from this new optimisation.

pl/sql-only subprograms

We will begin our performance investigations by testing a simple loop with multiple calls to a PL/SQL-only subprogram (i.e. there is no SQL in the subprogram). We will measure the performance gains of inlining using a simple timer (note that the Hierarchical Profiler timings could also be used for this purpose).

SQL> DECLARE
  2
  3     n1 PLS_INTEGER := 0;
  4
  5     FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS
  6     BEGIN
  7        RETURN DBMS_RANDOM.VALUE(1,1000);
  8     END f;
  9
 10  BEGIN
 11
 12     timer.snap();
 13     FOR i IN 1 .. 1000000 LOOP
 14        PRAGMA INLINE(f,'NO');
 15        n1 := n1 + f(i);
 16     END LOOP;
 17     timer.show('FUNCTION');
 18
 19     timer.snap();
 20     FOR i IN 1 .. 1000000 LOOP
 21        PRAGMA INLINE(f,'YES');
 22        n1 := n1 + f(i);
 23     END LOOP;
 24     timer.show('INLINE  ');
 25
 26  END;
 27  /
[FUNCTION] 2.82 seconds
[INLINE  ] 2.75 seconds

PL/SQL procedure successfully completed.

We can see that the gains from inlining are marginal (the test was run several times and showed the same results from the second run onwards; the first run had "interference" associated with parsing/optimisation and DBMS_RANDOM initialisation). This is perhaps to be expected. The calls to F will comprise a very small proportion of the work carried out by Oracle. The majority of the time will be spent in DBMS_RANDOM (which cannot be inlined).

For most systems, the previous results will be less than inspirational. However, in scenarios where the call to the subprogram accounts for a larger proportion of the run time, the performance gains from inlining might be more significant. We will slightly modify our previous example to demonstrate this. In the loop (scaled by an order of magnitude to exaggerate the results), we will assign the result of the function call to the numeric variable rather than keep a running total. In addition, we will remove the calls to built-in packages (DBMS_RANDOM in the previous example). The example is as follows.

SQL> DECLARE
  2
  3     n1 PLS_INTEGER;
  4     n2 PLS_INTEGER;
  5
  6     FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS
  7     BEGIN
  8        RETURN p;
  9     END f;
 10
 11  BEGIN
 12
 13     timer.snap();
 14     FOR i IN 1 .. 10000000 LOOP
 15        n1 := i;
 16        PRAGMA INLINE(f,'NO');
 17        n2 := f(n1);
 18     END LOOP;
 19     timer.show('FUNCTION');
 20
 21     timer.snap();
 22     FOR i IN 1 .. 10000000 LOOP
 23        n1 := i;
 24        PRAGMA INLINE(f,'YES');
 25        n2 := f(n1);
 26     END LOOP;
 27     timer.show('INLINE  ');
 28
 29  END;
 30  /
[FUNCTION] 2.40 seconds
[INLINE  ] 0.46 seconds

PL/SQL procedure successfully completed.

This time we have impressive performance gains from inlining. A repeat of the test with the Hierarchical Profiler running and using 1 million iterations per loop instead of 10 million shows the following results.

FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ---------------------- ---------- ---------- ---------- ---------- __anonymous_block.F 6 1000000 1991944 1991944 STOP_PROFILING 53 1 0 0 NEW_LINE 117 2 7 7 PUT 77 2 24 13 PUT_INIT 67 1 11 11 PUT_LINE 109 2 43 12 GET_TIME 289 4 23 23 ELAPSED 38 2 29 17 REFORMAT 44 2 37 37 SHOW 63 2 197 88 SNAP 26 2 48 37 11 rows selected.

As we requested, the second call to function F was inlined and we can see this above (i.e. 1 million calls from 2 million loop interations in our example overall). It might appear as though the second loop was removed altogether, but we can use the PL/SQL Profiler (available since 8i) to confirm that the inlined version retains the loop. The following report is a summary of the profiler data from a re-run of the above example with the PL/SQL Profiler running.

     RUNID UNIT_NAME           LINE# TOTAL_OCCUR       TOTAL_TIME
---------- -------------- ---------- ----------- ----------------
        24 <anonymous>             8     1000000      54193775745
        24 <anonymous>             9     1000000      75635759014
        24 <anonymous>            13           1         11127391
        24 <anonymous>            14     1000001      57348801593
        24 <anonymous>            15     1000000      54923122352
        24 <anonymous>            16     1000000                0
        24             17     1000000     145286542284
        24 <anonymous>            19           1          2864051
        24 <anonymous>            21           1           350323
        24 <anonymous>            22     1000001      54518495227
        24 <anonymous>            23     1000000      49953127003
        24 <anonymous>            24     1000000                0
        24             25     1000000      44516146884
        24 <anonymous>            27           1          3561625
        24 <anonymous>            28           1          1035606
        24 TIMER                  30           2          8896382
        24 TIMER                  31           2           421003
        <<...snip...>>
        24 TIMER                 130           2          1948292

37 rows selected.

We can see in lines 16 and 24 that the PRAGMA INLINE instructions cost nothing in terms of time. We can also see quite clearly that both loops have been retained (lines 14 and 22) and they took approximately the same amount of time. The performance gain inside the second loop is attributable largely to inlining: the 1 million calls to subprogram F (line 17) took 3 times longer than the 1 million inlined direct assignments on line 25.

subprograms containing sql

Finally, we will measure the performance benefits of inlining a subprogram with embedded SQL. From our observations so far, we would expect these gains to be marginal. In the following example, we will make multiple calls to a small lookup function and measure the impact of inlining.

SQL> DECLARE
  2
  3     n1 PLS_INTEGER;
  4     n2 PLS_INTEGER;
  5
  6     FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS
  7        n PLS_INTEGER;
  8     BEGIN
  9        SELECT MOD(p,100) INTO n FROM dual;
 10        RETURN n;
 11     END f;
 12
 13  BEGIN
 14
 15     timer.snap();
 16     FOR i IN 1 .. 100000 LOOP
 17        n1 := i;
 18        PRAGMA INLINE(f,'NO');
 19        n2 := f(n1);
 20     END LOOP;
 21     timer.show('FUNCTION');
 22
 23     timer.snap();
 24     FOR i IN 1 .. 100000 LOOP
 25        n1 := i;
 26        PRAGMA INLINE(f,'YES');
 27        n2 := f(n1);
 28     END LOOP;
 29     timer.show('INLINE  ');
 30
 31  END;
 32  /
[FUNCTION] 3.86 seconds
[INLINE  ] 3.64 seconds

PL/SQL procedure successfully completed.

Not surprisingly, the gains from inlining are tiny. The main consumer of time in this example is the SQL itself, as we can see if we repeat the example using the Hierarchical Profiler. The report is as follows.

FUNCTION LINE# CALLS SUB_ELA FUNC_ELA -------------------------- --------- ---------- ---------- ---------- __anonymous_block.F 6 100000 4375478 714279 STOP_PROFILING 53 1 0 0 NEW_LINE 117 2 6 6 PUT 77 2 26 15 PUT_INIT 67 1 11 11 PUT_LINE 109 2 46 14 GET_TIME 289 4 25 25 ELAPSED 38 2 28 16 REFORMAT 44 2 46 46 SHOW 63 2 206 86 SNAP 26 2 48 35 __static_sql_exec_line27 27 100000 3599126 3599126 __static_sql_exec_line9 9 100000 3661199 3661199 13 rows selected.

The embedded SQL statement in F (__static_sql_exec_line9) accounted for 84% of the elapsed time of the function overall, so it's clearly unlikely that moving this SQL statement around would make a difference. The benefits of inlining function F would have to be carved from the 0.7 seconds spent in the function but not executing the SQL. Some systems require this level of "tuning to the nth degree" but for most of us, we have lower hanging fruit to go after! The results of inlining the second set of calls to subprogram F can be seen at line 27 (__static_sql_exec_line27).