<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Striving for Optimal PerformanceStriving for Optimal Performance &#187; Parallel Processing</title>
	<atom:link href="http://www.antognini.ch/category/oracledatabase/parallel-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.antognini.ch</link>
	<description></description>
	<lastBuildDate>Fri, 18 May 2012 11:35:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Parallel Full Table Scans Do Not Always Perform Direct Reads</title>
		<link>http://www.antognini.ch/2010/09/parallel-full-table-scans-do-not-always-perform-direct-reads/</link>
		<comments>http://www.antognini.ch/2010/09/parallel-full-table-scans-do-not-always-perform-direct-reads/#comments</comments>
		<pubDate>Sun, 12 Sep 2010 10:52:48 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[10gR1]]></category>
		<category><![CDATA[10gR2]]></category>
		<category><![CDATA[11gR1]]></category>
		<category><![CDATA[11gR2]]></category>
		<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Parallel Processing]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=1287</guid>
		<description><![CDATA[Even though in general parallel full table scans performs direct reads, some exceptions exist. The aim of this post is to show such an exception. For test purposes I build in my own schema a copy of the SH.SALES table (the one distributed by Oracle with the demo schemas…). On that table I build an [...]]]></description>
			<content:encoded><![CDATA[<p>Even though in general parallel full table scans performs direct reads, some exceptions exist. The aim of this post is to show such an exception.</p>
<p>For test purposes I build in my own schema a copy of the SH.SALES table (the one distributed by Oracle with the demo schemas…). On that table I build an index on the TIME_ID column and, at the same time, I trace the execution. The statements I use are the following.</p>
<pre>execute dbms_monitor.session_trace_enable
CREATE INDEX sales_time_id ON sales (time_id) PARALLEL 2 ONLINE;
execute dbms_monitor.session_trace_disable</pre>
<p>Since the index is built in parallel (DOP=2), 5 processes are used to run the statement: the query coordinator, 2 slaves for reading the table, and 2 slaves for building the index. Based on the data provided by extended SQL trace let’s have a look to some information about the execution.</p>
<ul>
<li>Execution plan (without runtime statistics and query optimizer estimations): both the build of the index and the full table scan are performed in parallel.</li>
</ul>
<pre>Operation
----------------------------------------
PX COORDINATOR
  PX SEND QC (ORDER) :TQ10001
    INDEX BUILD NON UNIQUE SALES_TIME_ID
      SORT CREATE INDEX
        PX RECEIVE
          PX SEND RANGE :TQ10000
            PX BLOCK ITERATOR
              TABLE ACCESS FULL SALES</pre>
<ul>
<li>Resource usage profile of the query coordinator: no real work is performed, it&#8217;s just coordination work&#8230;</li>
</ul>
<pre>                                               Total            Number of     Duration per
Component                               Duration [s]       %       Events       Events [s]
----------------------------------- ---------------- ------- ------------ ----------------
PX Deq: Execute Reply                          1.928  79.044           44            0.044
recursive statements                           0.427  17.487          n/a              n/a
CPU                                            0.024   0.984          n/a              n/a
os thread startup                              0.017   0.690            1            0.017
log file sync                                  0.014   0.569            2            0.007
PX Deq: Parse Reply                            0.014   0.568            4            0.003
enq: CR - block range reuse ckpt               0.005   0.202            2            0.002
PX Deq: Join ACK                               0.004   0.145            5            0.001
SQL*Net message from client                    0.002   0.098            1            0.002
PX qref latch                                  0.002   0.075            1            0.002
PX Deq: Table Q qref                           0.001   0.058            1            0.001
enq: RO - fast object reuse                    0.001   0.028            1            0.001
PX Deq: Signal ACK EXT                         0.001   0.025            6            0.000
PX Deq: Slave Session Stats                    0.000   0.014            3            0.000
reliable message                               0.000   0.006            2            0.000
latch: call allocation                         0.000   0.004            1            0.000
db file sequential read                        0.000   0.004            6            0.000
rdbms ipc reply                                0.000   0.002            2            0.000
SQL*Net message to client                      0.000   0.000            1            0.000
----------------------------------- ---------------- -------
Total                                          2.439 100.000</pre>
<ul>
<li>Resource usage profile of one of the slaves building the index: direct writes are used to store the index.</li>
</ul>
<pre>                                               Total            Number of     Duration per
Component                               Duration [s]       %       Events       Events [s]
----------------------------------- ---------------- ------- ------------ ----------------
direct path write                              0.982  49.671          376            0.003
CPU                                            0.781  39.508          n/a              n/a
PX Deq: Table Q Normal                         0.177   8.957          130            0.001
PX Deq: Execution Msg                          0.012   0.595            4            0.003
cursor: pin S wait on X                        0.011   0.535            1            0.011
recursive statements                           0.008   0.420          n/a              n/a
log file sync                                  0.005   0.247            3            0.002
Disk file operations I/O                       0.001   0.030            1            0.001
PX Deq: Slave Session Stats                    0.000   0.021            1            0.000
reliable message                               0.000   0.011            1            0.000
rdbms ipc reply                                0.000   0.006            2            0.000
----------------------------------- ---------------- -------
Total                                          1.977 100.000</pre>
<ul>
<li>Resource usage profile of one of the slaves scanning the table: instead of performing direct reads, “regular” buffered reads are performed (notice the <em>db file scattered read</em> event).</li>
</ul>
<pre>                                               Total            Number of     Duration per
Component                               Duration [s]       %       Events       Events [s]
----------------------------------- ---------------- ------- ------------ ----------------
PX Deq: Execution Msg                          1.571  80.295           23            0.068
CPU                                            0.311  15.889          n/a              n/a
PX Deq Credit: need buffer                     0.055   2.794          432            0.000
db file scattered read                         0.016   0.835           55            0.000
PX Deq: Table Q Get Keys                       0.002   0.109            1            0.002
Disk file operations I/O                       0.001   0.032            1            0.001
PX Deq Credit: send blkd                       0.001   0.028            1            0.001
PX Deq: Slave Session Stats                    0.000   0.010            1            0.000
db file sequential read                        0.000   0.005            4            0.000
PX qref latch                                  0.000   0.002            1            0.000
latch: cache buffers chains                    0.000   0.000            1            0.000
----------------------------------- ---------------- -------
Total                                          1.957 100.000</pre>
<p>As pointed out by the last resource usage profile no direct reads are performed. Why? In this case it is because the ONLINE option was specified. By the way, I do not know why there is such a limitation&#8230; Anway, without this option, for one of the slaves reading the table the following resource usage profile is used (notice the <em>direct path read</em> event). I do not show the other resource usage profiles and the execution plan because they do not change.</p>
<pre>                                               Total            Number of     Duration per
Component                               Duration [s]       %       Events       Events [s]
----------------------------------- ---------------- ------- ------------ ----------------
PX Deq: Execution Msg                          1.499  80.358           22            0.068
CPU                                            0.278  14.897          n/a              n/a
PX Deq Credit: need buffer                     0.071   3.790          504            0.000
direct path read                               0.012   0.630           52            0.000
PX Deq Credit: send blkd                       0.003   0.176            2            0.002
PX Deq: Table Q Get Keys                       0.001   0.047            2            0.000
PX Deq: Slave Session Stats                    0.001   0.041            1            0.001
Disk file operations I/O                       0.001   0.033            1            0.001
library cache: mutex X                         0.000   0.025            1            0.000
db file sequential read                        0.000   0.002            2            0.000
asynch descriptor resize                       0.000   0.001           18            0.000
----------------------------------- ---------------- -------
Total                                          1.866 100.000</pre>
<p>In summary, do not expect to always see direct reads when a parallel full table scan is performed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2010/09/parallel-full-table-scans-do-not-always-perform-direct-reads/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Parallel Processing With Standard Edition</title>
		<link>http://www.antognini.ch/2010/09/parallel-processing-with-standard-edition/</link>
		<comments>http://www.antognini.ch/2010/09/parallel-processing-with-standard-edition/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 08:55:59 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[11gR2]]></category>
		<category><![CDATA[Parallel Processing]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=1258</guid>
		<description><![CDATA[As clearly stated in the Licensing Information guide, all features related to parallel processing (parallel backup and recovery, parallel query/DML, parallel statistics gathering, parallel index build/scans, parallel Data Pump export/import, in-memory parallel execution, parallel statement queuing and parallel spatial index builds) are only available with the Enterprise Edition. However, as of Oracle Database 11g Release [...]]]></description>
			<content:encoded><![CDATA[<p>As clearly stated in the <a href="http://download.oracle.com/docs/cd/E11882_01/license.112/e10594/editions.htm#CJACGHEB">Licensing Information</a> guide, all features related to parallel processing (parallel backup and recovery, parallel query/DML, parallel statistics gathering, parallel index build/scans, parallel Data Pump export/import, in-memory parallel execution, parallel statement queuing and parallel spatial index builds) are only available with the Enterprise Edition. However, as of Oracle Database 11g Release 2, there is a feature that provides parallel processing capabilities in the Standard Edition as well. This feature is available through the DBMS_PARALLEL_EXECUTE package.</p>
<p>For example, let’s say that you have to execute an UPDATE statement on all rows of a table containing a huge amount of data. If you can take the table “offline”, provided you have enough space doing a CTAS statement instead of an UPDATE statement is probably much faster. However, if you cannot put the table &#8220;offline&#8221;, doing it in parallel might be a sensible way to speed-up the execution. Hence, if you are using the Enterprise Edition you can take advantage of the parallel processing features integrated in the SQL engine. Thus, you can execute something like that:</p>
<pre>ALTER SESSION ENABLE PARALLEL DML;
UPDATE /*+ parallel(t,4) */ t SET col = expr;</pre>
<p>Since such a possibility is not available with the Standard Edition, as of Oracle Database 11g Release 2 you might execute a PL/SQL block like the following one to perform the same operation in parallel:</p>
<pre>SET SERVEROUTPUT ON
DECLARE
  l_task_name user_parallel_execute_tasks.task_name%TYPE;
  l_sql_stmt user_parallel_execute_tasks.sql_stmt%TYPE;
BEGIN
  l_task_name := 'px_update';
  l_sql_stmt := 'UPDATE t SET col = expr WHERE rowid BETWEEN :start_id AND :end_id';

  dbms_parallel_execute.create_task(task_name => l_task_name);

  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => l_task_name,
    table_owner => user,
    table_name  => 'T',
    by_row      => FALSE,
    chunk_size  => 128
  );

  dbms_parallel_execute.run_task(
    task_name      => l_task_name,
    sql_stmt       => l_sql_stmt,
    language_flag  => dbms_sql.native,
    parallel_level => 4
  );

  WHILE (dbms_parallel_execute.task_status(task_name => l_task_name)
           NOT IN (
             dbms_parallel_execute.chunking_failed,
             dbms_parallel_execute.finished,
             dbms_parallel_execute.finished_with_error,
             dbms_parallel_execute.crashed
           ))
  LOOP
    dbms_lock.sleep(1);
  END LOOP;

  CASE dbms_parallel_execute.task_status(task_name => l_task_name)
    WHEN dbms_parallel_execute.chunking_failed THEN dbms_output.put_line('chunking_failed');
    WHEN dbms_parallel_execute.finished THEN dbms_output.put_line('finished');
    WHEN dbms_parallel_execute.finished_with_error THEN dbms_output.put_line('finished_with_error');
    WHEN dbms_parallel_execute.crashed THEN dbms_output.put_line('crashed');
  END CASE;

  dbms_parallel_execute.drop_task(task_name => l_task_name);
END;
/</pre>
<p>Note that using the DBMS_PARALLEL_EXECUTE package is not limited to the Standard Edition, though. I see at least two situations where it can be handy with the Enterprise Edition:</p>
<ul>
<li>You do not want to process the whole DML statement in a single transaction.</li>
<li>You want to process in parallel a PL/SQL block, not a DML statement.</li>
</ul>
<p>Both situations are relevant if, as of Oracle Database 11g Release 2, you plan to perform such an operation during an online application upgrade by taking advantage of edition-based redefinition. I guess that the package was implemented for that purpose&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2010/09/parallel-processing-with-standard-edition/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Exadata Storage Server and the Query Optimizer &#8211; Part 3</title>
		<link>http://www.antognini.ch/2010/05/exadata-storage-server-and-the-query-optimizer-part-3/</link>
		<comments>http://www.antognini.ch/2010/05/exadata-storage-server-and-the-query-optimizer-part-3/#comments</comments>
		<pubDate>Wed, 05 May 2010 14:55:26 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[11gR1]]></category>
		<category><![CDATA[11gR2]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Parallel Processing]]></category>
		<category><![CDATA[Query Optimizer]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=1018</guid>
		<description><![CDATA[In the first and second post of this series I shared with you some basics about smart scan and gave some details about projection and restriction. The aim of this post is to cover the third basic technique: join filtering. Join filtering is not something specific to the Exadata Storage Server. In fact, it is [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="/2010/04/exadata-storage-server-and-the-query-optimizer-part-1/">first</a> and <a href="/2010/05/exadata-storage-server-and-the-query-optimizer-part-2/">second</a> post of this series I shared with you some basics about smart scan and gave some details about projection and restriction. The aim of this post is to cover the third basic technique: join filtering.</p>
<p>Join filtering is not something specific to the Exadata Storage Server. In fact, it is an Enterprise Edition feature available since Oracle Database 10g Release 2. Simply put, it is used to reduce data communication between slave processes in parallel joins. For more information about it I suggest you to read a paper I published in June 2008 entitled <a href="/publications/">Bloom Filters</a>. In it I describe not only what bloom filters are, but also how Oracle Database uses them. And, one of the use cases is join filtering.</p>
<p>What I want to show here is how Exadata Storage Server is able to take advantage of join filtering. For that purpose let’s have a look to the following execution plan:</p>
<pre>-----------------------------------------------------
| Id  | Operation                        | Name     |
-----------------------------------------------------
|   0 | SELECT STATEMENT                 |          |
|   1 |  PX COORDINATOR                  |          |
|   2 |   PX SEND QC (RANDOM)            | :TQ10002 |
|*  3 |    HASH JOIN BUFFERED            |          |
|   4 |     JOIN FILTER CREATE           | :BF0000  |
|   5 |      PX RECEIVE                  |          |
|   6 |       PX SEND HASH               | :TQ10000 |
|   7 |        PX BLOCK ITERATOR         |          |
|*  8 |         TABLE ACCESS STORAGE FULL| T1       |
|   9 |     PX RECEIVE                   |          |
|  10 |      PX SEND HASH                | :TQ10001 |
|  11 |       JOIN FILTER USE            | :BF0000  |
|  12 |        PX BLOCK ITERATOR         |          |
|* 13 |         TABLE ACCESS STORAGE FULL| T2       |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")
   8 - storage("T1"."MOD"=42)
       filter("T1"."MOD"=42)
  13 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))</pre>
<p>As you can see, join filtering is used. In fact, the operation 4 (JOIN FILTER CREATE) builds a bloom filter that, later on, is used by operation 11 (JOIN FILTER USE) to filter out the data that does not fulfill the join condition. However, the most important thing to notice in this execution plan is the STORAGE predicate applied by the operation 13. According to it the bloom filter is applied not only by the operation 11, but also by the operation 13. And, since the operation 13 is a smart scan operation, the STORAGE predicate is evaluated by the cells. This means that the reduction of data communication does not only take place between slave processes, but also between the cells and the database instances. Remarkable!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2010/05/exadata-storage-server-and-the-query-optimizer-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Does the Query Optimizer Cost PX Distribution Methods?</title>
		<link>http://www.antognini.ch/2010/01/does-the-query-optimizer-cost-px-distribution-methods/</link>
		<comments>http://www.antognini.ch/2010/01/does-the-query-optimizer-cost-px-distribution-methods/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 11:55:15 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[10gR1]]></category>
		<category><![CDATA[10gR2]]></category>
		<category><![CDATA[11gR1]]></category>
		<category><![CDATA[11gR2]]></category>
		<category><![CDATA[9iR2]]></category>
		<category><![CDATA[Parallel Processing]]></category>
		<category><![CDATA[Query Optimizer]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=45</guid>
		<description><![CDATA[The short answer to this question is &#8220;yes&#8221;, it does. Unfortunately, the distribution costs are not externalized through the execution plans and, as a result, this limitation (yes, it is really a limitation in the current implementation, not a bug) confuses everyone that carefully look at the information provided in an execution plan of a [...]]]></description>
			<content:encoded><![CDATA[<p>The short answer to this question is &#8220;yes&#8221;, it does. Unfortunately, the distribution costs are not externalized through the execution plans and, as a result, this limitation (yes, it is really a limitation in the current implementation, not a bug) confuses everyone that carefully look at the information provided in an execution plan of a SQL statement executed in parallel. Hence, let’s remove some confusion&#8230;</p>
<p>To illustrate what the problem is, let’s have a look to a simple query that joins two tables:</p>
<pre>SELECT * FROM master m JOIN detail d ON (m.id = d.id)</pre>
<p>Now, let’s have a look at two parallel executions. If the two tables are equipartitioned, the following execution plan (which takes advantage of partition-wise join) is probably the most effective for such a query. Note that thanks to the partition-wise join not only there is a single set of parallel slaves (Q1,00), but, in addition, the parallel slaves do not communicate with each other (they only communicate with the query coordinator). As a result, the communication costs are equal to zero (this is because the query optimizer does not compute the costs of the communication towards the query coordinator).</p>
<pre>----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    16G| 162524  (1)|  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL|          |    16G| 162524  (1)|  Q1,00 | PCWC |            |
|   4 |     HASH JOIN           |          |    16G| 162524  (1)|  Q1,00 | PCWP |            |
|   5 |      TABLE ACCESS FULL  | MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   6 |      TABLE ACCESS FULL  | DETAIL   |    15G| 161052  (1)|  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------</pre>
<p>If the two tables are not equipartitioned, the following execution plan might be chosen by the query optimizer. Since it does not take advantage of a partition-wise join, several set of parallel slaves are used. The first one (Q1,00) scans the MASTER table, the second one (Q1,01) scans the DETAIL table, and both of them send the data to the third one (Q1,02) that performs the join of the two tables and sends the data to the query coordinator. Since all data (about 15GB; yes, the estimations are good) is sent through the PX channels, the cost should not be zero. However, as you can see, the cost is exactly the same as the one of the previous execution plan.</p>
<pre>----------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    16G| 162524  (1)|        |      |            |
|   1 |  PX COORDINATOR         |          |       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 |    16G| 162524  (1)|  Q1,02 | P->S | QC (RAND)  |
|   3 |    HASH JOIN BUFFERED   |          |    16G| 162524  (1)|  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          |   125M|   1422  (1)|  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |   125M|   1422  (1)|  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |   125M|   1422  (1)|  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| MASTER   |   125M|   1422  (1)|  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |    15G| 161052  (1)|  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |    15G| 161052  (1)|  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |    15G| 161052  (1)|  Q1,01 | PCWC |            |
|  11 |        TABLE ACCESS FULL| DETAIL   |    15G| 161052  (1)|  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------</pre>
<p>For completeness, let&#8217;s compare the cost of several distribution methods (&#8220;none-none&#8221; is the one of the first execution plan above, &#8220;hash-hash&#8221; of the second one). As you can see the cost is always the same!</p>
<pre>SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-none' FOR SELECT /*+ pq_distribute(d none none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'hash-hash' FOR SELECT /*+ pq_distribute(d hash hash) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'broadcast-none' FOR SELECT /*+ pq_distribute(d broadcast none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-broadcast' FOR SELECT /*+ pq_distribute(d none broadcast) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'partition-none' FOR SELECT /*+ pq_distribute(d partition none) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> EXPLAIN PLAN SET STATEMENT_ID 'none-partition' FOR SELECT /*+ pq_distribute(d none partition) */ * FROM master m JOIN detail d ON (m.id = d.id);
SQL> SELECT statement_id, cost FROM plan_table WHERE id = 0;

STATEMENT_ID                         COST
------------------------------ ----------
none-none                          162524
hash-hash                          162524
broadcast-none                     162524
none-broadcast                     162524
partition-none                     162524
none-partition                     162524
</pre>
<p>As I wrote before, the problem is not that the costs are not computed. The problem is that they are not externalized. In fact, by giving a look to a trace file generated through the event 10053 the costs are available. Here’s the relevant part (the lines starting with “<code>---- cost</code>” contain the most important information). As you can see there are two costs associated with every distribution method: one with the distribution costs (w/ dist) and one without them (w/o dist).</p>
<pre>Enumerating distribution method for join between M[MASTER] and D[DETAIL]
-- Using join method #Hash Join:
---- cost NONE = 0.00
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 513  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162524.05 (w/ dist)
---- cost VALUE = 278.52
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 2  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162802.57 (w/ dist)
---- cost PARTITION-RIGHT = 271.40
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 576  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162795.46 (w/ dist)
---- cost PARTITION-LEFT = 7.12
  Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 544  #groups: 1
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162531.17 (w/ dist)
---- cost BROADCAST-RIGHT = 920.78
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 8  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162755.25 (w/ dist)
---- cost BROADCAST-LEFT = 7.22
---- cost with slave mapping  =   Outer table:  MASTER  Alias: M
    resc: 5120.11  card 4118000.00  bytes: 32  deg: 4  resp: 1422.25
  Inner table:  DETAIL  Alias: D
    resc: 579787.63  card: 31954000.00  bytes: 526  deg: 4  resp: 161052.12
    using dmeth: 16  #groups: 4
    Cost per ptn: 49.68  #ptns: 4
    hash_area: 16384 (max=16384) buildfrag: 5530  probefrag: 524636  ppasses: 1
      buildfrag: 5530  probefrag: 524636  passes: 1
  Hash join: Resc: 585106.45  Resp: 162524.05  [multiMatchCost=0.00]
---- cost(Hash Join) = 162524.05 (w/o dist), 162526.86 (w/ dist)</pre>
<p>Since the cost are (correctly) computed, the query optimizer is able to choose the optimal plan. However, it would be nice to have the actual costs in the execution plans.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2010/01/does-the-query-optimizer-cost-px-distribution-methods/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>System Managed Extent Size &#8211; 11g Improvements</title>
		<link>http://www.antognini.ch/2009/08/system-managed-extent-size-11g-improvements/</link>
		<comments>http://www.antognini.ch/2009/08/system-managed-extent-size-11g-improvements/#comments</comments>
		<pubDate>Wed, 12 Aug 2009 13:20:48 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[11gR1]]></category>
		<category><![CDATA[Parallel Processing]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=537</guid>
		<description><![CDATA[Oracle Database provides two extent management options: locally managed and dictionary managed. Today, when creating a new database, most DBAs use the former for all tablespaces. This is a good thing. What is less obvious is the choice between uniform extent size and system managed extent size. Both of these options have pros and cons. [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle Database provides two extent management options: l<em>ocally managed</em> and <em>dictionary managed</em>. Today, when creating a new database, most DBAs use the former for all tablespaces. This is a good thing. What is less obvious is the choice between <em>uniform extent size</em> and <em>system managed extent size</em>. Both of these options have pros and cons. The tendency is to use uniform extent size only when the size of the segments to be stored in a tablespace is (well) known. Otherwise, it is better to delegate the choice of an “optimal” extent size to the database engine. In fact, the aim of system managed extent size is to adjust the size of the extents to the size of the segments to which they are associated. In other words, a large segment should have large extents, a small segment should have small extents. In this way it should be possible to avoid both fragmentation and very high number of extents. To do so, the size of the extents is normalized to a limited number of sizes: 64KB, 1MB, 8MB or 64MB.</p>
<p>Now, let’s discuss two enhancements that have been (silently?) introduced in 11.1.0.7… Yes, in 11.1.0.6 they are not available!</p>
<p>But, before that, let’s create a tablespace and a view that will be used for the tests I&#8217;ll show you&#8230;</p>
<pre>SQL> CREATE TABLESPACE test
  2  DATAFILE '/u01/oradata/DBM11106/test01.dbf' SIZE 1G
  3  BLOCKSIZE 8K
  4  EXTENT MANAGEMENT LOCAL AUTOALLOCATE
  5  SEGMENT SPACE MANAGEMENT AUTO;

SQL> CREATE OR REPLACE VIEW v AS
  2  SELECT bytes, extent_id-lag(extent_id,1,-1) OVER (ORDER BY extent_id) AS extents
  3  FROM (
  4    SELECT bytes, extent_id, decode(bytes,lead(bytes) OVER (ORDER BY extent_id),0,1) AS last
  5    FROM user_extents
  6    WHERE segment_name = 'T'
  7  )
  8  WHERE last = 1
  9  ORDER BY extent_id;</pre>
<p><strong>Improvement #1</strong></p>
<p>Up to 11.1.0.6, INITIAL (the attribute that can be specified in the storage clause) is only partially considered when creating a new segment. In fact, as the following example shows, even if a segment of 1MB is created, sixteen extents of 64KB are associated to it.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (INITIAL 1M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
     65536         16</pre>
<p>One might think that INITIAL is not considered at all. However, it is easy to show that it’s actually used. To do so it&#8217;s sufficient to create a segment of 2MB. As the following example shows, two extents of 1MB are associated to it. In other words, even if the value of INITIAL is considered, the size of the first extent is not directly derived from it.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (INITIAL 2M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
   1048576          2</pre>
<p>As of 11.1.0.7, however, the size of the first extent is directly derived from INITIAL. The only and obvious limitation is that the extent size is normalized to either 64KB, 1MB, 8MB or 64MB. To do so the database engine uses the extent size which is equal or smaller to INITIAL. Therefore, when the same SQL statements as before are executed, for a segment of 1MB only one extent of 1MB is associated to it.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (INITIAL 1M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
   1048576          1</pre>
<p>A curios side effect of this new behavior is that it’s much common to see segments for which the first extent is larger than the second one. Such situations are not new… but, they are much more common than before. The following example illustrates this. Notice that, for a 10MB segment, the first extent is 8MB and the subsequent two are only 1MB each.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (INITIAL 10M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
   8388608          1
   1048576          2</pre>
<p>In summary, as of 11.1.0.7, if you set INITIAL to either 64KB, 1MB, 8MB or 64MB, the first extent will have the specified size. So, if you know what the size of an segment will be, it is not bad to provide a sensible value through INITIAL.</p>
<p><strong>Improvement #2</strong></p>
<p>As you might guess, the second improvement is related to another attribute that can be specified in the storage clause: NEXT. The improvement, however, is a not what you might expect in first place. In fact, NEXT is not simply used to specify the size of any extent allocated after the first one. According to my observations NEXT is only used when a parallel load is performed.</p>
<p>Up to 11.1.0.6, when a parallel load is performed, every parallel slave allocates extents from a temporary segment and loads the data into it. In other words, the parallel slaves do not modify the target table. Then, when the transaction is committed, the temporary segment is merged to the target table. The following example illustrates this behavior. Notice three things. First, NEXT is set to 1MB. Second, the insert loads about 10MB of data. Third, I forced the utilization of two parallel slaves. Since each parallel slave allocated its own extents, every parallel slave allocates sixteen 64KB extents, four 1MB extents and one 832KB extent (for a total of about 6MB per parallel slave). Simply put, the extent allocation for the temporary segment follows the rules we are used to.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (NEXT 1M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
     65536          1

SQL> ALTER SESSION FORCE PARALLEL DML PARALLEL 2;

SQL> INSERT INTO t
  2  SELECT rownum, rpad('*',1000,'*')
  3  FROM dual
  4  CONNECT BY level <= 10000;

SQL> SELECT bytes, extent_id-lag(extent_id,1,-1) OVER (ORDER BY extent_id) AS extents
  2  FROM (
  3    SELECT bytes, extent_id, decode(bytes,lead(bytes) OVER (ORDER BY extent_id),0,1) AS last
  4    FROM user_extents
  5    WHERE segment_type = 'TEMPORARY'
  6  )
  7  WHERE last = 1
  8  ORDER BY extent_id;

     BYTES    EXTENTS
---------- ----------
     65536         16
   1048576          4
    851968          1
     65536         16
   1048576          4
    851968          1

SQL> COMMIT;

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
     65536         17
   1048576          4
    851968          1
     65536         16
   1048576          4
    851968          1</pre>
<p>As of 11.1.0.7, NEXT is used for sizing the extents of the temporary segment. As a result, if you know that are loading a lot of data, you can avoid the small extents (that might lead to suboptimal performance when the data will be accessed…). Let’s re-execute the same SQL statement as before to see what the difference is. Notice that the only extent of 64KB is the original one that was initially associated to the table.</p>
<pre>SQL> CREATE TABLE t (id NUMBER, pad VARCHAR2(1000))
  2  TABLESPACE test
  3  STORAGE (NEXT 1M);

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
     65536          1

SQL> ALTER SESSION FORCE PARALLEL DML PARALLEL 2;

SQL> INSERT INTO t
  2  SELECT rownum, rpad('*',1000,'*')
  3  FROM dual
  4  CONNECT BY level <= 10000;

SQL> SELECT bytes, extent_id-lag(extent_id,1,-1) OVER (ORDER BY extent_id) AS extents
  2  FROM (
  3    SELECT bytes, extent_id, decode(bytes,lead(bytes) OVER (ORDER BY extent_id),0,1) AS last
  4    FROM user_extents
  5    WHERE segment_type = 'TEMPORARY'
  6  )
  7  WHERE last = 1
  8  ORDER BY extent_id;

     BYTES    EXTENTS
---------- ----------
   1048576          5
    786432          1
   1048576          5
    786432          1

SQL> COMMIT;

SQL> SELECT * FROM v;

     BYTES    EXTENTS
---------- ----------
     65536          1
   1048576          5
    786432          1
   1048576          5
    786432          1</pre>
<p>In summary, as of 11.1.0.7, with NEXT you can control the sizing of the temporary extents used during parallel loads.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2009/08/system-managed-extent-size-11g-improvements/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Bloom Filters</title>
		<link>http://www.antognini.ch/2008/09/bloom-filters/</link>
		<comments>http://www.antognini.ch/2008/09/bloom-filters/#comments</comments>
		<pubDate>Fri, 26 Sep 2008 16:24:13 +0000</pubDate>
		<dc:creator>Christian Antognini</dc:creator>
				<category><![CDATA[10gR2]]></category>
		<category><![CDATA[11gR1]]></category>
		<category><![CDATA[Parallel Processing]]></category>
		<category><![CDATA[Partitioning]]></category>
		<category><![CDATA[Query Optimizer]]></category>

		<guid isPermaLink="false">http://antognini.ch/?p=50</guid>
		<description><![CDATA[Oracle Database, as of 10g Release 2, uses bloom filters in various situations. Unfortunately, no information about their usage is available in Oracle documentation. For this reason, I decided to write a paper to explain not only what bloom filters are, but also, and foremost, to describe how Oracle Database makes use of them. Specifically, [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle Database, as of 10g Release 2, uses bloom filters in various situations. Unfortunately, no information about their usage is available in Oracle documentation. For this reason, I decided to write a paper to explain not only what bloom filters are, but also, and foremost, to describe how Oracle Database makes use of them. Specifically, to explain how the database engine uses bloom filters to reduce data communication between slave processes in parallel joins and to implement join-filter pruning.</p>
<p>Originally, I wrote this paper for the <a href="http://www.ioug.org/">IOUG</a> Select Journal. Even if I wrote it last June, I wanted to receive the printed copies before putting it online. Today a packet with five copies of the Q4/2008 issue and a polo shirt arrived&#8230; Hence, it is now available online as well.</p>
<p>If you don&#8217;t have access to Select Journal, you can download it from <a href="/publications/">this page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.antognini.ch/2008/09/bloom-filters/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

