Sql Performance Tuning
In this white paper, we will review some of the more common SQL problems; however,
there are many more SQL performance tips beyond what’s described in this paper.
Also just like all guidelines, each of these have some notable
exceptions.
1.
Verify that the appropriate statistics are provided
The most important resource to the DB2 optimizer, other than the SELECT statement itself, is the statistics found within the DB2 catalog. The optimizer uses these statistics to base many of its decisions. The main reason the DB2 optimizer may choose a non-optimal access path for a query is due to either invalid or missing the statistics. The DB2 optimizer uses the following catalog statistics:
The most important resource to the DB2 optimizer, other than the SELECT statement itself, is the statistics found within the DB2 catalog. The optimizer uses these statistics to base many of its decisions. The main reason the DB2 optimizer may choose a non-optimal access path for a query is due to either invalid or missing the statistics. The DB2 optimizer uses the following catalog statistics:
DB2 Catalog Table
|
Columns considered by optimizer
|
SYSIBM.SYSCOLDIST
|
|
CARDF
|
|
COLGROUPCOLNO
|
|
COLVALUE
|
|
FREQUENCYF
|
|
NUMCOLUMNS
|
|
TYPE
|
|
SYSIBM.SYSCOLSTATS
|
|
COLCARD
|
|
HIGHKEY
|
|
HIGH2KEY
|
|
LOWKEY
|
|
LOW2KEY
|
|
SYSIBM.SYSCOLUMNS
|
|
COLCARDF
|
|
HIGH2KEY
|
|
LOW2KEY
|
|
SYSIBM.SYSINDEXES
|
|
CLUSTERING
|
|
FIRSTKEYCARDF
|
|
NLEAF
|
|
NLEVELS
|
|
CLUSTERATIO
|
|
CLUSTERRATIOF
|
|
SYSIBM.SYSINDEXPART
|
|
LIMITKEY
|
|
SYSTEM.SYSTABLES
|
|
CARDF
|
|
EDPROC
|
|
NPAGESF
|
|
PCTPAGES
|
|
PCTROWCOMP
|
|
SYSIBM.SYSTABLESPACE
|
|
NACTIVEF
|
|
SYSIBM.SYSTABSTATS
|
|
NPAGES
|
Figure 2: Columns
recognized by
the DB2 optimizer and used to determine
the access path
Often, executing
the RUNSTATS command (which
is used to update the DB2 catalog
statistics) gets overlooked, particularly in a busy production environment. To minimize the impact of executing the RUNSTATS
command, consider using the sampling technique. Sampling with even as little
as 10% is ample. In addition to the statistics updated
by the RUNSTATS command, DB2 gives you the ability to update an additional 1,000 entries
for non-uniform distribution statistics. Beware
that each entry added
increases BIND time for all queries
referencing that column.
How do you know if you are missing
the statistics? You can manually
execute queries against
the catalog or use tools that provide this functionality. Currently, the DB2 optimizer does not externalize warnings for missing
the statistics.
2. Promote Stage
2 & Stage 1
Predicates if Possible
Either the Stage 1 Data Manager or the Stage 2 Relational Data Server will process every query. There are tremendous performance benefits to be gained when your query can be processed as Stage 1 rather than Stage 2. The predicates used to qualify your query will determine whether your query can be processed in Stage 1. In addition, each predicate is
Either the Stage 1 Data Manager or the Stage 2 Relational Data Server will process every query. There are tremendous performance benefits to be gained when your query can be processed as Stage 1 rather than Stage 2. The predicates used to qualify your query will determine whether your query can be processed in Stage 1. In addition, each predicate is
evaluated to determine whether that predicate is eligible for index access. There are some predicates
that
can never be processed as Stage
1 or never eligible
for an index. It’s important
to understand if your query is indexable and can be processed as Stage 1. The following are the
documented Stage 1 or Sargable
predicates:
Figure 3: Table
used to determine predicate
eligibility
There are a few more predicates that are not documented as Stage 1, because
they are not always Stage 1.
Join table sequence
and query rewrite
can also affect which
stage a predicate can be filtered.
Let’s examine some example queries
to see the effect of rewriting your SQL.
Example 1: Value
BETWEEN COL1 AND COL1
Any predicate type that is not identified as Stage 1 is Stage 2. This predicate as written is a
Stage 2 predicate.
However, a rewrite can promote this query
to Indexable Stage 1.
Value >=
COL1 AND value <=
COL2
This means that the optimizer may choose to use the predicates in a matching index access against multiple indexes. Without
the rewrite,
the predicate remains
as Stage 2.
6
Example 2: COL3 NOT IN (K,S,T)
Non-indexable Stage 1 predicates should also be rewritten,
if possible. For example, the above condition
is Stage 1, but not indexable. The list of values in parentheses identifies what COL3 cannot be equal to.
To determine the feasibility of the rewrite, identify
the list of what COL3 can be equal to. The longer
and more volatile the list, the less feasible this is. If the opposite of
(K, S, T) is less than 200 fairly static values,
the rewrite is worth
the extra typing. This
promotes the Stage 1 condition to Indexable Stage
1, which provides the optimizer with another
matching index choice. Even if a supporting
index is not available at BIND time, the rewrite will ensure
the query will be eligible for index access,
should an index be created
in the future. Once an index is created that incorporates COL3, a rebind of the transaction may possibly gain matching index access, where the old predicate would have no impact on rebind.
3.
SELECT only the columns needed
Every column that is selected has to be individually handed back to the calling
program, unless there is a precise match to the entire DCLGEN definition. This may lean you towards requesting all columns,
however, the real harm occurs when a sort is required. Every SELECTed column, with the sorting
columns repeated, makes up the width
of the sort work file wider.
The wider and longer the file, the slower the sort is. For example, 100,000 four-byte rows can be sorted
in approximately one second. Only
10,000 fifty-byte rows can be sorted
in the same time. Actual times will vary depending on hardware.
The exception to the rule,
“Disallow SELECT *”, would
be when several processes require
different parts of a table’s row. By combining the transactions, the whole row is retrieved once, and then the parts are
uniquely processed.
4.
SELECT only the rows needed
The less rows retrieved, the faster the query will run.
Each qualifying row has to make it through the
long journey from storage, through the buffer pool, Stage 1, Stage 2, possible sort and translations, and then deliver the result set to the calling
program. The database manager
should do all data filtering; it is very wasteful
to retrieve a row, test that row in the program code and then filter out that row. Disallowing program filtering is a hard rule to enforce.
Developers can choose
to use program code to perform all or some data manipulation
or they
can choose SQL. Typically there is a mix. The tell tale sign that filtering
can be pushed into the
DB2 engine is a program
code resembling:
IF TABLE-COL4 > :VALUE
GET NEXT RESULT ROW
5. Use constants and literals
if the values will not change
in the next 3 years (for static queries)
The DB2 Optimizer has the full use of all the non-uniform distribution statistics, and the various
domain range values for any column statistics provided when no host variables are
detected in a predicate, (WHERE COL5 > ‘X’). The purpose
of a host variable is to make a transaction adaptable to a changing
variable; this is most often used when a user is required
to enter this value. A host
variable eliminates
the need to rebind a program each time this variable changes. This extensibility comes at a cost of the optimizer accuracy. As soon as host
variables are detected, (WHERE
COL5 > :hv5),
the optimizer uses the following chart to
estimate the filter factors,
instead of using the catalog
statistics:
COLCARDF
|
FACTOR FOR
<, <=,
>, >=
|
FACTOR FOR
LIKE AND BETWEEN
|
>=
100,000,000
|
1/10,000
|
3/100,000
|
>=
10,000,000
|
1/3,000
|
1/10,000
|
>= 1,000,000
|
1/1,000
|
3/10,000
|
>= 100,000
|
1/300
|
1/1,000
|
>= 10,000
|
1/100
|
3/1,000
|
>= 1,000
|
1/30
|
1/100
|
>= 100
|
1/10
|
3/100
|
>= 0
|
1/3
|
1/10
|
Figure 4: Filter
Factors
The higher
the cardinality of the column, the lower
the predicated filter
factor (fraction of rows predicted
to remain). Most of the time the estimate leans the optimizer towards an appropriate access path. Sometimes, however, the predicated filter factor
is far from reality. This is when access path tuning is usually necessary.
6.
Make numeric and date data types match
Stage 1 processing has been very strict in prior releases
about processing
predicate compares where
the datatype lengths
vary. Prior to DB2 v7, this mismatch
led to the predicate
being demoted to stage 2
processing. However, a new feature in DB2 v7 allows
numeric datatypes to be manually
cast to avoid this stage 2 demotion.
ON DECIMAL(A.INTCOL,
7, 0) = B.DECICOL ON A.INTCOL = INTEGER(B.DECICOL)
If both columns are indexed, cast the column belonging to the larger
result set. If only one column is indexed, cast the partner.
A rebind is necessary to receive the promotion to Stage 1.
Tip #7: Sequence filtering from most restrictive to least restrictive by table, by predicate
type
When writing a SQL statement
with multiple predicates, determine the predicate that will filter
out the most data from the result
set and place that predicate at the start of the list. By
sequencing your predicates in this manner, the subsequent predicates will have less data to
filter.
The DB2 optimizer by default will categorize your predicate and process that predicate in the condition order listed below. However,
if your query presents
multiple predicates that fall into the same category,
these predicates will be executed
in the order that they are written.
This is why it is important to
sequence your predicates, placing the predicate with the most filtering
at the top of the sequence.
Eventually query rewrite
will take care of this in future
releases, but today this is something
to be aware of when writing your queries.
Category
|
Condition
|
||
Stage 1 and Indexable
|
|||
=, IN
(Single Value)
|
|||
Range conditions
|
|||
LIKE
|
|||
Noncorrelated
subqueries
|
|||
Stage 1 and On Index (index
|
|||
screening)
|
|||
=, IN
(Single Value)
|
|||
Range conditions
|
|||
LIKE
|
|||
Stage 1 on
data page rows that are
|
|||
ineligible
for prior categories
|
|||
=, IN (Single Vlaue)
|
|||
Range conditions
|
|||
LIKE
|
|||
Stage 2 on
either index or data that
|
|||
are ineligible
for prior categories
|
|||
=, IN
(Single Value)
|
|||
Range conditions
|
|||
LIKE
|
|||
Noncorrelated
subqueries
|
|||
Correlated
subqueries
|
Figure 5: Predicate Filtering Sequence
The order
of predicate filtering is mainly dependent
on the join sequence,
join method, and index
selection. The order the predicates physically appear
in the statement only come into play when there is a tie with one of the above listed categories. For example, the following statement has a tie in
the category range conditions:
WHERE A.COL2
= ‘abracadabra’ AND A.COL4
> 999
AND A.COL3
> :hvcol3
AND A.COL5
LIKE ‘%SON’
The most restrictive
condition should be listed
first, so that extra processing of the second
condition can be eliminated.
8. Prune SELECT lists
Every column
that is SELECTed
consumes resources for processing. There
are several areas that can
be examined to determine if column selection is really necessary.
Example 1:
WHERE (COL8 = ‘X’)
If a SELECT contains
a predicate where a column is equal to one value,
that column should not have to be retrieved for each row, the value will always be ‘X’.
Example 2: SELECT COLA,COLB
,COLC
ORDERY
BY COLC
DB2 no longer requires
selection of a column simply to do a sort. Therefore
in this example, COLC
does not require selection if the end user does not need that value.
Remove items from the SELECT list to
prevent unnecessary processing. It is no longer required
to SELECT columns used in the ORDER BY
or GROUP BY clauses.
9. Limit
Result Sets with Known Ends
The FETCH FIRST
n ROWS ONLY clause
should be used if there are a known,
maximum number of
rows that will be FETCHed
from a result set. This clause limits the number of rows returned
from a result set by invoking
a fast implicit close. Pages are quickly released
in the buffer pool when the nth result row has been processed. The OPTIMIZE
FOR n ROWS clause does not invoke a fast implicit
close and will keep locking
and fetching until the cursor
is implicitly or explicitly closed.
In contrast,
FETCH FIRST n ROWS ONLY will not allow
the n+1 row to be FETCHed and results in an SQLCODE = 100. Both clauses optimize the same
if n
is the same.
Existence checking should be handled
using:
SELECT 1
INTO :hv1
FROM TABLEX
WHERE ….. existence
check …. FETCH FIRST 1 ROW ONLY
10. Analyze and Tune Access
Paths
Use EXPLAIN or tools that interpret EXPLAIN output, to verify that the access path is appropriate for the required processing. Check the access path of the each query by
binding against production statistics in a production–like subsystem. Bufferpool, RID pool, sort pool, and LOCKMAX thresholds should also resemble the production environment. Oversized
RID pools in the test environment will mask RID pool
failures in production.
RID pool failures can occur during
List Prefetch, Multiple
Index Access, and Hybrid Join Type N access paths. RID pool failures result
in a full table scan.
Tune queries using a technique that will withstand
future smarter optimization and query rewrite. Typical query tuning may include using one or more of the following techniques:
–OPTIMIZE FOR n ROWS
–FETCH
FIRST n ROWS ONLY
–No Operation (+0,
-0, /1, *1, CONCAT ‘ ‘)
–ON 1=1
–Bogus Predicates
–Table expressions with DISTINCT
–REOPT(VARS)
–Index Optimization
All these techniques impact access path selection. Compare estimated costs of multiple scenarios to verify the success
of the tuning effort.
The goal of a tuning effort should
be refined access
paths and optimized index design. This is an
ongoing task that should be proactively initiated
when any of the following
occur:
• Increases in the number of DB2 objects
• Fluctuations in the size of DB2 objects
• Increases in the use of dynamic SQL
• Fluctuations of transaction rates
• Migrations
The Solution
Quest
Central for DB2 is an integrated console providing core functionality a DBA needs to perform their daily tasks of Database Administration, Space Management, SQL Tuning and Analysis, and Performance Diagnostic Monitoring. Quest Central
for DB2 was written by DB2 software
experts and provides rich functionality utilizing a graphical user interface. The product supports
DB2 databases running on the mainframe, Unix, Linux, and Windows. No
longer
are DB2 customers
required to maintain
and utilize separate
tools for their mainframe
and distributed DB2 systems.
The SQL Tuning component of Quest Central
provides the most complete SQL tuning environment for DB2 on the market. This environment consists of:
1. Tuning Lab – a facility where a single
SQL statement can be modified multiple
times, through use of scenarios. These scenarios can then be compared to immediately determine
which
SQL statement provided the most efficient
access path.
2. Compare – immediately see the effect
your modifications have on the performance of your SQL. By comparing multiple scenarios, you can see the effect
on the CPU, elapsed time, I/O
and many more statistics. Additionally, a compare of the data will ensure your SQL statement is returning the same subset of data.
3.
Advice
– the advice provided by the
SQL tuning component will detect all of the conditions specified in this white paper and more. In addition,
the SQL Tuning component
will even rewrite the SQL if applicable into a new scenario, incorporating the advice chosen.
4.
Access Path and Associated Statistics – All statistics applicable to the DB2 access path are displayed, in context
to the SQL. This takes the guesswork out of trying
to
understand why a particular access plan was chosen.
Quest Central for DB2’s robust
functionality can detect the above SQL tuning tips and many more. The remainder of this white
paper will demonstrate the strength and in-depth knowledge built right into Quest Central to enhance not only your SQL, but assist
with overall database
performance. Each tuning tip described above is contained right within Quest Central.
0 comments:
Post a Comment