Index scan with proper values & TS scan with host variab

kanha · New User Joined: 05 Nov 2005 Posts: 28

All,

The below query behaves weird. when i explain the query giving a proper value for T1.UPD_DT as below, index scan happens for all the table involved. But when I explain using just :H in stead, this does TS scan for 2 tables.

RUNSTATS has been done but no better result.

Please advise whether REOPT(ALWAYS) bind option would replace the actual value in the host variable so that index scan happens.

The tables are large & one table4 is approx. 40 million rows.

SELECT T2.CSN_ID AS CSN_ID,
COALESCE(T4.VNDR_STYLE,'NO STYLE') AS VNDR_STYLE,
SUM(T2.UPC_SHIP_QTY) AS ASN_UNITS
FROM TABLE1 T1
INNER JOIN
TABLE2 T2
ON T2.ASN_NBR = T1.ASN_NBR AND
T2.PO_NBR = T1.PO_NBR AND
T2.PO_STR_NBR = T1.PO_STR_NBR
INNER JOIN
TABLE T3
ON T2.CSN_ID = T3.CSN_ID
LEFT OUTER JOIN
TABLE4 T4
ON T2.COMP_UPC = T4.UPC
WHERE
T1.UPD_DT <= CURRENT_TIMESTAMP AND
T1.UPD_DT > '2011-05-24-13.00.18.118159'
GROUP BY T2.CSN_ID,
T4.VNDR_STYLE
WITH UR

dbzTHEdinosauer · Posted: Sat Sep 17, 2011 3:29 pm

well,
since you can not be bothered to read the manuals
or do a google search on something like:

effect on db2 performance when host variables vs literals

i will quote the manual for you: (link here)

kanha · New User Joined: 05 Nov 2005 Posts: 28

Dick,

Thanks for your response.

This static query as part of a cursor is in an application program rnning in production.

As the manual says, with values in the host variable, we might see a diff. access path. In this case, with values access path is better. So, can we force the optimizer to choose access path using values with REOPT(ALWAYS) option.

Or is any better way is der to force optimizer in this case???

TS scan is killing a lot of CPU.

dbzTHEdinosauer · Posted: Sun Sep 18, 2011 3:29 pm

REOPT(ALWAYS) adds extra processing time. Manual states the host values are substituted.

OPTHINT is another way, but I would not suggest it.
(you will have to do your own research to see how to implement OPTHINT)

you could use OPTIMIZE FOR N ROWS,
which would speed up your return time.

you could use ROWSET retrieval to decrease the number of db2 CALLs for rows.

this part of the predicate:
T1.UPD_DT > '2011-05-24-13.00.18.118159'

instead of a host variable,
is there not a basic range that you could supply that would always be valid?
e.g. CURRENT_TIMESTAMP - 6 MONTHS
what does this date represent?
are your totals based on accounting time periods,
it appears you want totals shipped within a time period.
granted, you would be forced to add UPD_DT to your SELECT and GROUP BY
and then filter out the results in your code

Why do you need:
T1.UPD_DT <= CURRENT_TIMESTAMP
do you have UPD_DT values that are in the future?

you have a GROUP BY because you want DB2 to SUM the T2.UPC_SHIP_QTY.
in fact, this particular SQL is designed so that you can avoid accumulating the T2.UPC_SHIP_QTY
in your code.

the GROUP BY causes a SORT.
Why not get rid of the SUM,
add UPD_DT to the SELECT
use UPD_DT > CURRENT_TIMESTAMP - 6 MONTHS (or whatever number of months, days,minutes,hours,seconds) to filter out the obviously too old
use a simple ORDER BY
use ROWSET retrieval
discard rows retrieved in your module based on UPD_DT appropriate starting date
accumulate in you code?

kanha · New User Joined: 05 Nov 2005 Posts: 28

Thanks Dick for your detailed response.

Will look for ways to tune the query further but as of now, I think REOPT(ALWAYS) can readily be implemented because it will save significant CPU by eliminating TS scan on 40 million row table.

Although it will add a little extra processing time, it will still perform much better.

Let me know your thoughts.

dick scherrer · Posted: Mon Sep 19, 2011 8:04 pm

Hello,

Suggest you try it and post what you observe. . .