Hi,
I am working on performance tuning of a DB2 query which is consuming a lot of CPU resources. The query is selecting a huge number of records from multiple tables using JOIN and UNION. The query is basically an unload which is a 5 table join with a correlated sub-query existence check, which is then UNIONed to the same 5 table join with a similar non-existence check. Here is how the query looks like:
Code:
SELECT A BUNCH OF COLUMNS,
CHAR('N')
FROM
TABLE1 A,
TABLE2 B,
TABLE3 C,
TABLE4 D,
TABLE5 E
WHERE A.COL1 = B.COL1
AND C.COL2 = D.COL2
AND A.COL3 = C.COL3
AND A.COL3 = E.COL3
AND E.COL4 = '0200'
AND D.COL5 = 02
AND A.COL6 IN (4,5)
AND A.COL7 = 'R'
AND E.COL8 IN('01','02')
AND EXISTS (SELECT '1'
FROM TABLE6 F
WHERE F.COL1 = A.COL1
AND F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02)
UNION
SELECT SAME BUNCH OF COLUMNS,
CHAR('Y')
FROM
TABLE1 A,
TABLE2 B,
TABLE3 C,
TABLE4 D,
TABLE5 E
WHERE A.COL1 = B.COL1
AND C.COL2 = D.COL2
AND A.COL3 = C.COL3
AND A.COL3 = E.COL3
AND E.COL4 = '0200'
AND D.COL5 = 02
AND A.COL6 IN (4,5)
AND A.COL7 = 'R'
AND E.COL8 IN('01','02')
AND NOT EXISTS (SELECT '1'
FROM TABLE6 F
WHERE F.COL1 = A.COL1
AND F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02)
Currently, this query is basically resolving the 5 table join twice. So to reduce the time taken by the query, can we replace the UNION somehow and still populate all the rows with N at the end for rows which satisfy the existence check and Y for the rows which do not.
I tried using CASE instead of UNION but surprisingly it took almost same CPU time as the above query. Here is how I coded the above query using CASE:
Code:
SELECT A BUNCH OF COLUMNS
CASE WHEN EXISTS
(SELECT '1'
FROM TABLE6 F
WHERE F.COL1 = A.COL1
AND F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02) THEN CHAR ('N')
ELSE CHAR ('Y')
END
FROM
TABLE1 A,
TABLE2 B,
TABLE3 C,
TABLE4 D,
TABLE5 E
WHERE A.COL1 = B.COL1
AND C.COL2 = D.COL2
AND A.COL3 = C.COL3
AND A.COL3 = E.COL3
AND E.COL4 = '0200'
AND D.COL5 = 02
AND A.COL6 IN (4,5)
AND A.COL7 = 'R'
AND E.COL8 IN('01','02')
The result set was same with both the queries (approx. 3.2 million), but it still did not solve my problem with the CPU consumption. So, can anyone help me figure out a better way to achieve this, if at all that is possible.
SELECT SAME BUNCH OF COLUMNS,
case when lookup.col1 is null then 'N' else 'Y' end
FROM
TABLE1 A,
TABLE2 B,
TABLE3 C,
TABLE4 D,
TABLE5 E
left join (SELECT F.col1
FROM TABLE6 F
WHERE F.COL1 = A.COL1
AND F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02) lookup
on lookup.COL1 = A.COL1
AND lookup.COL3 = 0
AND lookup.COL9 = 0
AND lookup.COL10 = '074'
AND lookup.COL11 = 02
WHERE A.COL1 = B.COL1
AND C.COL2 = D.COL2
AND A.COL3 = C.COL3
AND A.COL3 = E.COL3
AND E.COL4 = '0200'
AND D.COL5 = 02
AND A.COL6 IN (4,5)
AND A.COL7 = 'R'
AND E.COL8 IN('01','02')
However, reduction of CPU is not only depending on the efficient query but majorly depends on the Indexes, runstats, reorg, BF size, bind options and many more to list down. Being said that did you run explaain and what did you find ?
Also, if you have a load files of these tables everyday then you can achieve the same via SORT instead of unload.
Hi Rohit,
Thanks for the query, I tried running this but got a couple of issues, first it gave me this error:
Code:
DSNT408I SQLCODE = -206, ERROR: A.COL1 IS NOT VALID IN THE CONTEXT WHERE IT IS USED
DSNT418I SQLSTATE = 42703 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXORSO SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = -225 0 0 -1 4039 802 SQL DIAGNOSTIC INFORMATION
I thought it was because of the fact that the query after LEFT JOIN was referring to A.COL1 but A was declared outside, so it gave the error. I declared the table inside the lookup query
Code:
left join (SELECT F.col1
FROM TABLE6 F,
TABLE1 G
WHERE F.COL1 = G.COL1
AND F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02) lookup
on lookup.COL1 = A.COL1
AND lookup.COL3 = 0
AND lookup.COL9 = 0
AND lookup.COL10 = '074'
AND lookup.COL11 = 02
I changed the query as above but this time got the same error for LOOKUP this time::
Code:
DSNT408I SQLCODE = -206, ERROR: LOOKUP.COL3 IS NOT VALID IN THE CONTEXT WHERE IT IS USED
DSNT418I SQLSTATE = 42703 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXORSO SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = -600 0 0 -1 4111 802 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'FFFFFDA8' X'00000000' X'00000000' X'FFFFFFFF' X'0000
INFORMATION
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
adilkhan4all,
have you thought of simply unloading all five tables,
and using your sort product to do the joins and unions?
you will use less computer resources than when you attempt a
5 (large) table join.
i have found that offloading a lot of this esoteric nonsense to pure batch
(sort/cobol programs) is a lot faster than sophisticted sql involving many rows.
Sorry , I could not notice what I suggested but try sometging below. However, Dick and I suggested another better option to look for.
Code:
SELECT SAME BUNCH OF COLUMNS,
case when lookup.col1 is null then 'N' else 'Y' end
FROM
TABLE1 A,
TABLE2 B,
TABLE3 C,
TABLE4 D,
TABLE5 E
left join (SELECT F.col1
FROM TABLE6 F
WHERE
F.COL3 = 0
AND F.COL9 = 0
AND F.COL10 = '074'
AND F.COL11 = 02) lookup
on lookup.COL1 = A.COL1
WHERE A.COL1 = B.COL1
AND C.COL2 = D.COL2
AND A.COL3 = C.COL3
AND A.COL3 = E.COL3
AND E.COL4 = '0200'
AND D.COL5 = 02
AND A.COL6 IN (4,5)
AND A.COL7 = 'R'
AND E.COL8 IN('01','02')