I have explained the following query-(column names changed)
Code:
SELECT A.COLA, A.COLB, A.COLC,
A.COLD, A.COLE, A.COLF,
A.COLG, A.COLH, A.COLI,
A.COLJ, A.COLK, A.COLL,
A.COLM, A.COLN, A.COLO,
A.COLP, A.COLQ,A.COLR,
A.COLS, A.COLT,
A.COLU, A.COLC, A.COLV,
A.COLW, A.COLX, A.COLY
FROM TABLE1 A,
TABLE2 C
WHERE A.COLA = C.COLA
AND A.COLW = 'VAR'
AND A.COLX = (SELECT MAX(B.COLX)
FROM TABLE1 B
WHERE B.COLA = A.COLA
AND B.COLW = A.COLW
AND B.COLX <= CURRENT DATE)
AND C.COLX = (SELECT MAX(C1.COLX)
FROM TABLE3 C1
WHERE C1.COLA = C.COLA
AND C1.COL1 = C.COL1
AND C1.COLX <= CURRENT DATE)
AND C.COLZ = (SELECT MAX(C2.COLZ)
FROM TABLE3 C2
WHERE C2.COLA = C.COLA
AND C2.COL1 = C.COL1
AND C2.COLX = C.COLX)
AND C.COL2 IN('4', '5', '7', '8')
AND C.COL3 = 'P' ;
This is ddl from a view which is trying to access data from two tables table1 and table2.The tablespace scan is occuring for table2.
Can somebody please suggest how can it be converted to Indexed scan.I have already checked that there are proper indexes on the in where clause predicates and stats are updated.
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Quote:
I have already checked that there are proper indexes on the in where clause predicates and stats are updated.
This does NOT guarantee an index scan. . .
You may need to re-think the approach so that the process is driven by one table and the other table only referenced directly.
If you had used "real looking" column names instead of the alphabet soup (cola, colb, etc), more people could relate to what you want to do. Most likely several have done this or something very similar on their system.
Showing some sample data for the tables might help someone help you as well. . .
HI, Am including the "real looking" column names and query is rewritten as folows-
Code:
SELECT A.EMPID,
A.COUNTY_FM,
A.NAM_DIS,
A.INITIALS,
A.NPRE, A.NSUFF,
A.ROYPRE, A.ROYSUF, A.TITLE,
A.LAST_NAME_SRCH, A.FIRST_NAME_SRCH,
A.LAST_NAME,
A.FIRST_NAME,
A.MIDDLE_NAME,
A.SECOND_LAST_NAME,
A.SECOND_LAST_SRCH,
A.NAME_AC,
A.PREF_FIRST_NAME,
A.PARTNER_LAST_NAME,
A.PARTNER_ROY_PREFIX,
A.LAST_NAME_PREF_NLD,
A.NAM_DIS,
A.NAME_FORMAL,
A.NAM_TYP,
A.EFDT,
A.NAME
FROM PNAMES A,
PJOB C
WHERE A.EMPID = C.EMPID
AND A.NAM_TYP = 'XYZ'
AND A.EFDT = (SELECT MAX(B.EFDT)
FROM PNAMES B
WHERE B.EMPID = A.EMPID
AND B.NAM_TYP = A.NAM_TYP
AND B.EFDT <= CURRENT DATE)
AND C.EFDT = (SELECT MAX(C1.EFDT)
FROM PJOB_EPM C1
WHERE C1.EMPID = C.EMPID
AND C1.EMRCD = C.EMRCD
AND C1.EFDT <= CURRENT DATE)
AND C.EFFSEQ = (SELECT MAX(C2.EFFSEQ)
FROM PJOB_EPM C2
WHERE C2.EMPID = C.EMPID
AND C2.EMRCD = C.EMRCD
AND C2.EFDT = C.EFDT)
AND C.ESTATUS IN('A', 'L', 'P', 'S')
AND C.JOBIND = 'P' ;
edited to add the friendly code tags
Yeah, no column is getting selected for table2 that is PJOB above but certainly as they are referred in where clause predicates for references to table2.
As PJOB is only referenced then how is it possible to reference this table to covert the access to index than table , or if exists can be used then in which way?
on table PJOB C the only filtering criteria are ESTATUS and JOBIND
and maybe C.EMPID
Are you sure you have indexes on those ?
The other 2 (EFDT,EFFSEQ) are used in Correlated subquery criteria
Neither did you say how the join A-C is performed:
- Index scan A , nested loop , table scan C ?
- table scan C , nested loop ,Index scan A ?
- MergeScan :table scan C / Index scan A ?
- ...
Which indexes are available ? You claim "Proper indexes", but I rather see for myself if they are "proper".
If you have good data and stats collected for PJOB and an index on
Code:
EMPID
EMRCD
EFDT
EFFSEQ
optimizer should definitely choose this index with altleast 3 matching columns in the last 2 subqueries (C1 and C2) and one matching column on Correlation name (C). My guess is C1 and C2 would just require Index only access if you have index as specified above
If still a tablespace scan , Post the indexes which exists for your table.
@Guy C - There are total 17 indexes defined on the PJob table of which 16 are non unique and atleast 10 have ESTATUS as one column while no index on JOBIND.
table scan C , nested loop ,Index scan A is geting performed.
@Prashant- The stats are current and the indexes exst on the columns that you have mentioned.Also, the one fetch index scat happening for the last two subqueries.
The tablespace scan is only happening to PJOB table.
it seems like a fair chosen accesspath.
DB2 has to start somewhere and it has two choices :
choice 1 : get all PJOB with Estatus in (a,l,p,s) and jobind = 'P'
choice 2 : get all PNAMES with nametyp = 'XYZ'
probably PNAMES is a big table without index on nametyp
PJOB is probably smaller.
DB2 likes to start with the smaller (after initial filtering) table.
there is no index on jobind.
I don't know if there is an index starting with Estatus, but even then : at least 4 values of Estatus qualify so maybe Estatus is not so "filtering".
I don't know how many distinct values estatus has (db2 does).
If you can't limit the scope using an index, a Tablescan is the best accesspath available
If you do want DB2 to choose Choice 2 you might try to handicap the criteria on PJOB by adding "OR 0=1)" to them
Which table did the optmizer choose as the first table. Is it PJOB ?
If its PJOB there won't be any mathcols since the EMPID is unknown at the start and you are not specifying EMPID in where criteria. But still the accesstype would be I with Matchcols=0
If PJOB is the second table accessed it should match EMPID with Matchcols=1
Code:
PJob table of which 16 are non unique and atleast 10 have ESTATUS as one column while no index on JOBIND.
No point in having ESTATUS as a column in atleast 10 of the indexes you have. Thing to note is whether you are supplying the leading columns above ESTATUS .
Six indexes on Nametype that is on PNAMES.
Two Index on EStatus that is on PJob and 11 Col Card for Estatus.
No index on Jobbind.
I tried choice 2 as suggested by you as below(criteria on PJob)-
AND C.EFDT = (SELECT MAX(C1.EFDT)
FROM PJOB_EPM C1
WHERE C1.EMPID = C.EMPID
AND C1.EMRCD = C.EMRCD
AND C1.EFDT <= CURRENT DATE OR 0=1)
AND C.EFFSEQ = (SELECT MAX(C2.EFFSEQ)
FROM PJOB_EPM C2
WHERE C2.EMPID = C.EMPID
AND C2.EMRCD = C.EMRCD
AND C2.EFDT = C.EFDT OR 0=1)
With this the accesstype gets changed to I with TNAME as PNAMES now in place of PJOB and R previously-
But the query is still taking time,am skeptical if it spossible to tune the query any further.
@Prashanth- There are 1 Million and 8 Million rows in PNAMES and PJOB respectively.There are only 3 indexes having estatus(sorry about it)
and PJOB should be the second table accessed as its matching EMPID with Matchcols=1.
that was not what I suggested, you have added or 0=1 on the subselects which were actually OK.
what I wanted you to try was :
Code:
...
FROM PNAMES A,
PJOB C
WHERE A.EMPID = C.EMPID
AND A.NAM_TYP = 'XYZ'
AND A.EFDT = (SELECT MAX(B.EFDT) FROM PNAMES B
WHERE B.EMPID = A.EMPID AND B.NAM_TYP = A.NAM_TYP AND B.EFDT <= CURRENT DATE)
AND C.EFDT = (SELECT MAX(C1.EFDT) FROM PJOB_EPM C1
WHERE C1.EMPID = C.EMPID AND C1.EMRCD = C.EMRCD AND C1.EFDT <= CURRENT DATE)
AND C.EFFSEQ = (SELECT MAX(C2.EFFSEQ) FROM PJOB_EPM C2
WHERE C2.EMPID = C.EMPID AND C2.EMRCD = C.EMRCD AND C2.EFDT = C.EFDT)
AND (C.ESTATUS IN('A', 'L', 'P', 'S') OR 0=1)
AND (C.JOBIND = 'P' or 0=1);