Data from table.

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

I have tow tables:
Tabel A: having employee id in decimal format
Tabel B: having employee id in Char format (PIC X(12))

I want to extract those records from table A which are not present in table B.

I used following query:
SELECT EMPID,EMPCDE FROM TABLEA
WHERE EMPID NOT IN (
SELECT
CAST (EMPID AS DECIMAL(12)) FROM TABLEB))

But above query is taking longer time to execute.
Any efficient way available to solve the above issue and imporve the performance.

Kindly provide your feedback.

Regards,
Krunal

Gousiya Mulla · New User Joined: 02 Jun 2008 Posts: 87 Location: Bangalore

I think we can make use of joins ... !

Phrzby Phil · Posted: Wed Aug 11, 2010 5:34 pm

How's about a better title, ok?

GuyC · Posted: Thu Aug 12, 2010 12:18 pm

not joins, but subselect "where not exists" usually performs better

People should try NOT to write "IN ()"-subselects
Only when your IN-list has a very limited number of elements it will perform good

Do you need columns from TabA which aren't in TabB?
is EMPID unique in TabA? TabB?
are there indexes on TabA, TabB on or beginning with Empid ?

You can also have a look at "EXCEPT", a new kind of union in DB2 9. maybe that is useful

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

Hi...
My tables has huge amount of data around 75lacs.

I want records only from TABLEA

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

The indexes being with EMPID in both tables.

As i said i want only records from TABLEA which are not present in TABLEB for combination of EMPID,EMPCDE and EMPNO.

GuyC · Posted: Thu Aug 12, 2010 3:45 pm

like i said

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

Hi GUYC,
TABLEA has empid in decimal (COBOL Comp-3) format
TABLEB has empid in char (COBOL X) format

Your query:
select ... from tab1
where not exists (select 1 from tab2 where tab1.col1 = tab2.col1 and tab1.col2 = tab2.col2)

What does Select 1 means:

may you please explain.

Thanks

GuyC · Posted: Thu Aug 12, 2010 4:07 pm

select 1 : since "not exists()" only test if a row is there , there is no need to select columns in that subselect. the shortest column is simply literal 1.

char() is the function to change decimal to char but maybe that is not the exact correct format, so you might have to do substr()

Experiment with "select char(EmpId) from tabA " to see what you get
and see if it resembles tabB.Empid

if tab1.EmpID is DEC(12,0) it would be something like :
where tabB.EMPID =substr(char(tabA.EmpID),2,12)

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

GuyC,
I have tried using above query but the process is even slower. It takes more than 45mins to execute.

Any possible method to increase the efficiency to get the results very fast.

Regards and Thanks

GuyC · Posted: Thu Aug 12, 2010 4:47 pm

How can you reply "more than 45mins" 26mins after my post ?
besides : "elapsed time"? are you sure the workload on your mainframe was the same as previous test ? that the same rows were in the bufferpools ?
How about CPUsec ? GETPAGES ?

Do both tables have 7.500.000 rows ?
Are the indexes only on EmpID ? are the indexes clustered ? reorged ? unique ?

you did try where tabB.EMPID =substr(char(tabA.EmpID),2,12)
and not where cast(tabB.EMPID as dec(12)) =tabA.EmpID ?

This is a big difference. one is stage1 indexable the other one isn't

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

hey i know post is 26mins old....
But i have to stop the query in between as it is not showing the output. I calculated the time and it may take 45mins.

Substr query is giving me sqlcost:753.938.432

select TAB1.EMPID,TAB1.EMPCD from tab1
where not exists (select 1 from tab2 where tab2.empid = substr(char(tab1.EmpID),2,12) and tab1.empcd = tab2.empcd)

Using cast query sqlcost: 29.051

select TAB1.EMPID,TAB1.EMPCD from tab1
where not exists (select 1 from tab2 where tab1.empid = cast(tab2.empid as decimal(12)) and tab1.empcd = tab2.empcd)

Kindly check and provide your commetns.

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

GuyC,
Both tables have different record counts.

GuyC · Posted: Thu Aug 12, 2010 5:40 pm

Is there an index on tab2 on or beginning with (Empid,empcd) or (Empcd,Empid)? you really need one to get this performant.
if there is one, it really doesn't make sense and you'll need to provide explain info.

another possibility is (if you have DB2 V9 )

Craq Giegerich · Posted: Thu Aug 12, 2010 5:48 pm

Some consistency in defining the same data item in different tables would help.

dick scherrer · Posted: Thu Aug 12, 2010 9:48 pm

Hello,

krunalbafna · Active User Joined: 18 Jan 2010 Posts: 143 Location: Pune

Our Db2 version is 8. I am getting following error when i execute:

-084, ERROR: UNACCEPTABLE SQL STATEMENT
= 42612 SQLSTATE RETURN CODE

Thanks..
Any modification will make us imporve performance?

dbzTHEdinosauer · Posted: Fri Aug 13, 2010 3:05 pm

krunalbafna,

you are wasting a post to tell us that db2 vsn 9 syntax does not work on db2 Vsn 8.

at this point you need to run an explain on both queries

substr(char)
cast as decimal

and based on the explain output

add/modify indexes
insure that all predicates are indexable