Advice on an optimal unload solution

hailashwin · New User Joined: 16 Oct 2008 Posts: 74 Location: Boston

Hi,
There is an existing job that reads a file to get the account number and an indicator and goes to the table to retrieve all the rows on the table for the condition. There could be more than one row returned and so we use a cursor. The table is quite huge and each time the file is read, the cursor is closed and opened. Since the account file is also quite huge, the performance of the job is a cause of concern.

The below are the alternate ways I know could produce the same output but not sure which would help me optimize on the performance.

1. DSNTIAUL - I would unload the whole table and work out a SORT-JOIN with the account file and produce the output. But when I ran this step..it seemed to run forever.

2. Multi-row retrieval - Here I would retrieve the rowset and process it and so there would be only one fetch for every read on the file.

3. Unload Utility - Unload the entire table and use a SORT-JOIN with the account file and produce the output. I would least prefer this method as i would have to get the DBA do the work for me to create utility jobs.

With my few years of experience, I have seen jobs that have performed well on the Testplex going haywire once in the prodplex. So thought I will get some expert opinion before working on this task.

Any suggestions/Comments are welcome.

Thanks,
Ashwin.

prino · Posted: Tue May 03, 2011 12:56 pm

...quite huge..., ...it seemed to run forever...

Maybe you can be a bit more specific?

GuyC · Posted: Tue May 03, 2011 1:28 pm

at least some numbers:
input file : n1 records
table : n2 rows
output : n3 records
Some info like clustering, indexes, partitioning of the table / order of the input file are also relevant

PeterHolland · Posted: Tue May 03, 2011 1:44 pm

Maybe you can load the account file into a DB2 table and perform a JOIN
on the 2 tables.

hailashwin · New User Joined: 16 Oct 2008 Posts: 74 Location: Boston

Robert/GuyC,
The elapsed time for the DSNTIAUL step was 16 minutes and the CPU time was 4 minutes(these values are not bound to be consistent as it was run in the development environment) for one hundred thousand accounts on the file and the table had 7 million records. The select statement I used for this step had not predicates in the where clause as I had to match a indicator to one of the table columns after reading the flat file.

The table is not a partitioned table and the input is sorted on Account number.

Peter,
Creating a seperate table is out of my control here..

Thanks,
Ashwin.

GuyC · Posted: Tue May 03, 2011 4:50 pm

100.000 / 7.000.000 = 1.4%
you didn't mention how many rows qualify so I'll assume 3 rows for each input record.
So that's 5% of the table that you actually need.
If you have a decent access path, selecting 5% of the table is probably faster than unloading/sorting.

The following , I didn't understand:

don.leahy · Posted: Tue May 03, 2011 7:23 pm

You should also consider opening the cursor at the start of the program, and then use traditional match/merge logic between it and the account file. (Of course it goes without saying that the ORDER BY in the cursor must match the sort sequence of the account file).

Depending on how many records there are in the account file, this may be much more efficient than opening and closing the cursor thousands of time. The reason for this is that FETCHes are relatively cheap compared to OPENs.

GuyC · Posted: Tue May 03, 2011 9:07 pm

1) I wouldn't fetch through the whole table, when only 5% of the rows are needed.

2) The statement :"FETCHes are relatively cheap compared to OPENs" is one I wouldn't repeat, without looking at the accesspath.

don.leahy · Posted: Tue May 03, 2011 11:24 pm