"Subtracting" a dataset with duplicates on the key

Claes Norreen · Posted: Wed Aug 25, 2010 12:02 pm

Hi,

I have a dataset (no 1) with customers (key: customerno) and their phone number(s), with one record per customer per phone number. Naturally, a customer can have 0-many phone numbers.

I have another dataset (no 2) with unique customerno's. If a match on customerno is found in dataset1, all records with this number must be removed from dataset1. How can I accomplish this? I tried several types of SPLICE, but the duplicates on customerno in dataset1 are teasing me...

Assume that input datasets are not sorted in the key order.

Sample input (dataset1):

Claes Norreen · Posted: Wed Aug 25, 2010 12:46 pm

Hmm, seems I didn't try this SPLICE:

sqlcode1 · Active Member Joined: 08 Apr 2010 Posts: 577 Location: USA

Claes Norreen,
Please check whether the single-pass solution below works for you. I am assuming both of your input files are FB with LRECL=80.

Frank/Kolusu,
Since the OP's second file is unique on customerno, could the method below be used as a single-pass solution?

Skolusu · Posted: Wed Aug 25, 2010 11:03 pm

Claes Norreen,

Ideally I would have used JOINKEYS for such a requirement.
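For readers whose DFSORT level does include JOINKEYS, a minimal sketch of that approach might look like the following. The step name, the data set names and the key position (customerno as 6 bytes starting in column 1) are assumptions for illustration only and are not taken from this thread. JOINKEYS sorts both inputs on the key by default, so the inputs need not be in key order, and JOIN UNPAIRED,F1,ONLY keeps just the dataset1 records whose customerno has no match in dataset2.

```jcl
//ANTIJOIN EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* SORTJNF1 = dataset1 (customerno + phone, duplicate keys allowed)
//* Data set names below are hypothetical placeholders
//SORTJNF1 DD DISP=SHR,DSN=YOUR.DATASET1
//* SORTJNF2 = dataset2 (unique customerno's)
//SORTJNF2 DD DISP=SHR,DSN=YOUR.DATASET2
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* ASSUMPTION: CUSTOMERNO IS IN POSITIONS 1-6 OF BOTH FILES
  JOINKEYS FILE=F1,FIELDS=(1,6,A)
  JOINKEYS FILE=F2,FIELDS=(1,6,A)
* KEEP ONLY THE F1 RECORDS THAT HAVE NO MATCHING KEY IN F2
  JOIN UNPAIRED,F1,ONLY
  OPTION COPY
/*
```

Because only unpaired F1 records are written, no REFORMAT statement is needed here and the output records keep the layout of dataset1. Note that the output comes out in customerno order rather than in the original order of dataset1.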

sqlcode1 · Active Member Joined: 08 Apr 2010 Posts: 577 Location: USA

Kolusu,
I tried using the unique file as the first one, but I was trying to avoid an extra IFTHEN condition in the OUTREC.

Only now have I understood that that was indeed the more efficient way.

Thanks again for the feedback.

Thanks,

Claes Norreen · Posted: Thu Aug 26, 2010 12:25 am

Thanks for the feedback. In my initial post, I should have noted that the input files do not have the same LRECL in "the real world". Actually, as this is part of a PROC, dataset1's LRECL will vary, so I doubt I can use a single pass? You may use LRECL=13 for dataset1 and LRECL=6 for dataset2, if you want to have a go at it.

Unfortunately, we do not have the PTF for JOINKEYS at my site yet.

Skolusu · Posted: Thu Aug 26, 2010 1:49 am

Claes Norreen,

Use the following DFSORT JCL which will give you the desired results.

Claes Norreen · Posted: Thu Aug 26, 2010 1:20 pm

Great stuff! Thank you.

Claes Norreen · Posted: Thu Aug 26, 2010 7:12 pm

Ran a performance test on yours vs. mine, and it turns out that mine is 50% faster - despite the extra pass of dataset1..!? How come, I wonder?

Dataset1: approx. 6,6 mill. records (LRECL=78)
Dataset2: approx. 16,000 records (LRECL=40)

Total CPU used:
Mine: 10,08 secs
Yours: 15,45 secs

Skolusu · Posted: Thu Aug 26, 2010 9:02 pm

Claes Norreen · Posted: Thu Aug 26, 2010 9:45 pm

Will do tomorrow - work is off for today. ;-)