Compare datasets - can this be further improved?

Claes Norreen · Posted: Thu May 10, 2007 5:54 pm

Once upon a time, Frank helped me solve the following task:

Compare two datasets with a common key, and extract the new records (in dataset2, but not dataset1), the deleted ones (in dataset1, but not dataset2) and the updated ones (where the data was changed from dataset1 to dataset2). So if we had this data:

Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4

Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3

the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3INEW
KEY4DATA4DOLD

Where U=Update, I=Insert, D=Delete. Note that KEY1 doesn't change from dataset1 to dataset2, so it's not represented in the output.
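
For readers who want the rules spelled out, here is a minimal sketch in plain Python. It is not the DFSORT/ICETOOL job discussed in this thread, only an illustration of the classification. It assumes that a key found only in dataset2 is tagged I (NEW) and a key found only in dataset1 is tagged D (OLD). The function name `compare` and the `key_len` parameter are made up for this example.

```python
def compare(old_records, new_records, key_len):
    """Tag changed records with U/I/D plus OLD/NEW, ordered by key.

    Assumes keys are unique within each dataset and occupy the
    first key_len characters of every record.
    """
    old = {rec[:key_len]: rec for rec in old_records}
    new = {rec[:key_len]: rec for rec in new_records}
    tagged = []
    for key in sorted(old.keys() | new.keys()):
        if key in old and key in new:
            # Present on both sides: report only if the data changed.
            if old[key] != new[key]:
                tagged.append(old[key] + "UOLD")
                tagged.append(new[key] + "UNEW")
        elif key in new:
            tagged.append(new[key] + "INEW")  # only in dataset2
        else:
            tagged.append(old[key] + "DOLD")  # only in dataset1
    return tagged


dataset1 = ["KEY1DATA1", "KEY2DATA2", "KEY4DATA4"]
dataset2 = ["KEY1DATA1", "KEY2DATA5", "KEY3DATA3"]
for line in compare(dataset1, dataset2, key_len=4):
    print(line)
```

Unchanged records such as KEY1 produce no output line, which matches the note above.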

Frank came up with this piece of code:

Alain Benveniste · New User Joined: 14 Feb 2005 Posts: 88

Claes,

Do you have duplicates in dataset 1 and/or 2?

Alain

Claes Norreen · Posted: Fri May 11, 2007 3:38 am

Hi Alain,

Thanks, I forgot to mention that duplicates are NOT allowed. All keys must be unique, and also the two datasets must be equally long.
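
These two preconditions can be checked up front. The sketch below is plain Python, not DFSORT, and it reads "equally long" as "every record has the same length", which is an assumption. The name `find_problems` is hypothetical.

```python
from collections import Counter


def find_problems(records, key_len):
    """Return a list of precondition violations (empty if none)."""
    problems = []
    # Assumption: "equally long" means all records share one length.
    lengths = {len(rec) for rec in records}
    if len(lengths) > 1:
        sizes = ", ".join(str(n) for n in sorted(lengths))
        problems.append("record lengths differ: " + sizes)
    # Duplicate keys are not allowed within a dataset.
    counts = Counter(rec[:key_len] for rec in records)
    for key in sorted(k for k, n in counts.items() if n > 1):
        problems.append("duplicate key: " + key)
    return problems


print(find_problems(["KEY1DATA1", "KEY2DATA2", "KEY4DATA4"], key_len=4))
print(find_problems(["KEY1DATA1", "KEY1DATA9", "KEY2XY"], key_len=4))
```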

Alain Benveniste

Claes,

One more question:

Is the field to compare really 5 in length?

Alain

Claes Norreen · Posted: Fri May 11, 2007 2:22 pm

I'm not sure what you mean..?

Alain Benveniste

Oh I see, I mean DATA1, DATA2 ...

Claes Norreen · Posted: Fri May 11, 2007 2:56 pm

Ah ok, well this is just an example. In the example I gave, the key length is 4 and the record length is 9, but, of course, the job should be able to handle any key length and record length.
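
To illustrate what "any key length and record length" means for fixed-length records, here is a small Python sketch that takes both lengths as parameters. It is not part of any job in this thread, and the name `split_fixed` is made up.

```python
def split_fixed(data, rec_len, key_len):
    """Split a block of fixed-length records into (key, rest) pairs."""
    if rec_len <= 0 or not 0 < key_len <= rec_len:
        raise ValueError("need 0 < key_len <= rec_len")
    if len(data) % rec_len != 0:
        raise ValueError("data is not a whole number of records")
    pairs = []
    for start in range(0, len(data), rec_len):
        record = data[start:start + rec_len]
        # The key is the leading key_len bytes; the rest is the data.
        pairs.append((record[:key_len], record[key_len:]))
    return pairs


# The example from this thread: key length 4, record length 9.
print(split_fixed(b"KEY1DATA1KEY2DATA2", rec_len=9, key_len=4))
```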

Alain Benveniste

Claes,

You can test this one

Claes Norreen · Posted: Mon May 14, 2007 11:40 am

Hi Alain,

Thank you very much for your reply and your efforts! Your code works fine on the test data, but the dummy "CLAES NORREEN" record makes it hard to use in real life, as I will have to make this record as long as the data records (which are seldom shorter than 80 bytes). Can it be done without the dummy record?

Claes Norreen · Posted: Mon May 14, 2007 11:56 am

By the way... if we can avoid the dummy record, the code looks very promising! It should be much faster than my code, as you save two COPY operations and a SORT operation!

Claes Norreen · Posted: Mon May 14, 2007 12:12 pm

Now I see the problem: by inputting both datasets as one, you can't know which records are "old" and which are "new" without the dummy record. You need some kind of separator between "old" and "new".
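
The separator idea can be sketched in plain Python as follows. This is only an illustration of the principle, not Alain's DFSORT job: it assumes a single marker record sits between the old and the new records in one concatenated stream. The name `tag_by_marker` is hypothetical.

```python
def tag_by_marker(stream, marker):
    """Label each record OLD or NEW by its position around the marker."""
    side = "OLD"
    seen_marker = False
    tagged = []
    for rec in stream:
        if rec == marker:
            if seen_marker:
                raise ValueError("marker appears more than once")
            seen_marker = True
            side = "NEW"  # everything after the marker is new
            continue
        tagged.append((side, rec))
    if not seen_marker:
        raise ValueError("marker record missing")
    return tagged


stream = ["KEY1DATA1", "KEY2DATA2", "CLAES NORREEN", "KEY1DATA1", "KEY2DATA5"]
for side, rec in tag_by_marker(stream, "CLAES NORREEN"):
    print(side, rec)
```

Without the marker, the two copies of KEY1DATA1 would be indistinguishable, which is exactly the problem described above.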

I'll do a performance test now... ;-)

Claes Norreen · Posted: Mon May 14, 2007 12:59 pm

Performance test with 500,000 records with LRECL=1159

My code:

Alain Benveniste

Claes,

The 2 records must be present. This method is explained here:

www-304.ibm.com/jct01004c/systems/support/storage/software/sort/mvs/tricks/index.html
Look for 'Include or omit groups of records'.
You just need to create one record of your choice in a file and concatenate it as shown in the JCL. This record must be UNIQUE vs. the file you want to process.
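
As a rough illustration of the two constraints on the dummy record (same length as the data records, and unique against them), here is a plain Python sketch. It assumes the record is padded with blanks on the right. The name `make_marker` is made up, and nothing here is DFSORT syntax.

```python
def make_marker(text, rec_len, records):
    """Pad text to rec_len and make sure no data record equals it."""
    if len(text) > rec_len:
        raise ValueError("marker text is longer than the record length")
    marker = text.ljust(rec_len)  # assumption: pad with blanks
    if marker in set(records):
        raise ValueError("marker is not unique against the data")
    return marker


data = ["KEY1DATA1".ljust(20), "KEY2DATA2".ljust(20)]
marker = make_marker("CLAES NORREEN", rec_len=20, records=data)
print(repr(marker), len(marker))
```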

Alain

Claes Norreen · Posted: Mon May 14, 2007 1:05 pm

Frank's original code:

Claes Norreen · Posted: Mon May 14, 2007 1:21 pm

The major concern at my company is CPU usage, so I need to go with my own code (for now).

Thanks again Alain! :-)

ParagChouguley · Posted: Mon Jan 21, 2008 7:13 pm

Hi Frank and others,
I too have a requirement, which is an extension of Claes's requirement.
Referring to the output of Frank's original job, as given by Claes: