Once upon a time, Frank helped me solve the following task:
Compare two datasets with a common key, and extract the new records (in dataset2 but not dataset1), the deleted records (in dataset1 but not dataset2), and the updated records (where the data changed from dataset1 to dataset2). So if we had this data:
Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4
Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3
the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3INEW
KEY4DATA4DOLD
Where U=Update, I=Insert, D=Delete. Note that KEY1 doesn't change from dataset1 to dataset2, so it's not represented in the output.
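To make the required logic concrete before diving into ICETOOL: the same compare can be sketched in a few lines of Python (a hypothetical helper, not part of the job; it assumes fixed-length keys as in the example):

```python
# Minimal sketch of the compare: each record is KEY + DATA, keyed on the
# first keylgh bytes. Output flags: U=Update, I=Insert, D=Delete, with an
# OLD/NEW suffix showing which dataset the record came from.
def compare(old, new, keylgh=4):
    old_map = {r[:keylgh]: r for r in old}
    new_map = {r[:keylgh]: r for r in new}
    out = []
    for key in sorted(old_map.keys() | new_map.keys()):
        o, n = old_map.get(key), new_map.get(key)
        if o == n:
            continue                      # unchanged: not in the output
        if o is not None and n is not None:
            out.append(o + "UOLD")        # updated: both versions appear
            out.append(n + "UNEW")
        elif o is not None:
            out.append(o + "DOLD")        # deleted: only in dataset1
        else:
            out.append(n + "INEW")        # inserted: only in dataset2
    return out

print(compare(["KEY1DATA1", "KEY2DATA2", "KEY4DATA4"],
              ["KEY1DATA1", "KEY2DATA5", "KEY3DATA3"]))
# → ['KEY2DATA2UOLD', 'KEY2DATA5UNEW', 'KEY3DATA3INEW', 'KEY4DATA4DOLD']
```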
Frank came up with this piece of code:
Code:
//ICETOOL EXEC PGM=ICETOOL
//OLD DD dataset1
//NEW DD dataset2
//T1 DD DSN=&&T1,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T2 DD DSN=&&T2,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T3 DD DSN=&&T3,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T4 DD DSN=&&T4,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON1 DD DSN=*.T1,VOL=REF=*.T1,DISP=(OLD,DELETE)
// DD DSN=*.T2,VOL=REF=*.T2,DISP=(OLD,DELETE)
//INS DD DSN=&&INS,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//DEL DD DSN=&&DEL,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//UPD DD DSN=&&UPD,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON2 DD DSN=*.DEL,VOL=REF=*.DEL,DISP=(OLD,DELETE)
// DD DSN=*.INS,VOL=REF=*.INS,DISP=(OLD,DELETE)
// DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)
//RESULT DD outputdsn
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//TOOLIN DD *
* Add 'OLD' to the OLD records
COPY FROM(OLD) TO(T1) USING(CTL1)
* Add 'NEW' to the NEW records
COPY FROM(NEW) TO(T2) USING(CTL2)
* Remove the unchanged records, keeping only the
* deleted, inserted and updated records in T3.
SELECT FROM(CON1) TO(T3) ON(1,reclgh,CH) NODUPS
* Write the deleted records ('OLD') to DEL and add 'D'.
* Write the inserted records ('NEW') to INS and add 'I'.
SELECT FROM(T3) TO(DEL) ON(1,keylgh,CH) NODUPS USING(CTL3)
* Write the updated records to UPD and add 'U'.
SELECT FROM(T3) TO(UPD) ON(1,keylgh,CH) ALLDUPS USING(CTL4)
* Sort the records together on the key.
SORT FROM(CON2) TO(RESULT) USING(CTL5)
//CTL1CNTL DD *
OUTFIL FNAMES=T1,OUTREC=(1,reclgh,reclgh+2:C'OLD')
//CTL2CNTL DD *
OUTFIL FNAMES=T2,OUTREC=(1,reclgh,reclgh+2:C'NEW')
//CTL3CNTL DD *
OUTFIL FNAMES=DEL,
INCLUDE=(reclgh+2,3,CH,EQ,C'OLD'),
OUTREC=(1,reclgh,C'D',reclgh+2,3)
OUTFIL FNAMES=INS,
INCLUDE=(reclgh+2,3,CH,EQ,C'NEW'),
OUTREC=(1,reclgh,C'I',reclgh+2,3)
//CTL4CNTL DD *
OUTFIL FNAMES=UPD,
OUTREC=(1,reclgh,C'U',reclgh+2,3)
//CTL5CNTL DD *
SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)
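The idea behind the two SELECT passes can be modelled in Python (the function names are mine, not ICETOOL's; the Counter logic mirrors what NODUPS/ALLDUPS do on the counted field):

```python
from collections import Counter

# Records identical in both datasets occur twice in the concatenation, so
# selecting on the full record with NODUPS drops the unchanged pairs.
def drop_unchanged(old, new):
    counts = Counter(old) + Counter(new)
    return ([(r, "OLD") for r in old if counts[r] == 1] +
            [(r, "NEW") for r in new if counts[r] == 1])

# Among the survivors, a key occurring once means insert or delete
# (NODUPS), and a key occurring twice means update (ALLDUPS).
def split_survivors(survivors, keylgh=4):
    keys = Counter(r[:keylgh] for r, _ in survivors)
    deletes = [r for r, tag in survivors if keys[r[:keylgh]] == 1 and tag == "OLD"]
    inserts = [r for r, tag in survivors if keys[r[:keylgh]] == 1 and tag == "NEW"]
    updates = [r for r, tag in survivors if keys[r[:keylgh]] == 2]
    return deletes, inserts, updates
```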
Using the newest stuff in ICETOOL, I improved it, here's the code:
Code:
//ICETOOL EXEC PGM=ICETOOL
//OLD DD dataset1
//NEW DD dataset2
//T1 DD DSN=&&T1,SPACE=(CYL,(200,100),RLSE),DISP=(MOD,PASS)
//T2 DD DSN=&&T2,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,DELETE)
//IAD DD DSN=&&IAD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//UPD DD DSN=&&UPD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//CON DD DSN=*.IAD,VOL=REF=*.IAD,DISP=(OLD,DELETE)
// DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)
//RESULT DD outputdsn
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//TOOLIN DD *
* Add 'DOLD' to the OLD records
COPY FROM(OLD) TO(T1) USING(CTL1)
* Add 'INEW' to the NEW records
COPY FROM(NEW) TO(T1) USING(CTL2)
* Remove the unchanged records, keeping only the
* deleted, inserted and updated records in T2.
SELECT FROM(T1) TO(T2) ON(1,reclgh,CH) NODUPS
* Write the deleted ('DOLD') and inserted ('INEW') records to IAD.
* The flags were already added by CTL1/CTL2, so nothing needs appending.
* The discarded (duplicate) records get a 'U' overlaid on the flag byte.
SELECT FROM(T2) TO(IAD) ON(1,keylgh,CH) NODUPS DISCARD(UPD) USING(CTL3)
* Sort the records together on the key.
SORT FROM(CON) TO(RESULT) USING(CTL4)
//CTL1CNTL DD *
INREC BUILD=(1,reclgh,C'DOLD')
//CTL2CNTL DD *
INREC BUILD=(1,reclgh,C'INEW')
//CTL3CNTL DD *
OUTFIL FNAMES=IAD
OUTFIL FNAMES=UPD,OVERLAY=(reclgh+1:C'U')
//CTL4CNTL DD *
SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)
This code is a lot shorter, and it also reduces CPU usage by approx. 3%.
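The single SELECT with DISCARD, which makes this version shorter, can be sketched as follows (a hypothetical Python model, not DFSORT itself; input records already carry the 4-byte 'DOLD'/'INEW' flag from CTL1/CTL2):

```python
from collections import Counter

# SELECT ... ON(1,keylgh,CH) NODUPS DISCARD(UPD): unique keys go straight
# to IAD; duplicate keys (the updates) land in the DISCARD dataset, where
# CTL3's OVERLAY replaces the flag byte with 'U'.
def select_nodups_discard(records, keylgh=4):
    keys = Counter(r[:keylgh] for r in records)
    iad = [r for r in records if keys[r[:keylgh]] == 1]
    upd = [r[:-4] + "U" + r[-3:]            # overlay 'U' at reclgh+1
           for r in records if keys[r[:keylgh]] > 1]
    return iad, upd
```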
My question is: can it be improved even further? I experimented a bit using SPLICE, but couldn't get it right. I thought maybe some of you experts would have a great idea?
Ah ok, well this is just an example. In the example I give, the keylgh is 4 and the reclgh is 9, but - of course - the job should be able to handle any keylgh and reclgh.
Thank you very much for your reply and your efforts! Your code works fine on the test data, but the dummy "CLAES NORREEN" record makes it hard to use in real life, as I will have to make this record equally long with the data records (which seldom are shorter than 80 bytes). Can it be done without the dummy record?
By the way.... If we can avoid the dummy record, the code looks very promising! Should be much faster than my code, as you save two COPY operations and a SORT operation!
Now I see the problem: by inputting both datasets as one, you can't know which is "old" and which is "new" without the DUMMY. You need some kind of separator between "old" and "new".
Hi Frank and others,
I too have a requirement which is an extension of Claes's requirement.
Referring to the output of Frank's original job, as given by Claes:
Quote:
Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4
Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3
the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3INEW
KEY4DATA4DOLD
Now, it's clear that the job gives the inserted, deleted and changed records, with both the previous and the changed data. But for the changed records, I want to know exactly which part of the data has changed.
Now here is my original requirement.
I want to compare the same DB2 table from 2 different regions. I will provide 3 files as input.
First File Contains:
Code:
Name of Field Key Starts Length
Field1 P 1 10
Field2 Y 11 10
Field3 Y 21 10
Field4 N 31 10
Field5 P 41 10
Now here are the meanings of the Key values.
P = It's one of the columns of the composite primary key of the DB2 table.
Y = I want to compare this field for differences.
N = Don't take this field for comparison.
The second and third files contain unloads of the 2 tables from different regions. As they are unloads of the same table, they have the same layout (structure). The starting byte and length of the fields are given in file1. Do you really need the datatypes of these fields?
Now, I need to know which rows are inserted, deleted and changed.
I'm fine with the way the inserted and deleted rows come out of Frank's job. But for the changed rows I want something different.
So here is sample output
Code:
Primary Key Comb Field1 Field2 Field3 Field4 Field5 --> Header
Primary Key value xxxx yyyy zzzzz aaaa ffffffff INSERT --from new file
Primary Key value xxxx yyyy zzzzz aaaa ffffffff INSERT --from new file
Primary Key value xxxx yyyy zzzzz aaaa ffffffff DELETE --from old file
Primary Key value ---- yyyy zzzzz ---- ffffffff CHANGE --from old file
----------------- ---- SSSS ASDAAA ---- ------- CHANGE --from new file
So a changed record will have 2 rows, with dashes indicating no change in that field, and values indicating the fields that have changed.
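The field-level compare described above can be sketched in Python (all names and the exact dash rules are my assumptions, since the sample output is ambiguous about which fields stay visible; layout rows mirror file1 with 1-based starts):

```python
# Layout from file1: (field name, key flag, 1-based start, length).
LAYOUT = [("Field1", "P", 1, 10), ("Field2", "Y", 11, 10),
          ("Field3", "Y", 21, 10), ("Field4", "N", 31, 10),
          ("Field5", "P", 41, 10)]

def cut(rec, start, length):
    """Slice a field out of a fixed-layout record (1-based start)."""
    return rec[start - 1:start - 1 + length]

def change_rows(old_rec, new_rec, layout=LAYOUT):
    """Build the two CHANGE rows: P fields are always shown (to identify
    the row), Y fields are shown only where they differ, and N fields and
    unchanged Y fields are dashed out."""
    old_out, new_out = [], []
    for name, flag, start, length in layout:
        o, n = cut(old_rec, start, length), cut(new_rec, start, length)
        if flag == "P" or (flag == "Y" and o != n):
            old_out.append(o)
            new_out.append(n)
        else:
            old_out.append("-" * length)
            new_out.append("-" * length)
    return (" ".join(old_out) + " CHANGE",
            " ".join(new_out) + " CHANGE")
```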