IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Compare datasets - can this be further improved?


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Thu May 10, 2007 5:54 pm
Reply with quote

Once upon a time, Frank helped me solving the following task:

Compare two datasets with a common key, and extract those being new (in dataset2, but not dataset1), the deleted ones (in dataset2, but not dataset1) and the updated ones (where the data was changed from dataset1 to dataset2). So if we had these data:

Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4

Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3

the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3DOLD
KEY4DATA4INEW

Where U=Update, I=Insert, D=Delete. Note, that KEY1 doesn't change from dataset1 to dataset2, so it's not represented in the output.

Frank came up with this piece of code:
Code:

//ICETOOL EXEC PGM=ICETOOL       
//OLD          DD dataset1
//NEW          DD dataset2
//T1           DD DSN=&&T1,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T2           DD DSN=&&T2,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T3           DD DSN=&&T3,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T4           DD DSN=&&T4,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON1         DD DSN=*.T1,VOL=REF=*.T1,DISP=(OLD,DELETE)
//             DD DSN=*.T2,VOL=REF=*.T2,DISP=(OLD,DELETE)
//INS          DD DSN=&&INS,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//DEL          DD DSN=&&DEL,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//UPD          DD DSN=&&UPD,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON2         DD DSN=*.DEL,VOL=REF=*.DEL,DISP=(OLD,DELETE)
//             DD DSN=*.INS,VOL=REF=*.INS,DISP=(OLD,DELETE)
//             DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)
//RESULT       DD outputdsn
//TOOLMSG      DD SYSOUT=*
//DFSMSG       DD SYSOUT=*
//TOOLIN DD *
* Add 'OLD' to the OLD records                         
  COPY FROM(OLD) TO(T1) USING(CTL1)                   
* Add 'NEW' to the NEW records                         
  COPY FROM(NEW) TO(T2) USING(CTL2)                   
* Remove the unchanged records, keeping only the       
* deleted, inserted and updated records in T3.         
  SELECT FROM(CON1) TO(T3) ON(1,reclgh,CH) NODUPS
* Write the deleted records ('OLD') to DEL and add 'D'.
* Write the inserted records ('NEW') to INS and add 'I'.
  SELECT FROM(T3) TO(DEL) ON(1,keylgh,CH) NODUPS USING(CTL3)                                 
* Write the updated records to UPD and add 'U'.       
  SELECT FROM(T3) TO(UPD) ON(1,keylgh,CH) ALLDUPS USING(CTL4)
* Sort the records together on the key.
  SORT FROM(CON2) TO(RESULT) USING(CTL5)
//CTL1CNTL DD *                           
  OUTFIL FNAMES=T1,OUTREC=(1,reclgh,reclgh+2:C'OLD')
//CTL2CNTL DD *                           
  OUTFIL FNAMES=T2,OUTREC=(1,reclgh,reclgh+2:C'NEW')
//CTL3CNTL DD *               
  OUTFIL FNAMES=DEL,             
    INCLUDE=(reclgh+2,3,CH,EQ,C'OLD'),
    OUTREC=(1,reclgh,C'D',reclgh+2,3) 
  OUTFIL FNAMES=INS,             
    INCLUDE=(reclgh+2,3,CH,EQ,C'NEW'),
    OUTREC=(1,reclgh,C'D',reclgh+2,3) 
//CTL4CNTL DD *               
  OUTFIL FNAMES=UPD,             
    OUTREC=(1,reclgh,C'U',reclgh+2,3) 
//CTL5CNTL DD *                   
  SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)

Using the newest stuff in ICETOOL, I improved it, here's the code:
Code:

//ICETOOL EXEC PGM=ICETOOL   
//OLD          DD dataset1
//NEW          DD dataset2
//T1           DD DSN=&&T1,SPACE=(CYL,(200,100),RLSE),DISP=(MOD,PASS)
//T2           DD DSN=&&T2,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,DELETE)
//IAD          DD DSN=&&IAD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//UPD          DD DSN=&&UPD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//CON          DD DSN=*.IAD,VOL=REF=*.IAD,DISP=(OLD,DELETE)           
//             DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)       
//RESULT       DD outputdsn
//TOOLMSG      DD SYSOUT=*
//DFSMSG       DD SYSOUT=*
//TOOLIN DD *
* Add 'DOLD' to the OLD records                         
  COPY FROM(IN1) TO(T1) USING(CTL1)             
* Add 'INEW' to the NEW records                                               
  COPY FROM(IN2) TO(T1) USING(CTL2)                                 
* Remove the unchanged records, keeping only the       
* deleted, inserted and updated records in T2.         
  SELECT FROM(T1) TO(T2) ON(1,reclgh,CH) NODUPS                       
* Write the deleted records ('OLD') to DEL. No need to add 'D'.
* Write the inserted records ('NEW') to INS. No need to add 'I'.
* The discarded (duplicate) records have an 'U' overlayed.
  SELECT FROM(T2) TO(IAD) ON(1,keylgh,CH) NODUPS DISCARD(UPD) USING(CTL3)
* Sort the records together on the key.
  SORT FROM(CON) TO(OUT) USING(CTL4)                                 
//CTL1CNTL DD *                                                     
  INREC BUILD=(1,reclgh,C'DOLD')                                       
//CTL2CNTL DD *                                                     
  INREC BUILD=(1,reclgh,C'INEW')                                       
//CTL3CNTL DD *                                                     
  OUTFIL FNAMES=IAD                                                 
  OUTFIL FNAMES=UPD,OVERLAY=(reclgh+1:C'U')                             
//CTL4CNTL DD *                                                     
  SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)

This code is a lot shorter, and it also reduces CPU usage by approx. 3%.

My question is: Can it be improved even further? I experimented a bit using SPLICE, but couldn't get it right. I thought, maybe some of you experts have a great idea? icon_biggrin.gif
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 12:10 am
Reply with quote

Claes,

Do you have duplicates in dataset 1 &/or 2 ?

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 3:38 am
Reply with quote

Hi Alain,

Thanks, I forgot to mention that duplicates are NOT allowed. All keys must be unique, and also the two datasets must be equally long.
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 1:52 pm
Reply with quote

Claes,

One more question :

the field to compare is really 5 in length ?

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 2:22 pm
Reply with quote

I'm not sure what you mean..?
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 2:46 pm
Reply with quote

Oh I see, I mean DATA1, DATA2 ...
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 2:56 pm
Reply with quote

Ah ok, well this is just an example. In the example I give, the keylgh is 4 and the reclgh is 9, but - of course - the job should be able to handle any keylgh and reclgh.
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Sat May 12, 2007 1:28 am
Reply with quote

Claes,

You can test this one
Code:

//STEP0001 EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD *
CLAES NORREEN
/*
//         DD *
KEY1DATA1
KEY2DATA2
KEY4DATA4
/*
//         DD *
CLAES NORREEN
/*
//         DD *
KEY1DATA1
KEY2DATA5
KEY3DATA3
/*
//TOOLIN   DD *
  SELECT FROM(IN) TO(OUT1) USNG(ICE0) ON(1,9,CH) NODUPS
  SPLICE FROM(OUT1) TO(OUTX) USING(ICE1) ON(1,4,CH) WITH(1,85) -
         KEEPBASE KEEPNODUPS
/*
//OUT1     DD DSN=&&OUT1,SPACE=(CYL,(1,1),DISP=(NEW,DELETE,DELETE)
//OUTX     DD SYSOUT=*
//ICE0CNTL DD *
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(81:SEQNUM,5,ZD)),
        IFTHEN=(WHEN=(1,13,CH,NE,C'CLAES NORREEN'),
                OVERLAY=(86:SEQNUM,5,ZD,
                         81:81,5,ZD,SUB,86,5,ZD,M11,LENGTH=5))
  SORT FIELDS=(1,4,CH,A,5,5,CH,D)
  OUTFIL FNAMES=OUT1,
         OMIT=(1,13,CH,EQ,C'CLAES NORREEN'),
         OVERLAY=(86:SEQNUM,5,ZD,RESTART=(1,4))
/*
//ICE1CNTL DD *
  SORT FIELDS=(1,4,CH,A,86,5,ZD,D)
  OUTFIL FNAMES=OUTX,
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000100001'),
                 OVERLAY=(81:C'INEW')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000100002'),
                 OVERLAY=(81:C'UOLD')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000200001'),
                 OVERLAY=(81:C'DOLD')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000200002'),
                 OVERLAY=(81:C'UNEW')),
         IFOUTLEN=84
/*

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 11:40 am
Reply with quote

Hi Alain,

Thank you very much for your reply and you efforts! Your code works fine on the testdata, but the dummy "CLAES NORREEN" record makes it hard to use in real life, as I will have to make this record equally long with the data records (which selsom are shorter than 80 bytes). Can it be done without the dummy record?
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 11:56 am
Reply with quote

By the way.... If we can avoid the dummy record, the code looks very promising! Should be much faster than my code, as you save two COPY operations and a SORT operation! icon_biggrin.gif
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 12:12 pm
Reply with quote

Now I see the problem..: By inputting both datasets as one, you can't know which is "old" and which is "new" without the DUMMY. You need some kind of seperator between "old" and "new".

I'll do a performance test now... ;-)
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 12:59 pm
Reply with quote

Performance test with 500,000 records with LRECL=1159

My code:
Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP   
53087  99245    .16    .01    2.3  3097K   0     0     0     0     0
         TOTAL TCB CPU TIME=   .16  TOTAL ELAPSED TIME=   2.3       

Your code:
Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP   
 51962  43258    .25    .01    1.5  4441K   0     0     0     0
          TOTAL TCB CPU TIME=   .25  TOTAL ELAPSED TIME=   1.5

Not sure why, but your code seems to use more CPU (although the elaps time is less)... All that calculating must be the reason..?
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Mon May 14, 2007 1:03 pm
Reply with quote

Claes,

The 2 records must be present. This method is explained here

www-304.ibm.com/jct01004c/systems/support/storage/software/sort/mvs/tricks/index.html
and look 'Include or omit groups of records'
You just need to create one record of your choice in a file and concatenate it as shown in the JCL. This record must be UNIQUE vs the file you want to treat.

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 1:05 pm
Reply with quote

Franks original code:

Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP  VIO
53649  61055    .17    .01    1.9  3217K   0     0     0     0     
         TOTAL TCB CPU TIME=   .17  TOTAL ELAPSED TIME=   2.0       
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 1:21 pm
Reply with quote

The major concern at my company is CPU usage, so I need to go with my own code (for now).

Thanks again Alain! :-)
Back to top
View user's profile Send private message
ParagChouguley

Active User


Joined: 03 Feb 2007
Posts: 175
Location: PUNE(INDIA)

PostPosted: Mon Jan 21, 2008 7:13 pm
Reply with quote

Hi Frank and others,
I too have a requirement which is an extension to Claes's requirement.
Reffering to output of Frank's original job's output given by Claes:
Quote:

Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4

Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3

the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3DOLD
KEY4DATA4INEW



Now, its clear that the job gives records inserted, deleted and changed with previous and chnaged data or whatever. But for the records changed, I want which part of data exactly in changed

Now here is my original requirement.
I want to comapre 2 same DB2 tables from different regions. I will provide 3 files as input

First File Contains:
Code:

Name of Field         Key   Starts   Length
Field1                P       1          10
Field2                Y       11        10
Field3                Y       21        10
Field4                N       31        10
Field5                P       41        10


Now here are the meanigs of Key values.
P = Its one of the column of combination of Primary Key of DB2 table.
Y = I want to compare this field for differences.
N = Don't take this field for comparison.

Second and third file contains unload of 2 tables from diff regions. As they are unload of same table, they have same layout (structure). The starting byte and length of fields are mentioned in file1. Do you really need the datatype of these fields ???

Now, I need which rows are inserted, deleted and changed.
I'm fine with the way inserted and deleted rows are getting from Frank's job. But for changed rows I want something different.
So here is sample output

Code:

Primary Key Comb Field1 Field2 Field3 Field4 Field5 --> Header
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff INSERT --from new file
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff INSERT --from new file
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff DELETE --from old file
Primary Key value  ----  yyyy   zzzzz ----  ffffffff CHANGE --from old file
-----------------  ----  SSSS   ASDAAA ---- -------CHANGE --frm new file


So a changed record will have 2 rows with dashes indicating no change in that field and values will indicate they have changes.

Also I want a header as specified.

--Parag
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts Compare only first records of the fil... SYNCSORT 7
No new posts Compare two files with a key and writ... SYNCSORT 3
No new posts Compare latest 2 rows of a table usin... DB2 1
No new posts How to compare two rows of same table DB2 11
No new posts Merging 2 datasets into one DFSORT/ICETOOL 1
Search our Forums:

Back to Top