Portal | Manuals | References | Downloads | Info | Programs | JCLs | Master the Mainframes
IBM Mainframe Computers Forums Index
 
Register
 
IBM Mainframe Computers Forums Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Usergroups Profile Log in to check your private messages Log in
 

 

Compare datasets - can this be further improved?

 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL
View previous topic :: :: View next topic  
Author Message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Thu May 10, 2007 5:54 pm    Post subject: Compare datasets - can this be further improved?
Reply with quote

Once upon a time, Frank helped me solving the following task:

Compare two datasets with a common key, and extract those being new (in dataset2, but not dataset1), the deleted ones (in dataset2, but not dataset1) and the updated ones (where the data was changed from dataset1 to dataset2). So if we had these data:

Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4

Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3

the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3DOLD
KEY4DATA4INEW

Where U=Update, I=Insert, D=Delete. Note, that KEY1 doesn't change from dataset1 to dataset2, so it's not represented in the output.

Frank came up with this piece of code:
Code:

//ICETOOL EXEC PGM=ICETOOL       
//OLD          DD dataset1
//NEW          DD dataset2
//T1           DD DSN=&&T1,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T2           DD DSN=&&T2,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T3           DD DSN=&&T3,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//T4           DD DSN=&&T4,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON1         DD DSN=*.T1,VOL=REF=*.T1,DISP=(OLD,DELETE)
//             DD DSN=*.T2,VOL=REF=*.T2,DISP=(OLD,DELETE)
//INS          DD DSN=&&INS,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//DEL          DD DSN=&&DEL,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//UPD          DD DSN=&&UPD,SPACE=(CYL,(50,10)),DISP=(NEW,PASS)
//CON2         DD DSN=*.DEL,VOL=REF=*.DEL,DISP=(OLD,DELETE)
//             DD DSN=*.INS,VOL=REF=*.INS,DISP=(OLD,DELETE)
//             DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)
//RESULT       DD outputdsn
//TOOLMSG      DD SYSOUT=*
//DFSMSG       DD SYSOUT=*
//TOOLIN DD *
* Add 'OLD' to the OLD records                         
  COPY FROM(OLD) TO(T1) USING(CTL1)                   
* Add 'NEW' to the NEW records                         
  COPY FROM(NEW) TO(T2) USING(CTL2)                   
* Remove the unchanged records, keeping only the       
* deleted, inserted and updated records in T3.         
  SELECT FROM(CON1) TO(T3) ON(1,reclgh,CH) NODUPS
* Write the deleted records ('OLD') to DEL and add 'D'.
* Write the inserted records ('NEW') to INS and add 'I'.
  SELECT FROM(T3) TO(DEL) ON(1,keylgh,CH) NODUPS USING(CTL3)                                 
* Write the updated records to UPD and add 'U'.       
  SELECT FROM(T3) TO(UPD) ON(1,keylgh,CH) ALLDUPS USING(CTL4)
* Sort the records together on the key.
  SORT FROM(CON2) TO(RESULT) USING(CTL5)
//CTL1CNTL DD *                           
  OUTFIL FNAMES=T1,OUTREC=(1,reclgh,reclgh+2:C'OLD')
//CTL2CNTL DD *                           
  OUTFIL FNAMES=T2,OUTREC=(1,reclgh,reclgh+2:C'NEW')
//CTL3CNTL DD *               
  OUTFIL FNAMES=DEL,             
    INCLUDE=(reclgh+2,3,CH,EQ,C'OLD'),
    OUTREC=(1,reclgh,C'D',reclgh+2,3) 
  OUTFIL FNAMES=INS,             
    INCLUDE=(reclgh+2,3,CH,EQ,C'NEW'),
    OUTREC=(1,reclgh,C'D',reclgh+2,3) 
//CTL4CNTL DD *               
  OUTFIL FNAMES=UPD,             
    OUTREC=(1,reclgh,C'U',reclgh+2,3) 
//CTL5CNTL DD *                   
  SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)

Using the newest stuff in ICETOOL, I improved it, here's the code:
Code:

//ICETOOL EXEC PGM=ICETOOL   
//OLD          DD dataset1
//NEW          DD dataset2
//T1           DD DSN=&&T1,SPACE=(CYL,(200,100),RLSE),DISP=(MOD,PASS)
//T2           DD DSN=&&T2,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,DELETE)
//IAD          DD DSN=&&IAD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//UPD          DD DSN=&&UPD,SPACE=(CYL,(200,100),RLSE),DISP=(NEW,PASS)
//CON          DD DSN=*.IAD,VOL=REF=*.IAD,DISP=(OLD,DELETE)           
//             DD DSN=*.UPD,VOL=REF=*.UPD,DISP=(OLD,DELETE)       
//RESULT       DD outputdsn
//TOOLMSG      DD SYSOUT=*
//DFSMSG       DD SYSOUT=*
//TOOLIN DD *
* Add 'DOLD' to the OLD records                         
  COPY FROM(IN1) TO(T1) USING(CTL1)             
* Add 'INEW' to the NEW records                                               
  COPY FROM(IN2) TO(T1) USING(CTL2)                                 
* Remove the unchanged records, keeping only the       
* deleted, inserted and updated records in T2.         
  SELECT FROM(T1) TO(T2) ON(1,reclgh,CH) NODUPS                       
* Write the deleted records ('OLD') to DEL. No need to add 'D'.
* Write the inserted records ('NEW') to INS. No need to add 'I'.
* The discarded (duplicate) records have an 'U' overlayed.
  SELECT FROM(T2) TO(IAD) ON(1,keylgh,CH) NODUPS DISCARD(UPD) USING(CTL3)
* Sort the records together on the key.
  SORT FROM(CON) TO(OUT) USING(CTL4)                                 
//CTL1CNTL DD *                                                     
  INREC BUILD=(1,reclgh,C'DOLD')                                       
//CTL2CNTL DD *                                                     
  INREC BUILD=(1,reclgh,C'INEW')                                       
//CTL3CNTL DD *                                                     
  OUTFIL FNAMES=IAD                                                 
  OUTFIL FNAMES=UPD,OVERLAY=(reclgh+1:C'U')                             
//CTL4CNTL DD *                                                     
  SORT FIELDS=(1,keylgh,CH,A,reclgh+1,4,CH,A)

This code is a lot shorter, and it also reduces CPU usage by approx. 3%.

My question is: Can it be improved even further? I experimented a bit using SPLICE, but couldn't get it right. I thought, maybe some of you experts have a great idea? icon_biggrin.gif
Back to top
View user's profile Send private message

Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 12:10 am    Post subject:
Reply with quote

Claes,

Do you have duplicates in dataset 1 &/or 2 ?

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 3:38 am    Post subject:
Reply with quote

Hi Alain,

Thanks, I forgot to mention that duplicates are NOT allowed. All keys must be unique, and also the two datasets must be equally long.
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 1:52 pm    Post subject:
Reply with quote

Claes,

One more question :

the field to compare is really 5 in length ?

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 2:22 pm    Post subject:
Reply with quote

I'm not sure what you mean..?
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Fri May 11, 2007 2:46 pm    Post subject:
Reply with quote

Oh I see, I mean DATA1, DATA2 ...
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Fri May 11, 2007 2:56 pm    Post subject:
Reply with quote

Ah ok, well this is just an example. In the example I give, the keylgh is 4 and the reclgh is 9, but - of course - the job should be able to handle any keylgh and reclgh.
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Sat May 12, 2007 1:28 am    Post subject:
Reply with quote

Claes,

You can test this one
Code:

//STEP0001 EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD *
CLAES NORREEN
/*
//         DD *
KEY1DATA1
KEY2DATA2
KEY4DATA4
/*
//         DD *
CLAES NORREEN
/*
//         DD *
KEY1DATA1
KEY2DATA5
KEY3DATA3
/*
//TOOLIN   DD *
  SELECT FROM(IN) TO(OUT1) USNG(ICE0) ON(1,9,CH) NODUPS
  SPLICE FROM(OUT1) TO(OUTX) USING(ICE1) ON(1,4,CH) WITH(1,85) -
         KEEPBASE KEEPNODUPS
/*
//OUT1     DD DSN=&&OUT1,SPACE=(CYL,(1,1),DISP=(NEW,DELETE,DELETE)
//OUTX     DD SYSOUT=*
//ICE0CNTL DD *
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(81:SEQNUM,5,ZD)),
        IFTHEN=(WHEN=(1,13,CH,NE,C'CLAES NORREEN'),
                OVERLAY=(86:SEQNUM,5,ZD,
                         81:81,5,ZD,SUB,86,5,ZD,M11,LENGTH=5))
  SORT FIELDS=(1,4,CH,A,5,5,CH,D)
  OUTFIL FNAMES=OUT1,
         OMIT=(1,13,CH,EQ,C'CLAES NORREEN'),
         OVERLAY=(86:SEQNUM,5,ZD,RESTART=(1,4))
/*
//ICE1CNTL DD *
  SORT FIELDS=(1,4,CH,A,86,5,ZD,D)
  OUTFIL FNAMES=OUTX,
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000100001'),
                 OVERLAY=(81:C'INEW')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000100002'),
                 OVERLAY=(81:C'UOLD')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000200001'),
                 OVERLAY=(81:C'DOLD')),
         IFTHEN=(WHEN=(81,10,CH,EQ,C'0000200002'),
                 OVERLAY=(81:C'UNEW')),
         IFOUTLEN=84
/*

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 11:40 am    Post subject:
Reply with quote

Hi Alain,

Thank you very much for your reply and you efforts! Your code works fine on the testdata, but the dummy "CLAES NORREEN" record makes it hard to use in real life, as I will have to make this record equally long with the data records (which selsom are shorter than 80 bytes). Can it be done without the dummy record?
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 11:56 am    Post subject:
Reply with quote

By the way.... If we can avoid the dummy record, the code looks very promising! Should be much faster than my code, as you save two COPY operations and a SORT operation! icon_biggrin.gif
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 12:12 pm    Post subject:
Reply with quote

Now I see the problem..: By inputting both datasets as one, you can't know which is "old" and which is "new" without the DUMMY. You need some kind of seperator between "old" and "new".

I'll do a performance test now... ;-)
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 12:59 pm    Post subject:
Reply with quote

Performance test with 500,000 records with LRECL=1159

My code:
Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP   
53087  99245    .16    .01    2.3  3097K   0     0     0     0     0
         TOTAL TCB CPU TIME=   .16  TOTAL ELAPSED TIME=   2.3       

Your code:
Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP   
 51962  43258    .25    .01    1.5  4441K   0     0     0     0
          TOTAL TCB CPU TIME=   .25  TOTAL ELAPSED TIME=   1.5

Not sure why, but your code seems to use more CPU (although the elaps time is less)... All that calculating must be the reason..?
Back to top
View user's profile Send private message
Alain Benveniste

New User


Joined: 14 Feb 2005
Posts: 88

PostPosted: Mon May 14, 2007 1:03 pm    Post subject:
Reply with quote

Claes,

The 2 records must be present. This method is explained here

http://www-304.ibm.com/jct01004c/systems/support/storage/software/sort/mvs/tricks/index.html
and look 'Include or omit groups of records'
You just need to create one record of your choice in a file and concatenate it as shown in the JCL. This record must be UNIQUE vs the file you want to treat.

Alain
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 1:05 pm    Post subject:
Reply with quote

Franks original code:

Code:

  EXCP   CONN    TCB    SRB  CLOCK   SERV  PG  PAGE  SWAP  VIO
53649  61055    .17    .01    1.9  3217K   0     0     0     0     
         TOTAL TCB CPU TIME=   .17  TOTAL ELAPSED TIME=   2.0       
Back to top
View user's profile Send private message
Claes Norreen

Active User


Joined: 20 Dec 2005
Posts: 137
Location: Denmark

PostPosted: Mon May 14, 2007 1:21 pm    Post subject:
Reply with quote

The major concern at my company is CPU usage, so I need to go with my own code (for now).

Thanks again Alain! :-)
Back to top
View user's profile Send private message
ParagChouguley

Active User


Joined: 03 Feb 2007
Posts: 175
Location: PUNE(INDIA)

PostPosted: Mon Jan 21, 2008 7:13 pm    Post subject:
Reply with quote

Hi Frank and others,
I too have a requirement which is an extension to Claes's requirement.
Reffering to output of Frank's original job's output given by Claes:
Quote:

Dataset1
KEY1DATA1
KEY2DATA2
KEY4DATA4

Dataset2
KEY1DATA1
KEY2DATA5
KEY3DATA3

the output would be:
KEY2DATA2UOLD
KEY2DATA5UNEW
KEY3DATA3DOLD
KEY4DATA4INEW



Now, its clear that the job gives records inserted, deleted and changed with previous and chnaged data or whatever. But for the records changed, I want which part of data exactly in changed

Now here is my original requirement.
I want to comapre 2 same DB2 tables from different regions. I will provide 3 files as input

First File Contains:
Code:

Name of Field         Key   Starts   Length
Field1                P       1          10
Field2                Y       11        10
Field3                Y       21        10
Field4                N       31        10
Field5                P       41        10


Now here are the meanigs of Key values.
P = Its one of the column of combination of Primary Key of DB2 table.
Y = I want to compare this field for differences.
N = Don't take this field for comparison.

Second and third file contains unload of 2 tables from diff regions. As they are unload of same table, they have same layout (structure). The starting byte and length of fields are mentioned in file1. Do you really need the datatype of these fields ???

Now, I need which rows are inserted, deleted and changed.
I'm fine with the way inserted and deleted rows are getting from Frank's job. But for changed rows I want something different.
So here is sample output

Code:

Primary Key Comb Field1 Field2 Field3 Field4 Field5 --> Header
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff INSERT --from new file
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff INSERT --from new file
Primary Key value  xxxx  yyyy   zzzzz aaaa  ffffffff DELETE --from old file
Primary Key value  ----  yyyy   zzzzz ----  ffffffff CHANGE --from old file
-----------------  ----  SSSS   ASDAAA ---- -------CHANGE --frm new file


So a changed record will have 2 rows with dashes indicating no change in that field and values will indicate they have changes.

Also I want a header as specified.

--Parag
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL All times are GMT + 6 Hours
Page 1 of 1

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts REXX to get the all datasets name parasmalik20 CLIST & REXX 5 Tue Jul 18, 2017 9:10 pm
No new posts compare 2 rows within the same file ram_vizag SYNCSORT 7 Wed Jun 14, 2017 12:34 am
No new posts member list compare jzhardy TSO/ISPF 6 Fri May 12, 2017 3:18 pm
This topic is locked: you cannot edit posts or make replies. SDSF multiple spool datasets extracte... PJAlarcon CLIST & REXX 1 Fri Apr 21, 2017 10:50 pm
No new posts Edit large number of datasets (QSAM) zh_lad TSO/ISPF 3 Tue Apr 04, 2017 6:08 pm


Facebook
Back to Top
 
Mainframe Wiki | Forum Rules | Bookmarks | Subscriptions | FAQ | Tutorials | Contact Us