IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

SPLICE/JOINKEYS question


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
David Eisenberg

New User


Joined: 15 Nov 2007
Posts: 39
Location: New York

PostPosted: Wed Dec 16, 2009 12:44 am
Reply with quote

Frank,

Some time ago, you helped us to solve a problem using SPLICE. The requirements for this problem have changed somewhat, and I'm wondering if JOINKEYS can help us do this more efficiently (there are millions of records involved). I apologize for the lengthy exposition.

There are two input files, F1 and F2. RECFM=VB, LRECL=2000. Both files have the same structure: key begins in column 5 (immediately after the RDW), keylength=20. Each file may contain duplicate keys. The rules for the output file are as follows: for any given key, if the key is present only in F1 or only in F2, we want all records with that key kept in the output file. If the key is present in both files, we want all occurrences of the records from F1 kept for that key, and none from F2 kept. The resulting file must preserve order in the case of duplicate keys.

Here's the SPLICE solution you provided, which works perfectly:

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//F1 DD DSN=... input file1 (VB/2000)
//F2 DD DSN=... input file2 (VB/2000)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//OUT DD DSN=... output file (VB/2000)
//TOOLIN DD *
COPY FROM(F1) TO(T1) USING(CTL1)
COPY FROM(F2) TO(T1) USING(CTL2)
SPLICE FROM(T1) TO(OUT) ON(7,20,CH) KEEPNODUPS KEEPBASE VLENOVLY-
WITHALL WITH(5,1) WITH(7,1996) USING(CTL3)
/*
//CTL1CNTL DD *
INREC BUILD=(1,4,5:C'BB',7:5)
/*
//CTL2CNTL DD *
INREC BUILD=(1,4,5:C'VV',7:5)
/*
//CTL3CNTL DD *
OUTFIL FNAMES=OUT,OMIT=(5,2,CH,EQ,C'VB'),
BUILD=(1,4,5:7)
/*

Now here's our question. F2 is much larger than F1, and F2 is guaranteed to be sorted in advance. Given that, we're wondering if we can improve performance by accomplishing this same task without doing a SPLICE (which always does a sort of the files concatenated together). I guess what I want is a full outer JOIN, but I couldn't figure out how to handle the duplicate key situation, nor could I see how to fulfill the requirement to keep only the F1 records if there were any matching F2 records.

And there's one more question: we have some cases where the requirements are exactly the same, but where we know that F1 will have no duplicates, nor will F2. In this case, what's the most efficient approach?

Thank you so much...

David
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Wed Dec 16, 2009 6:47 am
Reply with quote

For the case where you can have duplicates, please show an example of the records in each input file and what you expect for output.

For the case where you cannot have duplicates, please show an example of the records in each input file and what you expect for output.

Do you have the Nov, 2009 DFSORT PTF installed that supports JOINKEYS?
Back to top
View user's profile Send private message
David Eisenberg

New User


Joined: 15 Nov 2007
Posts: 39
Location: New York

PostPosted: Wed Dec 16, 2009 9:29 am
Reply with quote

Frank,

Yes, we have the JOINKEYS PTF installed. (It is very cool; thank you!)

>For the case where you can have duplicates, please show an example of the records in each input file and what you expect for output.<

In all examples below, RECFM=VB, LRECL=2000, keylen=20.

FILE1:
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
A...................01FROM FILE 1
E...................04FROM FILE 1 XXXXXXXXXXXXXX
H...................06FROM FILE 1 XXXX
H...................05FROM FILE 1 XXXXXX XXXXXXXXXXX

FILE2 (will be sorted in advance):
B...................07FROM FILE 2
D...................08FROM FILE 2 X
D...................14FROM FILE 2 XXXXXXXXXXXXX
F...................09FROM FILE 2 XXXXXXXXXX
F...................10FROM FILE 2 XXX
H...................11FROM FILE 2 XXX
H...................12FROM FILE 2 XXXXXXXX
H...................13FROM FILE 2

SORTOUT:
A...................01FROM FILE 1
B...................07FROM FILE 2
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
E...................04FROM FILE 1 XXXXXXXXXXXXXX
F...................09FROM FILE 2 XXXXXXXXXX
F...................10FROM FILE 2 XXX
H...................06FROM FILE 1 XXXX
H...................05FROM FILE 1 XXXXXX XXXXXXXXXXX

>For the case where you cannot have duplicates, please show an example of the records in each input file and what you expect for output.<

FILE1:
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
A...................01FROM FILE 1
H...................06FROM FILE 1 XXXX

FILE2 (will be sorted in advance):
B...................07FROM FILE 2
D...................08FROM FILE 2 X
F...................09FROM FILE 2 XXXXXXXXXX
H...................11FROM FILE 2 XXX

SORTOUT:
A...................01FROM FILE 1
B...................07FROM FILE 2
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
F...................09FROM FILE 2 XXXXXXXXXX
H...................06FROM FILE 1 XXXX

Please let me know if I can provide any further information. Thank you!

David
Back to top
View user's profile Send private message
David Eisenberg

New User


Joined: 15 Nov 2007
Posts: 39
Location: New York

PostPosted: Wed Dec 16, 2009 11:47 pm
Reply with quote

Frank,

Just so there's no confusion... in my second example above, it's only coincidence that both FILE1 and FILE2 have the same number of records (4). In reality, regardless of which scenario we're processing (i.e., multiple keys or unique keys), there will be millions of records in both FILE1 and FILE2, FILE2 is guaranteed to be sorted ahead of time, and FILE2 will be much larger than FILE1. Hence our interest in finding something more efficient than the SPLICE solution at the beginning of this post, presumably via the new features available in the latest PTF.

David
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Thu Dec 17, 2009 2:44 am
Reply with quote

Doing what you want with VB files is rather complicated because of the variable parts of each record. You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.

Here's what I came up with - I don't know if it will be better than the SPLICE solution or not. You could try them both to find out.

Code:

//S2    EXEC  PGM=ICETOOL
//TOOLMSG   DD  SYSOUT=*
//DFSMSG   DD  SYSOUT=*
//IN1 DD DSN=... input file1 (VB/2000)
//IN2 DD DSN=... input file2 (VB/2000)
//TEMP1 DD DSN=&&TEMP1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//OUT DD DSN=...  output file (VB/2000)
//TOOLIN DD *
COPY JKFROM TO(TEMP1) USING(CTL1)
COPY JKFROM TO(TEMP1) USING(CTL2)
SORT FROM(TEMP1) TO(OUT) USING(CTL3)
//CTL1CNTL DD *
  JOINKEYS F1=IN1,FIELDS=(5,20,A),TASKID=I1
  JOINKEYS F2=IN2,FIELDS=(5,20,A),SORTED,TASKID=I1
  JOIN UNPAIRED,F1,F2,ONLY
/*
//CTL2CNTL DD *
  JOINKEYS F1=IN1,FIELDS=(5,20,A),TASKID=I2
  JOINKEYS F2=IN2,FIELDS=(5,20,A),TASKID=I2
  REFORMAT FIELDS=(F1:1,4,5)
//I2F2CNTL DD *
  SUM FIELDS=NONE
//CTL3CNTL DD *
  SORT FIELDS=(5,20,CH,A)
/*


In the case where you don't have duplicates in F2, you can use the following in CTL2CNTL:

Code:

  JOINKEYS F2=IN2,FIELDS=(5,20,A),TASKID=I2,SORTED


and you don't need:

Code:

//I2F2CNTL DD *
  SUM FIELDS=NONE


which could improve performance.
Back to top
View user's profile Send private message
David Eisenberg

New User


Joined: 15 Nov 2007
Posts: 39
Location: New York

PostPosted: Thu Dec 17, 2009 9:25 pm
Reply with quote

>You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.<

Frank,

Thank you very much for your solution. I absolutely see your point regarding the difficulty in doing this in a single pass.

I did try an experiment, however, to attempt a single-pass solution at least in the case where there are no duplicates in either F1 or F2. For demonstration (and sanity) purposes, I used LRECL=30 for both F1 and F2, and I allocated a VB SORTOUT dataset with an LRECL=65 (i.e., long enough to hold two VB records plus an indicator byte). I kept the keylength at 20.

Here's F1:
Code:
A000000000000000000011---
C000000000000000000012-
D000000000000000000013----
F000000000000000000014
J000000000000000000015----

Here's F2:
Code:
A000000000000000000021+++
B000000000000000000022+
D000000000000000000023++++
E000000000000000000024++++
F000000000000000000025
J000000000000000000026++

Then I did this:

Code:
JOINKEYS FILE=F1,FIELDS=(5,20,A)
JOINKEYS FILE=F2,FIELDS=(5,20,A),SORTED
JOIN UNPAIRED,F1,F2
REFORMAT FIELDS=(F1:1,4,?,F1:1,30,F2:1)

That gave me this reformatted file (I've shown the RDWs in hex):

Code:
B<x001D0000>A000000000000000000011--- <x001D0000>A000000000000000000021+++
2<x00000000>                          <x001B0000>B000000000000000000022+
1<x001B0000>C000000000000000000012-
B<x001E0000>D000000000000000000013----<x001E0000>D000000000000000000023++++
2<x00000000>                          <x001E0000>E000000000000000000024++++
B<x001A0000>F000000000000000000014    <x001A0000>F000000000000000000025
B<x001E0000>J000000000000000000015----<x001C0000>J000000000000000000026++ 

By forcing the F2 records to begin at a fixed position in the reformatted file, I had hoped that I could write an OUTREC statement to BUILD the final records I want, and force the correct RDW in each case. I tried this:

Code:
OUTREC IFTHEN=(WHEN=(5,1,CH,EQ,C'2'),BUILD=(1:36)),
       IFTHEN=(WHEN=(5,1,CH,NE,C'2'),BUILD=(1:6)) 

But that produces this error message:

Code:
ICE126A 2 INCONSISTENT *OUTREC  IFTHEN 2 REFORMATTING FIELD FOUND

presumably because I can't build a VB record starting in column 1. If there's a way to make the OUTREC work, that would be the answer (in case the case of no duplicates in F1 or F2).

Even if I can't get an OUTREC to work, however, then I can presumably write an E35 exit to do essentially do the same thing. Even with the overhead of the exit, it might still end up being more efficient than the original SPLICE solution.

David
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Thu Dec 17, 2009 11:51 pm
Reply with quote

That OUTREC statement is not correct. You need to include the RDW, e.g.

Code:

   OUTREC IFTHEN=(WHEN=(5,1,CH,EQ,C'2'),BUILD=(1,4,36)),
       IFTHEN=(WHEN=(5,1,CH,NE,C'2'),BUILD=(1,4,6)) 


But I don't think that will give you what you want for the B records. You'll get the extra bytes from the F2 record for 'B' and trailing blanks that you didn't start with.

If you used BUILD=(1,36) for the NE,C'2' part, you'd get the fixed F1 records with trailing blanks that you didn't start with. Perhaps you could use an OUTFIL statement with VLTRIM=C' ' to remove the trailing blanks?

I'm on vacation, so I really don't have time to experiment with this. Good luck.
Back to top
View user's profile Send private message
David Eisenberg

New User


Joined: 15 Nov 2007
Posts: 39
Location: New York

PostPosted: Fri Dec 18, 2009 12:43 am
Reply with quote

>I'm on vacation, so I really don't have time to experiment with this.<

Oy... I'm sorry! Please go back to the beach (or slopes, or whatever).

You've convinced me that an E35 exit is the only way to go. That's guaranteed to work; hopefully the JOINKEYS plus the exit will be faster than the original SPLICE.

Thank you, and enjoy your vacation!

David
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Fri Dec 18, 2009 5:53 am
Reply with quote

Nothing to be sorry about - it's a "staycation", but I am trying NOT to work so I can recharge my battery.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts Joinkeys - 5 output files DFSORT/ICETOOL 7
No new posts PD not working for unsigned packed JO... DFSORT/ICETOOL 5
No new posts Def PD not working for unsigned packe... JCL & VSAM 3
No new posts Join multiple records using splice DFSORT/ICETOOL 5
No new posts Question for file manager IBM Tools 7
Search our Forums:

Back to Top