SPLICE/JOINKEYS question

David Eisenberg · New User Joined: 15 Nov 2007 Posts: 39 Location: New York

Frank,

Some time ago, you helped us to solve a problem using SPLICE. The requirements for this problem have changed somewhat, and I'm wondering if JOINKEYS can help us do this more efficiently (there are millions of records involved). I apologize for the lengthy exposition.

There are two input files, F1 and F2. RECFM=VB, LRECL=2000. Both files have the same structure: key begins in column 5 (immediately after the RDW), keylength=20. Each file may contain duplicate keys. The rules for the output file are as follows: for any given key, if the key is present only in F1 or only in F2, we want all records with that key kept in the output file. If the key is present in both files, we want all occurrences of the records from F1 kept for that key, and none from F2 kept. The resulting file must preserve order in the case of duplicate keys.
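To make the rules above concrete, here is a minimal Python reference model (not DFSORT, and not a performance proposal). It assumes each record is represented as a string holding only the data portion, so the 20-byte key sits at the start of the string and the RDW is not modelled. The names `key_of` and `expected_output` are made up for this illustration.

```python
from typing import Iterable, List

KEY_LEN = 20  # key length stated in the post


def key_of(record: str) -> str:
    """Return the key: the first 20 characters of the data portion."""
    return record[:KEY_LEN]


def expected_output(f1: Iterable[str], f2: Iterable[str]) -> List[str]:
    """Keep every F1 record; keep an F2 record only if its key is absent from F1."""
    f1 = list(f1)
    f1_keys = {key_of(r) for r in f1}
    kept = f1 + [r for r in f2 if key_of(r) not in f1_keys]
    # sorted() is stable, so records with equal keys keep their input order.
    return sorted(kept, key=key_of)
```

A model like this is only useful for checking the output of a small test run against the stated rules; it holds everything in memory and would not be suitable for millions of records.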

Here's the SPLICE solution you provided, which works perfectly:

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//F1 DD DSN=... input file1 (VB/2000)
//F2 DD DSN=... input file2 (VB/2000)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//OUT DD DSN=... output file (VB/2000)
//TOOLIN DD *
COPY FROM(F1) TO(T1) USING(CTL1)
COPY FROM(F2) TO(T1) USING(CTL2)
SPLICE FROM(T1) TO(OUT) ON(7,20,CH) KEEPNODUPS KEEPBASE VLENOVLY -
  WITHALL WITH(5,1) WITH(7,1996) USING(CTL3)
/*
//CTL1CNTL DD *
  INREC BUILD=(1,4,5:C'BB',7:5)
/*
//CTL2CNTL DD *
  INREC BUILD=(1,4,5:C'VV',7:5)
/*
//CTL3CNTL DD *
  OUTFIL FNAMES=OUT,OMIT=(5,2,CH,EQ,C'VB'),
    BUILD=(1,4,5:7)
/*
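For readers trying to follow why the 'BB'/'VV' markers work, here is a simplified Python model of the job above. It is a sketch under assumptions, not a description of ICETOOL internals: records are strings without the RDW, the sort is assumed to preserve the input order of equal keys (F1 records were written to T1 first), and the first record of each key group is treated as the base record into which the marker's first byte and the data of each later record are overlaid. The name `splice_model` is made up for this illustration.

```python
from itertools import groupby
from typing import List

KEY_LEN = 20  # key length stated in the post


def splice_model(f1: List[str], f2: List[str]) -> List[str]:
    # CTL1 / CTL2: prefix a 2-byte marker, 'BB' for F1 and 'VV' for F2.
    t1 = ["BB" + r for r in f1] + ["VV" + r for r in f2]

    def key(r: str) -> str:
        return r[2:2 + KEY_LEN]

    out = []
    # Stable sort: within a key, F1 records stay ahead of F2 records.
    for _, group in groupby(sorted(t1, key=key), key=key):
        base, *overlays = list(group)
        out.append(base)  # the base (or only) record of the group is kept
        for ov in overlays:
            # Marker byte 1 and the data come from the overlay record;
            # marker byte 2 is left over from the base record.
            out.append(ov[0] + base[1] + ov[2:])
    # CTL3: drop records marked 'VB' (an F2 record spliced onto an F1 base),
    # then strip the 2-byte marker.
    return [r[2:] for r in out if r[:2] != "VB"]
```

In this model an F2 record that shares a key with an F1 record always ends up marked 'VB' and is dropped, while F1-only groups stay 'BB' and F2-only groups stay 'VV'.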

Now here's our question. F2 is much larger than F1, and F2 is guaranteed to be sorted in advance. Given that, we're wondering if we can improve performance by accomplishing this same task without doing a SPLICE (which always does a sort of the files concatenated together). I guess what I want is a full outer JOIN, but I couldn't figure out how to handle the duplicate key situation, nor could I see how to fulfill the requirement to keep only the F1 records if there were any matching F2 records.
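The idea in the question (sort only F1, then take advantage of F2 already being in key order) can be sketched in Python as a one-pass merge over key groups. This is only an illustration of the logic, under the assumptions that records are strings with the 20-byte key at the start and that F2 really is in ascending key order; it says nothing about how DFSORT would perform. The names `key_of` and `merge_prefer_f1` are made up for this sketch.

```python
from itertools import groupby
from typing import Iterable, Iterator

KEY_LEN = 20  # key length stated in the post


def key_of(record: str) -> str:
    return record[:KEY_LEN]


def merge_prefer_f1(f1: Iterable[str], f2_sorted: Iterable[str]) -> Iterator[str]:
    """Merge two key-ordered streams; on a key match, emit only the F1 group."""
    # Only F1 is sorted here (stably); F2 is read as-is in a single pass.
    g1 = groupby(sorted(f1, key=key_of), key=key_of)
    g2 = groupby(f2_sorted, key=key_of)
    k1, recs1 = next(g1, (None, None))
    k2, recs2 = next(g2, (None, None))
    while k1 is not None or k2 is not None:
        if k2 is None or (k1 is not None and k1 <= k2):
            yield from recs1              # F1-only key, or key present in both
            if k1 == k2:
                k2, recs2 = next(g2, (None, None))  # skip the matching F2 group
            k1, recs1 = next(g1, (None, None))
        else:
            yield from recs2              # F2-only key
            k2, recs2 = next(g2, (None, None))
```

The same loop covers the no-duplicates case, where every group simply holds one record.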

And there's one more question: we have some cases where the requirements are exactly the same, but where we know that F1 will have no duplicates, nor will F2. In this case, what's the most efficient approach?

Thank you so much...

David

Frank Yaeger · Posted: Wed Dec 16, 2009 6:47 am

For the case where you can have duplicates, please show an example of the records in each input file and what you expect for output.

For the case where you cannot have duplicates, please show an example of the records in each input file and what you expect for output.

Do you have the Nov, 2009 DFSORT PTF installed that supports JOINKEYS?

David Eisenberg

Frank,

Yes, we have the JOINKEYS PTF installed. (It is very cool; thank you!)

>For the case where you can have duplicates, please show an example of the records in each input file and what you expect for output.<

In all examples below, RECFM=VB, LRECL=2000, keylen=20.

FILE1:
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
A...................01FROM FILE 1
E...................04FROM FILE 1 XXXXXXXXXXXXXX
H...................06FROM FILE 1 XXXX
H...................05FROM FILE 1 XXXXXX XXXXXXXXXXX

FILE2 (will be sorted in advance):
B...................07FROM FILE 2
D...................08FROM FILE 2 X
D...................14FROM FILE 2 XXXXXXXXXXXXX
F...................09FROM FILE 2 XXXXXXXXXX
F...................10FROM FILE 2 XXX
H...................11FROM FILE 2 XXX
H...................12FROM FILE 2 XXXXXXXX
H...................13FROM FILE 2

SORTOUT:
A...................01FROM FILE 1
B...................07FROM FILE 2
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
E...................04FROM FILE 1 XXXXXXXXXXXXXX
F...................09FROM FILE 2 XXXXXXXXXX
F...................10FROM FILE 2 XXX
H...................06FROM FILE 1 XXXX
H...................05FROM FILE 1 XXXXXX XXXXXXXXXXX

>For the case where you cannot have duplicates, please show an example of the records in each input file and what you expect for output.<

FILE1:
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
A...................01FROM FILE 1
H...................06FROM FILE 1 XXXX

FILE2 (will be sorted in advance):
B...................07FROM FILE 2
D...................08FROM FILE 2 X
F...................09FROM FILE 2 XXXXXXXXXX
H...................11FROM FILE 2 XXX

SORTOUT:
A...................01FROM FILE 1
B...................07FROM FILE 2
D...................02FROM FILE 1 XXXXXXXXX
E...................03FROM FILE 1 XXXXX
F...................09FROM FILE 2 XXXXXXXXXX
H...................06FROM FILE 1 XXXX

Please let me know if I can provide any further information. Thank you!

David

David Eisenberg

Frank,

Just so there's no confusion... in my second example above, it's only coincidence that both FILE1 and FILE2 have the same number of records (4). In reality, regardless of which scenario we're processing (i.e., duplicate keys or unique keys), there will be millions of records in both FILE1 and FILE2, FILE2 is guaranteed to be sorted ahead of time, and FILE2 will be much larger than FILE1. Hence our interest in finding something more efficient than the SPLICE solution at the beginning of this thread, presumably via the new features available in the latest PTF.

David

Frank Yaeger · Posted: Thu Dec 17, 2009 2:44 am

Doing what you want with VB files is rather complicated because of the variable parts of each record. You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.
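One way to picture the tail problem, as a Python sketch rather than anything DFSORT-specific: if a joined record is simply the key followed by the variable-length F1 data and then the variable-length F2 data, the boundary between the two tails is lost. Giving the F1 tail a fixed-width slot is one possible way around that, at the cost of padding. The function names and the slot approach are assumptions made for this illustration, not the method used in the thread.

```python
from typing import Tuple


def join_naive(key: str, tail1: str, tail2: str) -> str:
    """Concatenate both variable-length tails directly after the key."""
    return key + tail1 + tail2


def join_slotted(key: str, tail1: str, tail2: str, slot: int) -> str:
    """Pad the F1 tail to a fixed-width slot so the F2 tail has a known start."""
    if len(tail1) > slot:
        raise ValueError("F1 tail does not fit in the slot")
    return key + tail1.ljust(slot) + tail2


def split_slotted(record: str, key_len: int, slot: int) -> Tuple[str, str, str]:
    """Recover (key, F1 tail, F2 tail) from a slotted record."""
    key = record[:key_len]
    # Caveat: rstrip() also removes trailing blanks that were real F1 data.
    tail1 = record[key_len:key_len + slot].rstrip()
    tail2 = record[key_len + slot:]
    return key, tail1, tail2
```

With the naive form, `join_naive("K", "AB", "C")` and `join_naive("K", "A", "BC")` give the same record, so the original tails cannot be recovered; the slotted form keeps them apart.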

Here's what I came up with - I don't know if it will be better than the SPLICE solution or not. You could try them both to find out.

David Eisenberg

>You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.<

Frank,

Thank you very much for your solution. I absolutely see your point regarding the difficulty in doing this in a single pass.

I did try an experiment, however, to attempt a single-pass solution at least in the case where there are no duplicates in either F1 or F2. For demonstration (and sanity) purposes, I used LRECL=30 for both F1 and F2, and I allocated a VB SORTOUT dataset with an LRECL=65 (i.e., long enough to hold two VB records plus an indicator byte). I kept the keylength at 20.

Here's F1:

Frank Yaeger · Posted: Thu Dec 17, 2009 11:51 pm

That OUTREC statement is not correct. You need to include the RDW, e.g.
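As background on the RDW itself (this is not the OUTREC statement referred to above): for a variable-length record, the record descriptor word is the first 4 bytes, consisting of a 2-byte big-endian length that counts the RDW too, followed by 2 bytes of zeros for an unspanned record. That is why the data, and the key in this thread, starts in column 5. A small Python sketch, with `add_rdw` and `split_rdw` as made-up helper names:

```python
import struct
from typing import Tuple


def add_rdw(data: bytes) -> bytes:
    """Prefix data with a 4-byte RDW: length (including the RDW), then zeros."""
    return struct.pack(">HH", len(data) + 4, 0) + data


def split_rdw(record: bytes) -> Tuple[int, bytes]:
    """Return (record length from the RDW, data portion after the RDW)."""
    length, _reserved = struct.unpack(">HH", record[:4])
    return length, record[4:length]
```

A record built without those first 4 bytes has no valid length field, which is the kind of problem the correction above is pointing at.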

David Eisenberg

>I'm on vacation, so I really don't have time to experiment with this.<

Oy... I'm sorry! Please go back to the beach (or slopes, or whatever).

You've convinced me that an E35 exit is the only way to go. That's guaranteed to work; hopefully the JOINKEYS plus the exit will be faster than the original SPLICE.

Thank you, and enjoy your vacation!

David

Frank Yaeger · Posted: Fri Dec 18, 2009 5:53 am

Nothing to be sorry about - it's a "staycation", but I am trying NOT to work so I can recharge my battery.