Some time ago, you helped us to solve a problem using SPLICE. The requirements for this problem have changed somewhat, and I'm wondering if JOINKEYS can help us do this more efficiently (there are millions of records involved). I apologize for the lengthy exposition.
There are two input files, F1 and F2. RECFM=VB, LRECL=2000. Both files have the same structure: key begins in column 5 (immediately after the RDW), keylength=20. Each file may contain duplicate keys. The rules for the output file are as follows: for any given key, if the key is present only in F1 or only in F2, we want all records with that key kept in the output file. If the key is present in both files, we want all occurrences of the records from F1 kept for that key, and none from F2 kept. The resulting file must preserve order in the case of duplicate keys.
Here's the SPLICE solution you provided, which works perfectly:
Now here's our question. F2 is much larger than F1, and F2 is guaranteed to be sorted in advance. Given that, we're wondering if we can improve performance by accomplishing this same task without doing a SPLICE (which always does a sort of the files concatenated together). I guess what I want is a full outer JOIN, but I couldn't figure out how to handle the duplicate key situation, nor could I see how to fulfill the requirement to keep only the F1 records if there were any matching F2 records.
And there's one more question: we have some cases where the requirements are exactly the same, but where we know that F1 will have no duplicates, nor will F2. In this case, what's the most efficient approach?
Just so there's no confusion... in my second example above, it's only coincidence that both FILE1 and FILE2 have the same number of records (4). In reality, regardless of which scenario we're processing (i.e., multiple keys or unique keys), there will be millions of records in both FILE1 and FILE2, FILE2 is guaranteed to be sorted ahead of time, and FILE2 will be much larger than FILE1. Hence our interest in finding something more efficient than the SPLICE solution at the beginning of this post, presumably via the new features available in the latest PTF.
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Doing what you want with VB files is rather complicated because of the variable parts of each record. You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.
Here's what I came up with - I don't know if it will be better than the SPLICE solution or not. You could try them both to find out.
>You can't really do it in one pass because of the problem of separating out the F1 tail from the F2 tail for paired records.<
Frank,
Thank you very much for your solution. I absolutely see your point regarding the difficulty in doing this in a single pass.
I did try an experiment, however, to attempt a single-pass solution at least in the case where there are no duplicates in either F1 or F2. For demonstration (and sanity) purposes, I used LRECL=30 for both F1 and F2, and I allocated a VB SORTOUT dataset with an LRECL=65 (i.e., long enough to hold two VB records plus an indicator byte). I kept the keylength at 20.
By forcing the F2 records to begin at a fixed position in the reformatted file, I had hoped that I could write an OUTREC statement to BUILD the final records I want, and force the correct RDW in each case. I tried this:
ICE126A 2 INCONSISTENT *OUTREC IFTHEN 2 REFORMATTING FIELD FOUND
presumably because I can't build a VB record starting in column 1. If there's a way to make the OUTREC work, that would be the answer (in case the case of no duplicates in F1 or F2).
Even if I can't get an OUTREC to work, however, then I can presumably write an E35 exit to do essentially do the same thing. Even with the overhead of the exit, it might still end up being more efficient than the original SPLICE solution.
But I don't think that will give you what you want for the B records. You'll get the extra bytes from the F2 record for 'B' and trailing blanks that you didn't start with.
If you used BUILD=(1,36) for the NE,C'2' part, you'd get the fixed F1 records with trailing blanks that you didn't start with. Perhaps you could use an OUTFIL statement with VLTRIM=C' ' to remove the trailing blanks?
I'm on vacation, so I really don't have time to experiment with this. Good luck.
>I'm on vacation, so I really don't have time to experiment with this.<
Oy... I'm sorry! Please go back to the beach (or slopes, or whatever).
You've convinced me that an E35 exit is the only way to go. That's guaranteed to work; hopefully the JOINKEYS plus the exit will be faster than the original SPLICE.