Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
No. That is what your existing code deals with. You need to show what you want to happen when there are duplicate key values within either or both of your input files.
You also show your sample data in key order. Is that correct? If so, specify SORTED on the JOINKEYS statements and get rid of the SORTWKn files altogether.
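SORTED tells JOINKEYS to skip its own sort, so it only makes sense if each file really is in key order. The Python sketch below checks an extract of the data for that. It is not part of DFSORT. It assumes the key is the first six bytes of each record, as in the sample data in this thread. Python compares text in ASCII/Unicode order, which differs from EBCDIC when keys mix letters and digits, so treat it as a rough check for numeric keys.

```python
def in_key_order(records, key_len=6):
    """Return True if the keys (first key_len characters) never decrease.

    key_len=6 is an assumption matching the six-byte sample keys.
    Duplicate keys are allowed; only a key lower than its predecessor fails.
    """
    keys = [rec[:key_len] for rec in records]
    return all(prev <= cur for prev, cur in zip(keys, keys[1:]))


if __name__ == "__main__":
    print(in_key_order(["111111", "222222", "222222", "333333"]))  # True
    print(in_key_order(["111111", "333333", "222222"]))            # False
```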
You've avoided the question about whether your input is in sequence.
If you have to SORT your input data (it happens by default for each JOINKEYS), then what do you have against SUM FIELDS=NONE?
If you don't have to SORT (so you specify SORTED on the JOINKEYS), then you can use SEQNUM with RESTART= for the key, and have INCLUDE= on your first OUTFIL to keep just the first record of each key.
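The SEQNUM/RESTART idea can be modeled in a few lines of Python. This is a sketch of the logic only, not DFSORT syntax. Records are plain strings, and the six-byte key length is an assumption taken from the sample data. The sequence number goes back to 1 whenever the key changes, and the "INCLUDE" keeps only sequence 1.

```python
def first_of_each_key(records, key_len=6):
    """Keep the first record of each key group.

    Models SEQNUM with RESTART= on the key plus an INCLUDE for sequence 1.
    Assumes the records are already in key order (the SORTED case):
    RESTART only resets when the key changes from one record to the next.
    """
    kept = []
    previous_key = None
    seq = 0
    for rec in records:
        key = rec[:key_len]
        # RESTART=: the sequence number goes back to 1 whenever the key changes
        seq = seq + 1 if key == previous_key else 1
        previous_key = key
        if seq == 1:  # the INCLUDE condition: first record of the group
            kept.append(rec)
    return kept


if __name__ == "__main__":
    data = ["111111 A", "111111 B", "222222 C", "333333 D", "333333 E"]
    print(first_of_each_key(data))  # ['111111 A', '222222 C', '333333 D']
```

This is why the approach depends on the input being in sequence. With unsorted input such as `111111`, `222222`, `111111`, the key changes on every record, so all three get sequence 1 and the duplicate survives.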
So your sample data should reflect that: both files unsorted, with duplicates wherever they can occur (in both files?), followed by the output you require.
If you need to SORT for JOINKEYS, the easiest way to get rid of duplicate keys is SUM FIELDS=NONE. That goes in a JNFnCNTL dataset.
The rest of the code then does not need to change, although you could use OUTFIL SAVE on the second OUTFIL, so that all records not written to the first OUTFIL appear on the second.
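Both pieces can be modeled in Python to see what they do to the records. This is a sketch, not DFSORT. `sum_fields_none` keeps the first record it sees for each key, which is an assumption: which duplicate DFSORT actually keeps is governed by its EQUALS setting. `outfil_with_save` stands for an OUTFIL with an INCLUDE= followed by an OUTFIL with SAVE. The example records end in a hypothetical match indicator (`B` for both files, `1` for F1 only), of the kind a `?` in REFORMAT can supply.

```python
def sum_fields_none(records, key_len=6):
    """Keep one record per key, like SUM FIELDS=NONE after a sort on the key.

    This sketch keeps the first record seen for each key and returns the
    survivors in key order.
    """
    first_by_key = {}
    for rec in records:
        first_by_key.setdefault(rec[:key_len], rec)
    return [first_by_key[k] for k in sorted(first_by_key)]


def outfil_with_save(records, include):
    """Split records like an OUTFIL with INCLUDE= plus an OUTFIL with SAVE.

    SAVE collects every record the first OUTFIL did not take, so nothing is lost.
    """
    first, saved = [], []
    for rec in records:
        (first if include(rec) else saved).append(rec)
    return first, saved


if __name__ == "__main__":
    print(sum_fields_none(["333333 X", "111111 A", "333333 Y", "111111 B"]))
    # ['111111 A', '333333 X']
    joined = ["111111 B", "222222 1", "333333 B"]  # hypothetical indicator in last byte
    matched, unmatched = outfil_with_save(joined, lambda rec: rec.endswith("B"))
    print(matched, unmatched)  # ['111111 B', '333333 B'] ['222222 1']
```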
Code:
JOIN UNPAIRED,F1
That bit says "give me all the matched records, and all the unmatched records from F1 as well".
If you have no duplicate keys on F1 and no duplicate keys on F2, your two OUTFILs will together hold exactly the same number of records as F1 (each F1 record is either matched or unmatched).
With 700,000 records on F1 and 25,000 on F2, you should get 700,000 records.
If you expect 725,000 records, then you need to explain why.
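The only way you will get more than 700,000 records with what you have shown is if a key that is on F1 also appears more than once on F2: each F1 record with that key is then written once for every F2 record with the same key. Duplicates on F1 alone do not raise the count, because each of those F1 records is still either matched or unmatched.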
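The arithmetic can be stated per key. A key held n1 times on F1 and n2 times on F2 produces n1 * n2 joined records when n2 is at least 1, and n1 unpaired records when n2 is 0. Keys that are only on F2 produce nothing with JOIN UNPAIRED,F1. The Python sketch below counts on that basis. It models the rule and is not DFSORT output. The single-letter keys are hypothetical.

```python
from collections import Counter


def joined_record_count(f1_keys, f2_keys):
    """Records written by JOIN UNPAIRED,F1 (paired plus F1-only), counted per key."""
    f1 = Counter(f1_keys)
    f2 = Counter(f2_keys)
    total = 0
    for key, n1 in f1.items():
        # each F1 record pairs with every F2 record of the same key,
        # or is written once as unpaired if F2 has none (f2[key] is then 0)
        total += n1 * max(f2[key], 1)
    return total


if __name__ == "__main__":
    print(joined_record_count(["A", "B", "C"], ["A", "D"]))  # 3: no duplicates, same as F1
    print(joined_record_count(["A", "A", "B"], ["A"]))       # 3: F1 duplicates alone, same as F1
    print(joined_record_count(["A", "B"], ["A", "A"]))       # 3: F2 duplicate matched, one more than F1
```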
Code:
File 1:
111111
222222
333333
555555
444444
555555
File 2:
333333
111111
444444
111111
If you use that data you will get:
Code:
111111
111111
333333
444444
222222
555555
555555
Seven records from your input of six: the 111111 on File 1 is matched by two 111111 records on File 2, so it is written twice.
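To check counts like these without a mainframe, the join can be modeled in Python. This is only a model of the matching rule, not DFSORT. Each record is just its six-byte key, so the F1 record stands in for the joined record that a real REFORMAT would build. F1 is processed in key order, as JOINKEYS would sort it. The usage runs the sample twice: once with the extra 111111 on File 2 and once with it on File 1. Both give the same two output files, but only the first turns six input records into seven.

```python
from collections import Counter


def join_unpaired_f1(f1, f2, key_len=6):
    """Model of JOIN UNPAIRED,F1 written to two OUTFILs: (matched, f1_only)."""
    f2_count = Counter(rec[:key_len] for rec in f2)
    matched, f1_only = [], []
    for rec in sorted(f1, key=lambda r: r[:key_len]):  # F1 in key order
        n2 = f2_count[rec[:key_len]]
        if n2:
            matched.extend([rec] * n2)  # one joined record per matching F2 record
        else:
            f1_only.append(rec)  # unpaired F1 record, kept because of UNPAIRED,F1
    return matched, f1_only


if __name__ == "__main__":
    dup_on_file2 = (["111111", "222222", "333333", "555555", "444444", "555555"],
                    ["333333", "111111", "444444", "111111"])
    dup_on_file1 = (["111111", "222222", "333333", "555555", "444444", "555555", "111111"],
                    ["333333", "111111", "444444"])
    for f1, f2 in (dup_on_file2, dup_on_file1):
        matched, f1_only = join_unpaired_f1(f1, f2)
        print(len(f1), "in,", len(matched) + len(f1_only), "out:", matched + f1_only)
```

Both runs print the same seven records as the output above. The first run reports six records in, and the second reports seven.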
There is no need for an extra step; you just need a //JNFnCNTL dataset containing SUM FIELDS=NONE for the data you have to SORT anyway (in the JOINKEYS).
Which SORT product do you use? And which version? It should say in the sysout from a SORT step.
Run your SORT with the sample input I've shown. How many records do you get?
Add this to your step:
Code:
//JNF1CNTL DD *
 SUM FIELDS=NONE
Run it again. How many records?
If you use ICETOOL's SELECT operator on your files, you can easily report on duplicate keys. If they are there and you think they shouldn't be, you need to find out why.
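For a quick look at an extract off the mainframe, the same kind of duplicate-key report can be approximated in Python. This is a rough analogue, not ICETOOL. SELECT itself has operands such as ALLDUPS and FIRSTDUP for this job. The six-byte key and the example records are hypothetical.

```python
from collections import Counter


def duplicate_key_report(records, key_len=6):
    """Map each key that occurs more than once to its number of occurrences."""
    counts = Counter(rec[:key_len] for rec in records)
    return {key: n for key, n in sorted(counts.items()) if n > 1}


if __name__ == "__main__":
    print(duplicate_key_report(["111111 A", "222222 B", "111111 C", "333333 D"]))
    # {'111111': 2}
```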