I have two files, and a given key can occur at most once in each file. I want to merge them and keep only the records whose key occurs exactly once across both files (i.e. drop every record whose key is duplicated). I ran my sort and a lot of the duplicates were removed, but I've ended up with cases where I know there was a duplicate key and only one of the two records was dropped.
Here's a record in the first file:
Code:
----+----1----+----2----+----3----+----4----+----5----+----6----+
0005583460001455123 BUSINESS
and here's the second file:
Code:
Command ===>
----+----1----+----2----+----3----+----4----+----5----
4113008124495000005583460001455123 NCH BUSINESS
Now I was expecting the records shown to disappear since the keys match. But this is what I ended up with:
Code:
Command ===>
----+----1----+----2----+----3----+----4----+----5---
4113008124495000005583460001455123 NCH BUSINESS
Only one of the two records with that key was dropped, instead of both. Below is what I submitted. Any ideas on what went wrong here? There are about 8,000,000 records per file; could this be related to not giving enough sort space?
Code:
********************************* TOP OF DATA **********************************
ICE143I 0 BLOCKSET SORT TECHNIQUE SELECTED
ICE250I 0 VISIT http://www.ibm.com/storage/dfsort FOR DFSORT PAPERS, EXAMPLES AN
ICE000I 1 - CONTROL STATEMENTS FOR 5694-A01, Z/OS DFSORT V1R5 - 15:39 ON TUE JUL
SORT FIELDS=(16,19,CH,A,44,10,CH,A)
SUM FIELDS=NONE
ICE201I E RECORD TYPE IS F - DATA STARTS IN POSITION 1
ICE751I 0 C5-K21008 C6-K90007 C7-K90000 C8-K90007 E9-K90007 C9-BASE E5-K21514
ICE193I 0 ICEAM1 ENVIRONMENT IN EFFECT - ICEAM1 INSTALLATION MODULE SELECTED
ICE088I 1 RJMSORTI.SORT01 . , INPUT LRECL = 63, BLKSIZE = 27972, TYPE =
ICE093I 0 MAIN STORAGE = (MAX,12420848,12420848)
ICE156I 0 MAIN STORAGE ABOVE 16MB = (12290872,12290872)
ICE127I 0 OPTIONS: OVFLO=RC0 ,PAD=RC0 ,TRUNC=RC0 ,SPANINC=RC16,VLSCMP=N,SZERO=Y,
ICE128I 0 OPTIONS: SIZE=12420848,MAXLIM=1048576,MINLIM=450560,EQUALS=Y,LIST=Y,ER
ICE129I 0 OPTIONS: VIO=Y,RESDNT=ALL ,SMF=FULL ,WRKSEC=Y,OUTSEC=Y,VERIFY=N,CHALT=
ICE130I 0 OPTIONS: RESALL=4096,RESINV=0,SVC=109 ,CHECK=Y,WRKREL=Y,OUTREL=Y,CKPT=
ICE131I 0 OPTIONS: TMAXLIM=6291456,ARESALL=0,ARESINV=0,OVERRGN=65536,CINV=Y,CFW=
ICE132I 0 OPTIONS: VLSHRT=N,ZDPRINT=Y,IEXIT=Y,TEXIT=N,LISTX=N,EFS=NONE ,EXITC
ICE133I 0 OPTIONS: HIPRMAX=3901 ,DSPSIZE=MAX ,ODMAXBF=0,SOLRF=N,VLLONG=N,VSAMI
ICE235I 0 OPTIONS: NULLOUT=RC0
ICE084I 0 EXCP ACCESS METHOD USED FOR SORTOUT
ICE084I 0 EXCP ACCESS METHOD USED FOR SORTIN
ICE750I 0 DC 985341672 TC 0 CS DSVUU KSZ 33 VSZ 33
ICE752I 0 FSZ=15640344 RC IGN=0 E AVG=68 0 WSP=1381360 C DYN=0 0
ICE751I 1 DE-K10929 D5-K05352 D3-K10929 D7-Q91626 E8-K21008
ICE090I 0 OUTPUT LRECL = 63, BLKSIZE = 27972, TYPE = FB
ICE055I 0 INSERT 0, DELETE 7796071
ICE054I 0 RECORDS - IN: 15639609, OUT: 7843538
ICE134I 0 NUMBER OF BYTES SORTED: 985295367
ICE165I 0 TOTAL WORK DATA SET TRACKS ALLOCATED: 9000 , TRACKS USED: 0
ICE199I 0 MEMORY OBJECT STORAGE USED = 0M BYTES
ICE180I 0 HIPERSPACE STORAGE USED = 1047572K BYTES
ICE188I 0 DATA SPACE STORAGE USED = 0K BYTES
ICE052I 0 END OF DFSORT
******************************** BOTTOM OF DATA ************************
If you're not familiar with DFSORT and DFSORT's ICETOOL, I'd suggest reading through "z/OS DFSORT: Getting Started". It's an excellent tutorial, with lots of examples, that will show you how to use DFSORT, DFSORT's ICETOOL and DFSORT Symbols. You can access it online, along with all of the other DFSORT books, from:
Frank - I win the dummy award. Nine times out of ten I remember that SUM FIELDS=NONE means collapse down to one record per key. This time I got it into my head that it meant eliminate every record whose key occurs more than once. Oops.
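For anyone else who trips over the same thing, here is a rough Python sketch of the two behaviours, purely as an illustration. It is not DFSORT code. The (key, payload) record layout and the function names are made up for the example, and it keeps the first record of each key only as a modelling choice.

```python
from collections import Counter

# Hypothetical toy records: (key, payload). Real records would carry the key
# at fixed positions, as in SORT FIELDS=(16,19,CH,A,44,10,CH,A).
Record = tuple[str, str]


def collapse_to_one_per_key(records: list[Record]) -> list[Record]:
    """Model of sort + SUM FIELDS=NONE: one record survives per key."""
    seen: set[str] = set()
    kept: list[Record] = []
    # sorted() is stable, so records with equal keys stay in input order.
    for rec in sorted(records, key=lambda r: r[0]):
        if rec[0] not in seen:
            seen.add(rec[0])
            kept.append(rec)
    return kept


def keep_unique_keys(records: list[Record]) -> list[Record]:
    """What was wanted: drop every record whose key occurs more than once."""
    counts = Counter(key for key, _ in records)
    ordered = sorted(records, key=lambda r: r[0])
    return [rec for rec in ordered if counts[rec[0]] == 1]


# Two "files" merged together; key A appears in both.
merged = [("A", "file1"), ("B", "file1"), ("A", "file2"), ("C", "file2")]
print(collapse_to_one_per_key(merged))  # key A still present, once
print(keep_unique_keys(merged))         # key A gone entirely
```

The counts in the job log fit this reading: ICE054I reports 15639609 records in and 7843538 out, and ICE055I reports 7796071 deleted, which is exactly the difference. If each key occurs at most twice (once per file), that would mean 7796071 keys were duplicated and one record of each pair was kept.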
What I'd ideally like is:
- keep only records where a key occurs more than once (or a specified number of times)
- in a pinch I could also work with keeping exactly one record for each key that occurs more than once (I already know how to do this another way).
To do the first, could I modify your example to use 'DUPS' instead of 'NODUPS'? I'll also start looking at the manuals you suggest.
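To pin down what I mean by those cases, here is a small Python sketch of the selection logic. It is only a model of the intent, not ICETOOL syntax, and the record layout and function names are invented for the example. As far as I know, the ICETOOL SELECT operands that correspond to these cases are ALLDUPS, EQUAL(n) and FIRSTDUP rather than 'DUPS', but that mapping is from memory and should be checked against the ICETOOL documentation.

```python
from collections import Counter
from typing import Callable

# Hypothetical toy records: (key, payload).
Record = tuple[str, str]


def select_by_count(records: list[Record],
                    wanted: Callable[[int], bool]) -> list[Record]:
    """Keep every record whose key's occurrence count satisfies `wanted`."""
    counts = Counter(key for key, _ in records)
    ordered = sorted(records, key=lambda r: r[0])  # stable sort by key
    return [rec for rec in ordered if wanted(counts[rec[0]])]


def first_of_each_duplicate(records: list[Record]) -> list[Record]:
    """Keep exactly one record for each key that occurs more than once."""
    seen: set[str] = set()
    kept: list[Record] = []
    for rec in select_by_count(records, lambda n: n > 1):
        if rec[0] not in seen:
            seen.add(rec[0])
            kept.append(rec)
    return kept


# Key A occurs three times, B once, C twice.
merged = [("A", "r1"), ("B", "r2"), ("A", "r3"),
          ("C", "r4"), ("C", "r5"), ("A", "r6")]

all_dups = select_by_count(merged, lambda n: n > 1)      # keys seen more than once
exactly_two = select_by_count(merged, lambda n: n == 2)  # keys seen a specified number of times
one_per_dup = first_of_each_duplicate(merged)            # the "in a pinch" variant
print(all_dups, exactly_two, one_per_dup, sep="\n")
```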