Subject :
SORT CARD to remove duplicates and add new column with record count
I have a input file with following format.
File format = VB
LRECL = 189
Out of 189 characters, I'm interested in positions 1-8, 16-8 and 27-7.
If any 2 records have same values at these positions, Output file should retain the first record and count the total number of records in a new column removing the duplicate records.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Quote:
If any 2 records have same values at these positions, Output file should retain the first record and count the total number of records in a new column removing the duplicate records.
The TRAILER3 will use the final OUTFIL record which is duplicate according to the SECTIONS description.
With NOEQUALS, this will sometimes be the first input record of that key, and sometimes not. With EQUALS, it will never be the first input record of that key.
With SUM (since the SORT is needed) and NOEQUALS it will sometimes be the first input record of that key, and sometimes not, that is the base. With SUM and EQUALS it will always be the first input record of that key that is the base.
I have come up with one more solution for it but this requires 2 sort cards, I have tried this for FB file but I am sure you can modify the starting positions and use the same for VB file also,
first sort card would be like
SORT FIELDS=COPY
OUTREC FIELDS=(1:1,186,187,3,C'001')
Use output file of first sort and use this sort card
SORT FIELDS=(1,8,CH,A,16,8,CH,A,27,03,CH,A)
SUM FIELDS=(190,3,ZD)
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Sumit,
Please don't use FIELDS when using OUTREC (or INREC, or OUTFIL OUTREC), use BUILD, which reduces the possibility of FIELDS (or OUTFIL OUTREC) getting confused with the other uses of FIELDS (you have two more in your example). In this case, OVERLAY is clearer anyway.
If you use INREC to process before the SORT, you don't need the two steps/two passes of the file. You also need EQUALS to ensue the first record read is the first record retained by SUM when there are duplicate keys. Unless "leaving spaces" on something (like on the OVERLAY), there is no need to specify column numbers.
Code:
OPTION EQUALS
INREC OVERLAY=(190:C'001')
SORT FIELDS=(1,8,CH,A,16,8,CH,A,27,03,CH,A)
SUM FIELDS=(190,3,ZD)
It should be rare that you need to read an entire file twice with SORT, and such cases will be for very complicated processing. If you have something simple with multiple passes of the data, put more work into it, there will be a simple one-pass solution.