SORT to remove duplicates with Input having 2 record types

andrearak · New User Joined: 29 Apr 2009 Posts: 3 Location: US

How can I use SORT on the following FB LRECL=500 file to remove duplicate records? There are 2 record types. In this example file the first four records are unique and the remaining eight are duplicates of the first four. I need to keep the first four records in their existing order and have the duplicates (the last eight) removed.

I'm currently using the following, which writes the correct records (no duplicates) but in ascending order, which is not what I want:
SORT FIELDS=(1,500,A),FORMAT=CH
SUM FIELDS=NONE
END

Is there some command to skip every other record so that only the first four records are written in the existing order?

0000001405700100000000000TFS701F 0102010202
ADH_LOS-ANGELES_20090422_124744
0000001408900100000000000TFS701R 0102010281
ADH_LOS-ANGELES_20090427_140010
0000001405700100000000000TFS701F 0102010202
ADH_LOS-ANGELES_20090422_124744
0000001408900100000000000TFS701R 0102010281
ADH_LOS-ANGELES_20090427_140010
0000001405700100000000000TFS701F 0102010202
ADH_LOS-ANGELES_20090422_124744
0000001408900100000000000TFS701R 0102010281
ADH_LOS-ANGELES_20090427_140010

Frank Yaeger · Posted: Wed Apr 29, 2009 9:05 pm

Here's a DFSORT/ICETOOL job that will do what you asked for:
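The job itself is not reproduced in this copy of the thread. Going only by the follow-up posts, it tags records with a sequence number (SEQNUM,8,ZD) so the original order can be recovered after the duplicates are dropped. As a rough model of that idea, here is a minimal Python sketch. It is not DFSORT syntax and not Frank's job; the function name `dedupe_pairs_keep_order` is made up for illustration, and it assumes the input always arrives as complete two-record groups.

```python
def dedupe_pairs_keep_order(records):
    """Drop duplicate two-record groups while keeping first-seen order.

    Assumption: records come in complete pairs (record type 1 followed by
    record type 2). A trailing unpaired record is ignored.
    """
    # Treat every 2 physical records as one logical record and tag each
    # logical record with a sequence number (its original position).
    pairs = [(records[i], records[i + 1]) for i in range(0, len(records) - 1, 2)]
    tagged = [(pair, seq) for seq, pair in enumerate(pairs, start=1)]

    # Sort on content; equal content is ordered by sequence number, so the
    # earliest copy of each duplicate comes first.
    tagged.sort()

    # Keep only the first entry of each run of equal content.
    unique = []
    for pair, seq in tagged:
        if not unique or unique[-1][0] != pair:
            unique.append((pair, seq))

    # Sort back on the sequence number to restore the original order, then
    # split each logical record into its two physical records again.
    unique.sort(key=lambda item: item[1])
    return [rec for pair, _ in unique for rec in pair]
```

The second sort on the sequence number is what the plain SORT FIELDS=(1,500,A) with SUM FIELDS=NONE approach lacks: without a saved position there is nothing to sort back on.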

andrearak

Thank you. I understand what you've recommended here and it worked perfectly.

One question: Why use ZD?

Frank Yaeger · Posted: Wed Apr 29, 2009 9:57 pm

You could use any supported numeric format for the sequence numbers - ZD, PD, BI or FS. I like to use ZD because it's readable. But any of the others would work as well. For example, instead of SEQNUM,8,ZD, you could use SEQNUM,5,PD and save some bytes (but sacrifice readability when you're debugging).
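To make the size trade-off concrete, here is a small Python sketch (not DFSORT syntax) of how the same sequence number looks as 8 bytes of zoned decimal versus 5 bytes of packed decimal. The helper names are hypothetical, and the 0xC sign nibble used for the packed value is an assumption (a common positive sign), not something stated in this thread.

```python
def seqnum_zd(n, length=8):
    """Zoned decimal: one digit per byte, EBCDIC digits 0xF0 through 0xF9."""
    digits = str(n).zfill(length)
    if len(digits) > length:
        raise ValueError("value does not fit in the ZD field")
    return bytes(0xF0 + int(d) for d in digits)


def seqnum_pd(n, length=5, sign=0xC):
    """Packed decimal: two digits per byte, sign in the final nibble.

    Assumption: sign nibble 0xC (positive); the nibble a given sort
    product writes may differ.
    """
    max_digits = 2 * length - 1
    digits = str(n).zfill(max_digits)
    if len(digits) > max_digits:
        raise ValueError("value does not fit in the PD field")
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


print(seqnum_zd(42).hex())  # f0f0f0f0f0f0f4f2
print(seqnum_pd(42).hex())  # 000000042c
```

The zoned form reads as 00000042 when browsed in EBCDIC, which is the readability point above. The packed form holds 9 digits in 5 bytes instead of 8 digits in 8 bytes, but shows up as unprintable bytes.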

Skolusu · Posted: Wed Apr 29, 2009 10:06 pm

andrearak,

Here is an alternate way of doing it in one pass using the new WHEN=GROUP function. You want to treat every 2 records as a single record and remove the duplicates. Using WHEN=GROUP, we push the first record onto the second record, sort on the full 1000 bytes, and remove the duplicates.

Using OUTFIL, we write out the original order once again.
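The control statements are not reproduced in this copy of the thread. As a rough model of the steps described (push the first record of each group onto the second, remove duplicates on the combined 1000 bytes, then split back into two records), here is a minimal Python sketch. It is not DFSORT syntax; `group_push_dedupe` is a made-up name, and it assumes groups of exactly 2 records.

```python
def group_push_dedupe(records, group_size=2):
    """Model of the push / sort / dedupe / split steps for fixed-size groups."""
    # WHEN=GROUP-style step: remember the first record of each group and
    # attach it to the record(s) that follow in the same group.
    combined = []
    pushed = None
    for index, rec in enumerate(records):
        if index % group_size == 0:
            pushed = rec                      # first record of the group
        else:
            combined.append((pushed, rec))    # later record carries the first

    # Sort on the combined content and keep one copy of each distinct pair.
    unique = sorted(set(combined))

    # OUTFIL-style step: write each combined record back out as two records.
    output = []
    for first, second in unique:
        output.extend([first, second])
    return output
```

Note that this sketch emits the pairs in sorted order of their combined content. For the sample data in this thread that happens to match the original order; whether the actual job does anything further to restore order is not shown here.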

andrearak

Thanks, Frank, for the explanation. And thanks, Kolusu, for the alternative solution. That's clever.