Remove a bunch of duplicates in a PS file

sreekantham · New User Joined: 15 Sep 2008 Posts: 5 Location: BANGALORE

Hi,

I am currently having a problem removing duplicates from a file.

Assume the file is in the format below:

315412345678901201001345A2H
315412345678901202CRT AKP 43567001 AB CLIENT
315412345678901205999999 888888888666666
3154123456789012076K2 H45
3154123456789012087KAP
315412345678901209 21324567
315412345678901210A425 000000J01
315412345678901213K 62 141810
315412345678901214Z 6
315412345678901215K
315412345678901218ABC.SAMPLEIN

315412345678901301001345A2H
315412345678901302CRT AKP 43567001 AB CLIENT
315412345678901305999999 888888888666666
3154123456789013076K2 H45
3154123456789013087KAP
315412345678901309 21324567
315412345678901310A425 000000J01
315412345678901313K 62 141810
315412345678901314Z 6
315412345678901315K
315412345678901318ABC.SAMPLEIN

315412345678901201001345A2H
315412345678901202CRT AKP 43567001 AB CLIENT
315412345678901205999999 888888888666666
3154123456789012076K2 H45
3154123456789012087KAP
315412345678901209 21324567
315412345678901210A425 000000J01
315412345678901213K 62 141810
315412345678901214Z 6
315412345678901215K
315412345678901218ABC.SAMPLEIN

In this file I need to remove duplicates. The key to be used here is 123456789012, which starts at the 5th column of the input file.

Just to explain further: the data from column 5 onwards in record 1 and record 3 is the same, so in the output file I need to write only the first occurrence of the record (entry 1), and the duplicate should not be present in the output file.

Please let me know how to achieve this...

dick scherrer · Posted: Fri Jan 07, 2011 10:49 am

Hello and welcome to the forum,

Suggest you sort the file by "the key" and then read the file discarding duplicates as they are encountered.

If the only requirement is the removal of the duplicates, you might consider using your sort product.
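For illustration only, the "sort by the key, then discard duplicates" idea can be sketched outside the sort product in a few lines of Python. This is a minimal sketch, not a sort-product control statement. It assumes the compare field is everything from column 5 to the end of the record (based on the description in the first post); `KEY_START`, `compare_field` and `sort_and_drop_duplicates` are made-up names.

```python
from itertools import groupby

# Assumption: the compare field starts in column 5 (zero-based offset 4)
# and runs to the end of the record. Adjust if the real key is shorter.
KEY_START = 4


def compare_field(record: str) -> str:
    """Return the part of the record used to detect duplicates."""
    return record[KEY_START:]


def sort_and_drop_duplicates(records):
    """Sort by the compare field, then keep one record per distinct field."""
    # sorted() is stable, so among equal fields the earliest record comes first.
    ordered = sorted(records, key=compare_field)
    # groupby() collects adjacent records with the same field; keep the first.
    return [next(group) for _, group in groupby(ordered, key=compare_field)]
```

Note that the output of this sketch is in key order, not in the original record order.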

sreekantham · New User Joined: 15 Sep 2008 Posts: 5 Location: BANGALORE

Thanks for the quick reply.

This file has many other records with different formats as well, and once I sort it we may lose the record order. That is, I need to remove only the second and later occurrences, not the first one. Please let me know if there is any other option available to remove the group of duplicates.

Secondly, this file has about 1000 records. The key will be different in each record, and the order of the records should not change apart from removing the duplicates.

Please help...

dick scherrer · Posted: Fri Jan 07, 2011 11:37 am

Hello,

Until you explain all of the rules and possible data formats much more clearly, no one will be able to help very much.

You need to make sure you post all of the information and not have the requirement "grow" as replies are posted.

To preserve the original sequence of the retained records, you could use an "original record number" in the intermediate file and re-sort by the original record number for the final output.
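A rough Python sketch of that three-step approach (number the records, sort by key and drop duplicates, re-sort by the original record number) might look like the following. It is an illustration under assumptions, not sort-product syntax: the compare field is assumed to run from column 5 to the end of the record, and `KEY_START`, `compare_field` and `drop_duplicates_keep_order` are hypothetical names.

```python
# Assumption: the compare field starts in column 5 (zero-based offset 4)
# and runs to the end of the record.
KEY_START = 4


def compare_field(record: str) -> str:
    """Return the part of the record used to detect duplicates."""
    return record[KEY_START:]


def drop_duplicates_keep_order(records):
    """Remove later duplicates while keeping the original record sequence."""
    # Step 1: attach the original record number to every record.
    numbered = list(enumerate(records))

    # Step 2: sort by compare field, then by record number, so the first
    # occurrence of each field is the first one in its run of duplicates.
    numbered.sort(key=lambda pair: (compare_field(pair[1]), pair[0]))

    kept = []
    previous = None
    for seq, record in numbered:
        field = compare_field(record)
        if field != previous:  # first record of a new run: keep it
            kept.append((seq, record))
            previous = field

    # Step 3: re-sort the survivors by original record number and
    # strip the number again for the final output.
    kept.sort(key=lambda pair: pair[0])
    return [record for _, record in kept]
```

With the sample data from the first post, the third group of records has the same content from column 5 onwards as the first group, so under these assumptions it would be dropped and the first two groups would be written in their original order.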