oerdgie
New User
Joined: 22 Nov 2007 Posts: 64 Location: Germany
Hi,
I'd like to remove all duplicate records from a file.
Unfortunately the file has different record structures, each with different field formats.
Example for a file (c = char, b = binary, p = packed decimal field):
bbbbcccc
cccbbpppccccc
ppccccccccb
ppbccbbccc
ppccccccccb    <- this is a duplicate
bbbbcccc       <- this is a duplicate
bbbbcc
bbbbcccc       <- this is a duplicate
Is it possible to solve this with DFSORT or ICETOOL?
Many thanks in advance for any help.
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
The forum is rich in examples of how to build intermediate records with added fields used for sort processing, which are then dropped when writing the output file.
Assuming each input record carries a record-type indicator (in the hope one exists), build, according to that type, an auxiliary field of the proper length and content and run the duplicate check on it.
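A minimal sketch of that idea, assuming fixed 80-byte records whose first two bytes identify the record type -- the X'0001'/X'0002' type values and all key positions below are invented for illustration only:

```jcl
* Prepend a 10-byte normalized key chosen per record type,
* dedupe on the key, then strip it when writing the output.
  INREC IFTHEN=(WHEN=(1,2,BI,EQ,X'0001'),BUILD=(5,4,21,6,1,80)),
        IFTHEN=(WHEN=(1,2,BI,EQ,X'0002'),BUILD=(3,10,1,80))
  SORT FIELDS=(1,10,BI,A)
  SUM FIELDS=NONE
  OUTREC BUILD=(11,80)
```

Each IFTHEN builds a 10-byte key in front of the original 80-byte record; SORT/SUM removes duplicates on the key, and OUTREC restores the original record layout.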
|
Back to top |
|
|
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
oerdgie,
If your PD fields are normalized - that is, they all have the same sign (e.g. C) for positive values - you can just use BI for the comparison field. BI will just compare the bits, and as long as the PD fields are normalized, a bit comparison will work fine for finding duplicates.
You can use SELECT with ON(p,m,BI) and FIRST, or SORT FIELDS=(p,m,BI,A) with SUM FIELDS=NONE, to remove the duplicates.
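Hedged sketches of both variants -- the position/length pair 1,13 is invented here; substitute the start position p and length m that cover your actual comparison bytes:

```jcl
* ICETOOL variant (operator statement in the TOOLIN deck):
* keep the first record of each set of duplicates
  SELECT FROM(IN) TO(OUT) ON(1,13,BI) FIRST
* DFSORT variant (control statements in a SYSIN deck):
  SORT FIELDS=(1,13,BI,A)
  SUM FIELDS=NONE
```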
oerdgie
New User
Joined: 22 Nov 2007 Posts: 64 Location: Germany
Hi Frank,
it works - thanks a lot for the help!
Regards
oerdgie