oerdgie
New User
Joined: 22 Nov 2007 Posts: 64 Location: Germany
Hi,
I'd like to remove all duplicate records from a file.
Unfortunately the file has different record structures, each with different field formats.
Example for a file (c = char, b = binary, p = packed decimal field):
bbbbcccc
cccbbpppccccc
ppccccccccb
ppbccbbccc
ppccccccccb    <- this is a duplicate
bbbbcccc       <- this is a duplicate
bbbbcc
bbbbcccc       <- this is a duplicate
Is it possible to solve this with DFSORT or ICETOOL?
Many thanks in advance for any help.
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
The forum is rich in examples of how to build intermediate records with added fields used for sort processing, which are then dropped when writing the output file.
Assuming each input record carries a record-type indicator (in the hope one exists), build, according to that type, an auxiliary field of the proper length and content and run the duplicate check on it.
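A minimal sketch of that idea, assuming fixed 80-byte records whose first two bytes identify the record type -- the X'0001'/X'0002' type values and all key positions below are invented for illustration only:

```jcl
* Prepend a 10-byte normalized key chosen per record type,
* dedupe on the key, then strip it when writing the output.
  INREC IFTHEN=(WHEN=(1,2,BI,EQ,X'0001'),BUILD=(5,4,21,6,1,80)),
        IFTHEN=(WHEN=(1,2,BI,EQ,X'0002'),BUILD=(3,10,1,80))
  SORT FIELDS=(1,10,BI,A)
  SUM FIELDS=NONE
  OUTREC BUILD=(11,80)
```

Each IFTHEN builds a 10-byte key in front of the original 80-byte record; SORT/SUM removes duplicates on the key, and OUTREC restores the original record layout.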
|
Back to top |
|
|
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
oerdgie,
If your PD fields are normalized - that is, they all have the same sign (e.g. C) for positive values - you can just use BI for the comparison field. BI will just compare the bits, and as long as the PD fields are normalized, a bit comparison will work fine for finding duplicates.
You can use SELECT with ON(p,m,BI) and FIRST, or SORT FIELDS=(p,m,BI,A) with SUM FIELDS=NONE, to remove the duplicates.
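Hedged sketches of both variants -- the position/length pair 1,13 is invented here; substitute the start position p and length m that cover your actual comparison bytes:

```jcl
* ICETOOL variant (operator statement in the TOOLIN deck):
* keep the first record of each set of duplicates
  SELECT FROM(IN) TO(OUT) ON(1,13,BI) FIRST
* DFSORT variant (control statements in a SYSIN deck):
  SORT FIELDS=(1,13,BI,A)
  SUM FIELDS=NONE
```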
oerdgie
New User
Joined: 22 Nov 2007 Posts: 64 Location: Germany
Hi Frank,
it works - thanks a lot for the help!
Regards
oerdgie