How to remove duplicates from a file while keeping a count

Nitin Bhargava · New User Joined: 22 May 2012 Posts: 32 Location: india

Hi All,

How can I remove duplicates from a file based on a condition, and keep a count of the duplicate records eliminated, using ICETOOL or DFSORT?

I have a file with the following records:

01NITINSURRENDER10OWNER BHARGAVA
01NITINSURRENDER00INSURED BHARGAVA
01NITINSUREENDER30CO-OWNER BHAR
01ROHITACTIVE 20CO-INSUREDSINGHAI
01ROHITACTIVE 00INSURED SINGHAI
01ROHITACTIVE 10OWNER SINGH
01NIKHITERMINATE00INSURED JAIN

Here the first 7 bytes are the key, the next 10 bytes are the status, the next 2 bytes are the TypeCode, the next 10 bytes are the TypeCode description, and the last 10 bytes are the surname.

I first have to sort the file in ascending order on the key and the TypeCode. I did that with the sort below.

SORT FIELDS=(1,7,CH,A,17,2,CH,A)

After this the file looks like:

01NIKHITERMINATE00INSURED JAIN
01NITINSURRENDER00INSURED BHARGAVA
01NITINSURRENDER10OWNER BHARGAVA
01NITINSUREENDER30CO-OWNER BHAR
01ROHITACTIVE 00INSURED SINGHAI
01ROHITACTIVE 10OWNER SINGH
01ROHITACTIVE 20CO-INSUREDSINGHAI

Now I want to consider only TypeCode 00 and 10 (bytes 17-18) and then remove the duplicates. For duplicate removal, the condition is that I have to compare the full record except bytes 17 to 26.

In the output, I also must not include records with a status of “SURRENDER” or “TERMINATE”, and I have to keep a count of the records eliminated.

The output will look like:

01ROHITACTIVE 00INSURED SINGHAI

Is it possible to do all this in one step using ICETOOL or DFSORT?

Gnanas N · Posted: Mon Jun 25, 2012 1:24 pm

Bill Woodger · Posted: Mon Jun 25, 2012 1:24 pm

You only want type 00 and 10. You don't want those if they are "SURRENDER" or "TERMINATE". You then want to drop duplicates based on 1-16.

Somehow a count comes into it which you didn't explain.

Is that it?
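
If that reading is right, the selection part could be sketched with an INCLUDE statement along these lines. This is an untested sketch under assumptions: the status is taken to occupy bytes 8-16 (9 bytes, which is what the sample records and the 17,2 TypeCode field in the posted SORT suggest, although the layout description says 10), and the status values are taken to be exactly SURRENDER and TERMINATE.

```
* Assumed layout: key 1-7, status 8-16, TypeCode 17-18
* Keep TypeCode 00 or 10, and drop SURRENDER / TERMINATE statuses
  INCLUDE COND=((17,2,CH,EQ,C'00',OR,17,2,CH,EQ,C'10'),AND,
                8,9,CH,NE,C'SURRENDER',AND,
                8,9,CH,NE,C'TERMINATE')
  SORT FIELDS=(1,7,CH,A,17,2,CH,A)
```

INCLUDE is applied before the sort, so only the selected records are sorted. Dropping duplicates and counting would still have to be added on top of this.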

Nitin Bhargava

One correction to the output:

The output will look like:
01ROHITACTIVE 00INSURED SINGHAI
01ROHITACTIVE 10OWNER SINGH

For the total count: I just have to write it to a PS (sequential) file of 80 bytes. In this case the total number of records skipped is 5, so the output will be:

Count: 5
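
One way a count file like this might be produced in the same pass as the sort is with OUTFIL. The following is an untested sketch, not a confirmed solution from this thread. KEEP and CNT are hypothetical DD names, and the field positions assume key 1-7, status 8-16 and TypeCode 17-18. It counts only the records that fail the selection rules; OUTFIL does not remove duplicates, so that part would need separate handling.

```
* Assumptions: key 1-7, status 8-16, TypeCode 17-18;
* KEEP and CNT are hypothetical DD names.
  SORT FIELDS=(1,7,CH,A,17,2,CH,A)
* Records that pass the selection rules go to KEEP
  OUTFIL FNAMES=KEEP,
    INCLUDE=((17,2,CH,EQ,C'00',OR,17,2,CH,EQ,C'10'),AND,
             8,9,CH,NE,C'SURRENDER',AND,
             8,9,CH,NE,C'TERMINATE')
* SAVE picks up every record not selected above; NODETAIL
* suppresses the records so only the trailer count is written
  OUTFIL FNAMES=CNT,SAVE,NODETAIL,REMOVECC,
    TRAILER1=('Count: ',COUNT=(M10,LENGTH=8))
```

The count is edited into 8 positions, so the spacing after "Count:" will not match the line shown above exactly.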

Bill Woodger · Posted: Mon Jun 25, 2012 2:38 pm

I think I'd look in the Smart DFSORT Tricks book for the example of XSUM (and much more). If at the end you really find that a blind count of duplicates dropped is useful, you can just COUNT on the file created.
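
For readers without the book to hand, the general shape of that approach in ICETOOL is a SELECT with FIRST and DISCARD, followed by a COUNT on the discard file. This is an untested sketch under assumptions: IN, OUT, DUPS and CNT are hypothetical DD names that would need matching DD statements in the job, CTL1 is a hypothetical USING prefix, and duplicates are judged on bytes 1-16 only. If the bytes after 26 also have to match, further ON fields would be needed for them.

```
//TOOLIN   DD *
* Keep the first record of each 1-16 group in OUT and
* write the dropped duplicates to DUPS
  SELECT FROM(IN) TO(OUT) ON(1,16,CH) FIRST DISCARD(DUPS) USING(CTL1)
* Write the number of records in DUPS to CNT
  COUNT FROM(DUPS) WRITE(CNT) TEXT('Count: ') DIGITS(8) WIDTH(80)
/*
//CTL1CNTL DD *
* Assumed layout: status 8-16, TypeCode 17-18
  INCLUDE COND=((17,2,CH,EQ,C'00',OR,17,2,CH,EQ,C'10'),AND,
                8,9,CH,NE,C'SURRENDER',AND,
                8,9,CH,NE,C'TERMINATE')
/*
```

Note that this counts only the duplicates dropped by SELECT. Records removed by the INCLUDE never reach DUPS, so they are not part of the count.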

Nitin Bhargava

Could we do it in 2 steps?

Bill Woodger · Posted: Mon Jun 25, 2012 3:47 pm

Err... Yes. On the confirmation you have given I can say almost definitely yes.

Skolusu · Posted: Mon Jun 25, 2012 9:26 pm

Nitin Bhargava,

Nitin Bhargava

I am able to do it in 2 steps by myself. Thanks to all.

dick scherrer · Posted: Tue Jun 26, 2012 11:15 pm

Hello,

As your request was not clear and your details incorrect, it is probably best that you did get it running by yourself. . .

d