Updating the counters after eliminating duplicates

PANDU1 · New User Joined: 16 Nov 2016 Posts: 7 Location: india

Hi Everyone,

Can anyone suggest how to solve this query?

If my input file is

Arun Raj · Posted: Mon Nov 21, 2016 11:15 am

PANDU1,

Welcome to the forums! You should be able to achieve this with whichever sort product is available at your site (DFSORT, Syncsort, ...).

Search the DFSORT forum here for examples of how to "remove duplicates" or create "trailer counts".
Try something and get back to us if you face any issues. Someone will be here to help. Good luck!

Also make sure you include all the relevant information: RECFM, LRECL, field positions of your actual data, sort order of the input data (if any), and sort order of the output (if it matters). Post code/sample data using "Code" tags.

enrico-sorichetti · Posted: Mon Nov 21, 2016 12:17 pm

Tell us which sort product you will be using.

Look at the message prefixes:

WER... ==> SYNCSORT
ICE... ==> IBM DFSORT

and the topic will be moved to the proper section

PANDU1

Thanks Arun Raj and enrico-sorichetti

Field positions are as follows

01-2 - Sequence number
04-10 - Employee name
12-5 - Employee number (sort field)

Assume RECFM=FB and LRECL=30.

Rohit Umarjikar · Posted: Mon Nov 21, 2016 9:23 pm

Bill Woodger · Posted: Mon Nov 21, 2016 10:54 pm

No, Rohit.

Even accepting the 4.4 comparison for C'COUNT' as a typo, you don't want to use HEADER and COUNT for your tests, as there is absolutely nothing to stop these giving you a "false hit" if they happen to occur at that position in a name.

There are clearly indicators for the header and trailer which give no possibility of a false hit. There is also, clearly, a "reference number" which should be used for the sort/deduplication. Even if that is not sufficient (there is no indication either way), there is no reason to include the "05", the blank following it, or only a selected part of the name field.

WHEN=GROUP with BEGIN for "01", END for "99". Sort field after the ID to contain X'00' for header, X'FF' for trailer.
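
As an illustration only (this is not DFSORT code, and not the code posted in the thread), the key-extension idea can be sketched in Python. The layout is an assumption: record type in columns 1-2 and employee number in columns 12-16. The sample records and the names `extended_key` and `sort_groups` are hypothetical.

```python
# Assumed layout (0-based slices): type in cols 1-2, employee number in cols 12-16.
TYPE = slice(0, 2)
EMPNO = slice(11, 16)

def extended_key(records):
    """Yield (group_id, key, record), mimicking a pushed group ID plus a key extension."""
    group_id = 0
    for rec in records:
        rec_type = rec[TYPE]
        if rec_type == "01":      # header begins a new group
            group_id += 1
            key = b"\x00"         # sorts before any data key
        elif rec_type == "99":    # trailer ends the group
            key = b"\xff"         # sorts after any data key
        else:
            key = rec[EMPNO].encode("ascii")
        yield group_id, key, rec

def sort_groups(records):
    """Sort by group ID, then by the extended key; header and trailer stay in place."""
    tagged = sorted(extended_key(records), key=lambda t: (t[0], t[1]))
    return [rec for _, _, rec in tagged]

# Hypothetical sample data.
sample = [
    "01 HEADER",
    "05 ARUN    00300",
    "05 BILL    00100",
    "99 TRAILER",
]
result = sort_groups(sample)
```

Because the header and trailer are recognised by their type codes rather than by text such as HEADER or COUNT, a name containing those words cannot produce a false hit.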

To suggest using COUNT, you can't chop off the extension in OUTREC.

A sequence, with RESTART for the ID extension, and an IFTHEN=(WHEN=(logicalexpression) on OUTREC would allow a JFY with SHIFT=LEFT for the format of count shown.
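
A rough Python analogue of those two pieces, offered as a sketch rather than as sort control statements: a sequence number that restarts whenever the group ID changes, and a count formatted left-justified with blank padding. The function names and the field width of 8 are hypothetical.

```python
def number_within_groups(group_ids):
    """Return sequence numbers that restart at 1 whenever the group ID changes."""
    seq, previous, out = 0, None, []
    for gid in group_ids:
        seq = 1 if gid != previous else seq + 1
        previous = gid
        out.append(seq)
    return out

def left_justified(count, width=8):
    """Format a count left-aligned and blank-padded, with no leading zeros."""
    return str(count).ljust(width)

numbers = number_within_groups([1, 1, 1, 2, 2])
field = left_justified(3, width=5)
```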

Arun Raj · Posted: Tue Nov 22, 2016 11:09 am

I think we are good without an END for the group.
This is UNTESTED, but it has the essence of Bill's suggestion above.

Bill Woodger · Posted: Tue Nov 22, 2016 2:43 pm

Yes, Arun, that looks good. I got confused with the END because the trailer was being treated as a separate group by Rohit :-)

I'd rearrange the order of the IFTHENs so that the data one is treated first. That avoids a lot of header/trailer tests when most of the records will be data. Multiple IFTHEN=(WHEN=(logicalexpression) are like an EVALUATE in COBOL, so processing of that construct ceases as soon as there is a "hit". That behaviour can be modified, when needed (for two entirely independent operations on the same record), by using HIT=NEXT.
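
The first-hit behaviour can be modelled in a few lines of Python. This is a simplified sketch of the semantics described above, not the sort product's implementation. The `apply_clauses` helper and the sample clauses are hypothetical.

```python
def apply_clauses(record, clauses):
    """Apply (test, action, hit_next) clauses in order.

    Processing stops after the first matching clause unless that clause
    sets hit_next, in which case later clauses are still tested.
    """
    for test, action, hit_next in clauses:
        if test(record):
            record = action(record)
            if not hit_next:
                break
    return record

# Data clause first: most records match it and skip the other two tests.
clauses = [
    (lambda r: r.startswith("05"), lambda r: r + "|data", False),
    (lambda r: r.startswith("01"), lambda r: r + "|header", False),
    (lambda r: r.startswith("99"), lambda r: r + "|trailer", False),
]

# With hit_next set on the first clause, the second clause is also tested.
chained = [
    (lambda r: True, lambda r: r + "a", True),
    (lambda r: True, lambda r: r + "b", False),
    (lambda r: True, lambda r: r + "c", False),
]
```

Putting the most frequent case first does not change the result. It only reduces the number of tests evaluated per record.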

PANDU1

Thank you Arun,

I was able to remove the duplicates, but I am still trying to update the counters.

Bill Woodger · Posted: Tue Nov 22, 2016 3:41 pm

PANDU1,

Arun's code drops duplicates and produces correct counts. Even if you've already dropped the duplicates without telling us (wasting time) the code will still work even if there are no duplicates to drop. Note: I've not tested it either, but it looks good.
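
To make the expected behaviour concrete, here is a minimal Python sketch of the logical result. It is not Arun's code. It assumes the record type is in columns 1-2 and the employee number in columns 12-16, that the trailer count covers data records only, and that the trailer is rebuilt as "99 " followed by the count. All of these are assumptions for illustration.

```python
# Assumed layout (0-based slices): type in cols 1-2, employee number in cols 12-16.
TYPE = slice(0, 2)
EMPNO = slice(11, 16)

def dedupe_and_count(records):
    """Drop data records with a repeated employee number and rebuild the trailer count."""
    seen, output = set(), []
    for rec in records:
        rec_type = rec[TYPE]
        if rec_type == "01":                 # header: start a fresh group
            seen.clear()
            output.append(rec)
        elif rec_type == "99":               # trailer: hypothetical "99 <count>" format
            output.append("99 " + str(len(seen)))
        elif rec[EMPNO] not in seen:         # keep the first record for each key
            seen.add(rec[EMPNO])
            output.append(rec)
    return output

with_dups = ["01 HDR", "05 ARUN    00300", "05 ARUN    00300", "05 BILL    00100", "99 3"]
no_dups = ["01 HDR", "05 ARUN    00300", "05 BILL    00100", "99 2"]
```

Running it on input that has no duplicates leaves the data records and the count unchanged, which is the point Bill makes above.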

PANDU1

Hi Bill,

I had no idea how to solve this, so I posted in the group. After trying Arun's code I got the relevant output.

Thanks.

Bill Woodger · Posted: Tue Nov 22, 2016 4:49 pm

Great, thanks for letting us know. Make sure you understand it.

Arun Raj · Posted: Tue Nov 22, 2016 8:15 pm