Removing duplicates without Sorting in JCL

vishalbshah · New User Joined: 01 Dec 2006 Posts: 61 Location: Pune

Hi,

I have a file with 2 header records starting with characters, detail records starting with numbers, and a trailer record, again starting with characters.

I want to remove duplicates from the detail records, but I don't want to SORT the file: the headers, details and trailer have no record-type field that would keep the headers first and the trailer last after a sort.

I tried the following in SORT:

Bill Woodger · Posted: Fri Apr 13, 2012 1:27 pm

What you want is DATASORT, but I don't think you have it: you've posted in the JCL forum, so we assume you have Syncsort...

If you could strip off the trailer, how would you identify it to do so?

enrico-sorichetti · Posted: Fri Apr 13, 2012 1:34 pm

from Your profile

vishalbshah

Hi,

I have corrected my profile.

Could someone please suggest a way to remove duplicates without sorting the file?

I am using Syncsort. I was thinking of a first SORT step that splits the input into three files: one for the headers (STOPAFT=2), one for the trailer (as it starts with characters) and one for the details, where I could remove duplicates. In the next step I would combine them back in this order:

Header file
Details file
Trailer file

This is my last resort, but it means two steps; I was looking for a way to do it in one step.
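The split-and-recombine plan above can be modelled in a few lines of Python. This is only a sketch of the logic, not Syncsort control statements: it assumes each record is a string, that the first two records are the headers (as with STOPAFT=2), and, as a hypothetical rule, that any later record not starting with a digit is the trailer. The detail step keeps the first occurrence of each record in its original order.

```python
def split_dedupe_recombine(records, header_count=2):
    """Model of the two-step plan: split, dedupe the details, recombine."""
    # Step 1a: the first header_count records are the headers (like STOPAFT=2).
    headers = records[:header_count]
    rest = records[header_count:]
    # Step 1b (hypothetical rule): details start with a digit, anything else is trailer.
    details = [r for r in rest if r[:1].isdigit()]
    trailers = [r for r in rest if not r[:1].isdigit()]
    # Step 1c: keep the first occurrence of each detail record, in original order.
    unique_details = list(dict.fromkeys(details))
    # Step 2: put the three "files" back together as header, details, trailer.
    return headers + unique_details + trailers
```

For example, `split_dedupe_recombine(["HDR1", "HDR2", "001A", "001A", "002B", "TRL"])` returns `["HDR1", "HDR2", "001A", "002B", "TRL"]`.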

Bill Woodger · Posted: Fri Apr 13, 2012 2:06 pm

Have a look at this recent one.

EDIT: Looking back, do you want to sort at all? Or just remove duplicates from the file as is? In that case, unless you have duplicate headers and trailers, they don't come into it anyway, do they?

Bill Woodger · Posted: Fri Apr 13, 2012 2:44 pm

Why go off and post to the other topic? It messes that one up and loses continuity here.

Do you need to sort the file to get at your duplicates (i.e., are they already contiguous, or do they need to be shuffled about to make them contiguous)?

What is the RECFM/LRECL of your file?

vishalbshah

Hi

The records are 783 characters long and the record format is fixed.

vishalbshah

Hi Bill,

I don't want to sort the file; I just want to remove duplicates.

Bill Woodger · Posted: Fri Apr 13, 2012 3:05 pm

OK. One issue is the length of your record. I don't have the Syncsort docs, so you'll have to check the maximum field length allowed in a comparison (like IFTHEN=(WHEN=(start,length,type... what is the maximum for "length"?).

If it can't handle your entire record, search the DFSORT forum for a nice solution from SQLCODE1, which you should be able to apply to yours. Search for Sammmy.

vishalbshah

Hi Bill,

The solution looks good, but would it solve the issue of leaving the headers, trailer and detail records in their original positions?

Bill Woodger · Posted: Fri Apr 13, 2012 3:37 pm

The solution does not depend on a SORT; the sort was only needed in that case to meet that particular requirement. You can use FIELDS=COPY.

vishalbshah

But without sorting, the duplicates would not be removed, would they?

If I just use COPY, all the records would be copied to the output as they are.

Apologies if I am missing something here, but I can't see how the solution applies.

My requirement is just to remove consecutive duplicate records without any sorting, i.e. all the records (headers, details, trailer) should keep their positions.

I would appreciate it if you could provide the code for a file with 783-character, fixed-length records.
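The requirement as stated (drop a record only when it is identical to the one immediately before it, and copy everything else in place) is a single pass that compares each record with its predecessor. A minimal Python sketch of that logic, treating each 783-byte record as a string so that the whole record (1,783) is compared; it illustrates the idea and is not SORT syntax:

```python
from itertools import groupby


def drop_consecutive_duplicates(records):
    """Keep the first record of each run of identical consecutive records."""
    # groupby with no key function groups adjacent equal elements;
    # each group's key is the record itself, so emitting the keys keeps one per run.
    return [record for record, _run in groupby(records)]
```

For example, `drop_consecutive_duplicates(["HDR1", "HDR2", "001", "001", "002", "001", "TRL"])` returns `["HDR1", "HDR2", "001", "002", "001", "TRL"]`. Headers and trailer pass through untouched because they never equal their neighbour, but a duplicate that is not adjacent (the last "001") survives, which is why it matters whether the details are already in sequence.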

gcicchet · Posted: Sat Apr 14, 2012 3:35 am

Hi,

Maybe this will help:

Bill Woodger · Posted: Sat Apr 14, 2012 3:39 am

I think I gave you a "bad steer" with my second suggestion. Sorry about that.

Is your data in sequence? Or is it just that you need to retain the existing order, and get rid of the duplicates?

If the former, are you allowed to use SYNCTOOL? Or can you do a MERGE with a single input file and do the SUM FIELDS=NONE that way?

If the latter, can you identify the trailer by default (as not being anything else) or by a particular value that won't appear elsewhere? If so, you can modify my first suggestion: add a sequence number, sort on the data (the whole record) with SUM FIELDS=NONE, then sort on the added sequence number to get back to the original order.

How many records are you expecting when you're doing this for real?
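The sequence-number approach for data that is not in sequence can be sketched the same way. This Python model numbers the records, sorts on the data, keeps one of each set of identical records, then sorts back on the number. Python's sort is stable, so the lowest number survives; in DFSORT the equivalent only holds if EQUALS is in effect (with NOEQUALS the surviving record is unpredictable), so check how your Syncsort installation behaves. The data sort here uses Python string order rather than EBCDIC, which does not affect the result because the final sort restores the input order.

```python
def dedupe_keep_order(records):
    """Model: number the records, sort on data dropping duplicates, re-sort on number."""
    # Step 1: attach a sequence number to every record.
    numbered = [(record, seq) for seq, record in enumerate(records, start=1)]
    # Step 2: sort on the data only; the stable sort keeps equal records in
    # sequence-number order, so the first one of each set is seen first.
    by_data = sorted(numbered, key=lambda pair: pair[0])
    survivors = []
    for record, seq in by_data:
        if survivors and survivors[-1][0] == record:
            continue  # same data as the record just kept: drop it (SUM FIELDS=NONE)
        survivors.append((record, seq))
    # Step 3: sort on the sequence number to get back to the original order.
    survivors.sort(key=lambda pair: pair[1])
    return [record for record, _seq in survivors]
```

For example, `dedupe_keep_order(["HDR1", "HDR2", "002", "001", "002", "TRL"])` returns `["HDR1", "HDR2", "002", "001", "TRL"]`: unlike a consecutive-only check, the second "002" is removed even though it is not next to the first.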

Bill Woodger · Posted: Sat Apr 14, 2012 5:44 pm

If you are not allowed to "tool" it and your data is not in sequence, you could try something like this. It was tested with DFSORT, so it may not apply directly to you:

Bill Woodger · Posted: Sat Apr 14, 2012 7:53 pm

Here's an alternative. You only need to consider these last two if you can't "tool" it or MERGE with one file and SUM FIELDS=NONE; they will use more resources than those approaches.

Here your test for equality in the OUTFIL OMIT will get tricky, as you'll have to split it into four pieces: 1,256, 257,256, 513,256 and 769,15. Obviously the start,length of a lot of things needs changing.

If you need to use this one, it could probably be souped up a little.
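The start,length arithmetic for splitting the comparison is easy to get wrong by one byte. This Python helper computes the pieces for a given record length and per-field limit, and compares two records piece by piece; the 256-byte limit is the figure used in this thread and is an assumption here, so confirm the real maximum in your Syncsort documentation.

```python
def comparison_pieces(lrecl, max_len):
    """Split positions 1..lrecl into (start, length) pieces of at most max_len bytes."""
    pieces = []
    start = 1
    while start <= lrecl:
        length = min(max_len, lrecl - start + 1)  # last piece takes whatever is left
        pieces.append((start, length))
        start += length
    return pieces


def records_equal(a, b, pieces):
    """Compare two records piece by piece, as a split OMIT condition would."""
    # start positions are 1-based as in SORT; Python slices are 0-based.
    return all(a[s - 1:s - 1 + n] == b[s - 1:s - 1 + n] for s, n in pieces)
```

`comparison_pieces(783, 256)` gives `[(1, 256), (257, 256), (513, 256), (769, 15)]`: the last piece needs length 15 to reach byte 783.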

vishalbshah

Hi Bill,

My data (the detail records) is already in the sorted order I want.

The only thing I need is to remove any duplicate detail records without reordering the details, headers or trailer.

I am trying your last suggestion using JOINKEYS.

vishalbshah

Hi Bill,

I can identify the headers and trailer by specific text.

Using the same input I mentioned before, I wrote the two steps below to get the required output.

gcicchet · Posted: Mon Apr 16, 2012 10:50 am

Hi,

Did you try my suggestion?

Gerry

vishalbshah

Hi Gerry,

I was looking for a SORT-only solution, as we don't have ICETOOL and I have not used it before. I'll check with the configuration team here whether we can use the tool in our setup.

gcicchet · Posted: Mon Apr 16, 2012 10:59 am

Hi,

SYNCTOOL (alias ICETOOL) is part of SYNCSORT.

Gerry

vishalbshah

Hi Gerry,

I tried your suggestion, but it does not produce the expected output.

vishalbshah

Hi Bill,

I have tried the alternative you suggested:

gcicchet · Posted: Mon Apr 16, 2012 11:39 am

Hi,

My mistake, my cut and paste was incorrect; it should be:

vishalbshah

Yes, it works!

Thanks Gerry

Appreciate your help.

I have got the desired output.

So this card says: copy records from input to output as they are, and whenever consecutive records are identical in positions 1-783, copy only the first of them.

Please correct my understanding if it is wrong.