I have a file with 2 header records starting with characters, detail records starting with numbers, and a trailer record again starting with characters.
I want to remove duplicates from the detail records, but I don't want to sort the file: the headers, details and trailer have no record-type field that would keep the headers first and the trailer last after a sort.
I tried the following in SORT:
Code:
SORT FIELDS=COPY
SUM FIELDS=NONE
But this is not working as expected.
Could you please suggest a way?
The alternative is to split the headers, details and trailer into separate files, remove duplicates from the details file, and then merge them back together.
this kind of humor is frowned upon on professional forums
usually people who reply look at the TS profile to see what tone and terminology to use when answering
seeing a stupid skill description will lower the benevolence level
and often for that reason you are going to miss quite a few good answers
but if you imply skills in the MUSIC/SP operating system....
well, that's pretty useless ...
it has been discontinued and unsupported by McGill University for some years.
www.canpub.com/teammpg/
Could someone please suggest a way to remove duplicates without sorting the file?
I am using Syncsort. I was thinking of a first sort step that splits the input into three files: one for the headers (STOPAFT=2), one for the trailer (as it starts with characters), and one for the details, where I could remove duplicates. The next step would then merge them in this order:
Header file
Details file
Trailer file
This is my last option, but it means I have to create 2 steps; I was looking for a way to do it in 1 step.
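To make the split-and-merge plan concrete, here is a minimal Python sketch of the logic only (not sort control cards). The function name and the `header_count` / `trailer_count` parameters are made up for illustration; it assumes the headers are always the first two records and the trailer is always the last one.

```python
def dedupe_details(records, header_count=2, trailer_count=1):
    """Remove duplicate detail records; headers and trailer stay in place.

    Assumption: headers are the first `header_count` records and the
    trailer is the last `trailer_count` records of the input.
    """
    split_at = len(records) - trailer_count
    headers = records[:header_count]
    details = records[header_count:split_at]
    trailers = records[split_at:]

    seen = set()
    unique_details = []
    for rec in details:
        if rec not in seen:          # keep only the first occurrence
            seen.add(rec)
            unique_details.append(rec)

    # "Merge" step: headers, then details, then trailer
    return headers + unique_details + trailers
```

Note that this removes every repeated detail record, not only consecutive repeats, which is what a sort with SUM FIELDS=NONE on the details file would also do.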
EDIT: Just looking back, do you want to sort at all? Or just remove duplicates from the file as is? In which case, unless you have duplicate headers and trailers, they don't come into it anyway, do they?
The records are 783 characters long and the record format is fixed:
Code:
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
00001|CPP000562712 |0800000020 |+ |0041 |ADRIAN
00001|CPP000562712 |0800000018 |+ |0041 |HILLAR
00001|CPP000562712 |0800000017 |+ |0041 |MATHEW
00001|CPP000562712 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000020 |+ |0041 |ADRIAN
00004|CPP000562752 |0800000018 |+ |0041 |HILLAR
00004|CPP000562752 |0800000017 |+ |0041 |MATHEW
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00005|CPP000562772 |0800000004 |+ |0055 |PAULIN
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
I just want to remove duplicates from the detail records and leave the headers, trailer and details in the same order as they appear in the input file.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
OK. One thing is the length of your record. I don't have Syncsort docs, so you'll have to check on what the limit is for the field length for a comparison (like IFTHEN=(WHEN=(start,length,type... what is the maximum for "length"?).
If it can't handle your entire record, search the DFSORT forum for a nice solution from SQLCODE1 which you should be able to apply to yours. Search for Sammmy.
But without sorting, the duplicates would not be removed, would they?
If I use just COPY, this would copy all the records to the output as they are.
Apologies if I am missing anything here, but I am unable to relate to the solution.
My requirement is just to remove consecutive duplicate records without any sorting (i.e. all the records, headers, details and trailer, should retain their own positions).
I would appreciate it if you could provide code for a file with 783-character records and fixed record format.
I think I gave you a "bad steer" with my second suggestion. Sorry about that.
Is your data in sequence? Or is it just that you need to retain the existing order, and get rid of the duplicates?
If the former, are you allowed to use your Synctool? Can you do a MERGE with a single file and do the SUM FIELDS=NONE that way?
If the latter, can you identify the trailer by default (as not something else) or by a particular value that won't exist elsewhere? If so, you can modify my first suggestion by adding a sequence number, sorting on the data (whole record) SUM FIELDS=NONE and then sorting on the added sequence number to get back to the original order.
How many records are you expecting when you're doing this for real?
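As an illustration of the sequence-number idea above, here is a rough Python sketch of the logic (not sort control statements). The function name is hypothetical. One assumption is labeled in the code: the sort is treated as stable so that the first occurrence of each record is the one kept; with a real sort product you would need to check how equal-keyed records are handled.

```python
def dedupe_keep_order(records):
    """Remove duplicates via sort, then restore the original order."""
    # Step 1: append a sequence number (original position) to each record.
    tagged = [(rec, seq) for seq, rec in enumerate(records)]

    # Step 2: sort on the whole record. Sorting the (record, seq) tuples
    # puts the earliest occurrence first within each group of equal
    # records (assumption: this mimics a sort that preserves input order
    # for equal keys).
    tagged.sort()

    # Step 3: keep one record per group, like SUM FIELDS=NONE.
    kept = []
    for rec, seq in tagged:
        if not kept or kept[-1][0] != rec:
            kept.append((rec, seq))

    # Step 4: sort on the sequence number to get back the original order,
    # then drop the sequence number.
    kept.sort(key=lambda pair: pair[1])
    return [rec for rec, _ in kept]
```

Because the first occurrence of every record survives, the headers keep their leading positions and the trailer stays last, provided no detail record is identical to them.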
If you are not allowed to "tool" it and your data is not in sequence, you could try this type of thing out, tested with DFSORT, so not directly applicable to you, maybe:
The idea is to put the data in sequence, with a sequence number. The same file is specified for INA and INB, but the sequence numbers are generated "off by one" between the two versions of the file. Then use JOINKEYS to do the comparison (which can have key total length up to 4080 bytes).
The UNMATCHED records from F1 represent those which are either unique, or the one (first) representing a contiguous set.
Tested with four-byte keys, up to you to do it with the 783.
You'd need to change the JNFnCNTLs for the OVERLAYs to start at column 784 and to ensure the sizes of the sequence numbers are sufficient for the maximum number of duplicates.
Then change the position or length of the key for both JOINKEYS (784,length-of-sequence-number,A,1,783,A).
As this will involve reading the data twice, it is only a good option if you can't use one of the others.
Now, you have Syncsort. I don't know if you can have the JNFnCNTLs. If not, you'd end up doing those in a separate step with two OUTFILs, followed by the JOINKEYS.
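For anyone trying to picture the "off by one" join, here is a small Python sketch of one possible reading of it. This is not the DFSORT job described above; the function name and the exact numbering scheme are assumptions made for illustration. Each record in the first copy is keyed by (position, data), each record in the second copy by (position + 1, data), and only the unmatched records from the first copy are kept.

```python
def drop_consecutive_duplicates(records):
    """Keep the first record of each contiguous run of identical records."""
    # "F1": every record keyed by its own position and its full contents.
    f1 = [(seq, rec) for seq, rec in enumerate(records, start=1)]

    # "F2": the same data, but numbered one higher, so the record at
    # position n carries the key n + 1.
    f2_keys = {(seq + 1, rec) for seq, rec in enumerate(records, start=1)}

    # A record in F1 matches F2 only when the record immediately before
    # it has identical contents. The unmatched F1 records are therefore
    # the unique ones plus the first of each contiguous set.
    return [rec for seq, rec in f1 if (seq, rec) not in f2_keys]
```

Unlike a sort with SUM FIELDS=NONE, this only removes consecutive repeats: a record that reappears later, after a different record, is kept.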
Here's an alternative. You only need to consider these last two if you can't "tool" it or MERGE with one file and SUM FIELDS=NONE. They will be more resource-hungry than those.
Here your test for equality in the OUTFIL OMIT will get tricky, as you'll have to split it into four pieces: 1,256; 257,256; 513,256; 769,15. Obviously the start and length of a lot of things need changing.
If you need to use this one, it could probably be souped-up a little.
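The four-piece split can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the per-field compare limit is 256 bytes (which the four-way split suggests; check your product's documentation). The function names are hypothetical.

```python
def compare_fields(record_length, max_field=256):
    """Return 1-based (start, length) pairs that cover the whole record."""
    fields = []
    start = 1
    while start <= record_length:
        length = min(max_field, record_length - start + 1)
        fields.append((start, length))
        start += length
    return fields


def records_equal(a, b, record_length=783):
    """Compare two records piece by piece, as a split condition would."""
    return all(
        a[start - 1:start - 1 + length] == b[start - 1:start - 1 + length]
        for start, length in compare_fields(record_length)
    )
```

For a 783-byte record the last piece starts at 769 and works out to 15 bytes, since the first three pieces cover bytes 1 to 768.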
I was looking for a SORT utility solution, as we don't have ICETOOL and I have not used it before. I'll check with the configuration team here whether we can use TOOL in our setup.
But the result is not correct: it seems that only the matched detail record, along with the headers and trailer, appeared in the output.
Received Output:
Code:
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
I am trying to change some conditions here ... please let me know if you can spot anything in this...
So this card says: move records from input to output as they are, and whenever consecutive records are the same (1,783), copy only the first such record.
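In plain terms, the intended behaviour described above is a single pass that remembers the previous record. A minimal Python sketch of that logic follows, using shortened stand-ins for the sample records; the function name is made up for illustration and this is not a translation of any particular control card.

```python
def copy_without_consecutive_duplicates(records):
    """Copy records in input order, skipping any record that is
    identical to the one immediately before it."""
    previous = None
    for rec in records:
        if rec != previous:   # first record of a run: copy it
            yield rec
        previous = rec        # remember for the next comparison


sample = [
    "CDU MATCHING ENGINE REPORT",
    "ROWID|CPP PRODUCT ID",
    "00004|CPP000562752 |0800000000",
    "00004|CPP000562752 |0800000000",
    "00004|CPP000562752 |0800000019",
    "00004|CPP000562752 |0800000019",
    "00005|CPP000562772 |0800000004",
    "NUMBER OF MATCHES RETURNED",
]
result = list(copy_without_consecutive_duplicates(sample))
```

Here `result` keeps both headers, one copy of each repeated detail record, and the trailer, all in their original order.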