I have a file with 2 header records starting with characters, detail records starting with numbers, and a trailer record again starting with characters.
I want to remove duplicates in the detail records, but I don't want to SORT the file: the headers, details and trailer don't have any record type that would keep the headers as the first records and the trailer as the last record after a sort.
I tried the below in SORT:
SORT FIELDS = COPY
SUM FIELDS = NONE
But this is not working as expected.
Could you please suggest an approach?
The alternative is that I would have to split the headers, details and trailer into separate files and then merge them after removing duplicates from the details file.
Could someone please suggest a way to remove duplicates without sorting the file?
I am using Syncsort. I was thinking of preparing a first sort step that writes one file for the headers (STOPAFT=2), one file for the trailer (as it starts with characters) and one file for the details, where I could remove duplicates, and then in the next step merge them back in the original order.
This is my last option, but it means I have to create 2 steps; I was looking for some way to do it in 1 step.
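To make the split-and-recombine plan concrete, here is a minimal Python sketch of its logic only. It is not Syncsort syntax. The function name and sample records are hypothetical, and it assumes that the first two records are the headers, that detail records start with a digit, and that the first copy of each duplicate is the one to keep.

```python
def split_dedupe_recombine(records, header_count=2):
    """Split into headers / details / trailer, dedupe details, recombine."""
    # "File 1": the first header_count records (the STOPAFT=2 idea).
    headers = records[:header_count]
    rest = records[header_count:]

    # "File 2": detail records, assumed to start with a digit.
    details = [rec for rec in rest if rec[:1].isdigit()]
    # "File 3": whatever is left after the headers and is not a detail.
    trailers = [rec for rec in rest if not rec[:1].isdigit()]

    # Remove duplicate details, keeping the first copy in original order.
    seen = set()
    unique_details = []
    for rec in details:
        if rec not in seen:
            seen.add(rec)
            unique_details.append(rec)

    # Second "step": put the three pieces back together in order.
    return headers + unique_details + trailers


sample = ["HDR1", "HDR2", "100A", "200B", "100A", "300C", "200B", "TRL0005"]
result = split_dedupe_recombine(sample)
print(result)
```

The sketch leaves the trailer untouched, so any record count held in it would still reflect the original number of detail records.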
EDIT: Just looking back, do you want to sort at all? Or just remove duplicates from the file as is? In which case, unless you have duplicate headers and trailers, they don't come into it anyway, do they?
OK. One thing is the length of your record. I don't have Syncsort docs, so you'll have to check on what the limit is for the field length for a comparison (like IFTHEN=(WHEN=(start,length,type... what is the maximum for "length"?).
If it can't handle your entire record, search the DFSORT forum for a nice solution from SQLCODE1 which you should be able to apply to yours. Search for Sammmy.
I think I gave you a "bad steer" with my second suggestion. Sorry about that.
Is your data in sequence? Or is it just that you need to retain the existing order, and get rid of the duplicates?
If the former, are you allowed to use your Synctool? Can you do a MERGE with a single file, and do the SUM FIELDS=NONE that way?
If the latter, can you identify the trailer by default (as not something else) or by a particular value that won't exist elsewhere? If so, you can modify my first suggestion by adding a sequence number, sorting on the data (whole record) with SUM FIELDS=NONE, and then sorting on the added sequence number to get back to the original order.
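The sequence-number round trip can be sketched in Python as follows. This only illustrates the logic, it is not sort control statements. The function name and sample data are hypothetical, and the tie-break on the sequence number (so that the earliest copy of a duplicate survives) is an assumption of the sketch.

```python
def dedupe_keep_original_order(records):
    """Tag with a sequence number, sort on data, drop duplicates, sort back."""
    # Step 1: add a sequence number to every record.
    tagged = [(rec, seq) for seq, rec in enumerate(records, start=1)]

    # Step 2: sort on the whole record; ties are broken by sequence number
    # so the earliest copy of each duplicate comes first (an assumption).
    tagged.sort(key=lambda pair: (pair[0], pair[1]))

    # Step 3: keep one record per key (the SUM FIELDS=NONE idea).
    kept = []
    previous = object()  # sentinel that never equals a real record
    for rec, seq in tagged:
        if rec != previous:
            kept.append((seq, rec))
            previous = rec

    # Step 4: sort on the sequence number to restore the original order,
    # then strip the sequence number off again.
    kept.sort()
    return [rec for _, rec in kept]


sample = ["HDR1", "HDR2", "300C", "100A", "300C", "200B", "100A", "TRL0004"]
restored = dedupe_keep_original_order(sample)
print(restored)
```

Because the data is sorted and then sorted back, this costs two sorts, but it removes duplicates wherever they sit in the file, not only adjacent ones.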
How many records are you expecting when you're doing this for real?
The idea is to put the data in sequence, with a sequence number. The same file is specified for INA and INB, but the sequence numbers are generated "off by one" between the two versions of the file. Then use JOINKEYS to do the comparison (which can have key total length up to 4080 bytes).
The UNMATCHED records from F1 are those which are either unique, or the first record of a contiguous set of duplicates.
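The off-by-one comparison may be easier to follow as a small Python sketch. It mimics the matching logic only and is not JOINKEYS syntax. The function name and the four-byte sample keys are hypothetical, and numbering the second copy from 2 (rather than the first copy from 0) is an assumption about which side carries the offset.

```python
def unmatched_from_f1(records):
    """Return records that differ from their immediate predecessor."""
    # F1: the file numbered 1, 2, 3, ... with key (sequence number, data).
    f1_keys = [(seq, rec) for seq, rec in enumerate(records, start=1)]

    # F2: the same file numbered 2, 3, 4, ... so that F2's copy of record n
    # carries the sequence number of record n + 1 in F1.
    f2_keys = {(seq, rec) for seq, rec in enumerate(records, start=2)}

    # An F1 record matches only when the record just before it holds the
    # same data. The unmatched F1 records are therefore the unique ones
    # plus the first record of each contiguous run of duplicates.
    return [rec for seq, rec in f1_keys if (seq, rec) not in f2_keys]


sample = ["AAAA", "AAAA", "BBBB", "CCCC", "CCCC", "CCCC", "AAAA"]
survivors = unmatched_from_f1(sample)
print(survivors)
```

Note that the final "AAAA" in the sample survives: only contiguous duplicates are dropped, which is why the data needs to be in sequence for this approach to remove every duplicate.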
Tested with four-byte keys, up to you to do it with the 783.
You'd need to change the JNFnCNTLs for the OVERLAYs to start at column 784 and to ensure the sizes of the sequence numbers are sufficient for the maximum number of duplicates.
Then change the position or length of the key for both JOINKEYS (784,length-of-sequence-number,A,1,783,A).
As this will involve reading the data twice, it is only a good option if you can't use one of the others.
Now, you have Syncsort. I don't know if you can have the JNFnCNTLs. If not, you'd end up doing those in a separate step with two OUTFILs, followed by the JOINKEYS.