vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi,
I have a file with 2 header records starting with characters, detail records starting with numbers, and a trailer record again starting with characters.
I want to remove duplicates from the detail records but don't want to SORT the file: the headers, details and trailer don't carry any record type that would keep the headers first and the trailer last after a sort.
I tried the below in SORT:
| Code: |
SORT FIELDS = COPY
SUM FIELDS = NONE
|
But this is not working as expected.
Could you please suggest an approach?
The alternative is to split the headers, details and trailer into separate files and then merge them back after removing duplicates from the details file.

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
What you want is DATASORT, but I don't think you have it, as you've posted in the JCL forum so we assume you have Syncsort...
If you could strip off the trailer, how would you identify it to do so?

enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10903 Location: italy
From your profile:
| Quote: |
| Mainframe Skills: music |
This kind of humor is frowned upon on professional forums.
People who reply usually look at the TS profile to see what tone and terminology to use when answering.
Seeing a stupid skill description will lower the benevolence level,
and for that reason you are often going to miss quite a few good answers.
But if you are implying skills in the MUSIC/SP operating system...
well, that's pretty useless:
it has been discontinued and unsupported by McGill University for some years.
www.canpub.com/teammpg/

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi,
I have corrected my profile.
Could someone please suggest a way to remove duplicates without sorting the file?
I am using Syncsort. I was thinking of a first sort step that writes one file for the headers (STOPAFT=2), one file for the trailer (as it starts with characters) and one file for the details, where I could remove duplicates, and then a second step that merges them in this order:
Header file
Details file
Trailer file
This is my last option, as it means I have to create 2 steps; I was looking for some option to do it in 1 step.

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Have a look at this recent one.
EDIT: Just looking back, do you want to sort at all? Or just remove duplicates from the file as is? In which case, unless you have duplicate headers and trailers, they don't come into it anyway, do they?

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Why go off and post to the other topic? It messes that one up and loses continuity here.
Do you need to sort the file to get your duplicates (i.e., are they already contiguous, or do they need to be shuffled about to make them contiguous)?
What is the RECFM/LRECL of your file?

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi,
The records are 783 characters long and the record format is fixed:
| Code: |
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
00001|CPP000562712 |0800000020 |+ |0041 |ADRIAN
00001|CPP000562712 |0800000018 |+ |0041 |HILLAR
00001|CPP000562712 |0800000017 |+ |0041 |MATHEW
00001|CPP000562712 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000020 |+ |0041 |ADRIAN
00004|CPP000562752 |0800000018 |+ |0041 |HILLAR
00004|CPP000562752 |0800000017 |+ |0041 |MATHEW
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00005|CPP000562772 |0800000004 |+ |0055 |PAULIN
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
|
I just want to remove duplicates from the detail records and leave the headers, the trailer and the details in the same order as they appear in the input file.

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Bill,
I don't want to sort the file;
I just want to remove duplicates.

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
OK. One thing is the length of your record. I don't have Syncsort docs, so you'll have to check on what the limit is for the field length for a comparison (like IFTHEN=(WHEN=(start,length,type... what is the maximum for "length"?).
If it can't handle your entire record, search the DFSORT forum for a nice solution from SQLCODE1 which you should be able to apply to yours. Search for Sammmy.

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Bill,
The solution looks good, but would it solve the issue of leaving the headers, trailer and detail records as they are, in their positions?

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
The solution does not depend on a SORT occurring; it was just necessary in that case to get the results for that requirement. You can use FIELDS=COPY.

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
But without sorting, the duplicates would not be removed, would they?
If I use just COPY, this would copy all the records to the output as they are.
Apologies if I am missing anything here, but I am unable to relate the solution to my case.
My requirement is just to remove consecutive duplicate records without any sorting (i.e. all the records, headers, details and trailer, should retain their own positions).
I would appreciate it if you could provide code for a file of 783 characters and fixed record format.

gcicchet
Senior Member
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi,
maybe this will help:
| Code: |
//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=input-file,DISP=SHR
//OUT DD SYSOUT=*
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST
/*
//CTL1CNTL DD *
SORT FIELDS=COPY
/*
|
Gerry

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I think I gave you a "bad steer" with my second suggestion. Sorry about that.
Is your data in sequence? Or is it just that you need to retain the existing order, and get rid of the duplicates?
If the former, are you allowed to use your Synctool? Can you do a MERGE with a single file and do the SUM FIELDS=NONE that way?
If the latter, can you identify the trailer by default (as not something else) or by a particular value that won't exist elsewhere? If so, you can modify my first suggestion by adding a sequence number, sorting on the data (whole record) with SUM FIELDS=NONE, and then sorting on the added sequence number to get back to the original order.
How many records are you expecting when you're doing this for real?
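The sequence-number idea above can be sketched outside the sort product. The Python below is only an illustration of the record logic, not SORT syntax: `dedup_keep_order` and the sample data are hypothetical, and header/trailer identification is left out. Note that this variant removes every duplicate, not just contiguous ones.

```python
def dedup_keep_order(records):
    """Tag, sort on data, drop duplicates, then sort back on the tag."""
    # Step 1: tag each record with its original position (the added sequence number).
    tagged = list(enumerate(records))
    # Step 2: sort on the whole record. list.sort() is stable, so equal
    # records stay in input order and the earliest one comes first.
    tagged.sort(key=lambda pair: pair[1])
    # Step 3: keep only the first record of each set of equal data
    # (the job SUM FIELDS=NONE does after a sort on the whole record).
    kept = [
        pair
        for i, pair in enumerate(tagged)
        if i == 0 or pair[1] != tagged[i - 1][1]
    ]
    # Step 4: sort on the sequence number to restore the original order.
    kept.sort(key=lambda pair: pair[0])
    return [data for _, data in kept]


sample = ["HDR1", "HDR2", "1111", "1111", "1101", "2222", "1111", "TRL"]
print(dedup_keep_order(sample))
```

In the sample, the third `1111` is dropped even though it is not adjacent to the first two, which is the difference between this approach and a contiguous-only comparison.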

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
If you are not allowed to "tool" it and your data is not in sequence, you could try this type of thing out, tested with DFSORT, so not directly applicable to you, maybe:
| Code: |
//DEDUP EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
JOINKEYS F1=INA,FIELDS=(5,3,A,1,4,A),SORTED,NOSEQCK
JOINKEYS F2=INB,FIELDS=(5,3,A,1,4,A),SORTED,NOSEQCK
REFORMAT FIELDS=(F1:1,4)
JOIN UNPAIRED,F1,ONLY
OPTION COPY
//*
//JNF1CNTL DD *
INREC OVERLAY=(5:SEQNUM,3,ZD,START=0)
//JNF2CNTL DD *
INREC OVERLAY=(5:SEQNUM,3,ZD,START=1)
//*
//INA DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2122
3333
4444
4444
5555
6666
6667
TRL
//INB DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2122
3333
4444
4444
5555
6666
6667
TRL
|
The idea is to put the data in sequence, with a sequence number. The same file is specified for INA and INB, but the sequence numbers are generated "off by one" between the two versions of the file. Then use JOINKEYS to do the comparison (which can have a total key length of up to 4080 bytes).
The unpaired records from F1 are those which are either unique or the first of a contiguous set of duplicates.
Tested with four-byte keys; it's up to you to do it with the 783.
You'd need to change the OVERLAYs in the JNFnCNTLs to start at column 784 and ensure the sizes of the sequence numbers are sufficient for the maximum number of duplicates.
Then change the position and length of the key for both JOINKEYS: (784,length-of-sequence-number,A,1,783,A).
As this will involve reading the data twice, it is only a good option if you can't use one of the others.
Now, you have Syncsort. I don't know if you can have the JNFnCNTLs. If not, you'd end up doing those in a separate step with two OUTFILs, followed by the JOINKEYS.
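To see why the off-by-one numbering picks out the first record of each contiguous set, here is a small Python sketch of the matching logic only. It is not JOINKEYS syntax; `first_of_each_run` is a hypothetical name, and the set lookup stands in for the join on (sequence number, data).

```python
def first_of_each_run(records):
    """Keep records that differ from the record immediately before them."""
    # F1 view: key is (sequence number starting at 0, record data).
    f1 = [(seq, data) for seq, data in enumerate(records, start=0)]
    # F2 view of the same file: numbering starts at 1, so record n in F2
    # carries the number that record n+1 carries in F1.
    f2 = {(seq, data) for seq, data in enumerate(records, start=1)}
    # JOIN UNPAIRED,F1,ONLY: keep F1 records whose key has no match in F2.
    # A match means "same data as the previous record", i.e. a duplicate.
    return [data for seq, data in f1 if (seq, data) not in f2]


sample = ["HDR1", "HDR2", "1111", "1111", "1101", "2222", "2222", "2222", "TRL"]
print(first_of_each_run(sample))
```

Because the comparison is only ever against the preceding record, a value that reappears later in the file (not contiguously) is kept.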

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Here's an alternative. You only need to consider these last two if you can't "tool" it or MERGE with one file and SUM FIELDS=NONE. They will be more resource-hungry than those.
Here your test for equality in the OUTFIL OMIT will get tricky, as you'll have to split it into four pieces: 1,256; 257,256; 513,256; 769,15. Obviously the start,length of a lot of things need changing.
If you need to use this one, it could probably be souped-up a little.
| Code: |
//DEDUP EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
OPTION COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(5:SEQNUM,3,ZD,5,3,ZD,MOD,+2,EDIT=(T)))
OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(8,1,CH,EQ,C'1'),RECORDS=2,
PUSH=(9:1,4)),
IFTHEN=(WHEN=GROUP,BEGIN=(8,1,CH,EQ,C'0'),RECORDS=2,
PUSH=(13:1,4))
OUTFIL OMIT=(9,4,CH,EQ,13,4,CH),BUILD=(1,4)
//SORTIN DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2221
2221
2221
2221
2121
3333
4444
4444
5555
6666
6667
TRL
|

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Bill,
My data (detail records) is already in the sorted order that I want.
The only thing I need is to remove any duplicate detail records without any jumbling of the detail, header or trailer records.
I am trying your last suggestion using JOINKEYS.

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Bill,
I can identify the headers and trailer by specific text.
Consider the same input which I mentioned before; I wrote the two steps below to get the required output.
| Code: |
//TCSVBSTS JOB 0000,'SORT',CLASS=7,MSGCLASS=U,
// NOTIFY=&SYSUID,MSGLEVEL=(1,1)
//* $ACFJ219 ACF2 ACTIVE I003
//SORT10 EXEC PGM=SORT
//SORTIN DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR
//HDRS DD DSN=&&HDRS,
// DISP=(NEW,PASS,DELETE)
//TRL DD DSN=&&TRL,
// DISP=(NEW,PASS,DELETE)
//DETS DD DSN=&&DETS,
// DISP=(NEW,PASS,DELETE)
//SYSOUT DD SYSOUT=*
//SYSOUZ DD SYSOUT=*
//SORTWK01 DD SPACE=(TRK,(1,1))
//SYSIN DD *
SORT FIELDS=(1,783,CH,A),EQUALS
OUTFIL FNAMES=HDRS,INCLUDE=(1,3,CH,EQ,C'CDU',OR,1,3,CH,EQ,C'ROW')
OUTFIL FNAMES=TRL,INCLUDE=(1,3,CH,EQ,C'NUM')
OUTFIL FNAMES=DETS,INCLUDE=(1,5,CH,GT,C'00000')
SUM FIELDS=NONE
/*
//SORT20 EXEC PGM=SORT
//SORTIN DD DSN=&&HDRS,
// DISP=(OLD,PASS,DELETE)
// DD DSN=&&DETS,
// DISP=(OLD,PASS,DELETE)
// DD DSN=&&TRL,
// DISP=(OLD,PASS,DELETE)
//SORTOUT DD DSN=TCS.TEST.SORT.VISHAL.OUTR,
// DISP=SHR
//SYSOUT DD SYSOUT=*
//SYSOUZ DD SYSOUT=*
//SORTWK01 DD SPACE=(TRK,(1,1))
//SYSIN DD *
SORT FIELDS=COPY
/*
|
The only problem is that this is done in 2 steps and I wanted to do it in 1 step.
The output is:
| Code: |
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
00001|CPP000562712 |0800000017 |+ |0041 |MATHEW
00001|CPP000562712 |0800000018 |+ |0041 |HILLAR
00001|CPP000562712 |0800000019 |+ |0041 |HALE
00001|CPP000562712 |0800000020 |+ |0041 |ADRIAN
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000017 |+ |0041 |MATHEW
00004|CPP000562752 |0800000018 |+ |0041 |HILLAR
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000020 |+ |0041 |ADRIAN
00005|CPP000562772 |0800000004 |+ |0055 |PAULIN
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
|

gcicchet
Senior Member
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi,
did you try my suggestion?
Gerry

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Gerry,
I was looking for a SORT utility solution, as we do not have ICETOOL and I have not used it before. I'll check with the configuration team here whether we can use the tool in our setup.

gcicchet
Senior Member
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi,
SYNCTOOL (alias ICETOOL) is part of SYNCSORT.
Gerry

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Gerry,
I tried your suggestion but it does not produce the expected output.
| Code: |
//TCSVBSTS JOB 0000,'ICETOOL',CLASS=7,MSGCLASS=U,
// NOTIFY=&SYSUID,MSGLEVEL=(1,1)
//* $ACFJ219 ACF2 ACTIVE I003
//SORT10 EXEC PGM=ICETOOL
//IN DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR
//OUT DD DSN=TCS.TEST.SORT.VISHAL1,DISP=SHR
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SORTWK01 DD SPACE=(TRK,(1,1))
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST
//CTL1CNTL DD *
SORT FIELDS=COPY
/*
|
The output I am getting is:
| Code: |
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
00001|CPP000562712 |0800000017 |+ |0041 |MATHEW
00001|CPP000562712 |0800000018 |+ |0041 |HILLAR
00001|CPP000562712 |0800000019 |+ |0041 |HALE
00001|CPP000562712 |0800000020 |+ |0041 |ADRIAN
00004|CPP000562752 |0800000000 |+ |0055 |HAYLEY
00004|CPP000562752 |0800000017 |+ |0041 |MATHEW
00004|CPP000562752 |0800000018 |+ |0041 |HILLAR
00004|CPP000562752 |0800000019 |+ |0041 |HALE
00004|CPP000562752 |0800000020 |+ |0041 |ADRIAN
00005|CPP000562772 |0800000004 |+ |0055 |PAULIN
|
As you can see, the trailer has now moved up among the headers.
It has removed the duplicates, but the requirement is not to change the ordering of the headers and trailer.
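The order in that output is what a sort on the record in EBCDIC collating sequence would give, since letters collate before digits there and N falls between C and R. A minimal Python check of that reading follows; it assumes code page 037 (Python's `cp037` codec) as a stand-in for the site's EBCDIC code page, and the shortened records and the `ebcdic_sorted` helper are illustrative only.

```python
# Shortened versions of the records from the thread (headers, details, trailer).
records = [
    "CDU MATCHING ENGINE REPORT",
    "ROWID|CPP PRODUCT ID",
    "00001|CPP000562712",
    "00004|CPP000562752",
    "NUMBER OF MATCHES RETURNED",
]


def ebcdic_sorted(recs):
    """Sort records by their byte values in EBCDIC (cp037 is an assumption)."""
    return sorted(recs, key=lambda rec: rec.encode("cp037"))


for rec in ebcdic_sorted(records):
    print(rec)
```

The trailer lands between the two headers and the details follow, matching the output shown, which suggests the records were sorted on the ON field rather than copied.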

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Hi Bill,
I have tried the alternative which you suggested:
| Code: |
//TCSVBSTS JOB 0000,'SORT1',CLASS=7,MSGCLASS=U,
// NOTIFY=&SYSUID,MSGLEVEL=(1,1)
//* $ACFJ219 ACF2 ACTIVE I003
//SORT10 EXEC PGM=SORT
//SORTIN DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR
//SORTOUT DD DSN=TCS.TEST.SORT.VISHAL1,DISP=SHR
//SYSOUT DD SYSOUT=*
//SYSOUZ DD SYSOUT=*
//SORTWK01 DD SPACE=(TRK,(1,1))
//SYSIN DD *
OPTION COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(784:SEQNUM,3,ZD,784,3,ZD,MOD,+2,EDIT=(T)))
OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(787,1,CH,EQ,C'1'),RECORDS=2,
PUSH=(788:1,4)),
IFTHEN=(WHEN=GROUP,BEGIN=(787,1,CH,EQ,C'0'),RECORDS=2,
PUSH=(792:1,4))
OUTFIL OMIT=(788,4,CH,EQ,792,4,CH),BUILD=(1,783)
/*
|
but the result is not correct; it seems that only one detail record, along with the headers and trailer, appeared in the result.
Received output:
| Code: |
CDU MATCHING ENGINE REPORT FOR PROJECT CPP |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712 |0800000000 |+ |0100 |HAYLEY
NUMBER OF MATCHES RETURNED FOR BASE DATA : |000011
|
I am trying to change some conditions here... please let me know if you can spot anything in this.
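One possible reading of the posted cards, offered as an untested guess: PUSH=(788:1,4) and PUSH=(792:1,4) still copy only the first four bytes of each record, so the OMIT compares "0000" with "0000" for every detail record after the first. The Python sketch below models just that comparison; `omit_if_same_as_previous` and the shortened records are hypothetical.

```python
def omit_if_same_as_previous(records, start, length):
    """Drop a record when its compared slice equals the previous record's slice."""
    out = []
    prev_key = None  # the first record has nothing pushed from a predecessor
    for rec in records:
        # Only the pushed bytes take part in the comparison (1-based start).
        key = rec[start - 1:start - 1 + length]
        if key != prev_key:
            out.append(rec)
        prev_key = key
    return out


records = [
    "CDU MATCHING ENGINE REPORT",
    "ROWID|CPP PRODUCT ID",
    "00001|CPP000562712 |0800000000",
    "00001|CPP000562712 |0800000020",
    "00004|CPP000562752 |0800000000",
    "00004|CPP000562752 |0800000000",
    "NUMBER OF MATCHES RETURNED",
]

# Comparing 4 bytes keeps the headers, one detail record and the trailer.
print(omit_if_same_as_previous(records, 1, 4))
# Comparing the whole record drops only the true consecutive duplicate.
print(omit_if_same_as_previous(records, 1, 783))
```

If this reading is right, the pushed fields and the OMIT comparison would need to cover all 783 bytes, split into pieces as Bill describes for the equality test.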

gcicchet
Senior Member
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi,
my mistake, my cut and paste was incorrect; it should be:
| Code: |
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST USING(CTL1)
|
Gerry

vishalbshah
New User
Joined: 01 Dec 2006 Posts: 61 Location: Pune
Yes, it works!
Thanks Gerry,
I appreciate your help.
I have got the desired output.
So this card says: copy records from input to output as they are, and whenever consecutive records are the same in positions 1-783, copy only the first of them.
Please correct my understanding if it is wrong.
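That description can be written down as a short Python sketch. It models the behaviour described in this thread (records copied in input order, first record kept from each run of consecutive equal keys) and is not a statement of how the tool is implemented; `select_first_copy` is a hypothetical name.

```python
from itertools import groupby


def select_first_copy(records, start=1, length=783):
    """Keep the first record of each run of consecutive records with equal keys."""
    def key(rec):
        # The ON(start,length,CH) field, using a 1-based start position.
        return rec[start - 1:start - 1 + length]

    # groupby only groups adjacent equal keys, so input order is preserved
    # and non-adjacent repeats are left alone.
    return [next(group) for _, group in groupby(records, key=key)]


sample = [
    "CDU MATCHING ENGINE REPORT",
    "ROWID|CPP PRODUCT ID",
    "00004|CPP000562752 |0800000000",
    "00004|CPP000562752 |0800000000",
    "00004|CPP000562752 |0800000019",
    "00004|CPP000562752 |0800000019",
    "NUMBER OF MATCHES RETURNED",
]
print(select_first_copy(sample))
```

Headers and trailer pass through unchanged because each differs from its neighbours, so they never need to be identified separately.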