Removing duplicates without Sorting in JCL


vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 12:49 pm

Hi,

I have a file with 2 header records starting with characters, detail records starting with numbers, and a trailer record that again starts with characters.

I want to remove duplicates from the detail records but don't want to SORT the file, because the headers, details and trailer don't carry a record type that would keep the headers first and the trailer last after a sort.

I tried the following in SORT:

Code:

  SORT FIELDS=COPY
  SUM FIELDS=NONE



But this is not working as expected.

Could you please suggest an approach?

The alternative is to split the headers, details and trailer into separate files, remove the duplicates from the details file, and then merge the files back together.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Apr 13, 2012 1:27 pm

What you want is DATASORT, but I don't think you have it, as you've posted in the JCL forum so we assume you have Syncsort...

If you could strip off the trailer, how would you identify it to do so?
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

Posted: Fri Apr 13, 2012 1:34 pm

from Your profile
Quote:
Mainframe Skills: music


this kind of humor is frowned upon on professional forums
usually people who reply look at the TS profile to see what tone and terminology to use when answering

seeing a stupid skill description will lower the benevolence level
and often for that reason You are going to miss quite a few good answers

but if You imply skills in the MUSIC/SP operating system....
well that's pretty useless ...
it was discontinued and has been unsupported by McGill University for some years.
www.canpub.com/teammpg/
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 2:00 pm

Hi,

I have corrected my profile.

Could someone please suggest a way to remove duplicates without sorting the file?

I am using Syncsort. I was thinking of a first SORT step that splits the input into three files: one for the headers (STOPAFT=2), one for the trailer (as it starts with characters) and one for the details, where I could remove the duplicates. The next step would then merge them back in this order:

Header file
Details file
Trailer file

This is my last resort, since it means creating 2 steps; I was looking for a way to do it in 1 step.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Apr 13, 2012 2:06 pm

Have a look at this recent one.

EDIT: Just looking back, do you want to sort at all? Or just remove duplicates from the file as is? In which case, unless you have duplicate headers and trailers, they don't come into it anyway, do they?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Apr 13, 2012 2:44 pm

Why go off and post to the other topic? It messes that one up and loses continuity here.

Do you need to sort the file to get your duplicates (i.e., are they already contiguous, or do they need to be shuffled about to make them contiguous)?

What is the RECFM/LRECL of your file?
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 2:49 pm

Hi

The record length is 783 characters and the record format is fixed:

Code:

CDU MATCHING ENGINE REPORT FOR PROJECT CPP   |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712  |0800000000   |+             |0100           |HAYLEY
00001|CPP000562712  |0800000020   |+             |0041           |ADRIAN
00001|CPP000562712  |0800000018   |+             |0041           |HILLAR
00001|CPP000562712  |0800000017   |+             |0041           |MATHEW
00001|CPP000562712  |0800000019   |+             |0041           |HALE 
00004|CPP000562752  |0800000000   |+             |0055           |HAYLEY
00004|CPP000562752  |0800000000   |+             |0055           |HAYLEY
00004|CPP000562752  |0800000020   |+             |0041           |ADRIAN
00004|CPP000562752  |0800000018   |+             |0041           |HILLAR
00004|CPP000562752  |0800000017   |+             |0041           |MATHEW
00004|CPP000562752  |0800000019   |+             |0041           |HALE 
00004|CPP000562752  |0800000019   |+             |0041           |HALE 
00005|CPP000562772  |0800000004   |+             |0055           |PAULIN
NUMBER OF MATCHES RETURNED FOR BASE DATA :   |000011                   



I just want to remove duplicates from the detail records, and to leave the headers, the trailer and the details in the same order as they appear in the input file.
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 2:53 pm

Hi Bill,

I don't want to sort the file

just want to remove duplicates.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Apr 13, 2012 3:05 pm

OK. One thing is the length of your record. I don't have Syncsort docs, so you'll have to check on what the limit is for the field length for a comparison (like IFTHEN=(WHEN=(start,length,type... what is the maximum for "length"?).

If it can't handle your entire record, search the DFSORT forum for a nice solution from SQLCODE1 which you should be able to apply to yours. Search for Sammmy.
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 3:26 pm

Hi Bill,

The solution looks good, but would it solve the problem of leaving the headers, the trailer and the detail records in their current positions?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Apr 13, 2012 3:37 pm

The solution does not depend on a SORT occurring, it was just necessary in that case to get the results for that requirement. You can use FIELDS=COPY.
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Fri Apr 13, 2012 3:58 pm

But without sorting, the duplicates would not be removed, would they?

If I use just COPY, this would copy all the records to the output as they are.

Apologies if I am missing something here, but I am unable to relate the solution to my case.

My requirement is just to remove consecutive duplicate records without any sorting (i.e. all the records, headers, details and trailer, should retain their own positions).

I would appreciate it if you could provide code for a file of 783 characters with fixed record format.
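
The requirement stated above (drop a record only when it is identical to the record immediately before it, and leave everything else where it is) can be modelled outside the sort product. The following is a minimal Python sketch of that logic only; it is not Syncsort or DFSORT syntax, and the function name `drop_consecutive_duplicates` and the shortened sample records are made up for illustration.

```python
from itertools import groupby


def drop_consecutive_duplicates(records):
    """Keep the first record of every run of identical adjacent records."""
    # groupby() starts a new group each time the record value changes, so
    # every group is one run of identical neighbours; keep one per run.
    return [record for record, _run in groupby(records)]


# Hypothetical, shortened records: two headers, some details, one trailer.
sample = [
    "CDU HEADER",
    "ROWID HEADER",
    "00004|HAYLEY",
    "00004|HAYLEY",
    "00004|ADRIAN",
    "00004|HALE",
    "00004|HALE",
    "NUMBER TRAILER",
]

deduped = drop_consecutive_duplicates(sample)
for record in deduped:
    print(record)
```

Because nothing is reordered, the headers and the trailer stay in place without needing a record type to identify them.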
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Sat Apr 14, 2012 3:35 am

Hi,

maybe this will help
Code:
//S1       EXEC PGM=ICETOOL                           
//TOOLMSG  DD SYSOUT=*                                 
//DFSMSG   DD SYSOUT=*                                 
//IN       DD DSN=input-file,DISP=SHR       
//OUT      DD SYSOUT=*                                 
//TOOLIN   DD *                                       
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST             
/*                                                     
//CTL1CNTL DD *                                       
  SORT FIELDS=COPY                                     
/*                                                     


Gerry
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Sat Apr 14, 2012 3:39 am

I think I gave you a "bad steer" with my second suggestion. Sorry about that.

Is your data in sequence? Or is it just that you need to retain the existing order, and get rid of the duplicates?

If the former, are you allowed to use your Synctool? Can you do a MERGE with a single file, and do the SUM FIELDS=NONE that way?

If the latter, can you identify the trailer by default (as not something else) or by a particular value that won't exist elsewhere? If so, you can modify my first suggestion by adding a sequence number, sorting on the data (whole record) SUM FIELDS=NONE and then sorting on the added sequence number to get back to the original order.

How many records are you expecting when you're doing this for real?
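
The "add a sequence number, sort on the data, then sort back on the sequence number" idea can be illustrated with a short Python sketch. This only models the logic and is not sort control-card syntax; the function name and sample data are hypothetical, and the stable sort stands in for what a sort step with EQUALS would be expected to give. Note that this approach removes every repeated record, not only adjacent ones.

```python
def dedup_and_restore_order(records):
    """Remove all duplicate records, then put survivors back in input order."""
    # Step 1: tag each record with its original position (the sequence number).
    numbered = list(enumerate(records))

    # Step 2: sort on the data only. sorted() is stable, so equal records
    # stay in input order and the first occurrence comes first.
    by_data = sorted(numbered, key=lambda pair: pair[1])

    # Step 3: keep one record per distinct value (the SUM FIELDS=NONE role).
    survivors = []
    previous = object()  # sentinel that never equals a real record
    for seq, data in by_data:
        if data != previous:
            survivors.append((seq, data))
        previous = data

    # Step 4: sort on the sequence number to restore the original order,
    # then drop the sequence number again.
    return [data for _seq, data in sorted(survivors)]


sample = ["HDR1", "HDR2", "1111", "2222", "1111", "1111", "TRL"]
restored = dedup_and_restore_order(sample)
print(restored)
```

In the sample, the later `1111` records are dropped even though a `2222` sits between them and the first `1111`, which is the difference from a purely consecutive comparison.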
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Sat Apr 14, 2012 5:44 pm

If you are not allowed to "tool" it and your data is not in sequence, you could try this type of thing out, tested with DFSORT, so not directly applicable to you, maybe:

Code:
//DEDUP    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  JOINKEYS F1=INA,FIELDS=(5,3,A,1,4,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(5,3,A,1,4,A),SORTED,NOSEQCK
  REFORMAT FIELDS=(F1:1,4)
  JOIN UNPAIRED,F1,ONLY
  OPTION COPY
//*
//JNF1CNTL DD *
  INREC OVERLAY=(5:SEQNUM,3,ZD,START=0)
//JNF2CNTL DD *
  INREC OVERLAY=(5:SEQNUM,3,ZD,START=1)
//*
//INA      DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2122
3333
4444
4444
5555
6666
6667
TRL
//INB      DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2122
3333
4444
4444
5555
6666
6667
TRL


The idea is to put the data in sequence, with a sequence number. The same file is specified for INA and INB, but the sequence numbers are generated "off by one" between the two versions of the file. Then use JOINKEYS to do the comparison (which can have key total length up to 4080 bytes).

The unpaired records from F1 are those which are either unique, or the first one of a contiguous set of duplicates.

Tested with four-byte keys, up to you to do it with the 783.

You'd need to change the JNFnCNTLs for the OVERLAYs to start at column 784 and to ensure the sizes of the sequence numbers are sufficient for the maximum number of duplicates.

Then change the position or length of the key for both JOINKEYS (784,length-of-sequence-number,A,1,783,A).

As this will involve reading the data twice, it is only a good option if you can't use one of the others.

Now, you have Syncsort. I don't know if you can have the JNFnCNTLs. If not, you'd end up doing those in a separate step with two OUTFILs, followed by the JOINKEYS.
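
The "off by one" join described in this post can be modelled in a few lines of Python. This is a sketch of the matching logic only, not of how any sort product implements JOINKEYS; the function name is hypothetical, and the sample is the four-byte test data from the job above.

```python
def keep_unpaired_f1(records):
    """Model of joining a file to itself with sequence numbers shifted by one."""
    # F1 view: key = (sequence number starting at 0, record data).
    f1 = list(enumerate(records, start=0))

    # F2 view: the same records, but numbered from 1, so record i of F1
    # lines up with record i-1 of F2.
    f2_keys = set(enumerate(records, start=1))

    # An F1 record is "paired" only when its data equals the data of the
    # record just before it. Keeping the unpaired F1 records therefore keeps
    # unique records and the first record of each contiguous run.
    return [data for seq, data in f1 if (seq, data) not in f2_keys]


test_data = [
    "HDR1", "HDR2", "1111", "1111", "1101", "2222", "2222", "2222", "2222",
    "2122", "3333", "4444", "4444", "5555", "6666", "6667", "TRL",
]
unpaired = keep_unpaired_f1(test_data)
print(unpaired)
```

The sketch uses unbounded integers for the sequence number; in the control cards the sequence number field has a fixed width, which is why the post asks for it to be sized generously.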
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Sat Apr 14, 2012 7:53 pm

Here's an alternative. You only need to consider these last two if you can't 'tool it or MERGE with one file and SUM FIELDS=NONE. They will be more resource-hungry than those.

Here your test for equality in the OUTFIL OMIT will get tricky, as you'll have to split it into four pieces: 1,256; 257,256; 513,256; and 769,15. Obviously the start,length of a lot of things will need changing.

If you need to use this one, it could probably be souped-up a little.

Code:
//DEDUP    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(5:SEQNUM,3,ZD,5,3,ZD,MOD,+2,EDIT=(T)))
                                                               
  OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(8,1,CH,EQ,C'1'),RECORDS=2,
                 PUSH=(9:1,4)),
         IFTHEN=(WHEN=GROUP,BEGIN=(8,1,CH,EQ,C'0'),RECORDS=2,
                 PUSH=(13:1,4))
                                                               
  OUTFIL OMIT=(9,4,CH,EQ,13,4,CH),BUILD=(1,4)
//SORTIN   DD *
HDR1
HDR2
1111
1111
1101
2222
2222
2222
2222
2221
2221
2221
2221
2121
3333
4444
4444
5555
6666
6667
TRL
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 10:15 am

Hi Bill,

My data (the detail records) is already in the order I want.

All I need is to remove any duplicate detail records without any jumbling of the detail, header or trailer records.

I am trying your last suggestion using JOINKEYS.
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 10:39 am

Hi Bill,

I can identify the headers and the trailer by their specific text.

Consider the same input which I posted before. I wrote the two steps below to get the required output.

Code:

//TCSVBSTS JOB 0000,'SORT',CLASS=7,MSGCLASS=U,                         
//    NOTIFY=&SYSUID,MSGLEVEL=(1,1)                                     
//* $ACFJ219 ACF2 ACTIVE I003                                           
//SORT10   EXEC PGM=SORT                                               
//SORTIN   DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR                         
//HDRS     DD DSN=&&HDRS,                                               
//            DISP=(NEW,PASS,DELETE)                                   
//TRL      DD DSN=&&TRL,                                               
//            DISP=(NEW,PASS,DELETE)                                   
//DETS     DD DSN=&&DETS,                                               
//            DISP=(NEW,PASS,DELETE)                                   
//SYSOUT   DD   SYSOUT=*                                               
//SYSOUZ   DD   SYSOUT=*                                               
//SORTWK01 DD   SPACE=(TRK,(1,1))                                       
//SYSIN    DD *                                                         
  SORT FIELDS=(1,783,CH,A),EQUALS                                       
  OUTFIL FNAMES=HDRS,INCLUDE=(1,3,CH,EQ,C'CDU',OR,1,3,CH,EQ,C'ROW')     
  OUTFIL FNAMES=TRL,INCLUDE=(1,3,CH,EQ,C'NUM')                         
  OUTFIL FNAMES=DETS,INCLUDE=(1,5,CH,GT,C'00000')                       
  SUM FIELDS=NONE                                     
/*                                                   
//SORT20   EXEC PGM=SORT                             
//SORTIN   DD DSN=&&HDRS,                             
//            DISP=(OLD,PASS,DELETE)                 
//         DD DSN=&&DETS,                             
//            DISP=(OLD,PASS,DELETE)                 
//         DD DSN=&&TRL,                             
//            DISP=(OLD,PASS,DELETE)                 
//SORTOUT DD DSN=TCS.TEST.SORT.VISHAL.OUTR,           
//            DISP=SHR                               
//SYSOUT   DD   SYSOUT=*                             
//SYSOUZ   DD   SYSOUT=*                             
//SORTWK01 DD   SPACE=(TRK,(1,1))                     
//SYSIN    DD *                                       
  SORT FIELDS=COPY                                   
/*                                                   



The only problem is that this takes 2 steps, and I wanted to do it in 1 step.

The output is:

Code:


CDU MATCHING ENGINE REPORT FOR PROJECT CPP   |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712  |0800000000   |+             |0100           |HAYLEY
00001|CPP000562712  |0800000017   |+             |0041           |MATHEW
00001|CPP000562712  |0800000018   |+             |0041           |HILLAR
00001|CPP000562712  |0800000019   |+             |0041           |HALE 
00001|CPP000562712  |0800000020   |+             |0041           |ADRIAN
00004|CPP000562752  |0800000000   |+             |0055           |HAYLEY
00004|CPP000562752  |0800000017   |+             |0041           |MATHEW
00004|CPP000562752  |0800000018   |+             |0041           |HILLAR
00004|CPP000562752  |0800000019   |+             |0041           |HALE 
00004|CPP000562752  |0800000020   |+             |0041           |ADRIAN
00005|CPP000562772  |0800000004   |+             |0055           |PAULIN
NUMBER OF MATCHES RETURNED FOR BASE DATA :   |000011                   

gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Mon Apr 16, 2012 10:50 am

Hi,

did you try my suggestion ?


Gerry
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 10:54 am

Hi Gerry,

I was looking for a solution with the SORT utility, as we don't have ICETOOL and I have not used it before. I'll check with the configuration team here whether we can use the tool in our setup.
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Mon Apr 16, 2012 10:59 am

Hi,

SYNCTOOL (alias ICETOOL) is part of SYNCSORT.


Gerry
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 11:12 am

Hi Gerry,

I tried your suggestion, but it does not produce the expected output.

Code:


//TCSVBSTS JOB 0000,'ICETOOL',CLASS=7,MSGCLASS=U,   
//    NOTIFY=&SYSUID,MSGLEVEL=(1,1)                 
//* $ACFJ219 ACF2 ACTIVE I003                       
//SORT10   EXEC PGM=ICETOOL                         
//IN       DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR     
//OUT      DD DSN=TCS.TEST.SORT.VISHAL1,DISP=SHR   
//TOOLMSG  DD   SYSOUT=*                           
//DFSMSG   DD   SYSOUT=*                           
//SORTWK01 DD   SPACE=(TRK,(1,1))                   
//TOOLIN   DD *                                     
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST         
//CTL1CNTL DD *                                     
SORT FIELDS=COPY                                   
/*                                                 



The output I am getting is:

Code:

CDU MATCHING ENGINE REPORT FOR PROJECT CPP   |DATE:2012-04-12|FULL / PAR
NUMBER OF MATCHES RETURNED FOR BASE DATA :   |000011                   
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712  |0800000000   |+             |0100           |HAYLEY
00001|CPP000562712  |0800000017   |+             |0041           |MATHEW
00001|CPP000562712  |0800000018   |+             |0041           |HILLAR
00001|CPP000562712  |0800000019   |+             |0041           |HALE 
00001|CPP000562712  |0800000020   |+             |0041           |ADRIAN
00004|CPP000562752  |0800000000   |+             |0055           |HAYLEY
00004|CPP000562752  |0800000017   |+             |0041           |MATHEW
00004|CPP000562752  |0800000018   |+             |0041           |HILLAR
00004|CPP000562752  |0800000019   |+             |0041           |HALE 
00004|CPP000562752  |0800000020   |+             |0041           |ADRIAN
00005|CPP000562772  |0800000004   |+             |0055           |PAULIN


As you can see, the trailer has now moved up between the headers.
It has removed the duplicates, but the requirement is not to change the position of the headers and the trailer.
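
The output above is consistent with the records having been sorted on the ON field in EBCDIC order before the first of each set was kept: in EBCDIC, letters collate before digits, so `CDU`, `NUMBER` and `ROWID` all land ahead of the `0000n` detail records. A minimal Python sketch of that behaviour follows; the function names and shortened records are hypothetical, and the `cp037` codec is used here only as a stand-in for the EBCDIC collating sequence.

```python
from itertools import groupby


def ebcdic_key(record):
    """Return the record's bytes in code page 037 (an EBCDIC code page)."""
    return record.encode("cp037")


def sort_then_keep_first(records):
    """Sort on the whole record in EBCDIC order, then keep one per value."""
    ordered = sorted(records, key=ebcdic_key)
    return [record for record, _run in groupby(ordered)]


# Hypothetical, shortened records in their original input order.
original = [
    "CDU HEADER",
    "ROWID HEADER",
    "00001|HAYLEY",
    "00001|ADRIAN",
    "00001|ADRIAN",
    "NUMBER TRAILER",
]

selected = sort_then_keep_first(original)
for record in selected:
    print(record)
```

The duplicates are gone, but the trailer now sits between the two headers and the detail records have been reordered, which mirrors the symptom reported in this post.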
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 11:16 am

Hi Bill,

I have tried the alternative which you suggested:

Code:


//TCSVBSTS JOB 0000,'SORT1',CLASS=7,MSGCLASS=U,                     
//    NOTIFY=&SYSUID,MSGLEVEL=(1,1)                                 
//* $ACFJ219 ACF2 ACTIVE I003                                       
//SORT10   EXEC PGM=SORT                                           
//SORTIN   DD DSN=TCS.TEST.SORT.VISHAL,DISP=SHR                     
//SORTOUT  DD DSN=TCS.TEST.SORT.VISHAL1,DISP=SHR                   
//SYSOUT   DD   SYSOUT=*                                           
//SYSOUZ   DD   SYSOUT=*                                           
//SORTWK01 DD   SPACE=(TRK,(1,1))                                   
//SYSIN    DD *                                                     
  OPTION COPY                                                       
  INREC IFTHEN=(WHEN=INIT,                                         
                OVERLAY=(784:SEQNUM,3,ZD,784,3,ZD,MOD,+2,EDIT=(T)))
  OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(787,1,CH,EQ,C'1'),RECORDS=2,     
                PUSH=(788:1,4)),                                   
         IFTHEN=(WHEN=GROUP,BEGIN=(787,1,CH,EQ,C'0'),RECORDS=2,     
                PUSH=(792:1,4))                                     
  OUTFIL OMIT=(788,4,CH,EQ,792,4,CH),BUILD=(1,783)                 
/*                                                                 


But the result is not correct: it seems that only the matched detail record, together with the headers and trailer, appeared in the result.

Received Output:
Code:

CDU MATCHING ENGINE REPORT FOR PROJECT CPP   |DATE:2012-04-12|FULL / PAR
ROWID|CPP PRODUCT ID|CANDIDATE CIN|CANDIDATE SIGN|CANDIDATE SCORE|FIRST
00001|CPP000562712  |0800000000   |+             |0100           |HAYLEY
NUMBER OF MATCHES RETURNED FOR BASE DATA :   |000011                   


I am trying to change some conditions here. Please let me know if you can spot anything in this.
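
One possible explanation, not confirmed anywhere in the thread: the control cards above still push and compare only four bytes (`PUSH=(788:1,4)`, `PUSH=(792:1,4)` and `OMIT=(788,4,CH,EQ,792,4,CH)`), so two neighbouring records count as duplicates whenever their first four bytes agree, and every detail record begins with `0000`. The Python sketch below models that comparison; the function name and the shortened records are hypothetical.

```python
def drop_if_prefix_matches_previous(records, compare_len):
    """Drop a record when its first compare_len bytes equal the previous one's."""
    kept = []
    previous_prefix = None
    for record in records:
        prefix = record[:compare_len]
        if prefix != previous_prefix:
            kept.append(record)
        previous_prefix = prefix
    return kept


# Hypothetical, shortened versions of the records posted earlier.
records = [
    "CDU MATCHING ENGINE REPORT",
    "ROWID|CPP PRODUCT ID",
    "00001|CPP000562712|0800000000",
    "00001|CPP000562712|0800000020",
    "00004|CPP000562752|0800000000",
    "00004|CPP000562752|0800000000",
    "00005|CPP000562772|0800000004",
    "NUMBER OF MATCHES RETURNED",
]

four_byte_result = drop_if_prefix_matches_previous(records, 4)
full_record_result = drop_if_prefix_matches_previous(records, 783)
print(len(four_byte_result), len(full_record_result))
```

With a four-byte comparison only the headers, the first detail record and the trailer survive, which matches the received output; comparing the full record drops just the one true duplicate.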
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Mon Apr 16, 2012 11:39 am

Hi,

my mistake, my cut and paste was incorrect, it should be
Code:
SELECT FROM(IN) TO(OUT) ON(1,783,CH) FIRST USING(CTL1) 




Gerry
vishalbshah

New User


Joined: 01 Dec 2006
Posts: 61
Location: Pune

Posted: Mon Apr 16, 2012 11:53 am

Yes it works!

Thanks Gerry

Appreciate your help.

I have got the desired output.

So this card says: copy the records from input to output as they are, and whenever consecutive records are identical in positions 1 to 783, keep only the first of them.

Please correct my understanding if it is wrong.
All times are GMT + 6 Hours