Remove duplicates without sorting the sequence

 
IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL
saithvis2

New User


Joined: 23 Dec 2005
Posts: 61
Location: Providence , US

Posted: Tue Aug 22, 2006 5:20 pm    Post subject: Remove duplicates without sorting the sequence

Hi all,

I have a file with 80-byte records, of which only the first 10 bytes are of use to me. I have to remove the duplicate records from the file. I don't want to change the order of the records; I just want to remove the duplicates.

Example:
file a
---------------

abc
xyz
xyz
abd
abd
mno


o/p file
------------------

abc
xyz
abd
mno


So how do I go about it? I have written a job, but it does not accept SUM FIELDS=NONE with SORT FIELDS=COPY. Since I only want to remove the duplicates without sorting the records, what should I change in my JCL?

My current JCL:
---------------------
Code:

//STEP002   EXEC PGM=SORT                               
//SYSPRINT  DD  SYSOUT=*                               
//SYSOUT    DD  SYSOUT=*                               
//SYSUDUMP  DD  SYSOUT=*                               
//SORTWK01 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)     
//SORTWK02 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)     
//SORTWK03 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)     
//SORTIN   DD DSN=ABC.XYZ(ABC#S),DISP=SHR   
//SORTOUT DD DSN=&&TEMP,DISP=(,PASS),                   
//    SPACE=(CYL,(5,5),RLSE),DCB=(LRECL=80,BLKSIZE=9040,
//    RECFM=FB,DSORG=PS)                               
//SYSIN     DD  *                                       
    SORT FIELDS=COPY                                   
    INREC FIELDS=(1,10)                                 
/*                                                     
//STEP003   EXEC PGM=SORT                               
//SYSPRINT  DD  SYSOUT=*                               
//SYSOUT    DD  SYSOUT=*                               
//SYSUDUMP  DD  SYSOUT=*                               
//SORTWK01 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)     
//SORTWK02 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)     
//SORTIN   DD DSN=&&TEMP,DISP=(OLD,KEEP,KEEP)           
//SORTOUT DD DSN=ABC.OUTPUT,DISP=(,CATLG,KEEP),
//    SPACE=(CYL,(5,5),RLSE),DCB=(LRECL=80,BLKSIZE=9040,
//    RECFM=FB,DSORG=PS),MGMTCLAS=SNAP60                 
//SYSIN     DD  *                                       
    SORT FIELDS=COPY                                     
    SUM FIELDS=NONE                                     
/*                                                       
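The requirement itself (keep the first record for each distinct value of bytes 1-10 and keep the input order) can be pinned down with a short Python sketch. This only illustrates the expected behaviour; it is not a mainframe solution. The function name `dedupe_keep_order` and the `key_len` parameter are hypothetical, and records are modelled as Python strings.

```python
def dedupe_keep_order(records, key_len=10):
    """Drop later records whose first key_len characters were already seen."""
    seen = set()
    result = []
    for rec in records:
        key = rec[:key_len]       # only the first key_len bytes matter
        if key not in seen:       # first occurrence of this key: keep it
            seen.add(key)
            result.append(rec)
    return result


file_a = ["abc", "xyz", "xyz", "abd", "abd", "mno"]
print(dedupe_keep_order(file_a))  # ['abc', 'xyz', 'abd', 'mno']
```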

dinguduse

New User


Joined: 24 Jun 2005
Posts: 8

Posted: Tue Aug 22, 2006 6:36 pm    Post subject: Re: Remove duplicates without sorting the sequence

Hi,

1) Add a sequence number to each record using SORT.
2) Remove the duplicates.
3) Sort the records back into order on the sequence number you added.

Rgds,
Aravind
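The three steps above can be sketched in Python as follows. This is a hypothetical illustration of the technique, not DFSORT itself: `dedupe_via_sequence` is an invented name, the key is assumed to be the first 10 characters as in the original question, and Python's stable sort stands in for the sort step, so the record kept for each key is the one with the lowest sequence number.

```python
def dedupe_via_sequence(records, key_len=10):
    # Step 1: tag every record with its original position (sequence number).
    tagged = [(rec[:key_len], seq, rec) for seq, rec in enumerate(records, start=1)]

    # Step 2: sort by key and keep one record per key. The sort is stable,
    # so within equal keys the lowest sequence number comes first.
    by_key = sorted(tagged, key=lambda t: t[0])
    unique = []
    for t in by_key:
        if not unique or unique[-1][0] != t[0]:
            unique.append(t)

    # Step 3: re-sort the survivors on the sequence number to restore order.
    unique.sort(key=lambda t: t[1])
    return [rec for _, _, rec in unique]


print(dedupe_via_sequence(["abc", "xyz", "xyz", "abd", "abd", "mno"]))
# ['abc', 'xyz', 'abd', 'mno']
```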
saithvis2

New User


Joined: 23 Dec 2005
Posts: 61
Location: Providence , US

Posted: Tue Aug 22, 2006 7:22 pm    Post subject: Re: Remove duplicates without sorting the sequence

Hi all,

I have done this using REXX, but is there a way to do it using only SORT?

Job used:

Code:

//STEP002   EXEC PGM=SORT                                 
//SYSPRINT  DD  SYSOUT=*                                 
//SYSOUT    DD  SYSOUT=*                                 
//SYSUDUMP  DD  SYSOUT=*                                 
//SORTWK01 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)       
//SORTWK02 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)       
//SORTWK03 DD UNIT=SCRDSK,SPACE=(CYL,(250,50),RLSE)       
//SORTIN   DD DSN=ABC.PDS(abc),DISP=SHR     
//SORTOUT DD DSN=ABC.TOT2,DISP=(NEW,CATLG,DELETE),   
//    SPACE=(CYL,(5,5),RLSE),DCB=(LRECL=80,BLKSIZE=9040, 
//    RECFM=FB,DSORG=PS)                                 
//SYSIN     DD  *                                         
    SORT FIELDS=COPY                                     
    INREC FIELDS=(1,10)                                   
/*
//STEP003  EXEC PGM=IEBGENER                                   
//SYSPRINT DD  SYSOUT=*                                       
//SYSUT1   DD DSN=ABC.REXX.CNTL(REXXTRY),DISP=SHR         
//SYSUT2   DD DSN=&&TEMP(GO),DISP=(,PASS),SPACE=(TRK,(1,1,1)),
//  DCB=(RECFM=FB,LRECL=80)                                   
//SYSIN    DD  DUMMY                                           
//*                                                           
//*------------------------------------------------------------
//STEP004 EXEC  PGM=IKJEFT01,PARM='%GO'                       
//SYSPROC   DD DISP=SHR,DSN=&&TEMP                             
//SYSTSPRT  DD SYSOUT=*                                       
//SYSPRINT  DD SYSOUT=*                                       
//SYSOUT    DD SYSOUT=*                                       
//PRINTER   DD SYSOUT=*                                       
//INDEX     DD SYSOUT=*                                       
//INFILE    DD DISP=(OLD,KEEP,KEEP),DSN=ABC.TOT2           
//SYSTSIN   DD DUMMY                                           
//NEW     DD DSN=ABC.OUTPUT,DISP=(,CATLG,KEEP),     
//    SPACE=(CYL,(5,5),RLSE),DCB=(LRECL=80,BLKSIZE=9040,       
//    RECFM=FB,DSORG=PS),MGMTCLAS=SNAP60                       
                                             


Rexx code ABC.REXX.CNTL(REXXTRY):

Code:

/* REXX */                                                             
say "I am starting now!"                                               
eof1 = "NO"                             /* we haven't got eof on input*/
i = 0                                   /* record count/index         */
j = 1                                                                   
                                                                       
do forever                                                             
  call read_infile                                                     
  if eof1 = "YES" then do                                               
    "EXECIO * DISKW NEW  (stem out_rec. FINIS"                         
    say "rec_cnt " i                                                   
    leave                                                               
  end                                                                   
end                                                                     
say "I am finished"                                                     
exit                                                                   
/*--------------------------------------------------------------------*/
/*  SUBROUTINE     */                                                   
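read_infile:
"EXECIO 1 DISKR INFILE"             /* read the next record          */
if RC > 0 then do                   /* RC 2 = end of file            */
  eof1 = "YES"
  return
end

parse pull inrec

i = i + 1
if i = 1 then do                    /* always keep the first record  */
  inrecprev = inrec
  out_rec.j = inrec
  j = j + 1
end
/* Compare with the previous record only: this drops consecutive    */
/* duplicates, not ones that reappear later in the file.            */
if inrecprev \= inrec then do
  inrecprev = inrec
  out_rec.j = inrec
  j = j + 1
end
return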
 /*--------------------------------------------------------------------*/
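The REXX exec compares each record only with the one immediately before it, so it behaves like the Unix `uniq` command: it removes consecutive duplicates only. That is enough for the sample data, where every duplicate directly follows its original, but a duplicate that reappears later in the file would be kept. A small Python sketch of the same loop makes this visible (the function name is hypothetical):

```python
def drop_adjacent_duplicates(records):
    """Mirror of the REXX loop: compare each record only with the previous one."""
    out = []
    prev = None
    for rec in records:
        if rec != prev:      # same test as: if inrecprev \= inrec
            out.append(rec)
            prev = rec
    return out


print(drop_adjacent_duplicates(["abc", "xyz", "xyz", "abd", "abd", "mno"]))
# ['abc', 'xyz', 'abd', 'mno']
print(drop_adjacent_duplicates(["abc", "xyz", "abc"]))
# ['abc', 'xyz', 'abc']  (the non-adjacent duplicate is kept)
```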



Regards
Vishal
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Tue Aug 22, 2006 7:59 pm    Post subject:

Here's a DFSORT/ICETOOL job that will do what you asked for:

Code:

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=...  input file
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=...  output file
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC FIELDS=(1,10,11:SEQNUM,8,ZD)
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
/*
//CTL2CNTL DD *
  SORT FIELDS=(11,8,ZD,A)
  OUTREC FIELDS=(1,10)
/*
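To make the byte positions in these control statements concrete, here is a rough Python simulation of the two ICETOOL SORT operators. It is a sketch under stated assumptions: records are Python strings, a DFSORT `p,m` field (1-based position, length) becomes the slice `[p-1:p-1+m]`, and the SUM FIELDS=NONE step keeps the first record of each key, which in DFSORT is, to our understanding, only guaranteed when EQUALS is in effect. Python compares characters by code point rather than EBCDIC; that changes only the intermediate order in T1, not the final result. The function name `icetool_dedupe` is hypothetical.

```python
def icetool_dedupe(records):
    """Rough simulation of the CTL1CNTL/CTL2CNTL statements (hypothetical name)."""
    # CTL1: INREC FIELDS=(1,10,11:SEQNUM,8,ZD)
    # Bytes 1-10 of each input record, then an 8-digit sequence number in 11-18.
    t1 = [rec[0:10].ljust(10) + f"{seq:08d}" for seq, rec in enumerate(records, start=1)]

    # CTL1: SORT FIELDS=(1,10,CH,A) plus SUM FIELDS=NONE
    # Stable sort on bytes 1-10, then keep only the first record of each key.
    t1.sort(key=lambda r: r[0:10])
    kept = [r for i, r in enumerate(t1) if i == 0 or r[0:10] != t1[i - 1][0:10]]

    # CTL2: SORT FIELDS=(11,8,ZD,A)
    # Put the survivors back in input order using the sequence number.
    kept.sort(key=lambda r: int(r[10:18]))

    # CTL2: OUTREC FIELDS=(1,10)
    # Drop the sequence number, leaving the original 10-byte key.
    return [r[0:10] for r in kept]


print([r.rstrip() for r in icetool_dedupe(["abc", "xyz", "xyz", "abd", "abd", "mno"])])
# ['abc', 'xyz', 'abd', 'mno']
```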
saithvis2

New User


Joined: 23 Dec 2005
Posts: 61
Location: Providence , US

Posted: Tue Aug 22, 2006 8:42 pm    Post subject: Re: Remove duplicates without sorting the sequence

Hi Frank,

Thanks a lot, ICETOOL is a really powerful tool. I will go through the other DFSORT/ICETOOL tricks you have linked to in other posts in this community.

Have a great day! :)


Regards
Vishal
vjai6977

New User


Joined: 08 Aug 2008
Posts: 19
Location: Chennai

Posted: Tue Oct 14, 2008 12:52 pm    Post subject:

Hi, with reference to the job and requirement above: I have the requirement below and modified the sort cards for it, but I don't get the required result.

Job I used:
-------------

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=USR7.TST.INPUT,DISP=SHR
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=USR7.TST.OUTPUT,DISP=OLD
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
INREC FIELDS=(1,65,70:SEQNUM,8,ZD)
SORT FIELDS=(5,25,CH,A,51,3,CH,A,55,3,CH,A,59,3,CH,A,63,3,CH,A)
SUM FIELDS=NONE
/*
//CTL2CNTL DD *
SORT FIELDS=(70,8,ZD,A)
OUTREC FIELDS=(1,65)
/*
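In this job the duplicate test covers all five fields listed in SORT FIELDS: SUM FIELDS=NONE treats two records as duplicates only when every one of those fields matches. Note that in DFSORT's `p,m` notation the second number is a length, so `5,25` covers columns 5 through 29. Below is a hypothetical Python sketch of that composite key; the names `KEY_FIELDS`, `composite_key` and `dedupe_on_fields` are invented for illustration, and the sketch keeps the first occurrence in input order directly instead of re-sorting on a sequence number.

```python
# (position, length) pairs copied from CTL1's SORT FIELDS statement.
KEY_FIELDS = [(5, 25), (51, 3), (55, 3), (59, 3), (63, 3)]


def composite_key(rec, fields=KEY_FIELDS):
    """Build the duplicate-detection key from 1-based DFSORT-style fields."""
    return tuple(rec[pos - 1:pos - 1 + length] for pos, length in fields)


def dedupe_on_fields(records, fields=KEY_FIELDS):
    """Keep the first record for each composite key, in input order."""
    seen = set()
    out = []
    for rec in records:
        key = composite_key(rec, fields)
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out


# A record laid out as described in the post: DSN starting in column 5,
# then JN/PN/PS/DD in columns 51-53, 55-57, 59-61 and 63-65.
rec = " " * 4 + "TEST.INPUT.INPUT2".ljust(25) + " " * 21 + "JN2 PN2 PS1 DD2"
print(composite_key(rec))
```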





Input File:
------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 JN4 PN4 PS1 DD1


Output File:
--------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 JN4 PN4 PS1 DD1


Required Output:
------------------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1

Note:

The input data is a job vs. dataset cross-reference containing:
Dataset name - Columns 5-25
Job name - Columns 51-53
Proc name - Columns 55-57
Proc step name - Columns 59-61
Proc DD name - Columns 63-65

Input rows 8 through 13 are datasets belonging to a concatenation on DD DD1.

My requirements for getting the output shown under Required Output:
-------------------------------------------------------------------------------
1. Duplicates should be removed without sorting, i.e. without changing the order of the datasets.

2. For concatenated datasets, any duplicates should be removed and the order of the datasets in the input file should be maintained.

3. For direct datasets, the duplicates should be removed.

Any solution to this is welcome.

Jai
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Tue Oct 14, 2008 8:48 pm    Post subject:

Your "rules" are not clear.

Which field or fields define a duplicate?

Which fields/values tell us we have a concatenated dataset?

Which fields/values tell us we have a "direct dataset"?

Please do a better job of explaining what you want to do based on the fields in the input records.
vjai6977

New User


Joined: 08 Aug 2008
Posts: 19
Location: Chennai

Posted: Wed Oct 15, 2008 4:37 pm    Post subject:

Hi Frank, I have restructured the input dataset as shown below.

0______________25___51__ 55___59___63
TEST.INPUT.FILE1(0)***JN1**PN1**PS1**DD1
TEST.INPUT.FILE1(-1)**JN1**PN1**PS1**DD1
TEST.INPUT.INPUT1(0)*JN1**PN1** PS1**DD2
TEST.INPUT.INPUT2****JN1**PN1**PS1**DD3

TEST.INPUT.INPUT1****JN2**PN2**PS1**DD1
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT1****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT2****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT3****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT4****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT5****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT6****JN3**PN3**PS1**DD1

TEST.INPUT.INPUT1****JN4**PN4**PS1**DD1

Here '*' is a blank space.

1. Positions 0 through 25 should be used to identify duplicates.

2. If the values in fields 51-53 / 55-57 / 59-61 / 63-65 repeat in more than one row, those rows are concatenated dataset entries. Any duplicates among them should be eliminated and the dataset order should be maintained.

3. For the column positions stated in point 2, if the values in one row differ from those in the next row, those are direct datasets. For all of these, duplicates should be removed based on positions 0-25.

I need duplicate datasets (positions 0-25) removed across all rows, but the order of the concatenated entries (rows whose values in columns 51-65 repeat in more than one row) must not get jumbled.

Thank you for your time looking into my query.


Jai
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Wed Oct 15, 2008 10:26 pm    Post subject:

I've tried, but I just can't figure out what you're trying to do. I don't know if you're matching up the dsnames and the other four fields or just the other four fields. I don't know if by "duplicates if any should be eliminated" you mean to eliminate all records with a match or just keep the first record with a match.

It would really help if you would show examples of the input records for each case and the expected output record (or none) for that case, particularly the case where you want to remove one or more records.
For example:

Input
TEST.INPUT.FILE1(0)***JN1**PN1**PS1**DD1
TEST.INPUT.FILE1(-1)**JN1**PN1**PS1**DD1

Output
?

The dsnames don't match, but the four fields match, so do you want the first record, both records or neither record?

Input
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2

Output
?

The dsname and four fields match, so do you want the first record, both records or neither record?

Input
TEST.INPUT.INPUT1****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT2****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT3****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT4****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT5****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT6****JN3**PN3**PS1**DD1

Output
?

The dsnames don't match, but the four fields match, so do you want the first record, all records or no records?
vjai6977

New User


Joined: 08 Aug 2008
Posts: 19
Location: Chennai

Posted: Fri Oct 17, 2008 12:12 pm    Post subject:

Hi Frank,

My requirement is:

1. Duplicate datasets should be eliminated based on column positions 1-25.

2. At the same time, the order of the concatenated datasets should not change after the sort.

I should have told you this earlier. :)

I worked out the JCL below and was able to get the required output.

Code:

//STEP01   EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=UDID.TST.INPT1,DISP=SHR
//*1       DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//T1       DD DSN=UDID.TST.STEP01.T1,
//            UNIT=DISK,DISP=(,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=70,BLKSIZE=700),
//            SPACE=(CYL,(1,5),RLSE)
//OUT      DD DSN=UDID.TST.STEP01.SORTOUT,
//            UNIT=DISK,DISP=(,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=70,BLKSIZE=700),
//            SPACE=(CYL,(1,5),RLSE)
//TOOLIN   DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC FIELDS=(1,65,66:SEQNUM,5,ZD)
  SORT FIELDS=(1,25,CH,A)
  SUM FIELDS=NONE
/*
//CTL2CNTL DD *
  SORT FIELDS=(66,5,ZD,A)
  OUTREC FIELDS=(1,65)
/*
//*
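Two details of this job may matter on larger or different data. First, the sequence number is only 5 zoned-decimal digits, so it can number at most 99,999 records uniquely; beyond that the re-sort in CTL2 can no longer reliably restore the original order, and a wider field (the earlier job used 8 digits) avoids the problem. Second, which duplicate survives SUM FIELDS=NONE depends on whether the sort keeps equal keys in input order; as we understand DFSORT, OPTION EQUALS requests that. The parameterised Python sketch below simulates the same steps with a stable sort and rejects inputs that would overflow the sequence field; `dedupe_by_columns` and its parameters are hypothetical names.

```python
def dedupe_by_columns(records, key_start=1, key_len=25, data_len=65, seq_digits=5):
    """Simulate INREC+SEQNUM, SORT+SUM FIELDS=NONE, re-SORT on SEQNUM, OUTREC.

    key_start/key_len use DFSORT's 1-based (position, length) convention.
    """
    limit = 10 ** seq_digits - 1  # largest value a seq_digits ZD field can hold
    if len(records) > limit:
        raise ValueError(f"{len(records)} records need more than {seq_digits} digits")

    k0, k1 = key_start - 1, key_start - 1 + key_len
    # INREC: pad each record to data_len bytes and append a zero-padded sequence number.
    tagged = [rec[:data_len].ljust(data_len) + str(n).zfill(seq_digits)
              for n, rec in enumerate(records, start=1)]

    # SORT + SUM FIELDS=NONE: stable sort on the key, keep the first record per key.
    tagged.sort(key=lambda r: r[k0:k1])
    kept = [r for i, r in enumerate(tagged) if i == 0 or r[k0:k1] != tagged[i - 1][k0:k1]]

    # Second SORT on the sequence number, then OUTREC to drop it again.
    kept.sort(key=lambda r: int(r[data_len:]))
    return [r[:data_len] for r in kept]


dsn1 = "TEST.INPUT.INPUT1".ljust(25)
dsn2 = "TEST.INPUT.INPUT2".ljust(25)
print([r.rstrip() for r in dedupe_by_columns([dsn1 + "JN2", dsn2 + "JN2", dsn1 + "JN3"])])
```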


The '*' represents blank spaces.

Supplied Input:
------------------
TEST.INPUT.FILE1(0) **** JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) **** JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) **** JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 **** JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 **** JN2 PN2 PS1 DD1
TEST.INPUT.INPUT2 **** JN2 PN2 PS1 DD2
TEST.INPUT.INPUT2 **** JN2 PN2 PS1 DD2
TEST.INPUT.INPUT1 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 **** JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 **** JN4 PN4 PS1 DD1


Achieved output:
--------------------
TEST.INPUT.FILE1(0) *** JN1 PN1 PS1 DD1.....
TEST.INPUT.FILE1(-1) *** JN1 PN1 PS1 DD1.....
TEST.INPUT.INPUT1(0) *** JN1 PN1 PS1 DD2.....
TEST.INPUT.INPUT2 *** JN1 PN1 PS1 DD3.....
TEST.INPUT.INPUT1 *** JN3 PN3 PS1 DD1.....
TEST.INPUT.INPUT3 *** JN3 PN3 PS1 DD1.....
TEST.INPUT.INPUT4 *** JN3 PN3 PS1 DD1.....
TEST.INPUT.INPUT5 *** JN3 PN3 PS1 DD1.....
TEST.INPUT.INPUT6 *** JN3 PN3 PS1 DD1.....

I learned the ICETOOL step above from you in another query that I posted earlier. Thank you for your time on this query.

Jai
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Sat Oct 18, 2008 1:41 am    Post subject:

Quote:
I worked out the below JCL and was able to get the required output.


Good for you.
All times are GMT + 6 Hours