Remove the duplicates without sorting the records

 
IBMMAINFRAMES.com Support Forums -> JCL & VSAM
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

PostPosted: Tue May 13, 2014 10:32 pm    Post subject: Remove the duplicates without sorting the records

Hi,

I would like to remove the duplicates without sorting the records, as per one of the requirements.

Could you please help me achieve this using SyncSort?

Input

Code:

ROBERT   101
ANDERSON 102
ROBERT   101


Output should be

Code:

ROBERT   101
ANDERSON 102

Akatsukami

Global Moderator


Joined: 03 Oct 2009
Posts: 1738
Location: Bloomington, IL

PostPosted: Tue May 13, 2014 11:34 pm    Post subject:

To pick only the first question that comes to mind, define "duplicate".
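As a hypothetical illustration of why the definition matters (the third record below is invented and is not part of the thread's data), the same three records contain one duplicate group if a duplicate means a repeated name in columns 1-8, the key used by SORT FIELDS=(1,08,CH,A) later in the thread, and no duplicates at all if the whole record has to match. The sketch is plain Python and only counts keys; it does not model the sort product.

```python
from collections import Counter

# Hypothetical records: the two ROBERT lines differ only in the number
# field (columns 10-12). This data is invented for illustration.
records = [
    "ROBERT   101",
    "ANDERSON 102",
    "ROBERT   102",
]

# Definition 1: a duplicate is any record whose name (columns 1-8) repeats.
name_counts = Counter(rec[0:8] for rec in records)
dup_names = sorted(k for k, n in name_counts.items() if n > 1)

# Definition 2: a duplicate is a record that is identical in every column.
record_counts = Counter(records)
dup_records = sorted(k for k, n in record_counts.items() if n > 1)

print(dup_names)    # ['ROBERT  ']
print(dup_records)  # []
```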
Craq Giegerich

Senior Member


Joined: 19 May 2007
Posts: 1512
Location: Virginia, USA

PostPosted: Tue May 13, 2014 11:45 pm    Post subject: Re: Remove the duplicates without sorting the records

How many records in the input file?
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7241

PostPosted: Tue May 13, 2014 11:52 pm    Post subject: Reply to: Remove the duplicates without sorting the records

By "without sorting" you mean by having your final output in the same order as the input (less duplicates, however that is decided)?

Otherwise, you're just going to have to write a program, unless the duplicates can only ever be really close.
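A minimal sketch of such a program, written in Python for illustration rather than as production mainframe code. It assumes the duplicate key is the name in columns 1-8 and that the first occurrence should be kept. It makes a single pass, remembers the keys it has already seen, and never sorts, so the output stays in input order. The function name dedupe_keep_first and its parameters are hypothetical.

```python
from typing import Iterable, Iterator


def dedupe_keep_first(records: Iterable[str], start: int = 1, length: int = 8) -> Iterator[str]:
    """Yield each record whose key has not been seen before.

    The key is taken from a 1-based start position and a length, in the
    same style as sort-product field positions. Input order is preserved
    and nothing is sorted.
    """
    seen = set()
    for rec in records:
        key = rec[start - 1:start - 1 + length]
        if key not in seen:
            seen.add(key)
            yield rec


sample = ["ROBERT   101", "ANDERSON 102", "ROBERT   101"]
result = list(dedupe_keep_first(sample))
print(result)  # ['ROBERT   101', 'ANDERSON 102']
```

Memory grows with the number of distinct keys rather than the number of records, which is worth checking against the expected production volume.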

RECFM, LRECL, better sample data which is representative of conditions, expected output for that.

Also, what have you tried yourself, or are you just waiting around for a solution?
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

PostPosted: Wed May 14, 2014 12:15 am    Post subject: Re: Remove the duplicates without sorting the records

Craq Giegerich wrote:
How many records in the input file?


Hi Craq, we are not sure about the exact number of records. However, we expect the input file in Prod to have more than 1 million records.
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

PostPosted: Wed May 14, 2014 12:17 am    Post subject: Re: Reply to: Remove the duplicates without sorting the reco

Bill Woodger wrote:
Also, what have you tried yourself, or are you just waiting around for a solution?


Please see my sample code below.

Code:

//**********************************************************************
//SRT01    EXEC PGM=SORT                                               
//**********************************************************************
//SYSOUT   DD SYSOUT=*                                                 
//SORTIN   DD *                                                         
ROBERT   101                                                           
ANDERSON 102                                                           
ROBERT   101                                                           
//SORTOUT  DD SYSOUT=*                                                 
//SYSIN    DD *                                                         
 SORT FIELDS=(1,08,CH,A)                                         
 SUM FIELDS=NONE                                                       
/*                                                                     


I got the following output:
Code:

ANDERSON 102
ROBERT   101

However, I would like the output to be:
Code:

ROBERT   101
ANDERSON 102
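The reordering is expected. SUM FIELDS=NONE removes duplicates from records that SORT FIELDS=(1,08,CH,A) has already put in key order, so the surviving records come out in key order too. Below is a rough Python model of that behaviour. The function sort_sum_none is hypothetical, and Python compares ASCII/Unicode code points rather than EBCDIC. That gives the same order for these uppercase-and-space keys, but not in general.

```python
def sort_sum_none(records, start=1, length=8):
    """Rough model of SORT FIELDS=(start,length,CH,A) with SUM FIELDS=NONE."""
    def key(rec):
        return rec[start - 1:start - 1 + length]

    out = []
    # Sorting puts the records in key order; duplicates are then adjacent and
    # only the first record of each key group is written.
    for rec in sorted(records, key=key):
        if not out or key(out[-1]) != key(rec):
            out.append(rec)
    return out


sample = ["ROBERT   101", "ANDERSON 102", "ROBERT   101"]
print(sort_sum_none(sample))  # ['ANDERSON 102', 'ROBERT   101']
```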
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

PostPosted: Wed May 14, 2014 12:25 am    Post subject:

Akatsukami wrote:
To pick only the first question that comes to mind, define "duplicate".


Hi Akatsukami, could you please share some examples of this? Thanks.
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Wed May 14, 2014 12:33 am    Post subject: Reply to: Remove the duplicates without sorting the records

The question would have been better asked as:

"remove the duplicates while maintaining the original sequence"

To remove duplicates, the process MUST sort on the candidate key.

The issue has been discussed quite a few times:

- add a sequence number to preserve the original order
- eliminate the duplicates (any process will do)
- sort on the sequence number to restore the original order

Two passes over the data will be needed.
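The three steps above can be sketched in Python as follows. This is a model of the technique, not sort-product syntax. It assumes the key is columns 1-8 and that the first occurrence of each key should survive, and the function name is hypothetical.

```python
def dedupe_preserving_order(records, start=1, length=8):
    """Remove duplicate keys while keeping the original record order."""
    def key(rec):
        return rec[start - 1:start - 1 + length]

    # Step 1: add a sequence number that remembers each record's position.
    numbered = list(enumerate(records, start=1))

    # Step 2: sort on the candidate key and keep one record per key. Python's
    # sort is stable, so the first occurrence of each key is the one kept.
    survivors = []
    for seq, rec in sorted(numbered, key=lambda p: key(p[1])):
        if not survivors or key(survivors[-1][1]) != key(rec):
            survivors.append((seq, rec))

    # Step 3: sort on the sequence number to restore the original order,
    # then drop the number again.
    survivors.sort(key=lambda p: p[0])
    return [rec for _, rec in survivors]


sample = ["ROBERT   101", "ANDERSON 102", "ROBERT   101"]
print(dedupe_preserving_order(sample))  # ['ROBERT   101', 'ANDERSON 102']
```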
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Wed May 14, 2014 12:48 am    Post subject: Reply to: Remove the duplicates without sorting the records

Since I am in a very good mood, here is a way of doing it (a sort expert might find a better way):

Code:
//*
//* PASS 1: NUMBER THE RECORDS IN COLUMNS 21-24 AND KEEP ONE
//*         RECORD PER KEY (COLUMNS 1-8)
//ICE1    EXEC PGM=SORT
//SYSPRINT  DD SYSOUT=*
//SYSOUT    DD SYSOUT=*
//SORTIN    DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//SORTOUT   DD DISP=(,PASS),DSN=&&TMP,
//             UNIT=VIO,SPACE=(CYL,(1,1))
//SYSIN     DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//* PASS 2: SORT ON THE SEQUENCE NUMBER TO RESTORE THE ORIGINAL ORDER
//ICE2    EXEC PGM=SORT
//SYSPRINT  DD SYSOUT=*
//SYSOUT    DD SYSOUT=*
//SORTIN    DD DISP=(OLD,PASS),DSN=&&TMP
//SORTOUT   DD SYSOUT=*
//SYSIN     DD *
  SORT FIELDS=(21,4,CH,A)


Code:
XXXXXXXX 0001       0001
BBBBBBBB 0002       0003
AAAAAAAA 0002       0005


The two SORT steps could be consolidated into a single ICETOOL step with two stages.
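To see why the output above comes out in input order, here is a Python model of steps ICE1 and ICE2 on the same data. It assumes 80-byte fixed-length records and that records with equal keys stay in input order during the first sort (the behaviour OPTION EQUALS guarantees). Under that assumption it reproduces the three output lines shown.

```python
SORTIN = [
    "XXXXXXXX 0001",
    "XXXXXXXX 0002",
    "BBBBBBBB 0002",
    "XXXXXXXX 0003",
    "AAAAAAAA 0002",
    "BBBBBBBB 0001",
    "AAAAAAAA 0001",
]


def ice1(records, lrecl=80):
    """Model of step ICE1, assuming tied keys stay in input order (EQUALS)."""
    numbered = []
    for seq, rec in enumerate(records, start=1):
        rec = rec.ljust(lrecl)  # in-stream records are padded to 80 bytes
        # INREC OVERLAY=(21:SEQNUM,4,ZD): columns 21-24 become 0001, 0002, ...
        numbered.append(rec[:20] + f"{seq:04d}" + rec[24:])
    # SORT FIELDS=(1,08,CH,A) + SUM FIELDS=NONE: keep one record per key.
    out = []
    for rec in sorted(numbered, key=lambda r: r[:8]):
        if not out or out[-1][:8] != rec[:8]:
            out.append(rec)
    return out


def ice2(records):
    """Model of step ICE2: SORT FIELDS=(21,4,CH,A) restores input order."""
    return sorted(records, key=lambda r: r[20:24])


for line in ice2(ice1(SORTIN)):
    print(line.rstrip())
# XXXXXXXX 0001       0001
# BBBBBBBB 0002       0003
# AAAAAAAA 0002       0005
```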
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Wed May 14, 2014 1:09 am    Post subject: Reply to: Remove the duplicates without sorting the records

I'm in a really terrific mood, so here is the ICETOOL version:

Code:

//ICE     EXEC PGM=ICETOOL
//TOOLMSG   DD SYSOUT=*
//DFSMSG    DD SYSOUT=*
//WK        DD DISP=(MOD,PASS),
//             DSN=&&WK,
//             UNIT=SYSDA,SPACE=(CYL,(1,1))
//IN        DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//OU        DD SYSOUT=*
//TOOLIN    DD *
  SORT   FROM(IN) TO(WK) USING(CTL1)
  SORT   FROM(WK) TO(OU) USING(CTL2)
//CTL1CNTL DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//CTL2CNTL DD *
  SORT FIELDS=(21,4,CH,A)
//*


The output is the same.

It is up to you to fix the OVERLAY location offsets and add a BUILD statement to retain only the original record (there are a gazillion examples around).
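One way to fix the offsets is sketched below as a Python model under stated assumptions. The input is taken to be FB with LRECL=80, so the sequence number is placed after the data, at position 81, where it cannot overwrite any input column. It is 8 digits wide, since a 4-digit counter cannot number the more than 1 million records expected in Prod. In DFSORT terms this corresponds roughly to INREC OVERLAY=(81:SEQNUM,8,ZD) in CTL1CNTL, and SORT FIELDS=(81,8,ZD,A) with OUTREC BUILD=(1,80) in CTL2CNTL. Check those statements against your sort product's manual before relying on them. LRECL, SEQ_LEN and dedupe_keep_order are hypothetical names.

```python
LRECL = 80    # assumed record length of the input (FB, LRECL=80)
SEQ_LEN = 8   # assumed width of the sequence number (up to 99,999,999 records)


def dedupe_keep_order(records):
    # CTL1: append the sequence number AFTER the data, at position LRECL+1,
    # so no input column is overwritten (roughly INREC OVERLAY=(81:SEQNUM,8,ZD)).
    work = [rec.ljust(LRECL) + f"{seq:0{SEQ_LEN}d}"
            for seq, rec in enumerate(records, start=1)]
    # Sort on the key in columns 1-8 and keep the first record per key
    # (Python's stable sort stands in for OPTION EQUALS).
    kept = []
    for rec in sorted(work, key=lambda r: r[0:8]):
        if not kept or kept[-1][0:8] != rec[0:8]:
            kept.append(rec)
    # CTL2: sort on the sequence number, then keep only the original 80 bytes
    # (roughly SORT FIELDS=(81,8,ZD,A) and OUTREC BUILD=(1,80)).
    kept.sort(key=lambda r: r[LRECL:])
    return [rec[:LRECL] for rec in kept]


sample = ["ROBERT   101", "ANDERSON 102", "ROBERT   101"]
for rec in dedupe_keep_order(sample):
    print(repr(rec.rstrip()), len(rec))
# 'ROBERT   101' 80
# 'ANDERSON 102' 80
```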
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

PostPosted: Mon May 19, 2014 2:46 pm    Post subject: Re: Reply to: Remove the duplicates without sorting the reco

Hi Enrico, thank you so much for your help. I applied the logic you posted and got the expected result.
ramsri

Active User


Joined: 18 Oct 2008
Posts: 380
Location: India

PostPosted: Mon May 19, 2014 5:24 pm    Post subject: Reply to: Remove the duplicates without sorting the records

Enrico, why do I get different results from your code than the ones you showed?

Code:

//ICE3STEP EXEC PGM=ICETOOL                 
//TOOLMSG  DD SYSOUT=*                       
//SSMSG    DD SYSOUT=*                       
//WK       DD DISP=(MOD,PASS),               
//            DSN=&&WK,                     
//            UNIT=SYSDA,SPACE=(CYL,(1,1))   
//IN       DD *                             
XXXXXXXX 0001                               
XXXXXXXX 0002                               
BBBBBBBB 0002                               
XXXXXXXX 0003                               
AAAAAAAA 0002                               
BBBBBBBB 0001                               
AAAAAAAA 0001                               
//OU       DD SYSOUT=*                       
//TOOLIN   DD *                             
  SORT   FROM(IN) TO(WK) USING(CTL1)         
  SORT   FROM(WK) TO(OU) USING(CTL2)         
//CTL1CNTL DD *                             
  INREC OVERLAY=(21:SEQNUM,4,ZD)   
  SORT FIELDS=(1,08,CH,A)         
  SUM FIELDS=NONE                 
//CTL2CNTL DD *                   
  SORT FIELDS=(21,4,CH,A)         


Output:
Code:

XXXXXXXX 0003       0004
BBBBBBBB 0001       0006
AAAAAAAA 0001       0007


I just copy-pasted your code and ran it as is!

Thanks.
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Mon May 19, 2014 5:36 pm    Post subject: Reply to: Remove the duplicates without sorting the records

Try adding
Code:
  OPTION EQUALS

to the first ICETOOL stage (CTL1).
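Why EQUALS matters here: SUM FIELDS=NONE keeps one record per key, and without EQUALS the sort is free to put records with equal keys in any order, so which one survives is not predictable. The Python model below contrasts the two cases on the same seven records. Reversing the tied records in the non-EQUALS case is only one possible ordering, picked because it reproduces the output reported above. It is not a description of how SyncSort or DFSORT behave internally, and dedupe_in_order is a hypothetical name.

```python
RECORDS = [
    "XXXXXXXX 0001",
    "XXXXXXXX 0002",
    "BBBBBBBB 0002",
    "XXXXXXXX 0003",
    "AAAAAAAA 0002",
    "BBBBBBBB 0001",
    "AAAAAAAA 0001",
]


def dedupe_in_order(records, equals=True):
    """Model of CTL1 (number, sort on columns 1-8, SUM FIELDS=NONE) plus CTL2."""
    numbered = list(enumerate(records, start=1))
    if equals:
        # OPTION EQUALS: records with equal keys keep their input order.
        ordered = sorted(numbered, key=lambda p: p[1][:8])
    else:
        # No EQUALS: the order of equal keys is not guaranteed. Reversing
        # the ties is just ONE possible outcome, chosen for illustration.
        ordered = sorted(numbered, key=lambda p: (p[1][:8], -p[0]))
    kept, last_key = [], None
    for seq, rec in ordered:
        if rec[:8] != last_key:  # first record of each key group survives
            kept.append((seq, rec))
            last_key = rec[:8]
    # CTL2: sort on the sequence number to restore the original order.
    return [rec for seq, rec in sorted(kept)]


print(dedupe_in_order(RECORDS, equals=True))
# ['XXXXXXXX 0001', 'BBBBBBBB 0002', 'AAAAAAAA 0002']
print(dedupe_in_order(RECORDS, equals=False))
# ['XXXXXXXX 0003', 'BBBBBBBB 0001', 'AAAAAAAA 0001']
```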
ramsri

Active User


Joined: 18 Oct 2008
Posts: 380
Location: India

PostPosted: Mon May 19, 2014 6:55 pm    Post subject: Reply to: Remove the duplicates without sorting the records

OK, I added it and got the results. Thanks.
All times are GMT + 6 Hours