IBM Mainframe Forum Index
 
 

Remove the duplicates without sorting the records


IBM Mainframe Forums -> JCL & VSAM
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

Posted: Tue May 13, 2014 10:32 pm

Hi,

I need to remove duplicate records without sorting them, as per one of our requirements.

Could you please help me achieve this using SyncSort?

Input

Code:

ROBERT   101
ANDERSON 102
ROBERT   101


The output should be:

Code:

ROBERT   101
ANDERSON 102
Akatsukami

Global Moderator


Joined: 03 Oct 2009
Posts: 1788
Location: Bloomington, IL

Posted: Tue May 13, 2014 11:34 pm

To pick only the first question that comes to mind, define "duplicate".
Craq Giegerich

Senior Member


Joined: 19 May 2007
Posts: 1512
Location: Virginia, USA

Posted: Tue May 13, 2014 11:45 pm

How many records in the input file?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Tue May 13, 2014 11:52 pm

By "without sorting", do you mean that your final output should be in the same order as the input (minus the duplicates, however those are decided)?

Otherwise, you are just going to have to write a program, unless the duplicates can only ever occur close together.

Please give the RECFM, the LRECL, better sample data that is representative of real conditions, and the expected output for that data.

Also, what have you tried yourself, or are you just waiting around for a solution?
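For anyone weighing the "write a program" route, here is a minimal Python sketch (not SyncSort, and not something posted in this thread). It assumes a duplicate means the same value in columns 1-8, the key maki's own SORT statement uses, and that the first occurrence should survive. A set of keys already seen drops repeats in a single pass without reordering anything. The function name `dedupe_keep_order` and its parameters are hypothetical.

```python
def dedupe_keep_order(records, key_start=1, key_len=8):
    """Keep the first record for each key and preserve input order.

    key_start is 1-based, like the position in SORT FIELDS=(p,m,CH,A).
    """
    seen = set()          # keys already written to the output
    out = []
    for rec in records:
        key = rec[key_start - 1:key_start - 1 + key_len]
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out


if __name__ == "__main__":
    sample = ["ROBERT   101", "ANDERSON 102", "ROBERT   101"]
    for rec in dedupe_keep_order(sample):
        print(rec)
```

Unlike the sort-based approach, this keeps one key per distinct record in memory, which is worth checking against the expected production volume before choosing it.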
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

Posted: Wed May 14, 2014 12:15 am

Hi Craq, we are not sure of the exact number of records. However, we expect the input file in Prod to contain more than 1 million records.
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

Posted: Wed May 14, 2014 12:17 am

Here is the code I tried:

Code:

//**********************************************************************
//SRT01    EXEC PGM=SORT                                               
//**********************************************************************
//SYSOUT   DD SYSOUT=*                                                 
//SORTIN   DD *                                                         
ROBERT   101                                                           
ANDERSON 102                                                           
ROBERT   101                                                           
//SORTOUT  DD SYSOUT=*                                                 
//SYSIN    DD *                                                         
 SORT FIELDS=(1,08,CH,A)                                         
 SUM FIELDS=NONE                                                       
/*                                                                     


I got the following output:
Code:

ANDERSON 102
ROBERT   101

However, I would like the output to be:
Code:

ROBERT   101
ANDERSON 102
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

Posted: Wed May 14, 2014 12:25 am

Akatsukami wrote:
To pick only the first question that comes to mind, define "duplicate".


Hi Akatsukami, could you please share some examples of this? Thanks.
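To illustrate Akatsukami's question with made-up data (the `ROBERT   102` record is hypothetical, not from maki's file): the same input loses different records depending on what counts as a duplicate. This Python sketch compares whole records against only the 8-byte name in columns 1-8, which is the field that maki's `SORT FIELDS=(1,08,CH,A)` with `SUM FIELDS=NONE` compares. The helper `first_per_key` is a hypothetical name.

```python
records = ["ROBERT   101", "ANDERSON 102", "ROBERT   102", "ROBERT   101"]


def first_per_key(recs, key):
    """Keep the first record for each value of key(rec), in input order."""
    seen, out = set(), []
    for rec in recs:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            out.append(rec)
    return out


# Duplicate = the whole record is identical.
whole_record = first_per_key(records, key=lambda r: r)
# Duplicate = same name in columns 1-8 (the number is ignored).
name_only = first_per_key(records, key=lambda r: r[0:8])

print(whole_record)  # ['ROBERT   101', 'ANDERSON 102', 'ROBERT   102']
print(name_only)     # ['ROBERT   101', 'ANDERSON 102']
```

Which definition applies, and which of the duplicates should survive, has to be settled before the SORT statements can be written.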
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10872
Location: italy

Posted: Wed May 14, 2014 12:33 am

The question would have been better asked as:

"remove the duplicates while maintaining the original sequence"

To remove duplicates, the process MUST sort on the candidate key.

The issue has been discussed quite a few times:

- add a sequence number to preserve the original order
- eliminate the duplicates (any process will do)
- sort on the sequence number to restore the original order

Two passes over the data will be needed.
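The three steps can be simulated in Python to check the logic before writing any JCL. This is a sketch that assumes the key is columns 1-8 and that the first occurrence should survive; `dedupe_via_sort` is a hypothetical name, and Python's stable `sorted` stands in for a sort that keeps equal keys in input order. On enrico's sample data it keeps the records with sequence numbers 1, 3 and 5.

```python
def dedupe_via_sort(records, key_start=1, key_len=8):
    """Remove duplicate keys while keeping the original record order."""
    def key(item):
        _seq, rec = item
        return rec[key_start - 1:key_start - 1 + key_len]

    # Step 1: tag every record with its input sequence number.
    tagged = list(enumerate(records, start=1))

    # Step 2: sort on the candidate key (stable, so equal keys stay in
    # input order) and keep only the first record of each key group.
    survivors = []
    last_key = None
    for item in sorted(tagged, key=key):
        if key(item) != last_key:
            survivors.append(item)
            last_key = key(item)

    # Step 3: sort on the sequence number to restore the original order,
    # then drop the tag.
    survivors.sort(key=lambda item: item[0])
    return [rec for _seq, rec in survivors]


if __name__ == "__main__":
    data = ["XXXXXXXX 0001", "XXXXXXXX 0002", "BBBBBBBB 0002",
            "XXXXXXXX 0003", "AAAAAAAA 0002", "BBBBBBBB 0001",
            "AAAAAAAA 0001"]
    print(dedupe_via_sort(data))
```

In the SORT jobs, step 1 corresponds to INREC OVERLAY with SEQNUM, step 2 to SORT FIELDS plus SUM FIELDS=NONE, and step 3 to the second SORT on the sequence number. The in-memory version is only for checking the idea.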
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10872
Location: italy

Posted: Wed May 14, 2014 12:48 am

Since I am in a very good mood, here is one way of doing it
(a sort expert might find a better way):

Code:
//*
//ICE1    EXEC PGM=SORT
//SYSPRINT  DD SYSOUT=*
//SYSOUT    DD SYSOUT=*
//SORTIN    DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//SORTOUT   DD DISP=(,PASS),DSN=&&TMP,
//             UNIT=VIO,SPACE=(CYL,(1,1))
//SYSIN     DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//ICE2    EXEC PGM=SORT
//SYSPRINT  DD SYSOUT=*
//SYSOUT    DD SYSOUT=*
//SORTIN    DD DISP=(OLD,PASS),DSN=&&TMP
//SORTOUT   DD SYSOUT=*
//SYSIN     DD *
  SORT FIELDS=(21,4,CH,A)


Code:
XXXXXXXX 0001       0001
BBBBBBBB 0002       0003
AAAAAAAA 0002       0005


The two SORT steps could be consolidated into a single ICETOOL step with two stages.
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10872
Location: italy

Posted: Wed May 14, 2014 1:09 am

I'm in a really terrific mood, so here is the ICETOOL version:

Code:

//ICE     EXEC PGM=ICETOOL
//TOOLMSG   DD SYSOUT=*
//DFSMSG    DD SYSOUT=*
//WK        DD DISP=(MOD,PASS),
//             DSN=&&WK,
//             UNIT=SYSDA,SPACE=(CYL,(1,1))
//IN        DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//OU        DD SYSOUT=*
//TOOLIN    DD *
  SORT   FROM(IN) TO(WK) USING(CTL1)
  SORT   FROM(WK) TO(OU) USING(CTL2)
//CTL1CNTL DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//CTL2CNTL DD *
  SORT FIELDS=(21,4,CH,A)
//*


The output is the same.

It is up to you to fix the OVERLAY position (column 21 here) for your own record layout
and to add a BUILD statement that retains only the original record
(there are a gazillion examples around).
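Why the OVERLAY position matters: `OVERLAY=(21:SEQNUM,4,ZD)` works on the sample only because columns 21-24 are blank; on real data it overwrites whatever is there. The Python sketch below mimics that, and shows the alternative of appending the sequence number past the end of the record and cutting it off again at the end. It assumes RECFM=FB with LRECL=80 (as for in-stream data) and an 8-digit sequence number. In SORT terms this roughly corresponds to an OVERLAY at column 81 plus a final BUILD=(1,80), but the exact statements should be checked against your sort product's documentation. The helpers `overlay`, `tag` and `untag` are hypothetical names.

```python
LRECL = 80  # assumption: fixed-length 80-byte records (RECFM=FB)


def overlay(rec, pos, text):
    """Mimic OVERLAY=(pos:...): write text at 1-based column pos,
    padding the record with blanks if it is too short."""
    start = pos - 1
    rec = rec.ljust(start + len(text))
    return rec[:start] + text + rec[start + len(text):]


def tag(rec, seq):
    """Append an 8-digit sequence number in columns 81-88, past the data."""
    return overlay(rec.ljust(LRECL), LRECL + 1, f"{seq:08d}")


def untag(rec):
    """Keep only the original 80 bytes, like a final BUILD=(1,80)."""
    return rec[:LRECL]


if __name__ == "__main__":
    rec = "ROBERT   101        DATA-IN-COLS-21+"
    print(overlay(rec, 21, "0001"))      # columns 21-24 are overwritten
    tagged = tag(rec, 1)
    print(len(tagged), untag(tagged).rstrip() == rec)
```

The width also matters: a 4-digit sequence number can hold only 9999 distinct values, so for the million-plus records mentioned earlier a wider field is needed for the final sort to restore the original order correctly.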
maki_psg

New User


Joined: 28 Jan 2010
Posts: 47
Location: India

Posted: Mon May 19, 2014 2:46 pm

Hi Enrico, thank you so much for your help. I applied the logic above and got the expected result.
ramsri

Active User


Joined: 18 Oct 2008
Posts: 380
Location: India

Posted: Mon May 19, 2014 5:24 pm

Enrico, why do I get different results from your code than the ones you showed?

Code:

//ICE3STEP EXEC PGM=ICETOOL                 
//TOOLMSG  DD SYSOUT=*                       
//SSMSG    DD SYSOUT=*                       
//WK       DD DISP=(MOD,PASS),               
//            DSN=&&WK,                     
//            UNIT=SYSDA,SPACE=(CYL,(1,1))   
//IN       DD *                             
XXXXXXXX 0001                               
XXXXXXXX 0002                               
BBBBBBBB 0002                               
XXXXXXXX 0003                               
AAAAAAAA 0002                               
BBBBBBBB 0001                               
AAAAAAAA 0001                               
//OU       DD SYSOUT=*                       
//TOOLIN   DD *                             
  SORT   FROM(IN) TO(WK) USING(CTL1)         
  SORT   FROM(WK) TO(OU) USING(CTL2)         
//CTL1CNTL DD *                             
  INREC OVERLAY=(21:SEQNUM,4,ZD)   
  SORT FIELDS=(1,08,CH,A)         
  SUM FIELDS=NONE                 
//CTL2CNTL DD *                   
  SORT FIELDS=(21,4,CH,A)         


Output:
Code:

XXXXXXXX 0003       0004
BBBBBBBB 0001       0006
AAAAAAAA 0001       0007


I just copy-pasted your code and ran it as is!

Thanks.
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10872
Location: italy

Posted: Mon May 19, 2014 5:36 pm

Try adding
Code:
  OPTION EQUALS


to the first ICETOOL stage (CTL1)
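Why EQUALS fixed ramsri's run: with EQUALS in effect, records with equal sort keys keep their input order, so SUM FIELDS=NONE retains the first record of each group; without it, the order of equal-keyed records, and therefore which one survives, is not guaranteed. The Python simulation below (not SyncSort itself) runs the same two passes with a stable tie order and with ties reversed. Reversed ties are only one possible NOEQUALS outcome, chosen because it reproduces the sequence numbers in ramsri's output; `two_pass` and its `stable` flag are hypothetical names.

```python
DATA = ["XXXXXXXX 0001", "XXXXXXXX 0002", "BBBBBBBB 0002", "XXXXXXXX 0003",
        "AAAAAAAA 0002", "BBBBBBBB 0001", "AAAAAAAA 0001"]


def two_pass(records, stable=True):
    """Return (seq, record) pairs left after dedup on columns 1-8."""
    tagged = list(enumerate(records, start=1))          # like SEQNUM
    if stable:
        # EQUALS: equal keys stay in input (sequence-number) order.
        by_key = sorted(tagged, key=lambda t: t[1][:8])
    else:
        # One possible NOEQUALS result: equal keys come out reversed.
        by_key = sorted(tagged, key=lambda t: (t[1][:8], -t[0]))
    kept, last = [], None
    for seq, rec in by_key:                              # like SUM FIELDS=NONE
        if rec[:8] != last:
            kept.append((seq, rec))
            last = rec[:8]
    return sorted(kept)                                  # second pass: by seq


print(two_pass(DATA, stable=True))
# [(1, 'XXXXXXXX 0001'), (3, 'BBBBBBBB 0002'), (5, 'AAAAAAAA 0002')]
print(two_pass(DATA, stable=False))
# [(4, 'XXXXXXXX 0003'), (6, 'BBBBBBBB 0001'), (7, 'AAAAAAAA 0001')]
```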
ramsri

Active User


Joined: 18 Oct 2008
Posts: 380
Location: India

Posted: Mon May 19, 2014 6:55 pm

OK, I added it and got the results. Thanks.
All times are GMT + 6 Hours

 


Similar Topics
Topic Forum Replies
Sortjoin and Search for a String and ... DFSORT/ICETOOL 1
Compare only first records of the fil... SYNCSORT 7
Pulling a fixed number of records fro... DB2 2
Automation need help in sorting the data DFSORT/ICETOOL 38
Remove leading zeroes SYNCSORT 4