maki_psg New User Joined: 28 Jan 2010 Posts: 47 Location: India
Hi,
I would like to remove duplicates without sorting the records, as per one of our requirements.
Could you please help me achieve this using SyncSort?
Input
Code:
ROBERT 101
ANDERSON 102
ROBERT 101
Output should be
Code:
ROBERT 101
ANDERSON 102
Akatsukami Global Moderator Joined: 03 Oct 2009 Posts: 1788 Location: Bloomington, IL
To pick only the first question that comes to mind, define "duplicate".
Craq Giegerich Senior Member Joined: 19 May 2007 Posts: 1512 Location: Virginia, USA
How many records in the input file?
Bill Woodger Moderator Emeritus Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
By "without sorting" you mean by having your final output in the same order as the input (less duplicates, however that is decided)?
Otherwise, you're just going to have to write a program, unless the duplicates can only ever be really close.
RECFM, LRECL, better sample data which is representative of conditions, expected output for that.
Also, what have you tried yourself, or are you just waiting around for a solution?
maki_psg New User Joined: 28 Jan 2010 Posts: 47 Location: India
Hi Craq, we are not sure about the exact number of records. However, we expect an input file of more than 1 million records in Prod.
maki_psg New User Joined: 28 Jan 2010 Posts: 47 Location: India
Please see below the sample code.
Code:
//**********************************************************************
//SRT01 EXEC PGM=SORT
//**********************************************************************
//SYSOUT DD SYSOUT=*
//SORTIN DD *
ROBERT 101
ANDERSON 102
ROBERT 101
//SORTOUT DD SYSOUT=*
//SYSIN DD *
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
/*
I got the below output
Code:
ANDERSON 102
ROBERT 101
However, I would like to get the output as
Code:
ROBERT 101
ANDERSON 102
maki_psg New User Joined: 28 Jan 2010 Posts: 47 Location: India
Hi Akatsukami, could you please share some examples of what you mean by defining "duplicate"? Thanks.
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
the question would have been better asked as ...
"remove the duplicates maintaining the original sequence"
to remove duplicates the process MUST sort on the candidate key
the issue has been discussed quite a few times
- add a sequence number to preserve the original order
- eliminate the duplicates ( any process will do )
- sort on the sequence number to restore the original order
two passes over the data will be needed
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
since I am in a very good mood
here is a way of doing it
( a sort expert might find a better way )
Code:
//*
//ICE1 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//SORTOUT DD DISP=(,PASS),DSN=&&TMP,
//         UNIT=VIO,SPACE=(CYL,(1,1))
//SYSIN DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//ICE2 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=(OLD,PASS),DSN=&&TMP
//SORTOUT DD SYSOUT=*
//SYSIN DD *
  SORT FIELDS=(21,4,CH,A)
Code:
XXXXXXXX 0001 0001
BBBBBBBB 0002 0003
AAAAAAAA 0002 0005
the two sort steps could be consolidated in a single ICETOOL step with two stages
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
really terrific good mood
here is the ICETOOL version
Code:
//ICE EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//WK DD DISP=(MOD,PASS),
//         DSN=&&WK,
//         UNIT=SYSDA,SPACE=(CYL,(1,1))
//IN DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//OU DD SYSOUT=*
//TOOLIN DD *
  SORT FROM(IN) TO(WK) USING(CTL1)
  SORT FROM(WK) TO(OU) USING(CTL2)
//CTL1CNTL DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//CTL2CNTL DD *
  SORT FIELDS=(21,4,CH,A)
//*
the output is the same
up to you to fix the OVERLAY location offsets
and add a BUILD statement to retain only the original record
( there are a gazillion examples around )
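A minimal sketch of those two adjustments, assuming fixed-length 80-byte input records with the key in columns 1-8 (adjust the positions and lengths to the real layout). The sequence number is appended at position 81 instead of overlaying data at column 21, and it is widened to 8 digits because a 4-digit counter cannot number more than 9999 records uniquely. OUTREC BUILD in the second stage then keeps only the original 80 bytes. OPTION EQUALS is an addition here, included so that the first record of each duplicate set is the one retained.

```jcl
//CTL1CNTL DD *
* Stage 1: number the records, then drop duplicate keys.
* ASSUMPTION: FB input, LRECL=80, key in columns 1-8.
  OPTION EQUALS
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,8,CH,A)
  SUM FIELDS=NONE
//CTL2CNTL DD *
* Stage 2: restore the input order, then strip the sequence number.
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
```

With this layout the intermediate WK data set carries 88-byte records, and the final output is back to the original 80 bytes.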
maki_psg New User Joined: 28 Jan 2010 Posts: 47 Location: India
Hi Enrico, Thank you so much for your help. I have applied the logic mentioned above and got the expected result.
ramsri Active User Joined: 18 Oct 2008 Posts: 380 Location: India
Enrico, why do I get different results with your code from the ones you have shown?
Code:
//ICE3STEP EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//SSMSG DD SYSOUT=*
//WK DD DISP=(MOD,PASS),
// DSN=&&WK,
// UNIT=SYSDA,SPACE=(CYL,(1,1))
//IN DD *
XXXXXXXX 0001
XXXXXXXX 0002
BBBBBBBB 0002
XXXXXXXX 0003
AAAAAAAA 0002
BBBBBBBB 0001
AAAAAAAA 0001
//OU DD SYSOUT=*
//TOOLIN DD *
SORT FROM(IN) TO(WK) USING(CTL1)
SORT FROM(WK) TO(OU) USING(CTL2)
//CTL1CNTL DD *
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
//CTL2CNTL DD *
  SORT FIELDS=(21,4,CH,A)
Output:
Code:
XXXXXXXX 0003 0004
BBBBBBBB 0001 0006
AAAAAAAA 0001 0007
I just copy-pasted your code and ran it as is!
Thanks.
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
try adding
to the first ICETOOL stage (CTL1)
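The statement to add is missing from this copy of the thread. A plausible candidate, offered as an assumption and not as what was actually posted, is the EQUALS option. Without it the sort is free to reorder records that have equal keys, so SUM FIELDS=NONE may keep any one record of a duplicate set. That would explain a run that keeps sequence numbers 0004, 0006 and 0007 instead of 0001, 0003 and 0005. The installation default for this option can differ between sites and sort products.

```jcl
//CTL1CNTL DD *
* ASSUMPTION: EQUALS is the missing statement. It preserves the
* input order of records with equal keys, so the first record of
* each duplicate set is the one SUM FIELDS=NONE keeps.
  OPTION EQUALS
  INREC OVERLAY=(21:SEQNUM,4,ZD)
  SORT FIELDS=(1,08,CH,A)
  SUM FIELDS=NONE
```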
ramsri Active User Joined: 18 Oct 2008 Posts: 380 Location: India
OK, added it and got the expected results. Thanks.