mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
hi,
I have a comma-separated file with LRECL 646. I need to remove the duplicates from the file. I tried the sort statement 'SUM FIELDS=NONE', but it did not remove the duplicates. Can someone help?
expat Global Moderator Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Show a sample of your input, and the wrong output.
Also give the DSORG, RECFM and LRECL of the input and output files.
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
hi, please find below the sort i used
Code:
//STEP0001 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SORTIN DD DSN=INPUT FILE...
//SORTOUT DD DSN=OUTPUT FILE....
// DISP=(NEW,CATLG,DELETE),
// UNIT=TESTDA,SPACE=(CYL,(10,50),RLSE),
// DCB=(LRECL=646,RECFM=FB,BLKSIZE=0)
//SYSIN DD *
  SORT FIELDS=(1,646,CH,A)
  SUM FIELDS=NONE
/*
The input file has the same DCB parameters as the output file.
In the output file, the duplicates were not removed; it was the same as the input.
My input data looks like below
Code:
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
just checked with
Code:
//ENRICO1 JOB NOTIFY=&SYSUID,
// MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
//*
//S1 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
//SORTOUT DD SYSOUT=*,
// DCB=(RECFM=FB,LRECL=80)
//SYSIN DD *
  SORT FIELDS=(1,80,CH,A)
  SUM FIELDS=NONE
//*
and it worked, so something is missing in Your description of the problem
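For readers without a mainframe at hand, here is a rough Python model of what the tested job does logically: sort on the whole record, then keep one copy of each distinct record. It is only a sketch. The function name is made up for illustration, and Python compares strings by code point, so the resulting order can differ from an EBCDIC CH sort.

```python
def sort_and_drop_duplicates(records):
    """Model of SORT on the full record plus SUM FIELDS=NONE:
    sort ascending and keep one copy of each distinct record."""
    return sorted(set(records))


sample = [
    "1234567891,A,123456789,H64398376,,",
    "5874639368,B,AAA123456,AAAAAAA,BBBBBB,",
    "5874639368,B,AAA123456,AAAAAAA,BBBBBB,",
]

for record in sort_and_drop_duplicates(sample):
    print(record)
```

With the three sample records above, only two distinct records remain.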
expat Global Moderator Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
And the output looks exactly as I would expect.
I think that you have omitted some vital information here. I would hazard a guess that this input has come from another operating system and may have a Unicode separator embedded in the data, which means that you have multiple records on each line.
However, if this is not the case you need to describe your requirement in far more detail than you already have.
expat Global Moderator Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Oh, I now see that the input data has changed
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
I just edited the post adding the code tags
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
Hi,
Sorry, please see if the below information helps.
Input file is a comma separated file. Each field has variable lengths.
Each line has only one record.
Total lrecl of the file is 646.
I want to remove duplicate records.
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
the additional information posted is quite useless, it adds nothing to the topic
do You want byte-by-byte matching of the whole record, or matching on the real values?
Code:
xxxx,aaa,bbb, 0,......
xxxx,aaa,bbb,00,.....
would fail the byte-by-byte matching
but logically the two records are the same, a <numeric> 0 is the same as a numeric 00
and the extra blank might be considered harmless whitespace
what is it that You did not understand in the snippet I posted after testing?
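To make the distinction concrete, here is a small Python sketch (not sort control statements) of the two kinds of matching. The normalisation rule is an assumption chosen for illustration: surrounding blanks are stripped and all-digit fields are compared as numbers. A real requirement might define "the same value" differently.

```python
def same_bytes(a, b):
    """Byte-by-byte comparison: the records must be identical."""
    return a == b


def normalize(record):
    """Split on commas, strip blanks, and reduce all-digit fields to a
    canonical number so that ' 0' and '00' compare as equal."""
    fields = []
    for field in record.split(","):
        field = field.strip()
        if field.isdigit():
            field = str(int(field))
        fields.append(field)
    return fields


def same_values(a, b):
    """Logical comparison on the normalised field values."""
    return normalize(a) == normalize(b)


first = "xxxx,aaa,bbb, 0,"
second = "xxxx,aaa,bbb,00,"
print(same_bytes(first, second), same_values(first, second))
```

The two example records differ byte by byte but are equal once the fields are normalised.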
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records.
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
hi, i want to match it byte by byte.
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
nothing that You could not have found out Yourself with a forum search for DUPLICATE PRESERVE
Code:
//ENRICO1 JOB NOTIFY=&SYSUID,
// MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
//*
//S1 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
//SORTOUT DD SYSOUT=*,
// DCB=(RECFM=FB,LRECL=80)
//SYSIN DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,8,ZD,RESTART=(1,80)))
  OUTFIL INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
//*
modify according to Your record length,
the above snippet works if the duplicate records are contiguous,
if they are not You might need two or more passes
if You are using DFSORT better wait for Frank or Kolusu for a better suggestion
they work in a different time zone and right now they might still be in bed
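The idea behind the SEQNUM with RESTART trick can be modelled in a few lines of Python. This is only an illustrative sketch: the function name and the `key_length` parameter are made up, and the padding to 80 bytes assumes fixed-length records as in the test job. The counter restarts at 1 whenever the key differs from the previous record's key, and only records numbered 1 are kept.

```python
def drop_contiguous_duplicates(records, key_length=80):
    """Keep a record only when its key differs from the previous record's
    key, i.e. when a restarting sequence number would be 1."""
    kept = []
    previous_key = None
    seqnum = 0
    for record in records:
        key = record[:key_length].ljust(key_length)
        # Restart the counter on a key change, otherwise keep counting.
        seqnum = 1 if key != previous_key else seqnum + 1
        previous_key = key
        if seqnum == 1:
            kept.append(record)
    return kept


contiguous = ["A", "A", "B", "C", "C"]
scattered = ["A", "B", "A"]
print(drop_contiguous_duplicates(contiguous))
print(drop_contiguous_duplicates(scattered))
```

The second call shows the limitation mentioned above: a duplicate that is not adjacent to its first occurrence restarts the counter and is kept.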
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
thank you enricho.
I tried the above code you gave.
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(1,646)))
  OUTFIL INCLUDE=(647,8,ZD,EQ,1),BUILD=(1,646)
But got the below error
WER271A INREC STATEMENT : NUMERIC FIELD ERROR
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
Code:
WER271A INREC STATEMENT : NUMERIC FIELD ERROR
as said gazillions of times, WER messages indicate that You are using SYNCSORT, not DFSORT
topic moved where it belongs
Frank and Kolusu are DFSORT developers, You cannot expect Them to provide advice on a competitor's product
and... by the way my name is Enrico, not Enricho
mksranjani New User Joined: 03 Feb 2010 Posts: 29 Location: Chennai
Thank you Enrico..
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
tested with DFSORT and searched the manuals; here is the explanation of the message
Quote:
ICE111A E REFORMATTING FIELD ERROR
Quote:
The length (m for p,m or FIXLEN=m for %nn) for a RESTART field was greater than 256 bytes.
there should be something similar in Your sysout and manuals
sqlcode1 Active Member Joined: 08 Apr 2010 Posts: 577 Location: USA
Untested on Syncsort but should work... (tested on DFSORT)
Code:
//SYSIN DD *
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(001,256),
        655:SEQNUM,8,ZD,RESTART=(257,256),
        663:SEQNUM,8,ZD,RESTART=(513,134))),
    IFTHEN=(WHEN=(647,8,ZD,EQ,1,OR,655,8,ZD,EQ,1,OR,663,8,ZD,EQ,1),
        OVERLAY=(671:C'Y'))
  SORT FIELDS=COPY
  OUTFIL INCLUDE=(671,1,CH,EQ,C'Y'),BUILD=(1,646)
/*
Thanks,
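For anyone wondering why three sequence numbers are enough, here is a Python sketch of the same idea. It is a model under assumptions, not the sort product's behaviour: the function names are invented, and the 256-byte chunk size mirrors the RESTART limit quoted above. The record is split into chunks of at most 256 bytes, each chunk gets its own restarting counter, and a record is kept when any counter is 1, because that means at least one chunk differs from the previous record.

```python
def chunk_bounds(record_length, max_chunk=256):
    """Zero-based (start, end) slices covering the record in pieces of
    at most max_chunk bytes."""
    return [(start, min(start + max_chunk, record_length))
            for start in range(0, record_length, max_chunk)]


def drop_contiguous_duplicates_chunked(records, record_length=646,
                                       max_chunk=256):
    """Keep a record when any chunk differs from the same chunk of the
    previous record (one restarting counter per chunk)."""
    bounds = chunk_bounds(record_length, max_chunk)
    previous = [None] * len(bounds)
    counters = [0] * len(bounds)
    kept = []
    for record in records:
        padded = record.ljust(record_length)
        keep = False
        for i, (start, end) in enumerate(bounds):
            piece = padded[start:end]
            counters[i] = 1 if piece != previous[i] else counters[i] + 1
            previous[i] = piece
            if counters[i] == 1:
                keep = True
        if keep:
            kept.append(record)
    return kept


print(chunk_bounds(646))
```

For a 646-byte record the chunks are 256, 256 and 134 bytes long, which lines up with the three RESTART fields in the control statements above.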
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
still there is the question pending...
what if the duplicate records are not contiguous !
sqlcode1 Active Member Joined: 08 Apr 2010 Posts: 577 Location: USA
mksranjani wrote:
hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records.
I thought the OP didn't want to sort the records.
Correct me if I am wrong, but even if the records are not contiguous, I think it's possible using OUTREC. It would take 2 passes only if, after removing the duplicates, the OP wants to go back to the original order.
Thanks,
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
the solutions we posted got rid of the duplicates without altering the original ordering,
but they will work only for contiguous duplicates
if the duplicates are not contiguous, as I said before, I feel that a multi-pass approach is needed!
too bad that the TS is using Syncsort,
I guess that we are bound to miss Frank's and Kolusu's smart solutions!
I searched the forums for a way to remove non-contiguous duplicates while keeping the original order and the first occurrence, but I was not able to find anything
Phrzby Phil Senior Member Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
1. add a seq# column
2. sort by other fields
3. dedupe
4. re-order by seq#
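The four steps above can be sketched in Python as follows. This only models the logic; the function name is made up, and a real job would do the two sorts with the sort product rather than in memory.

```python
def dedupe_keep_first_in_order(records):
    """Remove duplicates (contiguous or not), keeping the first
    occurrence of each record in its original position."""
    # 1. add a seq# column
    numbered = [(record, seq) for seq, record in enumerate(records, start=1)]
    # 2. sort by the record content, then by seq#
    numbered.sort()
    # 3. dedupe: keep the lowest seq# of each run of equal records
    firsts = []
    previous = None
    for record, seq in numbered:
        if record != previous:
            firsts.append((seq, record))
        previous = record
    # 4. re-order by seq# and drop the seq# column
    firsts.sort()
    return [record for _, record in firsts]


print(dedupe_keep_first_in_order(["B", "B", "A", "C", "C", "B", "A"]))
```

Steps 2 and 4 are the two sorts, which is why this approach needs two passes over the data.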
sqlcode1 Active Member Joined: 08 Apr 2010 Posts: 577 Location: USA
Phrzby Phil,
Points #2 and #4 would each require a separate pass. We were thinking of a single-pass solution.
Thanks,
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
my bad for using the <wrong> terminology
anybody can do it with TWO sort steps
the point here is not the multiple passes of data (they will be needed anyway)
the puzzle is how to do it with a single SORT/ICETOOL step
superk Global Moderator Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
This topic is for the Syncsort product. Could we all please refrain from commenting on it and let Alissa provide the appropriate solution.
enrico-sorichetti Superior Member Joined: 14 Mar 2007 Posts: 10872 Location: italy
Just exploring
a two-step (two-pass) job for non-contiguous duplicates (using SORT)
Code:
//ENRICO1 JOB NOTIFY=&SYSUID,
// MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
//*
//CLEANUP EXEC PGM=CLEANUP
//STEPLIB DD DISP=SHR,DSN=SYS2.LINKLIB
//*
//S1 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
//SORTOUT DD DISP=(NEW,PASS),DSN=&&TEMP88,
// UNIT=SYSDA,SPACE=(CYL,(1,1)),
// DCB=(RECFM=FB,LRECL=88)
//SYSIN DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
  OUTREC BUILD=(1,88)
//*
//S2 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=(OLD,PASS),DSN=&&TEMP88
//SORTOUT DD SYSOUT=*,DCB=(LRECL=80)
//SYSIN DD *
  SORT FIELDS=(81,8,ZD,A)
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
  OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
//*
a one-step job (two passes anyway) for the same (using ICETOOL)
Code:
//ENRICO1 JOB NOTIFY=&SYSUID,
// MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
//*
//CLEANUP EXEC PGM=CLEANUP
//STEPLIB DD DISP=SHR,DSN=SYS2.LINKLIB
//*
//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
5874639369,B,AAA123456,AAAAAAA,BBBBBB,
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
//TEMP88 DD DISP=(MOD,PASS),DSN=&&TEMP88,
// UNIT=SYSDA,SPACE=(CYL,(1,1)),
// DCB=(RECFM=FB,LRECL=88)
//SORTOUT DD SYSOUT=*,DCB=(LRECL=80)
//TOOLIN DD *
  SORT FROM(SORTIN) TO(TEMP88) USING(CTL1)
  SORT FROM(TEMP88) TO(SORTOUT) USING(CTL2)
//CTL1CNTL DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
  OUTREC BUILD=(1,88)
//CTL2CNTL DD *
  SORT FIELDS=(81,8,ZD,A)
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
  OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
as usual, a SORT expert might come up with a better solution