|
View previous topic :: View next topic
|
| Author |
Message |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
hi,
I have a comma separated file of lrecl 646. I need to remove the duplicates from the file. I tried with sort function 'SUM FIELDS=NONE' . But it did not remove the duplicates. can someone help? |
|
| Back to top |
|
 |
expat
Global Moderator

Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
|
|
|
|
Show a sample of your input, and the wrong output.
The input output file DSORG, RECFM and LRECL |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
hi, please find below the sort i used
| Code: |
//STEP0001 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SORTIN DD DSN=INPUT FILE...
//SORTOUT DD DSN=OUTPUT FILE....
// DISP=(NEW,CATLG,DELETE),
// UNIT=TESTDA,SPACE=(CYL,(10,50),RLSE),
// DCB=(LRECL=646,RECFM=FB,BLKSIZE=0)
//SYSIN DD *
SORT FIELDS=(1,646,CH,A)
SUM FIELDS=NONE
/*
|
Input file has the same dcb parameters as that of output.
In the output file, duplicates were not removed. it was the same as that of input.
My input data looks like below
| Code: |
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB, |
|
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
just checked with
| Code: |
****** ***************************** Top of Data ******************************
000001 //ENRICO1 JOB NOTIFY=&SYSUID,
000002 // MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //S1 EXEC PGM=SORT
000005 //SYSPRINT DD SYSOUT=*
000006 //SYSOUT DD SYSOUT=*
000007 //SORTIN DD *
000008 1234567891,A,123456789,H64398376,,
000009 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000010 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000011 //SORTOUT DD SYSOUT=*,
000012 // DCB=(RECFM=FB,LRECL=80)
000013 //SYSIN DD *
000014 SORT FIELDS=(1,80,CH,A)
000015 SUM FIELDS=NONE
000016 //*
****** **************************** Bottom of Data ****************************
|
and it worked, so something is missing in Your description of the problem |
|
| Back to top |
|
 |
expat
Global Moderator

Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
|
|
|
|
And the output looks exactly as I would expect.
I think that you have omitted some vital information here. I would hazard a guess that this input has come from another operating system and may have a unicode seperator embedded into the data, which means that you have multiple records on each line.
However, if this is not the case you need to describe your requirement in far more detail than you already have. |
|
| Back to top |
|
 |
expat
Global Moderator

Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
|
|
|
|
Oh, I now see that the input data has changed  |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
I just edited the post adding the code tags  |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
Hi,
Sorry, please see if the below information helps.
Input file is a comma separated file. Each field has variable lengths.
Each line has only one record.
Total lrecl of the file is 646.
I want to remove duplicate records. |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
the additionl information posted is quite useless, it adds nothing to the topic
You want to check the overall byte by byte matching or the real value matching
| Code: |
xxxx,aaa,bbb, 0,......
xxxx,aaa,bbb,00,.....
|
would fail the byte by byte matching
but logically the two records are the same, a <numeric> 0 is the same as a numeric 00
and the extra blank might be considered as harmless withespace
what is that You did not understand in the snippet I posted after testing! |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
| hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records. |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
| hi, i want to match it byte by byte. |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
nothing that You could not have found out Yourself a forum search with DUPLICATE PRESERVE
| Code: |
****** ***************************** Top of Data ******************************
000001 //ENRICO1 JOB NOTIFY=&SYSUID,
000002 // MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //S1 EXEC PGM=SORT
000005 //SYSPRINT DD SYSOUT=*
000006 //SYSOUT DD SYSOUT=*
000007 //SORTIN DD *
000008 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000009 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000010 1234567891,A,123456789,H64398376,,
000011 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000012 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000013 //SORTOUT DD SYSOUT=*,
000014 // DCB=(RECFM=FB,LRECL=80)
000015 //SYSIN DD *
000016 OPTION COPY
000017 INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,8,ZD,RESTART=(1,80)))
000018 OUTFIL INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
000019 //*
****** **************************** Bottom of Data ****************************
|
modify accordingly to Your record length,
the above snippet works if the duplicate record are contiguous,
if they are not You might need two/more passes
if You are using DFSORT better wait for Frank or Kolusu for a better suggestion
they work in a different time zone and right now they might be still in bed
 |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
thank you enricho.
I tried the above code you gave.
OPTION COPY
INREC IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(1,646)))
OUTFIL INCLUDE=(647,8,ZD,EQ,1),BUILD=(1,646)
But got the below error
WER271A INREC STATEMENT : NUMERIC FIELD ERROR |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
| Code: |
| WER271A INREC STATEMENT : NUMERIC FIELD ERROR |
as said gazillions of times WER messages indicate that You are using SYNCSORT not DFSORT
topic moved where it belongs
Frank as Kolusu are DFSORT developers, You cannot expect Them to provide advice on a competitor' s product
and... by the way my name is Enrico, not Enricho  |
|
| Back to top |
|
 |
mksranjani
New User
Joined: 03 Feb 2010 Posts: 29 Location: Chennai
|
|
|
|
Thank you Enrico..  |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
tested with DFSORT, searched the manuals and here is explanation of the
| Quote: |
| ICE111A E REFORMATTING FIELD ERROR |
| Quote: |
| The length (m for p,m or FIXLEN=m for %nn) for a RESTART field was greater than 256 bytes. |
there should be something similar in Your sysout and manuals |
|
| Back to top |
|
 |
sqlcode1
Active Member
Joined: 08 Apr 2010 Posts: 577 Location: USA
|
|
|
|
Untest on syncsort but should work...(Tested on DFSort...)
| Code: |
//SYSIN DD *
INREC IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(001,256),
655:SEQNUM,8,ZD,RESTART=(257,256),
663:SEQNUM,8,ZD,RESTART=(513,134))),
IFTHEN=(WHEN=(647,8,ZD,EQ,1,OR,655,8,ZD,EQ,1,OR,663,8,ZD,EQ,1),
OVERLAY=(671:C'Y'))
SORT FIELDS=COPY
OUTFIL INCLUDE=(671,1,CH,EQ,C'Y'),BUILD=(1,646)
/* |
Thanks, |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
still there is the question pending...
what if the duplicate records are not contiguous ! |
|
| Back to top |
|
 |
sqlcode1
Active Member
Joined: 08 Apr 2010 Posts: 577 Location: USA
|
|
|
|
| mksranjani wrote: |
| hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records. |
I thought OP didn't wanted to sort records.
Correct me if I am wrong but even if records are not contiguous, I think its possible using outrec. It would take 2 pass only if after removing duplicates OP wants to go back to original order.
Thanks, |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
the solutions we posted got rid of the duplicates without altering the original ordering,
but will work only for contiguous duplicates
if the duplicates are not contiguous as I said before I feel that a multi pass approach is needed!
too bad that the TS is using Syncsort,
I guess that we are bound to miss Frank and Kolusu smart solutions!
I searched the forums for non contiguous duplicates keeping the original order and the first occurrence, but I was not able to find anything |
|
| Back to top |
|
 |
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1054 Location: Richmond, Virginia
|
|
|
|
1. add a seq# column
2. sort by other fields
3. dedupe
4. re-order by seq# |
|
| Back to top |
|
 |
sqlcode1
Active Member
Joined: 08 Apr 2010 Posts: 577 Location: USA
|
|
|
|
Phrzby Phil,
Point #2 and 4 would require separate pass. We were thinking of single pass solution.
Thanks, |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
my bad for using the <wrong> terminology
anybody can do it with TWO sort steps
the point here are not the multiple passes of data ( they will be needed anyway )
the puzzle is how to do it with a single SORT/ICETOOL step |
|
| Back to top |
|
 |
superk
Global Moderator

Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
|
|
|
|
| This topic is for the Syncsort product. Could we all please refrain from commenting on it and let Alissa provide the appropriate solution. |
|
| Back to top |
|
 |
enrico-sorichetti
Superior Member

Joined: 14 Mar 2007 Posts: 10900 Location: italy
|
|
|
|
Just exploring
a two step (two passes ) job for non contiguous duplicates ( using SORT)
| Code: |
****** ***************************** Top of Data ******************************
000001 //ENRICO1 JOB NOTIFY=&SYSUID,
000002 // MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1 EXEC PGM=SORT
000008 //SYSPRINT DD SYSOUT=*
000009 //SYSOUT DD SYSOUT=*
000010 //SORTIN DD *
000011 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000012 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000013 1234567891,A,123456789,H64398376,,
000014 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000015 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000016 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000018 1234567891,A,123456789,H64398376,,
000019 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000020 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000021 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000023 1234567891,A,123456789,H64398376,,
000024 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000025 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000026 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000028 1234567891,A,123456789,H64398376,,
000029 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000030 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000031 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000033 1234567891,A,123456789,H64398376,,
000034 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000035 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000036 //SORTOUT DD DISP=(NEW,PASS),DSN=&&TEMP88,
000037 // UNIT=SYSDA,SPACE=(CYL,(1,1)),
000038 // DCB=(RECFM=FB,LRECL=88)
000039 //SYSIN DD *
000040 INREC OVERLAY=(81:SEQNUM,8,ZD)
000041 SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
000042 OUTREC BUILD=(1,88)
000043 //*
000044 //S2 EXEC PGM=SORT
000045 //SYSPRINT DD SYSOUT=*
000046 //SYSOUT DD SYSOUT=*
000047 //SORTIN DD DISP=(OLD,PASS),DSN=&&TEMP88
000048 //SORTOUT DD SYSOUT=*,DCB=(LRECL=80)
000049 //SYSIN DD *
000050 SORT FIELDS=(81,8,ZD,A)
000051 INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052 OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
000053 //*
****** **************************** Bottom of Data **************************** |
a one step (two passes anyway ) for the same ( using ICETOOL )
| Code: |
****** ***************************** Top of Data ******************************
000001 //ENRICO1 JOB NOTIFY=&SYSUID,
000002 // MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1 EXEC PGM=ICETOOL
000008 //TOOLMSG DD SYSOUT=*
000009 //SYSPRINT DD SYSOUT=*
000010 //DFSMSG DD SYSOUT=*
000011 //SYSOUT DD SYSOUT=*
000012 //SORTIN DD *
000013 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000014 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000015 1234567891,A,123456789,H64398376,,
000016 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000018 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000019 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000020 1234567891,A,123456789,H64398376,,
000021 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000023 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000024 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000025 1234567891,A,123456789,H64398376,,
000026 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000028 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000029 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000030 1234567891,A,123456789,H64398376,,
000031 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000033 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000034 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000035 1234567891,A,123456789,H64398376,,
000036 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000037 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000038 //TEMP88 DD DISP=(MOD,PASS),DSN=&&TEMP88,
000039 // UNIT=SYSDA,SPACE=(CYL,(1,1)),
000040 // DCB=(RECFM=FB,LRECL=88)
000041 //SORTOUT DD SYSOUT=*,DCB=(LRECL=80)
000042 //TOOLIN DD *
000043 SORT FROM(SORTIN) TO(TEMP88) USING(CTL1)
000044 SORT FROM(TEMP88) TO(SORTOUT) USING(CTL2)
000045 //CTL1CNTL DD *
000046 INREC OVERLAY=(81:SEQNUM,8,ZD)
000047 SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
000048 OUTREC BUILD=(1,88)
000049 //CTL2CNTL DD *
000050 SORT FIELDS=(81,8,ZD,A)
000051 INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052 OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
****** **************************** Bottom of Data **************************** |
as usually a SORT expert might come up with a better solution  |
|
| Back to top |
|
 |
|
|
 |
All times are GMT + 6 Hours |
|