Portal | Manuals | References | Downloads | Info | Programs | JCLs | Mainframe wiki | Quick Ref
IBM Mainframe Forum Index
 
Register
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Profile Log in to check your private messages Log in
 
Syncsort - Remove duplicates in a csv file
Goto page Previous  1, 2
 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM
View previous topic :: :: View next topic  
Author Message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1004
Location: Richmond, Virginia

PostPosted: Fri Jun 10, 2011 11:08 pm    Post subject:
Reply with quote

1. add a seq# column
2. sort by other fields
3. dedupe
4. re-order by seq#
Back to top
View user's profile Send private message

sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 578
Location: USA

PostPosted: Fri Jun 10, 2011 11:15 pm    Post subject:
Reply with quote

Phrzby Phil,
Point #2 and 4 would require separate pass. We were thinking of single pass solution.

Thanks,
Back to top
View user's profile Send private message
enrico-sorichetti

Senior Member


Joined: 14 Mar 2007
Posts: 10690
Location: italy

PostPosted: Fri Jun 10, 2011 11:18 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

my bad for using the <wrong> terminology

anybody can do it with TWO sort steps

the point here are not the multiple passes of data ( they will be needed anyway )

the puzzle is how to do it with a single SORT/ICETOOL step
Back to top
View user's profile Send private message
superk

Moderator Team Head


Joined: 26 Apr 2004
Posts: 4650
Location: Raleigh, NC, USA

PostPosted: Fri Jun 10, 2011 11:21 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

This topic is for the Syncsort product. Could we all please refrain from commenting on it and let Alissa provide the appropriate solution.
Back to top
View user's profile Send private message
enrico-sorichetti

Senior Member


Joined: 14 Mar 2007
Posts: 10690
Location: italy

PostPosted: Sat Jun 11, 2011 12:39 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

Just exploring
a two step (two passes ) job for non contiguous duplicates ( using SORT)
Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB   DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1      EXEC PGM=SORT
000008 //SYSPRINT  DD SYSOUT=*
000009 //SYSOUT    DD SYSOUT=*
000010 //SORTIN    DD *
000011 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000012 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000013 1234567891,A,123456789,H64398376,,
000014 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000015 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000016 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000018 1234567891,A,123456789,H64398376,,
000019 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000020 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000021 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000023 1234567891,A,123456789,H64398376,,
000024 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000025 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000026 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000028 1234567891,A,123456789,H64398376,,
000029 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000030 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000031 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000033 1234567891,A,123456789,H64398376,,
000034 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000035 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000036 //SORTOUT   DD DISP=(NEW,PASS),DSN=&&TEMP88,
000037 //             UNIT=SYSDA,SPACE=(CYL,(1,1)),
000038 //             DCB=(RECFM=FB,LRECL=88)
000039 //SYSIN     DD *
000040   INREC  OVERLAY=(81:SEQNUM,8,ZD)
000041   SORT   FIELDS=(1,80,CH,A,81,8,ZD,A)
000042   OUTREC BUILD=(1,88)
000043 //*
000044 //S2      EXEC PGM=SORT
000045 //SYSPRINT  DD SYSOUT=*
000046 //SYSOUT    DD SYSOUT=*
000047 //SORTIN    DD DISP=(OLD,PASS),DSN=&&TEMP88
000048 //SORTOUT   DD SYSOUT=*,DCB=(LRECL=80)
000049 //SYSIN     DD *
000050   SORT   FIELDS=(81,8,ZD,A)
000051   INREC  IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052   OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
000053 //*
****** **************************** Bottom of Data ****************************


a one step (two passes anyway ) for the same ( using ICETOOL )
Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB   DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1      EXEC PGM=ICETOOL
000008 //TOOLMSG   DD SYSOUT=*
000009 //SYSPRINT  DD SYSOUT=*
000010 //DFSMSG    DD SYSOUT=*
000011 //SYSOUT    DD SYSOUT=*
000012 //SORTIN    DD *
000013 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000014 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000015 1234567891,A,123456789,H64398376,,
000016 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000018 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000019 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000020 1234567891,A,123456789,H64398376,,
000021 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000023 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000024 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000025 1234567891,A,123456789,H64398376,,
000026 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000028 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000029 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000030 1234567891,A,123456789,H64398376,,
000031 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000033 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000034 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000035 1234567891,A,123456789,H64398376,,
000036 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000037 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000038 //TEMP88    DD DISP=(MOD,PASS),DSN=&&TEMP88,
000039 //             UNIT=SYSDA,SPACE=(CYL,(1,1)),
000040 //             DCB=(RECFM=FB,LRECL=88)
000041 //SORTOUT   DD SYSOUT=*,DCB=(LRECL=80)
000042 //TOOLIN    DD *
000043   SORT FROM(SORTIN) TO(TEMP88)  USING(CTL1)
000044   SORT FROM(TEMP88) TO(SORTOUT) USING(CTL2)
000045 //CTL1CNTL  DD *
000046   INREC  OVERLAY=(81:SEQNUM,8,ZD)
000047   SORT   FIELDS=(1,80,CH,A,81,8,ZD,A)
000048   OUTREC BUILD=(1,88)
000049 //CTL2CNTL  DD *
000050   SORT   FIELDS=(81,8,ZD,A)
000051   INREC  IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052   OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
****** **************************** Bottom of Data ****************************


as usually a SORT expert might come up with a better solution icon_biggrin.gif
Back to top
View user's profile Send private message
PeD

Active User


Joined: 26 Nov 2005
Posts: 458
Location: Belgium

PostPosted: Sun Jun 12, 2011 1:53 pm    Post subject:
Reply with quote

For two records for which you want to see one excluded, edit them in hexadecimal, then verify there is not somewhere a non-printable code byte ( less than '40'x ).
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Mon Jun 13, 2011 12:48 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

Hi Enrico,

In the code you have given , could you please explain me the INREC and SORT FIELDS statements.
have given below the code for your reference..

000045 //CTL1CNTL DD *
000046 INREC OVERLAY=(81:SEQNUM,8,ZD)
000047 SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
000048 OUTREC BUILD=(1,88)
000049 //CTL2CNTL DD *
000050 SORT FIELDS=(81,8,ZD,A)
000051 INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052 OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
Back to top
View user's profile Send private message
enrico-sorichetti

Senior Member


Joined: 14 Mar 2007
Posts: 10690
Location: italy

PostPosted: Mon Jun 13, 2011 1:08 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

the inrec and the sort statements of the first pass
apply a sequence number to the original dataset and sort on the whole thing

bbbbbb...
bbbbbb...
aaaaaa...
aaaaaa...
bbbbbb...
bbbbbb...


bbbbbb...1
bbbbbb...2
aaaaaa...3
aaaaaa...4
bbbbbb...5
bbbbbb...6


aaaaaa...3
aaaaaa...4
bbbbbb...1
bbbbbb...2
bbbbbb...5
bbbbbb...6



the second inrec applies a record in the block sequence number


aaaaaa...3...1
aaaaaa...4...2
bbbbbb...1...1
bbbbbb...2...2
bbbbbb...5...3
bbbbbb...6...4

the sort sorts on the original sequence number


bbbbbb...1...1
bbbbbb...2...2
aaaaaa...3...1
aaaaaa...4...2
bbbbbb...5...3
bbbbbb...6...4

the outrec keeps only the records with a 1 in the record in the block
bbbbbb...1...1
aaaaaa...3...1
and drops all the auxiliary fields
bbbbbb...
aaaaaa...
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Mon Jun 13, 2011 3:23 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

thank u enrico... it was helpful...
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM All times are GMT + 6 Hours
Goto page Previous  1, 2
Page 2 of 2

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts Problem getting record length of a re... amitc23 COBOL Programming 1 Thu Jul 02, 2020 10:52 pm
No new posts Want to check if trailer count is 0 a... rohanthengal JCL & VSAM 10 Fri Jun 26, 2020 10:47 am
No new posts File transfer to remote system using ... kalyan.lc5 TSO/ISPF 2 Wed Jun 24, 2020 1:24 pm
No new posts How to remove remote file older than ... ssuthagar All Other Mainframe Topics 4 Thu Jun 04, 2020 6:26 am
No new posts How to fetch the latest record from a... Dinesh Mani DFSORT/ICETOOL 8 Tue Jun 02, 2020 5:34 pm

Facebook
Back to Top
 
Job Vacancies | Forum Rules | Bookmarks | Subscriptions | FAQ | Polls | Contact Us