IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Syncsort - Remove duplicates in a csv file


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Fri Jun 10, 2011 11:08 pm
Reply with quote

1. add a seq# column
2. sort by other fields
3. dedupe
4. re-order by seq#
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Fri Jun 10, 2011 11:15 pm
Reply with quote

Phrzby Phil,
Point #2 and 4 would require separate pass. We were thinking of single pass solution.

Thanks,
Back to top
View user's profile Send private message
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Fri Jun 10, 2011 11:18 pm
Reply with quote

my bad for using the <wrong> terminology

anybody can do it with TWO sort steps

the point here are not the multiple passes of data ( they will be needed anyway )

the puzzle is how to do it with a single SORT/ICETOOL step
Back to top
View user's profile Send private message
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Fri Jun 10, 2011 11:21 pm
Reply with quote

This topic is for the Syncsort product. Could we all please refrain from commenting on it and let Alissa provide the appropriate solution.
Back to top
View user's profile Send private message
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Sat Jun 11, 2011 12:39 pm
Reply with quote

Just exploring
a two step (two passes ) job for non contiguous duplicates ( using SORT)
Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB   DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1      EXEC PGM=SORT
000008 //SYSPRINT  DD SYSOUT=*
000009 //SYSOUT    DD SYSOUT=*
000010 //SORTIN    DD *
000011 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000012 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000013 1234567891,A,123456789,H64398376,,
000014 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000015 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000016 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000018 1234567891,A,123456789,H64398376,,
000019 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000020 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000021 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000023 1234567891,A,123456789,H64398376,,
000024 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000025 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000026 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000028 1234567891,A,123456789,H64398376,,
000029 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000030 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000031 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000033 1234567891,A,123456789,H64398376,,
000034 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000035 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000036 //SORTOUT   DD DISP=(NEW,PASS),DSN=&&TEMP88,
000037 //             UNIT=SYSDA,SPACE=(CYL,(1,1)),
000038 //             DCB=(RECFM=FB,LRECL=88)
000039 //SYSIN     DD *
000040   INREC  OVERLAY=(81:SEQNUM,8,ZD)
000041   SORT   FIELDS=(1,80,CH,A,81,8,ZD,A)
000042   OUTREC BUILD=(1,88)
000043 //*
000044 //S2      EXEC PGM=SORT
000045 //SYSPRINT  DD SYSOUT=*
000046 //SYSOUT    DD SYSOUT=*
000047 //SORTIN    DD DISP=(OLD,PASS),DSN=&&TEMP88
000048 //SORTOUT   DD SYSOUT=*,DCB=(LRECL=80)
000049 //SYSIN     DD *
000050   SORT   FIELDS=(81,8,ZD,A)
000051   INREC  IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052   OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
000053 //*
****** **************************** Bottom of Data ****************************


a one step (two passes anyway ) for the same ( using ICETOOL )
Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //CLEANUP EXEC PGM=CLEANUP
000005 //STEPLIB   DD DISP=SHR,DSN=SYS2.LINKLIB
000006 //*
000007 //S1      EXEC PGM=ICETOOL
000008 //TOOLMSG   DD SYSOUT=*
000009 //SYSPRINT  DD SYSOUT=*
000010 //DFSMSG    DD SYSOUT=*
000011 //SYSOUT    DD SYSOUT=*
000012 //SORTIN    DD *
000013 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000014 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000015 1234567891,A,123456789,H64398376,,
000016 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000017 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000018 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000019 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000020 1234567891,A,123456789,H64398376,,
000021 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000022 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000023 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000024 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000025 1234567891,A,123456789,H64398376,,
000026 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000027 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000028 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000029 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000030 1234567891,A,123456789,H64398376,,
000031 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000032 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000033 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000034 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000035 1234567891,A,123456789,H64398376,,
000036 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000037 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000038 //TEMP88    DD DISP=(MOD,PASS),DSN=&&TEMP88,
000039 //             UNIT=SYSDA,SPACE=(CYL,(1,1)),
000040 //             DCB=(RECFM=FB,LRECL=88)
000041 //SORTOUT   DD SYSOUT=*,DCB=(LRECL=80)
000042 //TOOLIN    DD *
000043   SORT FROM(SORTIN) TO(TEMP88)  USING(CTL1)
000044   SORT FROM(TEMP88) TO(SORTOUT) USING(CTL2)
000045 //CTL1CNTL  DD *
000046   INREC  OVERLAY=(81:SEQNUM,8,ZD)
000047   SORT   FIELDS=(1,80,CH,A,81,8,ZD,A)
000048   OUTREC BUILD=(1,88)
000049 //CTL2CNTL  DD *
000050   SORT   FIELDS=(81,8,ZD,A)
000051   INREC  IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052   OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
****** **************************** Bottom of Data ****************************


as usually a SORT expert might come up with a better solution icon_biggrin.gif
Back to top
View user's profile Send private message
PeD

Active User


Joined: 26 Nov 2005
Posts: 459
Location: Belgium

PostPosted: Sun Jun 12, 2011 1:53 pm
Reply with quote

For two records for which you want to see one excluded, edit them in hexadecimal, then verify there is not somewhere a non-printable code byte ( less than '40'x ).
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Mon Jun 13, 2011 12:48 pm
Reply with quote

Hi Enrico,

In the code you have given , could you please explain me the INREC and SORT FIELDS statements.
have given below the code for your reference..

000045 //CTL1CNTL DD *
000046 INREC OVERLAY=(81:SEQNUM,8,ZD)
000047 SORT FIELDS=(1,80,CH,A,81,8,ZD,A)
000048 OUTREC BUILD=(1,88)
000049 //CTL2CNTL DD *
000050 SORT FIELDS=(81,8,ZD,A)
000051 INREC IFTHEN=(WHEN=INIT,OVERLAY=(89:SEQNUM,8,ZD,RESTART=(1,80)))
000052 OUTFIL INCLUDE=(89,8,ZD,EQ,1),BUILD=(1,80)
Back to top
View user's profile Send private message
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Mon Jun 13, 2011 1:08 pm
Reply with quote

the inrec and the sort statements of the first pass
apply a sequence number to the original dataset and sort on the whole thing

bbbbbb...
bbbbbb...
aaaaaa...
aaaaaa...
bbbbbb...
bbbbbb...


bbbbbb...1
bbbbbb...2
aaaaaa...3
aaaaaa...4
bbbbbb...5
bbbbbb...6


aaaaaa...3
aaaaaa...4
bbbbbb...1
bbbbbb...2
bbbbbb...5
bbbbbb...6



the second inrec applies a record in the block sequence number


aaaaaa...3...1
aaaaaa...4...2
bbbbbb...1...1
bbbbbb...2...2
bbbbbb...5...3
bbbbbb...6...4

the sort sorts on the original sequence number


bbbbbb...1...1
bbbbbb...2...2
aaaaaa...3...1
aaaaaa...4...2
bbbbbb...5...3
bbbbbb...6...4

the outrec keeps only the records with a 1 in the record in the block
bbbbbb...1...1
aaaaaa...3...1
and drops all the auxiliary fields
bbbbbb...
aaaaaa...
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Mon Jun 13, 2011 3:23 pm
Reply with quote

thank u enrico... it was helpful...
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM Goto page 1, 2  Next

 


Similar Topics
Topic Forum Replies
No new posts FTP VB File from Mainframe retaining ... JCL & VSAM 8
No new posts Extract the file name from another fi... DFSORT/ICETOOL 6
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts Extracting Variable decimal numbers f... DFSORT/ICETOOL 17
No new posts SFTP Issue - destination file record ... All Other Mainframe Topics 2
Search our Forums:

Back to Top