Portal | Manuals | References | Downloads | Info | Programs | JCLs | Master the Mainframes
IBM Mainframe Computers Forums Index
 
Register
 
IBM Mainframe Computers Forums Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Usergroups Profile Log in to check your private messages Log in
 

 

Syncsort - Remove duplicates in a csv file
Goto page 1, 2  Next
 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM
View previous topic :: :: View next topic  
Author Message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 2:20 pm    Post subject: Syncsort - Remove duplicates in a csv file
Reply with quote

hi,

I have a comma separated file of lrecl 646. I need to remove the duplicates from the file. I tried with sort function 'SUM FIELDS=NONE' . But it did not remove the duplicates. can someone help?
Back to top
View user's profile Send private message

expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8593
Location: Back in jolly old England

PostPosted: Fri Jun 10, 2011 2:26 pm    Post subject:
Reply with quote

Show a sample of your input, and the wrong output.
The input output file DSORG, RECFM and LRECL
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 2:37 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

hi, please find below the sort i used

Code:
//STEP0001 EXEC PGM=SORT                           
//SYSOUT    DD SYSOUT=*                             
//SYSPRINT DD SYSOUT=*                             
//SORTIN    DD DSN=INPUT FILE...                     
//SORTOUT  DD DSN=OUTPUT FILE....                   
//                     DISP=(NEW,CATLG,DELETE),             
//                     UNIT=TESTDA,SPACE=(CYL,(10,50),RLSE),
//                     DCB=(LRECL=646,RECFM=FB,BLKSIZE=0)   
//SYSIN       DD *                                     
 SORT FIELDS=(1,646,CH,A)                           
 SUM FIELDS=NONE         
/*

Input file has the same dcb parameters as that of output.
In the output file, duplicates were not removed. it was the same as that of input.

My input data looks like below

Code:
1234567891,A,123456789,H64398376,,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
5874639368,B,AAA123456,AAAAAAA,BBBBBB,
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 2:43 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

just checked with

Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //S1      EXEC PGM=SORT
000005 //SYSPRINT  DD SYSOUT=*
000006 //SYSOUT    DD SYSOUT=*
000007 //SORTIN    DD *
000008 1234567891,A,123456789,H64398376,,
000009 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000010 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000011 //SORTOUT   DD SYSOUT=*,
000012 //             DCB=(RECFM=FB,LRECL=80)
000013 //SYSIN     DD *
000014   SORT   FIELDS=(1,80,CH,A)
000015   SUM    FIELDS=NONE
000016 //*
****** **************************** Bottom of Data ****************************


and it worked, so something is missing in Your description of the problem
Back to top
View user's profile Send private message
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8593
Location: Back in jolly old England

PostPosted: Fri Jun 10, 2011 2:44 pm    Post subject:
Reply with quote

And the output looks exactly as I would expect.

I think that you have omitted some vital information here. I would hazard a guess that this input has come from another operating system and may have a unicode seperator embedded into the data, which means that you have multiple records on each line.

However, if this is not the case you need to describe your requirement in far more detail than you already have.
Back to top
View user's profile Send private message
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8593
Location: Back in jolly old England

PostPosted: Fri Jun 10, 2011 2:45 pm    Post subject:
Reply with quote

Oh, I now see that the input data has changed icon_rolleyes.gif
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 2:49 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

I just edited the post adding the code tags icon_redface.gif
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 2:52 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

Hi,
Sorry, please see if the below information helps.
Input file is a comma separated file. Each field has variable lengths.
Each line has only one record.
Total lrecl of the file is 646.
I want to remove duplicate records.
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 2:58 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

the additionl information posted is quite useless, it adds nothing to the topic
You want to check the overall byte by byte matching or the real value matching

Code:
xxxx,aaa,bbb, 0,......
xxxx,aaa,bbb,00,.....


would fail the byte by byte matching
but logically the two records are the same, a <numeric> 0 is the same as a numeric 00
and the extra blank might be considered as harmless withespace

what is that You did not understand in the snippet I posted after testing!
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 2:59 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records.
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 3:01 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

hi, i want to match it byte by byte.
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 3:17 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

nothing that You could not have found out Yourself a forum search with DUPLICATE PRESERVE

Code:
****** ***************************** Top of Data ******************************
000001 //ENRICO1  JOB NOTIFY=&SYSUID,
000002 //             MSGLEVEL=(1,1),CLASS=A,MSGCLASS=H
000003 //*
000004 //S1      EXEC PGM=SORT
000005 //SYSPRINT  DD SYSOUT=*
000006 //SYSOUT    DD SYSOUT=*
000007 //SORTIN    DD *
000008 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000009 5874639369,B,AAA123456,AAAAAAA,BBBBBB,
000010 1234567891,A,123456789,H64398376,,
000011 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000012 5874639368,B,AAA123456,AAAAAAA,BBBBBB,
000013 //SORTOUT   DD SYSOUT=*,
000014 //             DCB=(RECFM=FB,LRECL=80)
000015 //SYSIN     DD *
000016   OPTION COPY
000017   INREC  IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,8,ZD,RESTART=(1,80)))
000018   OUTFIL INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
000019 //*
****** **************************** Bottom of Data ****************************


modify accordingly to Your record length,
the above snippet works if the duplicate record are contiguous,
if they are not You might need two/more passes

if You are using DFSORT better wait for Frank or Kolusu for a better suggestion

they work in a different time zone and right now they might be still in bed
icon_biggrin.gif
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 3:30 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

thank you enricho.
I tried the above code you gave.

OPTION COPY
INREC IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(1,646)))
OUTFIL INCLUDE=(647,8,ZD,EQ,1),BUILD=(1,646)

But got the below error
WER271A INREC STATEMENT : NUMERIC FIELD ERROR
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 3:36 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

Code:
WER271A INREC STATEMENT : NUMERIC FIELD ERROR


as said gazillions of times WER messages indicate that You are using SYNCSORT not DFSORT

topic moved where it belongs

Frank as Kolusu are DFSORT developers, You cannot expect Them to provide advice on a competitor' s product

and... by the way my name is Enrico, not Enricho icon_biggrin.gif
Back to top
View user's profile Send private message
mksranjani

New User


Joined: 03 Feb 2010
Posts: 29
Location: Chennai

PostPosted: Fri Jun 10, 2011 3:45 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

Thank you Enrico.. icon_smile.gif
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 6:11 pm    Post subject: Reply to: Remove duplicates in a csv file
Reply with quote

tested with DFSORT, searched the manuals and here is explanation of the

Quote:
ICE111A E REFORMATTING FIELD ERROR


Quote:
The length (m for p,m or FIXLEN=m for %nn) for a RESTART field was greater than 256 bytes.


there should be something similar in Your sysout and manuals
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 578
Location: USA

PostPosted: Fri Jun 10, 2011 9:43 pm    Post subject:
Reply with quote

Untest on syncsort but should work...(Tested on DFSort...)
Code:

//SYSIN DD *                                                           
 INREC  IFTHEN=(WHEN=INIT,OVERLAY=(647:SEQNUM,8,ZD,RESTART=(001,256),   
                                   655:SEQNUM,8,ZD,RESTART=(257,256),   
                                   663:SEQNUM,8,ZD,RESTART=(513,134))),
        IFTHEN=(WHEN=(647,8,ZD,EQ,1,OR,655,8,ZD,EQ,1,OR,663,8,ZD,EQ,1),
                          OVERLAY=(671:C'Y'))                           
 SORT FIELDS=COPY                                                       
 OUTFIL INCLUDE=(671,1,CH,EQ,C'Y'),BUILD=(1,646)                       
/*                                                                                                       


Thanks,
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 9:54 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

still there is the question pending...
what if the duplicate records are not contiguous !
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 578
Location: USA

PostPosted: Fri Jun 10, 2011 10:33 pm    Post subject:
Reply with quote

mksranjani wrote:
hi Enricho, it does work. but is it possible to remove the duplicates without sorting the records.

I thought OP didn't wanted to sort records.

Correct me if I am wrong but even if records are not contiguous, I think its possible using outrec. It would take 2 pass only if after removing duplicates OP wants to go back to original order.

Thanks,
Back to top
View user's profile Send private message
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10201
Location: italy

PostPosted: Fri Jun 10, 2011 10:46 pm    Post subject: Reply to: Syncsort - Remove duplicates in a csv file
Reply with quote

the solutions we posted got rid of the duplicates without altering the original ordering,
but will work only for contiguous duplicates
if the duplicates are not contiguous as I said before I feel that a multi pass approach is needed!

too bad that the TS is using Syncsort,
I guess that we are bound to miss Frank and Kolusu smart solutions!

I searched the forums for non contiguous duplicates keeping the original order and the first occurrence, but I was not able to find anything
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM All times are GMT + 6 Hours
Goto page 1, 2  Next
Page 1 of 2

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts syncsort: copy lines after the keyword shreya19 SYNCSORT 1 Fri Dec 02, 2016 9:47 am
No new posts Add PD field from 2nd file to PD in 1st Sushant Garje DFSORT/ICETOOL 6 Thu Dec 01, 2016 4:32 pm
No new posts File Aid to File Manager conversion murali3955 IBM Tools 4 Thu Nov 24, 2016 3:41 pm
No new posts Updating the counters after eliminati... PANDU1 DFSORT/ICETOOL 12 Mon Nov 21, 2016 9:47 am
No new posts How to convert the VBM file to VB or... Sulabh Agrawal JCL & VSAM 4 Fri Nov 18, 2016 1:04 pm


Facebook
Back to Top
 
Mainframe Wiki | Forum Rules | Bookmarks | Subscriptions | FAQ | Tutorials | Contact Us