Search for duplicates in the file

 
Andrew Shinkarev

New User


Joined: 10 Jan 2008
Posts: 22
Location: Belarus

Posted: Thu Jan 10, 2008 7:36 pm    Post subject: Search for duplicates in the file

Hi everyone!

There is a file (VB/80) containing a list of name replacements. The name to be replaced is in the first 20 positions. After a semicolon, the next 20 positions hold the replacement name:

Name1;Name2; <could be some comments>
Name3;Name4;
......................
Name1;Name5;
......................
Name3;Name4;
......................

The requirement is to create two lists: one for full duplicates (both names match) and one for partial duplicates (only the first name matches). Each list should include the line number of every duplicated record in the source file, and each group of duplicates should be followed by a blank line.

I have written the following job:
Code:

//*
//* FULL DUPLICATE LIST                                               
//*
//STEP10  EXEC PGM=ICETOOL                                           
//TOOLMSG  DD SYSOUT=*                                               
//DFSMSG   DD SYSOUT=*                                               
//INP      DD DSN=<source file>,                       
//            DISP=SHR                                               
//TEMP1    DD DSN=&&TMPOUT1,                                         
//            DISP=(NEW,PASS),                                       
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)                   
//TEMP2    DD DSN=&&TMPOUT2,                                         
//            DISP=(NEW,PASS),                                       
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)                   
//OUT      DD DSN=<full duplicates list>,               
//            DISP=(NEW,CATLG,DELETE),                               
//            UNIT=SYSDA,SPACE=(TRK,(50,20),RLSE)                     
//TOOLIN   DD *                                                       
  SELECT FROM(INP) TO(TEMP1) ON(5,41,CH) LOWER(2) DISCARD(TEMP2) -   
  USING(CTL1)                                                         
  COPY FROM(TEMP2) USING(CTL2)                                       
/*                                                                   
//CTL1CNTL DD *                                                       
  OPTION VLSCMP,VLSHRT                                               
  INREC OVERLAY=(69:SEQNUM,8,ZD)                                     
/*                                                                   
//CTL2CNTL DD *                                                       
  OUTFIL FNAMES=OUT,                                                 
         SECTIONS=(5,41,SKIP=L)                                       
//*
//* PARTIAL DUPLICATE LIST                                           
//*
//STEP20  EXEC PGM=ICETOOL                                           
//TOOLMSG  DD SYSOUT=*                                               
//DFSMSG   DD SYSOUT=*                                               
//INP      DD DSN=&&TMPOUT1,                                         
//            DISP=(OLD,PASS)                                         
//TEMP     DD DSN=&&TMPOUT3,                                         
//            DISP=(NEW,PASS),                                       
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)                   
//OUT      DD DSN=<partial duplicates list>,               
//            DISP=(NEW,CATLG,DELETE),                               
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)                   
//TOOLIN   DD *                                                       
  SELECT FROM(INP) ON(5,20,CH) LOWER(2) DISCARD(TEMP) USING(CTL3)     
  COPY FROM(TEMP) USING(CTL4)                                         
/*                                                                   
//CTL3CNTL DD *                                                       
  OPTION VLSCMP,VLSHRT                                               
  SORT FIELDS=(5,20,CH,A,69,8,CH,A)                                   
/*                                                                   
//CTL4CNTL DD *                                                       
  OUTFIL FNAMES=OUT,                                                 
         SECTIONS=(5,20,SKIP=L)                                       
/*                                                                   
//                                                                   


I have two questions:

1. Is it possible to combine SELECT and OUTFIL ... SECTIONS in ICETOOL (without using COPY)?

2. Sometimes one or more blank lines are missing from <partial duplicates list>:

....................
Name1;Name2;
Name1;Name3; <blank line should be after this one>
Name4;Name5;
Name4;Name6;
......................

Is this caused by a mistake in the job?

Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Thu Jan 10, 2008 10:05 pm

1. Yes.

2. I'm getting a blank line for the data I tried, but I don't know what your data looks like.

I believe you can do what you want more easily with a DFSORT/ICETOOL job like this:

Code:

//STEP10  EXEC PGM=ICETOOL
//* FULL DUPLICATE LIST
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//INP      DD DSN=<source file>,
//            DISP=SHR
//TEMP1    DD DSN=&&TMPOUT1,DISP=(,PASS),
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)
//FULL     DD DSN=<full duplicates list>,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(50,20),RLSE)
//PARTIAL  DD DSN=<partial duplicates list>,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(100,50),RLSE)
//TOOLIN   DD *
SELECT FROM(INP) TO(FULL) ON(5,41,CH) ALLDUPS DISCARD(TEMP1) -
  USING(CTL1)
SELECT FROM(TEMP1) TO(PARTIAL) ON(5,20,CH) ALLDUPS USING(CTL2)
/*
//CTL1CNTL DD *
  INREC OVERLAY=(69:SEQNUM,8,ZD)
  OUTFIL FNAMES=FULL,
         SECTIONS=(5,41,SKIP=L)
/*
//CTL2CNTL DD *
  OUTFIL FNAMES=PARTIAL,
         SECTIONS=(5,20,SKIP=L)
/*


I used the following example records for INP:

Code:

Name1111111111111111;Name2222222222222222;1<comments>
Name1111111111111111;Name3333333333333333;2
Name3333333333333333;Name4444444444444444;3
Name8888888888888888;Name2222222222222222;4<comments>
Name4444444444444444;Name5555555555555555;5
Name4444444444444444;Name6666666666666666;6
Name1111111111111111;Name5555555555555555;7
Name3333333333333333;Name4444444444444444;8
Name8888888888888888;Name2222222222222222;9


The job produced the following records for FULL:

Code:

Name3333333333333333;Name4444444444444444;3                     00000003
Name3333333333333333;Name4444444444444444;8                     00000008

Name8888888888888888;Name2222222222222222;4<comments>           00000004
Name8888888888888888;Name2222222222222222;9                     00000009


The job produced the following records for PARTIAL:

Code:

Name1111111111111111;Name2222222222222222;1<comments>           00000001
Name1111111111111111;Name3333333333333333;2                     00000002
Name1111111111111111;Name5555555555555555;7                     00000007

Name4444444444444444;Name5555555555555555;5                     00000005
Name4444444444444444;Name6666666666666666;6                     00000006
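
For anyone who wants to check the selection logic away from the mainframe, here is a rough Python model of the two SELECT passes above, run against the same sample records. It is only a sketch under stated assumptions, not DFSORT itself: the function names (add_seqnum, select_alldups, sections) are made up for illustration, the sort uses Python's string ordering rather than EBCDIC collation, and a blank line is written only between sections, so whatever DFSORT does after the last section is not modeled. Because the RDW of a VB record occupies positions 1-4, ON(5,41,CH) corresponds to data columns 1-41 and position 69 to data column 65.

```python
from collections import Counter

SAMPLE = [
    "Name1111111111111111;Name2222222222222222;1<comments>",
    "Name1111111111111111;Name3333333333333333;2",
    "Name3333333333333333;Name4444444444444444;3",
    "Name8888888888888888;Name2222222222222222;4<comments>",
    "Name4444444444444444;Name5555555555555555;5",
    "Name4444444444444444;Name6666666666666666;6",
    "Name1111111111111111;Name5555555555555555;7",
    "Name3333333333333333;Name4444444444444444;8",
    "Name8888888888888888;Name2222222222222222;9",
]


def add_seqnum(records, col=65, width=8):
    """Model of INREC OVERLAY=(69:SEQNUM,8,ZD): overlay a 1-based,
    zero-padded sequence number at data column `col` (RDW excluded)."""
    out = []
    for n, rec in enumerate(records, start=1):
        padded = rec.ljust(col - 1 + width)
        out.append(padded[:col - 1] + f"{n:0{width}d}" + padded[col - 1 + width:])
    return out


def select_alldups(records, key_len):
    """Model of SELECT ... ON(5,key_len,CH) ALLDUPS DISCARD(...).
    Returns (dups, rest), both in key order. Python's sort is stable,
    so records with equal keys keep their input order (an assumption
    about the real sort, not something the job guarantees)."""
    def key(rec):
        return rec[:key_len].ljust(key_len)

    ordered = sorted(records, key=key)
    counts = Counter(key(rec) for rec in ordered)
    dups = [rec for rec in ordered if counts[key(rec)] > 1]
    rest = [rec for rec in ordered if counts[key(rec)] == 1]
    return dups, rest


def sections(records, key_len):
    """Model of OUTFIL SECTIONS=(5,key_len,SKIP=L): insert a blank
    line whenever the key changes between consecutive records."""
    lines = []
    prev = None
    for rec in records:
        k = rec[:key_len]
        if prev is not None and k != prev:
            lines.append("")
        lines.append(rec)
        prev = k
    return lines


numbered = add_seqnum(SAMPLE)
full_dups, temp1 = select_alldups(numbered, 41)   # first SELECT
partial_dups, _ = select_alldups(temp1, 20)       # second SELECT on the discards
full_report = sections(full_dups, 41)
partial_report = sections(partial_dups, 20)

print("\n".join(full_report))
print()
print("\n".join(partial_report))
```

The second pass reads only the records discarded by the first, which is why full duplicates never show up again in the partial list.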


If that's not what you want, then please describe more clearly what you do want using input and output examples.
Andrew Shinkarev

New User


Joined: 10 Jan 2008
Posts: 22
Location: Belarus

Posted: Fri Jan 11, 2008 5:28 pm    Post subject: Reply to: Search for duplicates in the file

Thank you very much, Frank.

I have just added a sort by line number for the partial list:

Code:

//CTL2CNTL DD *
  SORT FIELDS=(5,20,CH,A,69,8,CH,A)
  OUTFIL FNAMES=PARTIAL,
         SECTIONS=(5,20,SKIP=L)
/*


and the job works fine.
All times are GMT + 6 Hours