How to remove duplicates using DFSORT or ICETOOL?


samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 9:19 am

Hi,

I have a file with LRECL=100. I want to separate this file into two output files based on duplicates in the first 10 bytes.

My requirement is:

The first file should contain only records whose key is not duplicated.
The second file should contain the first record of each set of duplicate records.

For example, suppose the input file has 10 records whose first 10 bytes are AAA1234987.

Under my condition, none of these records should be in the first output file, and the second output file should contain only the first of these 10 duplicate records.

I know a few things like NODUPS and ALLDUPS in ICETOOL and I tried them out, but I couldn't get the desired output.

Can you please provide the JCL for this?

Thanks in advance.
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Wed Mar 04, 2009 10:15 am

Hello,

Suggest you post a more complete sample of the input data.

This sample should contain data that will be written in both output files and you should post the 2 sets of output that should be written from the new sample input you post.
samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 10:36 am

Hi,


My input file looks like this:

Code:
AAA5060781310060237626T      ü~T    Øbæ2
AAA5061020310060237626T     ã}°T    a  2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA5053109310060237626T      q T    Øbæ2
AAA5056740310060237626T      ¹ T    Øa*2
AAA5059454310060237626T      \ÅT    a æ2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA5059672310060237626T     Óo T    a  2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA5061373310060320362T     ¶ü T    a <2
AAA5061575310060237626T     6HsT    Øk%2
AAA5062208310060237626T     ô ¨T    a @2

Here I want to move the records into separate files based on the first 10 bytes.

In the above file, the key AAA1234987 in the first 10 bytes is repeated several times, which means those records are duplicates.
So I want my second output file to contain only the first record with key AAA1234987. The rest of the records with the same key should be discarded.
The records with other keys should go to the first output file.

Thanks in advance.
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Wed Mar 04, 2009 10:57 am

Hello,

Please post the data for both output files as they should look when the posted input data is used. When posting data (or code, JCL, etc.), use the "Code" tag for readability and to preserve alignment (I've coded your data above).

Having the rules as you've posted them is very important. It is also important for us to be able to see the actual desired output from a controlled set of input.
samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 11:09 am

My input file looks like this:

Code:
AAA5060781310060237626T      ü~T    Øbæ2
AAA5061020310060237626T     ã}°T    a  2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA1234987310070237626T      ´ÚT    ØØ 2
AAA1234987310010237626T      ´ÚT    ØØ 2
AAA1234987310042337626T      ´ÚT    ØØ 2
AAA1234987310041237626T      ´ÚT    ØØ 2
AAA1234987310060237626T      ´ÚT    ØØ 2
AAA5053109310060237626T      q T    Øbæ2
AAA5056740310060237626T      ¹ T    Øa*2
AAA5059454310060237626T      \ÅT    a æ2
AAA1234987310011137626T      ´ÚT    ØØ 2
AAA5059672310060237626T     Óo T    a  2
AAA1234987310033327626T      ´ÚT    ØØ 2
AAA5061373310060320362T     ¶ü T    a <2
AAA5061575310060237626T     6HsT    Øk%2
AAA5062208310060237626T     ô ¨T    a @2



My first output should look like this:

Code:
AAA5060781310060237626T      ü~T    Øbæ2
AAA5061020310060237626T     ã}°T    a  2
AAA5053109310060237626T      q T    Øbæ2
AAA5056740310060237626T      ¹ T    Øa*2
AAA5059454310060237626T      \ÅT    a æ2
AAA5059672310060237626T     Óo T    a  2
AAA5061373310060320362T     ¶ü T    a <2
AAA5061575310060237626T     6HsT    Øk%2
AAA5062208310060237626T     ô ¨T    a @2


My second output should look like this:


Code:
AAA1234987310060237626T      ´ÚT    ØØ 2


Can you please tell me which utility I should use for this? It would also be helpful to get the JCL for it.

Thanks in advance.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Wed Mar 04, 2009 11:13 am

samanthjain,

From whatever has been posted so far, I think you're trying to write "unique keyed records" into output-1 and the first duplicate of "non unique keyed records" into output-2.

It would be better if you post the expected output for the above input as asked by Dick.
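
For anyone who wants to check an expected output away from the mainframe, here is a minimal Python sketch of the requirement as read above. It is not DFSORT code: the function name `split_unique_and_first_dup` and the shortened sample records are made up for illustration, and it keeps records in input order, whereas ICETOOL SELECT, as far as I know, writes its output in key order.

```python
from collections import Counter

KEY_LEN = 10  # the key is the first 10 bytes, as in ON(1,10,CH)


def split_unique_and_first_dup(records, key_len=KEY_LEN):
    """Return (unique, first_dups), both in input order.

    unique     -> records whose key occurs exactly once (output 1)
    first_dups -> first record of every key that occurs more than once (output 2)
    """
    counts = Counter(rec[:key_len] for rec in records)
    unique, first_dups, seen = [], [], set()
    for rec in records:
        key = rec[:key_len]
        if counts[key] == 1:
            unique.append(rec)
        elif key not in seen:
            # First time we meet a duplicated key: keep this record only.
            seen.add(key)
            first_dups.append(rec)
    return unique, first_dups


# Hypothetical, shortened sample based on the posted data.
sample = [
    "AAA5060781310060237626T",
    "AAA1234987310060237626T",
    "AAA1234987310070237626T",
    "AAA5053109310060237626T",
    "AAA1234987310010237626T",
]
unique, first_dups = split_unique_and_first_dup(sample)
print(unique)
print(first_dups)
```

The two lists correspond to the first and second output files described in the requirement.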
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Wed Mar 04, 2009 11:16 am

Hi,

Here is one way of doing it:
Code:
//S1       EXEC PGM=ICETOOL                                             
//TOOLMSG  DD SYSOUT=*                                                 
//DFSMSG   DD SYSOUT=*                                                 
//IN       DD *                                                         
AAA5060781310060237626T      Ü~T    ØBæ2                               
AAA5061020310060237626T     ã}°T    A  2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA5053109310060237626T      Q T    ØBæ2                               
AAA5056740310060237626T      ¹ T    ØA*2                               
AAA5059454310060237626T      \ÅT    A æ2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA5059672310060237626T     ÓO T    A  2                               
AAA1234987310060237626T      ´ÚT    ØØ 2                               
AAA5061373310060320362T     ¶Ü T    A <2                               
AAA5061575310060237626T     6HST    ØK%2                               
AAA5062208310060237626T     ô ¨T    A @2     
/*                           
//OUT      DD DSN=&&DUPS,                                               
//            DISP=(,PASS),                                     
//            UNIT=SYSDA,                                       
//            SPACE=(TRK,(10,5),RLSE)                           
//NODUP    DD SYSOUT=*                                           
//TOOLIN   DD *                                                 
SELECT FROM(IN) TO(OUT) ON(1,10,CH) ALLDUPS DISCARD(NODUP)
/*     
//S2       EXEC PGM=ICETOOL                                     
//TOOLMSG  DD SYSOUT=*                                           
//DFSMSG   DD SYSOUT=*                                           
//IN       DD DSN=&&DUPS,DISP=SHR                               
//FIRST    DD SYSOUT=*                                           
//TOOLIN   DD *                                                 
SELECT FROM(IN) TO(FIRST) ON(1,10,CH) FIRSTDUP   
/*               
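
The flow of the two steps above can be traced with a small Python model. This is only a sketch of the selection logic, not of ICETOOL itself: the helper names (`grouped`, `step1_alldups`, `step2_firstdup`) and the short test records are hypothetical, and the stable sort stands in for the assumption that records with equal keys keep their input order.

```python
from itertools import groupby

KEY_LEN = 10  # ON(1,10,CH)


def key_of(rec):
    return rec[:KEY_LEN]


def grouped(records):
    # sorted() is stable, so records with equal keys stay in input order.
    ordered = sorted(records, key=key_of)
    return [list(group) for _, group in groupby(ordered, key=key_of)]


def step1_alldups(records):
    """Model of step S1: ALLDUPS goes to OUT, everything else to DISCARD(NODUP)."""
    dups, nodup = [], []
    for group in grouped(records):
        (dups if len(group) > 1 else nodup).extend(group)
    return dups, nodup


def step2_firstdup(records):
    """Model of step S2: FIRSTDUP keeps the first record of each duplicated key."""
    return [group[0] for group in grouped(records) if len(group) > 1]


# Hypothetical records: a 10-byte key followed by a tag showing input order.
recs = ["AAA5060781-a", "AAA1234987-b", "AAA1234987-c",
        "AAA5053109-d", "AAA1234987-e"]
dups, nodup = step1_alldups(recs)
first = step2_firstdup(dups)
print(nodup)
print(first)
```

In this model `nodup` plays the role of the first output file and `first` the role of the second one.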




Gerry
samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 11:17 am

I already posted my input and expected output. Please let me know if I need to give any more information on this.

Thanks in advance.
samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 11:20 am

Hi Gerry,

Thanks for the solution. I tried it using two different ICETOOL steps in the JCL. Is there a way to do this in a single ICETOOL step?

Thanks in advance.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Wed Mar 04, 2009 11:29 am

Gerry,

You need only one step to do this.
Code:
//STEP1    EXEC PGM=ICETOOL             
//TOOLMSG  DD SYSOUT=*                 
//DFSMSG   DD SYSOUT=*                 
//IN       DD DSN=Input file
//T1       DD DSN=&&T1,DISP=(,PASS)     
//NODUPS   DD SYSOUT=*                 
//DUPS     DD SYSOUT=*                 
//TOOLIN   DD *                                           
 SELECT FROM(IN) TO(NODUPS) ON(1,10,CH) NODUPS DISCARD(T1)
 SELECT FROM(T1) TO(DUPS)   ON(1,10,CH) FIRST       
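
Why plain FIRST is enough in the second SELECT can be seen from a rough Python model of the two statements. It is a sketch under assumptions, not ICETOOL: the function names `select_nodups_with_discard` and `select_first` are invented here, and output is written in key order. Since T1 only receives records whose key occurs more than once, taking the first record per key from T1 gives the first duplicate of every non-unique key.

```python
KEY_LEN = 10  # ON(1,10,CH)


def select_nodups_with_discard(records):
    """Model of NODUPS DISCARD(T1): returns (nodups, t1)."""
    groups = {}
    for rec in records:
        # Records with the same key are collected in input order.
        groups.setdefault(rec[:KEY_LEN], []).append(rec)
    nodups, t1 = [], []
    for key in sorted(groups):
        target = nodups if len(groups[key]) == 1 else t1
        target.extend(groups[key])
    return nodups, t1


def select_first(records):
    """Model of FIRST: one record per key, the first one met in the input."""
    first = {}
    for rec in records:
        first.setdefault(rec[:KEY_LEN], rec)
    return [first[key] for key in sorted(first)]


# Hypothetical records: a 10-byte key followed by a tag showing input order.
data = ["AAA5060781-a", "AAA1234987-b", "AAA1234987-c",
        "AAA5053109-d", "AAA1234987-e"]
nodups_out, t1_out = select_nodups_with_discard(data)
dups_out = select_first(t1_out)
print(nodups_out)
print(dups_out)
```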
samanthjain

New User


Joined: 25 Jul 2008
Posts: 17
Location: Mumbai

Posted: Wed Mar 04, 2009 11:49 am

Yeah, it's working. Thanks to everyone. :)
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

Posted: Wed Mar 04, 2009 2:44 pm

Hi Arun,

Too often we seem to be obsessed with the single-step concept when in reality no time is saved.

Actually, I can see disadvantages to single steps, especially when dealing with large files.

If the job fails in the second SELECT statement, restarting the job is messy.

Rerunning the entire job would be a waste, considering the first SELECT statement has ended OK.

Gerry
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Wed Mar 04, 2009 4:53 pm

Hi Gerry,

That might be a worst-case situation. I have seen a lot of solutions posted here by the DFSORT experts Frank and Kolusu that involve multiple statements in the TOOLIN card.

Here's a recent example where a single input file is processed by 5 COPY statements in the TOOLIN card. I doubt whether it's advisable to go for 5 steps instead of the single-step ICETOOL solution posted by Frank.

ibmmainframes.com/viewtopic.php?t=38468
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Wed Mar 04, 2009 9:26 pm

Hello,

Quote:
I doubt whether it's advisable to go for 5 steps instead of the single-step ICETOOL solution posted by Frank.

For my $.02, the choice between multiple steps and one step might be determined by data volumes.

Lots of the things I do run multiple hundreds of millions of records. This occasionally leads to DASD space adventures. If I use multiple steps, restart is fairly trivial and I do not have to reprocess things that were already successfully done.

Also, I believe it is often easier to read when the solution is in one set of control statements, but there would be no problem separating things into separate steps for production implementation.

FWIW, if writing a real program would eliminate the need for several passes of some huge amount of data, I would encourage writing the code.
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

Posted: Wed Mar 04, 2009 10:12 pm

Samanthjain,

You don't really need 2 passes of the data. The following DFSORT JCL will give you the desired results, assuming that your input has LRECL=80 and RECFM=FB.

Code:

//STEP0100 EXEC PGM=SORT                   
//SYSOUT   DD SYSOUT=*                     
//SORTIN   DD *                           
AAA5060781310060237626T      Ü~T    ØBæ2   
AAA5061020310060237626T     ã}°T    A  2   
AAA1234987310060237626T      ´ÚT    ØØ 2   
AAA1234987310070237626T      ´ÚT    ØØ 2   
AAA1234987310010237626T      ´ÚT    ØØ 2   
AAA1234987310042337626T      ´ÚT    ØØ 2   
AAA1234987310041237626T      ´ÚT    ØØ 2   
AAA1234987310060237626T      ´ÚT    ØØ 2   
AAA5053109310060237626T      Q T    ØBæ2   
AAA5056740310060237626T      ¹ T    ØA*2   
AAA5059454310060237626T      \ÅT    A æ2   
AAA1234987310011137626T      ´ÚT    ØØ 2   
AAA5059672310060237626T     ÓO T    A  2   
AAA1234987310033327626T      ´ÚT    ØØ 2   
AAA5061373310060320362T     ¶Ü T    A <2   
AAA5061575310060237626T     6HST    ØK%2   
AAA5062208310060237626T     ô ¨T    A @2   
//UNQ      DD SYSOUT=*
//FDUP     DD SYSOUT=*
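//SYSIN    DD *
* Keep the input order among records with equal keys
  OPTION EQUALS
* Append an 8-byte counter of 1 at position 81
  INREC OVERLAY=(81:C'00000001')
  SORT FIELDS=(1,10,CH,A)
* Collapse equal keys into their first record and add up the counters
  SUM FIELDS=(81,8,ZD)
* Counter still 1: the key was unique
  OUTFIL FNAMES=UNQ,INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
* Everything else: first record of each duplicated key
  OUTFIL FNAMES=FDUP,SAVE,BUILD=(1,80)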
/*
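
The counter technique in the job above can be followed step by step in a short Python model. This is a sketch of the logic only, not of DFSORT internals: the function name `sum_by_key` and the test records are hypothetical, and Python's stable sort stands in for OPTION EQUALS.

```python
KEY_LEN = 10  # SORT FIELDS=(1,10,CH,A)


def sum_by_key(records):
    """Return (unq, fdup) using a per-record counter that is summed by key."""
    # INREC OVERLAY: tag every record with a counter of 1.
    tagged = [(rec, 1) for rec in records]
    # SORT with EQUALS: stable sort keeps input order within a key.
    tagged.sort(key=lambda pair: pair[0][:KEY_LEN])
    # SUM: equal keys collapse into the first record, counters are added.
    collapsed = []
    for rec, count in tagged:
        if collapsed and collapsed[-1][0][:KEY_LEN] == rec[:KEY_LEN]:
            kept, total = collapsed[-1]
            collapsed[-1] = (kept, total + count)
        else:
            collapsed.append((rec, count))
    # OUTFIL INCLUDE counter EQ 1 -> UNQ; OUTFIL SAVE -> FDUP.
    unq = [rec for rec, total in collapsed if total == 1]
    fdup = [rec for rec, total in collapsed if total != 1]
    return unq, fdup


# Hypothetical records: a 10-byte key followed by a tag showing input order.
rows = ["AAA5060781-a", "AAA1234987-b", "AAA1234987-c",
        "AAA5053109-d", "AAA1234987-e"]
unq, fdup = sum_by_key(rows)
print(unq)
print(fdup)
```

Because the counter is dropped again before writing (BUILD=(1,80) in the job), the output records look exactly like the input records.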


If you want to use the ICETOOL SELECT operator, use the following JCL:

Code:

//STEP0100 EXEC PGM=ICETOOL                                           
//TOOLMSG  DD SYSOUT=*                                               
//DFSMSG   DD SYSOUT=*                                               
//IN       DD *                                                       
AAA5060781310060237626T      Ü~T    ØBæ2                             
AAA5061020310060237626T     ã}°T    A  2                             
AAA1234987310060237626T      ´ÚT    ØØ 2                             
AAA1234987310070237626T      ´ÚT    ØØ 2                             
AAA1234987310010237626T      ´ÚT    ØØ 2                             
AAA1234987310042337626T      ´ÚT    ØØ 2                             
AAA1234987310041237626T      ´ÚT    ØØ 2                             
AAA1234987310060237626T      ´ÚT    ØØ 2                             
AAA5053109310060237626T      Q T    ØBæ2                             
AAA5056740310060237626T      ¹ T    ØA*2                             
AAA5059454310060237626T      \ÅT    A æ2                             
AAA1234987310011137626T      ´ÚT    ØØ 2                             
AAA5059672310060237626T     ÓO T    A  2                             
AAA1234987310033327626T      ´ÚT    ØØ 2                             
AAA5061373310060320362T     ¶Ü T    A <2                             
AAA5061575310060237626T     6HST    ØK%2                             
AAA5062208310060237626T     ô ¨T    A @2
//UNQ      DD SYSOUT=*
//FDUP     DD SYSOUT=*
//TOOLIN   DD *                                                       
  SELECT FROM(IN) TO(UNQ) ON(1,10,CH) NODUPS DISCARD(FDUP) USING(CTL1)
//CTL1CNTL DD *                                                       
  OUTFIL FNAMES=UNQ                                                   
  OUTFIL FNAMES=FDUP,NODETAIL,REMOVECC,                               
  SECTIONS=(1,10,HEADER3=(1,80))                                     
/*
All times are GMT + 6 Hours