Removing duplicates and getting the counts


IBM Mainframe Forums -> DFSORT/ICETOOL
Rohit Umarjikar
Global Moderator
Posted: Mon Aug 06, 2012 8:15 pm

Hello Team,

The input file (FB) looks like this:


Code:
AA012341AA121ABCD1
AA012341AA211ABCD1
AA010121DS121AA191
BB001141DG431NN431
BB014561GD341HO451


The output file (FB) should look like this:

Code:
AA232
BB222


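The output has one record per value in positions 1-2, built as follows:
1) Positions 1-5 are considered, and the count of unique values is placed in the 3rd position.
2) Positions 1-2 combined with the 4 bytes starting at position 7 (7-4) are considered, and the count of unique values is placed in the 4th position.
3) Positions 1-2 combined with the 4 bytes starting at position 14 (14-4) are considered, and the count of unique values is placed in the 5th position.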

I tried the following approach:
1) First, I ran 3 steps to remove duplicates based on the 3 combinations above.
2) Then I wrote 3 more steps to get the counts at the different positions described above.
3) Finally, I wrote a program to build the final output file shown above.

I agree that this is a very long procedure, so I would appreciate suggestions for a SORT solution that optimises this requirement.

Skolusu
Senior Member
Posted: Mon Aug 06, 2012 11:14 pm

Rohit Umarjikar,

Maybe I am missing something here. How did the combination of 1-2 and 7-4 (positions 1-2 plus the 4 bytes starting at position 7) get a count of 3?

AA41AA - 2 records
AA21DS - 1 record

If you eliminate the duplicates, aren't you left with a count of only 2? How did you get 3?
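
The arithmetic in this question can be cross-checked with a minimal Python sketch. It is not part of the original thread, and the helper name `field` is hypothetical. It assumes 1-based positions and reads "7-4" as "4 bytes starting at position 7", then tallies that key for the AA records of the sample input.

```python
from collections import Counter

# Sample input records from the first post (fixed length, 18 bytes each).
records = [
    "AA012341AA121ABCD1",
    "AA012341AA211ABCD1",
    "AA010121DS121AA191",
    "BB001141DG431NN431",
    "BB014561GD341HO451",
]


def field(record, start, length):
    """Return `length` bytes starting at 1-based position `start`."""
    return record[start - 1:start - 1 + length]


# Key as first posted: positions 1-2 plus the 4 bytes starting at position 7.
keys = Counter(
    field(r, 1, 2) + field(r, 7, 4)
    for r in records
    if field(r, 1, 2) == "AA"
)

print(dict(keys))  # how many records share each key
print(len(keys))   # number of distinct keys once duplicates are removed
```

Under these assumptions the AA records yield two distinct keys, which is the count of 2 that the question arrives at.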

Rohit Umarjikar
Global Moderator
Posted: Tue Aug 07, 2012 4:53 pm

Skolusu,
Thanks for correcting me.
Here are the updated rules:

1) Positions 1-7 are considered, and the count of unique values is placed in the 3rd position.
2) Positions 1-2 combined with the 4 bytes starting at position 9 (9-4) are considered, and the count of unique values is placed in the 4th position.
3) Positions 1-2 combined with the 4 bytes starting at position 14 (14-4) are considered, and the count of unique values is placed in the 5th position.
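
To see how the updated rules map the sample input to the requested output, here is a minimal Python sketch. It is not from the thread; the names `RULES`, `field`, and `unique_counts` are hypothetical. It assumes 1-based positions and that the counts are grouped by the value in positions 1-2.

```python
# Sample input records from the first post.
records = [
    "AA012341AA121ABCD1",
    "AA012341AA211ABCD1",
    "AA010121DS121AA191",
    "BB001141DG431NN431",
    "BB014561GD341HO451",
]

# Each rule is a list of (start, length) pieces that make up its key.
RULES = [
    [(1, 7)],           # rule 1: positions 1-7
    [(1, 2), (9, 4)],   # rule 2: positions 1-2 plus 4 bytes from position 9
    [(1, 2), (14, 4)],  # rule 3: positions 1-2 plus 4 bytes from position 14
]


def field(record, start, length):
    """Return `length` bytes starting at 1-based position `start`."""
    return record[start - 1:start - 1 + length]


def unique_counts(recs):
    """Map each 2-byte prefix to its three distinct-key counts."""
    seen = {}
    for rec in recs:
        prefix = field(rec, 1, 2)
        key_sets = seen.setdefault(prefix, [set() for _ in RULES])
        for keys, pieces in zip(key_sets, RULES):
            keys.add("".join(field(rec, s, n) for s, n in pieces))
    return {p: [len(k) for k in sets] for p, sets in seen.items()}


counts = unique_counts(records)
for prefix in sorted(counts):
    # One output line per prefix, each count as a single digit.
    print(prefix + "".join(str(c) for c in counts[prefix]))
```

With the sample data this prints `AA232` and `BB222`, matching the output requested in the first post.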

knickraj
New User
Posted: Tue Aug 07, 2012 7:12 pm

You may try the job below. It is not fully tested, but I hope it gives you some idea of how to proceed. It can be optimised further, and it still has to be corrected for duplicates.

Code:
//SS      EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN       DD *
AA012441AA121ABCD1
AA012341AA211ABCD1
AA010121DS121AA191
BB001141DG431NN431
BB014561GD341HO451
BB012561GD341HO451
BB013561GD341HO451
//OUT     DD DSN=&&TEMP,DISP=(MOD,PASS,)
//OUT1    DD SYSOUT=*
//TOOLIN  DD *
 COPY FROM(IN)  TO(OUT) USING(CTL1)
 COPY FROM(IN)  TO(OUT) USING(CTL2)
 COPY FROM(IN)  TO(OUT) USING(CTL3)
 SPLICE FROM(OUT) TO(OUT1) ON(1,2,CH) WITHANY KEEPNODUPS -
  WITH(7,2) WITH(9,2) WITH(11,2)
//CTL1CNTL DD *
  OPTION COPY
  INREC BUILD=(1,5,80:X)
  SORT FIELDS=(1,5,A)
  SUM FIELDS=NONE
 OUTFIL REMOVECC,NODETAIL,
 SECTIONS=(1,2,TRAILER3=(1,2,X,5:COUNT=(M10,LENGTH=2)))
//CTL2CNTL DD *
  OPTION COPY
  INREC BUILD=(1,2,7,4,80:X)
  SORT FIELDS=(1,6,A)
  SUM FIELDS=NONE
 OUTFIL REMOVECC,NODETAIL,
 SECTIONS=(1,2,TRAILER3=(1,2,X,7:COUNT=(M10,LENGTH=2)))
//CTL3CNTL DD *
  OPTION COPY
  INREC BUILD=(1,2,14,4,80:X)
  SORT FIELDS=(1,6,A)
  SUM FIELDS=NONE
 OUTFIL REMOVECC,NODETAIL,
 SECTIONS=(1,2,TRAILER3=(1,2,X,9:COUNT=(M10,LENGTH=2)))

dick scherrer
Moderator Emeritus
Posted: Tue Aug 07, 2012 7:46 pm

Hello,

Please do not post "solutions" that are known to be incomplete or do not work (i.e. have not been tested) . . .

If we had no solid support, this might be OK, but as we have Skolusu (DFSORT Developer) and others, there is no need to just "throw things" onto the topic. All this does is take time and provide little for the TS (topic starter).

Bill Woodger
Moderator Emeritus
Posted: Tue Aug 07, 2012 8:13 pm

knickraj,

I know you are keen, but three passes of the file followed by a SPLICE is unlikely to be close to a good solution. Use of SPLICE is rare. Multiple passes only make sense when they can't be avoided.

Having got an idea, don't just post it if you think it needs more work: give it the work, or leave it for another time, please.

Skolusu
Senior Member
Posted: Tue Aug 07, 2012 9:08 pm

Rohit Umarjikar wrote:
Knickraj,

Thanks for your view.

I am getting output like this:

Code:
 A   3     
 B   4     
 A     3   
 B     4   
 A       3 
 B       4


Rohit Umarjikar,

1. You did not run the JCL exactly as knickraj posted it. If you had run it as is, you would have got

Code:

AA   3 3 3
BB   4 4 4


2. For the sample input you showed and the conditions you specified, I have no idea how you arrived at these counts:

Code:
AA232
BB222


Can you explain how you got those counts?

3. You show the counts as a single byte. What happens if a count exceeds 9?
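
The concern in point 3 can be illustrated with a minimal Python sketch. It is not from the thread, and the helper name `format_count` is hypothetical. It only shows the general idea of a fixed-width count field and makes no claim about how DFSORT itself reacts when a count does not fit.

```python
def format_count(count, width):
    """Zero-pad `count` to `width` digits; fail loudly if it does not fit."""
    text = str(count)
    if len(text) > width:
        raise ValueError(f"count {count} needs more than {width} digit(s)")
    return text.zfill(width)


print(format_count(3, 1))   # a single-digit count fits a 1-byte field
print(format_count(12, 5))  # a wider field leaves room for larger counts

try:
    format_count(12, 1)     # 12 cannot be shown in one byte
except ValueError as err:
    print(err)
```

A fixed width wider than the largest expected count keeps every field in the same column of the output record.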


Knickraj,

As Bill mentioned, you don't need that many passes to get the desired results. It can be done in a single pass. No sort solution should need more than 3 passes (the upper limit). If you need more than 3 passes of the data, you might as well sit down and write a program to get the results.

Never go by the description of how the problem was solved. Look at it as "I have this input and I need this output with these conditions" and develop the logic to achieve it. Trust me, you will find innovative ways to solve the same problem when you look at it in that manner.

Skolusu
Senior Member
Posted: Tue Aug 07, 2012 9:55 pm

Rohit Umarjikar,

Use the following DFSORT JCL which will give you the desired results.

Code:

//STEP0100 EXEC PGM=ICETOOL                                 
//TOOLMSG  DD SYSOUT=*                                     
//DFSMSG   DD SYSOUT=*                                     
//IN       DD *                                             
AA012341AA121ABCD1                                         
AA012341AA211ABCD1                                         
AA010121DS121AA191                                         
BB001141DG431NN431                                         
BB014561GD341HO451                                         
/*
//T1       DD DSN=&&T1,DISP=(,PASS),SPACE=(CYL,(1,1),RLSE) 
//OUT      DD SYSOUT=*                                     
//TOOLIN   DD *                                             
  COPY FROM(IN) USING(CTL1)                                 
  SORT FROM(T1) USING(CTL2)                                 
//CTL1CNTL DD *                                             
  OUTFIL FNAMES=T1,                                         
  BUILD=(1,7,C'100',/,                                     
         1,2,9,4,X,C'010',/,                               
         1,2,14,4,X,C'001')                                 
//*                                                         
//CTL2CNTL DD *                                             
  SORT FIELDS=(1,7,CH,A)                                   
  SUM FIELDS=NONE                                           
  OUTFIL FNAMES=OUT,REMOVECC,NODETAIL,BUILD=(80X),         
  SECTIONS=(1,2,                                           
  TRAILER3=(1,2,'|',                                       
            TOT=(08,1,ZD,M11,LENGTH=5),'|',                 
            TOT=(09,1,ZD,M11,LENGTH=5),'|',                 
            TOT=(10,1,ZD,M11,LENGTH=5),'|'))               
//*


The output from this is
Code:

AA|00002|00003|00002|
BB|00002|00002|00002|
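
For readers who want to trace the logic of the job above outside DFSORT, here is a minimal Python sketch of the same idea as I read it: write three tagged 10-byte records per input record, keep one record per 7-byte key, then total the tag digits for each 2-byte prefix. It is not from the thread, and the names `expand` and `summarize` are hypothetical.

```python
# Sample input records from the job above.
records = [
    "AA012341AA121ABCD1",
    "AA012341AA211ABCD1",
    "AA010121DS121AA191",
    "BB001141DG431NN431",
    "BB014561GD341HO451",
]


def expand(record):
    """Mimic CTL1: three 10-byte records, each a 7-byte key plus a 3-digit tag."""
    return [
        record[0:7] + "100",                       # 1,7,C'100'
        record[0:2] + record[8:12] + " " + "010",  # 1,2,9,4,X,C'010'
        record[0:2] + record[13:17] + " " + "001", # 1,2,14,4,X,C'001'
    ]


def summarize(recs):
    # Mimic the sort on bytes 1-7 with SUM FIELDS=NONE: one record per key.
    unique = {}
    for rec in recs:
        for tagged in expand(rec):
            unique.setdefault(tagged[:7], tagged)
    # Mimic SECTIONS on bytes 1-2 with totals of bytes 8, 9 and 10.
    totals = {}
    for key in sorted(unique):
        tagged = unique[key]
        sums = totals.setdefault(tagged[:2], [0, 0, 0])
        for i in range(3):
            sums[i] += int(tagged[7 + i])
    return ["|".join([prefix] + [f"{n:05d}" for n in sums]) + "|"
            for prefix, sums in totals.items()]


lines = summarize(records)
for line in lines:
    print(line)
```

With the sample data the sketch prints the same two lines as the job output shown above. Like the job as I read it, the sketch removes duplicates on bytes 1-7 only, so a rule 2 key and a rule 3 key that happen to carry the same 4 bytes under the same prefix would collapse into one record. That does not occur in the sample data, but it may be worth checking against real input.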

Rohit Umarjikar
Global Moderator
Posted: Wed Aug 08, 2012 2:08 pm

Skolusu,

Yes it worked.
Thanks a ton for all your effort!!
All times are GMT + 6 Hours