IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

eliminate duplicates for certain condtions


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
ayushT

New User


Joined: 01 Oct 2007
Posts: 23
Location: india

PostPosted: Sun Jul 25, 2010 12:51 pm
Reply with quote

Hi,

I have a file containing having two columns. Column 1 has duplicates. I need to filter out records if Column 2 is greater than 999999.

Input file

Column1Coulmn2
abcdefgh123456
abcdefgh123451
abcdefgh1234567
zaqwersd9876546
zaqwersd9876541
zaqwersd987654

Output file
abcdefgh123456
abcdefgh123451
zaqwersd987654
Back to top
View user's profile Send private message
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

PostPosted: Sun Jul 25, 2010 1:34 pm
Reply with quote

Hi,

I'm not sure the significance of duplicate colums, but here is a way of omitting records

Code:
//STEP0100 EXEC PGM=SORT                     
//SYSOUT   DD SYSOUT=*                       
//SORTIN   DD *                             
ABCDEFGH123456                               
ABCDEFGH123451                               
ABCDEFGH1234567                             
ZAQWERSD9876546                             
ZAQWERSD9876541                             
ZAQWERSD987654                               
/*                                           
//SORTOUT  DD SYSOUT=*                       
//SYSIN    DD *                             
  SORT FIELDS=COPY                           
  OMIT COND=(9,7,UFF,GT,999999)             
/*                                           



Gerry
Back to top
View user's profile Send private message
ayushT

New User


Joined: 01 Oct 2007
Posts: 23
Location: india

PostPosted: Mon Jul 26, 2010 7:38 am
Reply with quote

Hi Gerry,

Thank you for your reply. This would work if I wanted to removed all records where column 2 > 999999, but in my case I want to eliminate only if there are records with duplicates on column1. I want to keep a record with column2 > 999999 if there is only one instance of column1.

I think the example I put before might have misled you. Here's the latest

Column1Coulmn2
abcdefgh123456
abcdefgh123451
abcdefgh1234567
zaqwersd9876546
zaqwersd9876541
zaqwersd987654
wqewqrw8765432

Output file
abcdefgh123451
abcdefgh123456
wqewqrw8765432
zaqwersd987654

Thanks,
Ayush
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 26, 2010 7:51 am
Reply with quote

Hello,

Suggest you lose the "alphabet soup" and post something more usable/readable using some proper spacing and the "Code" tag. If your actual data is sensitive, make up some obvious values.

Like:
Code:
FRED     12345
FRED     123456
MARY     5656565
MARY     5656565
JIM      9876543
JIM      1234
JIM      98765


Make enough data to show different situations and how you want the output from the new input.
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Mon Jul 26, 2010 8:04 am
Reply with quote

Ayush,
What is the LRECL of input file? FB or VB? What are the data types for column1 and column2?

Thanks,
Back to top
View user's profile Send private message
Gabriel Araujo Alves

New User


Joined: 20 Jul 2010
Posts: 38
Location: Brazil

PostPosted: Mon Jul 26, 2010 8:46 pm
Reply with quote

ayushT wrote:
Hi Gerry,

Thank you for your reply. This would work if I wanted to removed all records where column 2 > 999999, but in my case I want to eliminate only if there are records with duplicates on column1. I want to keep a record with column2 > 999999 if there is only one instance of column1.

I think the example I put before might have misled you. Here's the latest

Column1Coulmn2
abcdefgh123456
abcdefgh123451
abcdefgh1234567
zaqwersd9876546
zaqwersd9876541
zaqwersd987654
wqewqrw8765432

Output file
abcdefgh123451
abcdefgh123456
wqewqrw8765432
zaqwersd987654

Thanks,
Ayush



Hi,
first you need separe the duplicated records in conditions of first column, and before this in another step you remove all records(duplicated) with column 2 greater 999999, like gcicchet do. if you don't know how to do this, i can post it here...
Back to top
View user's profile Send private message
ayushT

New User


Joined: 01 Oct 2007
Posts: 23
Location: india

PostPosted: Mon Jul 26, 2010 9:04 pm
Reply with quote

Thanks Gabriel. I thought of the same, but it involves two steps. I wanted to do it in one step.

Posting below a sample input dataset (FB,LRECL=15 bytes) which is sorted by Column1

Column1 - 5 bytes
Filler - 3 bytes
Coulmn2- 7 bytes

INPUT FILE
Code:

----5---9---------
AYUSH   444444
AYUSH   5555555
BECKY   6666666
RICKY   123456
RICKY   1234567
RICKY   987654

OUTPUT FILE
----5---9---------
AYUSH   444444
BECKY   6666666
RICKY   123456
RICKY   987654


Thanks,
Ayush
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Mon Jul 26, 2010 9:43 pm
Reply with quote

Ayush,
What if there are only 2 records for the keys and both has field2 values less than 999999. Do you need to keep both the records? If not which one do you need to keep 1st or 2nd?

What is an expected output from below input?

Code:

AYUSH   444444
AYUSH   333333
BECKY   6666666
RICKY   123456
RICKY   1234567
RICKY   987654


Thanks,
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Mon Jul 26, 2010 9:45 pm
Reply with quote

Ayush,

Here's a DFSORT/ICETOOL job that will do what you asked for in one step (although NOT in one pass):

Code:

//S1    EXEC  PGM=ICETOOL                                           
//TOOLMSG DD SYSOUT=*                                               
//DFSMSG  DD SYSOUT=*                                               
//IN DD DSN=&&IN,DISP=(OLD,PASS)                                     
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)           
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)           
//CON DD DSN=*.T1,VOL=REF=*.T1,DISP=(OLD,PASS)                       
//    DD DSN=*.T2,VOL=REF=*.T2,DISP=(OLD,PASS)                       
//OUT DD SYSOUT=*                                                   
//TOOLIN DD *                                                       
SELECT FROM(IN) TO(T1) DISCARD(T2) ON(1,5,CH) ALLDUPS USING(CTL1)   
SORT FROM(CON) TO(OUT) USING(CTL2)                                   
/*
//CTL1CNTL DD *                                                     
  OUTFIL FNAMES=T1,OMIT=(9,7,UFF,GT,999999)                         
/*
//CTL2CNTL DD *                                                     
  OPTION EQUALS                                                     
  SORT FIELDS=(1,5,CH,A)                                             
/*
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Mon Jul 26, 2010 10:00 pm
Reply with quote

Frank,
Would below work as a single pass solution?

Code:
//SORT01  EXEC PGM=SORT                                               
//SORTIN   DD *                                                       
AYUSH   444444                                                       
AYUSH   5555555                                                       
BECKY   6666666                                                       
RICKY   123456                                                       
RICKY   1234567                                                       
RICKY   987654                                                       
//SORTOUT  DD SYSOUT=*                                               
//*                                                                   
//SYSIN DD *                                                         
 INREC IFTHEN=(WHEN=INIT,OVERLAY=(21:SEQNUM,8,ZD,RESTART=(1,5),       
                                  30:C'Y')),                         
       IFTHEN=(WHEN=(21,8,ZD,GT,1,AND,9,7,UFF,GT,0999999),           
               OVERLAY=(30:C'N'))                                     
 SORT FIELDS=COPY                                                     
 OUTFIL INCLUDE=(30,1,CH,EQ,C'Y'),BUILD=(1,15)                       
/*                                                                   
//SYSOUT DD SYSOUT=*                                                 
//*



Thanks,
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Mon Jul 26, 2010 10:19 pm
Reply with quote

It doesn't appear to work correctly for these input records:

Code:

BILL    6666666       
BILL    6666666       
RICKY   1234567       
RICKY   1234567       
RICKY   1234567       


My job correctly gives no output records for this case, whereas your job incorrectly gives these output records:

Code:

BILL    6666666
RICKY   1234567
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Mon Jul 26, 2010 10:40 pm
Reply with quote

Frank,
Thanks for the feedback.

I assumed this based on OP's expected output where he had hinted that he would still want to keep atleast 1 record even if its above 999999.

Code:

----5---9---------
AYUSH   444444
AYUSH   5555555
BECKY   6666666
RICKY   123456
RICKY   1234567
RICKY   987654

OUTPUT FILE
----5---9---------
AYUSH   444444
BECKY   6666666 
RICKY   123456
RICKY   987654


BECKY 6666666 --> This part in expected output led me to believe that OP wanted to keep atleast single record if all the record has field2 above limit.Basically I assumed he wanted to keep atleast single record for each input key regardless of the field2 condition.

Well this was an assumption and I could be wrong.

Thanks,
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Mon Jul 26, 2010 11:58 pm
Reply with quote

Quote:
I assumed this based on OP's expected output where he had hinted that he would still want to keep atleast 1 record even if its above 999999.


Quote:
BECKY 6666666 --> This part in expected output led me to believe that OP wanted to keep atleast single record if all the record has field2 above limit.Basically I assumed he wanted to keep atleast single record for each input key regardless of the field2 condition.


I took this to mean he wanted to keep any nonduplicate records regardless of whether they met the < 999999 criteria or not. That's diffferent then always keeping one record of duplicates, which I don't see stated or implied anywhere.

I guess the OP will have to let us know (if he's still interested).
Back to top
View user's profile Send private message
Gabriel Araujo Alves

New User


Joined: 20 Jul 2010
Posts: 38
Location: Brazil

PostPosted: Tue Jul 27, 2010 12:04 am
Reply with quote

WHERE I CAN SEE MORE ABOUT
(WHEN=INIT)/WHEN=GROUP
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Tue Jul 27, 2010 12:09 am
Reply with quote

See the discussion of "IFTHEN" under "OUTFIL Control Statements" in z/OS DFSORT Application Programming Guide:

publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA40/3.13?DT=20090527161936
Back to top
View user's profile Send private message
Gabriel Araujo Alves

New User


Joined: 20 Jul 2010
Posts: 38
Location: Brazil

PostPosted: Tue Jul 27, 2010 12:11 am
Reply with quote

THANK'S FRANK.
Back to top
View user's profile Send private message
ayushT

New User


Joined: 01 Oct 2007
Posts: 23
Location: india

PostPosted: Tue Jul 27, 2010 1:21 am
Reply with quote

I think what Frank put works for me. Thanks Frank!!!
Back to top
View user's profile Send private message
sqlcode1

Active Member


Joined: 08 Apr 2010
Posts: 577
Location: USA

PostPosted: Tue Jul 27, 2010 1:43 am
Reply with quote

ayushT,
Thanks for the clarification on requirement.


Thanks,
Back to top
View user's profile Send private message
ayushT

New User


Joined: 01 Oct 2007
Posts: 23
Location: india

PostPosted: Tue Jul 27, 2010 2:03 am
Reply with quote

Thank you all for your help!!!!
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts How to remove block of duplicates DFSORT/ICETOOL 8
This topic is locked: you cannot edit posts or make replies. Compare files with duplicates in one ... DFSORT/ICETOOL 11
No new posts Merging 2 files but ignore duplicate... DFSORT/ICETOOL 1
No new posts COUNT the number of duplicates DFSORT/ICETOOL 3
This topic is locked: you cannot edit posts or make replies. SUM FIELDS=NONE in reverse - Get dupl... DFSORT/ICETOOL 9
Search our Forums:

Back to Top