
Flagging duplicates


IBM Mainframe Forums -> DFSORT/ICETOOL

mfarien (New User)
Posted: Thu Dec 06, 2007 9:09 pm

I want to sort a file on a key with SORT FIELDS=(1,5,CH,A), keep all the records, and just flag the duplicates. Say my input file is of length 5. I want an output file of length 6, with every record that is a duplicate marked with a flag of Y.
Example

abcde
abcde
qqqqq
rrrrr
qqqqq
ppppp


I want my output file to be:

abcdey
abcdey
ppppp
qqqqqy
qqqqqy
rrrrr

Frank Yaeger (DFSORT Developer)
Posted: Thu Dec 06, 2007 10:33 pm

Here's a DFSORT/ICETOOL job that will do what you asked for:

Code:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN DD DSN=...  input file (FB/5)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=...  output file (FB/6)
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SPLICE FROM(T1) TO(OUT) ON(1,5,CH) KEEPBASE KEEPNODUPS -
  WITHALL WITH(1,5) USING(CTL2)
/*
//CTL1CNTL DD *
  SORT FIELDS=(1,5,CH,A)
  OUTREC OVERLAY=(6:SEQNUM,8,ZD,RESTART=(1,5))
/*
//CTL2CNTL DD *
  SORT FIELDS=(1,5,CH,A,6,8,ZD,D)
  OUTFIL FNAMES=OUT,
    IFTHEN=(WHEN=(6,8,ZD,GT,+1),BUILD=(1,5,C'y')),
    IFTHEN=(WHEN=NONE,BUILD=(1,5,X))
/*
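
Roughly, the intermediate T1 file for the five-byte example would look like this after the CTL1 pass (the SORT on 1,5 plus the SEQNUM/RESTART overlay, which puts an 8-byte ZD sequence number at position 6 that restarts at 1 whenever the key value changes):

Code:

abcde00000001
abcde00000002
ppppp00000001
qqqqq00000001
qqqqq00000002
rrrrr00000001

The SPLICE step with CTL2 then arranges for every record whose key occurs more than once to carry a value greater than 1 in that 6,8 field, which the IFTHEN=(WHEN=(6,8,ZD,GT,+1),...) test turns into the 'y' flag; WHEN=NONE appends a blank instead.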

mfarien (New User)
Posted: Fri Dec 07, 2007 12:11 am

Thanks Frank.

Just to understand it better: for the same requirement, where I want all the input records in the output file with the duplicates flagged, how would I code it if my input record length is 100, the output length is 101, the sort field is positions 1,15, and the flag should be at position 101?

I am trying to understand these control cards:

OUTREC OVERLAY=(6:SEQNUM,8,ZD,RESTART=(1,5))

SORT FIELDS=(1,5,CH,A,6,8,ZD,D)
OUTFIL FNAMES=OUT,
IFTHEN=(WHEN=(6,8,ZD,GT,+1),BUILD=(1,5,C'y')),
IFTHEN=(WHEN=NONE,BUILD=(1,5,X))

Frank Yaeger (DFSORT Developer)
Posted: Fri Dec 07, 2007 12:42 am

I assumed that you only had the key in each record as shown in your original example, so you didn't care about the order of the records with the same key for output. If you have other fields in the record and do care about the order of the records with the same key for output, then we'd need to do it a different way.

Let's start over. Show me a better example of your input records and expected output records with the other fields in the record besides the key so I can see what you really want.

mfarien (New User)
Posted: Fri Dec 07, 2007 12:57 am

OK, got it. Let me restart.

I have one input file, LRECL=100. My sort key is 15 characters; the rest of the record doesn't matter to me. I have already sorted the file on the key. Now say I have 100 records in my sorted input file, with 20 duplicates, that is 80 unique records and 20 duplicate records. I want an output file with LRECL=101 and a flag at position 101 in each duplicate record, so that in my COBOL program I know it is a duplicate and can process it accordingly by checking the flag. So I will have 100 records in my output file, 20 with flags and 80 without.

The '....' in the example are fields with 9's, A's, and X's. I want those kept as they are; they have nothing to do with the sort or the duplicates.

Example.
105682004709136.......................... < 100>
105682004709136.......................... < 100>
105682025446815.......................... < 100>
105682093745261.......................... < 100>
105682093745261.......................... < 100>
105682095668485.......................... < 100>

I want my o/p file as
105682004709136.......................... < 100>Y
105682004709136.......................... < 100>Y
105682025446815.......................... < 100>
105682093745261.......................... < 100>Y
105682093745261.......................... < 100>Y
105682095668485.......................... < 100>

Frank Yaeger (DFSORT Developer)
Posted: Fri Dec 07, 2007 2:04 am

I'm not sure what the answer to my previous question is so I'll ask it more directly:

Let's say your input is:

Code:

105682004709136.R01...................... < 100>
105682004709136.R02...................... < 100>
105682004709136.R03...................... < 100>
105682025446815.R04...................... < 100>
105682093745261.R05...................... < 100>
105682093745261.R06...................... < 100>
105682095668485.R07...................... < 100>


Can the output have the records with the same keys in any order, e.g. (R03, R02, R01 for the first key):

Code:

105682004709136.R03...................... < 100>Y
105682004709136.R02...................... < 100>Y
105682004709136.R01...................... < 100>Y
...


Or must the output have the records with the same keys in their original order, e.g. (R01, R02, R03 for the first key):

Code:

105682004709136.R01...................... < 100>Y
105682004709136.R02...................... < 100>Y
105682004709136.R03...................... < 100>Y
...

mfarien (New User)
Posted: Fri Dec 07, 2007 2:21 am

Those could be in any order.

dick scherrer (Moderator Emeritus)
Posted: Fri Dec 07, 2007 2:36 am

Hello,

Sorry to "charge in", but i have to ask. . .

Is there some reason processing the data 3 times is better than adding the bit of code needed to handle duplicates in the COBOL program (which requires only 1 pass of the data)?

Hopefully, there is something i am misunderstanding. . .
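
Something along these lines would do it in one pass (a minimal sketch only, assuming the input is already in key order; the DD names, file names, and layouts here are made up for illustration - key in positions 1-15, LRECL=100 in, LRECL=101 out). A record is a duplicate when its key matches either the previous record's key or the next record's, so one record of look-ahead is all that is needed:

Code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. FLAGDUPS.
      * Sketch only: flag duplicate keys on a file already sorted
      * by key (positions 1-15), writing a 'Y' at position 101.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IN-FILE  ASSIGN TO INFILE.
           SELECT OUT-FILE ASSIGN TO OUTFILE.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE RECORDING MODE F.
       01  IN-REC.
           05  IN-KEY           PIC X(15).
           05  IN-REST          PIC X(85).
       FD  OUT-FILE RECORDING MODE F.
       01  OUT-REC.
           05  OUT-DATA         PIC X(100).
           05  OUT-DUP-FLAG     PIC X.
       WORKING-STORAGE SECTION.
       01  WS-EOF               PIC X     VALUE 'N'.
       01  WS-PREV-KEY          PIC X(15) VALUE LOW-VALUES.
       01  WS-NEXT-KEY          PIC X(15) VALUE LOW-VALUES.
       01  WS-HELD-REC.
           05  WS-HELD-KEY      PIC X(15).
           05  WS-HELD-REST     PIC X(85).
       PROCEDURE DIVISION.
           OPEN INPUT IN-FILE OUTPUT OUT-FILE
           READ IN-FILE
               AT END MOVE 'Y' TO WS-EOF
           END-READ
           PERFORM UNTIL WS-EOF = 'Y'
      *        Hold the current record, then read one record ahead
               MOVE IN-REC TO WS-HELD-REC
               READ IN-FILE
                   AT END     MOVE 'Y' TO WS-EOF
                   NOT AT END MOVE IN-KEY TO WS-NEXT-KEY
               END-READ
               MOVE WS-HELD-REC TO OUT-DATA
      *        Duplicate if the key matches the previous record
      *        or (when not at end of file) the next record
               IF WS-HELD-KEY = WS-PREV-KEY OR
                  (WS-EOF = 'N' AND WS-HELD-KEY = WS-NEXT-KEY)
                   MOVE 'Y' TO OUT-DUP-FLAG
               ELSE
                   MOVE SPACE TO OUT-DUP-FLAG
               END-IF
               WRITE OUT-REC
               MOVE WS-HELD-KEY TO WS-PREV-KEY
           END-PERFORM
           CLOSE IN-FILE OUT-FILE
           GOBACK.

This puts the flag at position 101 just as the sort approach would, but with a single read of the already-sorted file.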

mfarien (New User)
Posted: Fri Dec 07, 2007 2:44 am

If I have duplicates, they need to be reported and I need to add up the sum of their amounts. I cannot ignore them; the point is not to delete or omit the duplicates, but to flag them and then use the file, duplicates included, to calculate some amounts and to include them in the reporting.
There may be the same key, but the other fields can differ: the same key can appear under different departments and receive benefits under each. So I need to know all the benefits a key has received under the different departments, and to update the departments of the duplicate keys in the COBOL reports.
That way I will have a good file ready with the duplicates flagged. I did write a program to do this, but what can be done in JCL for 100,000s of records will take time in COBOL.
I hope that explains it!

dick scherrer (Moderator Emeritus)
Posted: Fri Dec 07, 2007 3:04 am

Hello,

I believe that i understand what you need to do.

I also believe that proper coding would allow you to process the 100,000s of records only one time rather than the 3 times this approach will require.

The data you show is already in sequence, so that is not an issue.

Frank Yaeger (DFSORT Developer)
Posted: Fri Dec 07, 2007 3:28 am

mfarien,

Here's an updated DFSORT/ICETOOL job for your "new" requirement.

Code:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN DD DSN=...  input file (FB/100)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=...  output file (FB/101)
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SPLICE FROM(T1) TO(OUT) ON(1,15,CH) KEEPBASE KEEPNODUPS -
  WITHALL WITH(1,100) USING(CTL2)
/*
//CTL1CNTL DD *
  SORT FIELDS=(1,15,CH,A)
  OUTREC OVERLAY=(102:SEQNUM,8,ZD,RESTART=(1,15))
/*
//CTL2CNTL DD *
  SORT FIELDS=(1,15,CH,A,102,8,ZD,D)
  OUTFIL FNAMES=OUT,
    IFTHEN=(WHEN=(102,8,ZD,GT,+1),BUILD=(1,100,C'Y')),
    IFTHEN=(WHEN=NONE,BUILD=(1,100,X))
/*
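
As with the first job, here is a rough sketch of the intermediate T1 records for this layout: after CTL1 the restart sequence number sits at positions 102-109 (shown here after the '< 100>' length marker), the SPLICE step with CTL2 leaves a value greater than 1 there for every record whose key occurs more than once, the (102,8,ZD,GT,+1) test turns that into the 'Y' flag, and the final BUILD keeps only positions 1-100 plus the flag, which gives the 101-byte output:

Code:

105682004709136.R01...................... < 100>00000001
105682004709136.R02...................... < 100>00000002
105682004709136.R03...................... < 100>00000003
105682025446815.R04...................... < 100>00000001
105682093745261.R05...................... < 100>00000001
105682093745261.R06...................... < 100>00000002
105682095668485.R07...................... < 100>00000001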

mfarien (New User)
Posted: Fri Dec 07, 2007 8:29 pm

Thanks Frank, it worked well and I have learned something new and useful for my day-to-day work.

dick scherrer (Moderator Emeritus)
Posted: Fri Dec 07, 2007 9:40 pm

Hello,

Well, you have learned something new. . .

For your requirement it is very likely not good and should surely not be used day to day.

Maybe someday you will also learn that it is nearly never a good decision to read all of the data, write all of the data, and read it all again when a single read would be sufficient.

mfarien (New User)
Posted: Fri Dec 07, 2007 10:45 pm

I am not processing it as many times as you think.
Here a raw file is sorted and flagged for duplicates in the JCL, and later used for processing in a COBOL program.
What would be the best way to do it in a single read?
(Given that I do need to process the duplicates differently in the reports and the transaction files during the COBOL processing.)

dick scherrer (Moderator Emeritus)
Posted: Fri Dec 07, 2007 11:22 pm

Hello,

Quote:
I am not processing it as many times as you think.
The data you posted as the "input" is already in sequence, which is what would lead to "extra" processing. If the "real" data will not be in sequence, then sorting it is not extra overhead; it would be needed anyway.

Quote:
(Given that I do need to process the duplicates differently in the reports and the transaction files during the COBOL processing.)
Please clarify this - i do not understand. . .

mfarien (New User)
Posted: Sat Dec 08, 2007 12:47 am

Yes, that's what we were discussing. If I use the ICETOOL step above, I am not going to have a separate sort step before it; I will remove my sort and use this instead, so the sort and the flagging happen in one step. I already mentioned in the post where I gave the data that 'I have already sorted the file with the key'.
Hope we are on the same page now.

dick scherrer (Moderator Emeritus)
Posted: Sat Dec 08, 2007 1:14 am

Yup, i believe we are.

Good luck.

d