I have written a reporting program. It works fine when the number of records in the input is around 100,000, but our input file has more than 1,300,000 records. There are duplicate records corresponding to a key. Is there any way to split the file so that each split always includes all the records for its last key, and the next split starts with the next key?
When we do a split by 100,000, the output of the split should be:

A 2 ---------> 100,000 records
There is no limit on the number of records that we can have for a particular key.
The input file may have up to 1,300,000 records, so I need to split it into 13 parts. The problem is that there are duplicate records corresponding to a key. If I split strictly by 100,000, I might cut off some of the data for the last key, and it would then be included again in the next file.
RECFM=FB and LRECL=213.
There is only one rule: each split file must contain all the records for the key found at the boundary (record 100,000, 200,000, ..., 1,300,000), and that key must not appear in the next split.
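To make the rule concrete, here is a minimal sketch in Python of the splitting logic I have in mind. It assumes the input is already sorted by key, and the `key_of()` function (first 8 bytes of each record) is a placeholder, since the real key position in the 213-byte record is not shown here:

```python
CHUNK = 100_000  # target records per output split

def key_of(record):
    # Hypothetical key extraction: assumes the key is the first
    # 8 characters of the record -- adjust to the real layout.
    return record[:8]

def split_by_key(records, chunk=CHUNK):
    """Yield lists of records. A list grows past `chunk` only as far
    as needed to keep every record of the boundary key in one split."""
    out, prev_key = [], None
    for rec in records:
        k = key_of(rec)
        # Start a new split only once the target size is reached AND
        # the key changes, so a key never straddles two files.
        if len(out) >= chunk and k != prev_key:
            yield out
            out = []
        out.append(rec)
        prev_key = k
    if out:
        yield out
```

With a small chunk size for illustration, a run like `list(split_by_key(["A"]*3 + ["B"]*2 + ["C"]*4, chunk=4))` keeps all the "B" records in the first split even though that pushes it past 4 records, and the next split begins with "C".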