DFSORT - splitting a large file by groups

Bruce Malcolm · New User Joined: 13 Dec 2012 Posts: 3 Location: UK

Hi,

My requirement is to split some large files (50 million records, lrecl 250) into smaller files of approximately 5 million records each. No records are to be discarded.
I've looked at using one of the SPLIT options in DFSORT, but these seem to work purely on relative record number.
My data consists of groups of related records (a group could contain a few records or many hundreds) and the data is in group order. My smaller output files need to keep the groups intact: I can't split the data for a given group across two files.
Is there a DFSORT option that could be used for this?
(I know a simple COBOL program could be written for this purpose, but I want to explore the DFSORT route first.)

thanks, Bruce.

Bill Woodger · Posted: Wed Dec 19, 2012 5:02 pm

Yes, it can be done, with code. There is an example here, if I can find it.

Bill Woodger · Posted: Wed Dec 19, 2012 5:24 pm

Have a look at this one. It is a somewhat lengthy thread (which shows the benefits of good answers). If this is not a reasonable fit for your requirement, let us know.

Bruce Malcolm

Thanks for that reply, Bill.

I think this solution would require prior knowledge of the key values, so that they could be added to the DFSORT code?

I don't have that. I just want to split the file once I've reached 5,000,000 records and have also reached the end of all the data for the current key (group). That might mean an additional 100 or so records in a file, but that isn't an issue. Keeping the data together for one group is what matters.

thanks, Bruce.
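
For comparison, the requirement as stated (start a new file only once the target count has been reached and the current group has ended) can be sketched outside DFSORT. This is a minimal Python sketch, not DFSORT code; the function name `split_keeping_groups`, the default 8-byte key, and the tiny target used in the example are illustrative assumptions.

```python
from itertools import groupby


def split_keeping_groups(records, target, key_len=8):
    """Yield (file_index, record) pairs for records already in key order.

    A new output file is started only at a group boundary, and only once
    the current file already holds at least `target` records.
    """
    file_index = 0
    count = 0  # records assigned to the current file so far
    for _, group in groupby(records, key=lambda rec: rec[:key_len]):
        if count >= target:
            file_index += 1
            count = 0
        for rec in group:
            count += 1
            yield file_index, rec


# Tiny example: three groups, target of 4 records per file.
sample = ["00000006"] * 3 + ["00000008"] * 2 + ["00000011"] * 4
assignments = [idx for idx, _ in split_keeping_groups(sample, target=4)]
```

With this sample, the second group is allowed to run past the target so that it stays whole, and the third group starts the next file.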

Bill Woodger · Posted: Wed Dec 19, 2012 8:08 pm

The "key values" are whatever you are using to define the group.

If you don't have prior knowledge of what defines a group, then it is going to be tricky doing anything other than a simple split, so I'm confused.

Can you post some sample data demonstrating the "grouping" required?

Bruce Malcolm

The key is the first 8 bytes of the record (the record is a fixed length of 224 bytes).
The key is numeric, so could range from 00000001 to 99999999.
The file will already be sorted in ascending key order, but the keys might not be straightforward increments of 1.
For example, the first 112 records might have a key of 00000006, the next 44 records might have a key of 00000008, the next 97 records might have a key of 00000011, and so on.

thanks, Bruce.

Bill Woodger · Posted: Wed Dec 19, 2012 9:40 pm

So, have a look at this.

It allocates a sequence number to each record, incremented by two, so that every 5 million records the number passes a multiple of 10 million and the leading digits change.

It uses 1,8 to define the GROUP and "PUSHes" the first four digits of the sequence number from the first record of the group (the group-definer) to all records of the GROUP. Those first four digits represent how many 5-millions of records (10-millions in sequence-number terms) precede the group.

You'll need at least five OUTFILs, and you need to decide whether the "overflow" is to go in the fifth file or whether you want a separate, sixth one for such circumstances.

"Tested" with very small numbers of records (shift the four digits being checked to the right for different volumes).
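
The logic described in this post can be modelled in Python to see how the pushed digits select the output file. This is a sketch of the idea only, not the DFSORT control statements; the function name `push_file_number`, the 11-digit sequence width, and a sequence starting at 1 are assumptions made for illustration.

```python
def push_file_number(records, drop_digits=7, key_len=8, seq_width=11):
    """Yield (file_number, record) pairs for records already in key order.

    Each record gets a zero-padded sequence number incremented by 2.
    The leading digits (everything except the last `drop_digits` digits)
    are taken from the first record of each group and "pushed" to every
    record in that group, so a group never straddles two files.
    drop_digits=7 corresponds to files of about 5,000,000 records;
    smaller values model the "shift to the right" used for small tests.
    """
    pushed = None
    prev_key = None
    for n, rec in enumerate(records):
        seq = f"{1 + 2 * n:0{seq_width}d}"      # 1, 3, 5, ... zero-padded
        leading = seq[:seq_width - drop_digits]
        key = rec[:key_len]
        if key != prev_key:                      # first record of a new group
            pushed = leading
            prev_key = key
        yield int(pushed), rec


# Small-volume check: drop_digits=1 means roughly 5 records per file.
sample = (["00000006"] * 3 + ["00000008"] * 4 +
          ["00000011"] * 2 + ["00000015"] * 3)
file_numbers = [num for num, _ in push_file_number(sample, drop_digits=1)]
```

In this model each output file corresponds to one value of the pushed digits, which is the value an OUTFIL would test. The second group begins before the fifth record, so all of it lands in the first file even though that file ends up with seven records.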

Skolusu · Posted: Wed Dec 19, 2012 11:20 pm