My requirement is to split some large files (50 million records, lrecl 250) into smaller files of approximately 5 million records. No records are to be discarded.
I've looked at using one of the SPLIT options in DFSORT, but these seem to split purely on relative record number.
My data consists of groups of records that are related (the grouping could consist of a few records or many hundreds) and the data is in group order. My smaller output files need to keep the groups intact - I can't split the data for a given group across two files.
Is there a DFSORT option that could be used for this?
(I know a simple COBOL program could be written for this purpose, but I want to explore the DFSORT route first.)
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Have a look at this one. It is a somewhat lengthy thread (which shows the benefits of good answers). If this is not a reasonable fit for your requirement, let us know.
I think this solution would require prior knowledge of the key values, so that they could be added to the DFSORT code?
I don't have that. I just want to split the file once I've reached 5,000,000 records and have then reached the end of all the data for the current key (group). That might mean an additional 100 or so records, but that isn't an issue - keeping the data together for one group is what matters.
The key is the first 8 bytes of the record (the record is a fixed length of 224 bytes).
The key is numeric, so could range from 00000001 to 99999999.
The file will already be sorted in ascending key order, but the keys might not be straightforward increments of 1.
For example, the first 112 records might have a key of 00000006, the next 44 records a key of 00000008, the next 97 records a key of 00000011, and so on.
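To pin down the rule being asked for, here is a minimal sketch in Python (not DFSORT, and not anything posted in this thread). It assumes the records arrive as byte strings already in key order, with the key in the first 8 bytes as described above. The names `split_keeping_groups` and `target` are made up for illustration; `target` would be 5,000,000 for the real files.

```python
from itertools import groupby
from typing import Iterable, Iterator, List

KEY_LEN = 8  # the key is the first 8 bytes of each record


def split_keeping_groups(records: Iterable[bytes], target: int) -> Iterator[List[bytes]]:
    """Yield chunks of records, cutting only where the key changes.

    A chunk is closed at the first group boundary reached once it
    already holds at least `target` records, so no group is ever
    split across two chunks. The last chunk may be shorter.
    """
    chunk: List[bytes] = []
    for _key, group in groupby(records, key=lambda rec: rec[:KEY_LEN]):
        # A new group is starting: close the chunk if it is already full.
        if len(chunk) >= target:
            yield chunk
            chunk = []
        chunk.extend(group)
    if chunk:
        yield chunk
```

Each chunk corresponds to one output file. A chunk can overshoot `target` by at most the size of the group that straddles the boundary, which matches the "additional 100 or so records" mentioned above.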
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
So, have a look at this.
It allocates a sequence number to each record, incremented by two (to turn 5m into 10m).
It uses 1,8 to define the GROUP and "PUSHes" the first four digits of the sequence number of the group-definer (the first record of the GROUP) to all records of the GROUP. Those four digits represent how many 5-millions (10-millions after the doubling) have gone by.
You'll need at least five OUTFILs, and you must decide whether the "overflow" is to go in the fifth file or whether you want a separate, sixth one for such circumstances.
"Tested" with very small numbers of records (shift the four digits being checked to the right for different volumes).
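The idea described above can be modelled outside DFSORT. The following is only a rough Python sketch of the logic, not the control statements themselves. It assumes the sequence number starts at 1 and goes up by 2, and it uses integer division by twice the file size to stand in for "the leading digits of the zero-padded sequence number". The names `route_by_pushed_bucket` and `per_file` are hypothetical.

```python
from typing import Dict, Iterable, List

KEY_LEN = 8  # positions 1,8 define the group


def route_by_pushed_bucket(records: Iterable[bytes], per_file: int) -> Dict[int, List[bytes]]:
    """Route every record of a group to the bucket of the group's first record."""
    outputs: Dict[int, List[bytes]] = {}
    prev_key = None
    bucket = 0
    seq = 1  # assumed start value; incremented by 2 per record
    for rec in records:
        key = rec[:KEY_LEN]
        if key != prev_key:
            # First record of a group: work out its bucket once. The value
            # is then reused ("pushed") for the rest of the group.
            bucket = seq // (2 * per_file)
            prev_key = key
        outputs.setdefault(bucket, []).append(rec)
        seq += 2
    return outputs
```

Each bucket number stands for one OUTFIL. Because the bucket is taken from the first record of the group only, a group that starts just before a 5-million boundary stays whole in the earlier file, even though its later records are past the boundary.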
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
Your input is 50 million records and you want smaller files of about 5 million records each, so that makes 10 files. Any records beyond that are written to an additional file.
Use the following DFSORT JCL which will give you the desired results.