I don't need a particular key to go into a specific output dataset; I only want all records with a matching key to be present in the same dataset (not to span multiple datasets).
As this is a one-time execution, I want to avoid writing a COBOL program for it. The number of records in each output dataset may differ, up to a maximum of 5. Consider the following example:
You are NOT consistent with your rules. If the maximum number of records in an output file is 5, then per the sample, key 1111 has 4 records and key 2222 has 3. So if a file can hold 5 records, shouldn't the first record of key 2222 also go into the same file as 1111?
What happens if you have 8 records for key value 1111? 8 > 5, so should the key be split into a new file? That would contradict your earlier rule that a key shouldn't be split across files. So make up your mind about how you want to split the file.
The maximum number of records that an output file can hold is 5. Now, I want to split this big file in such a way that records with a matching key do not span multiple files. If I use a utility like ICEMAN, it will break the above-mentioned file in the following manner:
There is no strict rule that I have to send 5 records to a file; the record count of each file can differ (but a file can hold a maximum of 5 records). Please let me know if any other information is required.
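If I've stated the rule correctly, the intended packing can be sketched like this (a minimal Python illustration of the logic only, not the actual utility; `split_by_key` and the 5-record limit are stand-ins for the real job):

```python
from itertools import groupby

def split_by_key(records, max_per_file=5):
    """Greedily pack whole key-groups into output files of at most
    max_per_file records; a key's records never span two files.
    Assumes the input is already in key order, as after a SORT."""
    files, current = [], []
    for key, grp in groupby(records):
        grp = list(grp)
        if len(grp) > max_per_file:
            # A single key bigger than the limit cannot satisfy both rules.
            raise ValueError(f"key {key} alone exceeds the file limit")
        if current and len(current) + len(grp) > max_per_file:
            files.append(current)   # close the current file
            current = []
        current.extend(grp)
    if current:
        files.append(current)
    return files
```

With the sample data (four 1111 records, three 2222 records) this puts all of 1111 in the first file and all of 2222 in the second, since 4 + 3 would exceed 5.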
You show a single record of one key along with four records of another key in an output file. Is that vital, or can they be in two files?
What if you have more keys than fit in 99 output files?
What if you have more than five of one key?
If you GROUP on the key, with an ID, you could then use 99 OUTFIL statements, each with an INCLUDE specifying one ID value serially.
If you want to be "minimal", you'll need at least a sequence number as part of the grouping, and since you could have up to five keys going to the same file, the code would grow in complexity.
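In outline, the GROUP-with-ID idea amounts to mapping each distinct key to an output-file ID before the copy, packing several keys into one ID where they fit. A rough Python sketch of that mapping (the function name and structure are made up for illustration, not DFSORT syntax):

```python
from collections import Counter

def key_to_file_id(record_keys, max_per_file=5):
    """Assign each distinct key (in input order) an output-file ID,
    packing whole keys into a file until the next key would overflow it."""
    counts = Counter(record_keys)          # records per key
    mapping, file_id, used = {}, 1, 0
    for key in dict.fromkeys(record_keys): # distinct keys, input order
        n = counts[key]
        if used and used + n > max_per_file:
            file_id += 1                   # start the next output file
            used = 0
        mapping[key] = file_id
        used += n
    return mapping
```

Each record would then carry its file ID, and one INCLUDE per ID value selects it into the right output file.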
I need to perform this kind of operation on a file holding about 400-900 million records, and I need to split it into 99 smaller files holding around 10 million records each. I need to make sure that the last record I write in every smaller file is the last record for that key (for example, 1111/2222). Yes, a smaller file can contain different keys.
OK.... a surprising turn of events from the sample data...
So, you have an input file. You want to write the data to an output file whose record count cannot exceed 10,000,000. You must not split keys across files. You do this as many times as necessary until your input is exhausted.
Have you given us the real key-length and LRECL?
How many records can you have for the same key?
It would be much easier if the output could be 10-million-and-a-bit, the "bit" being the remaining records with the same key as the 10-millionth record.
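That "and-a-bit" rule is simple to state: keep writing past the 10-millionth record until the key changes, so the last record of every chunk is the last record of its key. A hedged Python sketch of the idea (the function name and `key_of` callback are hypothetical, and the streaming shape is just one way to do it):

```python
def split_ten_million_and_a_bit(records, key_of, limit=10_000_000):
    """Stream key-ordered records into chunks: close a chunk only once
    it holds at least `limit` records AND the next record starts a new
    key, so no key ever spans two chunks."""
    chunk, prev_key = [], None
    for rec in records:
        k = key_of(rec)
        if len(chunk) >= limit and k != prev_key:
            yield chunk          # chunk ends exactly on a key boundary
            chunk = []
        chunk.append(rec)
        prev_key = k
    if chunk:
        yield chunk              # final, possibly short, chunk
```

With a toy limit of 5 and groups of 4, 3, and 2 records, the first chunk is "5-and-a-bit" (7 records) because the second key straddles the limit.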
What are you going to do with the output?
If, on one run, you have "only" 400,000,000, do you want them split evenly across 99 files, or do you want them still in lumps of 10-million-and-a-bit?