The key in the above file is the first four bytes of the record. If the above file is split into 2 files (maximum record count per output file = 4), the output should be:
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I think you were asked a little more than that.
How do you determine, from your data, which of up to 99 output datasets gets a particular key written to it.
Your input sample shows unsorted, your output sorted, yes? Or is this the "magic" "dynamic" sort of thing?
Are all datasets to be written to every time the job is run?
Please explain your requirement as fully and clearly as you can. Read through it a few times before posting. Show samples of input and expected output, plus RECFM/LRECL for input and outputs.
I don't want a particular key to go into a specific output dataset; I only want records with matching keys to be in the same dataset (not to span multiple datasets).
All the datasets will be written to. The output should be sorted, which can be achieved by sorting on the key fields.
As this is a one-time execution, I want to avoid writing a COBOL program for it. The number of records in each output dataset may differ, up to a maximum of 5. Consider the following example:
Since the input file holds 11 records, 99 output files will be created, with only 3 files holding data and the rest empty. The output should be as mentioned below:
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
VivekKhanna wrote:
I don't want a particular key to go into a specific output dataset; I only want records with matching keys to be in the same dataset (not to span multiple datasets).
As this is a one-time execution, I want to avoid writing a COBOL program for it. The number of records in each output dataset may differ, up to a maximum of 5. Consider the following example:
You are NOT consistent with your rules. Your maximum for an output file is 5 records, and per the sample, key 1111 has 4 records and 2222 has 3. So, going by the 5-record limit, shouldn't key 2222 also be in the same file as 1111?
What happens if you have 8 records for key value 1111? 8 > 5, so should it be split into a new file? That would contradict your earlier rule that a key shouldn't be split across files. So make up your mind about how you want to split the file.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
If you are going to get anything out of this, you're going to have to be accurate and fully detailed in your description of the requirement.
Go through everything you have said.
Give that to a colleague and ask them to sketch out on paper what can be understood from it.
Go through everything you have been asked.
Provide answers for all those and run it by the colleague again.
If the colleague is still unclear, provide them with clarification.
Once complete, post all the answers here.
You're asking for a fair amount of work to be done, and no-one wants to do it three times because you haven't managed to describe your requirement clearly to others.
The maximum number of records that an output file can hold is 5. Now, I want to split this big file in such a way that records with matching keys do not span multiple files. If I use a utility like ICEMAN, it will break the above-mentioned file in the following manner:
We can see that key '2222' has been split across two different files, File 1 and File 2. Similarly, key '3333' has been split across File 2 and File 3.
The requirement says that a key should not span different files. Thus the output should be:
There is no strict rule that I have to send 5 records to a file; the record count of each file can differ (but a file can hold a maximum of 5 records). Please let me know in case any other information is required.
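Under stated assumptions, that requirement can be sketched in Python. The sample data here is made up (the thread's actual records aren't shown), and the greedy packing is only one plausible reading of the rules; it is an illustration of the intent, not DFSORT code:

```python
from itertools import groupby

def split_by_key(records, max_per_file=5, keylen=4):
    # Greedy packing: keep every key's records together and start a
    # new file whenever adding the next key group would push the count
    # past max_per_file. (What to do with a single group larger than
    # max_per_file is still an open question in the thread; here such
    # a group simply gets a file of its own.)
    files, current = [], []
    for _, grp in groupby(records, key=lambda r: r[:keylen]):
        grp = list(grp)
        if current and len(current) + len(grp) > max_per_file:
            files.append(current)
            current = []
        current.extend(grp)
    if current:
        files.append(current)
    return files

# Hypothetical 11-record input: 4 x key 1111, 3 x 2222, 4 x 3333
recs = (["1111A", "1111B", "1111C", "1111D"]
        + ["2222A", "2222B", "2222C"]
        + ["3333A", "3333B", "3333C", "3333D"])
print(split_by_key(recs))  # three files, no key spanning two of them
```

With this input the greedy rule happens to give one key group per file, matching the "only 3 files holding data" outcome described above.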
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
So your file isn't really that big?
You show a single record of one key along with four of another key in an output file. Is that vital, or can they be in two files?
What if you have more keys than fit in 99 output files?
What if you have more than five of one key?
If you GROUP on the key, with an ID, you could then use 99 OUTFIL INCLUDE statements specifying the ID values serially.
If you want to be "minimal", you'll need at least a sequence number as part of the grouping, and since you could have up to five keys going to the same file, the code would grow in complexity.
I need to perform this kind of operation on a file holding about 400-900 million records, and I need to split it into 99 smaller files holding around 10 million records each. I need to make sure that the last record I write to every smaller file is the last record for its key (for example 1111/2222). Yes, a smaller file can contain different keys.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
OK.... a surprising turn of events from the sample data...
So, you have an input file. You want to write the data to output files, each with a record count that cannot exceed 10,000,000. You must not split keys across files. You do this as many times as necessary until your input is exhausted.
Have you given us the real key-length and LRECL?
How many records can you have for the same key?
It would be much easier if the output could be 10-million-and-a-bit, the "bit" being the remaining records with the same key as the 10-millionth.
What are you going to do with the output?
If, on one run, you have "only" 400,000,000, do you want them split evenly across 99 files, or do you want them still in lumps of 10-million-and-a-bit?
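The "10-million-and-a-bit" cut can be sketched as follows. This is a Python illustration of the idea, not DFSORT syntax, and the small target of 5 stands in for 10 million:

```python
def cut_point(records, target, keylen=4):
    # First index at or after `target` where the key changes: the file
    # receives `target` records plus the "bit" that finishes off the
    # key of the target-th record, so no key spans two files.
    i = max(target, 1)  # need at least one record before comparing
    while i < len(records) and records[i][:keylen] == records[i - 1][:keylen]:
        i += 1
    return i

recs = ["1111"] * 3 + ["2222"] * 4 + ["3333"] * 2
print(cut_point(recs, 5))  # -> 7: the cut waits until key 2222 ends
```

Applied repeatedly (cut, write, continue from the cut point), this yields files of target-plus-X records, with X bounded by the maximum number of records per key.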
There can be a maximum of 50 records for same key.
10 million + some X records is acceptable (X can vary between 1 and 50).
Output will be FTPed.
On one run, there can be more than 400,000,000. The target system cannot hold more than 10 million + X records in a single file, so splitting is mandatory.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
An idea, demonstrated with 80-byte fixed-length records.
A sequence number (10 digits) for every record.
A GROUP on the key, which pushes the first four digits of the sequence number (I've fudged this in the example so it works with groups of 10, not 10,000,000).
OUTFIL to distribute the data based on the pushed digits from the record which started the GROUP.
I put a "SAVE" in for any overflow beyond the 99 files (two files in my example).
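A rough Python model of that idea may help. The names, the made-up data, and the group size of 10 are my own fudges mirroring the example; the real job would use DFSORT control statements (SEQNUM, WHEN=GROUP with PUSH, and OUTFIL INCLUDE/SAVE), not this code:

```python
def route_records(records, keylen=4, group_size=10, max_outputs=99):
    # Every record gets a running sequence number; at the start of each
    # key group the file id (sequence // group_size) is "pushed" onto
    # the whole group, so a group never spans two output files -- files
    # may instead run a little over group_size ("10-million-and-a-bit").
    outputs = {}   # file id -> list of records (the 99 OUTFIL datasets)
    save = []      # overflow past max_outputs, like OUTFIL ... SAVE
    group_file = prev_key = None
    for seq, rec in enumerate(records):
        key = rec[:keylen]
        if key != prev_key:                  # first record of a group
            group_file = seq // group_size   # pushed file id
            prev_key = key
        if group_file < max_outputs:
            outputs.setdefault(group_file, []).append(rec)
        else:
            save.append(rec)
    return outputs, save

# Hypothetical sorted input: 6 x 1111, 6 x 2222, 6 x 3333
recs = ["1111"] * 6 + ["2222"] * 6 + ["3333"] * 6
outs, save = route_records(recs, group_size=10, max_outputs=2)
print({f: len(r) for f, r in outs.items()}, len(save))  # {0: 12, 1: 6} 0
```

Note how key 2222 starts at sequence 6, so its whole group lands in file 0, which ends up holding 12 records rather than 10; that is exactly the "and-a-bit" behaviour discussed above.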