Hi,
Is there a way to split a file based on a field in the input file, that is, to switch to the next output whenever the field value changes (a field break)? I have looked at SPLIT, SPLITR, SPLITBY, STARTREC/ENDREC and INCLUDE/OMIT, but none of these help.
Here are my requirements:
1. The split is to be done on the field value in positions 1-3 of the input.
2. Number of output files: 4.
3. The outputs should be written in round-robin fashion.
4. The number of input records cannot be predicted.
5. Records sharing the same field value must not be split across outputs; for example, the 'BBB' records cannot appear in more than one output file.
6. Records with the value 'EEE' in positions 1-3 should wrap around and be written to output file 1 again.
Code:
----+----1--
************
AAA VALUE 1
AAA VALUE 2
AAA VALUE 3
AAA VALUE 4
AAA VALUE 5
BBB VALUE 6
BBB VALUE 7
BBB VALUE 8
BBB VALUE 9
BBB VALUE 10
BBB VALUE 11
CCC VALUE 12
CCC VALUE 13
CCC VALUE 14
CCC VALUE 15
CCC VALUE 16
CCC VALUE 17
DDD VALUE 18
DDD VALUE 19
DDD VALUE 20
EEE VALUE 21
EEE VALUE 22
But I can't use INCLUDE/OMIT, as I would not know the values of the field in advance. I just want the records split at each break in that field.
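To make the requirement concrete, here is a minimal sketch in Python (not sort control statements) of the split being asked for: each run of consecutive records with the same value in positions 1-3 goes to the next output in round-robin order, so the values never need to be known in advance. The function name `split_by_key_break` and its parameters are hypothetical, and it assumes the input is already grouped on the key, as in the sample above.

```python
from itertools import groupby


def split_by_key_break(records, n_outputs=4, key_start=0, key_end=3):
    """Send each run of same-key records to the next output, round robin.

    Assumes records with the same key are adjacent in the input.
    """
    outputs = [[] for _ in range(n_outputs)]
    # groupby yields one group per run of consecutive records with equal key
    runs = groupby(records, key=lambda rec: rec[key_start:key_end])
    for run_index, (_key, run) in enumerate(runs):
        # The 5th run wraps back to output 1, the 6th to output 2, and so on
        outputs[run_index % n_outputs].extend(run)
    return outputs


sample = [
    "AAA VALUE 1", "AAA VALUE 2",
    "BBB VALUE 6",
    "CCC VALUE 12",
    "DDD VALUE 18",
    "EEE VALUE 21", "EEE VALUE 22",
]
outputs = split_by_key_break(sample)
for number, recs in enumerate(outputs, start=1):
    print(f"OUT{number}: {recs}")
```

With this shortened sample, the 'EEE' records land in output 1 together with the 'AAA' records, which is what requirement 6 describes.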
In such a scenario, how are you deciding that:
Quote:
6. The records with the value 'EEE' in 1-3 position should again be written to the output file 1
Which would mean you are aware of the values, right? Another way to achieve this is to use SECTIONS; you can read about it here. That should do the trick.
Also, I need something more on top of this. Is it possible to set a threshold for each output file based on the total volume of the input, and to skip an output file during the round robin once it has crossed the threshold?
For example, if the total number of records is 1600 and there are 4 output files, the threshold would be 1600/4 = 400.
There are 5 unique values in pos 1-3, with these record counts:
1. 1000
2. 200
3. 100
4. 100
5. 200
So during the round robin, the 5th value (count 200) should not be written to output file 1. Output 1 (or any output file) may still cross the threshold, since I must not break up a group of records with the same value in pos 1-3.
Hope I am clear.
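The rule can be sketched in Python as follows (an illustration of the logic only, not sort control statements). Each group goes to the next output in round-robin order whose record count is still below the threshold. The name `assign_groups` is hypothetical, and so is the fallback: if every output has already reached the threshold, the sketch keeps the plain round-robin choice.

```python
def assign_groups(group_counts, n_outputs=4):
    """Assign whole groups to outputs, skipping outputs at or over threshold."""
    threshold = sum(group_counts) / n_outputs
    loads = [0] * n_outputs   # records written to each output so far
    assignment = []           # 1-based output number chosen for each group
    cursor = 0                # output that plain round robin would use next
    for count in group_counts:
        chosen = cursor       # assumed fallback when every output is full
        for step in range(n_outputs):
            candidate = (cursor + step) % n_outputs
            if loads[candidate] < threshold:
                chosen = candidate
                break
        loads[chosen] += count  # the whole group goes to a single output
        assignment.append(chosen + 1)
        cursor = (chosen + 1) % n_outputs
    return assignment, loads


assignment, loads = assign_groups([1000, 200, 100, 100, 200])
print(assignment)  # output number per group
print(loads)       # record count per output
```

For the counts listed above, output 1 already holds 1000 records when the 5th group arrives, so that group moves on to output 2, which then holds 400 records.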
Yes, I agree I do not need 2 steps. In my previous post I had labelled the steps that apply the sequence number and the ID as 1 and 2, and hence labelled these 3 and 4.
I want the 5th group to be written to any output file other than 1 (as that already has 1000 records in it). Here, I want it written to output file 2, as that has only 200 records and has not crossed the threshold yet.
I could combine the steps rather than re-reading the input file. I am fine up to the point of writing records in round-robin fashion.
And,
I would calculate the threshold (the optimal number of records per output file for an even distribution of volume) from the number of input records and the number of output files.
No. of input recs = 1600; output files = 4;
Threshold = 1600/4 = 400.
I can have a count file created prior to this step. Or I can get it from the sysout details of the step that creates this file.
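As a rough illustration of what such a count file would hold, here is a Python sketch of a counting pass over key-grouped input: one (key, count) pair per group, plus the threshold derived from the total. The names `count_groups` and `n_outputs` are hypothetical, and the record text is a stand-in that only reproduces the group sizes of the sample in my first post.

```python
from itertools import groupby


def count_groups(records, key_start=0, key_end=3):
    """Return (key, record_count) for each run of consecutive equal keys."""
    runs = groupby(records, key=lambda rec: rec[key_start:key_end])
    return [(key, sum(1 for _ in run)) for key, run in runs]


# Stand-in records with the same group sizes as the sample input
records = (
    ["AAA VALUE"] * 5 + ["BBB VALUE"] * 6 + ["CCC VALUE"] * 6
    + ["DDD VALUE"] * 3 + ["EEE VALUE"] * 2
)
n_outputs = 4
counts = count_groups(records)
total = sum(count for _key, count in counts)
threshold = total / n_outputs  # target number of records per output file
print(counts, total, threshold)
```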
My bad, I should not have said