I have an amount field in a file and I want to split the records into multiple records 250000 each into one file until the total number of records reaches 10. Even after 10 splits if the amount is greater than zero then write the record into a separate file with the remaining amount.
Joined: 10 May 2007 Posts: 2455 Location: Hampshire, UK
Is this a one-off for this particular record? If so you would be better off writing a simple program or even doing it manually. If not, will the value to be output on each split always be 250,000? If not, how is that calculated? Suppose you cannot get 10 records from splitting e.g. if your input record was:
you would only get 5 splits and nothing in your second dataset (not file).
Yes the split limit is always 250,000. number of splits is based on the total amount. We will have to write to second file only if the amount is greater than 0 after 10 splits. If the amount exhausts after 5 splits, no need to write to second file. If the remaining amount after 10 splits is greater than 25000, we will only write one record for the total amount in the second file. I mean no splits required in the second file.
EDIT : Apparently my tests were not exhaustive enough. Thanks Bill for notifying me, I have edited my cards above. This assumes original amount to be split >= the individual split amount (250,000 in this case)
For example I have give that record. My file contains many of such records with key and amount fields. And amount is not a fixed one. If the amount is greater than 250000 then we should go for split. Else we will write one record and proceed to next. Also, if the amount split goes beyond 10 records then we will write 10 split records in the first file and another record with rest of the amount is written to second file.
Each record is repeated 10 times. And an offset to be subtracted from the original amount is created in incremental amounts (column-4 above). I have used the same column to store the subtraction results. So before subtraction, column 4 will look like this
So in the example above, we need all records to output-1 as long as the subtraction result is non-negative. Plus if your subtraction result is less than the incremental amount (250000), we are writing the leftover as an additional record. Because it cannot be split further.
We need to write output-2 only when the last record computation result is non-negative. ie, when we have a valid left over amount even after the 10th iteration.
Joined: 09 Mar 2011 Posts: 7311 Location: Inside the Matrix
My main concern with this topic is the data. Asked for a representative sample, data was only shown which is exactly divisible by 250000 (despite the example in the first post).
If that second set of sample data is representative, then the task is easy, one step, no ICETOOL.
You note that all the values on the first output file are 250000, and on the first OUTFIL output 1-10 records (using the slash-operator) depending on the amount (if greater than 2.5 million, output 10, 2.25 million, output nine, etc).
On the second OUTFIL, INCLUDE= for GT 2.5 million, and subtract 2.5 million to give the residual value.
If, as I suspect, the numbers are not nice and "round", it doesn't take much to make the final one a calculation on the first OUTFIL (subtract the value "below" the one you are testing for).
With all the references to the same fields and constants, I'd definitely show the solution with symbols.
But then, do I want to prepare and test all that, only to have TS/OP explain further that the description of the data is not quite right?
Lots of IFTHENs, lots of code to create, only two calculations at maximum per record.
For a different requirement, I'd go for RESIZE over two passes of the data, but Arun is correct, for any given solution it is only known to perform better or worse than another with the actual data. Generally I'd expect the RESIZE to work better, but no guarantees, as it does depend on what goes along with it, and can, not so much in this example, depend on the data.
As to "over engineering" there's a certain amount of that in both proposed solutions :-)
All suggested solutions would suffer from an increase to 50 splits, but all the code for all the solutions could be "generated" should such a thing arise.
My specific advice is to wait for/obtain clarity from TS/OP before getting to code...
Here I am overlaying the input record starting from 105 position and my amount field in the input file resides in 28th position in PD format with a length of 7. And, I want the subtracted value in ZD format.
After execution I am not getting the subtracted value correctly. Please suggest if I am doing something wrong.