How to split the records using the amount field

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi,

I have an amount field in a file, and I want to split each record into multiple records of 250000 each, written to one file, until the number of split records reaches 10. If the amount is still greater than zero after 10 splits, the record should be written to a separate file with the remaining amount.

For example, the input file has the data below.

Nic Clouston · Posted: Fri Oct 28, 2016 8:09 pm

Is this a one-off for this particular record? If so you would be better off writing a simple program or even doing it manually. If not, will the value to be output on each split always be 250,000? If not, how is that calculated? Suppose you cannot get 10 records from splitting e.g. if your input record was:

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Nic,

Yes, the split limit is always 250,000. The number of splits is based on the total amount. We only have to write to the second file if the amount is greater than 0 after 10 splits. If the amount is exhausted after 5 splits, there is no need to write to the second file. If the remaining amount after 10 splits is greater than 250000, we still write only one record for the total remaining amount in the second file. I mean no splits are required in the second file.

Thanks,
Ramana.
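As a rough illustration of the rule described above (this is not the DFSORT solution posted in the thread), the split for one input amount can be sketched in Python. The function name `split_amount` is made up for this example, and two points are assumptions: amounts are positive whole numbers, and a final piece smaller than 250000 is written as the last record of the first file.

```python
from typing import List, Optional, Tuple

LIMIT = 250_000   # split size stated in the thread
MAX_SPLITS = 10   # maximum number of split records in the first file


def split_amount(amount: int) -> Tuple[List[int], Optional[int]]:
    """Return (amounts for the first file, amount for the second file or None).

    Assumption: a last piece smaller than LIMIT still goes to the first file.
    """
    first_file: List[int] = []
    remaining = amount
    # Peel off pieces of at most LIMIT until the amount runs out
    # or MAX_SPLITS records have been produced.
    while remaining > 0 and len(first_file) < MAX_SPLITS:
        piece = min(remaining, LIMIT)
        first_file.append(piece)
        remaining -= piece
    # Whatever is left after MAX_SPLITS pieces goes to the second file
    # as a single record, with no further splitting.
    overflow = remaining if remaining > 0 else None
    return first_file, overflow


if __name__ == "__main__":
    print(split_amount(1_250_000))  # exhausted after 5 splits, no second file
    print(split_amount(3_000_000))  # 10 splits, remainder to the second file
```

An amount of 1,250,000 is exhausted after five records, so nothing goes to the second file. An amount of 3,000,000 yields ten records of 250,000 and one second-file record of 500,000.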

Arun Raj · Posted: Fri Oct 28, 2016 10:15 pm

Here is one way of achieving this.

Bill Woodger · Posted: Fri Oct 28, 2016 11:10 pm

If you don't have a one-record file, please post representative sample input and output.

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Bill,

I gave that record only as an example. My file contains many such records with key and amount fields, and the amount is not fixed. If the amount is greater than 250000, we should split it; otherwise we write one record and proceed to the next. Also, if the split goes beyond 10 records, we write the 10 split records to the first file and another record with the rest of the amount to the second file.

Input file

Arun Raj · Posted: Mon Oct 31, 2016 11:22 am

vnktrrd,

My previous post assumed your input had only one record. Now that you have multiple records like that, the solution above does not hold.

Arun Raj · Posted: Mon Oct 31, 2016 12:34 pm

This might be of some interest.

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Arun,

Looks like this is working. I am still testing it by tweaking the inputs.

Can you please explain how it is achieved? That might help me map it to my data exactly.

Thanks,
Ramana.

Bill Woodger · Posted: Mon Oct 31, 2016 3:09 pm

Run the first step to SYSOUT rather than a named dataset, then you can see what is happening and work out how it is useful.

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Arun,

I have run it and it's working fine. Thanks a lot.

The only thing I am confused about is the snippet below. Can you please explain how this loop works?

Arun Raj · Posted: Mon Oct 31, 2016 8:01 pm

vnktrrd,

As Bill has pointed out above, you can look at the SORTOUT in the SYSOUT. If you take a look at that, you will see something like below.

magesh23586 · Posted: Tue Nov 01, 2016 2:57 am

Arun,

We can do this in one step and save resources.

Arun Raj · Posted: Tue Nov 01, 2016 6:50 am

magesh23586,

This is a good alternative. But again, the number of steps alone may not always decide whether there are any savings.

magesh23586 · Posted: Tue Nov 01, 2016 10:46 am

Arun Raj · Posted: Tue Nov 01, 2016 4:52 pm

Bill Woodger · Posted: Tue Nov 01, 2016 5:32 pm

My main concern with this topic is the data. When asked for a representative sample, the TS/OP only showed data which is exactly divisible by 250000 (despite the example in the first post).

If that second set of sample data is representative, then the task is easy, one step, no ICETOOL.

Note that all the values on the first output file are 250000, so on the first OUTFIL you output 1-10 records (using the slash operator) depending on the amount (if greater than 2.5 million, output 10; 2.25 million, output nine; etc.).

On the second OUTFIL, INCLUDE= for GT 2.5 million, and subtract 2.5 million to give the residual value.

If, as I suspect, the numbers are not nice and "round", it doesn't take much to make the final one a calculation on the first OUTFIL (subtract the value "below" the one you are testing for).
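The threshold idea described above can be sketched outside DFSORT. The Python below is only an illustration of the logic, not tested sort control cards: each test against a multiple of 250000 stands in for one IFTHEN-style condition, the final record is the amount minus the threshold "below" the one that matched, and anything over 2.5 million leaves a residual for the second output. The name `split_by_thresholds` is hypothetical, and positive whole-number amounts are assumed.

```python
LIMIT = 250_000   # split size from the thread
MAX_SPLITS = 10   # at most ten records on the first output


def split_by_thresholds(amount):
    """Return (records for the first output, residual for the second or None)."""
    for k in range(1, MAX_SPLITS + 1):
        # One test per threshold, like one IFTHEN per possible record count.
        if amount <= k * LIMIT:
            below = (k - 1) * LIMIT  # the threshold "below" the one tested
            # k - 1 full records, then one calculated final record.
            return [LIMIT] * (k - 1) + [amount - below], None
    # Greater than 2.5 million: ten full records on the first output,
    # and the second output gets the amount minus 2.5 million.
    return [LIMIT] * MAX_SPLITS, amount - MAX_SPLITS * LIMIT


if __name__ == "__main__":
    print(split_by_thresholds(2_250_000))  # nine records, no residual
    print(split_by_thresholds(2_600_000))  # ten records, residual 100000
```

Only the last record on the first output and the single record on the second output need a calculation, which matches the remark in this post about two calculations at most per record.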

With all the references to the same fields and constants, I'd definitely show the solution with symbols.

But then, do I want to prepare and test all that, only to have TS/OP explain further that the description of the data is not quite right?

Lots of IFTHENs, lots of code to create, only two calculations at maximum per record.

For a different requirement, I'd go for RESIZE over two passes of the data, but Arun is correct, for any given solution it is only known to perform better or worse than another with the actual data. Generally I'd expect the RESIZE to work better, but no guarantees, as it does depend on what goes along with it, and can, not so much in this example, depend on the data.

As to "over engineering" there's a certain amount of that in both proposed solutions :-)

All suggested solutions would suffer from an increase to 50 splits, but all the code for all the solutions could be "generated" should such a thing arise.

My specific advice is to wait for/obtain clarity from TS/OP before getting to code...

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Arun,

A lot of input for this post and a lot of learning for me.

I need a little more help with your code above, which I am using for this purpose.

I have mapped the code to my requirement like below.

Bill Woodger · Posted: Wed Nov 02, 2016 11:55 am

When something goes wrong, it is really, really, helpful to show what goes wrong. "It doesn't work" is useless.

Anyway, your sequence number starts at 112, not 111.

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi Bill,

The constant 249000 starts at 105 and ends at 110. So the seqnum starts from 111. Please correct me if I am wrong.

Output is like this :

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

I have also tried putting the subtracted value in a separate subsequent position. It's still the same. The subtraction is not working correctly here. Please advise what I am doing wrong.

Bill Woodger · Posted: Wed Nov 02, 2016 2:00 pm

Yes, on a re-count you are correct. I have a prescription for new glasses, but I threw it away thinking it was an old receipt from the supermarket...

Your "M" simply indicates you have calculated a negative number. Find the value of M in EBCDIC, the "D" there is the sign, and the number-looking thing is the number.
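To make the point about the trailing "M" concrete, here is a small Python sketch that decodes a zoned-decimal field the way the hint above suggests. It relies on Python's standard `cp037` codec as a stand-in for the EBCDIC code page, which is an assumption about the actual data. The helper name `decode_zoned` and the sample value `00012M` are made up for illustration and are not taken from the thread's output.

```python
def decode_zoned(field: str) -> int:
    """Decode a zoned-decimal value shown as text, e.g. '00012M'."""
    raw = field.encode("cp037")  # EBCDIC bytes (code page 037 assumed)
    # The low nibble of every byte is a decimal digit.
    digits = "".join(str(b & 0x0F) for b in raw)
    # The high nibble of the last byte carries the sign:
    # X'D' (and X'B') mean negative, the other sign codes mean positive.
    sign_nibble = raw[-1] >> 4
    value = int(digits)
    return -value if sign_nibble in (0x0B, 0x0D) else value


if __name__ == "__main__":
    print(hex("M".encode("cp037")[0]))  # 0xd4: sign D, digit 4
    print(decode_zoned("00012M"))       # -124
```

So a field that displays as `00012M` holds the value -124: the letter is the last digit with a negative sign overpunched onto it.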

vnktrrd · New User Joined: 12 Jan 2010 Posts: 34 Location: New York

Hi,

I still don't understand why I am getting a negative number. I am subtracting a smaller number from a greater number, but I am still getting negative values.

Could someone help me?

Thanks,
Ramana.

Arun Raj · Posted: Wed Nov 02, 2016 7:16 pm

vnktrrd,

Apart from the negative sign, the number itself does not seem to be what you expected. Maybe the actual value presented to the computation was different from what you have shown.

enrico-sorichetti · Posted: Wed Nov 02, 2016 7:24 pm