Split file based on a particular field in the input file

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Hi,
Is there a way to split a file based on a field in the input file (that is, start writing to a new output whenever the field value changes)? I have looked at SPLIT, SPLIT1R, SPLITBY, STARTREC/ENDREC and INCLUDE/OMIT. None of these help.

Here are my requirements:
1. The split is based on the field value in positions 1-3 of the input.
2. Number of output files: 4.
3. The outputs should be written in round-robin fashion.
4. The number of input records cannot be predicted.
5. Records with the same field value must stay together; for example, 'BBB' records cannot appear in more than one output file.
6. Records with the value 'EEE' in positions 1-3 (the fifth group) should again be written to output file 1.

Abid Hasan · New User Joined: 25 Mar 2013 Posts: 88 Location: India

Hello,

I am not sure what you meant by 'round robin fashion', though a simple OUTFIL INCLUDE can achieve what you are looking for; sample SYSIN:

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Hi,
Thanks for your response.

But I can't use INCLUDE/OMIT, as I will not know the values of the field in advance. I just want the records split wherever the field value breaks.

Abid Hasan · New User Joined: 25 Mar 2013 Posts: 88 Location: India

Hello,

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Hi,
That is what I meant by round-robin: after 4 breaks, the 5th group should be written to output file 1 again.

Please let me know if you need more details.

Thanks

Bill Woodger · Posted: Tue Jun 24, 2014 5:17 pm

Presuming, since you've posted in the JCL forum, that you have SyncSort:

Extend your record temporarily to include a sequence number with RESTART=(1,3).

Use IFTHEN=(WHEN=GROUP,BEGIN= with the condition that your extended value is "one", and PUSH an ID.

ID is a "sequence number" for the group. Take each ID and "normalise" it with arithmetic functions (0-3 or 1-4).

Use INCLUDE=/OMIT= on the OUTFILs to select on the normalised value, and BUILD to cut the records back down to their original length.
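The logic of these steps can be sketched outside the sort product. The following Python is only an illustration of the idea, not SyncSort or DFSORT syntax; the function names `tag_records` and `split_by_group`, the sample records, and the assumption that records with the same key are already adjacent in the input are all hypothetical.

```python
def tag_records(records, key_len=3):
    """Return (record, seq, group_id) tuples.

    seq restarts at 1 whenever the key in the first key_len bytes changes
    (the role played by the restarting sequence number); group_id goes up
    by one each time seq is 1 (the role played by the pushed ID).
    """
    tagged = []
    prev_key = None
    seq = 0
    group_id = 0
    for rec in records:
        key = rec[:key_len]
        seq = seq + 1 if key == prev_key else 1
        if seq == 1:
            group_id += 1
        tagged.append((rec, seq, group_id))
        prev_key = key
    return tagged


def split_by_group(records, n_outputs=4):
    """Write each whole group to one output, cycling through the outputs."""
    outputs = [[] for _ in range(n_outputs)]
    for rec, _seq, group_id in tag_records(records):
        # Normalise the group ID to 0..n_outputs-1 and keep only the
        # original record (the temporary fields are dropped again).
        outputs[(group_id - 1) % n_outputs].append(rec)
    return outputs


# Hypothetical sample data: five key values, so the fifth wraps to output 1.
sample = ["AAA1", "AAA2", "BBB1", "CCC1", "CCC2", "DDD1", "EEE1", "EEE2"]
outputs = split_by_group(sample)
```

With this sample, the 'EEE' records land in the first output alongside the 'AAA' records, and no key value is spread over two outputs.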

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Yes. I have already done exactly that, up to applying an ID to each group. But I am not able to proceed further.
1.

Bill Woodger · Posted: Tue Jun 24, 2014 6:41 pm

If you've already tried something, you need to say so up front, saves a to-and-fro.

You need to turn a sequence number into a "cycle" 0-1-2-3 or 1-2-3-4 (or any cycle you choose, in fact).

1-2-3-4-5-6-7-8...

1-2-3-4-1-2-3-4...

If you divide the sequence by four, and look at the remainder, you'll see the cycle. Look in your documentation for what MOD gives you.
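As a small worked example of the remainder idea, here is the arithmetic in Python (again only an illustration, not sort control statements; the helper name `cycle_1_to_n` is made up for this sketch). The plain remainder gives a 0-3 cycle; subtracting one before and adding one after gives 1-4.

```python
def cycle_1_to_n(seq, n=4):
    """Map 1, 2, 3, ... onto the repeating cycle 1..n."""
    return (seq - 1) % n + 1


ids = list(range(1, 9))                      # 1-2-3-4-5-6-7-8
zero_based = [i % 4 for i in ids]            # plain remainder after dividing by four
one_based = [cycle_1_to_n(i) for i in ids]   # shifted so the cycle runs 1..4

print(zero_based)  # [1, 2, 3, 0, 1, 2, 3, 0]
print(one_based)   # [1, 2, 3, 4, 1, 2, 3, 4]
```

Either cycle works for selecting an output file, as long as the tests on the outputs use the matching set of values.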

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Hi,
I got exactly what I wanted. Thanks. Here is what I did.
3.

Bill Woodger · Posted: Thu Jun 26, 2014 12:21 pm

No need to do it in two steps (reading the input twice). Why are you labelling those two steps 3. and 4.?

In your next question, if you don't want the 5 to go to the first output file, where do you want it to go?

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

Yes, I agree I do not need 2 steps. In my previous post I labelled the steps that apply the sequence number and the ID as 1 and 2, and hence labelled these 3 and 4.

I want it to be written to any output file other than 1 (as it already has 1000 records in it). Here, I want it to be written to output file 2, as it has only 200 records and has not crossed the threshold yet.

Bill Woodger · Posted: Thu Jun 26, 2014 12:41 pm

So you'll do all four steps as one later?

Where are you going to get the information on the size of the threshold?

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

I can combine the steps rather than re-reading the input file. I am fine up to writing the records in round-robin fashion.

I would calculate the threshold (the optimal number of records per output file for an equal distribution of volume) from the number of input records and the number of output files:
Number of input records = 1600; number of output files = 4
Threshold = 1600 / 4 = 400
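To make the threshold idea concrete, here is a rough Python sketch of one possible reading of it: groups are still handed out round-robin, but an output file that has already reached the threshold is skipped. The function names, the group sizes, and the fallback to the least-loaded file when every file is over the threshold are assumptions made for illustration; the thread does not settle the exact rule.

```python
def threshold(total_records, n_outputs):
    """Target number of records per output file (integer division)."""
    return total_records // n_outputs


def assign_groups(group_sizes, n_outputs, limit):
    """Return (assignment, counts): the 1-based output file chosen for each
    group, and the final record count per file. Groups are never split."""
    counts = [0] * n_outputs
    assignment = []
    nxt = 0  # round-robin position to try first
    for size in group_sizes:
        candidates = [(nxt + k) % n_outputs for k in range(n_outputs)]
        # First file, in round-robin order, that is still below the limit.
        target = next((c for c in candidates if counts[c] < limit), None)
        if target is None:
            # Assumed fallback: every file is at the limit, take the emptiest.
            target = counts.index(min(counts))
        counts[target] += size
        assignment.append(target + 1)
        nxt = (target + 1) % n_outputs
    return assignment, counts


# Hypothetical group sizes adding up to the 1600 records in the example.
sizes = [1000, 200, 100, 100, 200]
limit = threshold(sum(sizes), 4)
assignment, counts = assign_groups(sizes, 4, limit)
```

With these sizes the fifth group skips output file 1 (already at 1000 records) and goes to output file 2, which matches the behaviour asked for earlier in the thread. Note that this needs the total record count before any record is written.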

Bill Woodger · Posted: Thu Jun 26, 2014 1:52 pm

OK, but what I'm asking is where you are going to get the number of input records from?

Shrinika Rajendran · New User Joined: 06 Jun 2013 Posts: 25 Location: India

I can have a count file created prior to this step. Or I can get it from the sysout details of the step that creates this file.
My bad, I should not have said