View previous topic :: View next topic
|
Author |
Message |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Hi,
Is there a way I can split a file based on a field in the input file (like, write output by the field break)? SPLIT, SPLITR, SPLITBY, STARTREC ENDREC or INCLUDE/OMIT.. None of these help.
Here are my requirements:
1. Splitting to be done based on the field value in 1-3 in the input
2. No of output files --> 4
3. The outputs should be written in round robin fashion
4. Cannot predict the no of input records
5. Cannot split the records of the fields being considered, as in, 'BBB' record cannot come in any other output file.
6. The records with the value 'EEE' in 1-3 position should again be written to the output file 1
Code: |
----+----1--
************
AAA VALUE 1
AAA VALUE 2
AAA VALUE 3
AAA VALUE 4
AAA VALUE 5
BBB VALUE 6
BBB VALUE 7
BBB VALUE 8
BBB VALUE 9
BBB VALUE 10
BBB VALUE 11
CCC VALUE 12
CCC VALUE 13
CCC VALUE 14
CCC VALUE 15
CCC VALUE 16
CCC VALUE 17
DDD VALUE 18
DDD VALUE 19
DDD VALUE 20
EEE VALUE 21
EEE VALUE 22
|
Please let me know for any further details.
Thanks in advance. |
|
Back to top |
|
 |
Abid Hasan
New User
Joined: 25 Mar 2013 Posts: 88 Location: India
|
|
|
|
Hello,
I am not sure what you meant by 'round robin fashion', though a simple OUTFIL INCLUDE can achieve what you are looking for; sample SYSIN:
Code: |
SORT FIELDS=COPY
OUTFIL FNAMES=OUT1,REMOVECC,
INCLUDE=((1,3,CH,EQ,C'AAA'),OR,(1,3,CH,EQ,C'EEE'))
OUTFIL FNAMES=OUT2,REMOVECC,
INCLUDE=(1,3,CH,EQ,C'BBB')
OUTFIL FNAMES=OUT3,REMOVECC,
INCLUDE=(1,3,CH,EQ,C'CCC')
OUTFIL FNAMES=OUT4,REMOVECC,
INCLUDE=(1,3,CH,EQ,C'DDD') |
Gives:
Out1:
Code: |
AAA VALUE 1
AAA VALUE 2
AAA VALUE 3
AAA VALUE 4
AAA VALUE 5
EEE VALUE 21
EEE VALUE 22 |
Out2:
Code: |
BBB VALUE 6
BBB VALUE 7
BBB VALUE 8
BBB VALUE 9
BBB VALUE 10
BBB VALUE 11 |
Out3:
Code: |
CCC VALUE 12
CCC VALUE 13
CCC VALUE 14
CCC VALUE 15
CCC VALUE 16
CCC VALUE 17 |
Out4:
Code: |
DDD VALUE 18
DDD VALUE 19
DDD VALUE 20 |
Hth! |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Hi,
Thanks for your response.
But I cant use include/omit as I would not know the value of the field. Just that I want the split of the records at its break. |
|
Back to top |
|
 |
Abid Hasan
New User
Joined: 25 Mar 2013 Posts: 88 Location: India
|
|
|
|
Hello,
Shrinika Rajendran wrote: |
But I cant use include/omit as I would not know the value of the field. Just that I want the split of the records at its break. |
In such a scenario, how are you deciding that:
Quote: |
6. The records with the value 'EEE' in 1-3 position should again be written to the output file 1 |
Which would mean you are aware of the values, right? Though another way to achieve this can be by using SECTIONS, you can read about it here. This should help you do the trick. |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Hi,
It is what I mentioned round-robin for.
After 4 breaks, the 5th type should be written to the output file 1 again.
Please let me know for more details.
Thanks |
|
Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
Presuming, since you've posted in the JCL forum, that you have SyncSort:
Extend your record temporarily to include a sequence number with RESTART=(1,3).
Use IFTHEN=(WHEN=GROUP,BEGIN= for your extended value being "one", and PUSH an ID.
ID is a "sequence number" for the group. Take each ID and "normalise" it with arithmetic functions (0-3 or 1-4).
INCLUDE=/OMIT= on the OUTFILs, BUILD to cut the records back down to original. |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Yes. I already did exactly until applying IDs for each group. But I am not able to proceed further.
1.
Code: |
SORT FIELDS=COPY
OUTREC OVERLAY=(81:SEQNUM,8,ZD,RESTART=(1,3))
|
2.
Code: |
INREC IFTHEN=(WHEN=GROUP,BEGIN=(81,8,CH,EQ,C'00000001'),
PUSH=(89:ID=8))
|
Can you please explain on
Quote: |
Take each ID and "normalise" it with arithmetic functions (0-3 or 1-4).
|
Thanks |
|
Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
If you've already tried something, you need to say so up front, saves a to-and-fro.
You need to turn a sequence number into a "cycle" 0-1-2-3 or 1-2-3-4 (or any cycle you choose, in fact).
1-2-3-4-5-6-7-8...
1-2-3-4-1-2-3-4...
If you divide the sequence by four, and look at the remainder, you'll see the cycle. Look in your documentation for what MOD gives you. |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Hi,
I got exactly what I wanted. Thanks. Here is what I did.
3.
Code: |
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=(89,8,CH,GT,C'00000004'),
OVERLAY=(89:89,8,ZD,MOD,+4,LENGTH=8)),
IFTHEN=(WHEN=NONE,
BUILD=(1,96))
|
4.
Code: |
SORT FIELDS=COPY
OUTFIL FILES=01,INCLUDE=(89,8,CH,SS,C'1'),BUILD=(1,80)
OUTFIL FILES=02,INCLUDE=(89,8,CH,SS,C'2'),BUILD=(1,80)
OUTFIL FILES=03,INCLUDE=(89,8,CH,SS,C'3'),BUILD=(1,80)
OUTFIL FILES=04,INCLUDE=(89,8,CH,SS,C'4'),BUILD=(1,80)
|
Also, I need something else to this. Is it possible to set a threshold on each output file based on the total volume of the input and prevent writing to an output file during round robin if the threshold is crossed?
As in, say, if the total no records is 1600. And output files being 4, the threshold would be 1600/4 = 400.
The unique value in pos 1-3 is 5. And the count for each
1. 1000
2. 200
3. 100
4. 100
5. 200
So during round robin I would want the 5th value with count 200 should not be written to the output file 1. Output 1 (or any output file) could cross the threshold as I should not break the group of values in pos 1-3.
Hope I am clear.
Thanks in advance.
Thanks. |
|
Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
No need to do it in two steps (reading the input twice). Why are you labelling those two steps 3. and 4.?
In your next question, if you don't want the 5 to go to the first output file, where do you want it to go? |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
Yes. I agree I do not need 2 steps. And, in my previous post I labelled the steps to apply sequence number and the ID as 1 and 2. And hence labelled these 3 and 4.
I want it to be written to any output file other than 1 (as it already has 1000 records in it). Here, I want it to be written to output file 2, as it has only 200 records and has not crossed the threshold yet. |
|
Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
So you'll do all four steps as one later?
Where are you going to get the information on the size of the threshold? |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
I could club the steps rather than re-reading the input file. I am good till writing records in round-robin fashion.
And,
I would calculate the threshold (optimal number of records to each output file for equal distribution of volume) based on the number of input records and the number of output files written.
No of input recs = 1600; output files =4 ;
Threshold = 1600/4 = 400. |
|
Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
OK, but what I'm asking is where you are going to get the number of input records from? |
|
Back to top |
|
 |
Shrinika Rajendran
New User
Joined: 06 Jun 2013 Posts: 25 Location: India
|
|
|
|
I can have a count file created prior to this step. Or I can get it from the sysout details of the step that creates this file.
My bad, I should not have said
Quote: |
4. Cannot predict the no of input records
|
|
|
Back to top |
|
 |
|
|