IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Split file based on a particular field in the input file


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Tue Jun 24, 2014 12:09 pm
Reply with quote

Hi,
Is there a way I can split a file based on a field in the input file (like, write output by the field break)? SPLIT, SPLITR, SPLITBY, STARTREC ENDREC or INCLUDE/OMIT.. None of these help.

Here are my requirements:
1. Splitting to be done based on the field value in 1-3 in the input
2. No of output files --> 4
3. The outputs should be written in round robin fashion
4. Cannot predict the no of input records
5. Cannot split the records of the fields being considered, as in, 'BBB' record cannot come in any other output file.
6. The records with the value 'EEE' in 1-3 position should again be written to the output file 1

Code:

----+----1--
************
AAA VALUE 1
AAA VALUE 2
AAA VALUE 3
AAA VALUE 4
AAA VALUE 5
BBB VALUE 6
BBB VALUE 7
BBB VALUE 8
BBB VALUE 9
BBB VALUE 10
BBB VALUE 11
CCC VALUE 12
CCC VALUE 13
CCC VALUE 14
CCC VALUE 15
CCC VALUE 16
CCC VALUE 17
DDD VALUE 18
DDD VALUE 19
DDD VALUE 20
EEE VALUE 21
EEE VALUE 22


Please let me know for any further details.

Thanks in advance.
Back to top
View user's profile Send private message
Abid Hasan

New User


Joined: 25 Mar 2013
Posts: 88
Location: India

PostPosted: Tue Jun 24, 2014 12:59 pm
Reply with quote

Hello,

I am not sure what you meant by 'round robin fashion', though a simple OUTFIL INCLUDE can achieve what you are looking for; sample SYSIN:

Code:
 SORT FIELDS=COPY                                 
 OUTFIL FNAMES=OUT1,REMOVECC,                     
 INCLUDE=((1,3,CH,EQ,C'AAA'),OR,(1,3,CH,EQ,C'EEE'))
 OUTFIL FNAMES=OUT2,REMOVECC,                     
 INCLUDE=(1,3,CH,EQ,C'BBB')                       
 OUTFIL FNAMES=OUT3,REMOVECC,                     
 INCLUDE=(1,3,CH,EQ,C'CCC')                       
 OUTFIL FNAMES=OUT4,REMOVECC,                     
 INCLUDE=(1,3,CH,EQ,C'DDD')                       


Gives:
Out1:
Code:
AAA VALUE 1
AAA VALUE 2
AAA VALUE 3
AAA VALUE 4
AAA VALUE 5
EEE VALUE 21
EEE VALUE 22


Out2:
Code:
BBB VALUE 6 
BBB VALUE 7 
BBB VALUE 8 
BBB VALUE 9 
BBB VALUE 10
BBB VALUE 11


Out3:
Code:
CCC VALUE 12
CCC VALUE 13
CCC VALUE 14
CCC VALUE 15
CCC VALUE 16
CCC VALUE 17


Out4:
Code:
DDD VALUE 18
DDD VALUE 19
DDD VALUE 20


Hth!
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Tue Jun 24, 2014 2:10 pm
Reply with quote

Hi,
Thanks for your response.

But I cant use include/omit as I would not know the value of the field. Just that I want the split of the records at its break.
Back to top
View user's profile Send private message
Abid Hasan

New User


Joined: 25 Mar 2013
Posts: 88
Location: India

PostPosted: Tue Jun 24, 2014 2:23 pm
Reply with quote

Hello,

Shrinika Rajendran wrote:

But I cant use include/omit as I would not know the value of the field. Just that I want the split of the records at its break.

In such a scenario, how are you deciding that:
Quote:
6. The records with the value 'EEE' in 1-3 position should again be written to the output file 1


Which would mean you are aware of the values, right? Though another way to achieve this can be by using SECTIONS, you can read about it here. This should help you do the trick.
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Tue Jun 24, 2014 4:46 pm
Reply with quote

Hi,
It is what I mentioned round-robin for.
After 4 breaks, the 5th type should be written to the output file 1 again.

Please let me know for more details.

Thanks
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Jun 24, 2014 5:17 pm
Reply with quote

Presuming, since you've posted in the JCL forum, that you have SyncSort:

Extend your record temporarily to include a sequence number with RESTART=(1,3).

Use IFTHEN=(WHEN=GROUP,BEGIN= for your extended value being "one", and PUSH an ID.

ID is a "sequence number" for the group. Take each ID and "normalise" it with arithmetic functions (0-3 or 1-4).

INCLUDE=/OMIT= on the OUTFILs, BUILD to cut the records back down to original.
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Tue Jun 24, 2014 5:56 pm
Reply with quote

Yes. I already did exactly until applying IDs for each group. But I am not able to proceed further.
1.
Code:
SORT FIELDS=COPY                             
OUTREC OVERLAY=(81:SEQNUM,8,ZD,RESTART=(1,3))


2.
Code:
INREC IFTHEN=(WHEN=GROUP,BEGIN=(81,8,CH,EQ,C'00000001'), 
             PUSH=(89:ID=8))                             


Can you please explain on
Quote:

Take each ID and "normalise" it with arithmetic functions (0-3 or 1-4).


Thanks
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Jun 24, 2014 6:41 pm
Reply with quote

If you've already tried something, you need to say so up front, saves a to-and-fro.

You need to turn a sequence number into a "cycle" 0-1-2-3 or 1-2-3-4 (or any cycle you choose, in fact).

1-2-3-4-5-6-7-8...

1-2-3-4-1-2-3-4...

If you divide the sequence by four, and look at the remainder, you'll see the cycle. Look in your documentation for what MOD gives you.
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Thu Jun 26, 2014 11:45 am
Reply with quote

Hi,
I got exactly what I wanted. Thanks. Here is what I did.
3.
Code:

SORT FIELDS=COPY                                   
INREC IFTHEN=(WHEN=(89,8,CH,GT,C'00000004'),       
             OVERLAY=(89:89,8,ZD,MOD,+4,LENGTH=8)),
      IFTHEN=(WHEN=NONE,                           
             BUILD=(1,96))                         


4.
Code:

 SORT FIELDS=COPY                                           
OUTFIL FILES=01,INCLUDE=(89,8,CH,SS,C'1'),BUILD=(1,80)
OUTFIL FILES=02,INCLUDE=(89,8,CH,SS,C'2'),BUILD=(1,80)
OUTFIL FILES=03,INCLUDE=(89,8,CH,SS,C'3'),BUILD=(1,80)
OUTFIL FILES=04,INCLUDE=(89,8,CH,SS,C'4'),BUILD=(1,80)


Also, I need something else to this. Is it possible to set a threshold on each output file based on the total volume of the input and prevent writing to an output file during round robin if the threshold is crossed?

As in, say, if the total no records is 1600. And output files being 4, the threshold would be 1600/4 = 400.
The unique value in pos 1-3 is 5. And the count for each
1. 1000
2. 200
3. 100
4. 100
5. 200
So during round robin I would want the 5th value with count 200 should not be written to the output file 1. Output 1 (or any output file) could cross the threshold as I should not break the group of values in pos 1-3.
Hope I am clear.

Thanks in advance.
Thanks.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Jun 26, 2014 12:21 pm
Reply with quote

No need to do it in two steps (reading the input twice). Why are you labelling those two steps 3. and 4.?

In your next question, if you don't want the 5 to go to the first output file, where do you want it to go?
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Thu Jun 26, 2014 12:30 pm
Reply with quote

Yes. I agree I do not need 2 steps. And, in my previous post I labelled the steps to apply sequence number and the ID as 1 and 2. And hence labelled these 3 and 4.

I want it to be written to any output file other than 1 (as it already has 1000 records in it). Here, I want it to be written to output file 2, as it has only 200 records and has not crossed the threshold yet.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Jun 26, 2014 12:41 pm
Reply with quote

So you'll do all four steps as one later?

Where are you going to get the information on the size of the threshold?
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Thu Jun 26, 2014 1:22 pm
Reply with quote

I could club the steps rather than re-reading the input file. I am good till writing records in round-robin fashion.

And,
I would calculate the threshold (optimal number of records to each output file for equal distribution of volume) based on the number of input records and the number of output files written.
No of input recs = 1600; output files =4 ;
Threshold = 1600/4 = 400.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Jun 26, 2014 1:52 pm
Reply with quote

OK, but what I'm asking is where you are going to get the number of input records from?
Back to top
View user's profile Send private message
Shrinika Rajendran

New User


Joined: 06 Jun 2013
Posts: 25
Location: India

PostPosted: Thu Jun 26, 2014 2:32 pm
Reply with quote

I can have a count file created prior to this step. Or I can get it from the sysout details of the step that creates this file.
My bad, I should not have said
Quote:

4. Cannot predict the no of input records
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM

 


Similar Topics
Topic Forum Replies
No new posts Unable to interpret a hex value to De... COBOL Programming 4
No new posts COBOL sorting, with input GDG base COBOL Programming 7
No new posts Sorting Date Field DFSORT/ICETOOL 4
No new posts Sort based on the record type DFSORT/ICETOOL 1
No new posts Concatenate 2 input datasets and give... JCL & VSAM 2
Search our Forums:

Back to Top