IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Use SORT to Split file and omit some records


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Mon Apr 21, 2014 11:23 am
Reply with quote

I would like to split my input file into several output use the following rules:
1. Split every group of lines that start with "B" and follow several "C", "D", "E" lines into a new file
2. If col "23-25" is not equal to "203" in line C, then omit the whole group "B", "C", "D", "E", "C", "D", "E" lines.

My input like this
Code:

----+----1----+----2----+----3----+----4----+----5----+----6----+-----7--
B826000628392                       STYRON BELGIUM B.V.B.A.         
C000000000000110.40EUR20320140410                           102PQ 38
DBE50735134047318                   GEBOERS,JAAK C/O STYRON BELGIUM
E11               U349166 /38/01/28006219/CASH-EXP RPT TER001435613
C000000000000156.35EUR20320140410                           102PQ 38
DBE92800224016023                   VANDEVENNE, JOCHEN (J) C/O STYRO
E11               U574514 /38/01/28006220/CASH-EXP RPT TER001433953
B826000628392                       STYRON BELGIUM B.V.B.A.           
C000000000059186.37EUR22720140410                           102PQ 38DBV
DBE07968128760266                   ENI GAS   POWER NV/SA             
E11               01620121/38/01/28006260/1000224503
B178098010                          STYRON EUROPE GMBH                 E
C000000000001805.00EUR22720140410                           102PQ 38DBVP
DNL49RBOS0527292796                 HOYER GLOBAL TRANSPORT           
E11               00653095/38/07/28044621/56480283 56481769             
C000000000075739.55EUR22720140410                           102PQ 38DBVP
DIT59O0356601600000116309025        VERSALIS SPA                       
E11               00950822/38/07/28044624/2201402735                   
C000000000000160.66EUR22720140410                           102PQ 38DBVP
DDE39700700100222349300             WACKER CHEMIE AG                   
E11               00084490/38/07/28044618/143551451                   
B826000628392                       STYRON BELGIUM B.V.B.A.           
C000000000000340.00EUR20320140410                           102PQ 38DBV
DNL56RABO0178997900                 HOUWER KOERIERS BV                 
E11               00102860/38/01/28006224/140033                       
C000000000000043.10EUR20320140410                           102PQ 38DBV
DNL73KRED0633142980                 PRAXAIR NV                         
E11               00284039/38/01/28006228/26982                       
C000000000001289.86EUR20320140410                           102PQ 38DBV
DDE15630400530929410900             ZWICK GMBH   CO   
E11               00096314/38/01/28006221/1103348                       


I want the output like this
file 1
Code:

----+----1----+----2----+----3----+----4----+----5----+----6----+-----7--
B826000628392                       STYRON BELGIUM B.V.B.A.         
C000000000000110.40EUR20320140410                           102PQ 38
DBE50735134047318                   GEBOERS,JAAK C/O STYRON BELGIUM
E11               U349166 /38/01/28006219/CASH-EXP RPT TER001435613
C000000000000156.35EUR20320140410                           102PQ 38
DBE92800224016023                   VANDEVENNE, JOCHEN (J) C/O STYRO
E11               U574514 /38/01/28006220/CASH-EXP RPT TER001433953 

file 2
Code:

B826000628392                       STYRON BELGIUM B.V.B.A.           
C000000000000340.00EUR20320140410                           102PQ 38DBV
DNL56RABO0178997900                 HOUWER KOERIERS BV                 
E11               00102860/38/01/28006224/140033                       
C000000000000043.10EUR20320140410                           102PQ 38DBV
DNL73KRED0633142980                 PRAXAIR NV                         
E11               00284039/38/01/28006228/26982                       
C000000000001289.86EUR20320140410                           102PQ 38DBV
DDE15630400530929410900             ZWICK GMBH   CO   
E11               00096314/38/01/28006221/1103348                     
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Apr 21, 2014 1:10 pm
Reply with quote

You are going to need to be clearer.

"B" defines the start of a group, no other types of group?

Why isn't your entire input included in output file 1?

Does a "C" record immediately follow a "B" record?

What do you want to do when there are two (or more) "C" records withing a "B" group?

Probably more, but difficult to tell when so little is know.
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Mon Apr 21, 2014 1:55 pm
Reply with quote

Bill Woodger wrote:
You are going to need to be clearer.

"B" defines the start of a group, no other types of group?

Why isn't your entire input included in output file 1?

Does a "C" record immediately follow a "B" record?

What do you want to do when there are two (or more) "C" records withing a "B" group?

Probably more, but difficult to tell when so little is know.


Here's my requirements, the file is FB, length 512, and records will start like "B", "C", "D", "E". It's ok to do not split the file but reformat it to copy the B line before every C line.

Here's my group definition, start with B line and end with E line, like BCDECDE...CDE or BCDE.

If within a group, col 23-25 in C line is not '203', then omit the whole BCDE or BCDECDE..CDE group.

If within a group, col 23-25 in C line is '203', then reformat it from BCDECDECDE to BCDEBCDEBCDE to copy B line before every CDE line.

For example, input
Code:

B1234567
Cxxxxxxxxxxxxxxxxxxxxx203
Dxxxxxxxxxxx
Exxxxxxxxxxx
Cyyyyyyyyyyyyyyyyyyyyy203
Dyyyyyyyyyyy
Eyyyyyyyyyyy
B2234567
Czzzzzzzzzzzzzzzzzzzzzzzz203
Dzzzzzzzzzzzz
Ezzzzzzzzzzzz
B3234567
Cxxxxxxxxxxxxxxxxxxxxx227
Dxxxxxxxxxxx
Exxxxxxxxxxx


The output need to be like this, B line need to be copy into the second CDE line, and the last group of "BCDE" with 227 in col 23-25 at C line need to be omitted.
Code:

B1234567
Cxxxxxxxxxxxxxxxxxxxxx203
Dxxxxxxxxxxx
Exxxxxxxxxxx
B1234567
Cyyyyyyyyyyyyyyyyyyyyy203
Dyyyyyyyyyyy
Eyyyyyyyyyyy
B2234567
Czzzzzzzzzzzzzzzzzzzzzzzz203
Dzzzzzzzzzzzz
Ezzzzzzzzzzzz
[/quote]
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Apr 21, 2014 2:16 pm
Reply with quote

OK, that makes things clearer, and easier.

Use OPTION COPY, INREC IFTHEN=(WHEN=GROUP to identify the B-record and PUSH the entire record to a temporary extension of your data, PUSH=(513:1,512).

The B record is now tucked away safely for when you need it.

In OUTFIL, use OMIT= to get rid of all the original B records. Use IFTHEN=(WHEN=(logical expression to identity the C records, then use the / (Slash Operator) to on BUILD to create two records for output, the B record first, followed by the C. BUILD=(513,512,/,1,512). Use IFOUTLEN=512 to set the record-length (which will achieve the chopping off of the B records from position 513 on all the other records).

I would write something to check, 100%, the structure of your file, and consider how it remains valid, but you've probably already dealt with this...
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Mon Apr 21, 2014 2:51 pm
Reply with quote

Bill Woodger wrote:
OK, that makes things clearer, and easier.

Use OPTION COPY, INREC IFTHEN=(WHEN=GROUP to identify the B-record and PUSH the entire record to a temporary extension of your data, PUSH=(513:1,512).

The B record is now tucked away safely for when you need it.

In OUTFIL, use OMIT= to get rid of all the original B records. Use IFTHEN=(WHEN=(logical expression to identity the C records, then use the / (Slash Operator) to on BUILD to create two records for output, the B record first, followed by the C. BUILD=(513,512,/,1,512). Use IFOUTLEN=512 to set the record-length (which will achieve the chopping off of the B records from position 513 on all the other records).

I would write something to check, 100%, the structure of your file, and consider how it remains valid, but you've probably already dealt with this...


Hi Bill,

Thanks very much for the guide, i tried with the below code, but received an error message with "END OF SORTOUT FIELD BEYOND MAXIMUM RECORD LENGTH"
Code:

OPTION COPY                                             
INREC IFOUTLEN=512,                                     
   IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512))           
OUTFIL OMIT=(1,1,CH,EQ,C'B'),                           
   IFTHEN=(WHEN=(1,1,CH,EQ,C'C',AND,23,3,CH,EQ,C'203'),
     BUILD=(513,512,/,1,512))                           
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Apr 21, 2014 3:13 pm
Reply with quote

It is in the OUTFIL that the records need cutting down (using IFOUTLEN in this case).

Code:
OPTION COPY                                             
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512))           
OUTFIL OMIT=(1,1,CH,EQ,C'B'),
   IFOUTLEN=512,                           
   IFTHEN=(WHEN=(1,1,CH,EQ,C'C',AND,23,3,CH,EQ,C'203'),
     BUILD=(513,512,/,1,512))           



However, I forgot about your 203's in my haste.

Include a SEQ on the WHEN=GROUP, with enough digits to securely cover the number of sub-records that can occur within a group.

Remove the OMIT=

Amend your IFTHEN on OUTFIL to have "and sequence number within group not equal to two" (because the first C will already follow the existing B).
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Mon Apr 21, 2014 3:31 pm
Reply with quote

Bill Woodger wrote:
It is in the OUTFIL that the records need cutting down (using IFOUTLEN in this case).

Code:
OPTION COPY                                             
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512))           
OUTFIL OMIT=(1,1,CH,EQ,C'B'),
   IFOUTLEN=512,                           
   IFTHEN=(WHEN=(1,1,CH,EQ,C'C',AND,23,3,CH,EQ,C'203'),
     BUILD=(513,512,/,1,512))           



However, I forgot about your 203's in my haste.

Include a SEQ on the WHEN=GROUP, with enough digits to securely cover the number of sub-records that can occur within a group.

Remove the OMIT=

Amend your IFTHEN on OUTFIL to have "and sequence number within group not equal to two" (because the first C will already follow the existing B).


I try with the above code, the group BCDE and BCDECDE...CDE can't be omitted if C line isn't '203' in column 23-25.

May I know how to adopt the SEQ function? I checked the details in the utility book, but still have not much idea.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Apr 21, 2014 4:18 pm
Reply with quote

The original solution was removing all the B records, as they were being inserted when a C record was encountered.

Input

Code:
B
C
D
E
C
D
E


After INREC

Code:
B+B
C+B
D+B
E+B
C+B
D+B
E+B

After OMIT= on OUTFIL
Code:
C+B
D+B
E+B
C+B
D+B
E+B

After IFTHEN with BUILD and slash operator
Code:
B
C
D+B
E+B
B
C
D+B
E+B

As the records are written from OUTFIL (the other extended Bs disappear due to the IFOUTLEN=512)
Code:
B
C
D
E
B
C
D
E


This would be for all B groups, which is not what you wanted. You only want extra B records inserted if a C is 203. For first sub-group within a B, the C which is a 203 already has a B, so you need to code for that.

So, the OMIT= must be removed, as the B records for other groups are not going to be inserted before each C. This is nothing to do with omitting groups, it just happens to be the same word.

Because the OMIT= goes, you have this for your IFTHEN in OUTFIL

Code:
B+B
C+B
D+B
E+B
C+B
D+B
E+B


Now we pretend that both of those Cs have 203 on, so we want to generate a B for them.

Except, for the first C, we don't. It already has the B.

So we need to differentiate between the first C in a group, and other Cs in a group.

We need to add something to the above to do that.

Code:
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512))


This is what is adding the entire B record to each record. Now we need something else. If you look at PUSH, you'll see there is not much else we can add. An ID (so a sequence number at group level) or SEQ (a sequence number within the group).

SEQ is the one you need.

Code:
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512,SEQ=3))


SEQ=3 gives you three digits, so maximum of 999 in group. SEQ=4 is maximum of 9999. Etc.

Now we have this arriving in OUTFIL:

Code:
B+B+001
C+B+002
D+B+003
E+B+004
C+B+005
D+B+006
E+B+007


With only one of each of D and E, and exactly one, we could always predict a number for a C. We don't need to do that. We only need to be able to identify the first C, and, from what you have said, we can always guarantee that by testing for the value 002.

Code:
IFTHEN=(WHEN=((1,1,CH,EQ,C'C'),
         AND,
             (23,3,CH,EQ,C'203'),
         AND,
             (1025,3,CH,EQ,C'002')),


Alternatively, you could put a WHEN=GROUP into the OUTFIL to achieve the same thing.
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Tue Apr 22, 2014 8:01 am
Reply with quote

Bill Woodger wrote:
Now we pretend that both of those Cs have 203 on, so we want to generate a B for them.

Except, for the first C, we don't. It already has the B.

So we need to differentiate between the first C in a group, and other Cs in a group.

We need to add something to the above to do that.

Code:
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512))


This is what is adding the entire B record to each record. Now we need something else. If you look at PUSH, you'll see there is not much else we can add. An ID (so a sequence number at group level) or SEQ (a sequence number within the group).

SEQ is the one you need.

Code:
INREC  IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512,SEQ=3))


SEQ=3 gives you three digits, so maximum of 999 in group. SEQ=4 is maximum of 9999. Etc.

Now we have this arriving in OUTFIL:

Code:
B+B+001
C+B+002
D+B+003
E+B+004
C+B+005
D+B+006
E+B+007


With only one of each of D and E, and exactly one, we could always predict a number for a C. We don't need to do that. We only need to be able to identify the first C, and, from what you have said, we can always guarantee that by testing for the value 002.

Code:
IFTHEN=(WHEN=((1,1,CH,EQ,C'C'),
         AND,
             (23,3,CH,EQ,C'203'),
         AND,
             (1025,3,CH,EQ,C'002')),


Alternatively, you could put a WHEN=GROUP into the OUTFIL to achieve the same thing.


Thanks Bill for the detail explantion, let me debrief to see if my understanding is correct.
Input:
Code:

B000011
C203
D
E
C203
D
E
B000031
C227
D
E
C227
D
E
B000021
C203
D
E
C203
D
E


After INREC
Code:

INREC IFTHEN=(WHEN=GROUP,                           
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512,SEQ=3))


It would be look like this
Code:

B000011+B000011+001
C203+B000011+002
D+B000011+003
E+B000011+004
C203+B000011+005
D+B000011+006
E+B000011+007
B000031+B000031+008
C227+B000031+009
D+B000031+010
E+B000031+011
C227+B000031+012
D+B000031+013
E+B000031+014
B000021+B000021+015
C203+B000021+016
D+B000021+017
E+B000021+018
C203+B000021+019
D+B000021+020
E+B000021+021


Then, after IFTHEN with BUILD and slash, here I change the last condition to NE as we do not need to include the B record to the first C.
Code:

   IFTHEN=(WHEN=((1,1,CH,EQ,C'C'),         
            AND,                           
                (23,3,CH,EQ,C'203'),       
            AND,                           
                (1025,3,CH,NE,C'002')),     
       BUILD=(513,512,/,1,512))             


It will be look like this.
Code:

B000011+B000011+001
C203+B000011+002
D+B000011+003
E+B000011+004
B000011+B000011+001
C203+B000011+005
D+B000011+006
E+B000011+007
B000031+B000031+008
C227+B000031+009
D+B000031+010
E+B000031+011
C227+B000031+012
D+B000031+013
E+B000031+014
B000021+B000021+015
C203+B000021+016
D+B000021+017
E+B000021+018
B000021+B000021+015
C203+B000021+019
D+B000021+020
E+B000021+021


With the statement "IFOUTLEN=512", the other extended Bs disapper, the final output will be look like this
Code:

B000011
C203
D
E
B000011
C203
D
E
B000031
C227
D
E
C227
D
E
B000021
C203
D
E
B000021
C203
D
E


We missed one requirement here is that all BCDECDE records need to be deleted if C line is not 203, so my expected output look like this:
Code:

B000011
C203
D
E
B000011
C203
D
E
B000021
C203
D
E
B000021
C203
D
E
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Apr 22, 2014 12:11 pm
Reply with quote

I think you've got that, including the NE.

I missed/mis-interpreted the need to drop some stuff.

So back to dropping all the Bs, and generating all the Bs, since it is only the 203 groups you need.

Add a new WHEN=GROUP with PUSH for the position of the 203. Do this only when the seq added in the first PUSH is "two" and END when 'B'. Ensure that the SEQ is big enough for the maximum number of records in a group, and then add another digit.

Use the 203 (which is now on all relevant records if present).

Note that other than the first B, each B will have the 203-position-value of the previous group (as they will end the group), but this does not matter, as all the Bs are ignored.

Your INCLUDE=/OMIT= should:

Ignore all Bs in position 1.
Keep all other records which have 203 in your new PUSHed field.

In the OUTFIL, you need to generate a B with the Slash Operator for every C.

Although you no longer need to reference the sequence number in the OUTFIL, it is now needed earlier for the new WHEN=GROUP.
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Tue Apr 22, 2014 2:18 pm
Reply with quote

Bill Woodger wrote:
I think you've got that, including the NE.

I missed/mis-interpreted the need to drop some stuff.

So back to dropping all the Bs, and generating all the Bs, since it is only the 203 groups you need.

Add a new WHEN=GROUP with PUSH for the position of the 203. Do this only when the seq added in the first PUSH is "two" and END when 'B'. Ensure that the SEQ is big enough for the maximum number of records in a group, and then add another digit.

Use the 203 (which is now on all relevant records if present).

Note that other than the first B, each B will have the 203-position-value of the previous group (as they will end the group), but this does not matter, as all the Bs are ignored.

Your INCLUDE=/OMIT= should:

Ignore all Bs in position 1.
Keep all other records which have 203 in your new PUSHed field.

In the OUTFIL, you need to generate a B with the Slash Operator for every C.

Although you no longer need to reference the sequence number in the OUTFIL, it is now needed earlier for the new WHEN=GROUP.


Hi Bill,

I didn't get clearly about your suggestion about add the new WHEN=GROUP with PUSH for the position of the 203.

I tried with the below code, seems the group is not determined correctly, only those 'B' lines for '227' were deleted .
Code:

INREC IFTHEN=(WHEN=GROUP,                                 
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512,SEQ=3)),     
      IFTHEN=(WHEN=GROUP,                                 
     BEGIN=((1,1,CH,EQ,C'C'),                             
        AND,                                             
           (23,3,CH,EQ,C'203'),                           
        AND,                                             
           (1025,3,CH,EQ,C'002')),
     END=(1,1,CH,EQ,C'B'),PUSH=(1028:1,1027,SEQ=4))
   
OUTFIL IFOUTLEN=512,OMIT=(1,1,CH,EQ,C'B'),               
   IFTHEN=(WHEN=(1,1,CH,EQ,C'C',AND,23,3,CH,EQ,C'203'),
       BUILD=(513,512,/,1,512))                           


The output looks like this
Code:

B
C203
D
E
B
C203
D
3
C227
D
E
C227
D
3


the last 6 CDE lines for 227 are still there.

I believe I must miss some points from your comment.
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Tue Apr 22, 2014 3:03 pm
Reply with quote

I try with this and it works, add one more condition in the OMIT statements to delete all 227 lines.

Code:

INREC IFTHEN=(WHEN=GROUP,                                   
     BEGIN=(1,1,CH,EQ,C'B'),PUSH=(513:1,512,SEQ=3)),         
      IFTHEN=(WHEN=GROUP,                                   
     BEGIN=((1,1,CH,EQ,C'C'),                               
        AND,                                                 
           (1025,3,CH,EQ,C'002')),                           
     END=(1,1,CH,EQ,C'B'),PUSH=(1028:23,3))                 
OUTFIL IFOUTLEN=512,OMIT=(1,1,CH,EQ,C'B',OR,                 
                       1028,3,CH,EQ,C'227'),                 
   IFTHEN=(WHEN=(1,1,CH,EQ,C'C',AND,23,3,CH,EQ,C'203'),     
       BUILD=(513,512,/,1,512))                             


Wondering if any function can help to count the number of groups, put the count number in the trailer
like
Code:

B
C203
D
E
B
C203
D
E
B
C203
D
E
B
C203
D
E
4
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Apr 22, 2014 3:46 pm
Reply with quote

OK. Good going. I'd have used NE 203, but if there are only 227 and 203 the results would be the same, if a little less clear (as that knowledge of the data is required to understand the code).

If you have a new question, posit it as new question please.
Back to top
View user's profile Send private message
abby.qiong.zhang

New User


Joined: 07 Jun 2012
Posts: 26
Location: China

PostPosted: Wed Apr 23, 2014 8:30 am
Reply with quote

Sure Bill.

I posted a new topic regarding the count matter, looking forward to your suggestion on this.

ibmmainframes.com/viewtopic.php?p=322658#322658
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts FTP VB File from Mainframe retaining ... JCL & VSAM 1
No new posts Need to set RC4 through JCL SORT DFSORT/ICETOOL 5
No new posts Extract the file name from another fi... DFSORT/ICETOOL 6
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts INCLUDE OMIT COND for Multiple values... DFSORT/ICETOOL 5
Search our Forums:

Back to Top