Split the Files but the Grouped records should not split


IBM Mainframe Forums -> DFSORT/ICETOOL
Nehal Soni

New User


Joined: 06 Jun 2008
Posts: 8
Location: Mumbai

Posted: Fri Feb 27, 2009 7:13 pm

Hi,

I have a file in the following format, with a header and a trailer record for every group:

group1starts
rec1
rec2
group1end
group2 start
rec1
rec2
rec3
group2 ends
group3 starts and so on

When I FTP the file, the job abends if the transfer is larger than 75 MB, so I need to split the file into several files whenever it exceeds 75 MB.

When splitting, I have to make sure that no group is divided across two files: all the records of a given group must end up in the same file.

I have tried the 'GROUP BY' clause but did not get a solution.

Please help. Is this possible with ICETOOL, or do I have to write a COBOL program for it?
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Fri Feb 27, 2009 10:58 pm

Quote:
I need to split the file in numbers of files if it exceeds 75mbytes


What is the maximum number of output files you want to allow for the split? (For example, if the maximum size of the input file could be 700MB, you'd want to allow for a maximum number of output files of 10 or 11.)

What is the RECFM and LRECL of the input file?

How do you identify the first record of a group (be specific - like it contains 'starts' in positions 11-16)?

How do you identify the last record of a group (be specific - like it contains 'ends' in positions 11-14)?
Nehal Soni

New User


Joined: 06 Jun 2008
Posts: 8
Location: Mumbai

Posted: Mon Mar 02, 2009 9:03 am

1) You are right, we can split the input file into 'n' output files of 75 MB each.
2) The input file is RECFM=FB, LRECL=145.
3) The first record of a group is identified by '<JOB ID=' in positions 1-8, and the last record of the group by '</JOB>' in positions 1-6. The first record of the next group then starts with '<JOB ID=' again.

Thanks for your interest. Please let me know if any other input is required.
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Mon Mar 02, 2009 9:29 pm

Quote:
you are right , we can split the input file for 'n' number of outfiles each of 75 MB


Again, what is the maximum number for 'n'?
Nehal Soni

New User


Joined: 06 Jun 2008
Posts: 8
Location: Mumbai

Posted: Tue Mar 03, 2009 9:20 am

Well, so far the files we have received were small enough that we split them into at most 3 files, but to be safe we can set the maximum number of output files to 4.
For example, the largest record count we have received so far is 1646622, with LRECL=145.
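
As a quick size check on those numbers, here is a minimal Python sketch (not part of the original post). It assumes that "75 MB" means 75 * 1,048,576 bytes; the FTP limit may be defined differently at your site. It only gives a lower bound on the number of files, because keeping groups together can push records into an extra file.

```python
import math

LRECL = 145                      # record length given in the thread
MAX_RECORDS = 1_646_622          # largest record count reported so far
LIMIT_BYTES = 75 * 1024 * 1024   # assumption: 75 MB = 75 * 1,048,576 bytes

total_bytes = MAX_RECORDS * LRECL
files_needed = math.ceil(total_bytes / LIMIT_BYTES)   # lower bound on output files
records_per_file = LIMIT_BYTES // LRECL               # whole records that fit in one file

print(total_bytes)       # 238760190
print(files_needed)      # 4
print(records_per_file)  # 542366
```

Under that assumption the largest file seen so far needs four pieces of at most 75 MB, which matches allowing four outputs.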
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Wed Mar 04, 2009 12:37 am

See if the following DFSORT/ICETOOL job will give you what you need:

Code:

//S1    EXEC  PGM=ICETOOL
//*** SPLIT INTO FOUR GROUPS OF APPROX. 500000 RECORDS, KEEPING
//*** GROUPS TOGETHER.
//TOOLMSG   DD  SYSOUT=*
//DFSMSG    DD  SYSOUT=*
//IN DD DSN=...  input file (FB/145)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SPL1 DD DSN=&&S1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SPL2 DD DSN=&&S2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SPL3 DD DSN=&&S3,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SPL4 DD DSN=&&S4,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//TOOLIN   DD    *
COPY FROM(IN) USING(CTL0)
COPY FROM(T2) TO(SPL1) USING(CTL1)
COPY FROM(T2) TO(SPL2) USING(CTL2)
COPY FROM(T2) TO(SPL3) USING(CTL3)
COPY FROM(T2) TO(SPL4) USING(CTL4)
/*
//CTL0CNTL DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,8,CH,EQ,C'<JOB ID='),
      PUSH=(154:ID=8))
  OUTFIL FNAMES=T1,OVERLAY=(146:SEQNUM,8,ZD)
  OUTFIL FNAMES=T2,NODETAIL,REMOVECC,
    BUILD=(1,80),
    SECTIONS=(154,8,
      TRAILER3=(SUBCOUNT=(M11,LENGTH=8)))
//CTL1CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,+500000)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL1,+',1,8,80:X)
//CTL2CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,+1000000)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL2,+',1,8,80:X)
//CTL3CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,+1500000)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL3,+',1,8,80:X)
//CTL4CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,+2000000)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL4,+',1,8,80:X)
/*
//S2  EXEC  PGM=ICEMAN
//SYSOUT    DD  SYSOUT=*
//SYMNAMES DD DSN=&&S1,DISP=(OLD,PASS)
//         DD DSN=&&S2,DISP=(OLD,PASS)
//         DD DSN=&&S3,DISP=(OLD,PASS)
//         DD DSN=&&S4,DISP=(OLD,PASS)
//SYMNOUT DD SYSOUT=*
//SORTIN DD DSN=&&T1,DISP=(OLD,PASS)
//OUT1 DD DSN=...  output file1 (FB/145)
//OUT2 DD DSN=...  output file2 (FB/145)
//OUT3 DD DSN=...  output file3 (FB/145)
//OUT4 DD DSN=...  output file4 (FB/145)
//SYSIN    DD    *
  OPTION COPY
  OUTFIL FNAMES=OUT1,
    INCLUDE=(146,8,ZD,LE,SPL1),
    BUILD=(1,145)
  OUTFIL FNAMES=OUT2,
    INCLUDE=(146,8,ZD,GT,SPL1,AND,146,8,ZD,LE,SPL2),
    BUILD=(1,145)
  OUTFIL FNAMES=OUT3,
    INCLUDE=(146,8,ZD,GT,SPL2,AND,146,8,ZD,LE,SPL3),
    BUILD=(1,145)
  OUTFIL FNAMES=OUT4,SAVE,
    BUILD=(1,145)
/*
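
As a rough model of how the job above appears to pick its split points, here is a small Python sketch (Python rather than DFSORT; the names `cut_points`, `group_sizes`, and `thresholds` are hypothetical). T2 carries one running record count per group, and each CTLn step keeps the first count that reaches its threshold, so every cut falls on the last record of a group.

```python
from itertools import accumulate

def cut_points(group_sizes, thresholds):
    """For each threshold, return the first running record count that reaches it.

    group_sizes: number of records in each group, in input order.
    A threshold that is never reached yields None.
    """
    running = list(accumulate(group_sizes))  # running count at the end of each group
    return [next((count for count in running if count >= t), None)
            for t in thresholds]

# Five small groups, with thresholds scaled down from 500000, 1000000, ...
print(cut_points([4, 5, 3, 6, 2], [5, 10, 15, 20]))  # [9, 12, 18, 20]

# A short input never reaches the later thresholds.
print(cut_points([4, 5], [5, 10, 15, 20]))           # [9, None, None, None]
```

A `None` here stands for a threshold that no group reaches. In the job, that would presumably leave the corresponding SPLn data set without a symbol.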
Nehal Soni

New User


Joined: 06 Jun 2008
Posts: 8
Location: Mumbai

Posted: Wed Mar 04, 2009 5:23 pm

Hi,

Thanks a lot for spending your valuable time on my requirement. The output is almost correct,

but it fails to give the correct result in the following cases.

Note: the number of records in the input file varies every day.

1) When the input file has fewer than 500000 records, the job abends on the following comparison, because SPL1 and SPL2 are empty (SPL3 and SPL4 are empty as well):
INCLUDE=(146,8,ZD,GT,SPL1,AND,146,8,ZD,LE,SPL2)
2) Or, let's say the input file has 1000000 records. Then SPL3 will be empty, so the comparison INCLUDE=(146,8,ZD,GT,SPL2,AND,146,8,ZD,LE,SPL3) abends saying SPL3 is empty.

Q-2)
Can I have the 4 main header records and the 2 main trailer records in every output file it creates?


For example, in the input file:
main header1
main header2
main header3
main header4
group1starts
rec1
rec2
group1end
group2 start
rec1
rec2
rec3
group2 ends
group3 starts and so on
main trailer1
main trailer2

Q-3) Is there any way to decide dynamically how many output files are required, I mean to allocate the output files dynamically? I think not, and the current logic is still fine.

Thanks once again for your help.
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Wed Mar 04, 2009 11:30 pm

I took a shot based on what I thought you asked for. I don't really have time to design a job that will meet all of your requirements. I suggest you write a program.
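
For readers who take the "write a program" route, the following is a minimal sketch of what such a program could look like, in Python rather than COBOL. It is not from the original posters. The function name `split_groups` and its parameters are hypothetical, it assumes the whole file fits in memory as a list of records, and it assumes every group starts with '<JOB ID=' as described earlier in the thread. It copies the main headers and trailers into every output and creates as many outputs as the data needs.

```python
def split_groups(records, limit_bytes, lrecl=145, n_head=4, n_trail=2,
                 begin="<JOB ID="):
    """Split records into files of at most limit_bytes without breaking a group.

    Every file gets the n_head header and n_trail trailer records.
    A single group larger than the budget still goes into one (oversize) file.
    """
    cut = len(records) - n_trail
    head, body, trail = records[:n_head], records[n_head:cut], records[cut:]

    # Collect groups: a new group starts at every record beginning with the marker.
    groups = []
    for rec in body:
        if rec.startswith(begin) or not groups:
            groups.append([])
        groups[-1].append(rec)

    budget = limit_bytes - (n_head + n_trail) * lrecl  # bytes left for group records
    files, current = [], []
    for group in groups:
        # Start a new file when adding this whole group would exceed the budget.
        if current and (len(current) + len(group)) * lrecl > budget:
            files.append(head + current + trail)
            current = []
        current.extend(group)
    if current or not files:
        files.append(head + current + trail)
    return files

demo = (["H1", "H2", "H3", "H4"]
        + ["<JOB ID=1>", "a", "b", "</JOB>"]
        + ["<JOB ID=2>", "c", "</JOB>"]
        + ["<JOB ID=3>", "</JOB>"]
        + ["T1", "T2"])
# Tiny numbers for illustration: 10-byte records, 120-byte limit per file.
print([len(f) for f in split_groups(demo, limit_bytes=120, lrecl=10)])  # [10, 11]
```

The greedy packing keeps the input order and never looks ahead, so the number of output files is decided by the data rather than fixed in advance.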
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

Posted: Thu Mar 05, 2009 8:25 am

Nehal Soni,

The following DFSORT/ICETOOL JCL will give you the desired results. It even takes care of pulling the 4 headers and 2 trailers into all the output files. The tricky part here is to split the records into 75 MB groups.
Code:

1 megabyte = 1,048,576 bytes
LRECL = 145
No. of records in 1 MB = int(1048576/145) = 7231


Since we are dealing with groups of records, I rounded that number down to 7000. So the first COPY operator uses WHEN=INIT to put a sequence number starting at 7000 at position 146 of every record.

Using another WHEN=INIT, I divide that sequence number by 7000, so that every set of 7000 records has the same number, and put the result at position 154.

Now, in order to split the file into 75 MB files, I divide the number at position 154 by 74 (leaving a 1 MB buffer to hold group records).

I also used the GROUP function to number the groups.
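
The arithmetic in the steps above can be sketched in Python as follows. This is an illustration, not DFSORT syntax: the function name `bucket` is hypothetical, and it assumes that the DIV operations truncate like Python's `//` does for positive numbers.

```python
RECORDS_PER_CHUNK = 7000   # roughly 1 MB of 145-byte records, rounded down from 7231
CHUNKS_PER_FILE = 74       # 74 chunks per output, leaving headroom under 75 MB

def bucket(n):
    """Bucket number for the n-th input record (1-based)."""
    seq = RECORDS_PER_CHUNK + (n - 1)   # sequence number starting at 7000
    chunk = seq // RECORDS_PER_CHUNK    # same value for each run of 7000 records
    return chunk // CHUNKS_PER_FILE     # same value for each run of 74 chunks

print(1048576 // 145)     # 7231
print(bucket(1))          # 0
print(bucket(511_000))    # 0
print(bucket(511_001))    # 1
```

Because the sequence number starts at 7000, the first bucket holds 73 chunks (511,000 records) and each later bucket holds 74 chunks (518,000 records, or 75,110,000 bytes).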

Using the report functions, I generate file T1 with the 74 MB limits, like this:
Code:

0 00000000 00001389
1 00001389 00002797
2 00002797 00004204
3 00000000 00004499


The second COPY operator reads the T1 file and creates the OUTFIL statements shown below:
Code:

OUTFIL FNAMES=OUT1,BUILD=(1,145),                           
INCLUDE=(146,8,CH,EQ,C' ',OR,                               
        (146,8,ZD,GT,00000000,AND,146,8,ZD,LE,00001389))   
OUTFIL FNAMES=OUT2,BUILD=(1,145),                           
INCLUDE=(146,8,CH,EQ,C' ',OR,                               
        (146,8,ZD,GT,00001389,AND,146,8,ZD,LE,00002797))   
OUTFIL FNAMES=OUT3,BUILD=(1,145),                           
INCLUDE=(146,8,CH,EQ,C' ',OR,                               
        (146,8,ZD,GT,00002797,AND,146,8,ZD,LE,00004204))   
OUTFIL FNAMES=OUT4,BUILD=(1,145),                           
INCLUDE=(146,8,CH,EQ,C' ',OR,146,8,ZD,GT,00004204)         
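
To make the mapping from the T1 limits to these statements explicit, here is a small Python sketch that rebuilds the same text. The function name `outfil_cards` is hypothetical, and the real job generates the cards with DFSORT itself. Each of the first three T1 rows supplies a lowest and highest group number, and the last output takes everything above the third row's highest group. The `CH,EQ,C' '` test appears to be what copies the header and trailer records, which carry no group number, into every file.

```python
def outfil_cards(rows):
    """Build OUTFIL statements from (lowest_group, highest_group) pairs.

    rows: one pair of 8-digit strings per T1 row that is used (the first three).
    """
    cards = []
    for i, (lo, hi) in enumerate(rows, start=1):
        cards.append(f"OUTFIL FNAMES=OUT{i},BUILD=(1,145),")
        cards.append("INCLUDE=(146,8,CH,EQ,C' ',OR,")
        cards.append(f"        (146,8,ZD,GT,{lo},AND,146,8,ZD,LE,{hi}))")
    last_hi = rows[-1][1]  # the final output takes every group after this one
    cards.append(f"OUTFIL FNAMES=OUT{len(rows) + 1},BUILD=(1,145),")
    cards.append(f"INCLUDE=(146,8,CH,EQ,C' ',OR,146,8,ZD,GT,{last_hi})")
    return cards

t1_rows = [("00000000", "00001389"),
           ("00001389", "00002797"),
           ("00002797", "00004204")]
print("\n".join(outfil_cards(t1_rows)))
```

Note that group 00001389 is both the highest group of the first row and the lowest of the second, so it straddles a limit. The GT/LE pairing sends it whole to OUT1, which seems to be what the 1 MB buffer is reserved for.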


The final operator uses the above control cards as well as the GROUP cards, and creates the output files.

Code:

//STEP0100 EXEC PGM=ICETOOL                 
//TOOLMSG  DD SYSOUT=*                     
//DFSMSG   DD SYSOUT=*                     
//IN       DD DSN=Your 145 byte file to be split,
//            DISP=SHR
//T1       DD DSN=&&T1,DISP=(,PASS),SPACE=(TRK,(1,1),RLSE)
//C1       DD DSN=&&C1,DISP=(,PASS),SPACE=(TRK,(1,1),RLSE)
//*
//OUT1     DD DSN=your.output1,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(100,100),RLSE)
//*
//OUT2     DD DSN=your.output2,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(100,100),RLSE)
//*
//OUT3     DD DSN=your.output3,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(100,100),RLSE)
//*
//OUT4     DD DSN=your.output4,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(100,100),RLSE)
//*
//TOOLIN   DD *
  COPY FROM(IN) USING(CTL1)
  COPY FROM(T1) USING(CTL2)
  COPY FROM(IN) USING(CTL3)
//*                                       
//CTL1CNTL DD *                                                         
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(146:SEQNUM,8,ZD,START=7000,INCR=1)),
  IFTHEN=(WHEN=INIT,OVERLAY=(154:146,8,ZD,DIV,+7000,M11,LENGTH=8)),     
  IFTHEN=(WHEN=INIT,OVERLAY=(154:154,8,ZD,DIV,+74,M11,LENGTH=8)),       
  IFTHEN=(WHEN=GROUP,BEGIN=(1,8,CH,EQ,C'<JOB ID='),                     
  END=(1,6,CH,EQ,C'</JOB>'),PUSH=(162:ID=8))                           
                                                                       
  OUTFIL FNAMES=T1,REMOVECC,NODETAIL,BUILD=(20X),                       
  SECTIONS=(154,8,                                                     
  TRAILER3=(MAX=(161,1,ZD,M11,LENGTH=1),X,                             
            MIN=(162,8,ZD,M11,LENGTH=8),X,                             
            MAX=(162,8,ZD,M11,LENGTH=8)))                               
//*
//CTL2CNTL DD *                                                         
  OPTION STOPAFT=3                                                     
  OUTFIL FNAMES=C1,REMOVECC,                                           
  BUILD=(3:C'OUTFIL FNAMES=OUT',SEQNUM,1,ZD,C',BUILD=(1,145),',/,       
         3:C'INCLUDE=(146,8,CH,EQ,C''',X,C'''',C',OR,',/,               
         3:8X,C'(146,8,ZD,GT,',3,8,C',AND,146,8,ZD,LE,',               
         12,8,C'))',80:X),                                             
  TRAILER1=(3:'OUTFIL FNAMES=OUT4,BUILD=(1,145),',/,                   
            3:'INCLUDE=(146,8,CH,EQ,C''',X,C'''',C',OR,',               
              '146,8,ZD,GT,',12,8,')',80:X)                             
//*                                                                     
//CTL3CNTL DD *                                                         
  SORT FIELDS=COPY                                                     
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,8,CH,EQ,C'<JOB ID='),               
  END=(1,6,CH,EQ,C'</JOB>'),PUSH=(146:ID=8))                           
//         DD DSN=&&C1,DISP=OLD,VOL=REF=*.C1                           
//*


Notes:

1. If your input has less than 74 MB of records, then OUT4 will have only 6 records, which are the header and trailer records.

2. If your input has more than 222 MB of data, then OUT4 will have the rest of the data, as the first 3 files will each have 74 MB. For example, if your input is 500 MB, the first 3 files will each have less than or equal to 75 MB, while OUT4 will have the remaining 278 MB of data.

We can come up with a DYNAMIC split, but it becomes a little bit more complicated.
Nehal Soni

New User


Joined: 06 Jun 2008
Posts: 8
Location: Mumbai

Posted: Thu Mar 05, 2009 7:15 pm

Hi,

It is working fine, and it has helped me.
Many thanks for your efforts.
All times are GMT + 6 Hours