Dynamically splitting a huge file depending on count.


IBM Mainframe Forums -> JCL & VSAM
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Wed Mar 08, 2006 7:52 pm

Hi,

Please provide me your suggestions/solutions to achieve the following:

A production job runs daily and creates a huge file with 'n' number of records. Other than REXX, I want to use a utility (say SYNCSORT with COUNT) to find the number of records in this file, and then split the file into equal output files (each output file should have 1,00,000 records). How do I achieve this dynamically when the record count varies daily? On a given day we may get 5,00,000 records and on another day 8,00,000. So, depending on the count, I need to split the input file into 5 or 8 pieces for further processing. After this processing (by a COBOL program) I will again have 5 or 8 files, which I need to merge back into a single file and FTP to a remote customer server. How can this be automated?

Please provide your suggestions/solutions/ideas to this problem. Please let me know if you need more inputs/details.

Note: REXX is ruled out by the customer.

Please help me out.

Thanks for your time.
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Wed Mar 08, 2006 7:59 pm

I ran the following SYNCSORT step to get the count of records from the file.

Code:

//S1      EXEC PGM=ICETOOL                               
//TOOLMSG   DD SYSOUT=*                                   
//SSMSG     DD SYSOUT=*                                   
//IN        DD DSN=INPUT.FILE.TO.BE.SPLITTED,   
//             DISP=SHR                                   
//TOOLIN    DD *                                         
  COUNT FROM(IN)                                         
/*                                                       
//                                                       


I got the following display in the TOOLMSG sysout.

Code:

SYT000I  SYNCTOOL RELEASE 1.4D - COPYRIGHT 2003  SYNCSORT INC.
SYT001I  INITIAL PROCESSING MODE IS "STOP"                     
SYT002I  "TOOLIN" INTERFACE BEING USED                         
                                                               
           COUNT FROM(IN)                                     
SYT020I  SYNCSORT CALLED WITH IDENTIFIER "0001"               
SYT031I  NUMBER OF RECORDS PROCESSED: 000000000400000
SYT030I  OPERATION COMPLETED WITH RETURN CODE 0               
                                                               
SYT004I  SYNCTOOL PROCESSING COMPLETED WITH RETURN CODE 0     


So, now that we know the count (4,00,000), which JCL utility can we use to dynamically split this file into 4 equal pieces (each file having 1,00,000 records)?
DavidatK

Active Member


Joined: 22 Nov 2005
Posts: 700
Location: Troy, Michigan USA

PostPosted: Thu Mar 09, 2006 12:36 am

Hi Ranga,

Quote:

Please provide me your suggestions/solutions to achieve the following:

A production job runs daily and creates a huge file with 'n' number of records. Except REXX, I want to use a utility (assuming SYNCSORT with COUNT) to know the 'n' number of records from this file and want to split the file into equal output files (each output file should have 1,00,000 records). How to achieve it dynamically if records vary on daily basis? On a given day we may get 5,00,000 and on the other day we may get 8,00,000 records. So, depending on the count I need to split the input file into 5 or 8 pieces for further processing. After this processing (suppose a COBOL program) I may again get 5 or 8 files. So, I need to merge all these files as single file and need to FTP it to remote customer server. How to automatize this situation?

Please provide your suggestions/solutions/ideas to this problem. Please let me know if you need more inputs/details


I'm not clear on how many records you're talking about here: 100,000 or 1,000,000?

What is the requirement to have only 100,000 records in each split of the file? Even 8,000,000 records shouldn't be an excessive burden on the system. And remember, to split the master file into multiple files you still have to make a pass through the entire file. Why not just process it as one file?

But that's off the subject you posted.

I think you are taking the wrong approach to this. Having a variable number of files produced each night creates a scheduling problem: every night a variable number of jobs needs to be scheduled.

I think a better way is to have a constant number of files, enough to contain the absolute maximum number of records possible (at least in the foreseeable future), and write your 100,000 records to each of these files. When you run out of records, the remaining files are simply left empty. That way the schedule is constant each night; someone doesn't have to be changing the job stream.

If your process cannot handle an empty file, you can check for the empty condition and use COND= to skip the process.
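One way to sketch that empty-file check (dataset, step, and program names here are made up for illustration): IDCAMS PRINT with COUNT(1) ends with return code 4 when the input dataset is empty, and COND= on the processing step skips it in that case.

```
//CHKEMPTY EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN       DD DSN=SPLIT.PIECE5,DISP=SHR
//SYSIN    DD *
  PRINT INFILE(IN) COUNT(1)
/*
//* RC=4 FROM CHKEMPTY MEANS THE PIECE IS EMPTY - SKIP THE COBOL STEP
//PROC5    EXEC PGM=MYCOBOL,COND=(4,LE,CHKEMPTY)
```

COND=(4,LE,CHKEMPTY) bypasses PROC5 whenever 4 is less than or equal to the return code of CHKEMPTY, i.e. whenever the piece was empty.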

Then you concatenate all the files together for the FTP.

Please come back with comments or questions,

Dave
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Thu Mar 09, 2006 1:47 am

Dave, the O/P may be representing the record counts in lakhs or crores. Look up the definition of each and you'll see what I mean.
DavidatK

Active Member


Joined: 22 Nov 2005
Posts: 700
Location: Troy, Michigan USA

PostPosted: Thu Mar 09, 2006 3:03 am

superk,

I had not been exposed to unit counts in lakhs or crores before. My horizons have been expanded. Thanks

Dave
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 7:15 am

//* all gdg's are 1-entry each
//SORT1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=TCLM.CQ000154.ECF.LATRAN.X060123
//SORTOUT DD DSN=TEMP.M4J6060.W1.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN DD *
* PREFIX A 7-DIGIT SEQUENCE NUMBER TO EACH RECORD
  SORT FIELDS=COPY
  OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,329)
//*
//SORT2 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=TEMP.M4J6060.W1.DATA
//SORTOF1 DD DSN=TEMP.M4J6060.V1.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF2 DD DSN=TEMP.M4J6060.V2.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF3 DD DSN=TEMP.M4J6060.V3.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF4 DD DSN=TEMP.M4J6060.V4.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN DD *
  SORT FIELDS=COPY
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,GT,00,AND,1,7,ZD,LE,10),OUTREC=(9,329)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,10,AND,1,7,ZD,LE,20),OUTREC=(9,329)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,20,AND,1,7,ZD,LE,30),OUTREC=(9,329)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,30,AND,1,7,ZD,LE,40),OUTREC=(9,329)
//
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 2:00 pm

Hi DavidatK,

Yes, it is 100,000 records. The COBOL program logic processes only 100,000 input records; if there are more, it will abend. We have conveyed this approach to the customer (as pseudo code) and they agreed to it. The thing is that the input file will have a different number of records each day, so we need to act accordingly. They are not bothered about the scheduling because it is on-request work, so this solution will be used only once in a while. But we are not able to achieve it, so I seek the help of you experts.

I have not understood the info provided by manyone. Is it a solution to my problem?

Any inputs? Please help.
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 2:08 pm

the first sort prefixes a sequence number (automatically incremented) to each record, so that columns 1-7 are the sequence number, followed by 1 blank, then my record (329 bytes).
the second sort splits the file from above into 4 parts according to the value of the first seven bytes (ie. the sequence number). numbers 1 to 10 (in my example) go to file 1, 11 to 20 go to file 2, etc. - simply change the boundaries to multiples of 100000 for your use. the result is 4 files (or however many) that can be input to 4 jobs.
i hope this is what you were looking for.
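For the 100,000-records-per-piece case, the SYSIN of the second sort would look something like this (a sketch, assuming the same 7-byte sequence-number prefix and 329-byte records as in the example above):

```
  SORT FIELDS=COPY
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,LE,100000),OUTREC=(9,329)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,100000,AND,1,7,ZD,LE,200000),OUTREC=(9,329)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,200000,AND,1,7,ZD,LE,300000),OUTREC=(9,329)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,300000,AND,1,7,ZD,LE,400000),OUTREC=(9,329)
```

With a fixed set of OUTFIL statements sized for the maximum expected volume, later files simply come out empty on low-volume days.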
i413678
Currently Banned

Active User


Joined: 19 Feb 2005
Posts: 112
Location: chennai

PostPosted: Thu Mar 09, 2006 7:09 pm

Hi manyone,

You have given an excellent solution. But ranga_subham asked to repeat this process for 'N' number of records...

Is that right, ranga_subham?

Thanks in advance,

pavan
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Thu Mar 09, 2006 7:26 pm

hi,
a possible solution is to make a driver job.
that means: job1 analyzes your input file and creates as many
jobs as need to be scheduled,
then sends all the jobs to a dd which redirects them to the
internal reader...

//JOB DD SYSOUT=(x,INTRDR)

martin9
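A minimal sketch of that driver step (SPLITDRV and the dataset names are made-up illustrations, not real utilities): a home-written program counts the input and writes one generated job per 100,000 records to the internal-reader DD.

```
//DRIVER   EXEC PGM=SPLITDRV              hypothetical driver program
//INFILE   DD DSN=INPUT.HUGE.FILE,DISP=SHR
//JOBOUT   DD SYSOUT=(A,INTRDR)           each job written here goes straight to JES
```

Each job the driver writes to JOBOUT would contain the split step and the COBOL step for one 100,000-record slice.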
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 7:28 pm

Hi manyone,

The first step gave the following error in the sysout:

Code:

SYSIN :                                         
SORT FIELDS=COPY                               
     *                                         
OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,200)             
             *                                 
WER275A  NO KEYWORDS FOUND ON CONTROL STATEMENT
WER268A  OUTREC STATEMENT  : SYNTAX ERROR       
WER449I  SYNCSORT GLOBAL DSM SUBSYSTEM ACTIVE 
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 7:31 pm

Hi Martin,

Would you please convert this pseudo code into actual code? That would be a great help to me.

Thanks for your time.
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Thu Mar 09, 2006 8:23 pm

hi ranga,
you write a program which reads the entire input file;
after each 100000 records (or fewer for the last portion),
you create a single job, writing it to a dd which is
defined as input to the internal reader...
it will be submitted immediately...

the created jobs should consist of:

1. splitting the file into a useful portion (ie. 100000 records)
IDCAMS
REPRO INFILE(IN1) OUTFILE(OUT1) COUNT(100000) {SKIP(n)}
2. running your cobol program
note: all names/vars ... can be variable

more details?
martin9
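As a sketch of step 1 for the second 100,000-record slice (dataset names are placeholders; SKIP passes over the records already copied to earlier pieces, COUNT limits how many are copied):

```
//SPLIT2   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DSN=INPUT.HUGE.FILE,DISP=SHR
//OUT1     DD DSN=WORK.PIECE2,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=0)
//SYSIN    DD *
  REPRO INFILE(IN1) OUTFILE(OUT1) SKIP(100000) COUNT(100000)
/*
```

The driver program would vary SKIP (0, 100000, 200000, ...) and the output dataset name in each generated job.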
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 10:09 pm

ranga,
sort control statements have to start after column 1. if a statement starts in column 1 it is treated as a label, so please shift the statements right by one or two columns.
thanks
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 10:32 am

Hi Manyone,

From the example you've provided here I've created my job like this. Please let me know if the job is correct. Sorry to bother you again and again. I kept getting an "ABENDED S000 U0016 CN(INTERNAL)" abend.

Code:

//SORT01 EXEC PGM=SORT                               
//SYSOUT   DD SYSOUT=*                               
//SORTMSG  DD SYSOUT=*                               
//SORTIN   DD DSN=INPUT.FILE, 
//            DISP=(SHR,KEEP,KEEP)                   
//SORTOUT  DD DSN=OUTPUT.FILE.SORT, 
//            UNIT=SYSDA,                             
//            DISP=(NEW,CATLG,DELETE),               
//            SPACE=(TRK,(1,1),RLSE),                 
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000) 
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))         
//** PREFIX A 7-DIGIT NUMBER TO EACH RECORD **//     
//SYSIN    DD *                                       
  SORT FIELDS=COPY                                   
  OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,200)                 
//*                                                   
//SORT02 EXEC PGM=SORT                             
//SYSOUT   DD SYSOUT=*                             
//SORTMSG  DD SYSOUT=*                             
//SORTIN   DD DSN=OUTPUT.FILE.SORT,
//            DISP=SHR                             
//SORTOF1  DD DSN=OUTPUT.FILE.NEW.SORT1,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF2  DD DSN=OUTPUT.FILE.NEW.SORT2,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF3  DD DSN=OUTPUT.FILE.NEW.SORT3,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF4  DD DSN=OUTPUT.FILE.NEW.SORT4,                   
//            UNIT=SYSDA,                                             
//            DISP=(NEW,CATLG,DELETE),                                 
//            SPACE=(TRK,(1,1),RLSE),                                 
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)                   
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SYSIN    DD *                                                       
  SORT FIELDS=COPY                                                     
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,GT,00,AND,1,7,ZD,LE,10),OUTREC=(9,200)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,10,AND,1,7,ZD,LE,20),OUTREC=(9,200)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,20,AND,1,7,ZD,LE,30),OUTREC=(9,200)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,30,AND,1,7,ZD,LE,40),OUTREC=(9,200)
//                                                                 


Please suggest.
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 10:32 am

Thanks Martin. I will try that.
rohit jaiswal
Warnings : 2

New User


Joined: 09 Mar 2006
Posts: 36
Location: hyderabad,A.P

PostPosted: Fri Mar 10, 2006 1:18 pm

hi

can you please send me the details of it?
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Fri Mar 10, 2006 1:24 pm

i think you were getting cond 16 because the sort1 output is not lrecl 200: OUTREC=(SEQNUM,7,ZD,X,1,200) builds a 208-byte record (7 + 1 + 200), but your SORTOUT DD says LRECL=200.

pls understand that my example was for a max count of 10 for each file - hence file1 will get 1 to 10, file 2 will get 11 to 20, etc.

also, i'm trying to understand your process. i'm guessing that you are modifying the records as you process each sub file, and you need to reconstruct the modified files back in the original sequence for the later ftp. if this is the case, i suggest you keep the sequence number intact (change your sort2 to OUTREC=(1,208)) and pass it around, but in the end simply resort everything by the sequence number and drop it during outrec.

also note that all the files may not always have data (eg. if the original has 27 recs, the outputs will have 10, 10, 7 and 0 records in file1, 2, 3, 4). hence your processing should handle empty files accordingly.

i hope this will help.
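A sketch of that final re-merge step (dataset names are placeholders; it assumes the 7-digit sequence number is still in columns 1-7, a blank in column 8, and 200 bytes of data from column 9):

```
//MERGE    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROCESSED.PIECE1,DISP=SHR
//         DD DSN=PROCESSED.PIECE2,DISP=SHR
//         DD DSN=PROCESSED.PIECE3,DISP=SHR
//         DD DSN=PROCESSED.PIECE4,DISP=SHR
//SORTOUT  DD DSN=FINAL.FOR.FTP,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN    DD *
* RESTORE ORIGINAL ORDER BY SEQUENCE NUMBER, THEN DROP IT
  SORT FIELDS=(1,7,ZD,A)
  OUTREC FIELDS=(9,200)
/*
```

The pieces are simply concatenated under SORTIN; OUTREC is applied after the sort, so the sequence number orders the records and is then stripped before the FTP.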
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 6:37 pm

Ok. I will do as you said. Thanks for the details. All in all, it is very useful info. Thanks for sharing.
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Fri Mar 10, 2006 7:03 pm

hi ranga,

do you think BLKSIZE=00000 will work?
please provide the error messages and the joblog...

martin9

ps: you wanted a dynamic solution?