Dynamically splitting a huge file depending on count.


IBM Mainframe Forums -> JCL & VSAM
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Wed Mar 08, 2006 7:52 pm

Hi,

Please provide me your suggestions/solutions to achieve the following:

A production job runs daily and creates a huge file with 'n' number of records. Other than REXX, I want to use a utility (say SYNCSORT with COUNT) to find the number of records in this file, and then split the file into equal output files (each output file should have 1,00,000 records). How do I achieve this dynamically when the record count varies daily? On a given day we may get 5,00,000 records and on another day 8,00,000. So, depending on the count, I need to split the input file into 5 or 8 pieces for further processing. After this processing (by a COBOL program) I will again have 5 or 8 files, which I need to merge back into a single file and FTP to a remote customer server. How can this be automated?

Please provide your suggestions/solutions/ideas to this problem. Please let me know if you need more inputs/details.

Note: REXX is ruled out by the customer.

Please help me out.

Thanks for your time.
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Wed Mar 08, 2006 7:59 pm

I ran the following SYNCSORT step to get the count of records from the file.

Code:

//S1      EXEC PGM=ICETOOL                               
//TOOLMSG   DD SYSOUT=*                                   
//SSMSG     DD SYSOUT=*                                   
//IN        DD DSN=INPUT.FILE.TO.BE.SPLITTED,   
//             DISP=SHR                                   
//TOOLIN    DD *                                         
  COUNT FROM(IN)                                         
/*                                                       
//                                                       


I got the following display in the TOOLMSG sysout.

Code:

SYT000I  SYNCTOOL RELEASE 1.4D - COPYRIGHT 2003  SYNCSORT INC.
SYT001I  INITIAL PROCESSING MODE IS "STOP"                     
SYT002I  "TOOLIN" INTERFACE BEING USED                         
                                                               
           COUNT FROM(IN)                                     
SYT020I  SYNCSORT CALLED WITH IDENTIFIER "0001"               
SYT031I  NUMBER OF RECORDS PROCESSED: 000000000400000
SYT030I  OPERATION COMPLETED WITH RETURN CODE 0               
                                                               
SYT004I  SYNCTOOL PROCESSING COMPLETED WITH RETURN CODE 0     


So, now that we know the count (4,00,000), which JCL utility can we use to dynamically split this file into 4 equal pieces (each file having 1,00,000 records)?
DavidatK

Active Member


Joined: 22 Nov 2005
Posts: 700
Location: Troy, Michigan USA

PostPosted: Thu Mar 09, 2006 12:36 am

Hi Ranga,

Quote:

Please provide me your suggestions/solutions to achieve the following:

A production job runs daily and creates a huge file with 'n' number of records. Except REXX, I want to use a utility (assuming SYNCSORT with COUNT) to know the 'n' number of records from this file and want to split the file into equal output files (each output file should have 1,00,000 records). How to achieve it dynamically if records vary on daily basis? On a given day we may get 5,00,000 and on the other day we may get 8,00,000 records. So, depending on the count I need to split the input file into 5 or 8 pieces for further processing. After this processing (suppose a COBOL program) I may again get 5 or 8 files. So, I need to merge all these files as single file and need to FTP it to remote customer server. How to automatize this situation?

Please provide your suggestions/solutions/ideas to this problem. Please let me know if you need more inputs/details


I'm not clear on how many records you're talking about here: 100,000 or 1,000,000?

What is the requirement to have only 100,000 records in each split of the file? Even 8,000,000 records shouldn't be an excessive burden on the system. And remember, to split the master file into multiple files you still have to make a pass through the entire file. Why not just process it as one file?

But that's off the subject you posted.

I think you are taking the wrong approach to this. Having a variable number of files produced each night creates a scheduling problem: every night a variable number of jobs needs to be scheduled.

I think a better way is to have a constant number of files, enough to contain the absolute maximum number of records possible (at least in the foreseeable future), and write your 100,000 records to each of these files. When you run out of records, the remaining files are simply left empty. That way the schedule is constant each night; someone doesn't have to be changing the job stream.

If your process cannot handle an empty file, you can check for the empty condition and use COND= to skip the process.
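One way to sketch that empty-file check (dataset, step, and program names here are made up for illustration): IDCAMS PRINT with COUNT(1) ends with return code 4 when the input dataset is empty, and COND= on the processing step skips it in that case.

```
//CHKEMPTY EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN       DD DSN=SPLIT.PIECE5,DISP=SHR
//SYSIN    DD *
  PRINT INFILE(IN) COUNT(1)
/*
//* RC=4 FROM CHKEMPTY MEANS THE PIECE IS EMPTY - SKIP THE COBOL STEP
//PROC5    EXEC PGM=MYCOBOL,COND=(4,LE,CHKEMPTY)
```

COND=(4,LE,CHKEMPTY) bypasses PROC5 whenever 4 is less than or equal to the return code of CHKEMPTY, i.e. whenever the piece was empty.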

Then you concatenate all the files together for the FTP.

Please come back with comments or questions,

Dave
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Thu Mar 09, 2006 1:47 am

Dave, the O/P may be representing the record counts in lakhs or crores. Look up the definition of each and you'll see what I mean.
DavidatK

Active Member


Joined: 22 Nov 2005
Posts: 700
Location: Troy, Michigan USA

PostPosted: Thu Mar 09, 2006 3:03 am

superk,

I had not been exposed to unit counts in lakhs or crores before. My horizons have been expanded. Thanks

Dave
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 7:15 am

//* all gdg's are 1-entry each
//SORT1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=TCLM.CQ000154.ECF.LATRAN.X060123
//SORTOUT DD DSN=TEMP.M4J6060.W1.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN DD *
* PREFIX A 7-DIGIT SEQUENCE NUMBER TO EACH RECORD
  SORT FIELDS=COPY
  OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,329)
//*
//SORT2 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=TEMP.M4J6060.W1.DATA
//SORTOF1 DD DSN=TEMP.M4J6060.V1.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF2 DD DSN=TEMP.M4J6060.V2.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF3 DD DSN=TEMP.M4J6060.V3.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTOF4 DD DSN=TEMP.M4J6060.V4.DATA(+1),
// UNIT=TEMP,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(10,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN DD *
  SORT FIELDS=COPY
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,GT,00,AND,1,7,ZD,LE,10),OUTREC=(9,329)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,10,AND,1,7,ZD,LE,20),OUTREC=(9,329)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,20,AND,1,7,ZD,LE,30),OUTREC=(9,329)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,30,AND,1,7,ZD,LE,40),OUTREC=(9,329)
//
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 2:00 pm

Hi DavidatK,

Yes, it is 100,000 records. The COBOL program logic processes only 100,000 input records; if there are more, it will abend. We have conveyed this approach to the customer (as pseudo code) and they agreed to it. The thing is that the input file will have a different number of records each day, so we need to act accordingly. They are not bothered about the scheduling because it is on-request work, so this solution will be used only once in a while. But we are not able to achieve it, so I seek the help of you experts.

I have not understood the info provided by manyone. Is it a solution to my problem?

Any inputs? Please help.
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 2:08 pm

the first sort prefixes a sequence number (automatically incremented) to each record, so that columns 1-7 are the sequence number, followed by 1 blank, then my record (329 bytes).
the second sort splits the file from above into 4 parts according to the value of the first seven bytes (ie. the sequence number). numbers 1 to 10 (in my example) go to file 1, 11 to 20 go to file 2, etc. - simply change the boundaries to multiples of 100000 for your use. the result is 4 files (or however many) that can be input to 4 jobs.
i hope this is what you were looking for.
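For the 100,000-records-per-piece case, the SYSIN of the second sort would look something like this (a sketch, assuming the same 7-byte sequence-number prefix and 329-byte records as in the example above):

```
  SORT FIELDS=COPY
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,LE,100000),OUTREC=(9,329)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,100000,AND,1,7,ZD,LE,200000),OUTREC=(9,329)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,200000,AND,1,7,ZD,LE,300000),OUTREC=(9,329)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,300000,AND,1,7,ZD,LE,400000),OUTREC=(9,329)
```

With a fixed set of OUTFIL statements sized for the maximum expected volume, later files simply come out empty on low-volume days.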
i413678
Currently Banned

Active User


Joined: 19 Feb 2005
Posts: 112
Location: chennai

PostPosted: Thu Mar 09, 2006 7:09 pm

Hi manyone,

You have given an excellent solution. But ranga_subham asked to repeat this process for 'N' number of records...

Is that right, ranga_subham?

Thanks in advance,

pavan
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Thu Mar 09, 2006 7:26 pm

hi,
a possible solution is to make a driver job.
that means: job1 analyzes your input file and creates as many
jobs as need to be scheduled,
then sends all the jobs to a dd which redirects them to the
internal reader...

//JOB DD SYSOUT=(x,INTRDR)

martin9
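A minimal sketch of that driver step (SPLITDRV and the dataset names are made-up illustrations, not real utilities): a home-written program counts the input and writes one generated job per 100,000 records to the internal-reader DD.

```
//DRIVER   EXEC PGM=SPLITDRV              hypothetical driver program
//INFILE   DD DSN=INPUT.HUGE.FILE,DISP=SHR
//JOBOUT   DD SYSOUT=(A,INTRDR)           each job written here goes straight to JES
```

Each job the driver writes to JOBOUT would contain the split step and the COBOL step for one 100,000-record slice.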
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 7:28 pm

Hi manyone,

The first step gave the following error in the sysout:

Code:

SYSIN :                                         
SORT FIELDS=COPY                               
     *                                         
OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,200)             
             *                                 
WER275A  NO KEYWORDS FOUND ON CONTROL STATEMENT
WER268A  OUTREC STATEMENT  : SYNTAX ERROR       
WER449I  SYNCSORT GLOBAL DSM SUBSYSTEM ACTIVE 
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Thu Mar 09, 2006 7:31 pm

Hi Martin,

Would you please convert this pseudo code into actual code? That would be a great help to me.

Thanks for your time.
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Thu Mar 09, 2006 8:23 pm

hi ranga,
you write a program which reads the entire input file;
after each 100000 records (or fewer for the last portion),
you create a single job, writing it to a dd which is
defined as input to the internal reader...
it will be submitted immediately...

the created jobs should consist of:

1. splitting the file into a useful portion (ie. 100000 records)
IDCAMS
REPRO INFILE(IN1) OUTFILE(OUT1) COUNT(100000) {SKIP(n)}
2. running your cobol program
note: all names/vars ... can be variable

more details?
martin9
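As a sketch of step 1 for the second 100,000-record slice (dataset names are placeholders; SKIP passes over the records already copied to earlier pieces, COUNT limits how many are copied):

```
//SPLIT2   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DSN=INPUT.HUGE.FILE,DISP=SHR
//OUT1     DD DSN=WORK.PIECE2,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=0)
//SYSIN    DD *
  REPRO INFILE(IN1) OUTFILE(OUT1) SKIP(100000) COUNT(100000)
/*
```

The driver program would vary SKIP (0, 100000, 200000, ...) and the output dataset name in each generated job.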
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Thu Mar 09, 2006 10:09 pm

ranga,
sort control statements have to start after column 1. if a statement starts in column 1 it is treated as a label, so please shift the statements right by one or two columns.
thanks
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 10:32 am

Hi Manyone,

From the example you've provided here I've created my job like this. Please let me know if the job is correct. Sorry to bother you again and again. I kept getting an "ABENDED S000 U0016 CN(INTERNAL)" abend.

Code:

//SORT01 EXEC PGM=SORT                               
//SYSOUT   DD SYSOUT=*                               
//SORTMSG  DD SYSOUT=*                               
//SORTIN   DD DSN=INPUT.FILE, 
//            DISP=(SHR,KEEP,KEEP)                   
//SORTOUT  DD DSN=OUTPUT.FILE.SORT, 
//            UNIT=SYSDA,                             
//            DISP=(NEW,CATLG,DELETE),               
//            SPACE=(TRK,(1,1),RLSE),                 
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000) 
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))         
//** PREFIX A 7-DIGIT NUMBER TO EACH RECORD **//     
//SYSIN    DD *                                       
  SORT FIELDS=COPY                                   
  OUTFIL OUTREC=(SEQNUM,7,ZD,X,1,200)                 
//*                                                   
//SORT02 EXEC PGM=SORT                             
//SYSOUT   DD SYSOUT=*                             
//SORTMSG  DD SYSOUT=*                             
//SORTIN   DD DSN=OUTPUT.FILE.SORT,
//            DISP=SHR                             
//SORTOF1  DD DSN=OUTPUT.FILE.NEW.SORT1,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF2  DD DSN=OUTPUT.FILE.NEW.SORT2,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF3  DD DSN=OUTPUT.FILE.NEW.SORT3,
//            UNIT=SYSDA,                           
//            DISP=(NEW,CATLG,DELETE),             
//            SPACE=(TRK,(1,1),RLSE),               
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)
//SORTOF4  DD DSN=OUTPUT.FILE.NEW.SORT4,                   
//            UNIT=SYSDA,                                             
//            DISP=(NEW,CATLG,DELETE),                                 
//            SPACE=(TRK,(1,1),RLSE),                                 
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=00000)                   
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,20))                           
//SYSIN    DD *                                                       
  SORT FIELDS=COPY                                                     
  OUTFIL FILES=1,INCLUDE=(1,7,ZD,GT,00,AND,1,7,ZD,LE,10),OUTREC=(9,200)
  OUTFIL FILES=2,INCLUDE=(1,7,ZD,GT,10,AND,1,7,ZD,LE,20),OUTREC=(9,200)
  OUTFIL FILES=3,INCLUDE=(1,7,ZD,GT,20,AND,1,7,ZD,LE,30),OUTREC=(9,200)
  OUTFIL FILES=4,INCLUDE=(1,7,ZD,GT,30,AND,1,7,ZD,LE,40),OUTREC=(9,200)
//                                                                 


Please suggest.
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 10:32 am

Thanks Martin. I will try that.
rohit jaiswal
Warnings : 2

New User


Joined: 09 Mar 2006
Posts: 36
Location: hyderabad,A.P

PostPosted: Fri Mar 10, 2006 1:18 pm

hi

can you please send me the details of it?
manyone

New User


Joined: 09 Mar 2006
Posts: 9

PostPosted: Fri Mar 10, 2006 1:24 pm

i think you were getting cond 16 because the sort1 output is not lrecl 200: OUTREC=(SEQNUM,7,ZD,X,1,200) builds a 208-byte record (7 + 1 + 200), but your SORTOUT DD says LRECL=200.

pls understand that my example was for a max count of 10 for each file - hence file1 will get 1 to 10, file 2 will get 11 to 20, etc.

also, i'm trying to understand your process. i'm guessing that you are modifying the records as you process each sub file, and you need to reconstruct the modified files back in the original sequence for the later ftp. if this is the case, i suggest you keep the sequence number intact (change your sort2 to OUTREC=(1,208)) and pass it around, but in the end simply resort everything by the sequence number and drop it during outrec.

also note that all the files may not always have data (eg. if the original has 27 recs, the outputs will have 10, 10, 7 and 0 records in file1, 2, 3, 4). hence your processing should handle empty files accordingly.

i hope this will help.
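A sketch of that final re-merge step (dataset names are placeholders; it assumes the 7-digit sequence number is still in columns 1-7, a blank in column 8, and 200 bytes of data from column 9):

```
//MERGE    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROCESSED.PIECE1,DISP=SHR
//         DD DSN=PROCESSED.PIECE2,DISP=SHR
//         DD DSN=PROCESSED.PIECE3,DISP=SHR
//         DD DSN=PROCESSED.PIECE4,DISP=SHR
//SORTOUT  DD DSN=FINAL.FOR.FTP,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,20))
//SYSIN    DD *
* RESTORE ORIGINAL ORDER BY SEQUENCE NUMBER, THEN DROP IT
  SORT FIELDS=(1,7,ZD,A)
  OUTREC FIELDS=(9,200)
/*
```

The pieces are simply concatenated under SORTIN; OUTREC is applied after the sort, so the sequence number orders the records and is then stripped before the FTP.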
ranga_subham

New User


Joined: 01 Jul 2005
Posts: 51

PostPosted: Fri Mar 10, 2006 6:37 pm

Ok. I will do as you said. Thanks for the details. All in all, it is very useful info. Thanks for sharing.
martin9

Active User


Joined: 01 Mar 2006
Posts: 290
Location: Basel, Switzerland

PostPosted: Fri Mar 10, 2006 7:03 pm

hi ranga,

do you think BLKSIZE=00000 will work?
please provide the error messages and the joblog...

martin9

ps: you wanted a dynamic solution?