Joined: 22 Jun 2007 Posts: 14 Location: South Africa
Hi, thought I would make my first post my solution to a rather difficult job I've had to do recently (thanks, Frank, for making sort so powerful). Any ideas/advice on how I could possibly do better would be greatly appreciated.
The problem:
We receive an unknown number of files from an external company and need to merge them into one file for another system that actually processes the data. What makes it trickier than just not knowing how many files to expect is that each file contains a header record with a value indicating which institution (bank) the file belongs to, and the files must be in bank order and then file sequence order for the system to process them (the files are numbered 1 -> 9999 in the filename). As an example:
File1: header institute 9
File2: header institute 5
File3: header institute 9
File4: header institute 9
File5: header institute 5
File6: header institute 9
Thus the correct sequence of the files for the processing system would be:
File2: header institute 5
File5: header institute 5
File1: header institute 9
File3: header institute 9
File4: header institute 9
File6: header institute 9
The solution:
My solution requires three JCL jobs: one to get the filenames, one to reorder the filenames using the value from each header record, and one to merge the files into one file.
JCL-1
Code:
//**********************************************************************
//* DO A LISTCAT OF THE FILES
//**********************************************************************
//IDCLCAT EXEC PGM=IDCAMS,REGION=1500K,ADDRSPC=VIRT
//OUTDD DD DSN=&&CAT,
// DISP=(NEW,CATLG),
// UNIT=SYSDA,SPACE=(TRK,(5,1)),
// RECFM=VBA,LRECL=125
//SYSPRINT DD SYSOUT=*
//AMSDUMP DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSIN DD *
  LISTCAT OFILE(OUTDD) LVL(DNGG00.CDPAKI)
//**********************************************************************
//* SORT THE CLIST TO EXTRACT THE FILENAMES WE WANT TO A MEMBER WE CAN
//* INCLUDE IN ANOTHER JCL
//* FILES WE WANT:
//* XXXXXX.XXXXXX.C4D*.**
//* EXCLUDE THE Z1 AND Z9 FILES:
//* XXXXXX.XXXXXX.C4D00Z1C.**
//* XXXXXX.XXXXXX.C4D00Z9C.**
//* FORMAT THE FILES AS JCL INPUT:
//* // DD DISP=SHR,DSN=XXXXXX.XXXXXX.C4D*.**
//**********************************************************************
//SORT01 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=&&CAT,DISP=SHR
//SORTOUT DD DSN=DNCV00.OPSUTIL0.CNTL(SAC4ALL),DISP=SHR
//SYSIN DD *
  OPTION VLSHRT
  SORT FIELDS=COPY
  OUTFIL INCLUDE=(36,3,CH,EQ,C'C4D',AND,
                  41,3,CH,NE,C'Z1C',AND,
                  41,3,CH,NE,C'Z9C'),
         CONVERT,REMOVECC,
         BUILD=(1C'//         DD DISP=SHR,DSN=',22,50,80:X)
//*
JCL-2
Code:
//PROCLIB JCLLIB ORDER=(QNCV00.OPSUTIL0.CNTL)
//**********************************************************************
//* 1. COPY THE LIST OF FILES (IN1) AND REMOVE THE FORMATTING FOR THE
//* JCL
//* 2. EXTRACT THE BANK NUMBER FROM THE HEADER RECORD OF EACH FILE (IN2)
//* 3. MERGE THE TWO VALUES TO GIVE US A LIST WE CAN USE
//* 4. SORT THE FILENAMES USING THE BANK NUMBER AS THE KEY AND THEN THE
//* FILE NUMBER. DROP THE BANK NUMBER AND REFORMAT THE RECORD TO USE
//* IN JCL
//**********************************************************************
//SORT01 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN1 DD DSN=QNCV00.OPSUTIL0.CNTL(SAC4ALL),DISP=SHR
//* INCLUDE THE TEMP FILE AS THE FORMAT OF THE INCLUDE IS NOT CORRECT
//IN2 DD DSN=&&TEMP1,DISP=(NEW,CATLG),UNIT=SYSDA,
// SPACE=(TRK,(1,1)),RECFM=FB,LRECL=127
// INCLUDE MEMBER=SAC4ALL
//TMP1 DD DSN=&&TEMP2,DISP=(MOD,PASS),UNIT=SYSDA
//TMP2 DD DSN=&&TEMP3,DISP=(MOD,PASS),UNIT=SYSDA
//OUT DD DSN=QNCV00.OPSUTIL0.CNTL(SAC4ALL),DISP=SHR
//TOOLIN DD *
* REFORMAT THE IN1 DATASET
COPY FROM(IN1) TO(TMP1) USING(CTL1)
* REFORMAT THE IN2 DATASET
COPY FROM(IN2) TO(TMP1) USING(CTL2)
* SPLICE RECORDS WITH MATCHING SEQUENCE NUMBERS
SPLICE FROM(TMP1) TO(TMP2) ON(60,8,PD) WITH(51,4) USING(CTL3)
* SORT THE RECORDS IN THE CORRECT SEQUENCE AND REMOVE THE VALUE FROM THE
* HEADER. BUILD RECORD TO USE IN JCL
SORT FROM(TMP2) TO(OUT) USING(CTL4)
/*
//CTL1CNTL DD *
* USE OUTREC TO CREATE: |F1FLD|BLANK|SEQNUM|
  OUTREC FIELDS=(1:28,50,51:4X,60:SEQNUM,8,PD)
/*
//CTL2CNTL DD *
* USE OUTREC TO CREATE: |BLANK|F2FLD|SEQNUM|
* ONLY INCLUDE THE HEADER RECORD
  INCLUDE COND=(1,2,CH,EQ,C'01')
  OUTREC FIELDS=(1:50X,51:68,4,60:SEQNUM,8,PD)
/*
//CTL3CNTL DD *
* MERGE TWO RECORDS AND KEEP VALUES WE WANT TO USE
  OUTFIL FNAMES=TMP2,OUTREC=(1,60,80:X)
/*
//CTL4CNTL DD *
* SORT ON BANK NUMBER, THEN FILENAME, AND REBUILD THE JCL DD LINE
  SORT FIELDS=(51,4,CH,A,1,50,CH,A)
  OUTFIL FNAMES=OUT,
         OUTREC=(1C'//         DD DISP=SHR,DSN=',1,50,80:X)
/*
//*
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
sclater,
You have the right idea, but you are using one too many passes to get the desired results. If I understand your requirement correctly, it can be done with fewer passes.
What is the LRECL and RECFM of the files involved? Is there a way to identify the header record? If so, what is it?
For example, assume that you have 3 files like this:
File1:
Code:
header institute 9
datarec1
datarec2
...
File2:
Code:
header institute 5
datarec3
datarec4
...
File3:
Code:
header institute 4
datareca
datarecb
...
and you want the output as
Code:
HEADER INSTITUTE 4
DATARECA
DATARECB
HEADER INSTITUTE 5
DATAREC3
DATAREC4
HEADER INSTITUTE 9
DATAREC1
DATAREC2
The following DFSORT ICETOOL job will give you the desired results
Joined: 22 Jun 2007 Posts: 14 Location: South Africa
Skolusu wrote:
sclater,
You have the right idea, but you are using one too many passes to get the desired results. If I understand your requirement correctly, it can be done with fewer passes.
That would be excellent!
Quote:
What is the LRECL and RECFM of the files involved? Is there a way to identify the header record? If so, what is it?
LRECL = 127
RECFM = FB
The header can be identified by "01" in positions 1 and 2 (there will always be exactly one header per file).
The key for the institute is 4 bytes long, starting in position 68.
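For illustration, a minimal sketch of how those two facts could be used to tag every record with its file's institute code in a single copy pass. This is not either of the jobs in this thread: it uses DFSORT's WHEN=GROUP function, and it assumes that all the input files are concatenated on SORTIN, that each file starts with its header, and that position 128 (just past the 127-byte record) is free for the tag.
Code:
//SYSIN    DD *
* START A NEW GROUP AT EACH '01' HEADER AND COPY ITS KEY (68,4)
* TO POSITION 128 OF EVERY RECORD IN THAT GROUP (ASSUMED POSITION)
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,2,CH,EQ,C'01'),
                PUSH=(128:68,4))
/*
With the tag in place, a later sort on 128,4 would keep each file's records together under its bank.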
This will generate the JCL needed to SORT all the files with the header indicator. Look at the SORTOUT output from step0300. Once you verify that you have created the JCL correctly, then change the following statement in step0300
Joined: 22 Jun 2007 Posts: 14 Location: South Africa
Hi Skolusu,
Thanks for the JCL, but something seems to be going wrong. It's putting data before the first header record. I can also see that the incoming files have a space-filled record separating every data record (don't ask me why); with your job there are sometimes two space records between data records and then, a couple of records later, none at all. It also seems to be changing the order of the data records within the file.
I have also seen an increase in the CPU time when running yours compared to mine (don't really know what the rest means).
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
Quote:
Thanks for the JCL, but something seems to be going wrong. It's putting data before the first header record.
sclater,
I looked at your sysout and it is not exactly the same job as I have given you. You have a NEW temp dataset concatenated with all the other input datasets which is not correct.
As for the wrong results, maybe you haven't explained the rules of the data properly. I assumed that your data is as follows: the header record has the key of 4 bytes in pos 68
Joined: 22 Jun 2007 Posts: 14 Location: South Africa
Skolusu wrote:
I looked at your sysout and it is not exactly the same job as I have given you. You have a NEW temp dataset concatenated with all the other input datasets which is not correct.
The problem is that the generated JCL complains about having no IN dataset; adding an empty dataset is the quickest way for me to get around this.
Skolusu wrote:
As for the wrong results, maybe you haven't explained the rules of the data properly. I assumed that your data is as follows: the header record has the key of 4 bytes in pos 68
You are 100% correct about how you describe and expect the data; even your example is correct. What is happening is that, because the second portion of the job is an actual sort and it uses only the 4-digit institution code as the key, the sort is reordering the records. By changing the build in the splice to add a sequence number and then changing the sort to also sort on this sequence number, I get the records in the correct sequence:
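A minimal sketch of that kind of tie-breaker, not the exact statements used in the job. It assumes (hypothetically) that each 127-byte record already carries its institute code at position 128; it adds an 8-byte sequence number at position 132 as a second sort key and then drops both work fields.
Code:
//SYSIN    DD *
* 128,4 = INSTITUTE CODE (ASSUMED POSITION), 132,8 = INPUT ORDER
  INREC OVERLAY=(132:SEQNUM,8,ZD)
  SORT FIELDS=(128,4,CH,A,132,8,ZD,A)
  OUTREC BUILD=(1,127)
/*
Because the sequence number is assigned before the sort, records with the same institute code come out in the order they were read.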
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
sclater wrote:
The problem is that the generated JCL complains about having no IN dataset; adding an empty dataset is the quickest way for me to get around this.
You probably forgot to add the concatenated DD after the dollar symbol. Copy my JCL as is and you should not have any problem.
sclater wrote:
What is happening is that, because the second portion of the job is an actual sort and it uses only the 4-digit institution code as the key, the sort is reordering the records. By changing the build in the splice to add a sequence number and then changing the sort to also sort on this sequence number, I get the records in the correct sequence:
AFAIK you don't need the additional sequence number. It looks like your shop has NOEQUALS as the default. Add an OPTION EQUALS line to your CTL2CNTL and rerun your job without the sequence number; you should get the same results.
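A minimal sketch of what that could look like, assuming (hypothetically) that the institute code sits at position 128 of each record; the real SORT statement is whatever CTL2CNTL already contains. EQUALS asks DFSORT to keep records with equal keys in their original input order.
Code:
//CTL2CNTL DD *
* KEY POSITION 128,4 IS AN ASSUMPTION FOR ILLUSTRATION ONLY
  OPTION EQUALS
  SORT FIELDS=(128,4,CH,A)
/*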
Joined: 22 Jun 2007 Posts: 14 Location: South Africa
Skolusu wrote:
sclater wrote:
The problem is that the generated JCL complains about having no IN dataset; adding an empty dataset is the quickest way for me to get around this.
You probably forgot to add the concatenated DD after the dollar symbol. Copy my JCL as is and you should not have any problem.
Hmm, something weird happened. I looked at the JCL and everything looked right; it should be generating the DD. I reran the first JCL and it generated the correct JCL. Granted, I was messing around with different solutions all writing to SAC4ALL, so I might have overwritten it in between running the first and the second JCL.
Skolusu wrote:
AFAIK you don't need the additional sequence number. It looks like your shop has NOEQUALS as the default. Add an OPTION EQUALS line to your CTL2CNTL and rerun your job without the sequence number; you should get the same results.
Aha, that solves the problem.
Thanks for the help. I have learned quite a number of new tricks; now if only the rest of my dev team could learn to use DFSORT for more than just sorting a file.