Hi,
We need to process a production file that has ~95 million records with LRECL=365. Processing involves omitting a few record types (after the omit we expect ~92 million records), then splitting the records into 2 files based on a split condition and reformatting them to a 61-byte layout (output LRECL is 61).
We wrote a sort card to accomplish this task, but it failed after processing ~60 million records with a message that sort capacity was exceeded. We had already allocated 32 work datasets; after adding 10 more the job ran successfully, but not consistently.
We then tried using the sort only for the OMIT condition and the reformatting, with a COBOL program doing the split and writing the 2 output files. This time the job consistently ran fine, and both CPU time and I/O service units dropped noticeably.
After some research we concluded that our initial sort card gave superior results with a small file, while the COBOL program gave better results with a large file.
Is our inference correct? Is a COBOL program always better than the sort utility for reformatting and split logic?
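(For context, sort work space is normally supplied either as explicit SORTWKnn DD statements or via DFSORT's dynamic allocation. A sketch only; the unit name and space figures below are placeholders, not values from this job:)

  //* explicit work datasets -- sizes here are illustrative placeholders
  //SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,100))
  //SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500,100))
  //* ...and so on, or let DFSORT allocate work space itself by
  //* coding this in SYSIN or DFSPARM:
    OPTION DYNALLOC=(SYSDA,32)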
probably meant the SORTWKnn statements or DD statements.
the JCL used in the step is suspect;
without seeing all of it, including the control cards,
it will be difficult to provide any sort of help.
if the work can be done with sort control cards,
then the reason the COBOL runs faster
is that something in the sort step (control cards or SORTWK allocation)
is not properly assigned.
I can't see a COBOL sort running faster than a DFSORT step.
SYSOUT messages would also help, for both the DFSORT and the COBOL sort.
there is just not enough information being provided;
observations alone are of no help.
dbzTHEdinosauer,
I knew he meant something else. I was just trying to add a little humour to the post. I really hope he doesn't take it offensively, because I can't even delete it now.
sqlcode1,
ok, I jumped too fast and made an assumption.
having followed your posts for several months,
I should have realized your intent.
But I elaborated for the benefit of the TS,
who I believe is making incorrect assumptions based on not having all the facts.
But, he has asked in the proper forum,
and if Frank or Kolusu has the patience to wade thru all the irrelevant garbage that we have left, maybe he will provide an answer to the TS and us.
I think Ronald is a little more rational than I and will not take offense.
Sorry, I posted before having my second cup of Java ( Coffee, that is, not code ). I am chagrined, but I'm not offended.
I'm "guessing" from what I read that in the first scenario, a SORT was being performed as well as the OMITs, reformatting, and splitting - hence the SORT CAPACITY EXCEEDED - whereas in the second scenario no SORT was being performed, just a COPY with OMITs and reformatting.
Furthermore, I'm "guessing" that the reformatting in either scenario is being done during OUTREC processing, not INREC processing - thus forcing the SORT (in scenario one) to carry the entire input record thru the sort process, rather than the (smaller, 61-byte) reformatted record.
But, I would need to see the SYSIN ( not SORTIN ) records from both runs in order to know whether I am "guessing" correctly.
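To illustrate the difference (a sketch only, since the actual SYSIN is not posted here; the sort key and the BUILD fields are placeholders):

  * reformatting in OUTREC: the sort carries the full 365-byte record
  * and only cuts it down to 61 bytes on the way out
    SORT FIELDS=(1,10,CH,A)
    OUTREC BUILD=(1,39,47,1,53,1,152,3,17X)

  * reformatting in INREC: the record is cut to 61 bytes BEFORE the
  * sort, so the sort process handles roughly a sixth of the data;
  * note the sort key must then be given in post-INREC positions
    INREC BUILD=(1,39,47,1,53,1,152,3,17X)
    SORT FIELDS=(1,10,CH,A)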
RECORDS - IN: 93183341, OUT: 93166300
RET1 : DELETED = 17628761, REPORT = 0, DATA = 75537539
RET1 : TOTAL IN = 93166300, TOTAL OUT = 75537539
COM1 : DELETED = 41695313, REPORT = 0, DATA = 51470987
COM1 : TOTAL IN = 93166300, TOTAL OUT = 51470987
JOB START: 08/17/10 02:03:54   JOB END: 08/17/10 02:40:53   ELAPSED TIME: 00:36:59
CPU SERVICE UNITS : 64,427,861
I/O SERVICE UNITS :  2,720,547
ALL SERVICE UNITS : 70,026,527
It is as I suspected: reformatting is occurring during OUTREC rather than INREC processing.
But there is also a glaring error in your INCLUDE logic: no matter WHAT the value is in position 152, the condition will be satisfied, because all of the tests are for NE and they are connected by ORs. Logically, if the value is 'TRN' then it is NE 'STL'; if it is 'STL' then it is NE 'TRN'. So no matter what value is there, at least one test is true, and the OR'ed condition with it. In the code posted below, I have taken the liberty to change the connectors to ANDs rather than ORs.
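To make the point concrete (position 152 and the 'TRN'/'STL' literals are quoted from the thread, but the full original card is not shown, so this is a sketch):

  * always true: any value is NE at least one of the two literals
    INCLUDE COND=(152,3,CH,NE,C'TRN',OR,152,3,CH,NE,C'STL')
  * corrected: keeps only records that match neither literal
    INCLUDE COND=(152,3,CH,NE,C'TRN',AND,152,3,CH,NE,C'STL')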
Besides that, since you are also applying INCLUDE logic during OUTFIL processing, it makes no sense to select MORE records during the pre-sort INCLUDE than will be passed during the OUTFIL INCLUDEs. The pre-sort INCLUDE should select just the set of records that will be passed during OUTFIL processing.
Then again, the way your OUTFIL INCLUDEs are coded, it would appear that 'BCD' records will be written to BOTH output files. Is that what you wanted? I did NOT change that logic.
Frank or Kolusu may come up with a better solution, but I would suggest the following ( a combined sketch follows the list ):
By changing the INCLUDE logic slightly, you will INCLUDE ONLY those records that will also be INCLUDEd during OUTFIL processing, and will select 'BCD' records with fewer tests.
By doing the reformat during INREC processing, you will considerably reduce the amount of data being handled by the sort process.
By combining fields ( i.e. coding 1,39 instead of 1,4,5,4,9,4,13,18,31,9 ) you will reduce the number of data moves required during the reformatting process ( unless the DFSORT developers put in code to test for, and consolidate, same ).
Obviously, since reformatting is taking place during INREC, the INCLUDE offsets during OUTFIL processing have to be changed to reflect their new locations.
Since, during reformatting, we moved positions 47 and 53 next to each other, a single 2-byte test can be used during OUTFIL INCLUDE processing rather than two 1-byte tests connected by AND logic.
By reformatting during INREC processing, the BUILD during OUTFIL processing is simplified to output the entire (reformatted) record.
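Putting those suggestions together, the revised card might look something like this (a sketch only: the actual card from this thread is not reproduced here, and the sort key, the 'Y' flag values, and the 17X padding to LRECL 61 are assumptions):

  * keep only what the OUTFILs will pass; NE tests joined by AND
  * (sketched with the NE form from above -- the real card would
  * mirror the OUTFIL tests)
    INCLUDE COND=(152,3,CH,NE,C'TRN',AND,152,3,CH,NE,C'STL')
  * reformat to 61 bytes before the sort: contiguous fields coded
  * as one move (1,39), bytes 47 and 53 placed side by side
    INREC BUILD=(1,39,47,1,53,1,152,3,17X)
  * placeholder key, given in post-INREC positions
    SORT FIELDS=(1,10,CH,A)
  * split on the relocated bytes: old 47 is now 40, old 53 is now 41,
  * so one 2-byte compare replaces two ANDed 1-byte tests; 'BCD'
  * (old 152, now 42) still goes to BOTH files, as in the original
    OUTFIL FNAMES=RET1,INCLUDE=(40,2,CH,EQ,C'RY',OR,42,3,CH,EQ,C'BCD')
    OUTFIL FNAMES=COM1,INCLUDE=(40,2,CH,EQ,C'CY',OR,42,3,CH,EQ,C'BCD')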
Thanks for the detailed analysis. We were so dumb to miss that error in the negative ( NE ) checks.
"By doing the reformat during INREC processing, you will considerably reduce the amount of data being handled by the sort process." - A valuable suggestion; it reduced the CPU time by 5 minutes.
We also noticed that the fewer the conditions in the INCLUDE, the less the CPU time, so we optimised our query as below. This is not a conclusion, just our observation. So we removed all the duplicate checks.
I'm glad that my response was helpful.
However, I am still curious as to whether you really intend to write ALL of the 'BCD' records to BOTH output files regardless of whether they have an 'R' or a 'C' in position 47 of the input file.