Hi,
We need to process a production file that has ~95 million records with LRECL=365. Processing involves omitting a few record types (after the omit we expect ~92 million records), then splitting the records into 2 files based on a split condition and reformatting them to a 61-byte layout (output LRECL is 61).
We wrote a sort card to accomplish this task, but it failed after processing ~60 million records with a message that sort capacity was exceeded. We had already allocated 32 work datasets; after adding 10 more the job ran successfully, but not consistently.
We then tried using the sort only for the OMIT condition and the reformatting, with a COBOL program doing the split and writing the 2 output files. This time the job consistently ran fine, and both CPU time and I/O service units dropped noticeably.
After some research we concluded that our initial sort card gave superior results with a small file, while the COBOL program gave better results with a large file.
Is our inference correct? Is a COBOL program always better than the sort utility for reformatting and split logic?
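(For context, sort work space is normally supplied either as explicit SORTWKnn DD statements or via DFSORT's dynamic allocation. A sketch only; the unit name and space figures below are placeholders, not values from this job:)

  //* explicit work datasets -- sizes here are illustrative placeholders
  //SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,100))
  //SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500,100))
  //* ...and so on, or let DFSORT allocate work space itself by
  //* coding this in SYSIN or DFSPARM:
    OPTION DYNALLOC=(SYSDA,32)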
probably meant the SORTWKnn statements or DD statements.
the JCL used in the step is suspect;
without seeing all of it, including the control cards,
it will be difficult to provide any sort of help.
if the work can be done with sort control cards,
then the reason the COBOL runs faster
is that something in the sort step (control cards or SORTWK allocation)
is not properly assigned.
I can't see a COBOL sort running faster than a DFSORT step.
SYSOUT messages would also help, for both the DFSORT and the COBOL sort.
there is just not enough information being provided;
observations alone are of no help.
dbzTHEdinosauer,
I knew he meant something else. I was just trying to add a little humour to the post. I really hope he doesn't take it offensively, because I can't even delete it now.
sqlcode1,
ok, I jumped too fast and made an assumption.
having followed your posts for several months,
I should have realized your intent.
But I elaborated for the benefit of the TS,
who I believe is making incorrect assumptions based on not having all the facts.
But, he has asked in the proper forum,
and if Frank or Kolusu has the patience to wade thru all the irrelevant garbage that we have left, maybe he will provide an answer to the TS and us.
I think Ronald is a little more rational than I and will not take offense.
Sorry, I posted before having my second cup of Java ( Coffee, that is, not code ). I am chagrined, but I'm not offended.
I'm "guessing" from what I read that in the first scenario, a SORT was being performed as well as the OMITs, reformatting, and splitting - hence the SORT CAPACITY EXCEEDED - whereas in the second scenario no SORT was being performed, just a COPY with OMITs and reformatting.
Furthermore, I'm "guessing" that the reformatting in either scenario is being done during OUTREC processing, not INREC processing - thus forcing the SORT (in scenario one) to carry the entire input record thru the sort process, rather than the (smaller, 61-byte) reformatted record.
But, I would need to see the SYSIN ( not SORTIN ) records from both runs in order to know whether I am "guessing" correctly.
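To illustrate the difference (a sketch only, since the actual SYSIN is not posted here; the sort key and the BUILD fields are placeholders):

  * reformatting in OUTREC: the sort carries the full 365-byte record
  * and only cuts it down to 61 bytes on the way out
    SORT FIELDS=(1,10,CH,A)
    OUTREC BUILD=(1,39,47,1,53,1,152,3,17X)

  * reformatting in INREC: the record is cut to 61 bytes BEFORE the
  * sort, so the sort process handles roughly a sixth of the data;
  * note the sort key must then be given in post-INREC positions
    INREC BUILD=(1,39,47,1,53,1,152,3,17X)
    SORT FIELDS=(1,10,CH,A)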
RECORDS - IN: 93183341, OUT: 93166300
RET1 : DELETED = 17628761, REPORT = 0, DATA = 75537539
RET1 : TOTAL IN = 93166300, TOTAL OUT = 75537539
COM1 : DELETED = 41695313, REPORT = 0, DATA = 51470987
COM1 : TOTAL IN = 93166300, TOTAL OUT = 51470987
JOB START: 08/17/10 02:03:54   JOB END: 08/17/10 02:40:53   ELAPSED TIME: 00:36:59
CPU SERVICE UNITS : 64,427,861
I/O SERVICE UNITS :  2,720,547
ALL SERVICE UNITS : 70,026,527
It is as I suspected: reformatting is occurring during OUTREC rather than INREC processing.
But there is also a glaring error in your INCLUDE logic: no matter WHAT the value is in position 152, the condition will be satisfied, because all of the tests are for NE and they are connected by ORs. Logically, if the value is 'TRN' then it is NE 'STL'; if it is 'STL' then it is NE 'TRN'. So no matter what value is there, at least one test is true, and the OR'ed condition with it. In the code posted below, I have taken the liberty to change the connectors to ANDs rather than ORs.
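To make the point concrete (position 152 and the 'TRN'/'STL' literals are quoted from the thread, but the full original card is not shown, so this is a sketch):

  * always true: any value is NE at least one of the two literals
    INCLUDE COND=(152,3,CH,NE,C'TRN',OR,152,3,CH,NE,C'STL')
  * corrected: keeps only records that match neither literal
    INCLUDE COND=(152,3,CH,NE,C'TRN',AND,152,3,CH,NE,C'STL')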
Besides that, since you are also applying INCLUDE logic during OUTFIL processing, it makes no sense to select MORE records during the pre-sort INCLUDE than will be passed during the OUTFIL INCLUDEs. The pre-sort INCLUDE should select just the set of records that will be passed during OUTFIL processing.
Then again, the way your OUTFIL INCLUDEs are coded, it would appear that 'BCD' records will be written to BOTH output files. Is that what you wanted? I did NOT change that logic.
Frank or Kolusu may come up with a better solution, but I would suggest the following ( a combined sketch follows the list ):
By changing the INCLUDE logic slightly, you will INCLUDE ONLY those records that will also be INCLUDEd during OUTFIL processing, and will select 'BCD' records with fewer tests.
By doing the reformat during INREC processing, you will considerably reduce the amount of data being handled by the sort process.
By combining fields ( i.e. coding 1,39 instead of 1,4,5,4,9,4,13,18,31,9 ) you will reduce the number of data moves required during the reformatting process ( unless the DFSORT developers put in code to test for, and consolidate, same ).
Obviously, since reformatting is taking place during INREC, the INCLUDE offsets during OUTFIL processing have to be changed to reflect their new locations.
Since, during reformatting, we moved positions 47 and 53 next to each other, a single 2-byte test can be used during OUTFIL INCLUDE processing rather than two 1-byte tests connected by AND logic.
By reformatting during INREC processing, the BUILD during OUTFIL processing is simplified to output the entire (reformatted) record.
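Putting those suggestions together, the revised card might look something like this (a sketch only: the actual card from this thread is not reproduced here, and the sort key, the 'Y' flag values, and the 17X padding to LRECL 61 are assumptions):

  * keep only what the OUTFILs will pass; NE tests joined by AND
  * (sketched with the NE form from above -- the real card would
  * mirror the OUTFIL tests)
    INCLUDE COND=(152,3,CH,NE,C'TRN',AND,152,3,CH,NE,C'STL')
  * reformat to 61 bytes before the sort: contiguous fields coded
  * as one move (1,39), bytes 47 and 53 placed side by side
    INREC BUILD=(1,39,47,1,53,1,152,3,17X)
  * placeholder key, given in post-INREC positions
    SORT FIELDS=(1,10,CH,A)
  * split on the relocated bytes: old 47 is now 40, old 53 is now 41,
  * so one 2-byte compare replaces two ANDed 1-byte tests; 'BCD'
  * (old 152, now 42) still goes to BOTH files, as in the original
    OUTFIL FNAMES=RET1,INCLUDE=(40,2,CH,EQ,C'RY',OR,42,3,CH,EQ,C'BCD')
    OUTFIL FNAMES=COM1,INCLUDE=(40,2,CH,EQ,C'CY',OR,42,3,CH,EQ,C'BCD')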
Thanks for the detailed analysis. We were so dumb to miss that error in the negative ( NE ) checks.
"By doing the reformat during INREC processing, you will considerably reduce the amount of data being handled by the sort process." - A valuable suggestion; it reduced the CPU time by 5 minutes.
We also noticed that the fewer the conditions in the INCLUDE, the less the CPU time, so we optimised our query as below. This is not a conclusion, just our observation. So we removed all the duplicate checks.
I'm glad that my response was helpful.
However, I am still curious as to whether you really intend to write ALL of the 'BCD' records to BOTH output files regardless of whether they have an 'R' or a 'C' in position 47 of the input file.