Very Large Sort failed - Seeking recommendations

jmitchell · Posted: Thu Jan 21, 2021 5:26 am

Hello all y'all,

I was just assigned this problem. We have a job that requires sorting a very large file. The input file contains a little over 350 million 822-byte records. We are using DFSORT V2R4 in a z/OS 02.04.00 environment.

Our JCL is set up with SORTIN coming from TAPE3480 and SORTOUT going to TAPE3480. We have SORTWK01 - SORTWK92 allocated to SORTDA, SPACE=(CYL,(500,500),RLSE). Last week, production abended with Sort Capacity Exceeded at the 99% completion level. We knew it would eventually happen. Production Control added SORTWK93 - SORTWK99 to get through the problem. The successful sort took just over 6 hours wall clock and 9 minutes CPU time to complete.
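
For a sense of scale, the figures above can be turned into a rough work-space estimate. This is a sketch only. It assumes exactly 350 million records, 3390-geometry work volumes (15 tracks per cylinder, 56,664 bytes per track), and no allowance for blocking overhead or for any extra work space DFSORT may need beyond the input size. None of those assumptions come from the post.

```latex
\begin{align*}
\text{input size} &= 350{,}000{,}000 \times 822\ \text{bytes} = 287{,}700{,}000{,}000\ \text{bytes} \approx 287.7\ \text{GB} \\
\text{one 3390 cylinder} &= 15 \times 56{,}664\ \text{bytes} = 849{,}960\ \text{bytes} \\
\text{input in cylinders} &\approx 287{,}700{,}000{,}000 / 849{,}960 \approx 338{,}500 \\
\text{per work data set (92)} &\approx 338{,}500 / 92 \approx 3{,}680\ \text{cylinders} \\
\text{per work data set (99)} &\approx 338{,}500 / 99 \approx 3{,}420\ \text{cylinders}
\end{align*}
```

On these assumptions, with a 500-cylinder primary allocation each SORTWK has to obtain roughly six or seven 500-cylinder secondary extents, so the sort depends heavily on secondary space being available on the work volumes.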

Also, the sort card looks like this: SORT FIELDS=(1,8,A,9,8,A),FORMAT=BI.

Some of my questions are:

1- Does having 2 adjacent fields as separate Sort Fields make it less efficient? Should I change it to (1,16,A),FORMAT=BI? Is the BI format helping?

2- I know I can increase the number of Sort Works to 255, using names like SORTWKA1 - SORTWKZ9. How many should I add?

3- If I add more Sort Works, should I reduce the primary allocation?

4- Should I split the file into smaller sub pieces, such as 10% of the input file, do a sort on each file, then merge them all together?

5- Should I reduce the Sort Works to 32 and have them go to TAPE3480?

6- Other suggestions?
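
For question 1, the two forms being compared are sketched below. Both fields are unsigned binary (BI), adjacent, and ascending, so a single 16-byte field compares the same bytes in the same order and the output order is identical. Whether the single-field form is measurably faster at this volume is not something the thread establishes.

```jcl
* CURRENT FORM: TWO ADJACENT 8-BYTE BINARY KEYS
  SORT FIELDS=(1,8,A,9,8,A),FORMAT=BI
* EQUIVALENT SINGLE 16-BYTE BINARY KEY (USE ONE FORM, NOT BOTH)
  SORT FIELDS=(1,16,BI,A)
```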

Thanks in advance for sharing your wisdom!

dneufarth · Posted: Thu Jan 21, 2021 6:41 am

It's been a while for me, but this may help:

www.ibm.com/search?lang=en&cc=us&q=Sorting%20huge%20files%20dfsort

Joerg.Findeisen · Posted: Thu Jan 21, 2021 11:18 am

Remove the static SORTWK DDs from the JCL and use DYNALLOC=(SYSALLDA,<n>) instead. Provide extra, separate storage for the SORTWK datasets, since they are a special kind of dataset. Make use of DYNAPCT=<n>, and check whether you are able to use the recently introduced ZSORT option.
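
A minimal sketch of what such a step might look like follows. The dataset names, the work dataset count of 32, and the DYNAPCT value of 20 are placeholders chosen for illustration, not recommendations; only SYSALLDA and the sort card come from the thread. ZSORT is left out because its availability depends on the hardware and maintenance level.

```jcl
//* SKETCH ONLY: DSNS AND NUMERIC VALUES ARE PLACEHOLDERS
//BIGSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HYPO.INPUT.FILE,DISP=OLD
//SORTOUT  DD DSN=HYPO.SORTED.FILE,DISP=(NEW,CATLG),UNIT=TAPE3480
//* NO SORTWKNN DD STATEMENTS: DFSORT ALLOCATES WORK DATA SETS ITSELF
//DFSPARM  DD *
  OPTION DYNALLOC=(SYSALLDA,32),DYNAPCT=20
/*
//SYSIN    DD *
  SORT FIELDS=(1,8,A,9,8,A),FORMAT=BI
/*
```

The exact meaning and limits of DYNALLOC and DYNAPCT should be checked against the DFSORT Application Programming Guide for the installed release before picking real values.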

Rohit Umarjikar · Posted: Thu Jan 21, 2021 5:57 pm

Option 4 is the long-term solution. Volume may increase year over year, and you would keep getting these abends.
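
As a rough sketch of the final step of option 4, the separately sorted pieces could be combined with a MERGE. Only two pieces are shown, and the dataset names are placeholders. A merge does not use SORTWK datasets, but every SORTINnn input must already be in sequence on the same key.

```jcl
//* SKETCH ONLY: TWO PIECES SHOWN, DSNS ARE PLACEHOLDERS
//* EACH SORTINNN INPUT MUST ALREADY BE SORTED ON THE SAME KEY
//MERGEIT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=HYPO.SORTED.PIECE01,DISP=OLD
//SORTIN02 DD DSN=HYPO.SORTED.PIECE02,DISP=OLD
//SORTOUT  DD DSN=HYPO.SORTED.FILE,DISP=(NEW,CATLG),UNIT=TAPE3480
//SYSIN    DD *
  MERGE FIELDS=(1,8,A,9,8,A),FORMAT=BI
/*
```

If the sorted pieces are kept on tape, each SORTINnn needs its own drive for the duration of the merge, which may limit how many pieces are practical.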

Also, try providing a higher REGION on this step, and talk to your storage team to see whether they have STORCLAS options to use instead of these SORTWK datasets.

Reading this might help further:
www.ibm.com/support/pages/system/files/inline-files/$FILE/SORTASKP.pdf

Joerg.Findeisen · Posted: Thu Jan 21, 2021 6:15 pm

STORCLAS won't do anything. Try what I suggested: a separate SGRP for those DSNs.

Rohit Umarjikar · Posted: Thu Jan 21, 2021 7:35 pm

I still prefer the option 4 solution; it is the foolproof, long-term way.
We used STORCLAS=COMPRESS when we had this type of issue, but I don't recall what else was changed along with it. I don't object to trying DYNALLOC=(SYSALLDA,<n>).

Joerg.Findeisen · Posted: Fri Jan 22, 2021 12:03 pm

Skip the FILSZ parm; DFSORT knows much more about the data through its own mechanisms. The //SORTDIAG DD should have SYSOUT=* to provide data for analysis.

Pete Wilson · Posted: Fri Jan 22, 2021 10:43 pm

If you must use SORTWKs (or any system temp file), they cannot be compressed or extended format. They work better as Large Format files, which can grow to any size on the volumes within their maximum of 16 extents.
(DSNTYPE=LARGE in the JCL)
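
A sketch of what that could look like, reusing the UNIT and SPACE values from the original post (only two of the work DDs are shown):

```jcl
//* SKETCH ONLY: UNIT AND SPACE VALUES COPIED FROM THE ORIGINAL POST
//SORTWK01 DD UNIT=SORTDA,SPACE=(CYL,(500,500),RLSE),DSNTYPE=LARGE
//SORTWK02 DD UNIT=SORTDA,SPACE=(CYL,(500,500),RLSE),DSNTYPE=LARGE
```

The reason this matters here: a basic format sequential dataset is limited to 65,535 tracks (4,369 cylinders) per volume, while 16 extents of 500 cylinders would come to 8,000 cylinders. Assuming the existing SORTWKs are basic format, which the post does not say, they could never use all 16 extents at that size.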

I have found the below works almost every time for very large sorts (e.g. >9 billion records in one case) and without any SORTWK DD's specified. DYNALLOC and REGION can be adjusted according to available resources.

//*
//* FOR REALLY LARGE FILES THE BELOW EXEC PARMS ARE OPTIMAL
//*
//* EXEC PGM=SORT,PARM='DYNALLOC=(,24),DSPSIZE=MAX',REGION=512M
//*
//* NOTES:
//*
//* 1. THE DYNALLOC VALUE SHOULD NOT EXCEED HALF THE NUMBER OF
//* WORKPOOL VOLUMES.
//*
//* 2. REGION SHOULD HAVE A LARGE VALUE TO ALLOW MORE IN-STORAGE
//* CONTROL INFORMATION USED BY THE SORT.
//*
//* 3. DSPSIZE=MAX ALLOCATES THE LARGEST POSSIBLE DATA-SPACE
//*