I am trying to merge records from one file into another. The first input file is small (no more than 1,000 records); the second is huge (up to 30 million records). Both files have the same attributes (LRECL=13667, RECFM=FB). I am using the step below to merge, but the job abends with reason SORTIN02 OUT OF SEQ. I know this is because the output file is the same dataset as SORTIN02, so its records are no longer in sorted order once the merge starts writing. The reason for reusing the SORTIN02 dataset as the output is to save DASD space; allocating a new file altogether may give a SPACE abend.
When I use a new file for SORTOUT, with 1.1 million records in Infile 2 and around 1,000 records in Infile 1, the job takes around 3 minutes to complete with high CPU. My problem is that with 30 million records (in production), the elapsed time and CPU will be much higher, roughly 30 times the run with sample data. Below is the complete sysout (with 1.1 million records in Infile 2):
Code:
MERGE FIELDS=(1,27,CH,A)
OPTION NOEQUALS
WER276B SYSDIAG= 493223, 1761317, 1761317, 4220850
WER164B 6,852K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 64K BYTES RESERVE REQUESTED, 3,296K BYTES USED
WER146B 64K BYTES OF EMERGENCY SPACE ALLOCATED
WER109I MERGE INPUT : TYPE=F; LRECL= 13667
WER110I SORTOUT : RECFM=FB ; LRECL= 13667; BLKSIZE= 27334
WER410B 5,824K BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 3,128K BYTES USED
WER209B 1,500 PRIMARY AND 3,000 SECONDARY SORTOUT TRACKS ALLOCATED, 3,264 USED
WER211B SYNCSMF CALLED BY SYNCSORT; RC=0000
WER449I SYNCSORT GLOBAL DSM SUBSYSTEM ACTIVE
WER416B SORTIN : EXCP'S=32369
WER416B SORTOUT : EXCP'S=32428,UNIT=3390,DEV=A08C,CHP=(C0C4C8CCD0D4D8DC,1),VO
WER416B TOTAL OF 64,797 EXCP'S ISSUED FOR MERGING
WER054I RCD IN 1165056, OUT 1165056
WER072I NOEQUALS, BALANCE IN EFFECT
WER169I RELEASE 1.3 BATCH 0506 TPF LEVEL 2.1
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
It is an extremely bad idea to use the same DSN for SORTOUT as one of the input datasets. In this particular case you have trashed your file.
If you are concerned about DASD use, the answer is going to depend on just how concerned.
If completely concerned, copy your input to "tape" and delete the input, then merge from the "tape" copy and from DASD for the small file.
If concerned, but a couple of hours of exposure is acceptable, back up to "tape" after the merge and outside the critical path, then delete the input. Write your JCL so that it can be run either from DASD directly, or by first restoring to DASD.
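A minimal sketch of the first approach, assuming illustrative dataset names and space figures (the SPACE and DCB values here echo the sysout above; adjust for production volumes):

Code:
//* Back the big file up to tape, delete it, then merge the tape
//* copy with the small DASD file into a freshly allocated dataset.
//COPY     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DSN=MY.BIG.FILE,DISP=SHR
//SYSUT2   DD DSN=MY.BIG.BACKUP,DISP=(NEW,CATLG,DELETE),
//            UNIT=TAPE,DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD DUMMY
//DEL      EXEC PGM=IEFBR14
//DD1      DD DSN=MY.BIG.FILE,DISP=(OLD,DELETE,DELETE)
//MERGE    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SMALL.FILE,DISP=SHR
//SORTIN02 DD DSN=MY.BIG.BACKUP,DISP=OLD
//SORTOUT  DD DSN=MY.BIG.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(1500,3000),RLSE),
//            DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD *
  MERGE FIELDS=(1,27,CH,A)
  OPTION NOEQUALS
/*

The merge never reads and writes the same dataset, so the OUT OF SEQ abend cannot recur, and the DASD footprint is only ever one copy of the big file plus the new output.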
I realized that using the same DSN in SORTOUT is a bad idea when I got the OUT OF SEQ error message. So I used a different file name in SORTOUT. But my concern now is the CPU time and the elapsed time that the Merge is taking.
Is there any way to optimize the merge? Would some parameter help?
Well, you have OPTION NOEQUALS, which is good, as long as you don't mind which order the data is taken from the individual input files when keys are equal.
A MERGE should be pretty zippy. Have a look in your manual for information on performance tuning, but I don't think you'll find a magic bullet. You have a lot of data. It is going to take the time it is going to take.
I don't know about SyncSort, but DFSORT ignores any BUFNO you specify. You could experiment.
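For the experiment, BUFNO is coded in the DCB parameter of the input DD statement, along these lines (dataset name and buffer count are placeholders, and as noted, the product may simply ignore it):

Code:
//SORTIN02 DD DSN=MY.BIG.FILE,DISP=SHR,DCB=BUFNO=50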
All the data is available online (on DASD). When I just read the files (no merging), Infile 1 (around 1,500 records) takes 2 secs and Infile 2 (1.1 million records) takes around 58 secs. The merge, however, takes 3 minutes.
I updated my JCL and the job now runs within seconds. I am getting the expected output. But is this the right way? Will I get unexpected results with a different set of inputs?
I just realized that with that change, the records are appended at the end of the file rather than appearing in sorted order. So my purpose is not served.
Yes, DISP=MOD with those control cards is going to give you nothing except the data from the small file, in sorted order, appended to the original file.
To add to the lack of benefit, they were already in sorted order, so you've even expended pointless resources in getting the wrong result.
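The updated JCL was not posted, but from the description it presumably sorted the small file alone with the big file as a DISP=MOD output, something like this (dataset names are placeholders):

Code:
//* With DISP=MOD, the output dataset is opened for extend, so the
//* step's output (here, just the sorted small file) is written
//* after the last existing record. Nothing is interleaved with the
//* big file's records, hence the append.
//SORTSML  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.SMALL.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.BIG.FILE,DISP=MOD
//SYSIN    DD *
  SORT FIELDS=(1,27,CH,A)
/*

That also explains why it "runs within seconds": only the 1,000-odd small-file records are being processed, not the 1.1 million.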
MERGE is fast. You already have OPTION NOEQUALS. Perhaps contact SyncSort support and see if there is anything they can suggest?
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote:
How long does it take to simply read both files - no merge?
Quote:
All the data is available online (on DASD). When I just read the files (no merging), Infile 1 (around 1,500 records) takes 2 secs and Infile 2 (1.1 million records) takes around 58 secs. The merge, however, takes 3 minutes.
Next, please run these 2 tests copying the data, not just reading it.
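The copy test can be done with the sort product itself via OPTION COPY, which times the read-plus-write path without any merge logic (dataset names and space figures are illustrative; DCB values echo the sysout above):

Code:
//COPYTST  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.BIG.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.COPY.TEST,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(1500,3000),RLSE),
//            DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD *
  OPTION COPY
/*

Comparing this step's elapsed time and EXCPs with the merge step's shows how much of the 3 minutes is plain I/O and how much is attributable to the merge itself.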
There's no getting around it: you must delete the previous version and then allocate a new one, or your results will be wrong.