I am trying to merge records from one file into another. The first input file is small (no more than 1,000 records); the second is huge (up to 30 million records). Both files have the same attributes (LRECL=13667, RECFM=FB). I am using the step below to merge, but the job abends with reason SORTIN02 OUT OF SEQ. I know this is because the output file is the same dataset as SORTIN02, so its records are no longer in sorted order once the merge starts writing. The reason for reusing the SORTIN02 dataset as the output is to save DASD space; allocating a new file altogether may give a SPACE abend.
When I use a new file for SORTOUT, with 1.1 million records in Infile 2 and around 1,000 records in Infile 1, the job takes around 3 minutes to complete with high CPU. My problem is that with 30 million records (in production), the elapsed time and CPU will be much higher, roughly 30 times the run with sample data. Below is the complete sysout (with 1.1 million records in Infile 2):
Code:
MERGE FIELDS=(1,27,CH,A)
OPTION NOEQUALS
WER276B SYSDIAG= 493223, 1761317, 1761317, 4220850
WER164B 6,852K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 64K BYTES RESERVE REQUESTED, 3,296K BYTES USED
WER146B 64K BYTES OF EMERGENCY SPACE ALLOCATED
WER109I MERGE INPUT : TYPE=F; LRECL= 13667
WER110I SORTOUT : RECFM=FB ; LRECL= 13667; BLKSIZE= 27334
WER410B 5,824K BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 3,128K BYTES USED
WER209B 1,500 PRIMARY AND 3,000 SECONDARY SORTOUT TRACKS ALLOCATED, 3,264 USED
WER211B SYNCSMF CALLED BY SYNCSORT; RC=0000
WER449I SYNCSORT GLOBAL DSM SUBSYSTEM ACTIVE
WER416B SORTIN : EXCP'S=32369
WER416B SORTOUT : EXCP'S=32428,UNIT=3390,DEV=A08C,CHP=(C0C4C8CCD0D4D8DC,1),VO
WER416B TOTAL OF 64,797 EXCP'S ISSUED FOR MERGING
WER054I RCD IN 1165056, OUT 1165056
WER072I NOEQUALS, BALANCE IN EFFECT
WER169I RELEASE 1.3 BATCH 0506 TPF LEVEL 2.1
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
It is an extremely bad idea to use the same DSN for SORTOUT as one of the input datasets. In this particular case you have trashed your file.
If you are concerned about DASD use, the answer is going to depend on just how concerned.
If completely concerned, copy your input to "tape" and delete the input, then merge from the "tape" copy and from DASD for the small file.
If concerned, but a couple of hours of exposure is acceptable, back up to "tape" after the merge and outside the critical path, then delete the input. Write your JCL so that it can be run either from DASD directly, or by first restoring to DASD.
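A minimal sketch of the first approach, assuming illustrative dataset names and space figures (the SPACE and DCB values here echo the sysout above; adjust for production volumes):

Code:
//* Back the big file up to tape, delete it, then merge the tape
//* copy with the small DASD file into a freshly allocated dataset.
//COPY     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DSN=MY.BIG.FILE,DISP=SHR
//SYSUT2   DD DSN=MY.BIG.BACKUP,DISP=(NEW,CATLG,DELETE),
//            UNIT=TAPE,DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD DUMMY
//DEL      EXEC PGM=IEFBR14
//DD1      DD DSN=MY.BIG.FILE,DISP=(OLD,DELETE,DELETE)
//MERGE    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SMALL.FILE,DISP=SHR
//SORTIN02 DD DSN=MY.BIG.BACKUP,DISP=OLD
//SORTOUT  DD DSN=MY.BIG.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(1500,3000),RLSE),
//            DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD *
  MERGE FIELDS=(1,27,CH,A)
  OPTION NOEQUALS
/*

The merge never reads and writes the same dataset, so the OUT OF SEQ abend cannot recur, and the DASD footprint is only ever one copy of the big file plus the new output.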
I realized that using the same DSN in SORTOUT is a bad idea when I got the OUT OF SEQ error message. So I used a different file name in SORTOUT. But my concern now is the CPU time and the elapsed time that the Merge is taking.
Is there any way to optimize the merge? Would some parameter help?
Well, you have OPTION NOEQUALS, which is good, as long as you don't mind which order the data is taken from the individual input files when keys are equal.
A MERGE should be pretty zippy. Have a look in your manual for information on performance tuning, but I don't think you'll find a magic bullet. You have a lot of data. It is going to take the time it is going to take.
I don't know about SyncSort, but DFSORT ignores any BUFNO you specify. You could experiment.
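For the experiment, BUFNO is coded in the DCB parameter of the input DD statement, along these lines (dataset name and buffer count are placeholders, and as noted, the product may simply ignore it):

Code:
//SORTIN02 DD DSN=MY.BIG.FILE,DISP=SHR,DCB=BUFNO=50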
All the data is available online (on DASD). When I just read the files (no merging), Infile 1 (around 1,500 records) takes 2 secs and Infile 2 (1.1 million records) takes around 58 secs. The merge, however, takes 3 minutes.
I updated my JCL and the job now runs within seconds. I am getting the expected output. But is this the right way? Will I get unexpected results with a different set of inputs?
I just realized that with that change, the records are appended at the end of the file rather than appearing in sorted order. So my purpose is not served.
Yes, DISP=MOD with those control cards is going to give you nothing except the data from the small file, in sorted order, appended to the original file.
To add to the lack of benefit, they were already in sorted order, so you've even expended pointless resources in getting the wrong result.
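The updated JCL was not posted, but from the description it presumably sorted the small file alone with the big file as a DISP=MOD output, something like this (dataset names are placeholders):

Code:
//* With DISP=MOD, the output dataset is opened for extend, so the
//* step's output (here, just the sorted small file) is written
//* after the last existing record. Nothing is interleaved with the
//* big file's records, hence the append.
//SORTSML  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.SMALL.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.BIG.FILE,DISP=MOD
//SYSIN    DD *
  SORT FIELDS=(1,27,CH,A)
/*

That also explains why it "runs within seconds": only the 1,000-odd small-file records are being processed, not the 1.1 million.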
MERGE is fast. You already have OPTION NOEQUALS. Perhaps contact SyncSort support and see if there is anything they can suggest?
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote:
How long does it take to simply read both files - no merge?
Quote:
All the data is available online (on DASD). When I just read the files (no merging), Infile 1 (around 1,500 records) takes 2 secs and Infile 2 (1.1 million records) takes around 58 secs. The merge, however, takes 3 minutes.
Next, please run these 2 tests copying the data, not just reading it.
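The copy test can be done with the sort product itself via OPTION COPY, which times the read-plus-write path without any merge logic (dataset names and space figures are illustrative; DCB values echo the sysout above):

Code:
//COPYTST  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.BIG.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.COPY.TEST,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(1500,3000),RLSE),
//            DCB=(RECFM=FB,LRECL=13667,BLKSIZE=27334)
//SYSIN    DD *
  OPTION COPY
/*

Comparing this step's elapsed time and EXCPs with the merge step's shows how much of the 3 minutes is plain I/O and how much is attributable to the merge itself.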
There's no getting around it: you must delete the previous version and then allocate a new one, or your results will be wrong.