Hello there. I have a huge VSAM file (100 million records) that needs to be processed (eliminating some old records). I wrote a SORT job (with an OMIT statement). The input is one VSAM file, the output is a new one. But it runs kind of slow. Is there anything I can add to speed it up?
I know I could've sorted it to a flat file and then REPROed it back to VSAM, but there is a time constraint and the job has to finish as quickly as possible. That's why I preallocate the new file, copy to it, delete the old one, then rename the new one.
So again, is there anything I can add to this SORT here to make it run faster?
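For reference, here's a minimal sketch of the kind of job described. The dataset names, the date field position (21,7 packed-free ZD Julian date), and the cutoff value are all placeholders, not taken from the post. The one real tuning point: since a KSDS is read back in key order anyway, `OPTION COPY` skips the sort phase entirely, which is usually much faster than sorting.

```jcl
//ELIMOLD EXEC PGM=SORT
//SYSOUT  DD SYSOUT=*
//SORTIN  DD DSN=PROD.OLD.KSDS,DISP=SHR        <- placeholder names
//SORTOUT DD DSN=PROD.NEW.KSDS,DISP=OLD
//SYSIN   DD *
* COPY, not SORT - input KSDS is already in key sequence
  OPTION COPY
* drop records whose Julian date (pos 21, len 7, ZD) is before cutoff
  OMIT COND=(21,7,ZD,LT,2015001)
/*
```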
How much free space do you think that leaves on a cylinder? 65%. How many cylinders in your primary? 4000. That means you have data in 1400 cylinders there, and the rest of that is freespace.
Believe me. That is not improving your job's throughput.
Somewhere, how that dataset was defined and loaded is documented. It was not loaded with that freespace. Someone has been "tuning" the dataset (I'll assume correctly) and you need to find the documentation PDQ.
Otherwise, you are in guess territory. I wouldn't like to guess about a dataset with 100,000,000 records and a deadline. Let your boss know of the problem; otherwise it'll be your bottom in a sling if it all goes wrong. If I had to guess, I'd want as much information about the dataset contents (that'd be the data) as possible. With nothing to go on, I'd have to assume it was loaded with the defaults (by removing the FREESPACE parameter from the DEFINE) and then ALTERed after loading to the freespace you currently have.
In fact, looking at your posting, I'd suggest:
Leave it alone until you can find out exactly how the dataset got that freespace
If you have to get rid of the old records and cannot wait, write a program and delete them in situ, until you can locate the documentation
Maybe the freespace is OK if you have an extremely volatile dataset, with many inserts across the whole thing and the inserts generally in key order; but I suspect you actually have many inserts in one or more particular places, and that the dataset was originally loaded with less freespace (possibly even zero) and then ALTERed to give the freespace suited to the particular insert characteristics of the data.
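The define-then-ALTER pattern described above would look roughly like this in IDCAMS. Every number here (key length, record size, allocation, the restored freespace) is an assumed placeholder; the tuned values would come from the missing documentation. Note FREESPACE is a data-component attribute, so the ALTER targets the data component name:

```jcl
//TUNEFS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
   /* placeholder attributes - load with zero freespace */
  DEFINE CLUSTER (NAME(PROD.NEW.KSDS) -
         INDEXED -
         KEYS(16 0) -
         RECORDSIZE(200 200) -
         FREESPACE(0 0) -
         CYLINDERS(4000 400))
   /* after the initial load, restore the tuned freespace */
  ALTER PROD.NEW.KSDS.DATA FREESPACE(10 20)
/*
```

Loading with FREESPACE(0 0) packs the CIs and CAs tight (fewer cylinders, faster sequential load and read), and the later ALTER only affects CIs/CAs created by subsequent splits.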
A LISTCAT of the dataset would be interesting, but wouldn't (necessarily) provide a full answer.
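A minimal step to pull that LISTCAT (dataset name again a placeholder); the statistics it reports, such as FREESPACE-%CI, FREESPACE-%CA, CISPLITS, CASPLITS, and EXTENTS, are the figures that hint at how the data actually behaves:

```jcl
//LISTC   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(PROD.OLD.KSDS) ALL
/*
```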
I think DFSORT would beat REPRO in a footrace. 100 million records will take time. As has been asked already, what are you comparing it to in order to judge it "slow"?
You have a large data CI size. Is the data generally processed in sequence? Sequential inserts at the "back" of the data?
EDIT: How did you test your job? What does that VSAM file look like (LISTCAT) in development and testing environments?