rmd3003
New User
Joined: 03 Jul 2006 Posts: 55
Hello there. I have a huge VSAM file (100 million records) which needs to be processed to eliminate some old records. I wrote a SORT step (with an OMIT statement). Input is one VSAM file; output is a new one. But it runs kind of slow. Is there anything I can add to speed it up?
I know I could have sorted it to a flat file and then REPROed it back to VSAM, but there is a time constraint and the job has to finish as quickly as possible. That's why I preallocate the new file, copy to it, delete the old one, then rename the new one.
So again, is there anything I can add to this SORT here to make it run faster?
Thank you in advance.
Code:
//STEP1 EXEC PGM=SORT
//SORTIN DD DSN=VSAMFILE.FILE1.IN,DISP=SHR
//SORTOUT DD DSN=VSAMFILE.FILE1.OUT,DISP=SHR
//SORTMSG DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//*
//SYSIN DD *
OPTION COPY
  OMIT COND=(12,4,PD,LT,2999)
Both files (IN and OUT) have identical parameters.
Code:
DEFINE CLUSTER
(NAME(VSAMFILE.FILE1.IN)
SHAREOPTIONS(2 3)
INDEXED
NOIMBED
NOREPLICATE
NOREUSE
DATACLAS(EXTVSAMC)
RECORDSIZE(100 500)
FREESPACE(50 30)
KEY(28 0))
DATA
(NAME(VSAMFILE.FILE1.IN.DATA)
CYL(4000 2500)
VOLUMES(* * * * * *)
CISZ(18432))
INDEX
(NAME(VSAMFILE.FILE1.IN.INDEX)
CYL(750 150)
VOLUMES(* * )
CISZ(4096))
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
But it runs kind of slow.
Compared to what? Based on what criteria?
Which sort product are you using?
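If the elapsed time is going on VSAM I/O rather than on the sort product itself, one knob that sometimes helps a straight sequential copy is extra VSAM data buffers via the AMP parameter on the DD statements. The buffer count below (64) is only an illustration, not a recommendation for this dataset; measure before and after:
Code:
//SORTIN   DD DSN=VSAMFILE.FILE1.IN,DISP=SHR,
//            AMP=('BUFND=64')
//SORTOUT  DD DSN=VSAMFILE.FILE1.OUT,DISP=SHR,
//            AMP=('BUFND=64')
For purely sequential access, BUFND (data buffers) is the one that matters; BUFNI (index buffers) mainly helps keyed, direct access.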
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
rmd3003 wrote:
[...]
[...]
Do you know what that little bunny is doing?
How much free space do you think that leaves on a cylinder? FREESPACE(50 30) keeps 50% of each CI and 30% of each CA free, so only about 35% of the space holds data: roughly 65% is freespace. How many cylinders in your primary? 4000. That means your data occupies about 1400 of those cylinders, and the rest is freespace.
Believe me, that is not improving your job's throughput.
Somewhere, how that dataset was defined and loaded is documented. It was not loaded with that freespace. Someone has been "tuning" the dataset (I'll assume correctly) and you need to find that documentation PDQ.
Otherwise, you are in guess territory, and I wouldn't like to guess about a dataset with 100,000,000 records and a deadline. Let your boss know of the problem; otherwise it'll be your bottom in a sling if it all goes wrong. If I had to guess, I'd want as much information about the dataset contents (that'd be the data) as possible. With nothing else to go on, I'd have to assume it was loaded with the defaults (by removing the FREESPACE parameter from the DEFINE) and then ALTERed after loading to the freespace you currently have.
In fact, looking at your posting, I'd suggest:
- Leave it alone until you can find out exactly how the dataset got that freespace
- If you have to get rid of the old records and cannot wait, write a program and delete them in situ, until you can locate the documentation
- There are always three things
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Maybe the freespace is OK, if you have an extremely volatile dataset with many inserts across the whole thing, the inserts being generally in key order. But I suspect you actually have many inserts in one or more particular places, and that the dataset was originally loaded with less freespace (possibly even zero) and then ALTERed to give the freespace suited to the characteristic inserts of the data.
A LISTCAT of the dataset would be interesting, but wouldn't (necessarily) provide a full answer.
I think DFSORT would beat REPRO in a footrace, but 100 million records will take time regardless. As has been asked already, what are you comparing it to in order to judge it "slow"?
You have a large data CI size. Is the data generally processed in sequence? Sequential inserts at the "back" of the data?
EDIT: How did you test your job? What does that VSAM file look like (LISTCAT) in development and testing environments?
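Getting that LISTCAT needs nothing more than a minimal IDCAMS step along these lines (step name arbitrary); the CI/CA split counts, extents and high-used RBA in its output are the interesting numbers:
Code:
//LISTC    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(VSAMFILE.FILE1.IN) ALL
/*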
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Bill Woodger wrote:
the dataset would have been loaded originally with less freespace (possibly even zero) and then ALTERed to give the freespace for the particular characteristic inserts of the data.
Normally, when freespace is used that way, the VSAM dataset is loaded and then ALTERed to FSPC(0 0). If after a period there are lots of splits, a reorg will be necessary.
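In IDCAMS terms that post-load step is a one-liner against the data component (which is where FREESPACE lives); shown here only to illustrate the technique, not as something to run against this dataset before the documentation turns up:
Code:
  ALTER VSAMFILE.FILE1.IN.DATA -
        FREESPACE(0 0)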
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Well, I'd base it on the data, if anything needed doing at all.
I doubt that technique was wanted for this dataset, as the high freespace is what it has now, with 100m records in it.