VSAM define for large file

jerryte · Posted: Wed Oct 05, 2016 1:51 am

I want to define a VSAM file that will be loaded once with a large amount of data (50M records). Each record is a 20-byte key followed by another 20 bytes of data. After loading, it will be read by a program to look up key values. No further insert/update/delete will be done.

Below is what I coded. Are there any options that will speed up the read access?

enrico-sorichetti · Posted: Wed Oct 05, 2016 2:00 am

see here

ibmmainframes.com/viewtopic.php?t=64558&highlight=

Robert Sample · Posted: Wed Oct 05, 2016 2:13 am

YOUR definition of "a large amount of data" (50 million 40-byte records) is MUCH different from my definition of a large amount of data. You've got about 2 GB, and that is not even getting close to the 4 GB limit of VSAM; to me a large amount of data gets well past the 4 GB limit (requiring extended addressability).
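As a rough check of the 2 GB figure, the arithmetic can be sketched in Python. The variable names are illustrative only, and the estimate counts record bytes alone, ignoring CI control information and free space.

```python
# Raw size of the data described in the thread: 50 million 40-byte records.
records = 50_000_000
record_len = 40            # 20-byte key + 20 bytes of data

raw_bytes = records * record_len
four_gb = 4 * 2**30        # the 4 GB limit mentioned in the post

print(f"raw size: {raw_bytes / 1e9:.2f} GB")
print(f"fraction of the 4 GB limit: {raw_bytes / four_gb:.0%}")
```

Under these assumptions the raw data comes to a little under half of the 4 GB limit.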

Once loaded, the data will be accessed completely randomly -- no sequential reads at all? If so and this data set is allocated to CICS, put it in an LSR pool. If the data set is being used in batch, use BLSR. LSR works well for random access of data in a KSDS.

As far as your definition goes, get rid of BUFFERSPACE. Do not specify space on the index -- VSAM is efficient and good at allocating index space. Change the allocation for the DATA to CYLINDERS instead of TRACKS and, if your space management policy allows it, allocate the entire 2725 cylinders for the 50 million records as the primary and use a token secondary (100 cylinders maybe). Make the CISIZE on the DATA 4096; you can probably use 2048 for the INDEX CISIZE.
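The cylinder figure can be sanity-checked with a small Python sketch. Everything other than the record count, record length and 4096 CISIZE is an assumption here: 3390 geometry with 12 4K CIs per track and 15 tracks per cylinder, 10 bytes of control information per CI for fixed-length records, and no free space.

```python
import math

# Values from the thread.
records = 50_000_000
record_len = 40
ci_size = 4096

# Assumptions (not stated in the post): 10 control bytes per CI,
# 12 CIs per track and 15 tracks per cylinder on a 3390, no free space.
ci_control_bytes = 10
cis_per_cylinder = 12 * 15

records_per_ci = (ci_size - ci_control_bytes) // record_len
data_cis = math.ceil(records / records_per_ci)
cylinders = math.ceil(data_cis / cis_per_cylinder)

print(records_per_ci, data_cis, cylinders)
```

With these assumptions the estimate lands within a cylinder or two of the 2725 quoted in the post; any free space or different geometry would change the result.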

Rohit Umarjikar · Posted: Wed Oct 05, 2016 2:53 am

enrico-sorichetti, Precise Search

Abid Hasan · New User Joined: 25 Mar 2013 Posts: 88 Location: India

Hello,

Apologies for tagging onto this post, and for (possibly) incorrect usage of terminology.

Robert Sample · Posted: Wed Oct 05, 2016 3:22 pm

The CISIZE for the INDEX should be large enough to hold pointers to all the DATA component CIs in the CA -- anything larger is not going to make any difference. The rough rule of thumb is to take half the key length plus 4 (14 in this case) and multiply it by the number of CIs per CA in the DATA component (15 times 12, or 180, for a 4096 CISIZE on a 3390), which gives 2520 for the INDEX CISIZE. Thinking some more about it, 2048 is probably too small (although I haven't seen any updated rule of thumb in some time; key compression may now allow 2048 to work). Note that the INDEX CISIZE does not depend upon how the data is being read. The CISIZE for the DATA component should be small for random access and larger for sequential access -- but the system does use 4K for many things, so using a CISIZE smaller than 4K may increase overhead -- and it would definitely impact the INDEX CISIZE.
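The rule of thumb above can be written out as a short Python sketch. The function name `index_ci_estimate` is hypothetical, and the result is only the rough estimate described in the post, not a value VSAM itself computes.

```python
def index_ci_estimate(key_len: int, data_cis_per_ca: int) -> int:
    """Rule of thumb from the post: (key length / 2 + 4) * data CIs per CA."""
    return (key_len // 2 + 4) * data_cis_per_ca


# 20-byte key; 15 tracks * 12 CIs per track = 180 data CIs per CA.
estimate = index_ci_estimate(key_len=20, data_cis_per_ca=15 * 12)

print(estimate)            # 2520
print(estimate <= 2048)    # False: 2048 falls short of the estimate
```

A smaller DATA CISIZE means more CIs per CA, so the same formula would call for a larger INDEX CISIZE.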

Abid Hasan · New User Joined: 25 Mar 2013 Posts: 88 Location: India

Thank you Robert, that did put things in perspective.

However, VSAM Demystified does say the below, which in a way tends to contradict the numbers calculated using the formula from 'Using Data Sets', as in the case you shared (and I'm not sure why Demystified says so, as it seems to waste space). I'd think that for an all-random-read data set in a CICS environment (or batch, for that matter), this might not be the best fit, and it MIGHT even increase the supervisor calls when the CISIZE is increased, though I'd have to test it to support the claim, as I'm not knowledgeable enough in this.

Robert Sample · Posted: Thu Oct 06, 2016 10:18 pm

I've seen plenty of VSAM KSDS with 4K data CI size and less than 3584 index CI size that did not waste space -- I think the manual is being very conservative in its recommendation.

For a data set with growth, a balanced set of values for the primary and secondary space makes sense -- you'll be doing splits anyway. However, for a fixed-size data set which is NOT going to grow, allocating all the space in one extent makes sense -- why have multiple extents if not needed?