VSAM define for large file


IBM Mainframe Forums -> JCL & VSAM
jerryte

Active User


Joined: 29 Oct 2010
Posts: 202
Location: Toronto, ON, Canada

Posted: Wed Oct 05, 2016 1:51 am

I want to define a VSAM file that will be loaded once with a large amount of data (50M records). Each record is a 20-byte key followed by 20 bytes of data. After loading, a program will read the file to look up key values. No further inserts, updates, or deletes will be done.

Below is what I coded. Are there any options that will speed up the read access?
Code:
DEFINE CLUSTER -
 (NAME(my.test.VSAM) -
    INDEXED NONSPANNED               -
    RECORDSIZE(40 40) KEYS(20 0)     -
    NOREUSE                          -
    NOERASE                          -
    SPEED                            -
    OWNER(me)                   -
    BUFFERSPACE(66048)               -
    FREESPACE(0 0)                   -
    SHR(1 3))                        -
  INDEX(NAME(my.test.VSAM.INDEX)      -
    TRACKS(1000 1000) CISZ(4096) )                 -
  DATA(NAME(my.test.VSAM.DATA)        -
    TRACKS(1000 1000)                              -
 )
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

Posted: Wed Oct 05, 2016 2:00 am

See here :)
ibmmainframes.com/viewtopic.php?t=64558&highlight=
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8697
Location: Dubuque, Iowa, USA

Posted: Wed Oct 05, 2016 2:13 am

YOUR definition of "a large amount of data" (50 million 40-byte records) is MUCH different from my definition of a large amount of data. You've got about 2 GB, and that is not even getting close to the 4 GB limit of VSAM; to me, a large amount of data gets well past the 4 GB limit (requiring extended addressability).

Once loaded, the data will be accessed completely randomly -- no sequential reads at all? If so and this data set is allocated to CICS, put it in an LSR pool. If the data set is being used in batch, use BLSR. LSR works well for random access of data in a KSDS.

As far as your definition goes, get rid of BUFFERSPACE. Do not specify space on the index -- VSAM is efficient and good at allocating index space. Change the allocation for the DATA to CYLINDERS instead of TRACKS and if your space management policy allows you, allocate the entire 2725 cylinders for the 50 million records as the primary and use a token secondary (100 cylinders maybe). Make the CISIZE on the DATA 4096 and you can probably use 2048 for the INDEX CISIZE.
Quote:
Are there any options that will speed up the read access?
Why do you think there is a problem with the read access time? If you think there is one, you need to work with your site support group on this -- not ask a forum. Read access time can be affected by a large variety of factors, so only someone working at your site -- such as the site support group -- could possibly address the issue.
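For readers who want to check the figures in this post, the following is a rough sketch of the arithmetic behind the "about 2 GB" and "2725 cylinders" numbers. It assumes 3390 geometry (15 tracks per cylinder and twelve 4 KB CIs per track, the same 15 x 12 figure used later in the thread), a CA of one cylinder, and 10 bytes of control information per data CI. The 10-byte figure and all names in the code are assumptions made for illustration; they are not stated in the post.

```python
# Rough space estimate for the KSDS data component discussed in this thread.
# Assumptions (not from the post): CA = 1 cylinder, and each data CI loses
# 10 bytes to control information (a 4-byte CIDF plus two 3-byte RDFs,
# the usual layout when all records have the same length).

RECORDS = 50_000_000
RECORD_LEN = 40            # 20-byte key + 20 bytes of data
DATA_CISIZE = 4096
CI_CONTROL_BYTES = 10      # assumed CIDF + RDF overhead per CI
CIS_PER_TRACK = 12         # 4 KB CIs per 3390 track (only valid for 4096)
TRACKS_PER_CYL = 15        # 3390 tracks per cylinder
VSAM_4GB_LIMIT = 4 * 1024**3


def estimate_cylinders(records, record_len, cisize=DATA_CISIZE):
    """Cylinders needed with FREESPACE(0 0) under the assumptions above."""
    records_per_ci = (cisize - CI_CONTROL_BYTES) // record_len
    cis_per_cyl = CIS_PER_TRACK * TRACKS_PER_CYL
    records_per_cyl = records_per_ci * cis_per_cyl
    return -(-records // records_per_cyl)   # ceiling division


raw_bytes = RECORDS * RECORD_LEN
cylinders = estimate_cylinders(RECORDS, RECORD_LEN)

print(f"raw data      : {raw_bytes:,} bytes")
print(f"under 4 GB    : {raw_bytes < VSAM_4GB_LIMIT}")
print(f"est. cylinders: {cylinders}")
```

Under these assumptions the estimate lands within a cylinder or two of the 2725 quoted above; the real figure depends on how the volume and the CA size are actually set up at a given site.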
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3053
Location: NYC,USA

Posted: Wed Oct 05, 2016 2:53 am

enrico-sorichetti, Precise Search :D
Abid Hasan

New User


Joined: 25 Mar 2013
Posts: 88
Location: India

Posted: Wed Oct 05, 2016 10:51 am

Hello,

Apologies for tagging onto a post, and for (possible) incorrect usage of terminology.

Robert Sample wrote:
.... if your space management policy allows you, allocate the entire 2725 cylinders for the 50 million records as the primary and use a token secondary (100 cylinders maybe). Make the CISIZE on the DATA 4096 and you can probably use 2048 for the INDEX CISIZE.
.....


A small query about Mr. Sample's post: I wanted to understand the recommendation of a high value for the primary allocation. With 'guaranteed space=yes' it could cause a lot more storage to be allocated than required.
For the CISIZE, my understanding is that for an 'all random read' dataset a smaller CISIZE is recommended (and vice versa), so that only the requisite chunk of data is picked up and not the adjoining chunks as well. A larger CI means more data is transferred per I/O operation, and further processing is needed to locate the requested data within it. I also wanted to understand the logic behind the different CI choices for the data and index components here (with the record size being a mere 40 bytes).
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8697
Location: Dubuque, Iowa, USA

Posted: Wed Oct 05, 2016 3:22 pm

The CISIZE for the INDEX should be large enough to hold pointers to all the DATA component CIs in the CA -- anything larger is not going to make any difference. The rough rule of thumb is to take half the key length plus 4 (14 in this case) and multiply it by the number of CIs per CA in the DATA component (15 times 12, or 180, for a 4096 CISIZE on 3390), which gives 2520 for the INDEX CISIZE. Thinking some more about it, 2048 is probably too small (although I haven't seen any updated rule of thumb in some time; key compression may now allow 2048 to work). Note that the INDEX CISIZE does not depend upon how the data is being read. The CISIZE for the DATA component should be small for random access and larger for sequential access -- but the system does use 4K for many things, so using a CISIZE smaller than 4K may increase overhead -- and it would definitely impact the INDEX CISIZE.
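The rule of thumb in this post can be written out as a short calculation. This is only a sketch of the estimate as described here (half the key length plus 4 bytes per data CI, times the data CIs per CA); the function names are made up for illustration, and the final step of rounding up to a multiple of 512 is an added assumption about which CI sizes can be specified, not something the post states.

```python
# Index CISIZE rule of thumb from the post: each data CI in the CA needs
# roughly (key_length / 2 + 4) bytes in the sequence-set index CI.

KEY_LEN = 20
TRACKS_PER_CA = 15         # CA of one 3390 cylinder, as in the post
DATA_CIS_PER_TRACK = 12    # 4096-byte data CIs per 3390 track


def index_cisize_estimate(key_len, data_cis_per_ca):
    """Estimated bytes needed in one index CI to map a whole CA."""
    bytes_per_entry = key_len // 2 + 4
    return bytes_per_entry * data_cis_per_ca


def round_up_to_512(n):
    """Assumption: the CI size is specified as a multiple of 512 bytes."""
    return -(-n // 512) * 512


data_cis_per_ca = TRACKS_PER_CA * DATA_CIS_PER_TRACK
estimate = index_cisize_estimate(KEY_LEN, data_cis_per_ca)
rounded = round_up_to_512(estimate)

print(f"data CIs per CA : {data_cis_per_ca}")
print(f"raw estimate    : {estimate}")
print(f"rounded to 512s : {rounded}")
```

The raw estimate reproduces the 2520 in the post. Because it is larger than 2048, the rule of thumb on its own suggests 2048 is too small unless key compression does better than the assumed 14 bytes per entry.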
Kerry Ropar

New User


Joined: 14 Sep 2016
Posts: 25
Location: Australia

Posted: Thu Oct 06, 2016 3:59 am

Rohit Umarjikar wrote:
enrico-sorichetti, Precise Search :D
:D
RahulG31

Active User


Joined: 20 Dec 2014
Posts: 446
Location: USA

Posted: Thu Oct 06, 2016 8:24 pm

enrico-sorichetti wrote:
see here :)
ibmmainframes.com/viewtopic.php?t=64558&highlight=
LOL
Abid Hasan

New User


Joined: 25 Mar 2013
Posts: 88
Location: India

Posted: Thu Oct 06, 2016 9:51 pm

Thank you Robert, that did put things in perspective.

However, VSAM Demystified says the following, which tends to contradict the numbers calculated with the formula from 'Using Data Sets' that you shared (and I'm not sure why Demystified says so, wasting space :| ). I'd think that for an all-random-read dataset, in a CICS environment or in batch, this might not be the best fit, and increasing the CISIZE MIGHT even increase the supervisor calls, though I'd have to test it to support that claim, as I'm not knowledgeable enough in this.

Quote:
....For the data component, defining 4-KB CI size provides a compromise between minimizing
data transfer time and reducing the occurrence of spanned records.
In the index component, each sequence set CI maps one CA data component. Each record
maps one CI in the CA data component. If the index CI size is too small, there is no room for
records to map all CIs. This limitation makes part of the CA unusable, wasting space. For the
index component, use a minimum Index CI size of 3584 if you are using a 4-KB data
component CISIZE. ...
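To see how the quoted passage relates to the rule of thumb given earlier in the thread (half the key length plus 4 bytes per data CI, i.e. 14 bytes for a 20-byte key), the sketch below turns that rule around and asks how many data CIs an index CI of a given size could map. The 14-byte entry size is only that rough estimate, the function name is hypothetical, and real index entries depend on key compression, so treat the output as indicative at best.

```python
# How many data CIs can one sequence-set index CI map, if each entry
# takes about (key_length / 2 + 4) bytes as in the thread's rule of thumb?

KEY_LEN = 20
DATA_CIS_PER_CA = 15 * 12   # 4096-byte data CIs in a one-cylinder CA on 3390


def data_cis_mappable(index_cisize, key_len=KEY_LEN):
    """Approximate number of data CIs one index CI can point to."""
    bytes_per_entry = key_len // 2 + 4
    return index_cisize // bytes_per_entry


for index_cisize in (2048, 2560, 3584, 4096):
    mappable = data_cis_mappable(index_cisize)
    verdict = "covers the CA" if mappable >= DATA_CIS_PER_CA else "part of CA unusable"
    print(f"index CISIZE {index_cisize}: ~{mappable} data CIs -> {verdict}")
```

By this estimate, 2048 falls short of the 180 data CIs in the CA, while 3584 leaves considerable headroom, which fits the reading that the manual's figure is a conservative one.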


Can you please also shed some light on why you suggest a high primary value in cylinders, rather than more balanced values for primary and secondary?
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8697
Location: Dubuque, Iowa, USA

Posted: Thu Oct 06, 2016 10:18 pm

I've seen plenty of VSAM KSDS with 4K data CI size and less than 3584 index CI size that did not waste space -- I think the manual is being very conservative in its recommendation.

For a data set with growth, a balanced set of values for the primary and secondary space makes sense -- you'll be doing splits anyway. However, for a fixed-size data set which is NOT going to grow, allocating all the space in one extent makes sense -- why have multiple extents if not needed?
All times are GMT + 6 Hours