VSAM Tuning


Pradip kumar Mohanty (New User)
Posted: Mon Apr 30, 2012 11:44 am

First, some information of interest:

From LISTCAT:
KSDS with variable record size of 146 (average), 1000 (maximum), and a key length of 15.

Data REC-TOTAL------149,223,801
Index REC-TOTAL---------153,812

Data CI size is 26624 and index CI size is 8192.

Index level is 3.

From the EXAMINE report -

MAXIMUM LENGTH DATA RECORD CONTAINS 811 BYTES

4293835 KEYS PROCESSED ON INDEX LEVEL 1, AVERAGE KEY LENGTH: 4.4
143128 KEYS PROCESSED ON INDEX LEVEL 2, AVERAGE KEY LENGTH: 6.0
201 KEYS PROCESSED ON INDEX LEVEL 3, AVERAGE KEY LENGTH: 9.1

Application program usage of the KSDS: sequential reads - 136,115,347; random reads - 41,496.

What would be the best BUFND and BUFNI values? I have already tried SMB.

Thanks,

Pradip kumar Mohanty (New User)
Posted: Mon Apr 30, 2012 12:15 pm

I forgot to add that there is no free space allocation and that the number of extents for the data component is 460. As it should be, this is an extended-format data set, and as such I thought 26624 was a better data CI size than 18432.

Pradip kumar Mohanty (New User)
Posted: Mon Apr 30, 2012 1:33 pm

One more thing: the driver file with which the KSDS in question is being read is sorted in the key order of the KSDS. My bad, I should have included all this information in the very first post.

PeterHolland (Global Moderator)
Posted: Mon Apr 30, 2012 2:26 pm

There is a lot to find on the internet about VSAM tuning; you could start by reading this:

www.slideshare.net/danjodea/vsam-tuning

Robert Sample (Global Moderator)
Posted: Mon Apr 30, 2012 2:39 pm

Is the data set read mostly sequentially, or mostly randomly? Is the data set used in CICS or just in batch?

Pradip kumar Mohanty (New User)
Posted: Mon Apr 30, 2012 2:43 pm

Robert,
This is a batch-only file and the usage is:
Sequential reads - 136,115,347; random reads - 41,496

Thanks,

Robert Sample (Global Moderator)
Posted: Mon Apr 30, 2012 5:09 pm

Quote:
This is a batch-only file and the usage is:
Sequential reads - 136,115,347; random reads - 41,496
What does this mean -- is that records read, or EXCPs, or something else you haven't explained? Let's try again -- you have proved yourself a master of providing absolutely useless information (really, the only people in the world who could possibly be concerned about the key compression for each index level are the IBM employees supporting VSAM) while NOT providing essential information like the way the program is accessing the VSAM data set.

The key thing to know (for determining buffers) for each program accessing a VSAM file is how the program processes the file. There are three options: purely random, purely sequential, and skip-sequential (random read to establish a starting point, then sequential reads after that until the next random read). From the VSAM Demystified redbook, section 2.8.5:
Quote:
- For sequential processing, larger data CI sizes are desirable. For example, given a 16 KB data buffer space to be fulfilled, it is faster to read two 8 KB CIs with one I/O operation than four 4 KB CIs with one operation.
- A large CI gives a more centralized management of the CI free space, consequently causing fewer CI splits. One component with one 16 KB CI with 20% of free space has fewer splits than two 8 KB CIs with the same 20% of free space each.
- For direct processing, smaller data CIs are desirable because the application only needs to retrieve...
and from section 2.8.13 of the same:
Quote:
An unnecessarily large number of buffers can degrade performance. For random processing it is recommended to have all the index CIs and lots of data CIs in the buffer pool. For sequential, just one index CI buffer (the sequence set) and many data CI buffers for the look-ahead.
The large data CI size you're using implies your program is doing sequential access, but since you're not providing that information, who can tell for sure? And you try too hard to be clever, unsuccessfully.
Quote:
I thought 26624 was a better data CI size than 18432.
This is wrong. 18K is categorically, absolutely the best block size since it fits the track geometry best. Plus you get 2 additional records per track using 18K versus 26K. And unless your key length is very large, I suspect you are wasting space in your index CI since 8K is overkill for a 26K data CI size.

If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
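
For illustration, a minimal JCL sketch of one way to apply such counts through the AMP parameter on the DD statement (the step, program, and data set names and the exact counts are placeholders, not values from this thread; pick them per the guidelines above):

Code:
//STEP010  EXEC PGM=YOURPGM
//*
//* Mostly-sequential batch read: one index buffer per index level
//* plus one (BUFNI=4 for a 3-level index), and plenty of data
//* buffers for read-ahead (experiment in the 60-150 range).
//*
//KSDSIN   DD DSN=YOUR.VSAM.KSDS,DISP=SHR,
//            AMP=('BUFND=120,BUFNI=4')

As I understand it, explicit BUFND/BUFNI values like these apply when SMB is not driving the buffering; since SMB was already tried here, the two approaches should be compared rather than combined.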

And none of these comments will apply if your site is using BLSR (or other such tools) against the VSAM data set.

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 8:10 am

Thank you for your condescending remarks.

1. I never referred to the EXCP count, and I fail to understand what is so ambiguous when I say that xxxxxxxxx is the number of sequential reads and yyyyyy is the number of random reads. Does that not say it is a skip-sequential process?

2. Key compression is not something that interests only IBM's storage engineers. Have a read on the key compression problem. The key compression for the KSDS in question looks good for a key length of 15, but I thought perhaps someone could have a look at it.

3. From the DFSMS manual -
Quote:
If a data set is allocated as an extended format data set, 32 bytes (X'20') are added to each physical block. Consequently, when the control interval size is calculated or explicitly specified, this physical block overhead may increase the amount of space actually needed for the data set. Figure 18 shows the percentage increase in space as indicated. - 3390 Direct Access Device
Control Interval Size    Additional Space Required
512                      2.1%
1536                     4.5%
18432                    12.5%

12.5% of additional space per data CI, for a KSDS with over 100 million data records! If I see it correctly, with 18432 the physical block size is going to be 6144. Do you still say 18432 is the best data CI size for an extended data set?

Nic Clouston (Global Moderator)
Posted: Tue May 01, 2012 1:25 pm

Quote:
xxxxxxxxx is the number of sequential reads and yyyyyy is the number of random reads

You did not say that this was in ONE program - I thought, and maybe Robert did as well, that these were figures from two different programs doing two different kinds of processing.

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 2:03 pm

That is a valid point. But on second thought, even if one read those as separate counts for different programs, I am fine with that. My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual with very generic recommendations (a large data CI size with a good amount of buffers - but how big, and how much? at least a rough idea) is not what I am looking for; nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.

Talking about correct and sensible information, this is what I have to say on Robert's strong recommendation of an 18432 data CI size for an extended data set.

With an extended data set, for a data CI size of 18432, this is the allocation information:
PHYREC-SIZE---------6144
PHYRECS/TRK------------8
That sums up to 49152 bytes per track.

And with 26624 as the CI size:
PHYREC-SIZE--------26624
PHYRECS/TRK------------2
That is 26624 * 2 = 53248 bytes per track.

The difference? 4096 bytes per track, in favour of 26624. The roughly 4% better track utilization claimed for an 18432 data CI size does not hold here, at least to my understanding.

Bill Woodger (Moderator Emeritus)
Posted: Tue May 01, 2012 2:32 pm

Are the types of access that you have shown for one run?

How many different programs read the data?

How is the data created/maintained?

Robert Sample (Global Moderator)
Posted: Tue May 01, 2012 2:38 pm

It certainly did not seem to be a single program's counts when I read it -- and with such radically different access patterns, you'd want to tune each program separately.

What disk type is your data set on? My statement is based, in part, on a presentation Stev Glodowski of IBM Germany made to WAVV in 2007, which CATEGORICALLY states that an 18K data CI size is best for space usage (on 3390 disks) -- period. An 18K data CI size uses 3 times 18K, or 54K, of the track capacity, as opposed to 52K for your CI size. I don't know where you got the 6144 from; when I allocate a VSAM file using an 18K data CI size, there are three CIs per track.
Quote:
My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual with very generic recommendations (a large data CI size with a good amount of buffers - but how big, and how much? at least a rough idea) is not what I am looking for; nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.
And yet you totally ignore my VERY SPECIFIC recommendations:
Quote:
If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
With your program's access profile, you would want at least 4 index buffers and 60 to 150 data buffers.

Since it appears you have an attitude about being helped, and cannot read what is posted for you, perhaps you should try the Beginners and Students Forum instead of this one.

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 2:55 pm

It's for one run, but the statistics stay more or less the same for every run.

Consider this a file holding static configuration data. A once-a-year data refresh is how the file is maintained; essentially, it is a read-only file for the application programs.

There are 100+ programs/jobs reading this file, though not concurrently, and mostly in skip-sequential mode.

Bill Woodger (Moderator Emeritus)
Posted: Tue May 01, 2012 3:28 pm

What is your task? To save on run time, CPU time, I/O, DASD? To the best possible extent, or just a "quick fix" with buffers?

Let's assume you have no resources for even a small change to 100 programs; but anyway, have you considered an ESDS, at least for the programs reading the vast majority of the file? If you have the available space (and time), it might be fun to do at least one full-sized comparison.

If it's just the buffer fix, experiment with what Robert has suggested for buffers.

If you have the time/resources to do it properly, your savings could be considerable, but management doesn't often see things that way. Try the full-size trial and check the CPU and I/O.

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 4:10 pm

The data set is on 3390. Robert, did you try to allocate the file in an extended data class? I bet NOT.

I suggest you have a look at the z/OS newsletter, Issue #20.

I have no attitude about helping, or being helped by, anyone - I never had.
However, without giving me a chance to clarify, you reproved my post. And for what?

1. The key compression, index levels, and number of index records did not interest you. What is so annoying about just three lines of information? Does that qualify me as 'a master of providing useless information'?

2. The 26624 CI size does not sit well with you. I have explained why I do not think 18432 is a good choice for extended data sets. Did I try to be clever? Well, it does not matter to me if I am not clever; my thinking was based entirely on the IBM manual and other publications.

I was a bit sarcastic in defending myself against your original reply. That is not my character, so my apologies.

Robert Sample (Global Moderator)
Posted: Tue May 01, 2012 4:36 pm

Explain exactly what significance the average record length and the three index-level key compression ratios have in relation to your stated goal of finding recommendations for the numbers of data and index buffers.

And Bill brings up an excellent point -- just what are you attempting to do? The goal affects the recommendations; if your goal is to minimize memory usage, then telling you to use 150 data buffers will conflict with that goal.

With that many records in the file, and primarily sequential access across that many programs, if your programs have already added buffers then there's not much improvement left to get at (other than completely redesigning the whole thing, as was suggested). Your site might investigate the use of a performance management tool to help.

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 4:56 pm

I consider the number of index levels significant for performance: the fewer, the better.

Is it not the case that the average record length helps you calculate the number of records in a block, and thereby the number of buffers? (Roughly, 26624 / 146 gives about 180 records per data CI, ignoring CI overhead.)

Key compression ratio - I just wanted to make sure the compression ratios were good. To me they are (the largest was about 60%, which is fine as far as I know).

I am no expert on storage, and thought it better to provide this information in case anyone had a view on it.

Run time is at the top of my list and DASD at the bottom. Resource usage is on my mind, and I would trade it off against the performance gain before finalizing the change.

Bill Woodger (Moderator Emeritus)
Posted: Tue May 01, 2012 5:09 pm

You have development resources available? I'd estimate you'd see big savings from a specific redesign. Try the one-off ESDS. Remember, you can also have an alternate index on an ESDS for direct-style access. Extracts from the ESDS into sub-selection KSDSs, that kind of thing. Do a good analysis of the existing programs/data.
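
For anyone wanting to experiment with that, a minimal IDCAMS sketch of an ESDS plus an alternate index for the direct-style access. The 15-byte key at offset 0 and its uniqueness are assumptions based on the thread; all data set names and space allocations are placeholders:

Code:
//DEFESDS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* ESDS: no index component, so sequential reads do no index I/O */
  DEFINE CLUSTER (NAME(YOUR.CONFIG.ESDS) -
         NONINDEXED -
         RECORDSIZE(146 1000) -
         CYLINDERS(5000 500))
  /* Alternate index over the ESDS for keyed (direct) access */
  DEFINE AIX (NAME(YOUR.CONFIG.AIX) -
         RELATE(YOUR.CONFIG.ESDS) -
         KEYS(15 0) -
         UNIQUEKEY -
         CYLINDERS(100 10))
  DEFINE PATH (NAME(YOUR.CONFIG.PATH) -
         PATHENTRY(YOUR.CONFIG.AIX))
  /* Load the ESDS first (e.g. with REPRO), then build the AIX */
  BLDINDEX INDATASET(YOUR.CONFIG.ESDS) -
           OUTDATASET(YOUR.CONFIG.AIX)
/*

Programs needing keyed access would then read through the PATH; the pure-sequential readers go straight at the ESDS.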

Pradip kumar Mohanty (New User)
Posted: Tue May 01, 2012 5:17 pm

The ESDS suggestion sounds good to me - thanks, Bill. I am going to try it for sure.

Bill Woodger (Moderator Emeritus)
Posted: Tue May 01, 2012 5:23 pm

OK, let us know how it goes and if you have thoughts along the way.

Bill Woodger (Moderator Emeritus)
Posted: Tue May 01, 2012 5:51 pm

Remember, if you are able to design from scratch, and random access in key order is required across the data set, you could, for instance, split the data set into n data sets, where n is chosen to get you out of three levels of index (and possibly out of extended format), and then access the multiple KSDSs through a called module. To the programs using the module, it doesn't matter whether there is a single data set or 30. To the throughput, it will make a big difference.

You have the opportunity, given that the data is so static and DASD is not a problem, to do a real tune-up specific to the requirements of the programs.

I suspect you can do this yourself, but if you need it done faster and even better, let me know :-)

Pradip kumar Mohanty (New User)
Posted: Wed May 02, 2012 8:26 am

Bill, I got your suggestion and am going to try it. I am contemplating the criteria for splitting the file into smaller units: purely on the basis of size, or a little dirty trick (with a maintenance overhead) of taking the key value into consideration while splitting. The major part of the key is, say, a bank number, and suppose there are only a dozen unique bank numbers in the file. My thought is then to split the file into, say, 3 parts of acceptable size, each holding 4 unique bank numbers. Before calling the I/O module, I can check which data set holds the record, viz. IF BANK-NUM = 'xxxxxx' OR 'xxxxxx'..., and then pass that data set name to the called module (a rough sketch of such a key-range split is shown below). The only downside of this approach is future maintainability: if we ever have to add a new bank number to the file, we also have to add it to our check list. Given the very low maintenance of this file, I consider this a possibility. This program is part of the first generation of our product, and we are quite open about not marketing it any more, which reduces the chance of a new bank number being added in future.
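
For illustration, a minimal IDCAMS sketch of that key-range split using REPRO. The data set names and key values are placeholders, not from this thread; FROMKEY/TOKEY select each bank-number range, and the target KSDSs are assumed to be defined already:

Code:
//SPLIT    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Copy each bank-number key range into its own pre-defined KSDS */
  REPRO INDATASET(YOUR.BIG.KSDS) -
        OUTDATASET(YOUR.KSDS.PART1) -
        FROMKEY(BANK01) TOKEY(BANK04)
  REPRO INDATASET(YOUR.BIG.KSDS) -
        OUTDATASET(YOUR.KSDS.PART2) -
        FROMKEY(BANK05) TOKEY(BANK08)
  REPRO INDATASET(YOUR.BIG.KSDS) -
        OUTDATASET(YOUR.KSDS.PART3) -
        FROMKEY(BANK09) TOKEY(BANK12)
/*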

And yes, while I am at it, I would be glad if you could show me your version.
Thank you very much.

Bill Woodger (Moderator Emeritus)
Posted: Wed May 02, 2012 11:53 am

For those of the 100+ modules for which keyed access with fewer index levels would help over other methods, keep things simple: just call the I/O module with a function, a key, a record area, and a return status.

Keep any "complexity" solely in the I/O module. Whatever ranges of keys the multiple data sets hold, put that information into a control file, and read the control file at initialization.

Ensure, in the I/O module, that the keys requested are always ascending. Close data sets which are no longer required. Have a "version" of the I/O module which collects statistics about access.

You have an unusual situation: a big data set which is static for a year at a time. That means you can do some good analysis of the data used and design accordingly. You have the DASD, and no data-integrity "implications" from any type of update, which makes it easy to hold duplicate data for different situations if that is what the analysis says would be of benefit.

As I don't have anything which matches your situation, I wasn't offering to pass anything on. Just skills in exchange for money :-)

EDIT: What I'm also trying to say is: do the analysis before the design, and the design before the coding. At times you'll be amazed at how far the actual implementation ends up from what you initially thought, but if you stick with "solution first", someone else will be down this same route in a couple of years' time. :-)