From LISTCAT:
KSDS with variable-length records - 146 bytes average, 1000 maximum - and a key length of 15.
Data REC-TOTAL------149,223,801
Index REC-TOTAL---------153812
Data CI size is 26624 and index CI size is 8192.
The index has 3 levels.
From the EXAMINE report:
MAXIMUM LENGTH DATA RECORD CONTAINS 811 BYTES
4293835 KEYS PROCESSED ON INDEX LEVEL 1, AVERAGE KEY LENGTH: 4.4
143128 KEYS PROCESSED ON INDEX LEVEL 2, AVERAGE KEY LENGTH: 6.0
201 KEYS PROCESSED ON INDEX LEVEL 3, AVERAGE KEY LENGTH: 9.1
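(For reference, these figures come from standard IDCAMS output; a rough sketch of the job that produces them, assuming a hypothetical step and dataset name:)
Code:
//REPORTS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(YOUR.KSDS.CLUSTER) ALL
  EXAMINE NAME(YOUR.KSDS.CLUSTER) INDEXTEST DATATEST
/*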
Application program usage of the KSDS: sequential reads - 136115347, random reads - 41496.
What would be the best BUFND and BUFNI values? I have already tried SMB.
I forgot to add that there is no freespace allocation and the data component has 460 extents. As it has to be at this size, this is an extended-format dataset, which is why I thought 26624 was a better data CI size than 18432.
One more thing: the driver file against which the KSDS in question is read is sorted in the key order of the KSDS. My bad, I should have included all of this information in the very first post.
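(For completeness, "tried SMB" means System Managed Buffering requested at run time; the usual batch route is the ACCBIAS subparameter of AMP, which needs an extended-format dataset like this one. A sketch for illustration only, with a hypothetical DD and dataset name:)
Code:
//KSDSIN   DD DISP=SHR,DSN=YOUR.KSDS.CLUSTER,
//            AMP=('ACCBIAS=SO')
//*  ACCBIAS=SO     - SMB optimized for sequential access
//*  ACCBIAS=DO     - SMB optimized for direct access
//*  ACCBIAS=SYSTEM - let SMB pick a technique from the ACB MACRF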
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Quote:
This is a batch only file and the usage is -
Sequential read - 136115347 and Random read is - 41496
What does this mean -- is that records read, or EXCPs, or something else you haven't explained? Let's try again -- you have proved yourself a master of providing absolutely useless information (really, the only people in the world who could possibly be concerned about the key compression for each index level are the IBM employees supporting VSAM) while NOT providing essential information, such as how the program accesses the VSAM data set.
The key thing to know (for determining buffers) for each program accessing a VSAM file is how the program processes the file. There are three options: purely random, purely sequential, and skip-sequential (random read to establish a starting point, then sequential reads after that until the next random read). From the VSAM Demystified redbook, section 2.8.5:
Quote:
For sequential processing, larger data CI sizes are desirable. For example, given a 16 KB
data buffer space to be fulfilled, it is faster to read two 8 KB CIs with one I/O operation
than four 4 KB CIs with one operation.
A large CI means more centralized management of the free space in the CI, consequently causing fewer CI splits. One component with one 16 KB CI with 20% free space has fewer splits than two 8 KB CIs with the same 20% free space each.
For direct processing, smaller data CIs are desirable because the application only needs to retrieve [...]
and from section 2.8.13 of the same:
Quote:
An unnecessarily large amount of buffers can degrade the performance. For random it is recommended to have all the index CIs and lots of data CIs in the buffer pool. For sequential, just one index CI buffer (the sequence set) and many data CI buffers for the look-ahead.
The large data CI size you're using implies your program is doing sequential access but since you're not providing that information, who can tell for sure? And you try too hard to be clever, unsuccessfully.
Quote:
I thought 26624 is a better Data CI size than 18432.
This is wrong. 18K is categorically, absolutely the best block size since it fits the track geometry best. Plus you get 2 additional records per track using 18K versus 26K. And unless your key length is very large, I suspect you are wasting space in your index CI since 8K is overkill for a 26K data CI size.
If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
And none of these comments will apply if your site is using BLSR (or other such tools) against the VSAM data set.
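For reference, with plain NSR (no BLSR or SMB in the picture), those buffer counts can be supplied through the AMP parameter on the DD statement as well as in the program's ACB. A minimal sketch with a hypothetical DD and dataset name - the counts are starting points to experiment with, not fixed answers:
Code:
//KSDSIN   DD DISP=SHR,DSN=YOUR.KSDS.CLUSTER,
//            AMP=('BUFND=60,BUFNI=1')
//*  purely sequential: BUFNI=1, BUFND plenty (60-150, experiment)
//*  purely random    : BUFNI=4 at least, BUFND=30 or so (about a CA)
//*  skip-sequential  : BUFNI=4 (one per index level plus one), BUFND=60+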
1. I have never referred to the EXCP count, and I fail to understand what is so ambiguous when I say "xxxxxxxxx is the # of sequential read and yyyyyy is the # of random read". Doesn't that say it is a skip-sequential process?
2. Key compression is not something that interests only IBM's storage engineers. Have a read about key compression problems. The key compression for the KSDS in question looks good for a key length of 15, but I thought perhaps someone could have a look at it.
3. From the DFSMS manual:
Quote:
If a data set is allocated as an extended format data set, 32 bytes (x’20’) are added
to each physical block. Consequently, when the control interval size is calculated or
explicitly specified, this physical block overhead may increase the amount of space
actually needed for the data set. Figure 18 shows the percentage increase in space
as indicated.
3390 Direct Access Device
Control Interval Size    Additional Space Required
  512                      2.1%
 1536                      4.5%
18432                     12.5%
12.5% of additional space per data CI, for a KSDS with over 100 million data records! If I see it correctly, with 18432 the physical block size is going to be 6144. Do you still say 18432 is the best data CI size for an extended dataset?
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
Quote:
xxxxxxxxx is the # of sequential read and yyyyyy is the # of random read
You did not say that this was in ONE program - I thought, and maybe Robert did as well, that these were figures from 2 different programs doing 2 different kinds of processing.
That is a valid point. But on second thought, even if one read those as separate read counts for different programs, I am fine with that. My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual, with very generic recommendations (a large data CI size with a good amount of buffers - how big and how many, at least roughly?), is not what I am looking for, nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.
Talking about correct and sensible information, this is what I have to say about Robert's strong recommendation of an 18432 data CI size for an extended dataset.
With an extended dataset, for a data CISIZE of 18432, this is the allocation information:
PHYREC-SIZE---------6144
PHYRECS/TRK------------8
That sums up to 49152 bytes per track.
and with 26624 as CISIZE,
PHYREC-SIZE--------26624
PHYRECS/TRK------------2
That is 26624 * 2 = 53248 bytes per track.
The difference? 4096 bytes per track, in favour of 26624. The roughly 4% better track utilization with an 18432 data CI size does not apply here, at least to my understanding.
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
It certainly did not seem to be a single program's counts when I read it -- and with such radically different access patterns, you'd want to tune each program separately.
What is the disk type your data set is on? My statement is based, in part, on a presentation Stev Glodowski of IBM Germany made to WAVV in 2007, which CATEGORICALLY states that an 18K data CI size is best for space usage (on 3390 disks) -- period. And an 18K data CI size uses 3 times 18K, or 54K, of the track capacity as opposed to 52K for your CI size. I don't know where you got the 6144 from, since when I allocate a VSAM file using an 18K data CI size, there are three CIs per track.
Quote:
My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual, with very generic recommendations (a large data CI size with a good amount of buffers - how big and how many, at least roughly?), is not what I am looking for, nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.
And yet you totally ignore my VERY SPECIFIC recommendations:
Quote:
If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
With your program's access profile, you would want at least 4 index buffers and 60 to 150 data buffers.
Since it appears you have an attitude about being helped, and cannot read what is posted for you, perhaps you should try the Beginners and Students Forum instead of this one.
It's for one run, but the statistics stay more or less the same for every run.
Consider this a file holding static configuration data. A once-a-year data refresh is how this file is maintained. Essentially, it is a read-only file for application programs.
There are 100+ programs/jobs reading this file, though not concurrently, and mostly in skip-sequential mode.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
What is your task? Saving on run-time, CPU time, I/O, DASD? To the best possible extent, or just a "quick fix" with buffers?
Let's assume you have no resources for even a small change to 100 programs, but anyway, have you considered an ESDS at least for those reading the vast majority of the file? If you have the available space (and time) it might be fun to do at least one full-sized comparison.
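For illustration only, a sketch of what that one-off copy could look like in IDCAMS, with hypothetical names and placeholder space figures (REPRO reads the KSDS in key sequence, so the ESDS comes out in the same order the sequential programs already process):
Code:
//ESDSTRY  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Hypothetical names; EXTENDED is an assumed SMS data class   */
  /* giving extended format - size the allocation before running */
  DEFINE CLUSTER (NAME(YOUR.TRIAL.ESDS)   -
                  NONINDEXED              -
                  RECORDSIZE(146 1000)    -
                  CISZ(26624)             -
                  DATACLASS(EXTENDED)     -
                  CYLINDERS(30000 3000))
  REPRO INDATASET(YOUR.KSDS.CLUSTER)      -
        OUTDATASET(YOUR.TRIAL.ESDS)
/*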
If it is just the buffer fix, experiment with what Robert has suggested for buffers.
If you have the time/resources to do it properly, your savings could be considerable, but management doesn't often see things that way. Try to do the full-size trial and check the CPU and I/O.
The dataset is on 3390. Robert, did you try to allocate the file with an extended data class? I bet not.
I suggest you have a look at the z/OS newsletter, Issue #20.
I have no attitude about helping or getting help from anyone, and never have.
However, without giving me a chance to clarify, you reproved my post - and for what?
1. Key compression, index level and the number of index records did not interest you. What is so annoying about just three lines of information? Does that qualify me as 'a master of providing useless information'?
2. The 26624 size does not sit well with you. I have explained enough about why I do not think 18432 is a good choice for extended datasets. Did I try to be clever? Well, it does not matter to me if I am not clever. My thinking was based entirely on the IBM manual and other publications.
I was a bit sarcastic in my defence against your original reply. That is not in my character, so my apologies.
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Explain exactly what significance the average record length and the three index-level key compression ratios have in relation to your stated goal of finding recommendations for the number of data and index buffers.
And Bill brings up an excellent point -- just what are you attempting to do? The goal affects the recommendations; if your goal is to minimize memory usage then telling you to use 150 data buffers will conflict with that goal.
With that many records in the file, and primarily sequential access by that many programs, if your programs have already added buffers then there's not much improvement left to get at (other than completely redesigning the whole thing, as was suggested). Your site might investigate the use of a performance management tool to help.
I consider the number of index levels significant for performance: the fewer, the better.
Isn't it the case that the average record length helps you calculate the number of records in a block, and thereby the number of buffers?
Key compression ratio - I just wanted to make sure the compression ratios are good. To me, they are (the largest was about 60%, which is fine as far as I know).
I am no expert on storage and thought it better to provide this information, in case anyone had a view on it.
Runtime is at the top of my list and DASD at the bottom. Resource usage is on my mind, and I would trade it off against the performance gain before finalizing the change.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Do you have development resources available? I'd estimate you'd get big savings through a specific redesign. Try the one-off ESDS. Remember, you can also have an Alternate Index on an ESDS for direct-style access. Extracts from the ESDS to sub-selection KSDSs, that type of thing. It needs good analysis of the existing programs/data.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Remember, if you are able to design from scratch, you could, for instance (if random access in key order is required across the dataset), split the dataset into n datasets, where n is a number that gets you out of three levels of index, and possibly out of extended format, and then access the multiple KSDSs through a called module. To the programs using the module, it doesn't matter whether there is a single dataset or 30. To the throughput, it will make a big difference.
You have the opportunity, given that the data is so static and DASD is not a problem, to do a real tune-up specific to the requirements of the program.
I suspect you can do this yourself, but if you need it done faster and even better, let me know :-)
Bill, I got your suggestion and am going to try that. I am contemplating the criteria for splitting the file into smaller units: purely on the basis of size, or the slightly dirty trick (with a maintenance overhead) of taking the key value into consideration while splitting. The major part of the key is, say, a bank number, and say there are only a dozen unique bank numbers in the file. With that, my thought is to split the file into, say, 3 parts, each holding 4 unique bank numbers and each of acceptable size. Then, before calling the IO module, I can check which dataset holds the record, i.e. IF BANK-NUM = 'xxxxxx' OR 'xxxxxx' ... then pass that dataset name to the called module. The only downside of this approach is future maintainability - if we ever have to add a new bank number to the file, we will have to add it to our check list. Keeping the very low maintenance of this file in mind, I consider this a possibility. This program is part of the first generation of our product and we are quite open about not marketing it any more, which increases the chance that no new bank number will be added in future.
And yes, while I am at it, I would be glad if you could show me your version.
Thank you very much.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
For those of the 100+ modules for which keyed access to the data with fewer index levels would help over other methods, keep things simple. Just call the IO module with a function, key, record area and return status.
Keep any "complexity" solely in the IO module. Whatever ranges of keys the multiple datasets hold, put that information into a control file. Read the control file at initialization.
Ensure, in the IO module, that keys requested are always ascending. Close datasets which are no longer required. Have a "version" of the IO module which collects statistics about access.
You have an unusual situation: a big dataset which is static for a year at a time. That means you can do some good analysis of the data used, and design accordingly. You have DASD, and no data-integrity "implications" from any kind of update, which makes it easy to hold duplicate data for different situations if that is what the analysis says would benefit.
As I don't have anything which matches your situation, I wasn't offering to pass anything on. Just skills in exchange for money :-)
EDIT: What I'm also trying to say is: do the analysis before the design, and the design before the coding. At times you'll be amazed at how far the actual implementation is from what you thought initially, but if you stick with "solution first" someone else will be down this same route in a couple of years' time. :-)