From LISTCAT:
KSDS with variable-length records - 146 bytes average, 1000 maximum - and a key length of 15.
Data REC-TOTAL------149,223,801
Index REC-TOTAL---------153812
Data CI size is 26624 and index CI size is 8192.
The index has 3 levels.
From the EXAMINE report:
MAXIMUM LENGTH DATA RECORD CONTAINS 811 BYTES
4293835 KEYS PROCESSED ON INDEX LEVEL 1, AVERAGE KEY LENGTH: 4.4
143128 KEYS PROCESSED ON INDEX LEVEL 2, AVERAGE KEY LENGTH: 6.0
201 KEYS PROCESSED ON INDEX LEVEL 3, AVERAGE KEY LENGTH: 9.1
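(For reference, these figures come from standard IDCAMS output; a rough sketch of the job that produces them, assuming a hypothetical step and dataset name:)
Code:
//REPORTS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(YOUR.KSDS.CLUSTER) ALL
  EXAMINE NAME(YOUR.KSDS.CLUSTER) INDEXTEST DATATEST
/*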
Application program usage of the KSDS: sequential reads - 136115347, random reads - 41496.
What would be the best BUFND and BUFNI values? I have already tried SMB.
I forgot to add that there is no freespace allocation and the data component has 460 extents. As it has to be at this size, this is an extended-format dataset, which is why I thought 26624 was a better data CI size than 18432.
One more thing: the driver file against which the KSDS in question is read is sorted in the key order of the KSDS. My bad, I should have included all of this information in the very first post.
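(For completeness, "tried SMB" means System Managed Buffering requested at run time; the usual batch route is the ACCBIAS subparameter of AMP, which needs an extended-format dataset like this one. A sketch for illustration only, with a hypothetical DD and dataset name:)
Code:
//KSDSIN   DD DISP=SHR,DSN=YOUR.KSDS.CLUSTER,
//            AMP=('ACCBIAS=SO')
//*  ACCBIAS=SO     - SMB optimized for sequential access
//*  ACCBIAS=DO     - SMB optimized for direct access
//*  ACCBIAS=SYSTEM - let SMB pick a technique from the ACB MACRF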
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Quote:
This is a batch only file and the usage is -
Sequential read - 136115347 and Random read is - 41496
What does this mean -- is that records read, or EXCPs, or something else you haven't explained? Let's try again -- you have proved yourself a master of providing absolutely useless information (really, the only people in the world who could possibly be concerned about the key compression for each index level are the IBM employees supporting VSAM) while NOT providing essential information, such as how the program accesses the VSAM data set.
The key thing to know (for determining buffers) for each program accessing a VSAM file is how the program processes the file. There are three options: purely random, purely sequential, and skip-sequential (random read to establish a starting point, then sequential reads after that until the next random read). From the VSAM Demystified redbook, section 2.8.5:
Quote:
For sequential processing, larger data CI sizes are desirable. For example, given a 16 KB
data buffer space to be fulfilled, it is faster to read two 8 KB CIs with one I/O operation
than four 4 KB CIs with one operation.
A large CI means more centralized management of the free space in the CI, consequently causing fewer CI splits. One component with one 16 KB CI with 20% free space has fewer splits than two 8 KB CIs with the same 20% free space each.
For direct processing, smaller data CIs are desirable because the application only needs to retrieve [...]
and from section 2.8.13 of the same:
Quote:
An unnecessarily large amount of buffers can degrade the performance. For random it is recommended to have all the index CIs and lots of data CIs in the buffer pool. For sequential, just one index CI buffer (the sequence set) and many data CI buffers for the look-ahead.
The large data CI size you're using implies your program is doing sequential access but since you're not providing that information, who can tell for sure? And you try too hard to be clever, unsuccessfully.
Quote:
I thought 26624 is a better Data CI size than 18432.
This is wrong. 18K is categorically, absolutely the best block size since it fits the track geometry best. Plus you get 2 additional records per track using 18K versus 26K. And unless your key length is very large, I suspect you are wasting space in your index CI since 8K is overkill for a 26K data CI size.
If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
And none of these comments will apply if your site is using BLSR (or other such tools) against the VSAM data set.
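For reference, with plain NSR (no BLSR or SMB in the picture), those buffer counts can be supplied through the AMP parameter on the DD statement as well as in the program's ACB. A minimal sketch with a hypothetical DD and dataset name - the counts are starting points to experiment with, not fixed answers:
Code:
//KSDSIN   DD DISP=SHR,DSN=YOUR.KSDS.CLUSTER,
//            AMP=('BUFND=60,BUFNI=1')
//*  purely sequential: BUFNI=1, BUFND plenty (60-150, experiment)
//*  purely random    : BUFNI=4 at least, BUFND=30 or so (about a CA)
//*  skip-sequential  : BUFNI=4 (one per index level plus one), BUFND=60+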
1. I have never referred to the EXCP count, and I fail to understand what is so ambiguous when I say "xxxxxxxxx is the # of sequential read and yyyyyy is the # of random read". Doesn't that say it is a skip-sequential process?
2. Key compression is not something that interests only IBM's storage engineers. Have a read about key compression problems. The key compression for the KSDS in question looks good for a key length of 15, but I thought perhaps someone could have a look at it.
3. From the DFSMS manual:
Quote:
If a data set is allocated as an extended format data set, 32 bytes (x’20’) are added
to each physical block. Consequently, when the control interval size is calculated or
explicitly specified, this physical block overhead may increase the amount of space
actually needed for the data set. Figure 18 shows the percentage increase in space
as indicated.
3390 Direct Access Device
Control Interval Size    Additional Space Required
  512                      2.1%
 1536                      4.5%
18432                     12.5%
12.5% of additional space per data CI, for a KSDS with over 100 million data records! If I see it correctly, with 18432 the physical block size is going to be 6144. Do you still say 18432 is the best data CI size for an extended dataset?
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
Quote:
xxxxxxxxx is the # of sequential read and yyyyyy is the # of random read
You did not say that this was in ONE program - I thought, and maybe Robert did as well, that these were figures from 2 different programs doing 2 different kinds of processing.
That is a valid point. But on second thought, even if one read those as separate read counts for different programs, I am fine with that. My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual, with very generic recommendations (a large data CI size with a good amount of buffers - how big and how many, at least roughly?), is not what I am looking for, nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.
Talking about correct and sensible information, this is what I have to say about Robert's strong recommendation of an 18432 data CI size for an extended dataset.
With an extended dataset, for a data CISIZE of 18432, this is the allocation information:
PHYREC-SIZE---------6144
PHYRECS/TRK------------8
That sums up to 49152 bytes per track.
and with 26624 as CISIZE,
PHYREC-SIZE--------26624
PHYRECS/TRK------------2
That is 26624 * 2 = 53248 bytes per track.
The difference? 4096 bytes per track, in favour of 26624. The roughly 4% better track utilization with an 18432 data CI size does not apply here, at least to my understanding.
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
It certainly did not seem to be a single program's counts when I read it -- and with such radically different access patterns, you'd want to tune each program separately.
What is the disk type your data set is on? My statement is based, in part, on a presentation Stev Glodowski of IBM Germany made to WAVV in 2007, which CATEGORICALLY states that an 18K data CI size is best for space usage (on 3390 disks) -- period. And an 18K data CI size uses 3 times 18K, or 54K, of the track capacity as opposed to 52K for your CI size. I don't know where you got the 6144 from, since when I allocate a VSAM file using an 18K data CI size, there are three CIs per track.
Quote:
My intention is to get some thoughts on how to go about buffer allocation for such a large file. Quoted text from an IBM manual, with very generic recommendations (a large data CI size with a good amount of buffers - how big and how many, at least roughly?), is not what I am looking for, nor do I want my job done by forum members. All I intended was to get some correct and sensible information on VSAM tuning.
And yet you totally ignore my VERY SPECIFIC recommendations:
Quote:
If your program is doing purely random access to the VSAM data set, you would need at least 4 index buffers and I would use at least 30 data buffers (enough to hold a CA at a time). If your program is doing purely sequential access to the VSAM data set, you need only one index buffer but plenty of data buffers (60 to 150 but you'd want to experiment to see which provides best performance since there's a point beyond which adding buffers actually hurts performance). For skip sequential processing, the mix depends largely upon the program (so different programs may need different buffer counts) but in general, use one index buffer per level plus one and then add a batch of data buffers to handle the sequential access.
With your program's access profile, you would want at least 4 index buffers and 60 to 150 data buffers.
Since it appears you have an attitude about being helped, and cannot read what is posted for you, perhaps you should try the Beginners and Students Forum instead of this one.
It's for one run, but the statistics stay more or less the same for every run.
Consider this a file holding static configuration data. A once-a-year data refresh is how this file is maintained. Essentially, it is a read-only file for application programs.
There are 100+ programs/jobs reading this file, though not concurrently, and mostly in skip-sequential mode.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
What is your task? Saving on run-time, CPU time, I/O, DASD? To the best possible extent, or just a "quick fix" with buffers?
Let's assume you have no resources for even a small change to 100 programs, but anyway, have you considered an ESDS at least for those reading the vast majority of the file? If you have the available space (and time) it might be fun to do at least one full-sized comparison.
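For illustration only, a sketch of what that one-off copy could look like in IDCAMS, with hypothetical names and placeholder space figures (REPRO reads the KSDS in key sequence, so the ESDS comes out in the same order the sequential programs already process):
Code:
//ESDSTRY  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Hypothetical names; EXTENDED is an assumed SMS data class   */
  /* giving extended format - size the allocation before running */
  DEFINE CLUSTER (NAME(YOUR.TRIAL.ESDS)   -
                  NONINDEXED              -
                  RECORDSIZE(146 1000)    -
                  CISZ(26624)             -
                  DATACLASS(EXTENDED)     -
                  CYLINDERS(30000 3000))
  REPRO INDATASET(YOUR.KSDS.CLUSTER)      -
        OUTDATASET(YOUR.TRIAL.ESDS)
/*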
If it is just the buffer fix, experiment with what Robert has suggested for buffers.
If you have the time/resources to do it properly, your savings could be considerable, but management doesn't often see things that way. Try to do the full-size trial and check the CPU and I/O.
The dataset is on 3390. Robert, did you try to allocate the file with an extended data class? I bet not.
I suggest you have a look at the z/OS newsletter, Issue #20.
I have no attitude about helping or getting help from anyone, and never have.
However, without giving me a chance to clarify, you reproved my post - and for what?
1. Key compression, index level and the number of index records did not interest you. What is so annoying about just three lines of information? Does that qualify me as 'a master of providing useless information'?
2. The 26624 size does not sit well with you. I have explained enough about why I do not think 18432 is a good choice for extended datasets. Did I try to be clever? Well, it does not matter to me if I am not clever. My thinking was based entirely on the IBM manual and other publications.
I was a bit sarcastic in my defence against your original reply. That is not in my character, so my apologies.
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Explain exactly what significance the average record length and the three index-level key compression ratios have in relation to your stated goal of finding recommendations for the number of data and index buffers.
And Bill brings up an excellent point -- just what are you attempting to do? The goal affects the recommendations; if your goal is to minimize memory usage then telling you to use 150 data buffers will conflict with that goal.
With that many records in the file, and primarily sequential access by that many programs, if your programs have already added buffers then there's not much improvement left to get at (other than completely redesigning the whole thing, as was suggested). Your site might investigate the use of a performance management tool to help.
I consider the number of index levels significant for performance: the fewer, the better.
Isn't it the case that the average record length helps you calculate the number of records in a block, and thereby the number of buffers?
Key compression ratio - I just wanted to make sure the compression ratios are good. To me, they are (the largest was about 60%, which is fine as far as I know).
I am no expert on storage and thought it better to provide this information, in case anyone had a view on it.
Runtime is at the top of my list and DASD at the bottom. Resource usage is on my mind, and I would trade it off against the performance gain before finalizing the change.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Do you have development resources available? I'd estimate you'd get big savings through a specific redesign. Try the one-off ESDS. Remember, you can also have an Alternate Index on an ESDS for direct-style access. Extracts from the ESDS to sub-selection KSDSs, that type of thing. It needs good analysis of the existing programs/data.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Remember, if you are able to design from scratch, you could, for instance (if random access in key order is required across the dataset), split the dataset into n datasets, where n is a number that gets you out of three levels of index, and possibly out of extended format, and then access the multiple KSDSs through a called module. To the programs using the module, it doesn't matter whether there is a single dataset or 30. To the throughput, it will make a big difference.
You have the opportunity, given that the data is so static and DASD is not a problem, to do a real tune-up specific to the requirements of the program.
I suspect you can do this yourself, but if you need it done faster and even better, let me know :-)
Bill, I got your suggestion and am going to try that. I am contemplating the criteria for splitting the file into smaller units: purely on the basis of size, or the slightly dirty trick (with a maintenance overhead) of taking the key value into consideration while splitting. The major part of the key is, say, a bank number, and say there are only a dozen unique bank numbers in the file. With that, my thought is to split the file into, say, 3 parts, each holding 4 unique bank numbers and each of acceptable size. Then, before calling the IO module, I can check which dataset holds the record, i.e. IF BANK-NUM = 'xxxxxx' OR 'xxxxxx' ... then pass that dataset name to the called module. The only downside of this approach is future maintainability - if we ever have to add a new bank number to the file, we will have to add it to our check list. Keeping the very low maintenance of this file in mind, I consider this a possibility. This program is part of the first generation of our product and we are quite open about not marketing it any more, which increases the chance that no new bank number will be added in future.
And yes, while I am at it, I would be glad if you could show me your version.
Thank you very much.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
For those of the 100+ modules for which keyed access to the data with fewer index levels would help over other methods, keep things simple. Just call the IO module with a function, key, record area and return status.
Keep any "complexity" solely in the IO module. Whatever ranges of keys the multiple datasets hold, put that information into a control file. Read the control file at initialization.
Ensure, in the IO module, that keys requested are always ascending. Close datasets which are no longer required. Have a "version" of the IO module which collects statistics about access.
You have an unusual situation: a big dataset which is static for a year at a time. That means you can do some good analysis of the data used, and design accordingly. You have DASD, and no data-integrity "implications" from any kind of update, which makes it easy to hold duplicate data for different situations if that is what the analysis says would benefit.
As I don't have anything which matches your situation, I wasn't offering to pass anything on. Just skills in exchange for money :-)
EDIT: What I'm also trying to say is: do the analysis before the design, and the design before the coding. At times you'll be amazed at how far the actual implementation is from what you thought initially, but if you stick with "solution first" someone else will be down this same route in a couple of years' time. :-)