vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Hi,
Quote:
DASD Response time = CONN + DISC + PEND + IOSQ
IOSQ time is a delay time accumulated while the I/O is still in MVS and is waiting for a UCB to allow the I/O against the device. (UCB is an MVS control block)
Could you please explain in layman's terms what IOSQ means.
Does it mean that the UCB is unavailable to a particular request for the IOSQ period of time?
An example of IOSQ in a real-time scenario would also help me understand.
We are seeing more than 30 ms response times with a moderate I/O activity rate. The IOSQ value is high for some volumes (10-20 ms).
Thanks & Regards,
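The decomposition quoted at the top of this post is just a sum, so a tiny sketch (Python, with invented component values for illustration only) shows how a high IOSQ component can dominate the total:

```python
# DASD response time decomposition: RT = IOSQ + PEND + DISC + CONN.
# The millisecond values below are made up for illustration.

def dasd_response_time(iosq_ms, pend_ms, disc_ms, conn_ms):
    """Total device response time in milliseconds."""
    return iosq_ms + pend_ms + disc_ms + conn_ms

# A volume with healthy hardware-side times but heavy UCB queueing:
rt = dasd_response_time(iosq_ms=15.0, pend_ms=0.5, disc_ms=2.0, conn_ms=1.5)
print(rt)         # 19.0 ms total
print(15.0 / rt)  # fraction of the response time spent queueing in z/OS (~0.79)
```

With numbers like these, most of the 19 ms is spent before the I/O ever reaches the device, which is exactly the symptom described above.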
Paul Voyner
New User
Joined: 26 Nov 2012 Posts: 52 Location: UK
Vasanthz, you could probably have found the answer yourself if you'd Googled it.
Here's one of many clear explanations: "IOS Queue represents the average time that an I/O waits because the device is already in use by another task on this system, signified by the device's UCBBUSY bit being on".
In layman's terms, more than one address space* is trying to access the disk, so the requests have to queue until the disk - or more accurately the UCB for that disk - is free. Just like when only one checkout is open in the supermarket, everyone queues up waiting their turn.
(* @pedants, yes, it could also be one address space and multiple TCBs, but the guy wants a simple explanation)
BTW, those response times are very bad. You'll need to investigate with a monitoring tool, e.g. RMF or Omegamon.
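The checkout analogy can be put in code: with a single "UCB" (one checkout lane), any I/O that arrives while the device is busy accumulates wait time, and that wait is the IOSQ component. A minimal sketch, with invented arrival and service times:

```python
def iosq_waits(arrivals_ms, service_ms):
    """Single-UCB model: each I/O must wait until the previous one finishes.
    Returns the queueing delay (the IOSQ-like part) for each request, in ms."""
    waits = []
    device_free_at = 0.0
    for arrival in arrivals_ms:
        start = max(arrival, device_free_at)  # wait while UCBBUSY is on
        waits.append(start - arrival)
        device_free_at = start + service_ms   # device busy for the service time
    return waits

# Three I/Os arrive almost together; each takes 4 ms at the device:
print(iosq_waits([0.0, 1.0, 2.0], 4.0))  # [0.0, 3.0, 6.0] -> the queue builds up
```

Note how the third request waits longer than the second: queueing delay compounds under contention, which is why IOSQ can dwarf the hardware service time on a busy volume.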
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Hi Paul,
Thank you very much for the clear explanation. I now have a mental picture of what it is.
Quote:
You'll need to investigate with a monitoring tool, e.g. RMF or Omegamon
We are using them to investigate.
Thanks & Regards
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
Is...is that a velociraptor with a machine gun and a bomb...riding a great white shark???
A case study in the old STROBE manual described a travel agency that had a spike in DASD wait time. They discovered that the hot spot on the disk held the Disneyland reservation information, so everyone was banging against the same spot on the disk.
If you are adding cases/invoices/customer records at the end of a file area, then doing most of your work against those new entries, this can happen.
If you are using mammoth buffers to try to speed things up, but one program is locking down records for updates, this can happen.
Akatsukami
Global Moderator
Joined: 03 Oct 2009 Posts: 1788 Location: Bloomington, IL
Ed Goodman wrote:
Is...is that a velociraptor with a machine gun and a bomb...riding a great white shark???
Indeed it is.
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Hello Ed,
Quote:
Is...is that a velociraptor with a machine gun and a bomb...riding a great white shark???
Yes, correct, except that's an Uzi SMG :-)
Quote:
If you are using mammoth buffers to try to speed things up, but one program is locking down records for updates, this can happen.
My understanding was that large buffers mean lengthy transfers, and that this time shows up as CONN time, not IOSQ time.
Or do large buffers affect both IOSQ and CONN?
I have a lot of reading to do :S
Thanks & Regards,
steve-myers
Active Member
Joined: 30 Nov 2013 Posts: 917 Location: The Universe
vasanthz wrote:
... Or do large buffers affect both IOSQ and CONN?
I have a lot of reading to do :S
Sometimes in performance analysis you run into (sorry about the language on a family-oriented web site, but I don't think too many will be offended) situations where you're damned if you do and damned if you don't.
Most performance analysts regard large buffers, typically half-track for 3380 and 3390 devices and full-track for older devices, as a Good Thing for batch processing of basically sequential data. BUT (there's usually a BUT): if the device holding the sequential data shares an I/O path with a device holding data used by an online process like CICS, the longer path-busy time required to transfer the large blocks for the sequential job can interfere with the online process.
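The trade-off described above is easy to put rough numbers on. A sketch (the 100 MB/s path speed is an assumed, illustrative figure, not a real channel rating; 27,998 bytes is the usual half-track block size quoted for 3390):

```python
def sequential_io_profile(dataset_bytes, blksize, path_bytes_per_ms=100_000):
    """Rough model: bigger blocks mean fewer I/Os for the same data,
    but each transfer holds the path longer.  The path speed is an
    invented illustrative number, not a real device figure."""
    n_ios = -(-dataset_bytes // blksize)             # ceiling division
    ms_per_transfer = blksize / path_bytes_per_ms    # time the path is held per block
    return n_ios, ms_per_transfer

# Reading ~100 MB with 4 KB blocks vs. 3390 half-track (27,998-byte) blocks:
small = sequential_io_profile(100_000_000, 4_096)
large = sequential_io_profile(100_000_000, 27_998)
print(small)  # many short transfers
print(large)  # far fewer I/Os, but each one occupies the path ~7x longer
```

The second configuration finishes the batch job with far fewer I/Os, which is the "Good Thing"; the longer per-transfer path-busy time is the "BUT" that can hurt an online workload sharing the path.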
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Have you also thought about using PAV?
Especially for the most utilised volumes.
Maybe SDS (Sequential Data Striping) could also help, by spreading the naughty dataset over multiple volumes.
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Quote:
Have you also thought about using PAV?
I am not sure if PAV is enabled for these volumes; I will check. Thank you.
Quote:
Sequential Data Striping
Is dataset striping still required? I remember reading that with the latest DASD boxes like the DS8000, with their RAID and rank architecture, the box by default stripes a dataset at the hardware level across multiple physical disks.
For example, 64 physical volumes for RAID 5 with 8 ranks.
Regards,
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
SDS is for logical volumes. I used this about 10 years ago for poor response times and it worked rather well.
If you stripe the dataset over, say, XX logical volumes, you can have up to XX simultaneous accesses to the dataset, one access to each of the logical volumes it has been spread over. SMS sort of knows where all of the data is, so it's pretty easy to implement and use.
It's been quite a while since I've been heavily involved with DASD farming, so things may well have changed.
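The layout described above, chunks dealt out across the stripes and wrapping back to the first volume, can be sketched as a simple round-robin mapping (a toy model, not the actual SMS map):

```python
def stripe_location(chunk_number, stripe_count):
    """Round-robin striping: chunk 0 goes to volume 0, chunk 1 to volume 1,
    ... wrapping back to volume 0 after stripe_count chunks.
    Returns (volume_index, chunk_position_on_that_volume)."""
    return chunk_number % stripe_count, chunk_number // stripe_count

# A 4-way stripe: eight consecutive chunks land one per volume, twice around.
print([stripe_location(c, 4) for c in range(8)])
# -> [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Because consecutive chunks sit on different volumes (and so behind different UCBs), a sequential reader can drive up to stripe_count I/Os concurrently, which is where the speed-up comes from.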
Paul Voyner
New User
Joined: 26 Nov 2012 Posts: 52 Location: UK
Striping works very well when the need is to improve performance for specific datasets. I did a test a while ago with a dataset which sustained an I/O rate of 5,000 on a single disk, but 20,000 as a 6-way stripe.
But I'd guess that Vasanth's problem with high IOSQ is more likely caused by high contention for a volume by a large number of users, e.g. a TSO work volume. That won't be helped by striping.
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Paul, I've not yet come across a situation where SDS hasn't been beneficial.
Even if the volumes are heavily used, splitting the dataset over a number of volumes spreads, and so reduces, the contention too.
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Hi,
Thank you Paul & Expat for your views.
Please feel free to correct me if the statements below are wrong. I've been doing some reading, and below is my understanding.
In modern DASD boxes, whether a dataset resides on a single volume or on multiple volumes does not matter much, due to hardware striping with RAID and ranks.
Even a dataset on a single logical volume may actually reside on multiple physical volumes inside the box, so the response time of a single logical volume is really the combined response time of a number of physical volumes.
The logical volume comes into the picture only when there is contention for the UCB. Since the dataset is spread over multiple physical volumes, or available straight from cache, the hardware can perform multiple concurrent reads of a dataset, but z/OS cannot issue concurrent reads/writes even though the hardware supports it. So the concept of PAV was introduced to make z/OS think it is writing to multiple UCBs while it is actually writing to a single logical volume. HyperPAV gives the best performance by dynamically assigning multiple UCBs (aliases) to a particular logical volume.
IOSQ delay occurs only when there is a shortage of UCBs.
If we enable HyperPAV, then we could possibly mitigate the IOSQ delay.
Is that correct, or a bunch of nonsense? :S
In our shop we have HyperPAV enabled. Could you please let me know if it is possible to determine the utilization of UCBs, so we can tell whether there is a shortage of UCBs in HyperPAV's alias pool?
Thanks & Regards,
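The alias idea in the summary above can be modelled: a base UCB plus its PAV aliases behave like extra checkout lanes, so queueing delay shrinks as aliases are added. A toy multi-server sketch (timings invented; this models the queueing effect only, not real HyperPAV internals):

```python
import heapq

def total_iosq(arrivals_ms, service_ms, n_ucbs):
    """Model a base UCB plus (n_ucbs - 1) aliases as parallel servers.
    Returns the total queueing (IOSQ-like) delay in milliseconds."""
    free_at = [0.0] * n_ucbs          # when each UCB next becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrival in arrivals_ms:
        earliest = heapq.heappop(free_at)      # first UCB to come free
        start = max(arrival, earliest)
        total_wait += start - arrival
        heapq.heappush(free_at, start + service_ms)
    return total_wait

burst = [0.0, 0.5, 1.0, 1.5]          # four I/Os in quick succession, 4 ms each
print(total_iosq(burst, 4.0, 1))      # 21.0 -> one UCB: queueing piles up
print(total_iosq(burst, 4.0, 4))      # 0.0  -> base + 3 aliases: no queueing
```

This is the sense in which IOSQ is a "shortage of UCBs": the delay appears only when concurrent I/Os outnumber the UCBs available to the volume.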
Paul Voyner
New User
Joined: 26 Nov 2012 Posts: 52 Location: UK
Vasanth - 10/10 for your summing up. You've nailed it. The only quibble I have is with the wording "IOSQ delay occurs only when there is a shortage of UCB"; I think it should be something like "IOSQ delay occurs when the UCB is busy because an I/O is already active to the device".
Of course, enabling PAV isn't something you can do overnight. Striping is easier, and can be implemented in the SMS routines with minimal risk. Or, easier still, you could simply move some of the most heavily used datasets to another volume, if you know which they are (and that's not easy to find out without a nice tool like Omegamon).
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Hi Vasanthz,
Hardware striping and Sequential Data Striping are two completely different beasties. SDS stripes the dataset across a number of volumes, giving a larger UCB range for accessing it. SMS keeps a map of what data is where; it isn't as straightforward as putting nn GB on one volume and then the next nn GB on the next volume. It writes a chunk on the first volume, then a chunk on the next, and so on until it reaches the specified stripe count, and then starts again from volume 1 through volume nn.
It really is worth investigating with your DASD farmers.
As for HyperPAV - it may be installed at your site but possibly not available to the volumes that you are having grief with. That's something else you will need to find out from your shop.
It might also be beneficial to take a look at the volume(s) that are causing the problems to see what else is on the volume and how heavily used it is. In the past I've done this analysis and moved a few datasets about and improved the situation greatly.
I know how much you love sifting through SMF and RMF data.
Good luck
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Quote:
you could simply move some of the most heavily used datasets to another volume
Quote:
It might also be beneficial to take a look at the volume(s) that are causing the problems to see what else is on the volume and how heavily used it is. In the past I've done this analysis and moved a few datasets about and improved the situation greatly.
Thank you, I will do that study.
Quote:
I know how much you love sifting through SMF and RMF data
SMF - definitely fun :-)
Pete Wilson
Active Member
Joined: 31 Dec 2009 Posts: 580 Location: London
You need to speak to your hardware support and/or vendor people to establish whether there's a HyperPAV/UCB issue, whether channels are busy or degraded, or whether your PPRC/XRC mirrors are struggling to keep up due to link/network errors etc., all of which can contribute to response times. The GRS setup can also have an effect if volume reserves are not all converted to global enqueues.
In the meantime, expat's suggestion of striping and/or moving contentious data around other volumes/pools probably wouldn't go amiss, but that would be your call, based on your better knowledge of the data.