charanmsrit
New User
Joined: 25 Oct 2007 Posts: 81 Location: Australia
Hi,
I have rare deadlocks in online transactions. The transaction is defined with a class that lets it run in up to 10 MPP regions, so 10 instances of this transaction code can run simultaneously. We use the PI lock manager, and according to the deadlock report the locks are held at record level (GRIDX).
The program under the transaction is simple. The transaction is initiated by passing a partial key:
1. Use the partial key to issue a GU call with GE to retrieve the first available root at or after the partial key position.
2. If the record found belongs to the partial key and matches a criterion, issue GHU and then DLET to remove the record, update DB2 and trigger an MQ message. Else, if the record retrieved has a higher key than the partial key, exit.
3. If the record retrieved belongs to the partial key but does not match the criterion, issue GU with GE again until a record matches. If it matches, repeat step 2 above.
The deadlock is between two different partial keys, say 1234 and 1267. In both cases, the deadlock report indicates the call being made was DLET. I am quite puzzled as to why the transaction that is processing the higher key holds a lock on the lower key. Does using GU for read-next after a delete have any effect? I have not been able to create the deadlock in the test environment, as it is very hard to reproduce. Could someone point me in the right direction to find the cause? I have tried the IMS manuals on deadlocks, but they did not help me find the cause in my situation.
Thanks
Charan
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
I suspect that in the deadlock situation, one of the processes acquired the locks "in the wrong order". Look at the process(es) involved in the deadlock and make sure the locks are always acquired in the same order.
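Dick's advice is the classic lock-ordering rule. As a generic illustration only (Python threads, not IMS), the sketch below has two workers that need the same two resources but ask for them in opposite orders. Because each one sorts the resource names before locking, both take the locks in the same order, and neither can end up holding one lock while waiting for the other. The resource names and the `acquire_in_order` helper are hypothetical.

```python
import threading
from contextlib import ExitStack

# Hypothetical lock table: one lock per named resource (stand-ins for two records).
LOCKS = {name: threading.Lock() for name in ("KEY-1234", "KEY-1267")}
results = []

def acquire_in_order(stack, names):
    """Acquire the named locks in one global (sorted) order and return that order."""
    ordered = sorted(set(names))
    for name in ordered:
        stack.enter_context(LOCKS[name])  # released in reverse order when the stack exits
    return ordered

def worker(wanted):
    # Both workers want the same two resources but list them in opposite orders.
    with ExitStack() as stack:
        order = acquire_in_order(stack, wanted)
        results.append(tuple(order))

threads = [
    threading.Thread(target=worker, args=(["KEY-1234", "KEY-1267"],)),
    threading.Thread(target=worker, args=(["KEY-1267", "KEY-1234"],)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(results)  # both workers locked KEY-1234 first, then KEY-1267
```

Sorting is just one way to fix a global order; any order works as long as every process uses the same one.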
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
Pretty sure that even at "record level" locking, the lock is for the entire control interval, or what passes for a control interval in OSAM.
This means that there may be dozens of records locked by a single GHU call.
Also, be aware that for a PCB that CAN do an update, ALL of the calls take a read lock, because you are reading the segment "with integrity". This means IMS has to make certain that no one else changes the record while you have it.
The easiest way to get this working is to add a second PCB to the PSB. Make it a copy of the PCB you are using now for the GHU and DLET. Use the second PCB for all of the searching and finding, then use the original one to do a GHU (with a FULLY qualified key) and DLET. This should cut down on the locks quite a bit by eliminating locks during read-only operations.
There shouldn't be much of a performance hit, because the segment you just read and found will still be in the buffers.
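To picture Ed's point, here is a toy model, assuming (hypothetically) that the lock is taken on the block that stores a record rather than on the record itself. Two different keys that happen to sit in the same block then contend for the same lock. The block size and slot numbers are invented for illustration; whether IMS really locks at this granularity depends on the access method, as discussed further down the thread.

```python
# Toy model: records are packed into fixed-size blocks (hypothetical layout),
# and the lock is taken on the block, not on the individual record.
RECORDS_PER_BLOCK = 20  # hypothetical; real packing depends on record and CI size

def lock_resource(slot: int, records_per_block: int = RECORDS_PER_BLOCK) -> int:
    """Return the lock granule (block number) for a record stored in `slot`."""
    return slot // records_per_block

def conflicts(slot_a: int, slot_b: int) -> bool:
    """Two updates conflict if their records map to the same lock granule."""
    return lock_resource(slot_a) == lock_resource(slot_b)

# Records in slots 3 and 17 share block 0, so they contend even though the keys differ.
print(conflicts(3, 17))  # True
print(conflicts(3, 25))  # False
```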
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote:
Pretty sure that even at "record level" locking, the lock is for the entire control interval,
Yup. While I don't "do" IMS, I believe it has to retain the locks until the "unit of work" is committed or rolled back.
If the rows are locked in the same order, there is a bit of a risk of a timeout, but there should not be a deadlock?
Gary Jacek
New User
Joined: 17 Dec 2007 Posts: 64 Location: Victoria, BC, Canada
Hi Charan
One other thing you should do, if possible, is make your call to DB2 before you do your IMS DLET call. A call to the external DB2 subsystem can take a long time, depending upon your SQL. If you do the IMS DLET call before the SQL call to DB2, you are holding an IMS lock for the duration of the SQL call.
Don't worry that the results of your SQL call to DB2 will be available to other applications before your IMS DLET has taken effect. IMS/DB2 two-phase commit will take care of this for you. Updates in IMS and DB2 will take effect when you do the next GU to the message queue.
Gary
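Gary's suggestion is about shortening how long the IMS lock is held. A toy timing model, assuming the lock taken by the DLET is held until the commit point at the end of the unit of work, and using made-up step durations, shows why moving the slow DB2 call ahead of the DLET helps.

```python
# Toy timing model: the IMS lock taken by DLET is held until the commit point,
# so any work done after the DLET extends the time the lock is held.
# Step durations are hypothetical, in milliseconds.
STEPS = {"DLET": 2, "DB2_SQL": 50, "MQ_PUT": 5}

def lock_hold_time(order, lock_step="DLET", durations=STEPS):
    """Time from the start of `lock_step` until commit at the end of the sequence."""
    start = order.index(lock_step)
    return sum(durations[step] for step in order[start:])

print(lock_hold_time(["DLET", "DB2_SQL", "MQ_PUT"]))  # 57
print(lock_hold_time(["DB2_SQL", "DLET", "MQ_PUT"]))  # 7
```

Only the order changes; the same work is done, and the commit still applies the IMS and DB2 updates together.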
charanmsrit
New User
Joined: 25 Oct 2007 Posts: 81 Location: Australia
Hi,
Thanks, all, for your time and responses.
Quote:
Pretty sure that even at "record level" locking, the lock is for the entire control interval, or what passes for a control interval in OSAM.
This means that there may be dozens of records locked by a single GHU call.
Even in this case, I would have expected a timeout and not a deadlock. Am I missing something?
We use a vendor package that sits between z/OS and our application COBOL programs; the compiler is provided by the vendor. We do not build and issue DL/I calls directly from our application programs. Instead, we use vendor-supplied built-in functions in the COBOL, which the package converts and interfaces with IMS at run time.
To investigate further, I finally managed to get a DL/I trace at PSB level for my transaction, and found that the application's read call to the IMS DB after the first delete is converted into two GU calls to IMS: the first with the SSA my application intends, and the second a GU call with an unqualified SSA. I believe this may be causing the issue, because it repositions the DB from the beginning after a successful delete and therefore results in a lock on a lower key.
I have asked the vendor package SME to analyse why the system issued two GU calls for one application read call. The vendor package is all assembler, and it builds and issues DL/I calls via ASMTDLI. As the package builds the SSA based on the contents of the DB root copybook, I am surprised by an unqualified GU. I am hoping to get an answer soon and will be back with further updates.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote:
Even in this case, I would have expected a timeout and not a deadlock. Am I missing something?
Yes, I believe so.
If all of the locks were taken against one resource, a timeout might occur.
When locks are taken against 2 resources, in different orders, a deadlock can occur.
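A lock manager typically tells the two cases apart by looking for a cycle in the wait-for graph: a waiter with no cycle just waits (and may eventually time out), while a cycle can never resolve on its own, so one participant must be chosen as the victim. A minimal sketch, assuming each transaction waits on at most one lock at a time; this is a generic illustration, not how the IMS PI lock manager is implemented, and the transaction names are hypothetical.

```python
def find_cycle(waits_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    `waits_for` maps each waiting transaction to the transaction holding the
    lock it wants (each transaction waits on at most one lock at a time).
    """
    for start in waits_for:
        path, seen = [], set()
        node = start
        # Follow the chain of waiters until it ends or revisits a transaction.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            return path[path.index(node):]
    return None

# One resource: A and C both wait for B. Nobody waits on them, so no deadlock.
print(find_cycle({"TRAN_A": "TRAN_B", "TRAN_C": "TRAN_B"}))  # None
# Two resources locked in opposite orders: A waits for B and B waits for A.
print(find_cycle({"TRAN_A": "TRAN_B", "TRAN_B": "TRAN_A"}))  # ['TRAN_A', 'TRAN_B']
```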
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
Wait, wait, wait. When I say "control interval" I'm referring to the physical record in a VSAM file. You know, there can be several logical records in a group, and VSAM tracks them together and keeps the key attached. When you insert something, these can get split. Control Areas and Control Intervals.
I don't mean the length of time a transaction takes. As an IMS-er, I would call that an LUW, or Logical Unit of Work.
I thought the OP was trying to figure out how two seemingly unrelated calls could get locked out. I was trying to explain that even though the keys are different, IMS may be placing a lock on the entire control interval, which MIGHT contain the other key involved.
The reason they aren't getting a timeout is that the PCB is marked as "read for integrity" because it can do updates. That's why I suggested a second PCB.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hi Ed,
As I mentioned, I'm not an IMSer, but on many different database systems, timeouts and deadlocks were caused by the same events described earlier.
Is this not the case with IMS? If not, I'll go quietly.
d
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
Sounds like the same problem we are having with CLOAS. I don't know if anyone has deciphered it enough to know whether 2 calls are being generated for one particular request. May pass this by the boffins tomorrow.
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
Dick, you are 100% correct. I was just adding to the pile by letting the OP know that MORE than just the actual keys in question are locked. The entire area of the CI is locked, and it contains multiple database roots.
So just because the locks are taken in the right order doesn't protect you as much as he might think. You could have 20 roots locked by a single update; it depends on how they are stored.
This is in contrast to discrete rows in a DB2 table, where each row is independent.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hi Ed,
When some of my clients were first trying DB2 more than 20 years ago, they had considerable locking problems. I mentioned to them that the IBM default for DB2 was page-level locking. Caused no end of grief.
I've never been a full-fledged DB2 DBA, but when they came for help, I suggested switching to row-level locking. Talk about an over-reaction. . .
Their biggest problem was that they had a table of "next available values" for a bunch of other tables. So when one "next" number was being assigned, no others could be. . . I suggested that if they did not want to do this well, they could always make sure that each of these rows was on a page of its own . . . Yup, 4K+ of filler in each row.
I did not realize that this late in the life of IMS, it still locked an entire page. . .
Thanks - I'll continue learnin'!
Happy Thanksgiving!
d
charanmsrit
New User
Joined: 25 Oct 2007 Posts: 81 Location: Australia
Thanks Ed/Dick,
Following up on the previous update about the two GU calls: we have found that this happens when no record satisfying the partial key criteria is found in the current physical partition (end of segment / status GE). The package then generates an unqualified GU call to the next partitioned physical DB. These partitions are key-ranged and the package identifies the DB specific to the key, so this should not be an issue.
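The partition handling described here can be pictured as a lookup over key ranges. The sketch below assumes each physical database holds keys up to and including a high key; the partition names, high keys and helper functions are hypothetical, since the thread does not show the vendor package's actual logic.

```python
import bisect

# Hypothetical key-range partitioning: each physical database holds keys up to
# and including its high key. Names and high keys are illustrative only.
PARTITIONS = [("DBPART1", "ABC3"), ("DBPART2", "ABC6"), ("DBPART3", "ABC9")]
HIGH_KEYS = [high for _, high in PARTITIONS]

def partition_for(key):
    """Return the name of the partition whose key range contains `key`, or None."""
    i = bisect.bisect_left(HIGH_KEYS, key)
    return PARTITIONS[i][0] if i < len(PARTITIONS) else None

def next_partition(name):
    """Return the partition after `name`, used when a GE search runs off the end."""
    names = [n for n, _ in PARTITIONS]
    i = names.index(name) + 1
    return names[i] if i < len(names) else None

print(partition_for("ABC1234"))    # DBPART1
print(next_partition("DBPART1"))   # DBPART2
```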
I totally agree that the locking depends on the DB organisation. In my case, the DB is OSAM HIDAM, which means the resource locked should generally be only the RBA of the root record, not the CI (or RAP). I have come across an IBM paper which states that the locked resource can be the hashed key (i.e. all roots linked to that hash key) for HIDAM in certain cases where a record is being inserted or deleted. In my case, the deadlock occurs while the record is being deleted. I am not very sure about this hash algorithm, or whether this is what is causing the lock.
Extract from the paper:
The resource that is locked is the value in this table plus an identification of the database and database data set.
For HISAM, IMS hashes the key of the root segment. This produces a resource name that is locked. There are millions of possible values that the hashing algorithm produces. This tends to minimize the possibility of different keys hashing to the same value and producing lock conflicts.
For HIDAM and PHIDAM, the RBA of the root is always used to identify the database record. The root segment resides in the prime HIDAM or PHIDAM database, not the index. The RBA is from this prime HIDAM or PHIDAM database. There are times when the hashed value of the key of the root segment is also used. This is the key in the index. Locking of the hashed key occurs when IMS is either inserting a root segment or erasing it. These are the only times that changes are made to the index. When a record is being inserted into or deleted from the index, IMS locks the hashed key to prevent two programs from adding or deleting the same root segment concurrently.
IMS locks the RBA of the Root Anchor Point (RAP) from which the root is chained for HDAM and PHDAM databases. Since multiple roots may be chained from the same RAP, this is really a lock on one or more database records. When one root is locked, all the roots on the RAP chain are locked.
Code:
Table 5. Full Function Database Record Locks
Access Method        Lock Resource
-----------------------------------------------------------
HISAM                Hashed key of root segment
HIDAM and PHIDAM     RBA of root segment
                     Hashed key of root segment
HDAM and PHDAM       RBA of RAP
Thanks,
Charan
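Table 5 can be read as a small decision function. The sketch below returns the lock resource(s) for a root according to the table and the extract above; the database identification that the extract says is also part of the resource name is left out. The hash is a stand-in (`zlib.crc32` reduced to a deliberately tiny number of buckets), not IMS's algorithm, which the paper does not describe. The point is only that any hash maps many keys onto fewer values, so two different root keys can, in principle, share a hashed-key lock; the paper says the real algorithm has millions of possible values, which makes this unlikely.

```python
import zlib

# Stand-in hash: IMS's real algorithm is not described in the extract. A tiny
# bucket count is used here only to make collisions easy to demonstrate.
HASH_BUCKETS = 8

def hashed_key(root_key: str, buckets: int = HASH_BUCKETS) -> int:
    return zlib.crc32(root_key.encode("ascii")) % buckets

def lock_resources(access_method: str, root_key: str, root_rba: int,
                   rap_rba: int = 0, changing_index: bool = False) -> list:
    """Lock resource(s) per Table 5 (database identification omitted)."""
    if access_method == "HISAM":
        return [("HASH", hashed_key(root_key))]
    if access_method in ("HIDAM", "PHIDAM"):
        locks = [("RBA", root_rba)]
        if changing_index:  # root insert or delete also locks the hashed index key
            locks.append(("HASH", hashed_key(root_key)))
        return locks
    if access_method in ("HDAM", "PHDAM"):
        return [("RAP", rap_rba)]
    raise ValueError(f"unsupported access method: {access_method}")

def find_collision(keys, buckets: int = HASH_BUCKETS):
    """Return two distinct keys that hash to the same value, or None."""
    seen = {}
    for key in keys:
        h = hashed_key(key, buckets)
        if h in seen:
            return seen[h], key
        seen[h] = key
    return None

# 9 distinct keys cannot fit into 8 buckets, so at least one pair must collide.
keys = [f"ABC12345{n:04d}" for n in range(HASH_BUCKETS + 1)]
print(find_collision(keys))
```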
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
So, when the package does the GU on the next partition, is it using a different PCB? I'm asking because that would leave the lock on the prior PCB.
You mentioned that you are using HIDAM, not PHIDAM, so that probably means you have multiple PCBs for the physical databases.
This means that the read lock will be held until you move off that segment or take a commit point. When the package hurries off to the next database, it may be leaving that lock in place.
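Ed's concern can be sketched as a toy model in which each PCB holds a lock on the root it is currently positioned on, and that lock goes away only when the same PCB moves to another root or the program commits. PCB and root names are hypothetical, and real PI lock states (such as the downgrade to level 1 mentioned later in the thread) are more detailed than this.

```python
class PcbLockModel:
    """Toy model: one held lock per PCB, on the root that PCB is positioned on."""

    def __init__(self):
        self.position = {}  # PCB name -> root it is positioned on (and has locked)

    def get(self, pcb, root):
        # Moving this PCB replaces its previous position, releasing that lock only.
        self.position[pcb] = root

    def locked_roots(self):
        return set(self.position.values())

    def commit(self):
        # A commit point releases every lock the program holds.
        self.position.clear()

m = PcbLockModel()
m.get("PCB_PART1", "ROOT-A")
m.get("PCB_PART2", "ROOT-X")      # search continues in the next partition via another PCB
print(sorted(m.locked_roots()))   # ['ROOT-A', 'ROOT-X']: ROOT-A is still locked
m.get("PCB_PART1", "ROOT-B")
print(sorted(m.locked_roots()))   # ['ROOT-B', 'ROOT-X']
m.commit()
print(m.locked_roots())           # set()
```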
charanmsrit
New User
Joined: 25 Oct 2007 Posts: 81 Location: Australia
Hi Ed,
Quote:
So, when the package does the GU on the next partition, is it using a different PCB? I'm asking because that would leave the lock on the prior PCB.
Yes, we use different PCBs for each physical DB.
In all the cases so far, the deadlock is on records within the same physical DB. It was only in our dev region that the unqualified GU call was traced.
More details of the deadlock.
Code:
DEADLOCK ANALYSIS REPORT - LOCK MANAGER IS PI
...............................................................................
RESOURCE DMB-NAME LOCK-LEN LOCK-NAME
01 OF 02 DBABT1 08 10F09BB402D50140
KEY FOR RESOURCE IS FROM DELETE WORK AREA
KEY1=(ABC1234564711990101201231122012000100000000000)
IMS-NAME TRAN/JOB PSB-NAME PCB--DBD PST# RGN CALL LOCK LOCKFUNC STATE
WAITER IMSP TRAN1234 PSBABC02 DBABT1 00111 MPP DLET GRIDX 30400378 03
BLCKER IMSP TRAN1234 PSBABC02 -------- 00203 MPP ---- ----- -------- 03
...............................................................................
RESOURCE DMB-NAME LOCK-LEN LOCK-NAME - WAITER FOR THIS RESOURCE IS VICTIM
02 OF 02 DBABT1 08 10F08D9802D50140
KEY FOR RESOURCE IS FROM DELETE WORK AREA
KEY2=(ABC1234567541990101201231122012000100000000000)
IMS-NAME TRAN/JOB PSB-NAME PCB--DBD PST# RGN CALL LOCK LOCKFUNC STATE
WAITER IMSP TRAN1234 PSBABC02 DBABT1 00203 MPP DLET GRIDX 30400378 03
BLCKER IMSP TRAN1234 PSBABC02 -------- 00111 MPP ---- ----- -------- 03
My application logic is as follows (taking the example of PST 111, where the record is returned from one physical DB for the DL/I GU calls above):
1. GU to the DB with a partially qualified SSA (RO=GE)
(ABC1234564711990000000000000000000000000000000)
2. If no record is returned that satisfies the partial key value in the SSA, end the transaction.
3. GHU call for the DB record returned.
4. DLET call to delete the root.
5. GU call again with a partially qualified SSA (RO=GE), now with a slightly lower key
(ABC1234567541000000000000000000000000000000000)
6. If a record is found satisfying the criteria, update (REPL) a flag in another IMS DB.
7. Trigger MQ and update some other IMS DBs.
8. Repeat steps 1 to 7 above.
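The GE-qualified read loop (steps 1 to 4, plus the read-next from the first post) can be emulated in memory to check the positioning logic. This sketch uses a sorted Python list as a stand-in for the root index and `bisect` to emulate a GU with an SSA qualified by >=. It is not DL/I, it does not model locking, and the `matches` predicate is hypothetical. In this model, GE-qualified reads only ever move forward from the search key.

```python
import bisect

def gu_ge(keys, search_key):
    """Emulate GU with a qualified SSA using >= : first key at or after search_key."""
    i = bisect.bisect_left(keys, search_key)
    return keys[i] if i < len(keys) else None  # None stands in for a not-found status

def process(keys, partial_key, matches):
    """Delete the first root under `partial_key` that satisfies `matches`.

    `keys` is a sorted list standing in for the root index; `matches` is a
    hypothetical predicate for the application's selection criteria.
    """
    search = partial_key
    while True:
        key = gu_ge(keys, search)
        if key is None or not key.startswith(partial_key):
            return None         # past the partial key range: end the transaction
        if matches(key):
            keys.remove(key)    # GHU + DLET on the fully qualified key
            return key
        search = key + "\x00"   # smallest string above this key: read next

keys = ["ABC1234560001", "ABC1234560002", "ABC1234570001"]
print(process(keys, "ABC123456", lambda k: k.endswith("2")))  # ABC1234560002
print(process(keys, "ABC123456", lambda k: False))            # None
```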
Notice that the record key in KEY2 of the report has a lower key (offset 1-12). Also, the report specifies that the deadlock occurs during the delete function. Only one record satisfies the delete in a transaction, so no two delete functions are executed under one transaction. As per IBM, the read-for-integrity lock is reduced to level 1 (for PI) when the root position moves to the next record. So I am still not convinced as to why a lower key (offset 1-12) record is locked by the other transaction. Our DBA is clarifying a few things with IBM.
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
Pretty sure you answered your own question in that last paragraph: "when the root position moves to the next record".
If the next record is on a different physical partition, how would the position get moved on the PRIOR partition? That PCB is still sitting there, locked. You use a different PCB to do the search on the next partition.
I want to make sure I'm getting what you're saying about level-1 locking. It says the "next record". Does that mean you could have locks from two separate tasks on the same RBA? Remember, there are multiple roots in an RBA.