IMS Deadlock

charanmsrit · New User Joined: 25 Oct 2007 Posts: 81 Location: Australia

Hi,

I am seeing rare deadlocks in online transactions. The transaction is defined with a class so that it can run in 10 MPP regions, so 10 instances of this tran code can run simultaneously. We use the PI lock manager, and the deadlock report shows that the locks are held at record level (GRIDX).

The program under the transaction is simple.

The tran is initiated by passing a partial key.
1. Use the partial key to issue a GU call with GE (greater than or equal), retrieving the first available root at or after the partial key position.
2. If the record found belongs to the partial key and matches a criterion, issue GHU and then DLET to remove the record, update another database (DB2), and trigger an MQ message. Otherwise, if the record retrieved has a higher key than the partial key, exit.
3. If the record retrieved belongs to the partial key but does not match the criterion, issue GU with GE again until a record matches, then repeat step 2.
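
The steps above can be sketched as follows. This is a minimal Python simulation, not DL/I code: an in-memory sorted list stands in for the root segments, `bisect_left` stands in for the GU call with the GE qualification, and the names `process_partial_key`, `roots`, and `matches` are made up for illustration. It also assumes the program keeps scanning after a delete, which the description suggests but does not state outright.

```python
from bisect import bisect_left


def process_partial_key(roots, partial, matches):
    """Simulate the tran: scan roots at or after `partial`, delete matching ones.

    roots   -- sorted list of root keys (strings); modified in place
    partial -- partial key passed to the transaction
    matches -- hypothetical predicate standing in for "matches a criterion"
    Returns the list of deleted keys.
    """
    deleted = []
    search_from = partial
    while True:
        # Step 1/3: "GU with GE" = first root whose key is >= search_from.
        i = bisect_left(roots, search_from)
        if i == len(roots):
            break  # nothing left to read (end of database)
        key = roots[i]
        if not key.startswith(partial):
            break  # record has a higher key than the partial key: exit
        if matches(key):
            # Step 2: GHU + DLET (the DB2 update and MQ trigger would go here).
            roots.pop(i)
            deleted.append(key)
        # The next GU starts just past the key we have just handled.
        search_from = key + "\x00"
    return deleted
```

For example, with roots `["123401", "123402", "123403", "126701"]` and partial key `"1234"`, the scan stops as soon as it reads `"126701"`, so in this model a tran for 1234 only ever touches keys that start with 1234 plus the first key beyond them.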

The deadlock is between two different partial keys, say 1234 and 1267. In both cases, the deadlock report indicates that the call being made was DLET. I am puzzled as to why the tran processing the higher key holds a lock on the lower key. Does using GU to read the next record after a delete have any effect? I have not been able to reproduce the deadlock in the test environment, as it is very hard to do. Could someone point me in the right direction to find the cause? I have looked through the IMS manuals on deadlocks, but they have not helped me find the cause in my situation.

Thanks
Charan

dick scherrer · Posted: Thu Nov 15, 2012 8:02 pm

Hello,

I suspect that in the deadlock situation, one of the processes acquired the lock "in the wrong order". Look at the process(es) that are involved with the deadlock and make sure the locks are acquired in the same order.
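
To illustrate what acquiring locks "in the wrong order" means, here is a small Python sketch. The function name and the resource labels are hypothetical; real input would come from the deadlock report. Given the order in which two processes acquire their locks, it lists every pair of resources that the two take in opposite order, which is the pattern that can deadlock. It assumes each sequence lists a resource only once.

```python
from itertools import combinations


def conflicting_pairs(seq_a, seq_b):
    """Return resource pairs that the two processes acquire in opposite order."""
    pos_a = {r: i for i, r in enumerate(seq_a)}
    pos_b = {r: i for i, r in enumerate(seq_b)}
    # Only resources that both processes lock can be involved in a deadlock.
    shared = [r for r in seq_a if r in pos_b]
    return [
        (x, y)
        for x, y in combinations(shared, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


# Process A locks R1 then R2; process B locks R2 then R1: a deadlock candidate.
print(conflicting_pairs(["R1", "R2"], ["R2", "R1"]))
```

An empty result means the two sequences are consistent with one global lock order.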

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

Pretty sure that even at "record level" locking, the lock is for the entire control interval, or what passes for a control interval in OSAM.

This means that there may be dozens of records locked by a single GHU call.

Also, be aware that for a PCB that CAN do an update, ALL of the calls take a read lock because you are reading the segment "with integrity." This means IMS has to make certain that no one else changes the record while you have it.

The easiest way to get this working is to add a second PCB to the PSB. Make it a copy of the PCB you are using now to do the DLET and GHU. Use that second PCB to do all of the searching and finding, then use the original one to do a GHU (with a FULLY qualified key) and DLET. This will cut down on the locks quite a bit by eliminating locks during read-only operations.

Shouldn't be too much of a performance hit because the segment you just read and found will still be in the buffers.

dick scherrer · Posted: Thu Nov 15, 2012 10:05 pm

Hello,

Gary Jacek · Posted: Thu Nov 15, 2012 11:48 pm

Hi Charan

One other thing you should do, if possible, is make your call to DB2 before you do your IMS DLET call. A call to the external DB2 subsystem can take a long time, depending upon your SQL. If you do the IMS DLET call before the SQL call to DB2, you are holding an IMS lock for the duration of the SQL call.

Don't worry that the results of your SQL call to DB2 will be available to other applications before your IMS DLET has taken effect. IMS/DB2 two-phase commit will take care of this for you. Updates in IMS and DB2 will take effect when you do the next GU to the message queue.

Gary

charanmsrit · New User Joined: 25 Oct 2007 Posts: 81 Location: Australia

Hi,

Thanks all for your time and responses

dick scherrer · Posted: Mon Nov 19, 2012 8:49 pm

Hello,

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

Wait wait wait. When I say "control interval" I'm referring to the physical record in a VSAM file. You know, there can be several logical records in a group, and VSAM tracks them together and keeps the key attached. When you insert something, these can get split. Control Areas and Control Intervals.

I don't mean the length of time a transaction takes. As an IMS-er, I would call that a LUW or Logical Unit of Work.

I thought the OP was trying to figure out how two seemingly unrelated calls could get locked out. I was trying to explain that even though the key is different, the IMS system may be placing a lock on the entire control interval, which MIGHT contain the other key involved.

The reason they aren't getting a time out is that the PCB is marked as "read for integrity" because it can do updates. That's why I suggested a second PCB.

dick scherrer · Posted: Mon Nov 19, 2012 9:36 pm

Hi Ed,

As i mentioned, i'm not an IMSer, but on many different database systems, a timeout and a deadlock were caused by the same events as described earlier.

Is this not the case with IMS? If not, i'll go quietly

d

Nic Clouston · Posted: Tue Nov 20, 2012 1:35 am

Sounds like the same problem as we are having with CLOAS. Don't know if anyone has deciphered it enough to know if 2 calls are being generated for one particular request. May pass this by the boffins tomorrow.

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

Dick, you are 100% correct. I was just adding to the pile by letting OP know that it's MORE than just the actual keys in question that are locked. The entire area of the CI is locked, which contains multiple database roots.

So just because the locks are in the right order doesn't protect you as much as he might be thinking. You could have 20 roots locked with a single update. It depends on how they get stored.

This is in contrast to a discrete row in a DB2 table, where they are each independent.
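
A rough way to picture the point about one lock covering several roots is the Python sketch below. It is an illustration only: the function name, the stored order, and the "roots per block" figure are all made up, and whether IMS really locks at this granularity depends on the database organization. It maps each root to the block it is stored in and shows that two different keys can end up needing the same lock.

```python
def blocks_locked(keys, stored_order, roots_per_block):
    """Return the set of block numbers covering the given root keys.

    stored_order    -- root keys in the order they sit in the data set
    roots_per_block -- hypothetical number of roots that fit in one block
    """
    position = {key: i for i, key in enumerate(stored_order)}
    return {position[key] // roots_per_block for key in keys}


stored = ["1233", "1234", "1267", "1300"]

# Four roots per block: keys 1234 and 1267 share a block, so they conflict.
print(blocks_locked(["1234"], stored, 4) & blocks_locked(["1267"], stored, 4))

# One root per block: the same two keys no longer overlap.
print(blocks_locked(["1234"], stored, 1) & blocks_locked(["1267"], stored, 1))
```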

dick scherrer · Posted: Wed Nov 21, 2012 11:46 pm

Hi Ed,

When some of my clients were first trying db2 more than 20 years ago, they had considerable locking problems. Mentioned to them the IBM default for db2 was page-level locking. Caused no end of grief.

I've never been a full-fledged db2 dba, but when they came for help, i suggested switching to row-level locking. Talk about an over-reaction. . .

Their biggest problem was that they had a table of "next available values" for a bunch of other tables. So when one "next" number was being assigned, no others could. . . Suggested that if they did not want to do this well, they could always make sure that each of these rows were on a page of their own . . . Yup, 4k+ filler in each row.

I did not realize that this late in the life of IMS, it still locked an entire page. . .

Thanks - i'll continue learnin'!

Happy Thanksgiving!

d

charanmsrit · New User Joined: 25 Oct 2007 Posts: 81 Location: Australia

Thanks Ed/Dick,

Following the previous update about the two GU calls, we have found that this happens when a record satisfying the partial key criteria is not found in the current physical partition (end of segment, status GE): the package then generates an unqualified GU call to the next partitioned physical DB. These partitions are key-ranged, and the package identifies the DB specific to the key. So this should not be an issue.

Totally agree that the locking depends on the DB organisation. In my case, the DB is OSAM HIDAM, which means the locked resource should generally be only the RBA of the root record and not the CI (or RAP). I have come across an IBM paper which says that for HIDAM the locked resource can also be the hashed key (i.e. all roots linked to that hash key) in certain cases, namely when a record is being inserted or deleted. In my case, the deadlock occurs while a record is being deleted. I am not sure how this hash algorithm works or whether this is what is causing the lock.

Extract from the paper:
The resource that is locked is the value in this table plus an identification of the database and database data set.

For HISAM, IMS hashes the key of the root segment. This produces a resource name that is locked. There are millions of possible values that the hashing algorithm produces. This tends to minimize the possibility of different keys hashing to the same value and producing lock conflicts.

For HIDAM and PHIDAM, the RBA of the root is always used to identify the database record. The root segment resides in the prime HIDAM or PHIDAM database, not the index. The RBA is from this prime HIDAM or PHIDAM database. There are times when the hashed value of the key of the root segment is also used. This is the key in the index. Locking of the hashed key occurs when IMS is either inserting a root segment or erasing it. These are the only times that changes are made to the index. When a record is being inserted into or deleted from the index, IMS locks the hashed key to prevent two programs from adding or deleting the same root segment concurrently.

IMS locks the RBA of the Root Anchor Point (RAP) from which the root is chained for HDAM and PHDAM databases. Since multiple roots may be chained from the same RAP, this is really a lock on one or more database records. When one root is locked, all the roots on the RAP chain are locked.
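
The rules in the extract can be summarised in a short Python sketch. Everything here is an assumption made for illustration: the function names, the tuple layout of a resource name, and above all the hash, which uses CRC-32 as a stand-in because the extract does not describe the actual IMS hashing algorithm. The `buckets` parameter exists only so that a collision can be forced in a demonstration.

```python
import zlib


def hashed_key(key, buckets=2**20):
    # Stand-in hash (CRC-32); NOT the real IMS algorithm, which the extract omits.
    return zlib.crc32(key.encode()) % buckets


def lock_resources(org, db, key, root_rba=None, rap_rba=None,
                   index_change=False, buckets=2**20):
    """Return the set of resource names locked for one root, per the extract.

    org          -- database organization (HISAM, HIDAM, PHIDAM, HDAM, PHDAM)
    db           -- identification of the database / data set
    index_change -- True when the root is being inserted or deleted
    """
    org = org.upper()
    if org == "HISAM":
        return {(db, "HASH", hashed_key(key, buckets))}
    if org in ("HIDAM", "PHIDAM"):
        names = {(db, "RBA", root_rba)}  # the root RBA is always locked
        if index_change:
            # Insert or delete of a root also locks the hashed index key.
            names.add((db, "HASH", hashed_key(key, buckets)))
        return names
    if org in ("HDAM", "PHDAM"):
        return {(db, "RBA", rap_rba)}  # every root chained off this RAP
    raise ValueError(f"unsupported organization: {org}")


# Two deletes of different HIDAM roots; buckets=1 forces the hashes to collide.
a = lock_resources("HIDAM", "DB1", "1234", root_rba=100, index_change=True, buckets=1)
b = lock_resources("HIDAM", "DB1", "1267", root_rba=200, index_change=True, buckets=1)
print(a & b)
```

In this model, two DLET calls on different roots can only share a resource through the hashed key, which matches the question of whether a hash collision could explain a lock between keys such as 1234 and 1267.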

PeterHolland · Posted: Mon Nov 26, 2012 12:51 pm

Nice piece of reading (the link provided wants to open a PDF):

www.google.nl/url?sa=t&rct=j&q=ims%20deadlock&source=web&cd=3&cad=rja&ved=0CEoQFjAC&url=https%3A%2F%2Fshare.confex.com%2Fshare%2F116%2Fwebprogram%2FHandout%2FSession8566%2FUnderstanding%2520IMS%2520Locking%2520Mar2011.pdf&ei=1BezUIHJHeqw0AXHyIDICQ&usg=AFQjCNHQEhuB59I78sbTwcM1rCQABfbPnQ

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

So, when the package does the GU on the next partition, is it using a different PCB? I'm asking because that would leave the lock on the prior PCB.

You mentioned that you are using HIDAM, not PHIDAM, so that probably means you have multiple PCBs for the physical databases.

This means that the read lock will be held until you move off of that segment or take a commit point. When it hurries off to the next database, it may be leaving that lock in place.

charanmsrit · New User Joined: 25 Oct 2007 Posts: 81 Location: Australia

Hi Ed,

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

Pretty sure you answered your own question in that last paragraph: "when the root position moves to the next record"

If the next record is on a different phys partition, how would the position get moved on the PRIOR partition? That PCB is still sitting there locked. You use a diff PCB to do the search on the next partition.

I want to make sure I'm getting what you're saying about level-1 locking. It says the "next record". Does that mean that you could have locks from two separate tasks on the same RBA? Remember, there are multiple roots in an RBA.