IMS Deadlock with BMP jobs

rajatbagga · Active User Joined: 11 Mar 2007 Posts: 199 Location: india

Hello People,

We have a situation in which two BMP jobs, JOB01 and JOB02, are
contending with each other for the same database AREA. These
jobs were not intended to use each other's areas when they were
designed, but now, because of some information which is
retrieved from AREA01 by JOB01, JOB01 also needs to retrieve
information from AREA02, which is currently being used by JOB02.
So JOB01 fails [on a GU call] because AREA02 is held by JOB02 [with a GU call] on
the same information that JOB01 tries to access.

There are 24 such DB areas and 24 such jobs which can have this kind of
issue. At present we are a bit fortunate in that we see only 2 to 3 jobs
abending once in a while each week.

The PROCOPT defined for the segment which the jobs try to
access is G, and so the GU call fails with an FD status code.

I believe decreasing the checkpoint frequency and increasing the buffer space
could help, as it would speed up processing, so the chances of getting
into a deadlock would be lower.

Please share your opinions on how to fix this issue.

Thank You,
Rajat

enrico-sorichetti · Posted: Mon Jan 21, 2013 12:25 pm

the checkpoint frequency has nothing to do with a DEADLOCK situation

which is ( the deadlock )

task A holding resource X and trying to get control of resource Y

task B holding resource Y and trying to get control of resource X

and both will be stuck waiting for something that will never happen

the concept does apply to any process enqueuing on two different resources in reverse order

so it is the process that must be reviewed !
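
To make the "reverse order" point concrete, here is a minimal sketch in Python (not IMS code). It assumes two ordinary mutexes as stand-ins for AREA01 and AREA02; the names `locks`, `run_job` and `run_both` are made up for the illustration, and real IMS locking is done by the lock manager, not by the application. Both jobs ask for the areas in opposite logical order, but each one actually acquires them in a single agreed order, so a cycle of waiters cannot form.

```python
import threading

# Hypothetical stand-ins for the two database areas (assumption: plain
# mutexes; in IMS the lock manager owns the real locks).
locks = {"AREA01": threading.Lock(), "AREA02": threading.Lock()}


def run_job(name, areas_needed, log):
    # Always acquire in one global order (sorted by area name), no matter
    # which order the job logically wants the areas in.
    ordered = sorted(areas_needed)
    for area in ordered:
        locks[area].acquire()
    try:
        log.append((name, tuple(ordered)))
    finally:
        # Release in the reverse of the acquisition order.
        for area in reversed(ordered):
            locks[area].release()


def run_both():
    log = []
    jobs = [
        threading.Thread(target=run_job, args=("JOB01", ["AREA01", "AREA02"], log)),
        threading.Thread(target=run_job, args=("JOB02", ["AREA02", "AREA01"], log)),
    ]
    for t in jobs:
        t.start()
    for t in jobs:
        t.join(timeout=5)
    # True only if neither thread is still stuck waiting.
    finished = all(not t.is_alive() for t in jobs)
    return log, finished
```

If each job instead took its "own" area first and the other one second, the two threads could end up in exactly the A-holds-X, B-holds-Y situation described above.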

don.leahy · Posted: Mon Jan 21, 2013 8:58 pm

The change that required JOB01 to read data from AREA02 has broken the original design.

Three options:

1. Redesign the process as per Enrico's recommendation.

2. Do not run JOB01 and JOB02 in parallel.

3. Tolerate the occasional deadlock. Train your operations personnel to recognize the situation and restart the failed job. (It IS restartable, right?)
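
Option 3 can also be automated rather than left to operations. The following is only a sketch in Python under stated assumptions: `DeadlockAbend`, `run_with_restart` and `make_simulated_job` are hypothetical names, the job is a plain callable that resumes from a checkpoint position, and the abend is assumed to report the last committed checkpoint. A real restart would go through the scheduler and the IMS checkpoint/restart facilities.

```python
class DeadlockAbend(Exception):
    """Hypothetical signal that the job was chosen as the deadlock victim."""

    def __init__(self, last_checkpoint):
        super().__init__(last_checkpoint)
        self.last_checkpoint = last_checkpoint


def run_with_restart(job, max_attempts=3):
    # Run the job, restarting from the last committed checkpoint after each
    # deadlock. Returns the number of attempts that were needed.
    checkpoint = 0
    for attempt in range(1, max_attempts + 1):
        try:
            job(checkpoint)
            return attempt
        except DeadlockAbend as abend:
            checkpoint = abend.last_checkpoint
    raise RuntimeError(f"still deadlocking after {max_attempts} attempts")


def make_simulated_job(deadlocks_before_success, abend_at=40):
    # Simulated job for the illustration: it deadlocks a fixed number of
    # times, then completes. `starts` records where each run resumed.
    remaining = [deadlocks_before_success]
    starts = []

    def job(checkpoint):
        starts.append(checkpoint)
        if remaining[0] > 0:
            remaining[0] -= 1
            raise DeadlockAbend(abend_at)

    return job, starts
```

The attempt limit matters: if the same job keeps deadlocking, that points back to options 1 and 2 rather than to more retries.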

rajatbagga · Active User Joined: 11 Mar 2007 Posts: 199 Location: india

Well, we will have to run the jobs in parallel, otherwise the batch window will increase quite a lot, and yes, the jobs are restartable. I think changing the PROCOPT from G to GOT in the PSB for the DB segment could also solve the issue, as G requires exclusive control whereas GOT does not. Please correct me if I am wrong.

Ed Goodman · Active Member Joined: 08 Jun 2011 Posts: 556 Location: USA

If you change to GOT, then you may be taking the segment BEFORE the lock/update during the read. If that is acceptable, then go ahead.

The 'T' in 'GOTP' will change the behavior during the read. Instead of locking out, you can get a 'GG' status code. You'll have to code for that.
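
As a rough idea of what "coding for that" could look like, here is a sketch in Python rather than in the COBOL or PL/I the BMP is probably written in. It assumes a hypothetical `get_unique(key)` callable that stands in for the real GU call and returns a `(status, segment)` pair, with two blanks meaning success, 'GE' meaning not found, and 'GG' meaning the read-without-lock could not return a reliable segment. The retry limit and the choice to treat every other status as an error are assumptions for the illustration, not a recommendation from the IMS manuals.

```python
def read_with_gg_retry(get_unique, key, max_retries=3):
    # get_unique is a hypothetical wrapper around the GU call; it returns
    # (status_code, segment). Reissue the call a bounded number of times
    # when the status is 'GG'.
    for _ in range(max_retries + 1):
        status, segment = get_unique(key)
        if status == "  ":      # blank status: the call succeeded
            return segment
        if status == "GE":      # segment not found
            return None
        if status != "GG":      # anything else is unexpected in this sketch
            raise RuntimeError(f"unexpected status code {status!r}")
    raise RuntimeError(f"status still GG after {max_retries} retries")
```

Whether reissuing the call is good enough, or whether the job should skip the record or stop, depends on how much stale or in-flight data the process can tolerate.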

dick scherrer · Posted: Tue Jan 22, 2013 9:15 pm

Hello,

Sounds like there is a bigger problem coming downstream . . .

Suggest you run these completely serially and see how long they take. The parallel execution may actually be slowing the processes due to contention - even when the deadlock is not raised. Too much contention could also lead to the deadlock(s).

Look into the jobs that cause the most/longest locking and improve these.