rajatbagga
Active User
Joined: 11 Mar 2007 Posts: 199 Location: india
Hello People,
We have a situation in which two BMP jobs, JOB01 and JOB02, contend
with each other for the same database AREA. When they were designed,
these jobs were not intended to use each other's areas. Now, however,
because of some information that JOB01 retrieves from AREA01, JOB01
also needs to retrieve information from AREA02, which is currently
being used by JOB02. So JOB01 fails [on a GU call] because JOB02 holds
[through its own GU call] the same information in AREA02 that JOB01
is trying to access.
There are 24 such DB areas and 24 such jobs that can run into this kind
of issue. At present we are fortunate that only 2 to 3 jobs abend
once in a while each week.
The PROCOPT defined for the segment that the jobs try to access is G,
and so the GU call fails with an FD status code.
I believe decreasing the checkpoint freq. and increasing the buffer space
could help, as it would speed up the processing, so the chances of getting
into a deadlock would be lower.
Please share your opinions on how to fix this issue.
Thank You,
Rajat
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
The checkpoint frequency has nothing to do with a DEADLOCK situation.
A deadlock is:
task A holding resource X and trying to get control of resource Y
task B holding resource Y and trying to get control of resource X
Both will be stuck waiting for something that will never happen.
The concept applies to any processes enqueuing on two different resources in reverse order,
so it is the process that must be reviewed!
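To make the "reverse order" point concrete, here is a minimal sketch, assuming two plain Python locks as stand-ins for AREA01 and AREA02 (nothing here is IMS code, and `RANK`, `acquire_in_order`, `release_all`, `job` and `run_jobs` are hypothetical names). Each job lists the areas in its own preferred order, but both actually acquire them in one agreed global order, so the circular wait described above cannot form.

```python
import threading

# Hypothetical stand-ins for two database areas; these are plain Python
# locks, not IMS resources.
area01 = threading.Lock()
area02 = threading.Lock()

# One global rank per resource. Every job acquires in ascending rank.
RANK = {id(area01): 1, id(area02): 2}


def acquire_in_order(*locks):
    """Acquire the given locks in global rank order, whatever order the caller used."""
    ordered = sorted(locks, key=lambda lock: RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release in the reverse of the acquisition order."""
    for lock in reversed(ordered):
        lock.release()


def job(name, first, second, done, rounds=1000):
    """Repeatedly take both areas; first/second is only the job's own preference."""
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            pass  # the real work on both areas would go here
        finally:
            release_all(held)
    done.append(name)


def run_jobs():
    """Run two jobs that name the areas in opposite orders and report who finished."""
    done = []
    t1 = threading.Thread(target=job, args=("JOB01", area01, area02, done))
    t2 = threading.Thread(target=job, args=("JOB02", area02, area01, done))
    t1.start()
    t2.start()
    t1.join(timeout=10)
    t2.join(timeout=10)
    return sorted(done)
```

Calling `run_jobs()` lets both jobs finish even though they ask for the areas in opposite orders. Whether a fixed access order can be imposed on the real jobs depends on their design, which is the review being suggested here.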
don.leahy
Active Member
Joined: 06 Jul 2010 Posts: 765 Location: Whitby, ON, Canada
The change that required JOB01 to read data from AREA02 has broken the original design.
Three options:
1. Redesign the process as per Enrico's recommendation.
2. Do not run JOB01 and JOB02 in parallel.
3. Tolerate the occasional deadlock. Train your operations personnel to recognize the situation and restart the failed job. (It IS restartable, right?)
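As a rough illustration of option 3, the sketch below simulates a restartable job that resumes from its last checkpoint after being picked as the deadlock victim. It is only a model under stated assumptions: `DeadlockAbend`, `run_with_restart` and `make_flaky_step` are hypothetical names, the "checkpoint" is just a counter, and a real BMP would rely on IMS checkpoint/restart rather than a Python loop.

```python
class DeadlockAbend(Exception):
    """Hypothetical signal that the job lost a deadlock and was abended."""


def run_with_restart(step, total_records, max_restarts=3):
    """Process records 0..total_records-1, resuming from the last checkpoint.

    Returns (records_committed, restarts_used).
    """
    checkpoint = 0  # index of the next record not yet committed
    restarts = 0
    while True:
        try:
            for rec in range(checkpoint, total_records):
                step(rec)
                checkpoint = rec + 1  # record committed; safe restart point
            return checkpoint, restarts
        except DeadlockAbend:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let operations investigate


def make_flaky_step(fail_once_at):
    """Build a step that 'deadlocks' once on each listed record, then succeeds."""
    pending = set(fail_once_at)
    processed = []

    def step(rec):
        if rec in pending:
            pending.discard(rec)
            raise DeadlockAbend(rec)
        processed.append(rec)

    return step, processed
```

With `step, processed = make_flaky_step({3, 7})`, calling `run_with_restart(step, 10)` commits all ten records after two restarts, and no record is processed twice because each restart begins at the last checkpoint.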
rajatbagga
Active User
Joined: 11 Mar 2007 Posts: 199 Location: india
Well, we will have to run the jobs in parallel, otherwise the batch window will increase quite a lot, and yes, the jobs are restartable. I think changing the PROCOPT from G to GOT in the PSB for the DB segment could also solve the issue, as G requires exclusive control whereas GOT does not. Please correct me if I am wrong...
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
If you change to GOT, then you may be reading the segment as it was BEFORE the lock/update. If that is acceptable, then go ahead.
The 'T' in 'GOTP' will change the behavior during the read. Instead of being locked out, you can get a 'GG' status code. You'll have to code for that.
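"Coding for that" could look roughly like the sketch below. It is an assumption-laden model, not DL/I code: `get_unique` is a hypothetical callable standing in for the GU call and returning a `(status, segment)` pair, `read_with_gg_handling` is an invented name, and the retry count is arbitrary. The only status codes used are blanks for success, 'GE' for segment not found, and the 'GG' mentioned above.

```python
def read_with_gg_handling(get_unique, key, max_retries=2):
    """Issue a read and decide what to do for each status code.

    get_unique is a hypothetical stand-in for the GU call; it must return
    a (status, segment) tuple.
    """
    for _ in range(max_retries + 1):
        status, segment = get_unique(key)
        if status == "  ":  # blanks: the call succeeded
            return "ok", segment
        if status == "GE":  # segment not found
            return "not_found", None
        if status != "GG":  # anything else is not handled in this sketch
            raise RuntimeError(f"unexpected status code {status!r}")
        # 'GG': the segment could not be read cleanly; try again
    return "unreadable", None  # still 'GG' after every retry


def make_fake_gu(statuses, segment="SEGMENT-DATA"):
    """Test helper: a fake GU that returns the given status codes in turn."""
    remaining = list(statuses)

    def fake_gu(key):
        status = remaining.pop(0)
        return status, (segment if status == "  " else None)

    return fake_gu
```

What the job should do with an "unreadable" result (skip the record, log it, or abend) is a business decision that the sketch deliberately leaves open.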
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
It sounds like there is a bigger problem coming downstream . . .
Suggest you run these completely serially and see how long they take. The parallel execution may actually be slowing the processes due to contention, even when no deadlock is raised. Too much contention could also lead to the deadlock(s).
Look into the jobs that cause the most/longest locking and improve these.