View previous topic :: View next topic
|
Author |
Message |
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
In my project we are getting frequent contention problems and jobs failing with -911..The temporary solution I use is to wait for the contending job to complete and then restart the failed job.
But I am looking forward for permanent solution. Seniors pls tell me the ways in which I can resolve these types of issues. Ridiculously some of the select queries are also failing with -911.
|
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
There is a design problem. . .
Without changing the design (which may or may not completely resolve this), there needs to be better scheduling. Do not allow these to be runnng at the same time. |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Hmm even I thought of it..but the contending job are running multiple times a day..how do we handle these cases in shceduling..Like we have 2 jobs A & B..Both are updating table C..But the issue is job A runs 6 times in a day and the other job B runs every 2 hrs in a day..I m scared if we put scheduling resolution it may messs up a little bit..
And the other area where I don't have much knowledge is the locking involved for the tablespaces..Why should the select statement fail.?but the select statement in the contending program was not having "WITH UR" in the query.. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Quote: |
I m scared if we put scheduling resolution it may messs up a little bit.. |
It is already "messed up". How might proper scheduling cause a problem. Done correctly, it will circumvent this problem. . .
Suggest you spend some time learning how a "fatal embrace" occurs.
I've posted this other places and maybe it will help clarify:
Quote: |
To reach this impasse one task "locks" a resource in tableA and another task "locks" a resource in tableB. Then the first task tries to lock the resource already held by the second task. So far, so good. So far, no deadlock. The deadlock occurs when the second task tries to lock the resource already being held by the first task.
Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences.
Something i've seen multiple places is when code needs to lock "header" and "detail" data and one process locks the header first and another process locks the detail first. This is a deadlock just waiting to happen. Usually, it will happen when transaction volumes are increasing or when the system becomes bottlenecked and locks are not freed quickly enough.
|
There may be only 1 of 2 bits of code that lock in the "problem order" or there may be many.
Suggest you work with the scheduling people to make sure these jobs do not run together. |
|
Back to top |
|
|
Ronald Burr
Active User
Joined: 22 Oct 2009 Posts: 293 Location: U.S.A.
|
|
|
|
Much depends on how long either job runs when it does run successfully.
If both jobs only run for a minute or two, setting negative dependencies on both jobs in the scheduler should suffice ( a negative dependency is one in which both jobs are flagged in such a way that the job scheduler will not release one job if the other has already been released but has not yet ended ). Sure, one job may have to wait a minute or two, but it will avoid the -911's.
If either job runs for an extended period, it may be possible that ONE of the two could have its queries changed to specify FOR FETCH ONLY WITH UR - that would avoid contention, though doing so at the risk of the WITH UR job not having the most up-to-date table rows, especially those being simultaneously manipulated by the other job. |
|
Back to top |
|
|
don.leahy
Active Member
Joined: 06 Jul 2010 Posts: 765 Location: Whitby, ON, Canada
|
|
|
|
Suggest you also look at the checkpoint or commit frequency of each process. |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Ronald Burr wrote: |
Much depends on how long either job runs when it does run successfully.
If both jobs only run for a minute or two, setting negative dependencies on both jobs in the scheduler should suffice ( a negative dependency is one in which both jobs are flagged in such a way that the job scheduler will not release one job if the other has already been released but has not yet ended ). Sure, one job may have to wait a minute or two, but it will avoid the -911's.
If either job runs for an extended period, it may be possible that ONE of the two could have its queries changed to specify FOR FETCH ONLY WITH UR - that would avoid contention, though doing so at the risk of the WITH UR job not having the most up-to-date table rows, especially those being simultaneously manipulated by the other job. |
Thanks for sharing knowledge, It was really helpful..I really like this idea of negative dependency..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of..
Secondly I also thought of the idea of using the 'WITH UR' clause with the select statement. In fact I saw the select query in the abending program did not have the 'WITH UR' clause. But then we run into the risk of getting the dirty read data..I there a way to prevent this..How will it be if we use CS instead of UR?
Thanks, |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
dick scherrer wrote: |
Hello,
Quote: |
I m scared if we put scheduling resolution it may messs up a little bit.. |
It is already "messed up". How might proper scheduling cause a problem. Done correctly, it will circumvent this problem. . .
Suggest you spend some time learning how a "fatal embrace" occurs.
I've posted this other places and maybe it will help clarify:
Quote: |
To reach this impasse one task "locks" a resource in tableA and another task "locks" a resource in tableB. Then the first task tries to lock the resource already held by the second task. So far, so good. So far, no deadlock. The deadlock occurs when the second task tries to lock the resource already being held by the first task.
Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences.
Something i've seen multiple places is when code needs to lock "header" and "detail" data and one process locks the header first and another process locks the detail first. This is a deadlock just waiting to happen. Usually, it will happen when transaction volumes are increasing or when the system becomes bottlenecked and locks are not freed quickly enough.
|
There may be only 1 of 2 bits of code that lock in the "problem order" or there may be many.
Suggest you work with the scheduling people to make sure these jobs do not run together. |
Hi Dick,
I really did not understand some lines which you qioted. What are trying to mean my sequences in your statement?
"Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences."
I did not undesrtand what yu mean by problem order.. Can you please help me understand these things..
Thanks |
|
Back to top |
|
|
Anuj Dhawan
Superior Member
Joined: 22 Apr 2006 Posts: 6250 Location: Mumbai, India
|
|
|
|
Rijit wrote: |
..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of.. |
Sorry but is it really so tough to look into the Manual - when you already know what to search for? |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Quote: |
I really did not understand some lines which you qioted. What are trying to mean my sequences in your statement? |
Quote: |
I did not undesrtand what yu mean by problem order.. Can you please help me understand these things.. |
Then you need to re-read the part about a fatal embrace. Only "your" code can cause this. . . One way this can happen is if some processes read the header row for update and then detail rows for update. At the same time another process reads the same detail rows for update and then tries to read the same header for update.
Quote: |
I know about successors and predecessors but this negative dependency thing I was not aware of.. |
This is why you need to work with your scheduling people. |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Gt it thanks! |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Anuj Dhawan wrote: |
Rijit wrote: |
..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of.. |
Sorry but is it really so tough to look into the Manual - when you already know what to search for? |
I m new to Zeke, don't have zeke manual..If you let me know from where can I procure it, I would b very happy..Prior to this I have worked with CA7 & ESP.. |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Scheduling is definately a better option but I would also like to explore other options like tuning the queries.
I saw many select queries are also abending with -911 tat is a concern for me..If we change the islolation level of the select queries to "WITH UR" are we running into a risk of fetching wrong data..I read that islolation level UR permitts uncomitted read..How do we make sure that it does not take the uncommited data?
Secondly I have cae arcoss situations where the update job failed because the tablespace is in rad status but ideally the table space should be in RW status..ny idea what may have altered the tablespace status..I Ma nt aware how the tablsespace gets altered..Might b the DBA has to do something with that! |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Quote: |
How do we make sure that it does not take the uncommited data? |
One way is to change the code so that all of the locks are done in the same order. . . There may be timeouts if the queries take too long but the deadlock will go away.
It sounds like the environment is somewhat out of control. . . If more and more continues to be added to this, it will only get worse.
I suspect that lots of the code locks "things" when it is not necessary or at least before it is necessary.
Quote: |
I m new to Zeke, don't have zeke manual.. |
Have you spent some time with the scheduling people? If not, do so. . . They surely have the documentation and can tell you were the local copy lives on dasd (possibly some file server). |
|
Back to top |
|
|
Rijit
Active User
Joined: 15 Apr 2010 Posts: 168 Location: Pune
|
|
|
|
Is it a good idea to reduce the COMMIT frequency for minimizing the possibility of a deadlock? |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Not if there is no way to "get back to where you were". How many commits were issued won't matter if some of the data is lost or corrupted.
I believe the search for a "magic bullet" should cease and the application be corrected. . . |
|
Back to top |
|
|
rocky_balboa
New User
Joined: 24 Mar 2010 Posts: 61 Location: Cape of Good Hope
|
|
|
|
Find out with help of DBAs whether lock escalation is leading the entire table to be locked due to insufficient memory for new locks. Make sure You aren't issuing "lock table" statements in any of your programs. You may need to revisit your commit frequency.
Also issue frequent commits even after your ready only access to release all your share locks. |
|
Back to top |
|
|
|