IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Frequent -911 abends


IBM Mainframe Forums -> DB2
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Sun Mar 13, 2011 11:26 am
Reply with quote

In my project we are getting frequent contention problems and jobs failing with -911..The temporary solution I use is to wait for the contending job to complete and then restart the failed job.

But I am looking forward for permanent solution. Seniors pls tell me the ways in which I can resolve these types of issues. Ridiculously some of the select queries are also failing with -911.

icon_eek.gif
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Sun Mar 13, 2011 11:51 am
Reply with quote

Hello,

There is a design problem. . .

Without changing the design (which may or may not completely resolve this), there needs to be better scheduling. Do not allow these to be runnng at the same time.
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Sun Mar 13, 2011 3:19 pm
Reply with quote

Hmm even I thought of it..but the contending job are running multiple times a day..how do we handle these cases in shceduling..Like we have 2 jobs A & B..Both are updating table C..But the issue is job A runs 6 times in a day and the other job B runs every 2 hrs in a day..I m scared if we put scheduling resolution it may messs up a little bit..

And the other area where I don't have much knowledge is the locking involved for the tablespaces..Why should the select statement fail.?but the select statement in the contending program was not having "WITH UR" in the query..
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Mar 14, 2011 2:27 am
Reply with quote

Hello,

Quote:
I m scared if we put scheduling resolution it may messs up a little bit..
It is already "messed up". How might proper scheduling cause a problem. Done correctly, it will circumvent this problem. . .

Suggest you spend some time learning how a "fatal embrace" occurs.

I've posted this other places and maybe it will help clarify:
Quote:
To reach this impasse one task "locks" a resource in tableA and another task "locks" a resource in tableB. Then the first task tries to lock the resource already held by the second task. So far, so good. So far, no deadlock. The deadlock occurs when the second task tries to lock the resource already being held by the first task.

Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences.

Something i've seen multiple places is when code needs to lock "header" and "detail" data and one process locks the header first and another process locks the detail first. This is a deadlock just waiting to happen. Usually, it will happen when transaction volumes are increasing or when the system becomes bottlenecked and locks are not freed quickly enough.


There may be only 1 of 2 bits of code that lock in the "problem order" or there may be many.

Suggest you work with the scheduling people to make sure these jobs do not run together.
Back to top
View user's profile Send private message
Ronald Burr

Active User


Joined: 22 Oct 2009
Posts: 293
Location: U.S.A.

PostPosted: Mon Mar 14, 2011 3:08 am
Reply with quote

Much depends on how long either job runs when it does run successfully.
If both jobs only run for a minute or two, setting negative dependencies on both jobs in the scheduler should suffice ( a negative dependency is one in which both jobs are flagged in such a way that the job scheduler will not release one job if the other has already been released but has not yet ended ). Sure, one job may have to wait a minute or two, but it will avoid the -911's.
If either job runs for an extended period, it may be possible that ONE of the two could have its queries changed to specify FOR FETCH ONLY WITH UR - that would avoid contention, though doing so at the risk of the WITH UR job not having the most up-to-date table rows, especially those being simultaneously manipulated by the other job.
Back to top
View user's profile Send private message
don.leahy

Active Member


Joined: 06 Jul 2010
Posts: 765
Location: Whitby, ON, Canada

PostPosted: Mon Mar 14, 2011 6:56 am
Reply with quote

Suggest you also look at the checkpoint or commit frequency of each process.
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Mon Mar 14, 2011 12:55 pm
Reply with quote

Ronald Burr wrote:
Much depends on how long either job runs when it does run successfully.
If both jobs only run for a minute or two, setting negative dependencies on both jobs in the scheduler should suffice ( a negative dependency is one in which both jobs are flagged in such a way that the job scheduler will not release one job if the other has already been released but has not yet ended ). Sure, one job may have to wait a minute or two, but it will avoid the -911's.
If either job runs for an extended period, it may be possible that ONE of the two could have its queries changed to specify FOR FETCH ONLY WITH UR - that would avoid contention, though doing so at the risk of the WITH UR job not having the most up-to-date table rows, especially those being simultaneously manipulated by the other job.


Thanks for sharing knowledge, It was really helpful..I really like this idea of negative dependency..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of..

Secondly I also thought of the idea of using the 'WITH UR' clause with the select statement. In fact I saw the select query in the abending program did not have the 'WITH UR' clause. But then we run into the risk of getting the dirty read data..I there a way to prevent this..How will it be if we use CS instead of UR?

Thanks,
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Mon Mar 14, 2011 1:00 pm
Reply with quote

dick scherrer wrote:
Hello,

Quote:
I m scared if we put scheduling resolution it may messs up a little bit..
It is already "messed up". How might proper scheduling cause a problem. Done correctly, it will circumvent this problem. . .

Suggest you spend some time learning how a "fatal embrace" occurs.

I've posted this other places and maybe it will help clarify:
Quote:
To reach this impasse one task "locks" a resource in tableA and another task "locks" a resource in tableB. Then the first task tries to lock the resource already held by the second task. So far, so good. So far, no deadlock. The deadlock occurs when the second task tries to lock the resource already being held by the first task.

Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences.

Something i've seen multiple places is when code needs to lock "header" and "detail" data and one process locks the header first and another process locks the detail first. This is a deadlock just waiting to happen. Usually, it will happen when transaction volumes are increasing or when the system becomes bottlenecked and locks are not freed quickly enough.


There may be only 1 of 2 bits of code that lock in the "problem order" or there may be many.

Suggest you work with the scheduling people to make sure these jobs do not run together.


Hi Dick,

I really did not understand some lines which you qioted. What are trying to mean my sequences in your statement?

"Most often this is caused by the tasks locking resources in different sequences. I'd suggest you look at the problem program(s) and see if there are multiple processes that would lock resources in multiple sequences."


I did not undesrtand what yu mean by problem order.. Can you please help me understand these things..

Thanks
Back to top
View user's profile Send private message
Anuj Dhawan

Superior Member


Joined: 22 Apr 2006
Posts: 6250
Location: Mumbai, India

PostPosted: Mon Mar 14, 2011 3:37 pm
Reply with quote

Rijit wrote:
..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of..
Sorry but is it really so tough to look into the Manual - when you already know what to search for?
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Mar 14, 2011 8:25 pm
Reply with quote

Hello,

Quote:
I really did not understand some lines which you qioted. What are trying to mean my sequences in your statement?
Quote:
I did not undesrtand what yu mean by problem order.. Can you please help me understand these things..
Then you need to re-read the part about a fatal embrace. Only "your" code can cause this. . . One way this can happen is if some processes read the header row for update and then detail rows for update. At the same time another process reads the same detail rows for update and then tries to read the same header for update.

Quote:
I know about successors and predecessors but this negative dependency thing I was not aware of..
This is why you need to work with your scheduling people.
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Tue Mar 15, 2011 1:33 am
Reply with quote

Gt it thanks!
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Tue Mar 15, 2011 1:36 am
Reply with quote

Anuj Dhawan wrote:
Rijit wrote:
..we have zeke scheduler in our project, is there a way to set this type of dependency in Zeke? I know about successors and predecessors but this negative dependency thing I was not aware of..
Sorry but is it really so tough to look into the Manual - when you already know what to search for?


I m new to Zeke, don't have zeke manual..If you let me know from where can I procure it, I would b very happy..Prior to this I have worked with CA7 & ESP..
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Tue Mar 15, 2011 1:42 am
Reply with quote

Scheduling is definately a better option but I would also like to explore other options like tuning the queries.

I saw many select queries are also abending with -911 tat is a concern for me..If we change the islolation level of the select queries to "WITH UR" are we running into a risk of fetching wrong data..I read that islolation level UR permitts uncomitted read..How do we make sure that it does not take the uncommited data?

Secondly I have cae arcoss situations where the update job failed because the tablespace is in rad status but ideally the table space should be in RW status..ny idea what may have altered the tablespace status..I Ma nt aware how the tablsespace gets altered..Might b the DBA has to do something with that! icon_confused.gif
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Mar 15, 2011 2:05 am
Reply with quote

Hello,

Quote:
How do we make sure that it does not take the uncommited data?
One way is to change the code so that all of the locks are done in the same order. . . There may be timeouts if the queries take too long but the deadlock will go away.

It sounds like the environment is somewhat out of control. . . If more and more continues to be added to this, it will only get worse.

I suspect that lots of the code locks "things" when it is not necessary or at least before it is necessary.

Quote:
I m new to Zeke, don't have zeke manual..
Have you spent some time with the scheduling people? If not, do so. . . They surely have the documentation and can tell you were the local copy lives on dasd (possibly some file server).
Back to top
View user's profile Send private message
Rijit

Active User


Joined: 15 Apr 2010
Posts: 168
Location: Pune

PostPosted: Wed Mar 16, 2011 1:52 am
Reply with quote

Is it a good idea to reduce the COMMIT frequency for minimizing the possibility of a deadlock?
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed Mar 16, 2011 2:04 am
Reply with quote

Hello,

Not if there is no way to "get back to where you were". How many commits were issued won't matter if some of the data is lost or corrupted.

I believe the search for a "magic bullet" should cease and the application be corrected. . .
Back to top
View user's profile Send private message
rocky_balboa

New User


Joined: 24 Mar 2010
Posts: 61
Location: Cape of Good Hope

PostPosted: Thu Mar 17, 2011 12:18 pm
Reply with quote

Find out with help of DBAs whether lock escalation is leading the entire table to be locked due to insufficient memory for new locks. Make sure You aren't issuing "lock table" statements in any of your programs. You may need to revisit your commit frequency.

Also issue frequent commits even after your ready only access to release all your share locks.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DB2

 


Similar Topics
Topic Forum Replies
No new posts Batch call online program, EXCI task ... CICS 3
No new posts Recovery Routine/Abend exit to releas... PL/I & Assembler 14
No new posts Abends using up address spaces? ABENDS & Debugging 6
No new posts IEFZB476 getting various 0C4 abends PL/I & Assembler 18
No new posts EMPTY FILE - JOB ABENDS JCL & VSAM 4
Search our Forums:

Back to Top