Author |
Message |
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
One of our CICS COBOL programs is sporadically abending with 0C1/AEKA. According to the assembler listing, the abending instruction is MVCL 6,0 (move character long). The instruction is moving 320 KB of data.
I checked the CICS logs, the PSW and the register values, but found no clue as to why the MVCL instruction sporadically causes a 0C1.
Can you please suggest what else to check, what could be causing this rare issue, and what the solution could be?
As a temporary workaround, we newcopy / phase in the load module, which resolves the issue. |
|
|
|
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
|
|
|
|
The S0C1 may be caused by something else (such as a table overflowing into code) and have nothing to do with the MVCL. You may have to use one -- or more -- traces in CICS to figure out what, exactly, is happening. The fact that a newcopy resolves the problem points to some kind of storage issue causing the operation exception.
And WHY are you moving 320KB of data in a CICS program? CICS programs should be transaction oriented and hence deal with no more than a few K at most.
Transient issues, especially S0C1 and S0C4 ABENDs, are notoriously hard to resolve since they frequently are caused by something far removed from where the problem becomes apparent. |
|
|
|
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
|
|
|
|
Is there anything in the DFHTACB? The registers at the time of the dump begin at X'60' off the DFHTACB. |
|
|
|
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
Robert, the program handles compression and decompression of a very large VSAM file that stores the transaction history of the last 20+ years. It initializes a large array, hence the 320 KB move through MVCL. It looks to me like some kind of storage overlay. I will try to trace it. The problem is that it happens very infrequently, and the programs involved have not been changed in several years.
Bill, I will check the DFHTACB and keep you posted.
Thanks for your suggestions. |
|
|
|
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
Note that the execution of an MVCL cannot cause a S0C1.
Your program code must have been overwritten, or a wild branch has entered part-way through an instruction.
Most likely the former, since the problem disappears with a new executable loaded.
As Robert has said, not necessarily so easy to track. It will be a program that was changed immediately (where immediately is some timespan) before the failures started to occur. It need have nothing to do with the program that is failing. But it might, and that is probably more likely than it being some other program. When was the program last changed? What was the change? |
|
|
|
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
When I checked the memory dump, I saw missing module names:
Code: |
DSA Entry E Offset Statement Load Mod Program Unit
1 CEEHDSP +00004A4C CEEHDSP
2 CEECGEX +000001FE CEECGEX
3 +27C5E7D2
4 +00000000
5 +00000000
6 +00000000
7 IVEXAI1 +00003138 IVEXAI1
8 IGZCFCC +000002FC IGZCFCC
9 IVEXAI1 +000001E2 VIXK019
10 IOIKTIA +000036B2 IOIKTIA
11 VIXK009 +000002C6 VIXK009
12 CEECRINV +00000306 CEECRINV
13 CEECRINI +00000B4E CEECRINI
|
Code: |
DSA  DSA Addr  E Addr    PU Addr   PU Offset  Comp Date  Compile Attributes
1    250F03F0  256C5BD8  256C5BD8  +00004A4C  20150130   CEL
2    2506C660  256BB890  256BB890  +000001FE  20130313   CEL
3    2506C188  00000000  00000000  +27C5E7D2  ********   COBOL
4    2506BFC0  00000000  00000000  +00000000  ********   COBOL
5    2506BD00  00000000  00000000  +00000000  ********   COBOL
6    2506B960  00000000  00000000  +00000000  ********   COBOL
7    25057C48  27C00000  27C00000  +00003138  20060301   COBOL
8    25057A50  25819A08  25819A08  +000002FC  20140722   LIBRARY
9    25057810  265A60E6  265A5B70  +00000758  20060630   COBOL
10   25057448  265A16A0  265A16A0  +000036B2  20100902   COBOL
11   250572B0  265A10E0  265A10E0  +000002C6  20080903   COBOL
12   25057108  256BEC98  256BEC98  +00000306  20130313   CEL
13   25057088  256BDFA8  256BDFA8  +00000B4E  20150130   CEL
|
Looking at the offset, the instruction is
Register 15 contains +27C5E7D2, which is probably not being found and causing the 0C1.
I do not see any changes to the programs in the last several years, but the problem only started last year, and even then it occurs only once every month or two. There is no pattern in the activity. This is a clueless abend :-) |
|
|
|
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
|
|
|
|
FWIW, X'27C5E7D2' is an X'27' followed by EXK -- perhaps the second half of a length followed by the starting bytes of a field? If you don't have a dump from the region, get one and look for those bytes in the dump.
If the problem only started last year, could it be date-related? Have you pulled together a list of occurrences to look for commonalities (same day of month, Julian day incremented by some constant, that type of thing)?
As we said, these issues can be very tough to debug since you don't have much of a starting idea. Does the CICS region have storage protection turned on? Is the problem occurring in production AND test? |
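Robert's reading of X'27C5E7D2' can be checked with a few lines of Python. This is only a sketch: it assumes the bytes are EBCDIC in code page 037 (the region may use a different code page, although the upper-case letters map the same way in the common ones), and the variable names are illustrative.

```python
# Split X'27C5E7D2' into its first byte and the remaining three bytes,
# then decode those three bytes as EBCDIC.
# Assumption: code page 037 (cp037); not confirmed anywhere in the thread.
word = bytes.fromhex("27C5E7D2")

first_byte = word[0]
text = word[1:].decode("cp037")

print(f"X'{first_byte:02X}' followed by {text!r}")
```

If the value really is data rather than an address, searching the dump for the EBCDIC string is the next step Robert describes.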
|
|
|
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
Code: |
TACB Abend
TACB Address Code Program PSW BEAR
****** ******** ****** ******** ******** ******** *****************
TACB01 25579008 ASRA GIOIKTI 079D0000 A7C5E7D4 00000000_01916372
|
This is what I see in DFHTACB |
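The PSW in the TACB can be related to the R15 value from the earlier traceback with some simple arithmetic. The sketch below uses only numbers posted in this thread and the architectural facts that the high-order bit of the second PSW word is the addressing-mode bit and that, for an operation exception, the PSW points past the failing instruction, which is 2, 4 or 6 bytes long. Which length applies here is not shown in the posted output, so all three candidates are listed. The variable names are illustrative.

```python
# Values copied from the posts above.
PSW_WORD2 = 0xA7C5E7D4      # second word of the PSW in the TACB
R15_TARGET = 0x27C5E7D2     # value shown against DSA 3
IVEXAI1_LOAD = 0x27C00000   # E Addr shown against DSA 7 (IVEXAI1)

# The top bit is the addressing-mode bit (1 = 31-bit), not part of the address.
amode31 = bool(PSW_WORD2 & 0x80000000)
next_instr = PSW_WORD2 & 0x7FFFFFFF

# The instruction-length code (ILC) is 1, 2 or 3 halfwords, so the failing
# instruction starts 2, 4 or 6 bytes before the PSW address.
candidates = {ilc: next_instr - 2 * ilc for ilc in (1, 2, 3)}

print(f"AMODE 31: {amode31}, next instruction at {next_instr:08X}")
for ilc, addr in candidates.items():
    note = "  <- same as the R15 value" if addr == R15_TARGET else ""
    print(f"ILC {ilc}: failing instruction at {addr:08X}{note}")

# Distance of the R15 value from the IVEXAI1 load point, for orientation only.
print(f"R15 value is +{R15_TARGET - IVEXAI1_LOAD:08X} from the IVEXAI1 E Addr")
```

With a 2-byte instruction length, the failing address equals the R15 value, which would be consistent with a branch to that address followed by an invalid 2-byte opcode there. Only the dump contents at that address can confirm it.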
|
|
|
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
Robert, it is only in production |
|
|
|
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
It is highly likely that you've trashed your program. You don't seem to understand yet what a S0C1 is. BALR cannot give you a S0C1. You can branch to a wild address and happen to get a S0C1, or your BALR can get overwritten and give an invalid instruction. What is the actual instruction pointed-to in the dump? |
|
|
|
Mainak_Dalal
New User
Joined: 05 May 2010 Posts: 19 Location: USA
|
|
|
|
Bill, the actual instruction in the dump is a BALR. Most likely it is causing a wild branch. Register 15 contains the address that you see against DSA 3 in my screenshot, which has offset +27C5E7D2. But when I checked the dump, I did not see any content at this offset.
What confuses me is that this chain of programs is called from the production CICS region hundreds of times a day, yet the programs abend only seldom, especially when called for the first time after the region job is restarted. Our shop has AUTOINSTALL, so the first call to the program loads it and does a phase-in. The majority of the S0C1s with this load module happen when a massive VSAM file is being accessed. |
|
|
|
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
|
|
|
|
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
Principles of Operation wrote: |
Operation Exception
An operation exception is recognized when the CPU attempts to execute an instruction with an invalid operation code. The operation code may be unassigned, or the instruction with that operation code may not be installed on the CPU. |
I am sure that your CPU supports BALR, so it is not a BALR which is giving the S0C1, so that is not the "instruction" in the dump.
If you use Peter's link and find the actual instruction in the dump you'd find something other than a BALR.
If you think you've found a BALR, you have probably "adjusted" your view of what was actually pointed-to, and you have a wild branch that may have landed somewhere within a BALR.
If you find the location and it looks like a storage-area, not program code, then you're lucky. The storage-looking stuff would convince you of the overwriting.
Everything (which, since you've not provided that much, is not much to go on) is pointing to your program code having been overwritten. You mention large VSAM records. Can the records be larger than the area defined in the program to store them? To put it another way, do you have any records which are unexpectedly large? That is reasonably simple to check on your file. |
|
|
|
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
|
|
|
|
As Bill has said, it would be safe to assume that your compiler supports the BALR instruction. But, you had said from the start that it was an MVCL? Under certain circumstances, COBOL will bypass the in-line MVCL and CALL (BALR) to a run-time routine to perform the MOVE.
Could you check the Assembler expansion? To the right, where R15 is loaded with the routine's address, the name will/should be present. If this is one of the routines which didn't resolve during link-edit, then I think you'll have your answer.
Or (as an exercise) divide the length of the target storage-area by 256, get the quotient and use reference modification in an in-line PERFORM UNTIL, moving 256 bytes at a time.
When the PERFORM UNTIL completes, if the divide left a remainder, move those bytes with a stand-alone reference-modification MOVE whose starting position is "Quotient" times 256 plus 1 and whose length-value is the remainder. Alternatively, you can omit the length-value and the compiler will calculate it.
This would be a process of elimination, overriding the MVCL issue. |
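The quotient/remainder arithmetic Bill describes can be sketched in Python to check the starting positions before coding the PERFORM UNTIL. This is an illustration only, not COBOL: `chunked_move` is a hypothetical helper, start positions are reported 1-based to mirror reference modification, and "320 KB" is assumed to mean 320 * 1024 bytes, which the thread does not confirm.

```python
CHUNK = 256  # bytes moved per iteration, as suggested in the post


def chunked_move(source: bytes, target: bytearray, chunk: int = CHUNK):
    """Copy source into target one chunk at a time.

    Returns the (start, length) pairs used, with start 1-based like a
    COBOL reference-modification starting position.
    """
    total = len(source)
    if len(target) < total:
        raise ValueError("target area is smaller than the source")

    quotient, remainder = divmod(total, chunk)
    moves = []

    # The PERFORM UNTIL part: 'quotient' full chunks.
    for i in range(quotient):
        start = i * chunk + 1
        target[start - 1:start - 1 + chunk] = source[start - 1:start - 1 + chunk]
        moves.append((start, chunk))

    # The stand-alone MOVE for whatever is left over.
    if remainder:
        start = quotient * chunk + 1
        target[start - 1:start - 1 + remainder] = source[start - 1:]
        moves.append((start, remainder))

    return moves


# Small example: 1000 bytes is 3 full chunks plus a 232-byte remainder.
src = bytes(i % 251 for i in range(1000))
dst = bytearray(1000)
print(chunked_move(src, dst)[-1], bytes(dst) == src)

# Assuming the area is exactly 320 * 1024 bytes, it divides evenly by 256.
print(divmod(320 * 1024, CHUNK))
```

If the real area length is a multiple of 256, as it would be under that assumption, the remainder MOVE is never needed.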
|
|
|
|