IBM Mainframe Forum Index
 
Sporadic 0C1/AEKA abend in CICS


IBM Mainframe Forums -> CICS
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Fri Feb 12, 2016 12:29 am

One of our CICS COBOL programs is sporadically abending with 0C1/AEKA. According to the assembler listing, the abending instruction is MVCL 6,0 (Move Character Long). The instruction is moving 320 KB of data.
I checked the CICS logs, the PSW, and the register values, but found no clue as to why the MVCL instruction would cause a 0C1 sporadically.
Can you please suggest what else to check, what could cause this rare issue, and what the solution might be?
As a temporary workaround, we newcopy / phase in the load module, and that resolves the issue.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Fri Feb 12, 2016 1:05 am

The S0C1 may be caused by something else (such as a table overflowing into code) and have nothing to do with the MVCL. You may have to use one -- or more -- traces in CICS to figure out what, exactly, is happening. The fact that a newcopy resolves the problem points to some kind of storage issue causing the operation exception.

And WHY are you moving 320KB of data in a CICS program? CICS programs should be transaction oriented and hence deal with no more than a few K at most.

Transient issues, especially S0C1 and S0C4 ABENDs, are notoriously hard to resolve since they frequently are caused by something far removed from where the problem becomes apparent.
Bill O'Boyle

CICS Moderator


Joined: 14 Jan 2008
Posts: 2501
Location: Atlanta, Georgia, USA

Posted: Fri Feb 12, 2016 5:29 am

Is there anything in the DFHTACB? The registers at the time of the dump begin at X'60' off the DFHTACB.
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Fri Feb 12, 2016 9:56 am

Robert, the program handles compression and decompression of a very large VSAM file that stores transaction history for the last 20+ years. It initializes a large array and hence does the 320 KB move through MVCL. It looks to me like some kind of storage overlay. I will try to trace it. The problem is that it happens very infrequently, and the programs involved have not been changed in several years.

Bill, I will check the DFHTACB and keep you posted.

Thanks for your suggestions.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Fri Feb 12, 2016 4:15 pm

Note that the execution of an MVCL cannot cause a S0C1.

Your program code must have been overwritten, or a wild branch has entered part-way through an instruction.

Most likely the former, since the problem disappears with a new executable loaded.

As Robert has said, this is not necessarily easy to track. It will be a program that was changed immediately (where immediately is some timespan) before the failures started to occur. It need have nothing to do with the program that is failing. But it might, and the failing program is probably a more likely culprit than other programs. When was the program last changed? What was the change?
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Mon Feb 15, 2016 10:12 pm

When I checked the memory dump, I saw that some module names are missing:

Code:
DSA   Entry       E  Offset  Statement   Load Mod             Program Unit
1     CEEHDSP     +00004A4C                                   CEEHDSP     
2     CEECGEX     +000001FE                                   CEECGEX     
3                 +27C5E7D2                                               
4                 +00000000                                               
5                 +00000000                                               
6                 +00000000                                               
7     IVEXAI1     +00003138                                   IVEXAI1     
8     IGZCFCC     +000002FC                                   IGZCFCC     
9     IVEXAI1     +000001E2                                   VIXK019     
10    IOIKTIA     +000036B2                                   IOIKTIA     
11    VIXK009     +000002C6                                   VIXK009     
12    CEECRINV    +00000306                                   CEECRINV   
13    CEECRINI    +00000B4E                                   CEECRINI   


Code:
DSA   DSA Addr   E  Addr    PU Addr    PU Offset  Comp Date  Compile Attributes
1     250F03F0   256C5BD8   256C5BD8   +00004A4C  20150130   CEL               
2     2506C660   256BB890   256BB890   +000001FE  20130313   CEL               
3     2506C188   00000000   00000000   +27C5E7D2  ********   COBOL             
4     2506BFC0   00000000   00000000   +00000000  ********   COBOL             
5     2506BD00   00000000   00000000   +00000000  ********   COBOL             
6     2506B960   00000000   00000000   +00000000  ********   COBOL             
7     25057C48   27C00000   27C00000   +00003138  20060301   COBOL             
8     25057A50   25819A08   25819A08   +000002FC  20140722   LIBRARY           
9     25057810   265A60E6   265A5B70   +00000758  20060630   COBOL             
10    25057448   265A16A0   265A16A0   +000036B2  20100902   COBOL             
11    250572B0   265A10E0   265A10E0   +000002C6  20080903   COBOL             
12    25057108   256BEC98   256BEC98   +00000306  20130313   CEL               
13    25057088   256BDFA8   256BDFA8   +00000B4E  20150130   CEL               

Looking at the offset, the instruction is
Code:
BALR  14,15

Register 15 contains X'27C5E7D2', an address at which nothing is probably being found, and that is causing the 0C1.

I do not see any changes to the programs in the last several years, but the problem only started last year, and even then it occurs only once every month or two. There is no pattern in the activity. This is a clueless abend :-)
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Mon Feb 15, 2016 10:20 pm

FWIW, X'27C5E7D2' is an X'27' followed by the EBCDIC characters EXK -- perhaps the second half of a length followed by the starting bytes of a field? If you don't have a dump from the region, get one and look for those bytes in the dump.
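
That reading of the bytes can be checked with a short Python sketch. It is an illustration only and not part of the original post; it assumes code page 037 is the EBCDIC variant in use, which the thread does not state.

```python
# Interpret the failing address value as raw bytes rather than as an address.
raw = bytes.fromhex("27C5E7D2")

first_byte = raw[0]              # 0x27, possibly the tail of a length field
text = raw[1:].decode("cp037")   # remaining three bytes read as EBCDIC (cp037 is an assumption)

print(f"X'{first_byte:02X}' followed by {text!r}")
```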

If the problem only started last year, could it be date-related? Have you pulled together a list of occurrences to look for commonalities (same day of month, Julian day incremented by some constant, that type of thing)?

As we said, these issues can be very tough to debug since you don't have much of a starting idea. Does the CICS region have storage protection turned on? Is the problem occurring in production AND test?
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Mon Feb 15, 2016 10:31 pm

Code:
        TACB      Abend                                                   
TACB    Address   Code    Program   PSW                 BEAR             
****** ******** ****** ******** ******** ******** *****************
TACB01 25579008 ASRA    GIOIKTI  079D0000 A7C5E7D4 00000000_01916372
This is what I see in the DFHTACB.
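
For reference, the second PSW word can be split into the addressing-mode bit and the instruction address. The Python sketch below is an illustration, not something posted in the thread. It assumes the 8-byte PSW shown in the TACB uses the usual layout, where the high-order bit of the second word is the 31-bit addressing-mode flag, and it takes the 27C00000 entry address of IVEXAI1 from the earlier traceback.

```python
# Second word of the PSW from the TACB line: 079D0000 A7C5E7D4
psw_word2 = 0xA7C5E7D4

amode31 = bool(psw_word2 & 0x80000000)   # high-order bit set -> 31-bit addressing mode
next_instr = psw_word2 & 0x7FFFFFFF      # address of the next instruction to execute

# Values copied from the traceback: the DSA 3 "offset" and the IVEXAI1 entry address.
dsa3_value = 0x27C5E7D2
ivexai1_entry = 0x27C00000

print(f"AMODE 31: {amode31}")
print(f"PSW instruction address: {next_instr:08X}")
print(f"Bytes past the DSA 3 value: {next_instr - dsa3_value}")
print(f"Offset from IVEXAI1 entry: +{next_instr - ivexai1_entry:X}")
```

The PSW address comes out 2 bytes past the value shown for DSA 3. Whether the computed offset lies inside IVEXAI1 depends on the module's length, which the thread does not give.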
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Mon Feb 15, 2016 10:33 pm

Robert, it is occurring only in production.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Mon Feb 15, 2016 11:49 pm

It is highly likely that you've trashed your program. You don't seem to understand yet what a S0C1 is. BALR cannot give you a S0C1. You can branch to a wild address and happen to get a S0C1, or your BALR can get overwritten and give an invalid instruction. What is the actual instruction pointed-to in the dump?
Mainak_Dalal

New User


Joined: 05 May 2010
Posts: 19
Location: USA

Posted: Tue Feb 16, 2016 12:41 am

Bill, the actual instruction in the dump is a BALR. Most likely it is causing a wild branch. Register 15 contains the address that you see in DSA 3 in my screenshot, which has offset +27C5E7D2. But when I checked the dump, I did not see any content at this offset.

What confuses me is that this chain of programs is called from the production CICS region hundreds of times a day, yet it seldom abends; when it does, it is especially on the first call after the region job is restarted. Our shop has AUTOINSTALL, so the first call to the program loads it and does a phase-in. The majority of the S0C1 abends with this load module happen while a massive VSAM file is being accessed.
PeterHolland

Global Moderator


Joined: 27 Oct 2009
Posts: 2481
Location: Netherlands, Amstelveen

Posted: Tue Feb 16, 2016 2:42 pm

Maybe the following link can help you pinpoint your problem:

194.196.36.29/support/knowledgecenter/SSGMCP_4.2.0/com.ibm.cics.ts.doc/dfhs1/topics/dfhs14h.html
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Tue Feb 16, 2016 4:37 pm

Principles of Operation wrote:
Operation Exception
An operation exception is recognized when the CPU attempts to execute an instruction with an invalid operation code. The operation code may be unassigned, or the instruction with that operation code may not be installed on the CPU.


I am sure that your CPU supports BALR, so it is not a BALR which is giving the S0C1, so that is not the "instruction" in the dump.

If you use Peter's link and find the actual instruction in the dump you'd find something other than a BALR.

If you think you've found a BALR, you have probably "adjusted" your view of what was actually pointed-to, and you have a wild branch that may have landed somewhere within a BALR.

If you find the location and it looks like a storage-area, not program code, then you're lucky. The storage-looking stuff would convince you of the overwriting.

Everything so far (which, since you've not provided much, is not a lot to go on) points to your program code having been overwritten. You mention large VSAM records. Can the records be larger than the area defined in the program to store them? To put it another way, do you have any records which are unexpectedly large? That is reasonably simple to check on your file.
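
To illustrate the point about instruction boundaries: on this architecture the two high-order bits of the first opcode byte determine the instruction length, so a branch that lands part-way through an instruction makes the CPU decode whatever byte it finds there as an opcode. A minimal Python sketch follows; it is illustrative only, and the helper name is made up for this example.

```python
def instruction_length(opcode: int) -> int:
    """Return the instruction length in bytes implied by the first opcode byte."""
    top_bits = (opcode >> 6) & 0b11
    # 00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_bits]

# BALR (X'05') and MVCL (X'0E') are both 2-byte RR-format instructions.
for name, opcode in (("BALR", 0x05), ("MVCL", 0x0E)):
    print(name, instruction_length(opcode))
```

Both opcodes are valid, which is why neither instruction can itself be the source of the operation exception.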
Bill O'Boyle

CICS Moderator


Joined: 14 Jan 2008
Posts: 2501
Location: Atlanta, Georgia, USA

Posted: Tue Feb 16, 2016 5:23 pm

As Bill has said, it would be safe to assume that your compiler supports the BALR instruction. But, you had said from the start that it was an MVCL? Under certain circumstances, COBOL will bypass the in-line MVCL and CALL (BALR) to a run-time routine to perform the MOVE.

Could you check the Assembler expansion? To the right, where R15 is loaded with the routine's address, the name will/should be present. If this is one of the routines which didn't resolve during link-edit, then I think you'll have your answer.

Or (as an exercise) divide the length of the target storage area by 256, take the quotient, and use reference modification in an in-line PERFORM UNTIL, moving 256 bytes at a time.

When the PERFORM UNTIL completes, if the divide left a remainder, move the rest with a stand-alone reference-modification MOVE whose starting position is the quotient times 256 plus 1 and whose length is the remainder. Alternatively, you can omit the length and the compiler will calculate it.

This would be a process of elimination, overriding the MVCL issue.
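
The arithmetic behind that exercise can be sketched as follows. This is Python rather than COBOL and is only an illustration: the 327,680-byte length assumes the "320 KB" area is exactly 320 x 1024 bytes, and the function name is hypothetical. Start positions are 1-based, as in COBOL reference modification.

```python
def chunk_plan(total_length: int, chunk: int = 256):
    """Return (start, length) pairs, 1-based, that together cover total_length bytes."""
    quotient, remainder = divmod(total_length, chunk)
    # One full-size move per iteration of the PERFORM UNTIL loop
    plan = [(i * chunk + 1, chunk) for i in range(quotient)]
    if remainder:
        # Final stand-alone move: starts at quotient * chunk + 1, length = remainder
        plan.append((quotient * chunk + 1, remainder))
    return plan

plan = chunk_plan(320 * 1024)   # assumed size of the target area
print(len(plan), plan[0], plan[-1])
print(chunk_plan(600))          # a length that leaves a remainder
```

With the assumed size the divide leaves no remainder, so the stand-alone MOVE would not be needed; the second call shows the case where it is.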
All times are GMT + 6 Hours