IBM Mainframe Forum Index
 
 

u4087 Abend - not able to resolve.


IBM Mainframe Forums -> ABENDS & Debugging
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 7:09 am

Hello,

My production job, which has been running for more than 20 years, is failing and I am not able to locate the exact issue.

I have been trying different options for the past 10 hours. I am missing something and cannot find the cause.

Below is what is available in the logs:

Code:

IGD104I RFSAIAS.D164.T2020328.RFSAIAS2               RETAINED,  DDNAME=SYS00001
IEA822I COMPLETE TRANSACTION DUMP WRITTEN TO RFSAIAS.D164.T2020328.RFSAIAS2   
+CEE3797I LANGUAGE ENVIRONMENT HAS DYNAMICALLY CREATED A DUMP.                 
+CEE0374C CONDITION=CEE3204S TOKEN=00030C84 59C3C5C5 00000000  933             
          WHILE RUNNING PROGRAM IGG019BP                                       
          AT THE TIME OF INTERRUPT                                             
          PSW     078D0000 00CA422E                                           
          GPR 0-3 00000000 000BF038 000BF038 002CBDC0                         
          GPR 4-7 002C3FFE 002C4C8C 00007E58 00000030                         
          GPR 8-B 00CA40B8 00404040 002CBDF0 0000FEF0             
          GPR C-F 00007B78 000BD730 501D7948 00CA40B8             
          FLT 0-2 4410810000000000  4A6F33F5FD9E0000             
          FLT 4-6 0000000000000000  0000000000000000             
+CEE0374C CONDITION=CEE3206S TOKEN=00030C86 59C3C5C5 00000000  934
          WHILE RUNNING PROGRAM CEEBINIT                         
          AT THE TIME OF INTERRUPT                               
          PSW     078D2000 8002472C                               
          GPR 0-3 00000006 0002B240 002CB958 002CB890             
          GPR 4-7 0002B240 00000076 00000079 0000000C             
          GPR 8-B 0002B1E0 0002B1C0 404040B9 00D9060C             
          GPR C-F 0002CB88 0003FC00 800246FA 00D90618             
          FLT 0-2 4410810000000000  4A6F33F5FD9E0000             
          FLT 4-6 0000000000000000  0000000000000000       
IEA995I SYMPTOM DUMP OUTPUT  941                           
  USER COMPLETION CODE=4087 REASON CODE=00000007           
 TIME=20.20.32  SEQ=15804  CPU=0000  ASID=0091             
 PSW AT TIME OF ERROR  078D1000   86BDFDBC  ILC 2  INTC 0D
   NAME=UNKNOWN                                                           
   DATA AT PSW  06BDFDB6 - 00181610  0A0DA7F4  001C1811                   
   AR/GR 0: 00000000/84000000   1: 00000000/84000FF7                       
         2: 00000000/00000007   3: 00000000/00031038                       
         4: 00000000/06C1C128   5: 00000000/06C1C2A0                       
         6: 00000000/0002B340   7: 00000000/0002B7F0                       
         8: 00000000/80000000   9: 00000000/00041F9E                       
         A: 00000000/00000001   B: 00000000/86BDFCE8                       
         C: 00000000/0002CB88   D: 00000000/0003FFA0                       
         E: 00000000/8003204A   F: 01000002/00000007                       
 END OF SYMPTOM DUMP                                                       
IEC915I 219-03,RFSAIAS2,AIAANAL2,********                                 
IEC999I IFG0TC0A,IFG0TC0B,RFSAIAS2,AIAANAL2                               
IEC999I IGC00020,RFSAIAS2,AIAANAL2                                         
IEC999I IFG0TC0A,IFG0TC0B,RFSAIAS2,AIAANAL2,DEB ADDR=8B5480  ,DSN = UNKNOWN
         CC                                                             
IEC205I ERRORS,RFSAIAS2,AIAANAL2,FILESEQ=1, COMPLETE VOLUME LIST,  946   
DSN=AIA.RFSAIA.ERRORS.G0788V00,VOLS=710825,TOTALBLOCKS=350               
IEF450I RFSAIAS2 AIAANAL2 STEP1 - ABEND=S000 U4087 REASON=00000007  947 
        TIME=20.20.47                                                   
IEF234E K 552D,234229,PVT,RFSAIAS2,AIAANAL2                             
TMS014  IEF234E K 552D,234229,PVT,RFSAIAS2,AIAANAL2                     
IEF234E K 5566,710825,PVT,RFSAIAS2,AIAANAL2                             
TMS014  IEF234E K 5566,710825,PVT,RFSAIAS2,AIAANAL2                     
-STEP1    AIAANAL2 U4087  80116  63293   7.60    .00   15.9 13759K  BATCH



Code:


IEC915I 219-03,RFSAIAS2,AIAANAL2,********                                       
IEC999I IGC00020,RFSAIAS2,AIAANAL2                                             
IEC999I IFG0TC0A,IFG0TC0B,RFSAIAS2,AIAANAL2,DEB ADDR=8B5480  ,DSN = UNKNOWN     
IEC205I ERRORS,RFSAIAS2,AIAANAL2,FILESEQ=1, COMPLETE VOLUME LIST,               
DSN=AIA.RFSAIA.ERRORS.G0788V00,VOLS=710825,TOTALBLOCKS=350                     
IEF472I RFSAIAS2 AIAANAL2 STEP1 - COMPLETION CODE - SYSTEM=000 USER=4087 REASON=00000007



People in this forum have helped me many times in difficult situations. I am hoping to get a recommendation about what to do here again. Please help.
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 7:36 am

I also tried to recompile the program, as I thought this could be an S0C4.

The data below is available in the sysdump:
PSW AT ENTRY TO ABEND 078D1000 86BDFDBC ILC 02 INTC 000D
PSW LOAD MODULE ADDRESS = 06B5F000 OFFSET = 00080DBC
NAME=CEEPLPKA

ASCB: 00F6B200


I recompiled with the LIST option but was not able to find the address or offset.

Nothing in CAIprint. Do I have to use some CA Optimizer options to locate this?
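
A note on why the LIST output does not contain that offset. The arithmetic below is a minimal sketch, assuming the PSW and load module address shown in the sysdump above and 31-bit addressing (the leading bit of the PSW address word is the addressing-mode flag, not part of the address). The function name is hypothetical. It reproduces the reported offset, which is relative to CEEPLPKA (a Language Environment module, going by its CEE prefix) and not to the application program, so it would not be expected to show up in the application's compile listing.

```python
# Hypothetical helper: reproduce the OFFSET value from the sysdump lines above.
AMODE31_FLAG = 0x80000000  # high-order bit of the PSW address word (assumed AMODE 31)


def offset_in_module(psw_address_word, load_point):
    """Return the offset of the PSW address from the module load point."""
    address = psw_address_word & ~AMODE31_FLAG  # drop the addressing-mode bit
    return address - load_point


# Values copied from the post: PSW 078D1000 86BDFDBC, load module at 06B5F000.
offset = offset_in_module(0x86BDFDBC, 0x06B5F000)
print(f"{offset:08X}")  # 00080DBC, the OFFSET reported for CEEPLPKA
```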
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8700
Location: Dubuque, Iowa, USA

PostPosted: Wed Jun 13, 2012 7:37 am

Quote:
My production job which is running for more 20 years is failing
Who cares how long the program ran without failing? What matters now is that SOMETHING CHANGED and the program is now failing.

The 219 message states in the MAC manual what to do:
Quote:
System Action: The system issues an SVC Dump, writes a software error record to the logrec data set, and the task is ended.

Operator Response: Start a generalized trace facility (GTF) trace, and re-create the problem. Reply to message AHL100A with:

TRACE=SYS,USR,SLIP


On the DD statement for the data set in error, specify:

DCB=DIAGNS=TRACE


Application Programmer Response: Make sure that your program does not alter the DCB or IOB during processing of SVC 25.

System Programmer Response: If the error recurs and the program is not in error, look at the messages in the job log for more information. Search problem reporting data bases for a fix for the problem. If no fix exists, contact the IBM Support Center. Provide the JCL, the program listing for the job, and the logrec data set error record.
while the CEE3204S message in the manual indicates
Quote:
CEE3204S The system detected a protection exception (System Completion
Code=0C4).


Explanation: Your program attempted to access a storage location to which it was not authorized.

Programmer Response: Check your application for these common errors:
Using the wrong AMODE to reference storage
Trying to use a pointer that has not been set
Trying to store data into storage reserved for the system
Using an invalid index to an array
See a Principles of Operation manual for a full list of protection exceptions.

System Action: The thread is terminated.

Symbolic Feedback Code: CEE344
If you have tried the diagnostics in the 219 message, then the next step is to contact IBM and open a PMR.
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 7:47 am

Thanks Robert. I have a question,

If this is an S0C4, then where can I find the address and offset? No data is available in SYSOUT.

As I mentioned in the previous post, I tried to use the LIST compiler option to find the exact location, but the offset and address shown in the sysdump are not in the compile listing.
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 7:49 am

Hello,

Keep in mind that when "things" change on a system, old modules may fail.

Was the JCL changed recently?

Suggest you check with the system support or Configuration Management group (if there is one) to learn if there have been any upgrades or fixes applied since the program last ran successfully.

Suggest you re-compile the program into a test loadlib using a different load module name and see if the newly created test module will:
a. compile/link successfully
b. execute successfully
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 7:50 am

Hello,

Quote:
If this is an S0C4, then where can I find the address and offset? No data is available in SYSOUT.
Quite possibly the program has "walked on storage" and generated an invalid address.
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 7:55 am

When you say "walked on storage", does it mean that the program has used up all the memory allocated to it and generated an invalid address?

If yes, what are the possible fixes for this?

I already tried to run the program with REGION=0K, but it still fails.
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 7:58 am

Hello,

"Walked on storage" means the code caused data to be moved to some unintended address that was still valid, so the move itself did not fail. When the corrupt data (which was supposed to contain an address) is used, the 0C4 can occur.

Again, you need to identify what has changed since the last successful execution.

Have you made the test program, compiled it, and run a test?
If not, suggest you do so now.
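
To make "walked on storage" concrete, here is a toy model in Python (not COBOL), offered as a sketch under stated assumptions: working storage is one flat byte buffer, a 5-entry table sits directly in front of a 4-byte saved address, and stores are not range-checked. All names, sizes and values are invented for illustration. Writing a sixth entry does not fail at the time of the store; it quietly replaces the saved address with X'40404040' (EBCDIC spaces), and trouble would only start later when something uses that field as an address.

```python
import struct

# Invented layout: a 5-entry table followed immediately by a saved address.
ENTRY_SIZE = 4
TABLE_ENTRIES = 5
TABLE_OFFSET = 0
POINTER_OFFSET = TABLE_OFFSET + ENTRY_SIZE * TABLE_ENTRIES


def make_storage(valid_address):
    """Build the buffer and store a valid address after the table."""
    storage = bytearray(POINTER_OFFSET + 4)
    struct.pack_into(">I", storage, POINTER_OFFSET, valid_address)
    return storage


def store_entry(storage, index, value):
    """Store a 4-byte entry at a 0-based index with no range check."""
    struct.pack_into(">4s", storage, TABLE_OFFSET + index * ENTRY_SIZE, value)


def saved_address(storage):
    """Read back the address field that follows the table."""
    return struct.unpack_from(">I", storage, POINTER_OFFSET)[0]


EBCDIC_SPACES = bytes.fromhex("40404040")

storage = make_storage(0x0002B240)
for i in range(TABLE_ENTRIES):          # the five legitimate entries
    store_entry(storage, i, EBCDIC_SPACES)
before = saved_address(storage)

store_entry(storage, TABLE_ENTRIES, EBCDIC_SPACES)  # one entry too many
after = saved_address(storage)

print(f"{before:08X} -> {after:08X}")   # 0002B240 -> 40404040
```

The out-of-range store raises nothing in this model, which is the point: the instruction that eventually fails can be a long way from the code that did the damage.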
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 8:02 am

I have compiled and run the job. It successfully processes 2.3 million records; after those 2.3 million were processed, the job failed.
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 8:03 am

I thought this could be an issue with one particular record, so I skipped that record in the input, but the job fails again.
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 8:06 am

There have been no changes to this for at least 2 years. I am sure about that.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Wed Jun 13, 2012 8:15 am

Instead of skipping the record (after 2.3 million), skip the 2.3 million and then run.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Wed Jun 13, 2012 8:19 am

Is this one COBOL program,
tape in, tape out?
Any COBOL internal tables?

No CALLs to other modules?
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 8:35 am

Hello,

Quote:
There have been no changes to this for at least 2 years. I am sure about that.
Possibly not, but that has nothing to do with what I mentioned earlier. . .

Something HAS changed somewhere. It could be the data or ANY of the other possibilities mentioned above. There is also the chance that the problem has been in the code all along and just never caused a failure until now.

Be suspicious of any arrays or called modules (as DBZ mentioned).
shankarm

Active User


Joined: 17 May 2010
Posts: 175
Location: India

PostPosted: Wed Jun 13, 2012 9:05 am

This is a COBOL program and it calls many modules. It has 5 internal tables that get loaded.
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Wed Jun 13, 2012 11:52 am

Hope you have gone through QW for U4087; if not:

Code:

Explanation:  A recursive error was detected. A condition was raised, 
causing the number of nested conditions to exceed the limit set by the
DEPTHCONDLMT option. The reason code indicates which subcomponent or   
process was active when the exception was detected.                   


Code:

Reason code Explanation

X'07' (7) While Language Environment was trying to output a message, a 
          subsequent condition was raised.                             



Code:

Programmer Response:  In the case of CEEHDLR routine, recursion can occur
when you use the DEPTHCONDLMT run-time option.  It may be helpful to     
generate a system dump of the original error by using run-time options   
TERMTHDACT(UAIMM) and TRAP(ON,NOSPIE).                                   


Not sure how much it helps you, though.
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2454
Location: Hampshire, UK

PostPosted: Wed Jun 13, 2012 11:52 am

Maybe one of your internal tables has overflowed - do you have range checking on them?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 1:21 pm
Reply with quote

Code:
+CEE0374C CONDITION=CEE3204S TOKEN=00030C84 59C3C5C5 00000000  933             
          WHILE RUNNING PROGRAM IGG019BP                                       
+CEE0374C CONDITION=CEE3206S TOKEN=00030C86 59C3C5C5 00000000  934
          WHILE RUNNING PROGRAM CEEBINIT   


You have failed in a system module, via CEEBINIT.

CEEBINIT is used when you call a Cobol program.

As others have suggested, something has done your storage in somewhere, sufficiently to knock over the IGG019BP.

The current data that you are using or possibly the immediately previous data or, if you are unlucky, data some time earlier, has caused your problem.

If possible, try to run with SSRANGE on, which should check the tables for overflow. Otherwise it is likely one of the called modules that is doing something.

dbz's suggestion is useful to you. If you shorten your file but leave about 1000 before the abend, it may help you to track it down.
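
The trimming idea can be sketched as follows. This is only an illustration in Python with a hypothetical function name; on the mainframe the same thing would normally be done with a sort or copy utility. It assumes records are numbered from 1 and that the record number being processed at the abend is known. Because the damage may have been done by a much earlier record, a trimmed file that runs clean is also useful information.

```python
from itertools import islice


def records_near_failure(records, failing_record_number, keep_before=1000):
    """Yield the tail of the input, starting keep_before records
    ahead of the (1-based) record that was current at the abend."""
    start = max(failing_record_number - 1 - keep_before, 0)  # 0-based index
    return islice(records, start, None)


# Small stand-in for the real file: record numbers 1..5100, failure at 5000.
sample = range(1, 5101)
trimmed = list(records_near_failure(sample, 5000))
print(trimmed[0], len(trimmed))  # 4000 1101
```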

I would go back through the calling chain and see what program was called to get to the abend. Fortunately even the IBM routines follow the call/save conventions, so it is just some work going back through the dump.

When you find the module call, that might not be the one causing the problem. Storage has been overwritten at some point, but that might be earlier than immediately previous to that, as the abend won't occur until you happen to have something try to use the corrupted storage as instructions.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Wed Jun 13, 2012 1:29 pm

what are the considerations (reasons) that a new item is added to any one of the several tables?

Has any business definition changed recently?
e.g. new departments, additional somethings that could affect the way an incoming record would be stored in your internal tables.
Anuj Dhawan

Superior Member


Joined: 22 Apr 2006
Posts: 6248
Location: Mumbai, India

PostPosted: Wed Jun 13, 2012 1:34 pm

You need to give us something concrete to work on, as you're getting a user abend, which might mean anything. There is a distant possibility that USER COMPLETION CODE=4087 REASON CODE=00000007 was issued by IMS, if IMS is involved. Having said that, as Robert indicates about CEE3204S, I suspect it can be as trivial as not using an index properly.

Do you have ODO in your COBOL program?

Suggest, as one of the options, you compile the program with SSRANGE option and execute it again. DISPLAY all the INDEXes and SUBSCRIPTs and check if they are in permissible range.
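
What such a range check amounts to, sketched in Python rather than COBOL: the functions below are hypothetical stand-ins for the kind of test a range-checking compile option performs, or that DISPLAYing the subscripts and comparing them by hand would let you do. The table name and limit are invented for illustration. The check stops at the first out-of-range subscript instead of letting the store go ahead.

```python
def check_subscript(table_name, subscript, max_occurs):
    """Raise IndexError if a 1-based subscript is outside 1..max_occurs."""
    if not 1 <= subscript <= max_occurs:
        raise IndexError(
            f"{table_name}: subscript {subscript} outside 1..{max_occurs}"
        )
    return subscript


def first_violation(table_name, subscripts, max_occurs):
    """Return (position, message) for the first bad subscript, else None."""
    for position, subscript in enumerate(subscripts, start=1):
        try:
            check_subscript(table_name, subscript, max_occurs)
        except IndexError as err:
            return position, str(err)
    return None


# Invented example: a table declared with 5 occurrences, third reference is bad.
result = first_violation("DEPT-TABLE", [1, 5, 6, 2], 5)
print(result)  # (3, 'DEPT-TABLE: subscript 6 outside 1..5')
```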
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 3:16 pm

Looking a bit further, the CEEBINIT and the IGG are probably "artefacts", the original abend causing subsequent abends as LE tries to "clear up".

IEA995I SYMPTOM DUMP OUTPUT 941
Code:
  USER COMPLETION CODE=4087 REASON CODE=00000007           
 TIME=20.20.32  SEQ=15804  CPU=0000  ASID=0091             
 PSW AT TIME OF ERROR  078D1000   86BDFDBC  ILC 2  INTC 0D
   NAME=UNKNOWN                                                           
   DATA AT PSW  06BDFDB6 - 00181610  0A0DA7F4  001C1811                   
   AR/GR 0: 00000000/84000000   1: 00000000/84000FF7                       
         2: 00000000/00000007   3: 00000000/00031038                       
         4: 00000000/06C1C128   5: 00000000/06C1C2A0                       
         6: 00000000/0002B340   7: 00000000/0002B7F0                       
         8: 00000000/80000000   9: 00000000/00041F9E                       
         A: 00000000/00000001   B: 00000000/86BDFCE8                       
         C: 00000000/0002CB88   D: 00000000/0003FFA0                       
         E: 00000000/8003204A   F: 01000002/00000007 


The above is probably what you want to concentrate on.

The PSW looks "odd".
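
For what it is worth, the bytes on the DATA AT PSW line can be decoded by hand. The sketch below assumes the usual reading of the symptom dump: the PSW address points just past the interrupting instruction, ILC 2 means that instruction was 2 bytes long, and the data line starts at the address printed before the dash. Opcode X'0A' is SVC and SVC 13 (X'0D') is ABEND, which agrees with INTC 0D, so this PSW appears to mark the point where the U4087 abend was issued and not the original failure. The function name is hypothetical.

```python
AMODE31_FLAG = 0x80000000  # assumed: high-order bit is the addressing-mode flag


def instruction_before_psw(psw_address_word, data_start, data_hex, ilc):
    """Return the ilc bytes that end at the PSW address."""
    data = bytes.fromhex(data_hex.replace(" ", ""))
    end = (psw_address_word & ~AMODE31_FLAG) - data_start
    return data[end - ilc:end]


# Values copied from the symptom dump above.
instr = instruction_before_psw(
    0x86BDFDBC, 0x06BDFDB6, "00181610 0A0DA7F4 001C1811", 2
)
opcode, operand = instr[0], instr[1]
is_svc = opcode == 0x0A  # X'0A' is the SVC opcode

print(instr.hex().upper(), is_svc, operand)  # 0A0D True 13
```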

Register 8 "looks like" the last parameter passed to a Cobol program had no address. But with storage overwriting this needn't matter necessarily (as anything could be happening).

You dropped your current record and still abended, so it is a prior record causing the problem. "Eyeball" the file (see if records "look" consistent, check any occurrence values) and the dump (look for "repeating" storage, if you find it, follow it backwards to where it starts and work out what module that is).

As Anuj said, we have little really to go on. If still stuck (after some sleep/someone else has taken over) tell us about the "files" the program is reading, how many modules are called and any unanswered questions from above. I think we can ignore the "recursive" bit :-)
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8700
Location: Dubuque, Iowa, USA

PostPosted: Wed Jun 13, 2012 4:36 pm

This particular problem is one of the class of problems that is rarely possible for an application programmer to solve alone. Usually resolution requires generating a trace and reading that trace to determine what stepped on storage and when -- and that could have happened before the first record was processed.

You need to talk to your site support group and get a system programmer involved. Otherwise, the only realistic option to resolve the issue is to contact IBM and open a PMR -- and IBM will certainly want to see the trace as mentioned in the 219 error message text.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Jun 13, 2012 6:24 pm

Robert Sample wrote:
This particular problem is one of the class of problems that is rarely possible for an application programmer to solve alone. [...]


I eat 'em for breakfast :-)

Good advice, though, Robert. After 10 hours there must have been some sort of "escalation".

Takes our fun away, but Production waits for no-one...
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10886
Location: italy

PostPosted: Wed Jun 13, 2012 6:43 pm

I wonder....

after 20 years ( according to the TS ) the criteria used for the original design should be reviewed

the amount of data processed in a reasonable healthy organization must have significantly grown after 20 years.

if I had money to spend I would bet on an internal table overflow!
and consequent program/working storage corruption
All times are GMT + 6 Hours